
Concurrency allows your program to manage multiple operations simultaneously, leveraging the full power of modern processors. It’s akin to a chef preparing multiple dishes in parallel, each step orchestrated so that all dishes are ready at the same time. Asynchronous programming, on the other hand, lets your application move on to other tasks while waiting for operations to complete, such as sending a food order to the kitchen and serving other customers until the order is ready.

Technical requirements
• Faker, using pip install faker
• ReactiveX, using pip install reactivex

 

• The Thread Pool pattern

First, it’s important to understand what a thread is. In computing, a thread is the smallest unit of processing that can be scheduled by an operating system.

Threads are like tracks of execution that can run on a computer at the same time, which enables many activities to be done simultaneously and thus improve performance. They are particularly important in applications that need multitasking, such as serving multiple web requests or carrying out multiple computations.

Now, onto the Thread Pool pattern itself. Imagine you have many tasks to complete, but starting each task (which, in this case, means creating a thread) is expensive in terms of resources and time. It’s like hiring a new employee every time you have a job to do and letting them go when the job is done; this process is inefficient and costly. The Thread Pool pattern reduces this inefficiency by maintaining a collection, or pool, of worker threads that are created once and reused for several jobs. When a thread finishes a task, it does not terminate; instead, it returns to the pool, awaiting the next task it can be assigned.

What are worker threads?

A worker thread is a thread of execution of a particular task or set of tasks. Worker threads are used to offload processing tasks from the main thread, helping to keep applications responsive by performing time-consuming or resource-intensive tasks asynchronously.

In addition to faster application performance, there are two benefits:

• Reduced overhead: By reusing threads, the application avoids the overhead of creating and destroying threads for each task
• Better resource management: The thread pool limits the number of threads, preventing resource exhaustion that could occur if too many threads were created

Use cases for the Thread Pool pattern

• Batch processing: When you have many tasks that can be performed in parallel, a thread pool can distribute them among its worker threads
• Load balancing: Thread pools can be used to distribute workload evenly among worker threads, ensuring that no single thread takes on too much work
• Resource optimization: By reusing threads, the thread pool minimizes system resource usage, such as memory and CPU time

Implementing the Thread Pool pattern

First, let’s break down how a thread pool works for a given application:

1. When the application starts, the thread pool creates a certain number of worker threads. This is the initialization. The number of threads can be fixed or dynamically adjusted based on the application’s needs.

2. Then, we have the task submission step. When there’s a task to be done, it’s submitted to the pool rather than directly creating a new thread. The task can be anything that needs to be executed, such as processing user input, handling network requests, or performing calculations.

3. The following step is task execution. The pool assigns the task to one of the available worker threads. If all threads are busy, the task might wait in a queue until a thread becomes available.

4. Once a thread completes its task, it doesn’t die. Instead, it returns to the pool, ready to be assigned a new task.

For our example, let’s see some code where we create a thread pool with five worker threads to handle a set of tasks. We are going to use the ThreadPoolExecutor class from the concurrent.futures module.

from concurrent.futures import ThreadPoolExecutor
import time


def task(n):
    print(f"Executing task {n}")
    time.sleep(1)  # simulate one second of work
    print(f"Task {n} completed")


# Five worker threads service the ten submitted tasks
with ThreadPoolExecutor(max_workers=5) as executor:
    for i in range(10):
        executor.submit(task, i)

The output is as follows:

Executing task 0
Executing task 1
Executing task 2
Executing task 3
Executing task 4
Task 0 completed
Task 4 completed
Task 3 completed
Task 1 completed
Executing task 6
Executing task 7
Executing task 8
Task 2 completed
Executing task 5
Executing task 9
Task 8 completed
Task 6 completed
Task 9 completed
Task 5 completed
Task 7 completed

We see that the tasks were completed in an order different from the order of submission. This shows that they were executed concurrently using the threads available in the thread pool.
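Since every submission calls the same function, a more compact, roughly equivalent alternative is the executor’s map() method. Here is a minimal sketch reusing the imports and the task() function from above:

with ThreadPoolExecutor(max_workers=5) as executor:
    # Schedules task(i) for each i; iterating the result
    # yields return values in submission order
    list(executor.map(task, range(10)))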

• The Worker Model pattern

The idea behind the Worker Model pattern is to divide a large task, or many tasks, into smaller, manageable units of work that can be processed in parallel by a set of workers. This approach to concurrency and parallel processing not only accelerates processing time but also enhances the application’s performance.

The workers could be threads within a single application (as we have just seen in the Thread Pool pattern), separate processes on the same machine, or even different machines in a distributed system.

The benefits of the Worker Model pattern are the following:

• Scalability: Easily scales with the addition of more workers, which can be particularly beneficial in distributed systems where tasks can be processed on multiple machines
• Efficiency: By distributing tasks across multiple workers, the system can make better use of available computing resources, processing tasks in parallel
• Flexibility: The Worker Model pattern can accommodate a range of processing strategies, from simple thread-based workers to complex distributed systems spanning multiple servers

Real-world examples

Consider a delivery service where packages (tasks) are delivered by a team of couriers (workers). Each courier picks up a package from the distribution center (task queue) and delivers it. The number of couriers can vary depending on demand; more couriers can be added during busy periods and reduced when it’s quieter.

In big data processing, the Worker Model pattern is often employed, as in the MapReduce model, where each worker is responsible for mapping or reducing a part of the data.

In systems such as RabbitMQ or Kafka, the Worker Model pattern is used to process messages from a queue concurrently.

We can also cite image processing services. Services that need to process multiple images simultaneously often use the Worker Model pattern to distribute the load among multiple workers.

Use cases for the Worker Model pattern

One use case for the Worker Model pattern is data transformation. When you have a large dataset that needs to be transformed, you can distribute the work among multiple workers.

Another one is task parallelism. In applications where different tasks are independent of each other, the Worker Model pattern can be very effective.

A third use case is distributed computing, where the Worker Model pattern can be extended to multiple machines, making it suitable for distributed computing environments.

Implementing the Worker Model pattern

Before discussing an implementation example, let’s understand how the Worker Model pattern works. Three components are involved in the Worker Model pattern: workers, a task queue, and, optionally, a dispatcher:

• The workers: The primary actors in this model. Each worker can perform a piece of the task independently of the others. Depending on the implementation, a worker might process one task at a time or handle multiple tasks concurrently.
• The task queue: A central component where tasks are stored awaiting processing. Workers typically pull tasks from this queue, ensuring that tasks are distributed efficiently among them. The queue acts as a buffer, decoupling task submission from task processing.
• The dispatcher: In some implementations, a dispatcher component assigns tasks to workers based on availability, load, or priority. This can help optimize task distribution and resource utilization.

Let’s now see an example where we execute a function in parallel.

We start by importing what we need for the example, as follows:

from multiprocessing import Process, Queue
import time

Then, we create a worker() function that we are going to run tasks with. It takes as a parameter the task_queue object that contains the tasks to execute. The code is as follows:

def worker(task_queue):
    # empty() followed by get() is not atomic, but it is safe here
    # because all tasks are queued before the workers start
    while not task_queue.empty():
        task = task_queue.get()
        print(f"Processing task {task}")
        time.sleep(1)  # simulate one second of work
        print(f"Task {task} completed")

In the main() function, we start by creating a queue of tasks, an instance of multiprocessing.Queue. Then, we create 10 tasks and add them to the queue:

def main():
    task_queue = Queue()

    for i in range(10):
        task_queue.put(i)

Five worker processes are then created, using the multiprocessing.Process class, and started. Each worker picks up a task from the queue, executes it, and then picks up another until the queue is empty. We start each worker process (using p.start()) in a loop, so that the tasks get executed concurrently. After that, in another loop, we call each process’s .join() method so that the program waits for those processes to complete their work. That part of the code is as follows:

    processes = [
        Process(target=worker, args=(task_queue,))
        for _ in range(5)
    ]

    # Start the worker processes
    for p in processes:
        p.start()

    # Wait for all worker processes to finish
    for p in processes:
        p.join()
    print("All tasks completed.")
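Note that because multiprocessing may start child processes via spawn (the default on Windows and macOS), the script should call main() only from within the standard entry-point guard:

if __name__ == "__main__":
    main()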

This pattern is particularly useful for scenarios where tasks are independent and can be processed in parallel.

• The Future and Promise pattern

In the asynchronous programming paradigm, a Future represents a value that is not yet known but will be provided eventually. When a function initiates an asynchronous operation, instead of blocking until the operation completes and a result is available, it immediately returns a Future. This Future object acts as a placeholder for the actual result available later.

Futures are commonly used for I/O operations, network requests, and other time-consuming tasks that run asynchronously. They allow the program to continue executing other tasks rather than waiting for the operation to be completed. That property is referred to as non-blocking.

Once the Future is fulfilled, the result can be accessed through the Future, often via callbacks, polling, or blocking until the result is available.

A Promise is the writable, controlling counterpart to a Future. It represents the producer side of the asynchronous operation, which will eventually provide a result to its associated Future. When the operation completes, the Promise is fulfilled with a value or rejected with an error, which then resolves the Future.

Promises can be chained, allowing a sequence of asynchronous operations to be performed clearly and concisely.
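Python does not have a separate Promise class; the producer side of a concurrent.futures.Future plays that role. Here is a minimal sketch (instantiating Future directly is normally reserved for executors and tests) showing a Future resolved by hand, with a continuation chained onto it:

from concurrent.futures import Future

def produce(fut):
    # Producer (Promise) side: fulfill the Future with a value,
    # or reject it with fut.set_exception(...)
    fut.set_result(42)

fut = Future()
# Consumer side: chain a continuation that runs once the Future resolves
fut.add_done_callback(lambda f: print(f"Got: {f.result()}"))
produce(fut)  # prints: Got: 42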

By allowing a program to continue execution without waiting for asynchronous operations, applications become more responsive. Another benefit is composability: multiple asynchronous operations can be combined, sequenced, or executed in parallel in a clean and manageable way.

Real-world examples

Ordering a custom dining table from a carpenter provides a tangible example of the Future and Promise pattern. When you place the order, you receive an estimated completion date and design sketch (Future), representing the carpenter’s promise to deliver the table. As the carpenter works, this promise moves toward fulfillment. The delivery of the completed table resolves the Future, marking the fulfillment of the carpenter’s promise to you.

We can also find several examples in the digital realm, such as the following:

• Online shopping order tracking: When you place an order online, the website immediately provides you with an order confirmation and a tracking number (Future). As your order is processed, shipped, and delivered, status updates (Promise fulfillment) are reflected in real time on the tracking page, eventually resolving to a final delivery status.

• Food delivery apps: Upon ordering your meal through a food delivery app, you’re given an estimated delivery time (Future). The app continuously updates the order status—from preparation through pickup and delivery (Promise being fulfilled)—until the food arrives at your door, at which point the Future is resolved with the completion of your order.

• Customer support tickets: When you submit a support ticket on a website, you immediately receive a ticket number and a message stating that someone will get back to you (Future). Behind the scenes, the support team addresses tickets based on priority or in the order they were received. Once your ticket is addressed, you receive a response, fulfilling the Promise made when you first submitted the ticket.

Use cases for the Future and Promise pattern

There are at least four use cases where the Future and Promise pattern is recommended:

1. Data pipelines: In data processing pipelines, data is often transformed through multiple stages before reaching its final form. By representing each stage with a Future, you can effectively manage the asynchronous flow of data. For example, the output of one stage can serve as the input for the next, but because each stage returns a Future, subsequent stages don’t have to block while waiting for the previous ones to complete.

2. Task scheduling: Task scheduling systems, such as those in an operating system or a high-level application, can use Futures to represent tasks that are scheduled to run at a future time. When a task is scheduled, a Future is returned to represent the eventual completion of that task. This allows the system or the application to keep track of the task’s state without blocking execution.

3. Complex database queries or transactions: Executing database queries asynchronously is crucial for maintaining application responsiveness, particularly in web applications where user experience is paramount. By using Futures to represent the outcome of database operations, applications can initiate a query and immediately return control to the user interface or the calling function. The Future will eventually resolve with the query result, allowing the application to update the UI or process the data without having frozen or become unresponsive while waiting for the database response.

4. File I/O operations: File I/O operations can significantly impact application performance, particularly if executed synchronously on the main thread. By applying the Future and Promise pattern, file I/O operations are offloaded to a background process, with a Future returned to represent the completion of the operation. This approach allows the application to continue running other tasks or responding to user interactions while the file is being read from or written to. Once the I/O operation completes, the Future resolves, and the application can process or display the file data.

In each of these use cases, the Future and Promise pattern facilitates asynchronous operation, allowing applications to remain responsive and efficient by not blocking the main thread with long-running tasks.
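To make the file I/O use case concrete, here is a minimal sketch (the read_file() helper and the data.txt filename are illustrative assumptions) that offloads a blocking read to a worker thread and receives a Future for the result:

from concurrent.futures import ThreadPoolExecutor

def read_file(path):
    # Blocking read, executed off the main thread
    with open(path) as f:
        return f.read()

with ThreadPoolExecutor() as executor:
    future = executor.submit(read_file, "data.txt")
    # ... the program can keep doing other work here ...
    print(len(future.result()))  # blocks only when the result is needed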

Implementing the Future and Promise pattern – using concurrent.futures

To understand how to implement the Future and Promise pattern, you must first understand the three steps of its mechanism. Let’s break those down next:

1. Initiation: The initiation step involves starting an asynchronous operation using a function where, instead of waiting for the operation to complete, the function immediately returns a “Future” object. This object acts as a placeholder for the result that will be available later. Internally, the asynchronous function creates a “Promise” object. This object is responsible for handling the outcome of the asynchronous operation. The Promise is linked to the Future, meaning the state of the Promise (whether it’s fulfilled or rejected) will directly affect the Future.

2. Execution: During the execution step, the operation proceeds independently of the main program flow. This allows the program to remain responsive and continue with other tasks. Once the asynchronous task completes, its result needs to be communicated back to the part of the program that initiated the operation. The outcome of the operation (be it a successful result or an error) is passed to the previously created Promise.

3. Resolution: If the operation is successful, the Promise is “fulfilled” with the result. If the operation fails, the Promise is “rejected” with an error. The fulfillment or rejection of the Promise resolves the Future. Using the result is often done through a callback or continuation function, which is a piece of code that specifies what to do with the result. The Future provides mechanisms (for example, methods or operators) to specify these callbacks, which will execute once the Future is resolved.

In our example, we use an instance of the ThreadPoolExecutor class to execute tasks asynchronously. The submit method returns a Future object that will eventually contain the result of the computation. We start by importing what we need, as follows:

from concurrent.futures import ThreadPoolExecutor, as_completed

Then, we define a function for the task to be executed:

def square(x):
    return x * x

We submit tasks and get Future objects, then we collect the completed Futures. The as_completed function allows us to iterate over completed Future objects and retrieve their results:

with ThreadPoolExecutor() as executor:
    future1 = executor.submit(square, 2)
    future2 = executor.submit(square, 3)
    future3 = executor.submit(square, 4)

    futures = [future1, future2, future3]

    for future in as_completed(futures):
        print(f"Result: {future.result()}")

The output is as follows (the order reflects completion, not submission):

Result: 16
Result: 4
Result: 9
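One detail to be aware of: future.result() re-raises any exception that was raised inside the task. Here is a variant of the retrieval loop above that handles such failures:

    for future in as_completed(futures):
        try:
            print(f"Result: {future.result()}")
        except Exception as e:
            # An exception raised inside square() would surface here
            print(f"Task failed: {e}")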

Implementing the Future and Promise pattern – using asyncio

Python’s asyncio library provides another way to execute tasks using asynchronous programming. It is particularly useful for I/O-bound tasks. Let’s see a second example using this technique.

What is asyncio?
The asyncio library provides support for asynchronous I/O, event loops, coroutines, and other concurrency-related tasks. So, using asyncio, developers can write code that efficiently handles I/O-bound operations.

Coroutines and async/await
A coroutine is a special kind of function that can pause and resume its execution at certain points, allowing other coroutines to run in the meantime. Coroutines are declared with the async keyword. Also, a coroutine can be awaited from other coroutines, using the await keyword.

We import the asyncio module, which contains everything we need:

import asyncio

Then, we create a function for the task of computing and returning the square of a number. We also want an I/O-bound operation, so we use asyncio.sleep(). Notice that in the asyncio style of programming, such a function is defined using the combined keywords async def – it is a coroutine. The asyncio.sleep() function itself is a coroutine, so we make sure to use the await keyword when calling it:

async def square(x):
    # Simulate some IO-bound operation
    await asyncio.sleep(1)
    return x * x

Then, we move on to creating our main() function. We use the asyncio.ensure_future() function to create the Future objects we want, passing it square(x), with x being the number to square. We create three Future objects, fut1, fut2, and fut3. Then, we use the asyncio.gather() coroutine to wait for our Futures to complete and gather the results. The code for the main() function is as follows:

async def main():
    fut1 = asyncio.ensure_future(square(2))
    fut2 = asyncio.ensure_future(square(3))
    fut3 = asyncio.ensure_future(square(4))

    results = await asyncio.gather(fut1, fut2, fut3)

    for result in results:
        print(f"Result: {result}")

At the end of our code file, we have the usual if __name__ == "__main__": block. What is new here, since we are writing asyncio-based code, is that we need to run asyncio’s event loop, by calling asyncio.run(main()):

if __name__ == "__main__":
    asyncio.run(main())

The output is as follows:

Result: 4
Result: 9
Result: 16

Note that asyncio.gather() returns the results in the order the awaitables were passed, so this particular output is deterministic, even though the underlying tasks finish in an unpredictable order. You saw that completion-order effect with as_completed() in the previous example; such unpredictability is typical of concurrent and asynchronous code.

This simple example shows that asyncio is a suitable choice for the Future and Promise pattern when we need to efficiently handle I/O-bound tasks (in scenarios such as web scraping or API calls).

• The Observer pattern in reactive programming

The Observer pattern (covered in Chapter 5, Behavioral Design Patterns) is useful for notifying an object or a group of objects when the state of a given object changes. This type of traditional Observer allows us to react to some object change events. It provides a nice solution for many cases, but in a situation where we must deal with many events, some depending on each other, the traditional way could lead to complicated, difficult-to-maintain code. That is where another paradigm, called reactive programming, gives us an interesting option. In simple terms, the concept of reactive programming is to react to many events (streams of events) while keeping our code clean.

Let’s focus on ReactiveX (http://reactivex.io), one of the best-known implementations of reactive programming. At the heart of ReactiveX is a concept known as an Observable. According to its official website, ReactiveX provides an API for asynchronous programming with observable streams. This concept builds on the idea of the Observer, which we have already discussed.

Imagine an Observable like a river that flows data or events down to an Observer. This Observable sends out items one after another. These items travel through a path made up of different steps or operations until they reach an Observer, who takes them in or consumes them.
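To make that river concrete, here is a minimal sketch (separate from the upcoming example) of an Observable emitting three items through a map() step to a subscriber:

import reactivex as rx
from reactivex import operators as ops

# Each item flows through the pipeline of operators to the Observer
rx.of(1, 2, 3).pipe(
    ops.map(lambda x: x * 10)
).subscribe(lambda value: print(f"Received {value}"))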

Real-world examples

An airport’s flight information display system is analogous to an Observable in reactive programming. Such a system continuously streams updates about flight statuses, including arrivals, departures, delays, and cancellations. This analogy illustrates how observers (travelers, airline staff, and airport services subscribed to receive updates) subscribe to an Observable (the flight display system) and react to a continuous stream of updates, allowing for dynamic responses to real-time information.

A spreadsheet application can also be seen as an example of reactive programming, based on its internal behavior. In virtually all spreadsheet applications, interactively changing any one cell in the sheet will result in immediately reevaluating all formulas that directly or indirectly depend on that cell and updating the display to reflect these reevaluations.

Implementing the Observer pattern in reactive programming

For this example, we build a stream from a list of (fake) people’s names stored in a text file (people.txt), and an Observable based on it.

We start by importing what we need:

from pathlib import Path
import reactivex as rx
from reactivex import operators as ops

We define a function, firstnames_from_db(), which returns an Observable built from the text file containing the names (reading the content of the file), with transformations using the flat_map(), filter(), and map() operators (as we have already seen), plus a new operation, group_by(), which lets us emit, for each first name found in the file, the name together with its number of occurrences:

def firstnames_from_db(path: Path):
    file = path.open()

    # collect and push stored people firstnames
    return rx.from_iterable(file).pipe(
        ops.flat_map(
            lambda content: rx.from_iterable(
                content.split(", ")
            )
        ),
        ops.filter(lambda name: name != ""),
        ops.map(lambda name: name.split()[0]),
        ops.group_by(lambda firstname: firstname),
        ops.flat_map(
            lambda grp: grp.pipe(
                ops.count(),
                ops.map(lambda ct: (grp.key, ct)),
            )
        ),
    )

Then, in the main() function, we define an Observable that emits data every 5 seconds, mapping each emission (via flat_map()) to the Observable returned by firstnames_from_db(db_path), after setting db_path to the path of the people names text file, as follows:

def main():
    db_path = Path(__file__).parent / Path("people.txt")

    # Emit data every 5 seconds
    rx.interval(5.0).pipe(
        ops.flat_map(lambda i: firstnames_from_db(db_path))
    ).subscribe(lambda val: print(str(val)))

    # Keep alive until user presses any key
    input("Starting... Press any key and ENTER, to quit\n")
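As in the previous examples, main() needs to be invoked from the usual entry-point block for the script to run:

if __name__ == "__main__":
    main()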

The output is as follows:

Starting... Press any key and ENTER, to quit
('Peter', 1)
('Gabriel', 1)
('Gary', 1)
('Heather', 1)
('Juan', 1)
('Alan', 1)
('Travis', 1)
('David', 1)
('Christopher', 1)
('Brittany', 1)
('Brian', 1)
('Stefanie', 1)
('Craig', 1)
('William', 1)
('Kirsten', 1)
('Daniel', 1)
('Derrick', 1)

Once you press a key followed by Enter, the emission is interrupted, and the program stops.

Handling new streams of data

Our test worked but, in a sense, it was static: the stream of data was limited to what is currently in the text file. What we need now is a way to generate new streams of data. The technique we use to generate the kind of fake data found in the text file is based on a third-party module called Faker (https://pypi.org/project/Faker). The code that produces the data is provided (in the peoplelist.py file), as follows:

from faker import Faker
import sys


fake = Faker()

args = sys.argv[1:]
if len(args) == 1:
    output_filename = args[0]
    persons = []
    for _ in range(20):
        p = {"firstname": fake.first_name(), "lastname": fake.last_name()}
        persons.append(p)

    data = [f"{p['firstname']} {p['lastname']}" for p in persons]
    data = ", ".join(data) + ", "

    # Append the new batch of names to the file
    with open(output_filename, "a") as f:
        f.write(data)
else:
    print("You need to pass the output filepath!")

Now, let’s see what happens when we execute both programs (ch07/observer_rx/peoplelist.py and ch07/observer_rx/rx_peoplelist.py):

• From one command-line window or terminal, you can generate people’s names, passing the right file path to the script; you would execute the following command: python ch07/observer_rx/peoplelist.py ch07/observer_rx/people.txt.
• From a second shell window, you can run the program that implements the Observable via the python ch07/observer_rx/rx_peoplelist.py command.

So, what is the output from both commands?

The people.txt file is created (with the random names in it, separated by commas) if it does not already exist. Since the file is opened in append mode, each time you rerun that command (python ch07/observer_rx/peoplelist.py), a new set of names is added to the file.

The second command gives an output like the one from the first execution; the difference is that it is no longer the same set of data being emitted repeatedly. New data can now be generated at the source and emitted.

• Other concurrency and asynchronous patterns

There are some other concurrency and asynchronous patterns developers may use. We can cite the following:

The Actor model: A conceptual model to deal with concurrent computation. It defines some rules for how actor instances should behave: an actor can make local decisions, create more actors, send more messages, and determine how to respond to the next message received.
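As a minimal illustration (a sketch, not a full actor framework), an actor can be modeled in Python as a thread that owns a mailbox queue and reacts to one message at a time:

import threading
import queue

class PrinterActor(threading.Thread):
    # A toy actor: it owns a mailbox and processes messages one at a time
    def __init__(self):
        super().__init__()
        self.mailbox = queue.Queue()

    def run(self):
        while True:
            message = self.mailbox.get()
            if message is None:  # a poison pill stops the actor
                break
            print(f"Actor received: {message}")

actor = PrinterActor()
actor.start()
actor.mailbox.put("hello")
actor.mailbox.put(None)
actor.join()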

Coroutines: General control structures where flow control is cooperatively passed between two different routines without returning. Coroutines facilitate asynchronous programming by allowing execution to be suspended and resumed. As we have seen in one of our examples, Python has coroutines built in (via the asyncio library).

Message passing: Used in parallel computing, object-oriented programming (OOP), and inter-process communication (IPC), where software entities communicate and coordinate their actions by passing messages to each other.

Backpressure: A mechanism to manage the flow of data through software systems and prevent overwhelming components. It allows systems to gracefully handle overload by signaling the producer to slow down until the consumer can catch up.
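A bounded queue gives a minimal Python illustration of backpressure (a sketch, under the assumption that blocking the producer on put() is an acceptable way to slow it down):

import queue
import threading
import time

buffer = queue.Queue(maxsize=5)  # at most 5 items in flight

def producer():
    for i in range(20):
        buffer.put(i)  # blocks when the buffer is full: backpressure
        print(f"Produced {i}")

def consumer():
    for _ in range(20):
        item = buffer.get()
        time.sleep(0.1)  # slow consumer
        print(f"Consumed {item}")

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()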
