How to Assign Tasks to Threads
We can allocate tasks to threads in two different ways: static scheduling or dynamic scheduling.
NOTE
Under worksharing constructs in OpenMP and the parallel algorithms of Intel Threading Building Blocks (TBB), the actual assignment of tasks to threads is done “under the covers.” The programmer can influence that assignment to some degree, though…
In static scheduling, the division of labor is known at the outset of the computation and doesn’t change during the computation. If at all possible, when developing your own concurrent code, try to use a static schedule. This is the easiest method to program and will incur the least amount of overhead.
The mechanics and logic of code needed to implement a static schedule involve each thread having a unique identification number in the range [0, N–1] for N threads. This number can easily be assigned when a thread is created, in the order that threads are created (code that generates unique identification numbers for threads will be part of several implementation examples in later chapters). If tasks are collections of separate, independent function calls or groups of sequential source lines, you can group those calls or code lines into tasks that are assigned to threads through a thread’s ID number (e.g., through a switch statement). If tasks are loop iterations, you can divide the total number of iterations by the number of threads and assign block(s) of iterations to threads, again through the thread’s ID number. You will have to add logic to compute the start and end values of the loop bounds so that each thread can determine the block it should execute.
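The loop-bound computation described above can be sketched as follows. This is only an illustration under stated assumptions: Python's `threading` module stands in for whatever threading library you use, and the names `block_bounds` and `static_block_sum` are hypothetical helpers, not part of any library. When the iteration count does not divide evenly, the sketch gives one extra iteration to each of the lowest-numbered threads.

```python
import threading

def block_bounds(thread_id, num_threads, num_iterations):
    """Return the half-open range [start, end) owned by thread_id."""
    base, extra = divmod(num_iterations, num_threads)
    # The first `extra` threads each take one additional iteration,
    # so every iteration is covered exactly once.
    start = thread_id * base + min(thread_id, extra)
    end = start + base + (1 if thread_id < extra else 0)
    return start, end

def static_block_sum(data, num_threads):
    """Sum `data` with each thread handling one contiguous block."""
    partial = [0] * num_threads  # one slot per thread, so no lock is needed

    def worker(thread_id):
        start, end = block_bounds(thread_id, num_threads, len(data))
        total = 0
        for i in range(start, end):
            total += data[i]
        partial[thread_id] = total

    # Thread IDs are assigned in creation order: 0, 1, ..., N-1.
    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)
```

For 10 iterations and 3 threads, `block_bounds` yields the ranges [0, 4), [4, 7), and [7, 10), which neither overlap nor leave gaps.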
When dividing loop iterations into blocks, you need to be sure that no thread executes an iteration assigned to another thread and that every loop iteration is covered by some thread. You won’t get correct results if threads execute an iteration multiple times or leave out some iterations. An alternative to dividing loop iterations into blocks is to use the thread’s ID number as the starting value of the loop iterator and increment that iterator by the number of threads, rather than by 1. For example, if you have two threads, one thread will execute the odd-numbered iterations and the other thread will execute the even-numbered iterations. You will need to adjust where the loop starts and how the next iteration is computed per thread if the loop iterator doesn’t start at 0 or is already incremented by something other than 1. Even so, setting up N threads to each do every Nth iteration will involve fewer code changes than dividing the iteration set into separate blocks.
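A minimal sketch of this every-Nth-iteration scheme follows, again using Python's `threading` module purely for illustration. The helper names `cyclic_indices` and `static_cyclic_sum` are hypothetical, and the sketch assumes a positive loop step. `cyclic_indices` shows the adjustment needed when the original loop starts at `first` and advances by `step` rather than starting at 0 and advancing by 1.

```python
import threading

def cyclic_indices(thread_id, num_threads, first, stop, step=1):
    """Iterations of `range(first, stop, step)` owned by thread_id.

    Assumes step > 0. Each thread starts thread_id steps past `first`
    and then jumps ahead by num_threads steps at a time.
    """
    return range(first + thread_id * step, stop, step * num_threads)

def static_cyclic_sum(data, num_threads):
    """Sum `data` with thread k handling indices k, k+N, k+2N, ..."""
    partial = [0] * num_threads  # one slot per thread, so no lock is needed

    def worker(thread_id):
        total = 0
        for i in cyclic_indices(thread_id, num_threads, 0, len(data)):
            total += data[i]
        partial[thread_id] = total

    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)
```

With two threads and a loop written as `range(3, 20, 2)`, thread 0 takes 3, 7, 11, 15, 19 and thread 1 takes 5, 9, 13, 17.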
Static scheduling is best used in those cases where the amount of computation within each task is the same or can be predicted at the outset. If you have a case where the amount of computation between tasks is variable and/or unpredictable, then you would be better served by using a dynamic scheduling scheme.
Under a dynamic schedule, you assign tasks to threads as the computation proceeds. The driving force behind the use of a dynamic schedule is to try to balance the load as evenly as possible between threads. Assigning tasks to threads is going to incur overhead from executing the added programming logic to carry out the assignment and from having threads seek out a new task.
There are many different ways to implement a dynamic method for scheduling tasks to threads, but they all require a set of many more tasks than threads. Probably the easiest scheduling scheme involves indexing the tasks. A shared counter is used to keep track of and assign the next task for execution. When seeking a new task, a thread gains mutually exclusive access to the counter, copies the value into a local variable, and increments the counter value for the next thread.
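The shared-counter scheme can be sketched as follows, assuming the tasks are stored as a list of zero-argument callables and that Python's `threading.Lock` provides the mutual exclusion. The function name `dynamic_counter_run` is a hypothetical helper introduced only for this illustration.

```python
import threading

def dynamic_counter_run(tasks, num_threads):
    """Run every callable in `tasks`, handing out indices from a shared counter."""
    next_index = 0                   # shared counter: index of the next unassigned task
    counter_lock = threading.Lock()  # guards next_index
    results = [None] * len(tasks)

    def worker():
        nonlocal next_index
        while True:
            # Hold the lock only long enough to copy and increment the counter.
            with counter_lock:
                my_index = next_index
                next_index += 1
            if my_index >= len(tasks):
                return               # no tasks left
            # The task itself runs outside the lock.
            results[my_index] = tasks[my_index]()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each thread writes only to the result slot matching the index it claimed, so the results list needs no additional protection in this sketch.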
Another simple dynamic scheduling method involves setting up a shared container (typically a queue) that can hold tasks and allow threads to pull out a new task once the previous task is complete. Tasks (or adequate descriptions of tasks) must be encapsulated into some structure that can be pushed into the queue. Access to the queue must be mutually exclusive between threads to ensure that threads get unique tasks and no tasks are lost through some corruption of the shared container.
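A minimal sketch of the shared-queue scheme follows. It assumes all tasks are known up front and uses Python's `queue.Queue`, which performs its own internal locking, so the mutually exclusive access described above is handled by the container itself. Each task is encapsulated as an `(index, argument)` tuple; `dynamic_queue_run` is a hypothetical name chosen for this example.

```python
import queue
import threading

def dynamic_queue_run(task_args, func, num_threads):
    """Apply `func` to each argument, with threads pulling work from a shared queue."""
    task_queue = queue.Queue()
    # Encapsulate each task as (index, argument) so results can be stored in order.
    for index, arg in enumerate(task_args):
        task_queue.put((index, arg))
    results = [None] * len(task_args)

    def worker():
        while True:
            try:
                # get_nowait() removes one item atomically or raises queue.Empty.
                index, arg = task_queue.get_nowait()
            except queue.Empty:
                return  # every task was queued before the threads started
            results[index] = func(arg)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Treating an empty queue as the termination signal only works because the queue is filled completely before any worker starts; if tasks arrive during the computation, an explicit end-of-work marker is needed instead.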
Below are some more complex situations.
If tasks require some preprocessing before their assignment to threads, or if tasks are not all known at the outset of computation, you may need more complex scheduling methods. You can set one of your threads aside to preprocess each task or to receive new tasks as they arise. If the computation threads rendezvous (meet up) with this extra thread in order to receive the next task for execution, you have a boss/worker algorithm. By placing a shared container between the threads preparing tasks and the threads executing them, so that tasks are distributed through the container, you get the producer/consumer method.
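The producer/consumer arrangement can be sketched as follows, under several assumptions: one producer thread preprocesses raw items with a caller-supplied `prepare` function, consumer threads run a caller-supplied `execute` function, and a private sentinel object tells each consumer that no more tasks will arrive. The names `producer_consumer` and `_DONE`, and the queue capacity of 8, are hypothetical choices made for this illustration.

```python
import queue
import threading

_DONE = object()  # sentinel: signals a consumer that no more tasks will arrive

def producer_consumer(raw_items, prepare, execute, num_consumers):
    """One producer prepares tasks; consumers execute them as they appear."""
    task_queue = queue.Queue(maxsize=8)  # bounded, so the producer cannot run far ahead
    results = []
    results_lock = threading.Lock()      # guards the shared results list

    def producer():
        for item in raw_items:
            task_queue.put(prepare(item))  # blocks while the queue is full
        for _ in range(num_consumers):
            task_queue.put(_DONE)          # one sentinel per consumer

    def consumer():
        while True:
            task = task_queue.get()        # blocks until a task is available
            if task is _DONE:
                return
            value = execute(task)
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(num_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # completion order, not input order
```

Because consumers finish tasks in an unpredictable order, the returned list is in completion order; sort it or tag each task with an index if the original order matters.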