Task Decomposition for Parallelism: How to Find Parallelizable Tasks
Simulating the parallel or concurrent execution of multiple threads on given source code is a skill that has been extremely beneficial to me in both designing concurrent algorithms and in proving them to be error-free.
Finding tasks that can be decomposed for parallel execution takes practice.
NOTE
There is one tiny exception to the lack of a “magic bullet” for identifying potentially independent computations: loop iterations. If you suspect a loop has independent iterations (those that can be run in any order), try executing the code with the loop iterations running in reverse of their original order. If the application still gets the same results, there is a strong chance that the iterations are independent and can be decomposed into tasks. Beware that there might still be a “hidden” dependency waiting to come out and bite you when the iterations are run concurrently; for example, an intermediate sequence of values stored in a variable may be harmless when the loop iterations are run serially, even backward, yet still cause conflicts when the iterations run at the same time.
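The reversal test above can be sketched as follows. This is a minimal illustration, not a proof technique; the function names and data are hypothetical:

```python
# Run a loop forward and backward and compare results. Matching results
# suggest (but do not prove) that the iterations are independent.

def forward(data):
    out = [0] * len(data)
    for i in range(len(data)):
        out[i] = data[i] * data[i]   # each iteration touches only index i
    return out

def backward(data):
    out = [0] * len(data)
    for i in reversed(range(len(data))):   # same body, reversed order
        out[i] = data[i] * data[i]
    return out

data = [3, 1, 4, 1, 5]
print(forward(data) == backward(data))  # True: iterations are likely independent
```

A loop that accumulates into a shared variable (say, a running total used inside the body) might still pass this test while harboring the kind of hidden dependency the note warns about, so treat a match as evidence, not certainty.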
Approach:
… should initially focus on computationally intense portions of the application. That is, look at those sections of code that do the most computation or account for the largest percentage of the execution time.
Once you have identified a portion of the serial code that can be executed concurrently, keep in mind the following two criteria for the actual decomposition into tasks:
• There should be at least as many tasks as there will be threads (or cores).
• The amount of computation within each task (granularity) must be large enough to offset the overhead that will be needed to manage the tasks and the threads.
The first criterion is used to assure that you won’t have idle threads (or idle cores) during the execution of the application. If you can create the number of tasks based on the number of threads that are available, your application will be better equipped to handle execution platform changes from one run to the next. It is almost always better to have (many) more tasks than threads. This allows greater flexibility in scheduling tasks to threads in order to achieve a good load balance. This is especially true when the execution times of the tasks are not all the same, or when those times are unpredictable.
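The “more tasks than threads” advice can be sketched with a thread pool. The workload function below is illustrative; the point is that 100 uneven tasks are handed to only 4 workers, letting the pool balance the load:

```python
# Submit many more tasks than worker threads so the pool can smooth out
# uneven task durations. task() is a placeholder workload.
from concurrent.futures import ThreadPoolExecutor

def task(n):
    # Simulated uneven work: cost grows with n
    return sum(i * i for i in range(n))

inputs = list(range(1, 101))                       # 100 tasks...
with ThreadPoolExecutor(max_workers=4) as pool:    # ...for only 4 threads
    results = list(pool.map(task, inputs))

print(len(results))  # 100
```

With only 4 tasks for 4 threads, one long-running task would leave three threads idle near the end; with 100 tasks, a thread that finishes early simply picks up the next one.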
The second criterion seeks to give you the opportunity to actually get a performance boost in the parallel execution of your application. The amount of computation within a task is called the granularity. The more computation there is within a task, the higher the granularity; the less computation there is, the lower the granularity. The terms coarse-grained and fine-grained are used to describe instances of high granularity and low granularity, respectively.
NOTE
The granularity of a task must be large enough to render the task and thread management code a minuscule fraction of the overall parallel execution. If tasks are too small, execution of the code to encapsulate the task, assign it to a thread, handle the results from the task, and any other thread coordination or management required in the concurrent algorithm can eliminate (best case) or even dwarf (worst case) the performance gained by running your algorithm on multiple cores.
You will need to strike a balance between these two criteria, keeping in mind that the second criterion is the more important of the two.
Finally, don’t be afraid to go back and rework your task decomposition.