Step 4. Tune for Performance: Removing Performance Bottlenecks

After making sure that you have removed all the threading (and new logic) errors from your code, the final step is to make sure the code is running at its best level of performance. (This is the purpose of Step 4.) Before threading a serial application, be sure you start with tuned code. Making serial tuning modifications to threaded code may change the whole dynamic of the threaded portions, to the point that the added threading code actually degrades performance. If you have started with serial code that is already tuned, you can focus your search for performance problems on only those parts that have been threaded.

Tuning threaded code typically comes down to identifying situations like (1) contention on synchronization objects, (2) imbalance between the amount of computation assigned to each thread, and (3) excessive overhead due to threading API calls or not enough work available to justify the use of threads. As with threading errors, there are software tools available to assist you in diagnosing and tracking down these and other performance issues.
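
To make the load imbalance case concrete, here is a minimal sketch, not taken from the book, that compares two ways of dividing tasks of uneven cost among threads. The function names, the workload (task i costs i + 1 units), and the thread count of 4 are all assumptions chosen for illustration. No threads are actually started; the code only tallies how much work each thread would receive.

```python
def static_blocks(costs, num_threads):
    """Give each thread one contiguous block of tasks; return per-thread load."""
    n = len(costs)
    bounds = [n * t // num_threads for t in range(num_threads + 1)]
    return [sum(costs[bounds[t]:bounds[t + 1]]) for t in range(num_threads)]


def cyclic(costs, num_threads):
    """Deal tasks out round-robin; return per-thread load."""
    return [sum(costs[t::num_threads]) for t in range(num_threads)]


def imbalance(loads):
    """Busiest thread's load divided by the mean load (1.0 = perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical workload: task i costs i + 1 units, as in a triangular loop nest.
costs = [i + 1 for i in range(16)]
block_loads = static_blocks(costs, 4)
cyclic_loads = cyclic(costs, 4)

print(block_loads, round(imbalance(block_loads), 2))
print(cyclic_loads, round(imbalance(cyclic_loads), 2))
```

With contiguous blocks the last thread receives far more work than the first, so the other threads sit idle while it finishes. Dealing the tasks out round-robin narrows the gap for this particular workload, though it is not a universal fix.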

You must also be aware that the actual threading of the code may be the culprit (the offender; the party at fault) behind a performance bottleneck. By breaking up the serial computations in order to assign them to threads, your carefully tuned serial execution may not be as tuned as it was before. You may introduce performance bugs like false sharing, inefficient memory access patterns, or bus overload. Identifying these types of errors requires the same kind of technology that finds serial performance errors. The avoidance of both threading and serial performance problems (introduced due to threading) is another minor theme of this book. With a good solid design, you should be able to achieve very good parallel performance, so not much verbiage is spent on finding or tuning performance problems in code.
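
False sharing is easier to see with a little address arithmetic. The sketch below is an illustration only. It assumes 64-byte cache lines, 8-byte per-thread counters, and an array that starts on a cache-line boundary; none of these figures come from the text, and the names are hypothetical. It computes which cache line each thread's counter would occupy, first with the counters packed together and then with each one padded out to a full line.

```python
CACHE_LINE_BYTES = 64  # assumption: typical of many CPUs; check your hardware
COUNTER_BYTES = 8      # assumption: one 64-bit counter per thread


def cache_line_of(thread_id, stride_bytes):
    """Cache line index of a thread's counter, given the spacing between counters.

    Assumes the counter array begins on a cache-line boundary.
    """
    return (thread_id * stride_bytes) // CACHE_LINE_BYTES


def lines_touched(num_threads, stride_bytes):
    """Cache line index for every thread's counter."""
    return [cache_line_of(t, stride_bytes) for t in range(num_threads)]


packed = lines_touched(4, COUNTER_BYTES)     # counters adjacent in memory
padded = lines_touched(4, CACHE_LINE_BYTES)  # one counter per cache line

print("packed:", packed)
print("padded:", padded)
```

When the counters are packed, all four threads update the same cache line, so every write by one thread invalidates the copy held by the others even though no data is logically shared. Padding trades memory for giving each thread a line of its own.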

The testing and tuning cycle

When you modify your code to correct an identified performance bug, you may inadvertently add a threading error. This can be especially true if you need to revise the use of synchronization objects. Once you've made changes for performance tuning, you should go back to the Test for Correctness step and ensure that your changes to fix the performance bugs have not introduced any new threading or logic errors. If you find any problems and modify code to repair them, be sure to again examine the code for any new performance problems that may have been inserted when fixing your correctness issues.
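
One way to build this habit into your workflow is to refuse to report a timing until the threaded result has been checked against the serial one. The following is a minimal sketch under stated assumptions: the summation workload and the names serial_sum, threaded_sum, and verify_then_time are hypothetical, chosen only to show the shape of the cycle. In standard CPython the global interpreter lock prevents this computation from actually speeding up, so treat it as an illustration of the procedure rather than of parallel performance.

```python
import threading
import time


def serial_sum(data):
    """Reference (serial) implementation used as the correctness oracle."""
    return sum(data)


def threaded_sum(data, num_threads):
    """Each thread sums its own slice into a private slot; no lock is needed."""
    partial = [0] * num_threads

    def worker(t):
        partial[t] = sum(data[t::num_threads])

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return sum(partial)


def verify_then_time(data, num_threads):
    """Time the threaded version, but report it only if the answer is correct."""
    expected = serial_sum(data)
    start = time.perf_counter()
    result = threaded_sum(data, num_threads)
    elapsed = time.perf_counter() - start
    if result != expected:
        raise AssertionError(f"threaded result {result} != serial {expected}")
    return elapsed


data = list(range(1000))
print("elapsed seconds:", verify_then_time(data, 4))
```

Rerunning a check like this after every tuning change catches a newly introduced threading error before you spend time interpreting a meaningless measurement.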

Sometimes it may be worse than that. If you are unable to achieve the expected performance from your application, you may need to return to the Design and Implementation step and start all over. Obviously, if you have multiple sites within your application that have been made concurrent, you may need to start at the design step for each code segment once you have finished with the previous code segment. If some threaded code sections can be shown to improve performance, these might be left as is, unless modifications to algorithms or global data structures will affect those previously threaded segments. It can all be a vicious circle and can make you dizzy if you think about it too hard.

posted on 2010-09-11 09:55 by 胡是
