How can you communicate how much faster your concurrent application now performs?

You could report two execution figures to your manager, measured on the same input data set:

  • serial execution time
  • parallel execution time

Speedup

Speedup is the ratio of the serial execution time to the parallel execution time for the same input data set.

When computing speedup, be sure to use the best serial algorithms and code to compare against.

Be aware that speedup can (and should) change with the number of cores employed by the application.
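As a quick sketch of how speedup changes with core count, the ratio can be computed from wall-clock timings (the timings and the data-set label here are hypothetical):

```python
# Hypothetical timings (seconds) for the same input data set.
serial_time = 100.0                            # best serial implementation
parallel_times = {2: 52.0, 4: 27.0, 8: 15.0}   # wall-clock time per core count

for cores, t in sorted(parallel_times.items()):
    speedup = serial_time / t                  # serial time / parallel time
    print(f"{cores} cores: speedup = {speedup:.2f}")
```

Note that at 8 cores the speedup (100/15 ≈ 6.67) stays below 8; anything above the core count would be the superlinear case discussed next.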

A speedup greater than the number of cores is known as superlinear speedup. If you run into this, suspect that you have made some error:

  • First, double-check the timings of both the serial and the concurrent applications.
  • Next, make sure that your applications are performing the desired computations correctly and getting the expected results.
  • Finally, ask yourself whether you are testing your application with a data set whose size is typical, as opposed to one that simply tests specific functionality.

How to Use Amdahl's Law (estimating how much parallelism is in the application)

If you have an idea about which functions should execute mostly concurrently, you can use a profiler report (from a typical data set) with a breakdown of percentages of execution time per function. Once you have the percentage of parallel execution time and, consequently, serial execution time, just plug the values into the formula.
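A minimal sketch of that calculation, using the standard form of Amdahl's Law (upper-bound speedup = 1 / (serial fraction + parallel fraction / cores)); the 0.95 parallel fraction is an assumed profiler result, not one from the text:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper-bound speedup predicted by Amdahl's Law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# E.g., a profile showing 95% of execution time in parallelizable functions:
print(amdahl_speedup(0.95, 8))      # upper bound on 8 cores, ~5.93
print(amdahl_speedup(0.95, 10**9))  # approaches 1/0.05 = 20 as cores grow
```

The second call illustrates the law's core message: the serial fraction caps the speedup no matter how many cores you add.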

A classic assessment of Amdahl's Law reads as follows:

Amdahl's Law has received quite a bit of criticism for the way it ignores real-world circumstances like concurrency overhead (communication, synchronization, and other thread management) and the fact that processors with infinite numbers of cores are not (yet) available. Other parallel execution models have been proposed that attempt to make reasonable assumptions for the discrepancies in the simple model of Amdahl's Law. Still, for its simplicity and the user's understanding that this is an upper bound, which is very unlikely to be achieved or surpassed, Amdahl's Law is a pretty good indication of the potential for speedup in a serial application.

Gustafson-Barsis's Law

Besides not taking into account the overheads inherent in concurrent algorithms, one of the strongest criticisms of Amdahl's Law is that as the number of cores increases, the amount of data handled is likely to increase as well. Amdahl's Law assumes a fixed data set size for any and all numbers of cores used. This is reflected in the assumption of the serial percentage remaining the same. But what if you had eight cores and were able to compute a data set that was eight times the size of the original? Does the serial execution time increase? Even if it does, will the time of the serial portion in the concurrent code be the same fraction of overall execution time as it would be if you ran this larger data set using the serial application? Perhaps more to the point, can the larger data set even be run on a single core system?

The Gustafson-Barsis Law, also known as scaled speedup, takes into account an increase in the data size in proportion to the increase in the number of cores and computes the (upper bound) speedup of the application, as if the larger data set could be executed in serial. Where Amdahl's Law is a tool for predicting the amount of speedup you could achieve by parallelizing a serial code, Gustafson-Barsis's Law is used to compute the speedup of an existing parallel code. The formula for scaled speedup is:

Speedup ≤ p + (1 - p)s

where p is the number of cores, and s is the percentage of time the parallel application spends in serial execution for the given data set and number of cores.

For example, if the total execution time for a parallel application is 1,040 seconds on 32 cores, but 14 seconds of that time is for serial execution on 1 of those 32 cores, the speedup of this application over the same data set being run on a single thread (if it were possible) is:

Speedup ≤ 32 + (1 - 32)(0.013) = 32 - 0.403 = 31.597
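The worked example can be checked in a few lines (the function name is mine):

```python
def scaled_speedup(cores, serial_fraction):
    """Gustafson-Barsis scaled speedup: p + (1 - p) * s."""
    return cores + (1 - cores) * serial_fraction

total_time, serial_time = 1040.0, 14.0   # seconds, from the example above
s = serial_time / total_time             # ~0.0135; the text rounds to 0.013
print(f"scaled speedup <= {scaled_speedup(32, s):.3f}")
```

With the unrounded serial fraction the bound comes out slightly lower (~31.583) than the 31.597 in the text, which uses s rounded to 0.013.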

Efficiency

Efficiency is the speedup divided by the number of cores, typically expressed as a percentage; it measures how well the application utilizes the cores it runs on.
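As a one-line sketch, applied to the Gustafson-Barsis example above:

```python
def efficiency(speedup, cores):
    """Efficiency: achieved speedup divided by the number of cores (or threads)."""
    return speedup / cores

# 31.597x speedup on 32 cores, from the scaled-speedup example:
print(f"{efficiency(31.597, 32):.1%}")   # prints "98.7%"
```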

One Final Note on Speedup and Efficiency

Throughout this discussion of metrics, I assume that you are using one thread per core. If your application overloads the system with more threads than cores, you need to use the number of threads (in place of "cores") in all the previous formulas. There can be performance benefits (though usually minor) when you overload the system with threads. Of course, Intel processors with Hyper-Threading (HT) technology are designed to support multiple threads per core.

Your measurements of speedup and efficiency will tell you whether the utilization of more threads than cores is worthwhile. If your execution time remains almost the same with two threads per core and no change in the data set, your speedup will also stay about the same. Your efficiency will be halved, though. This is telling you that even if the physical cores are cranking at nearly 100%, the threads are only being used half the time (where threads may have been nearly fully utilized when there was only one thread per core). If hardware utilization is more important to you, and the execution time is not suffering, not having all threads as busy as possible may not be a detriment.
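The arithmetic in that paragraph can be sketched directly; the timings are hypothetical and assume a 4-core machine where doubling the threads leaves execution time unchanged:

```python
serial_time = 100.0
parallel_time = 26.0    # assumed unchanged whether we run 4 or 8 threads

for threads in (4, 8):  # 4 = one thread per core; 8 = two threads per core
    speedup = serial_time / parallel_time   # stays the same
    eff = speedup / threads                 # halves when threads double
    print(f"{threads} threads: speedup {speedup:.2f}, efficiency {eff:.1%}")
```

Efficiency drops from roughly 96% to roughly 48%, even though the cores themselves remain just as busy.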

posted on 2010-09-12 17:38 by 胡是