Textbook: Computer Architecture: A Quantitative Approach. Authors: John L. Hennessy and David A. Patterson
1 Fundamentals of Computer Architecture
1.1 Layers of Computer System
Application Language Machine -- High-Level Language Machine -- Assembly Language Machine M3 --
Operating System Machine M2 -- Conventional Machine -- Microprogram Machine
1.2 Defining Computer Architecture
What needs to be taken into consideration.
1.3 Measuring and Reporting Performance
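Definition: "X is n times faster than Y" means Execution time Y / Execution time X = n. Since performance is the reciprocal of execution time, this is the same as Performance X / Performance Y = n.
To measure performance: CPU time = User CPU time + System CPU time, and from these we can calculate the percentage of elapsed time that was spent as CPU time.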
Reporting results:
To summarize execution time across multiple tests:
1. Arithmetic mean
2. Weighted arithmetic mean
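Both methods above have drawbacks: the arithmetic mean is dominated by the longest-running programs, and the weighted arithmetic mean depends on the choice of weights.
3. Geometric mean (of execution times normalized to a reference machine; the comparison between machines does not depend on which machine is the reference)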
Example:
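A small sketch of the three summaries in Python, using hypothetical execution times (the numbers and the weights are made up for illustration, not taken from the textbook):

```python
import math

def arithmetic_mean(times):
    """Plain average of execution times."""
    return sum(times) / len(times)

def weighted_arithmetic_mean(times, weights):
    """Average of execution times with weights that are assumed to sum to 1."""
    return sum(w * t for w, t in zip(weights, times))

def geometric_mean(ratios):
    """n-th root of the product of normalized execution times (computed via logs)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical execution times (seconds) of two programs on machines A and B.
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]
weights = [0.9, 0.1]  # hypothetical workload mix

print(arithmetic_mean(times_a), arithmetic_mean(times_b))
print(weighted_arithmetic_mean(times_a, weights),
      weighted_arithmetic_mean(times_b, weights))

# Normalize each machine to the other and take the geometric mean.
b_normalized_to_a = [b / a for a, b in zip(times_a, times_b)]
a_normalized_to_b = [a / b for a, b in zip(times_a, times_b)]
gm_b = geometric_mean(b_normalized_to_a)
gm_a = geometric_mean(a_normalized_to_b)
print(gm_b, gm_a)
```

With these made-up numbers the arithmetic mean favors B (55.0 s versus 500.5 s) because A's long second program dominates, while the geometric mean of the normalized times is about 1 whichever machine is chosen as the reference.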
1.4 Quantitative Principles of Computer Design
Make the common case fast
Amdahl's law:
The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
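In other words, how much speeding up one part improves overall performance depends on the fraction of the total time that part accounts for.
Speedup = performance with the enhancement / performance without the enhancement = execution time without the enhancement / execution time with the enhancement
Overall speedup = 1 / ((1 - Fraction enhanced) + Fraction enhanced / Speedup enhanced)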
Example:
Suppose that we are considering an enhancement that runs 10 times faster than the original machine, but is only usable 40% of the time. What is the overall speedup gained by incorporating the enhancement?
1/((1-0.4)+0.4/10) = 1/0.64 ≈ 1.56
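The same calculation as a minimal Python sketch. The function name `amdahl_speedup` and its parameter names are chosen here for illustration only.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of the original time runs speedup_enhanced times faster."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# The example above: 10 times faster, usable 40% of the time.
overall = amdahl_speedup(0.4, 10)
print(f"{overall:.4f}")  # 1.5625

# Even an arbitrarily fast enhancement is capped by the unenhanced 60%.
limit = amdahl_speedup(0.4, 1e12)
print(f"{limit:.4f}")
```

The second call shows the limit: with only 40% of the time enhanced, the overall speedup can never exceed 1/(1-0.4) ≈ 1.67.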
CPU performance Equation:
CPU time = CPU clock cycles * Clock cycle time = CPU clock cycles / Clock rate
CPI (clock cycles per instruction: the average number of cycles one instruction takes) = CPU clock cycles / IC (instruction count)
Combining the two, CPU time = IC * CPI / Clock rate = IC * CPI * Clock cycle time. This is the form most likely to be needed in exams.
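A minimal sketch of the CPU performance equation in Python. The function name and the sample numbers (10^9 instructions, CPI of 2, 2 GHz clock) are hypothetical and only illustrate the units.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = IC * CPI / clock rate."""
    if clock_rate_hz <= 0:
        raise ValueError("clock_rate_hz must be positive")
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 1e9 instructions, average CPI of 2.0, 2 GHz clock.
seconds = cpu_time(1e9, 2.0, 2e9)
print(seconds)  # 1.0

# Halving the CPI (same IC and clock rate) halves the CPU time.
faster = cpu_time(1e9, 1.0, 2e9)
print(seconds / faster)  # 2.0
```

Because the three factors multiply, an improvement in one of them translates directly into CPU time only if the other two stay unchanged.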
Example:
Suppose we have the following measurements:
* Frequency of FP operations = 25%
* Average CPI of FP operations = 4.0
* Average CPI of other instructions = 1.33
* Frequency of FPSQR= 2%
* CPI of FPSQR = 20
Assume that the two design alternatives are to reduce the CPI of FPSQR to 2 or to reduce the average CPI of all FP operations to 2. Compare these two design alternatives using the CPU performance equation.
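Original CPI: 0.25*4 + 0.75*1.33 = 1.9975 ≈ 2.0
1. Reduce the CPI of FPSQR to 2: 2.0 - 0.02*(20-2) = 1.64
2. Reduce the average CPI of all FP operations to 2: 2.0 - 0.25*(4-2) = 1.5
Choose the second scheme: it gives the lower CPI. With instruction count and clock rate unchanged, the speedup is the ratio of CPIs, 2.0/1.5 ≈ 1.33 for the second scheme versus 2.0/1.64 ≈ 1.22 for the first.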
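The comparison can be checked with a short Python sketch. It uses the measurements from the example without rounding the original CPI to 2.0, so the intermediate values differ slightly from the hand calculation; the helper name `average_cpi` is an assumption made for illustration.

```python
def average_cpi(mix):
    """Weighted CPI for a list of (frequency, cpi) pairs whose frequencies sum to 1."""
    return sum(freq * cpi for freq, cpi in mix)

# Measurements from the example.
original_cpi = average_cpi([(0.25, 4.0), (0.75, 1.33)])

# Alternative 1: FPSQR (2% of instructions) drops from CPI 20 to CPI 2.
cpi_fpsqr = original_cpi - 0.02 * (20 - 2)

# Alternative 2: all FP operations (25% of instructions) drop from CPI 4 to CPI 2.
cpi_all_fp = average_cpi([(0.25, 2.0), (0.75, 1.33)])

# With instruction count and clock rate unchanged, speedup = old CPI / new CPI.
speedup_fpsqr = original_cpi / cpi_fpsqr
speedup_all_fp = original_cpi / cpi_all_fp

print(f"original CPI   = {original_cpi:.4f}")
print(f"FPSQR only     = {cpi_fpsqr:.4f}, speedup {speedup_fpsqr:.2f}")
print(f"all FP ops     = {cpi_all_fp:.4f}, speedup {speedup_all_fp:.2f}")
```

Either way of rounding leads to the same conclusion: improving all FP operations gives the lower CPI.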
More Examples:
- 1.1 Three enhancements with the following speedups are proposed for a new architecture:
Speedup1=30 Speedup2=20 Speedup3=15
Only one enhancement is usable at a time.
A. If enhancements 1 and 2 are each usable for 25% of the time, what fraction of the time must enhancement 3 be used to achieve an overall speedup of 10?
Ans: 1/[(1-0.25-0.25-x)+0.25/30+0.25/20+x/15]=10, x=0.45
B. Assume the enhancements can be used 25%, 35% and 10% of the time for enhancements 1, 2, and 3, respectively. For what fraction of the reduced execution time is no enhancement in use?
Ans: (1-0.25-0.35-0.1)/[(1-0.25-0.35-0.1)+0.25/30+0.35/20+0.1/15]=90.2%
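C. Assume, for some benchmark, the possible fraction of use is 15% for each of enhancements 1 and 2 and 70% for enhancement 3. We want to maximize performance. If only one enhancement can be implemented, which should it be? If two enhancements can be implemented, which should be chosen?
Ans: List all the possible choices and compare the overall speedups.
One enhancement: 1/(0.85+0.15/30)=1.170 for enhancement 1, 1/(0.85+0.15/20)=1.166 for enhancement 2, 1/(0.3+0.7/15)=2.885 for enhancement 3. Choose enhancement 3.
Two enhancements: 1/(0.7+0.15/30+0.15/20)=1.40 for 1 and 2, 1/(0.15+0.15/30+0.7/15)=4.96 for 1 and 3, 1/(0.15+0.15/20+0.7/15)=4.90 for 2 and 3. Choose enhancements 1 and 3.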
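All three parts of exercise 1.1 use Amdahl's law extended to several mutually exclusive enhancements. A sketch in Python follows; the names `SPEEDUPS`, `overall_speedup`, and `best_choice` are hypothetical helpers, not from the textbook.

```python
from itertools import combinations

SPEEDUPS = {1: 30.0, 2: 20.0, 3: 15.0}

def overall_speedup(fractions):
    """fractions maps enhancement id -> fraction of the original time it can be used."""
    unenhanced = 1.0 - sum(fractions.values())
    enhanced = sum(f / SPEEDUPS[k] for k, f in fractions.items())
    return 1.0 / (unenhanced + enhanced)

# Part A: solve 1/target = (1 - 0.25 - 0.25 - x) + 0.25/30 + 0.25/20 + x/15 for x.
target = 10.0
known = {1: 0.25, 2: 0.25}
fixed = (1.0 - sum(known.values())) + sum(f / SPEEDUPS[k] for k, f in known.items())
x = (fixed - 1.0 / target) / (1.0 - 1.0 / SPEEDUPS[3])
print(f"A: x = {x:.3f}")

# Part B: share of the new (reduced) execution time with no enhancement in use.
usage = {1: 0.25, 2: 0.35, 3: 0.10}
new_time = 1.0 / overall_speedup(usage)  # relative to an original time of 1
share_unenhanced = (1.0 - sum(usage.values())) / new_time
print(f"B: {share_unenhanced:.1%}")

# Part C: try every single enhancement and every pair, keep the best of each size.
possible = {1: 0.15, 2: 0.15, 3: 0.70}

def best_choice(size):
    """Return the combination of `size` enhancements with the highest overall speedup."""
    return max(
        combinations(possible, size),
        key=lambda combo: overall_speedup({k: possible[k] for k in combo}),
    )

best_single = best_choice(1)
best_pair = best_choice(2)
print("C:", best_single, best_pair)
```

Part A is solved in closed form because the equation is linear in x; part C simply enumerates the three singles and three pairs, which is small enough that no cleverer search is needed.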
- 1.3 In many practical applications that demand a real-time response, the computational workload W is often fixed. As the number of processors increases in a parallel computer, the fixed workload is distributed to more processors for parallel execution. Assume 20 percent of W must be executed sequentially, and 80 percent can be executed by 4 nodes simultaneously. What is the fixed-load speedup?
Ans: 1/(0.2+0.8/4)=2.5
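As a quick check, and to see how the answer changes with the number of nodes, here is a small Python sketch. The function name `fixed_load_speedup` and the extra node counts are assumptions added for illustration.

```python
def fixed_load_speedup(sequential_fraction, nodes):
    """Speedup for a fixed workload when the parallel part is spread over `nodes` processors."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / nodes)

# 20% sequential, 80% parallel; the exercise asks about 4 nodes.
for n in (1, 4, 16, 64):
    print(n, round(fixed_load_speedup(0.2, n), 2))
```

With 4 nodes the speedup is 2.5, matching the answer above. Adding nodes helps less and less: the 20% sequential part bounds the speedup below 1/0.2 = 5 no matter how many nodes are used.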