Performance Tuning Session 1 - Computer Architecture: A Quantitative Approach

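The storage system project I have been working on recently entered its performance tuning phase, and the system's current performance metrics still fall well short of the project's targets. I have always followed the principle of "practice guided by theory". This matters most in the early stage of tuning, where we must identify the principal bottleneck and invest the fewest resources for the largest gain. How do we find that bottleneck and concentrate on solving it?
Drawing on the classic book *Computer Architecture: A Quantitative Approach*, this post introduces the basic theory of system dependability and performance evaluation, together with Amdahl's Law and the processor performance equation, as a theoretical foundation for performance tuning and for evaluating system dependability.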

Background and Introduction

Dependability

  • SLA (Service Level Agreement)
    • Service Accomplishment, where the service is delivered as specified
    • Service Interruption, where the delivered service is different from the SLA
  • Module Reliability
    • Mean time to failure (MTTF)
    • Mean time to repair (MTTR)
    • Mean time between failures (MTBF) = MTTF + MTTR
    • Failure in time (FIT): failures per billion hours
  • Module Availability
    • Module availability = MTTF / (MTTF + MTTR)
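The module-level measures above are simple arithmetic on MTTF and MTTR. The following Python sketch computes MTBF, availability, and FIT for a single module; the function names and the sample values (an MTTF of 1,000,000 hours and an MTTR of 24 hours) are illustrative assumptions, not figures from the book.

```python
# Minimal sketch: dependability metrics for a single module.
# Function names and sample values are hypothetical, for illustration only.

def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours

def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the module delivers service: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

def fit(mttf_hours: float) -> float:
    """Failures in time: expected failures per billion (1e9) hours."""
    return 1e9 / mttf_hours

if __name__ == "__main__":
    mttf, mttr = 1_000_000, 24  # assumed values
    print(f"MTBF         = {mtbf(mttf, mttr):,} hours")
    print(f"Availability = {availability(mttf, mttr):.6f}")
    print(f"FIT          = {fit(mttf):,.0f}")
```

Note that with MTTR much smaller than MTTF, MTBF and MTTF are nearly identical, which is why the two terms are often used loosely.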

Example1

Assume a disk subsystem with the following components and MTTF:

  • 10 disks, each rated at 1,000,000-hour MTTF
  • 1 ATA controller, 500,000-hour MTTF
  • 1 power supply, 200,000-hour MTTF
  • 1 fan, 200,000-hour MTTF
  • 1 ATA cable, 1,000,000-hour MTTF

Using the simplifying assumptions that the lifetimes are exponentially distributed and that failures are independent, compute the MTTF of the system as a whole.

Answer1

The sum of the failure rates is

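$$\text{Failure rate}_{\text{system}} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} + \frac{1}{200{,}000} + \frac{1}{1{,}000{,}000} = \frac{10+2+5+5+1}{1{,}000{,}000} = \frac{23}{1{,}000{,}000} = \frac{23{,}000}{1{,}000{,}000{,}000}$$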

or 23,000 FIT.
The MTTF for the system is just the inverse of the failure rate

$$\text{MTTF}_{\text{system}} = \frac{1}{\text{Failure rate}_{\text{system}}} = \frac{1{,}000{,}000}{23} \approx 43{,}500 \text{ hours}$$

or just under 5 years.
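The calculation can be checked with a short Python sketch. It hard-codes the component list from the example and relies on the same two assumptions (exponential lifetimes, independent failures), under which the failure rate of a system that fails when any component fails is simply the sum of the component failure rates.

```python
# Minimal sketch: MTTF of a system that fails when any component fails,
# assuming independent, exponentially distributed component lifetimes.
# Under these assumptions failure rates (1 / MTTF) simply add.

# (component, count, MTTF in hours), taken from the example above
components = [
    ("disk", 10, 1_000_000),
    ("ATA controller", 1, 500_000),
    ("power supply", 1, 200_000),
    ("fan", 1, 200_000),
    ("ATA cable", 1, 1_000_000),
]

def system_failure_rate(parts) -> float:
    """Sum of the per-hour failure rates of all components."""
    return sum(count / mttf for _, count, mttf in parts)

rate = system_failure_rate(components)   # failures per hour
fit = rate * 1e9                          # failures per billion hours
mttf_system = 1 / rate                    # hours

print(f"failure rate = {rate:.2e}/hour = {fit:,.0f} FIT")
print(f"MTTF_system  = {mttf_system:,.0f} hours ({mttf_system / 8760:.2f} years)")
```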

Example2

Disk subsystems often have redundant power supplies to improve dependability. Using the preceding components and MTTFs, calculate the reliability of redundant power supplies. Assume that one power supply is sufficient to run the disk subsystem and that we are adding one redundant power supply.
Assumptions:

  1. Component lifetimes are exponentially distributed.
  2. Component failures are independent of each other.
  3. The MTTF of our redundant power supplies is the mean time until one power supply fails divided by the chance that the other will fail before the first one is replaced.

Answer2

The mean time until one of the two supplies fails is $\text{MTTF}_{\text{power supply}}/2$.
A good approximation of the probability of a second failure is MTTR over the mean time until the other power supply fails.

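$$\text{MTTF}_{\text{power supply pair}} = \frac{\text{MTTF}_{\text{power supply}}/2}{\dfrac{\text{MTTR}_{\text{power supply}}}{\text{MTTF}_{\text{power supply}}}} = \frac{\text{MTTF}_{\text{power supply}}^2}{2 \times \text{MTTR}_{\text{power supply}}}$$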

Assuming it takes on average 24 hours for a human operator to notice that a power supply has failed and replace it, the reliability of the fault-tolerant pair of power supplies is

$$\text{MTTF}_{\text{power supply pair}} = \frac{200{,}000^2}{2 \times 24} \approx 830{,}000{,}000 \text{ hours}$$

making the pair about 4150 times more reliable than a single supply.
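A minimal Python sketch of the redundant-pair approximation, using the example's 200,000-hour MTTF and 24-hour MTTR. The function name is my own; the approximation itself only holds when MTTR is much smaller than MTTF.

```python
# Minimal sketch of the redundant-pair approximation used above:
#   MTTF_pair ~= (MTTF / 2) / (MTTR / MTTF) = MTTF**2 / (2 * MTTR)
# Only meaningful when MTTR is much smaller than MTTF.

def mttf_redundant_pair(mttf_single: float, mttr: float) -> float:
    """Approximate MTTF of two redundant units where one unit suffices."""
    time_to_first_failure = mttf_single / 2   # either of the 2 units fails
    p_second_failure = mttr / mttf_single     # other fails during the repair window
    return time_to_first_failure / p_second_failure

mttf_ps, mttr_ps = 200_000, 24   # hours, as in the example
pair = mttf_redundant_pair(mttf_ps, mttr_ps)
print(f"MTTF_pair = {pair:,.0f} hours, {pair / mttf_ps:,.0f}x a single supply")
```

Without intermediate rounding the pair's MTTF is about 833 million hours, roughly 4,170 times a single supply; the figures 830,000,000 and 4150 in the text come from rounding first.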

Annual Failure Rate

Fallacy

The rated mean time to failure of disks is 1,200,000 hours, or almost 140 years, so disks practically never fail.

The number 1,200,000 far exceeds the lifetime of a disk, which is commonly assumed to be 5 years or 43,800 hours.
For such a large MTTF to make sense, one would have to keep replacing the disk every 5 years (its planned lifetime). On average, a disk would be replaced about 27 times (1,200,000 / 43,800) before a failure occurred, in the next century, about 140 years later.
Therefore, a more useful measure is the percentage of disks that fail per year, called the annual failure rate (AFR).

Example

Assume 1000 disks with a 1,000,000-hour MTTF that are used 24 hours a day. If you replaced each failed disk with a new one having the same reliability characteristics, the number of failed disks in a year (8760 hours) would be

$$\text{Failed disks} = \frac{\text{Number of disks} \times \text{Time period}}{\text{MTTF}} = \frac{1000 \text{ disks} \times 8760 \text{ hours/disk}}{1{,}000{,}000 \text{ hours/failure}} \approx 9$$

That is, 0.9% of disks would fail per year, or 4.4% over a 5-year lifetime.
Studies of real environments, however, found that 3% to 7% of drives failed per year, which corresponds to an MTTF of about 125,000 to 300,000 hours.

The real-world MTTF is therefore about 2 to 10 times worse than the manufacturer's MTTF.
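The conversions between MTTF, expected failures, and AFR fit in a few lines of Python. This is a sketch assuming, as in the example, that disks run 24 hours a day and failed disks are replaced promptly with disks of the same reliability; the function names are hypothetical.

```python
# Minimal sketch: converting between MTTF and annual failure rate (AFR),
# assuming disks run 24 hours a day and failed disks are replaced promptly
# with disks of the same reliability. Function names are hypothetical.

HOURS_PER_YEAR = 8760

def expected_failures(num_disks: int, hours: float, mttf_hours: float) -> float:
    """Expected number of failed disks over a period."""
    return num_disks * hours / mttf_hours

def afr_from_mttf(mttf_hours: float) -> float:
    """Annual failure rate as a fraction of the installed disks."""
    return HOURS_PER_YEAR / mttf_hours

def mttf_from_afr(afr: float) -> float:
    """MTTF implied by an observed annual failure rate."""
    return HOURS_PER_YEAR / afr

print(expected_failures(1000, HOURS_PER_YEAR, 1_000_000))  # 8.76, i.e. about 9 disks
print(f"AFR = {afr_from_mttf(1_000_000):.2%}")
for observed in (0.03, 0.07):
    print(f"AFR {observed:.0%} -> MTTF {mttf_from_afr(observed):,.0f} hours")
```

Running the observed 3% and 7% rates backwards gives roughly 292,000 and 125,000 hours, which is where the real-world MTTF range comes from.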

Performance Measurement

  • Typical performance metrics
    • response time
    • throughput
  • Execution time
    • Wall clock time: includes all system overheads
    • CPU time: only the time the processor spends computing
  • Speedup of X relative to Y
    X is n times faster than Y:

    $$n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{1/\text{Performance}_Y}{1/\text{Performance}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y}$$

  • Benchmarks
    • Kernels (e.g. matrix multiply)
    • Toy programs (e.g. quicksort)
      These two kinds of benchmarks do not reflect the performance of real applications.
    • Synthetic benchmarks (e.g. Dhrystone)
    • Benchmark suites (e.g. SPEC06FP, TPC-C)

Quantitative Principles of Computer Design

  • Take advantage of parallelism
    e.g. multiple processors, disks, memory banks, pipelining, multiple functional units
  • Principle of locality
    • reuse of data and instructions
    • Temporal locality and spatial locality
  • Focus on the common case
    • favor the frequent case over the infrequent case
    • Amdahl's Law
    • processor performance equation

Amdahl's Law

Basics

Amdahl's law gives us a quick way to find the speedup from some enhancement, which depends on two factors:

  • the fraction of the computation time in the original computer that can be converted to take advantage of the enhancement.
  • the improvement gained by the enhanced execution mode, that is, how much faster the task would run if the enhanced mode were used for the entire program.

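$$\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left( (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right)$$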

The overall speedup is the ratio of the execution times:

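$$\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$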
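A direct translation of the formula into Python, as a sketch; the function and parameter names are my own choice, not the book's. The loop illustrates the key consequence: as Speedup_enhanced grows without bound, the overall speedup approaches 1 / (1 - Fraction_enhanced).

```python
# Minimal sketch of Amdahl's Law (names are illustrative).

def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only `fraction_enhanced` of the original
    execution time is sped up by a factor of `speedup_enhanced`."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    new_time = (1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1 / new_time

# Even an enormous local speedup is capped by the unenhanced part:
# the limit is 1 / (1 - fraction_enhanced), i.e. 2.0 for fraction 0.5.
for s in (2, 10, 100, 1_000_000):
    print(f"f=0.5, s={s:>9}: overall {amdahl_speedup(0.5, s):.3f}")
```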

Examples

Example1

Suppose that we want to enhance the processor used for web serving. The new processor is 10 times faster on computation in the web serving application than the old processor. Assuming that the original processor is busy with computation 40% of the time and is waiting for IO 60% of the time, what is the overall speedup gained by incorporating the enhancement?

Answer1

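$$\text{Fraction}_{\text{enhanced}} = 0.4;\quad \text{Speedup}_{\text{enhanced}} = 10;\quad \text{Speedup}_{\text{overall}} = \frac{1}{0.6 + \frac{0.4}{10}} = \frac{1}{0.64} \approx 1.56$$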

Example2

FSQRT (Floating-point square root)
Proposal 1: FSQRT is responsible for 20% of the execution time of a critical graphics benchmark. Enhance FSQRT hardware and speed up this operation by a factor of 10.
Proposal 2: FP instructions are responsible for half of the execution time for the application. Make all FP instructions in the graphics process run faster by a factor of 1.6.
Compare these 2 design alternatives.

Answer2

$$\text{Speedup}_{\text{FSQRT}} = \frac{1}{(1 - 0.2) + \frac{0.2}{10}} = \frac{1}{0.82} \approx 1.22$$

$$\text{Speedup}_{\text{FP}} = \frac{1}{(1 - 0.5) + \frac{0.5}{1.6}} = \frac{1}{0.8125} \approx 1.23$$

Improving the performance of the FP operations overall is slightly better because of the higher frequency.
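The comparison can be reproduced with a small Python sketch; the dictionary keys are just labels I chose for the two proposals.

```python
# Minimal sketch: comparing the two FSQRT design proposals with Amdahl's Law.

def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - f) + f / s)."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# label -> (fraction of execution time affected, local speedup)
proposals = {
    "Proposal 1: 10x faster FSQRT (20% of time)": (0.2, 10),
    "Proposal 2: 1.6x faster FP (50% of time)": (0.5, 1.6),
}
results = {name: amdahl_speedup(f, s) for name, (f, s) in proposals.items()}
for name, speedup in results.items():
    print(f"{name}: {speedup:.2f}")
print("Better:", max(results, key=results.get))
```

The modest 1.6x speedup wins because it applies to a fraction of execution time two and a half times larger.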

Example3

Back to the dependability example:

$$\text{Failure rate}_{\text{system}} = \frac{10+2+5+5+1}{1{,}000{,}000} = \frac{23}{1{,}000{,}000}$$

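The fraction of the system failure rate that could be affected by a better power supply is $5/23 \approx 0.22$.
Adding a redundant power supply makes the power supply module about 4150 times more reliable than before.
The improvement in system reliability would be

$$\text{Improvement}_{\text{power supply pair}} = \frac{1}{(1 - 0.22) + \frac{0.22}{4150}} \approx 1.28$$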

Despite an impressive 4150x improvement in reliability of one module, from the system's perspective, the change has a measurable but small benefit.
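The same calculation as a Python sketch, treating Amdahl's Law as a statement about failure rates rather than execution time. Here the "fraction" is the power supply's share of the system failure rate and the "speedup" is the module's reliability improvement; the second line prints the ceiling reached if the power supply never failed at all.

```python
# Minimal sketch: Amdahl's Law applied to reliability instead of run time.
# "fraction" = share of the system failure rate due to the power supply,
# "speedup"  = reliability improvement of the power supply module.

def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

fraction_power_supply = 5 / 23     # 5 of the 23 failures per million hours
module_improvement = 4150          # redundant pair vs. single supply
system_improvement = amdahl_speedup(fraction_power_supply, module_improvement)
print(f"system reliability improvement: {system_improvement:.2f}x")
print(f"upper bound (perfect supply):   {1 / (1 - fraction_power_supply):.2f}x")
```

Both numbers round to 1.28: once the power supply is no longer the weak link, the remaining 18/23 of the failure rate dominates.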

Summary

  • Amdahl's law can serve as a guide to how much an enhancement will improve performance and how to distribute resources to improve cost-performance. The goal, clearly, is to spend resources in proportion to where time is spent.
  • Amdahl's law is particularly useful for comparing the overall system performance of two alternatives, such as two processor designs.

Processor Performance Equation

Basics

$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$

or

$$\text{CPU time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$

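In terms of the instructions executed, the average number of clock cycles per instruction (CPI) is

$$\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}}$$

and therefore

$$\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time}$$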
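A minimal Python sketch of the equation. The instruction count, CPI, and clock rate are hypothetical values chosen only to make the arithmetic easy to follow.

```python
# Minimal sketch of the processor performance equation.
# The instruction count, CPI, and clock rate below are hypothetical values.

def cpu_time_seconds(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = IC x CPI x clock cycle time = IC x CPI / clock rate."""
    clock_cycle_time = 1 / clock_rate_hz
    return instruction_count * cpi * clock_cycle_time

ic = 2_000_000_000      # hypothetical: 2 billion instructions
cpi = 1.5               # hypothetical average cycles per instruction
clock_rate = 3e9        # hypothetical 3 GHz clock
print(f"CPU time = {cpu_time_seconds(ic, cpi, clock_rate):.3f} s")
```

Because the three factors multiply, a 10% reduction in any one of them (for example, fewer instructions from a better compiler) gives the same 10% reduction in CPU time, as long as the other two stay fixed.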

Each term and what it depends on:

  • clock cycle time - Hardware technology and organization, 1/clock rate
  • CPI, clock cycles per instruction - Organization and instruction set architecture
  • IC, instruction count - Instruction set architecture and compiler technology

For different types of instructions,

$$\text{CPU time} = \left( \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i \right) \times \text{Clock cycle time}$$

Overall CPI

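$$\text{CPI} = \frac{\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i}{\text{IC}} = \sum_{i=1}^{n} \frac{\text{IC}_i}{\text{IC}} \times \text{CPI}_i$$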
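The overall CPI is a frequency-weighted average of the per-class CPIs. The sketch below computes it from per-class counts; the instruction mix is hypothetical.

```python
# Minimal sketch: overall CPI as the frequency-weighted average of per-class CPIs.
# The instruction mix below is hypothetical.

def overall_cpi(mix: dict[str, tuple[int, float]]) -> float:
    """mix maps instruction class -> (IC_i, CPI_i).
    Returns sum(IC_i * CPI_i) / sum(IC_i)."""
    total_ic = sum(ic for ic, _ in mix.values())
    total_cycles = sum(ic * cpi for ic, cpi in mix.values())
    return total_cycles / total_ic

mix = {
    "ALU":    (500_000, 1.0),   # hypothetical counts and CPIs
    "load":   (300_000, 2.0),
    "branch": (200_000, 3.0),
}
print(f"overall CPI = {overall_cpi(mix):.2f}")
```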

Examples

Consider Example2 from the Amdahl's Law section, modified here to use measurements of instruction frequencies and instruction CPI values, which in practice are obtained by simulation or by hardware instrumentation.

Example

Suppose we made the following measurements:

  • Frequency of FP operations = 25%
  • Average CPI of FP operations = 4.0
  • Average CPI of other instructions = 1.33
  • Frequency of FSQRT = 2%
  • CPI of FSQRT = 20

Assume that the 2 design alternatives are to

  1. decrease the CPI of FSQRT to 2
  2. decrease the average CPI of all FP operations to 2.5.

Compare these 2 design alternatives using the processor performance equation.

Answer

Original CPI with neither enhancement:

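$$\text{CPI}_{\text{original}} = \sum_{i=1}^{n} \frac{\text{IC}_i}{\text{IC}} \times \text{CPI}_i = (4.0 \times 25\%) + (1.33 \times 75\%) \approx 2.0$$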

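$$\text{CPI}_{\text{with new FSQRT}} = \text{CPI}_{\text{original}} - 2\% \times (\text{CPI}_{\text{old FSQRT}} - \text{CPI}_{\text{of new FSQRT only}}) = 2.0 - 2\% \times (20 - 2) = 1.64$$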

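The CPI with the enhancement of all FP instructions can be computed by summing the FP and non-FP contributions:

$$\text{CPI}_{\text{new FP}} = (75\% \times 1.33) + (25\% \times 2.5) \approx 1.625$$

Since the CPI with the overall FP enhancement is slightly lower, its performance will be marginally better.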

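$$\text{Speedup}_{\text{new FP}} = \frac{\text{CPU time}_{\text{original}}}{\text{CPU time}_{\text{new FP}}} = \frac{\text{IC} \times \text{CPI}_{\text{original}} \times \text{Clock cycle time}}{\text{IC} \times \text{CPI}_{\text{new FP}} \times \text{Clock cycle time}} = \frac{2.0}{1.625} \approx 1.23$$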
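Because both alternatives leave the instruction count and clock cycle time unchanged, the speedup reduces to a ratio of CPIs. The Python sketch below reproduces the comparison. One assumption to flag: the measured CPI of 1.33 for non-FP instructions is treated as 4/3, which is how the rounded figures 2.0 and 1.625 come out exactly.

```python
# Minimal sketch: the FSQRT vs. FP comparison via the processor performance
# equation. IC and clock cycle time are unchanged, so speedup = CPI ratio.

FREQ_FP, CPI_FP = 0.25, 4.0
FREQ_OTHER, CPI_OTHER = 0.75, 4 / 3     # assumption: the measured 1.33 taken as 4/3
FREQ_FSQRT, CPI_FSQRT = 0.02, 20.0

cpi_original = FREQ_FP * CPI_FP + FREQ_OTHER * CPI_OTHER

# Alternative 1: CPI of FSQRT drops from 20 to 2.
cpi_new_fsqrt = cpi_original - FREQ_FSQRT * (CPI_FSQRT - 2.0)

# Alternative 2: average CPI of all FP instructions drops to 2.5.
cpi_new_fp = FREQ_FP * 2.5 + FREQ_OTHER * CPI_OTHER

for name, cpi in (("new FSQRT", cpi_new_fsqrt), ("new FP", cpi_new_fp)):
    print(f"{name}: CPI = {cpi:.3f}, speedup = {cpi_original / cpi:.2f}")
```

The speedups match the Amdahl's Law results from Example2 (about 1.22 and 1.23), but here every input is a directly measurable frequency or CPI rather than a fraction of execution time.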

It is often possible to measure the constituent parts of the processor performance equation individually. Such isolated measurements are a key advantage of using the processor performance equation rather than Amdahl's Law, as in the previous example. In particular, it may be difficult to measure things such as the fraction of execution time for which a set of instructions is responsible.

posted @ Changry