The Memory Hierarchy (Part 2)
Locality:
CSAPP describes the two kinds of locality, temporal and spatial, as follows: Locality is typically described as having two distinct forms: temporal locality and spatial locality. In a program with good temporal locality, a memory location that is referenced once is likely to be referenced again multiple times in the near future. In a program with good spatial locality, if a memory location is referenced once, then the program is likely to reference a nearby memory location in the near future.
An example of locality being put to use:
At the operating system level, the principle of locality allows the system to use the main memory as a cache of the most recently referenced chunks of the virtual address space. Similarly, the operating system uses main memory to cache the most recently used disk blocks in the disk file system.
CSAPP looks at locality from two angles:
Locality of References to Program Data, and Locality of Instruction Fetches.
For the former, CSAPP illustrates locality with an example:
```c
#define N 8   /* CSAPP leaves N unspecified; 8 is an arbitrary choice */

int sumvec(int v[N])
{
    int i, sum = 0;

    for (i = 0; i < N; i++)
        sum += v[i];
    return sum;
}
```
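In this example:
sum is referenced once in every loop iteration, so it has good temporal locality.
The elements of v are read one after another, in the order they are stored in memory (a stride-1 pattern). v therefore has good spatial locality but poor temporal locality, because each element is accessed only once.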
Stride-1 reference patterns are a common and important source of spatial locality in programs. In general, as the stride increases, the spatial locality decreases.
To analyze the locality of instruction fetches, consider this example:
```c
#define M 4   /* CSAPP leaves M and N unspecified; these values are arbitrary */
#define N 8

int sumarraycols(int a[M][N])
{
    int i, j, sum = 0;

    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
```
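In terms of instruction fetches, this function has both good temporal locality and good spatial locality. Its data references are a different matter. C stores a in row-major order, so traversing it column by column is a stride-N pattern with poor spatial locality. CSAPP explains the instruction side as follows: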
The instructions in the body of the for loop are executed in sequential memory order, and thus the loop enjoys good spatial locality. Since the loop body is executed multiple times, it also enjoys good temporal locality.
In passing, CSAPP also explains how code differs from data:
An important property of code that distinguishes it from program data is that it is rarely modified at run time. While a program is executing, the CPU reads its instructions from memory. The CPU rarely overwrites or modifies these instructions.
A summary of locality:
Programs that repeatedly reference the same variables enjoy good temporal locality.
For programs with stride-k reference patterns, the smaller the stride the better the spatial locality. Programs with stride-1 reference patterns have good spatial locality. Programs that hop around memory with large strides have poor spatial locality.
Loops have good temporal and spatial locality with respect to instruction fetches. The smaller the loop body and the greater the number of loop iterations, the better the locality.
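The memory hierarchy, from the top (smaller, faster, and costlier per byte) to the bottom (larger, slower, and cheaper per byte): L0 registers; L1, L2, and L3 caches (SRAM); L4 main memory (DRAM); L5 local secondary storage (local disks); L6 remote secondary storage (distributed file systems, Web servers).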
Where solid state disks fit in the hierarchy is worth noting:
As another example, solid state disks are playing an increasingly important role in the memory hierarchy, bridging the gulf between DRAM and rotating disk.
On cold misses: An empty cache is sometimes referred to as a cold cache, and misses of this kind are called compulsory misses or cold misses. Cold misses are important because they are often transient events that might not occur in steady state, after the cache has been warmed up by repeated memory accesses.
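One way to design a cache is to use a hash-like placement rule. A block at level k+1 is mapped, according to its address, to one particular position at level k. For example, block i at level k+1 goes to position i mod 4 at level k.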
The working set is the reasonably constant set of cache blocks that a program accesses during one phase of its execution, such as a loop.
capacity misses: When the size of the working set exceeds the size of the cache, the cache will experience what are known as capacity misses. In other words, the cache is just too small to handle this particular working set.
So who manages the cache at each level of the hierarchy?
The compiler manages the register file, the highest level of the cache hierarchy. It decides when to issue loads when there are misses, and determines which register to store the data in. The caches at levels L1, L2, and L3 are managed entirely by hardware logic built into the caches. In a system with virtual memory, the DRAM main memory serves as a cache for data blocks stored on disk, and is managed by a combination of operating system software and address translation hardware on the CPU. For a machine with a distributed file system such as AFS, the local disk serves as a cache that is managed by the AFS client process running on the local machine. In most cases, caches operate automatically and do not require any specific or explicit actions from the program.