Tools/Plugins -- CACTI: A Cache/Memory Analysis Tool
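I recently came across a tool that can estimate DRAM access power. It is useful for designs that need to analyze the access power and latency of off-chip memory (DRAM), for example deep learning accelerators.
1. Introduction
CACTI is an analysis tool that takes a set of cache/memory parameters as input and computes the access time, power, cycle time, and area. The current release is version 7.0, and it supports analysis of the following memory types: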
- direct mapped caches
- set-associative caches
- fully associative caches
- Embedded DRAM memories
- Commodity DRAM memories
It also provides the following features:
- Support for multi-ported uniform cache access (UCA) and multi-banked, multi-ported non-uniform cache access (NUCA).
- Leakage power calculation that also takes the operating temperature into account.
- Router power model.
- Interconnect model with different delay, power, and area properties including low-swing wire model.
- An interface to perform trade-off analysis involving power, delay, area, and bandwidth.
- All process specific values used by the tool are obtained from ITRS and currently, the tool supports 90nm, 65nm, 45nm, and 32nm technology nodes.
- Chip IO model to calculate latency and energy for DDR bus. Users can model different loads (fan-outs) and evaluate the impact on frequency and energy. This model can be used to study LR-DIMMs, R-DIMMs, etc.
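2. Usage
Source code: https://github.com/HewlettPackard/cacti
Technical report: http://www.hpl.hp.com/techreports/2013/HPL-2013-79.pdf
I could not get it to build on Windows (the C++ toolchain there was missing pthread, and I did not find a simple workaround), so I tested it on CentOS instead. Basic usage is as follows: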
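- Download the C++ source from the repository above and copy it to the CentOS machine.
- Enter the source directory and run `make` on the command line.
- Once the executable named `cacti` has been built, run `./cacti -infile ***.cfg`
The .cfg file describes the memory configuration and has to be adjusted to match the DRAM you are using. Here I simply ran one of the configuration files shipped with the samples: `./cacti -infile sample_config_files/ddr3_cache.cfg`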
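If the tool is going to be run repeatedly, the invocation above can be wrapped in a small script. The following is only a minimal sketch: the helper names `build_cacti_command` and `run_cacti` and the output file name `cacti_report.txt` are made up for illustration, and it assumes `make` has already produced `./cacti` in the source directory. Because the run shown in this post ended with a `Segmentation fault` after the report had been printed, the sketch deliberately does not treat a non-zero exit status as fatal and keeps whatever was written to standard output.

```python
import subprocess
from pathlib import Path


def build_cacti_command(cfg_path, binary="./cacti"):
    """Return the argument list equivalent to `./cacti -infile <cfg>`."""
    return [binary, "-infile", str(cfg_path)]


def run_cacti(cfg_path, source_dir=".", report_path="cacti_report.txt"):
    """Run CACTI inside its source directory and save what it printed.

    check=False is intentional: a crash after the report has been printed
    should not throw away the report itself.
    """
    result = subprocess.run(
        build_cacti_command(cfg_path),
        cwd=source_dir,          # the relative ./cacti path is resolved here
        capture_output=True,
        text=True,
        check=False,
    )
    # Hypothetical output file, written relative to the caller's directory.
    Path(report_path).write_text(result.stdout)
    return result.returncode, result.stdout
```

For example, `run_cacti("sample_config_files/ddr3_cache.cfg", source_dir="cacti")` would correspond to the command used in this post, assuming the repository was cloned into a directory named `cacti`.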
The run prints a detailed analysis report, reproduced here:
Cache size : 8388608
Block size : 64
Associativity : 8
Read only ports : 0
Write only ports : 0
Read write ports : 1
Single ended read ports : 0
Cache banks (UCA) : 1
Technology : 0.022
Temperature : 360
Tag size : 42
array type : Cache
Model as memory : 0
Model as 3D memory : 0
Access mode : 0
Data array cell type : 0
Data array peripheral type : 0
Tag array cell type : 0
Tag array peripheral type : 0
Optimization target : 2
Design objective (UCA wt) : 0 0 0 100 0
Design objective (UCA dev) : 20 100000 100000 100000 100000
Cache model : 0
Nuca bank : 0
Wire inside mat : 1
Wire outside mat : 1
Interconnect projection : 1
Wire signaling : 1
Print level : 1
ECC overhead : 1
Page size : 8192
Burst length : 8
Internal prefetch width : 8
Force cache config : 0
Subarray Driver direction : 1
iostate : READ
dram_ecc : NO_ECC
io_type : DDR3
dram_dimm : UDIMM
IO Area (sq.mm) = inf
IO Timing Margin (ps) = 35.8333
IO Votlage Margin (V) = 0.155
IO Dynamic Power (mW) = 1282.42 PHY Power (mW) = 232.752 PHY Wakeup Time (us) = 27.503
IO Termination and Bias Power (mW) = 3136.7
---------- CACTI (version 7.0.3DD Prerelease of Aug, 2012), Uniform Cache Access SRAM Model ----------
Cache Parameters:
Total cache size (bytes): 8388608
Number of banks: 1
Associativity: 8
Block size (bytes): 64
Read/write Ports: 1
Read ports: 0
Write ports: 0
Technology size (nm): 22
Access time (ns): 3.03414
Cycle time (ns): 1.84197
Total dynamic read energy per access (nJ): 0.381869
Total dynamic write energy per access (nJ): 0.446873
Total leakage power of a bank (mW): 2520.29
Total gate leakage power of a bank (mW): 4.71441
Cache height x width (mm): 3.07383 x 2.89775
Best Ndwl : 8
Best Ndbl : 8
Best Nspd : 2
Best Ndcm : 1
Best Ndsam L1 : 8
Best Ndsam L2 : 1
Best Ntwl : 16
Best Ntbl : 8
Best Ntspd : 8
Best Ntcm : 1
Best Ntsam L1 : 8
Best Ntsam L2 : 2
Data array, H-tree wire type: Global wires with 30% delay penalty
Tag array, H-tree wire type: Global wires with 30% delay penalty
Time Components:
Data side (with Output driver) (ns): 3.03414
H-tree input delay (ns): 0.860695
Decoder + wordline delay (ns): 0.607741
Bitline delay (ns): 0.473783
Sense Amplifier delay (ns): 0.00189739
H-tree output delay (ns): 1.09002
Tag side (with Output driver) (ns): 0.866708
H-tree input delay (ns): 0.250295
Decoder + wordline delay (ns): 0.0962495
Bitline delay (ns): 0.078
Sense Amplifier delay (ns): 0.00189739
Comparator delay (ns): 0.0162774
H-tree output delay (ns): 0.440265
Power Components:
Data array: Total dynamic read energy/access (nJ): 0.360657
Total energy in H-tree (that includes both address and data transfer) (nJ): 0.270396
Output Htree inside bank Energy (nJ): 0.263979
Decoder (nJ): 0.000237668
Wordline (nJ): 0.000275334
Bitline mux & associated drivers (nJ): 0
Sense amp mux & associated drivers (nJ): 0
Bitlines precharge and equalization circuit (nJ): 0.00163006
Bitlines (nJ): 0.0612354
Sense amplifier energy (nJ): 0.0018371
Sub-array output driver (nJ): 0.0249178
Total leakage power of a bank (mW): 2357.99
Total leakage power in H-tree (that includes both address and data network) ((mW)): 18.9776
Total leakage power in cells (mW): 0
Total leakage power in row logic(mW): 0
Total leakage power in column logic(mW): 0
Total gate leakage power in H-tree (that includes both address and data network) ((mW)): 0.0916133
Tag array: Total dynamic read energy/access (nJ): 0.0212128
Total leakage read/write power of a bank (mW): 162.298
Total energy in H-tree (that includes both address and data transfer) (nJ): 0.00268136
Output Htree inside a bank Energy (nJ): 0.00104879
Decoder (nJ): 0.000585105
Wordline (nJ): 0.000356972
Bitline mux & associated drivers (nJ): 0
Sense amp mux & associated drivers (nJ): 0.000288214
Bitlines precharge and equalization circuit (nJ): 0.00153419
Bitlines (nJ): 0.0132631
Sense amplifier energy (nJ): 0.00155643
Sub-array output driver (nJ): 8.13397e-05
Total leakage power of a bank (mW): 162.298
Total leakage power in H-tree (that includes both address and data network) ((mW)): 0.23223
Total leakage power in cells (mW): 0
Total leakage power in row logic(mW): 0
Total leakage power in column logic(mW): 0
Total gate leakage power in H-tree (that includes both address and data network) ((mW)): 0.00146699
Area Components:
Data array: Area (mm2): 7.28836
Height (mm): 3.07383
Width (mm): 2.3711
Area efficiency (Memory cell area/Total area) - 73.1983 %
MAT Height (mm): 0.716448
MAT Length (mm): 0.540768
Subarray Height (mm): 0.328909
Subarray Length (mm): 0.26532
Tag array: Area (mm2): 0.377107
Height (mm): 0.716051
Width (mm): 0.526648
Area efficiency (Memory cell area/Total area) - 74.9106 %
MAT Height (mm): 0.173381
MAT Length (mm): 0.063873
Subarray Height (mm): 0.0822272
Subarray Length (mm): 0.027995
Wire Properties:
Delay Optimal
Repeater size - 42.0297
Repeater spacing - 0.0329013 (mm)
Delay - 0.216837 (ns/mm)
PowerD - 0.000279845 (nJ/mm)
PowerL - 0.0215298 (mW/mm)
PowerLgate - 9.15623e-05 (mW/mm)
Wire width - 0.022 microns
Wire spacing - 0.022 microns
5% Overhead
Repeater size - 17.0297
Repeater spacing - 0.0329013 (mm)
Delay - 0.226875 (ns/mm)
PowerD - 0.0001818 (nJ/mm)
PowerL - 0.00872349 (mW/mm)
PowerLgate - 3.70994e-05 (mW/mm)
Wire width - 0.022 microns
Wire spacing - 0.022 microns
10% Overhead
Repeater size - 15.0297
Repeater spacing - 0.0329013 (mm)
Delay - 0.235988 (ns/mm)
PowerD - 0.000174237 (nJ/mm)
PowerL - 0.00769899 (mW/mm)
PowerLgate - 3.27424e-05 (mW/mm)
Wire width - 0.022 microns
Wire spacing - 0.022 microns
20% Overhead
Repeater size - 12.0297
Repeater spacing - 0.0329013 (mm)
Delay - 0.257722 (ns/mm)
PowerD - 0.00016297 (nJ/mm)
PowerL - 0.00616223 (mW/mm)
PowerLgate - 2.62069e-05 (mW/mm)
Wire width - 0.022 microns
Wire spacing - 0.022 microns
30% Overhead
Repeater size - 10.0297
Repeater spacing - 0.0329013 (mm)
Delay - 0.28134 (ns/mm)
PowerD - 0.000155511 (nJ/mm)
PowerL - 0.00513773 (mW/mm)
PowerLgate - 2.18498e-05 (mW/mm)
Wire width - 0.022 microns
Wire spacing - 0.022 microns
Low-swing wire (1 mm) - Note: Unlike repeated wires,
delay and power values of low-swing wires do not
have a linear relationship with length.
delay - 0.0902442 (ns)
powerD - 2.8399e-06 (nJ)
PowerL - 1.71796e-07 (mW)
PowerLgate - 1.29017e-09 (mW)
Wire width - 4.4e-08 microns
Wire spacing - 4.4e-08 microns
Segmentation fault
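In this report, the following lines under
Cache Parameters:
Total dynamic read energy per access (nJ): 0.381869
Total dynamic write energy per access (nJ): 0.446873
give the dynamic energy (in nJ) of a single read access and a single write access.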
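To reuse these two numbers in a larger estimate, they can be pulled out of the saved report text. The sketch below is an illustration, not part of CACTI: the function names are hypothetical, and it only assumes that the relevant lines have the form `<label> (<unit>): <number>` as in the report above. Labels such as "Total leakage power of a bank (mW)" occur more than once in the report, so the parser keeps the first occurrence, which is the summary under Cache Parameters.

```python
import re

# Matches report lines of the form "<label> (<unit>): <number>".
_LINE = re.compile(
    r"^\s*(?P<label>.+?)\s*\((?P<unit>[^()]+)\):\s*"
    r"(?P<value>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse_report(text):
    """Map '<label> (<unit>)' to its value, keeping the first occurrence."""
    values = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            values.setdefault(f"{m['label']} ({m['unit']})", float(m["value"]))
    return values


def access_energy_nj(text):
    """Return (read, write) dynamic energy per access in nJ."""
    values = parse_report(text)
    return (
        values["Total dynamic read energy per access (nJ)"],
        values["Total dynamic write energy per access (nJ)"],
    )


def total_dynamic_energy_mj(text, reads, writes):
    """Dynamic energy in mJ for the given access counts.

    Leakage and IO power are reported separately and are not included.
    """
    read_nj, write_nj = access_energy_nj(text)
    return (reads * read_nj + writes * write_nj) * 1e-6  # nJ -> mJ
```

With the values above, one million reads plus one million writes come to about (0.381869 + 0.446873) nJ × 10^6 ≈ 0.83 mJ of dynamic energy. The access counts here are made up for illustration.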
For what each entry in the configuration file means, see the technical report linked above; I will look into it further when I have time.