Alibaba T-Head C906 features, and FMA instruction latency and throughput
Feature | Description |
---|---|
Architecture | RV64GCV |
Pipeline | 5-stage in-order |
Vector Unit | 32 * 128-bit |
Cache | 32KB I/D-cache |
DRAM | DDR3 2GB |
Vector Unit: RVV 0.7.1, INT8-64, FP8-64, BFP16, 4 GFLOPS (at 1 GHz)
I-cache: 2-way set associative, 64B line size, VIPT, FIFO
D-cache: 4-way set associative, 64B line size, VIPT, FIFO
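According to the C906 user's manual, the latency of a 32-bit FMA is 4 cycles.
The throughput (CPI) can also be calculated from the peak figure: 4 GFLOPS at 1 GHz is 4 FLOPs per cycle, and a 128-bit vector holds 128 / 32 = 4 32-bit elements, so 4 (GFLOPS) / 1 (GHz) / (128 (bit) / 32 (bit)) = 1.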
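
The arithmetic above can be written out as a small Python sketch. The function name `fma_throughput` and its parameters are hypothetical, chosen only for illustration. It follows the note's formula as written, which counts one FLOP per 32-bit lane per instruction. If the 4 GFLOPS figure instead counts each FMA as two FLOPs, the result would be different.

```python
def fma_throughput(peak_gflops: float, clock_ghz: float,
                   vector_bits: int, element_bits: int) -> float:
    """Return the value the note calls throughput (CPI).

    Assumption: one FLOP per lane per instruction, as in the note's formula.
    """
    # FLOPs the core can retire per clock cycle: 4 GFLOPS / 1 GHz = 4
    flops_per_cycle = peak_gflops / clock_ghz
    # Elements processed by one vector instruction: 128 bit / 32 bit = 4
    lanes = vector_bits // element_bits
    # FLOPs per cycle divided by lanes per instruction
    return flops_per_cycle / lanes


if __name__ == "__main__":
    # Numbers taken from the feature list above
    print(fma_throughput(4.0, 1.0, 128, 32))
```

With a result of 1, instructions per cycle and cycles per instruction coincide, so the note's reading of the value as a CPI of 1 is consistent under that assumption.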