Alibaba T-Head C906 Features and FMA Instruction Latency and Throughput

Feature        Description
Architecture   RV64GCV
Pipeline       5-stage, in-order
Vector Unit    32 x 128-bit vector registers
Cache          32 KB I/D-cache
DRAM           DDR3, 2 GB

Vector Unit: RVV 0.7.1, INT8-64, FP8-64, BFP16, 4 GFLOPS at 1 GHz
I-cache: 2-way set associative, 64B line size, VIPT, FIFO
D-cache: 4-way set associative, 64B line size, VIPT, FIFO

According to the C906 user's manual, the latency of a 32-bit FMA is 4 cycles.

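We can also estimate the throughput from the peak figure: 4 GFLOPS / 1 GHz = 4 operations per cycle, and a 128-bit vector holds 128 / 32 = 4 32-bit elements, so 4 / 4 = 1 vector FMA issues per cycle (CPI = 1). This counts each FMA as one operation; if the 4 GFLOPS figure counts an FMA as two FLOPs, the same arithmetic gives one vector FMA every two cycles.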
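The estimate can be reproduced in a few lines. This is only a sketch of the arithmetic, not a measurement: the `flops_per_fma` parameter is an assumption added here to show how the result changes depending on whether the peak figure counts an FMA as one operation or two, and the last line uses the 4-cycle latency to estimate how many independent FMAs must be in flight to keep the unit busy.

```python
# Back-of-the-envelope FMA throughput from the published peak figure.
# All inputs come from the spec list; flops_per_fma is an assumption.

def vector_fma_per_cycle(peak_gflops: float, clock_ghz: float,
                         vector_bits: int, element_bits: int,
                         flops_per_fma: int = 1) -> float:
    """Vector FMA instructions completed per cycle at peak."""
    lanes = vector_bits // element_bits       # elements per vector
    ops_per_cycle = peak_gflops / clock_ghz   # scalar operations per cycle
    return ops_per_cycle / lanes / flops_per_fma

FMA_LATENCY_CYCLES = 4  # 32-bit FMA latency quoted from the user's manual

# Counting an FMA as one operation, as the post does.
rate = vector_fma_per_cycle(4.0, 1.0, 128, 32)
cpi = 1.0 / rate
print("vector FMA per cycle:", rate)  # 1.0
print("CPI:", cpi)                    # 1.0

# Counting an FMA as two FLOPs (multiply + add) instead.
rate_2flop = vector_fma_per_cycle(4.0, 1.0, 128, 32, flops_per_fma=2)
print("vector FMA per cycle (2 FLOPs/FMA):", rate_2flop)  # 0.5

# Independent FMAs needed in flight to hide the latency (latency * rate).
in_flight = FMA_LATENCY_CYCLES * rate
print("independent FMAs to hide latency:", in_flight)  # 4.0
```

With a latency of 4 and one issue per cycle, a loop needs about four independent accumulators to approach the peak; a single dependent chain would be limited by the latency instead.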

posted @ 2021-08-26 16:10  HarryPotterIsDead!