TENSOR CORE PERFORMANCE: THE ULTIMATE GUIDE

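1. An interesting observation: throughput (TFLOPS) is higher when the batch size is divisible by 108, because the A100 has 108 SMs (streaming multiprocessors), each containing Tensor Cores. When the work divides evenly across all 108 SMs, the final wave of thread blocks does not leave SMs idle (the "wave quantization" effect).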

See the reference below.
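The wave quantization effect can be illustrated with a deliberately simplified model. This is a minimal sketch under stated assumptions, not NVIDIA's method: each output tile (thread block) takes the same time, each SM runs only one tile at a time, and tiles are scheduled in waves of 108 (one per SM). The helper name `wave_efficiency` is hypothetical. In a real GEMM the tile count depends on the matrix dimensions and tile size rather than on the batch size alone, and an SM can host several blocks concurrently.

```python
import math

NUM_SMS_A100 = 108  # number of SMs on an A100


def wave_efficiency(num_tiles: int, num_sms: int = NUM_SMS_A100) -> float:
    """Fraction of SM time slots doing useful work.

    Simplified (hypothetical) model: one tile per SM per wave, all tiles equal cost.
    """
    waves = math.ceil(num_tiles / num_sms)  # a partial last wave still costs a full wave
    return num_tiles / (waves * num_sms)


if __name__ == "__main__":
    for tiles in (107, 108, 109, 216):
        waves = math.ceil(tiles / NUM_SMS_A100)
        print(f"{tiles} tiles -> {waves} wave(s), efficiency {wave_efficiency(tiles):.3f}")
    # 107 -> 1 wave, 0.991; 108 -> 1 wave, 1.000; 109 -> 2 waves, 0.505; 216 -> 2 waves, 1.000
```

Under this model, going from 108 to 109 tiles roughly halves utilization because the single extra tile forces a second, almost empty wave.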

References:

https://developer.download.nvidia.cn/video/gputechconf/gtc/2020/presentations/s21929-tensor-core-performance-on-nvidia-gpus-the-ultimate-guide.pdf

posted @ 2022-06-01 15:55  xuyv