TENSOR CORE PERFORMANCE: THE ULTIMATE GUIDE
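1. An interesting point: throughput (TFLOPS) is better when the batch size is evenly divisible by 108, because the A100 has 108 SMs (the tensor cores live inside the SMs), so the work splits into full waves across all SMs with no partially filled final wave.
See the reference.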
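The divisibility effect can be sketched with a little arithmetic. This is a simplified model, not taken from the reference: it assumes the kernel launches one thread block per batch element, that each SM runs one block at a time, and that all blocks cost the same. The function name wave_efficiency is hypothetical.

```python
import math

A100_SM_COUNT = 108  # SM count of the A100, as stated in the note


def wave_efficiency(num_blocks: int, sm_count: int = A100_SM_COUNT) -> float:
    """Fraction of SM slots doing useful work.

    Assumption (illustrative only): one thread block per SM per wave,
    and every block takes the same time to finish.
    """
    if num_blocks <= 0 or sm_count <= 0:
        raise ValueError("num_blocks and sm_count must be positive")
    # Number of waves needed; the last wave may be only partly full.
    waves = math.ceil(num_blocks / sm_count)
    return num_blocks / (waves * sm_count)


# Compare block counts that are and are not multiples of 108.
for n in (108, 109, 216, 217):
    waves = math.ceil(n / A100_SM_COUNT)
    print(f"blocks={n:4d} waves={waves} efficiency={wave_efficiency(n):.2%}")
```

Under these assumptions, 108 blocks fill exactly one wave, while 109 blocks need a second wave that keeps only one SM busy, so efficiency drops to roughly half. Real kernels map batch size to thread blocks in more complicated ways, so the exact numbers will differ.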
Reference:
https://developer.download.nvidia.cn/video/gputechconf/gtc/2020/presentations/s21929-tensor-core-performance-on-nvidia-gpus-the-ultimate-guide.pdf