CUDA Programming Figures

CUDA C++ Programming Guide

 

 Figure 7. Matrix Multiplication without Shared Memory

 

 Figure 8. Matrix Multiplication with Shared Memory
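The difference between Figures 7 and 8 comes down to how often each input matrix is read from global memory. The following is a minimal host-side sketch in plain C++, not a CUDA kernel. It assumes square n x n matrices with n a multiple of the tile size, and all names (`LoadCounts`, `multiply_naive`, `multiply_tiled`) are hypothetical. The local buffers `As` and `Bs` only stand in for shared memory, and the counters record element reads from the arrays that stand in for global memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical names throughout: a host-side emulation, not device code.
struct LoadCounts {
    std::size_t a_loads = 0;  // element reads from A ("global memory")
    std::size_t b_loads = 0;  // element reads from B ("global memory")
};

// Scheme of Figure 7: every output element reads its row of A and its
// column of B directly from global storage.
LoadCounts multiply_naive(const std::vector<float>& A, const std::vector<float>& B,
                          std::vector<float>& C, std::size_t n) {
    LoadCounts counts;
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                acc += A[row * n + k] * B[k * n + col];
                ++counts.a_loads;
                ++counts.b_loads;
            }
            C[row * n + col] = acc;
        }
    }
    return counts;
}

// Scheme of Figure 8: each block of tile x tile outputs stages one tile of A
// and one tile of B into local buffers, then reuses the staged values.
// Assumes n is a multiple of tile.
LoadCounts multiply_tiled(const std::vector<float>& A, const std::vector<float>& B,
                          std::vector<float>& C, std::size_t n, std::size_t tile) {
    LoadCounts counts;
    std::vector<float> As(tile * tile), Bs(tile * tile);
    std::fill(C.begin(), C.end(), 0.0f);
    for (std::size_t br = 0; br < n; br += tile) {
        for (std::size_t bc = 0; bc < n; bc += tile) {
            for (std::size_t m = 0; m < n; m += tile) {
                // Stage: one global read per tile element.
                for (std::size_t i = 0; i < tile; ++i) {
                    for (std::size_t j = 0; j < tile; ++j) {
                        As[i * tile + j] = A[(br + i) * n + (m + j)];
                        Bs[i * tile + j] = B[(m + i) * n + (bc + j)];
                        ++counts.a_loads;
                        ++counts.b_loads;
                    }
                }
                // Compute: only the staged buffers are read here.
                for (std::size_t i = 0; i < tile; ++i)
                    for (std::size_t j = 0; j < tile; ++j)
                        for (std::size_t k = 0; k < tile; ++k)
                            C[(br + i) * n + (bc + j)] += As[i * tile + k] * Bs[k * tile + j];
            }
        }
    }
    return counts;
}
```

With n = 8 and tile = 4, the naive version performs 512 element reads from each input and the tiled version 128, a reduction by a factor equal to the tile size. On a real device the staging step would be done cooperatively by the threads of a block, with synchronization before and after the compute step.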

 

 Figure 20. Examples of Global Memory Accesses. Examples of Global Memory Accesses by a Warp, 4-Byte Word per Thread, and Associated Memory Transactions for Compute Capabilities 3.x and Beyond
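To make the idea of "associated memory transactions" concrete, here is a rough host-side sketch in C++ that counts how many aligned segments a warp's addresses touch. It is an illustration under stated assumptions, not a model of any specific GPU: the segment size is a parameter (32 bytes below is an assumption), every thread is assumed to read one naturally aligned 4-byte word, and the helper names `warp_addresses` and `count_segments` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Byte address read by each of the 32 threads of a warp:
// thread t reads the 4-byte word at base + t * stride_bytes.
std::vector<std::uint64_t> warp_addresses(std::uint64_t base, std::uint64_t stride_bytes) {
    std::vector<std::uint64_t> addresses;
    for (std::uint64_t t = 0; t < 32; ++t) {
        addresses.push_back(base + t * stride_bytes);
    }
    return addresses;
}

// Number of distinct aligned segments of segment_bytes touched by the warp.
// Assumes each access is a naturally aligned 4-byte word, so it never
// straddles a segment boundary.
std::size_t count_segments(const std::vector<std::uint64_t>& addresses,
                           std::uint64_t segment_bytes) {
    std::set<std::uint64_t> segments;
    for (std::uint64_t address : addresses) {
        segments.insert(address / segment_bytes);
    }
    return segments.size();
}
```

Under these assumptions, an aligned sequential access starting at byte 128 touches four 32-byte segments, while the same pattern shifted to start at byte 132 touches five, which is the kind of extra traffic that misaligned accesses cause.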

 

 Figure 21. Strided Shared Memory Accesses. Examples for devices of compute capability 3.x (in 32-bit mode) or compute capability 5.x and 6.x
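The strided cases can be checked with a little arithmetic. The sketch below is host-side C++ rather than device code, and it assumes the layout described in the Programming Guide for these devices: 32 banks, with successive 32-bit words mapped to successive banks. The function name `conflict_degree` is hypothetical. It reports the largest number of distinct words that land in the same bank when thread t of a warp accesses word t * stride.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <set>

constexpr int kWarpSize = 32;
constexpr int kNumBanks = 32;  // assumption: 32 banks of 32-bit words

// Worst-case number of distinct 32-bit words mapped to a single bank when
// thread t accesses word index t * stride_words. A result of 1 means no bank
// conflict; a result of k means a k-way conflict.
int conflict_degree(int stride_words) {
    std::array<std::set<int>, kNumBanks> words_per_bank;
    for (int t = 0; t < kWarpSize; ++t) {
        int word = t * stride_words;
        // Threads hitting the same word share one entry in the set,
        // so identical addresses are not counted as a conflict.
        words_per_bank[word % kNumBanks].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}
```

For strides of one, two, and three words this gives 1, 2, and 1 respectively: a stride of two produces a two-way conflict, while odd strides spread the warp across all 32 banks.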

 

 Figure 22. Irregular Shared Memory Accesses. Examples for devices of compute capability 3.x, 5.x, or 6.x.

 

Reference link:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions__throughput-native-arithmetic-instructions

posted @ 2021-12-07 06:12  吴建明wujianming