TVM Performance Evaluation and Analysis (Part 5)

 Figure 3.  A further speedup from operator fusion

 

 

 Table 1.  Performance issue of cuBLAS’ batch matmul

 

 

 Table 2.  Finding the best combination of number_thread. The results were obtained on an NVIDIA M40 GPU with CUDA 8.0.

 

 

 Figure 4.  DLPack provides an intermediate wrapper that is shared between frameworks and TVM

 

 

 Figure 5.  The OpenGL/WebGL Backend

 

 

 Figure 6.  TVM uses a unified AST to define kernels and compiles them to code for different platforms.

 

 

 Figure 7.  The benchmark is run in 4 different settings

 

 

 Figure 8. Inference Speed of Different Backends on ImageNet

 

 

 Figure 9.  Mali T860 and T880

 

 

 Figure 10.  Inference Speed of Different Backends on ImageNet

 

 

 Table 3. Inference Speed of FP16 on ImageNet

 

Posted 2021-05-30 07:29 by 吴建明 (wujianming)