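Summary:
Compute Efficiency in Autonomous Driving. The efficiency puzzle of Tesla Hardware 3.0: In its Hardware 3.0 autonomous driving platform, Tesla replaced the Nvidia Drive PX2 with a chip designed in-house. Theoretical compute rose 12-fold, and when evaluated with the MAPS method, real AI performance rose by a striking 21-fold. Specifically, Hard Read more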
Summary:
Ascend AI: a full-stack hardware and software platform Read more
Summary:
TVM Performance Evaluation and Analysis (7) Figure 1. Performance Improvement Figure 2. Depthwise convolution Figure 3. Data Fusion Figure 4. Data Fusion (2) Figure 5. Shared memory Read more
Summary:
TVM Performance Evaluation and Analysis (6) Figure 1. The development workflow: compile on the PC, deploy to the device, test, then modify the code again to see whether it runs faster. Read more
Summary:
TVM Performance Evaluation and Analysis (5) Figure 3. A further speedup with operator fusion Table 1. Performance issue of cuBLAS's batch matmul Table 2. Finding the best combination Read more
Summary:
TVM Performance Evaluation and Analysis (4) Figure 1. Efficient Privacy-Preserving ML Using TVM Figure 2. Motivation: Privacy-Preserving ML Figure 3. Backend Figure 4. Differential Read more
Summary:
TVM Performance Evaluation and Analysis (3) Figure 1. TVM's WebGPU backend comes close to native GPU performance when deploying models to the web. Figure 2. WebGPU is to write shaders for Read more
Summary:
TVM Performance Evaluation and Analysis (2) Figure 1. A bird's-eye view of the µTVM + AutoTVM infrastructure Figure 2. A standard µTVM setup, where the host communicates with the de Read more
Summary:
TVM Performance Evaluation and Analysis (1) System Overview AutoTVM vs Auto-scheduler Table 1. Workflow Comparison Figure 1. Search Process Overview Figure 2. Code Performance Comp Read more