Post category - AI hardware
Paper notes -- Communication Lower Bound in Convolution Accelerators
Summary: Paper notes on Communication Lower Bound in Convolution Accelerators. Disclaimer: this post is on a paper from HPCA 2020, a flagship conference in computer architecture: Chen X, Han Y, Wang Y.
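Batch Normalization and Batch Renormalization: detailed derivation of the forward and backward formulas
Summary: Batch Normalization and Batch Renormalization: detailed derivation of the forward and backward formulas. I. BN forward propagation. According to the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Inter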
ACM FPGA 2019 -- Reconfigurable Convolutional Kernels for Neural Networks on FPGAs: paper walkthrough
Summary: Reconfigurable Convolutional Kernels for Neural Networks on FPGAs, ACM FPGA 2019. reconfigurable constant multipliers (RCMs) showed that 1
ISSCC 2020: GANPU paper walkthrough
Summary: ISSCC 2020 GANPU paper walkthrough. GANPU: A 135TFLOPS/W Multi-DNN Training Processor for GANs with Speculative Dual-Sparsity Exploitation. I. Background and motivation 1. Background This
Tangram: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators: reading notes
Summary: Tangram: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators. 1. Abstract + Proposes a buffer sharing dataflow for intra-layer parallelism. It can organize the distributed buffers into a