GEMM: The Heart of Deep Learning

GEMM is one of the routines in BLAS; it performs multiplication between large matrices, which inevitably raises questions about how the data is read and stored.
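As a rough sketch of what the routine computes, the snippet below calls a single-precision BLAS GEMM through SciPy's bindings (scipy.linalg.blas.sgemm), which evaluates C = alpha * A @ B + beta * C. The shapes and random inputs are illustrative only, not taken from any particular network.

```python
# Minimal sketch of the GEMM operation: C = alpha * A @ B + beta * C,
# using the single-precision BLAS routine exposed by SciPy.
import numpy as np
from scipy.linalg.blas import sgemm

A = np.random.rand(4, 3).astype(np.float32)   # 4 x 3 input matrix
B = np.random.rand(3, 5).astype(np.float32)   # 3 x 5 input matrix

C = sgemm(alpha=1.0, a=A, b=B)                # 4 x 5 output matrix

# The BLAS call should agree with NumPy's own matrix multiply.
assert np.allclose(C, A @ B, atol=1e-5)
```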

Reference blog: https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/

Were you surprised by the per-layer time-breakdown chart in that post? If you want to cut a neural network's computation time, the real answer is to make the convolutional layers compute more efficiently.

So what is GEMM?  It stands for General Matrix to Matrix Multiplication, and it essentially does exactly what it says on the tin, multiplies two input matrices together to get an output one. The difference between it and the kind of matrix operations I was used to in the 3D graphics world is that the matrices it works on are often very big. For example, a single layer in a typical network may require the multiplication of a 256 row, 1,152 column matrix by an 1,152 row, 192 column matrix to produce a 256 row, 192 column result. Naively, that requires 57 million (256 x 1,152 x 192) floating point operations and there can be dozens of these layers in a modern architecture, so I often see networks that need several billion FLOPs to calculate a single frame. Here’s a diagram that I sketched to help me visualize how it works:

[Diagram from the original post: a sketch visualizing how the two input matrices combine into the output matrix]
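To make the arithmetic in the quoted example concrete, the sketch below multiplies a 256 x 1,152 matrix by an 1,152 x 192 matrix with an explicitly written triple loop, which is exactly one multiply-accumulate per (row, column, inner-dimension) triple, i.e. 256 * 1,152 * 192 ≈ 57 million operations. The loop is written for clarity, not speed; a real GEMM implementation blocks and vectorizes these loops, and the shapes are just the ones quoted above.

```python
# Naive GEMM on the layer shapes quoted above, to show where the
# ~57 million floating point operations come from.
import numpy as np

M, K, N = 256, 1152, 192
A = np.random.rand(M, K).astype(np.float32)   # 256 x 1152
B = np.random.rand(K, N).astype(np.float32)   # 1152 x 192

print("multiply-accumulates:", M * K * N)     # 56,623,104 (~57 million)

def naive_gemm(A, B):
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(M):            # rows of A
        for j in range(N):        # columns of B
            acc = 0.0
            for k in range(K):    # shared inner dimension
                acc += A[i, k] * B[k, j]
            C[i, j] = acc
    return C

# The full shapes are slow in pure Python, so check the loop against
# NumPy's optimized multiply on a small slice instead.
C_small = naive_gemm(A[:8], B[:, :8])
assert np.allclose(C_small, A[:8] @ B[:, :8], atol=1e-3)
```

The optimized BLAS call (A @ B or sgemm above) produces the same 256 x 192 result almost instantly, which is precisely why deep learning frameworks hand this work to GEMM.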
