The relationship between convolutions and filters / Why the convolution unit of a 2D convolution is 3D

Filters and Convolutions#

Excerpt from Focal Loss#

Classification Subnet:

The classification subnet predicts the probability of object presence at each spatial position for each of the A anchors and K object classes. This subnet is a small FCN attached to each FPN level; parameters of this subnet are shared across all pyramid levels.

Its design is simple. Taking an input feature map with C channels from a given pyramid level, the subnet applies four 3×3 conv layers, each with C filters and each followed by ReLU activations, followed by a 3×3 conv layer with K×A filters. Finally sigmoid activations are attached to output the K×A binary predictions per spatial location, see Figure 5 (c).

We use C = 256 and A = 9 in most experiments. In contrast to RPN [3], our object classification subnet is deeper, uses only 3×3 convs, and does not share parameters with the box regression subnet (described next). We found these higher-level design decisions to be more important than specific values of hyperparameters.
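To make the "C filters" and "K×A filters" wording concrete, here is a rough parameter count for the subnet described in the excerpt. It is only a sketch: the helper names `conv_params` and `cls_subnet_params` are made up for illustration, K = 80 is an assumed class count (the excerpt does not give K), and each conv layer is assumed to carry one bias per filter.

```python
def conv_params(c_in, c_out, k=3, bias=True):
    """Parameters of a k x k conv layer: c_out filters, each of shape c_in x k x k.

    The optional bias (one scalar per filter) is an assumption of this sketch.
    """
    return c_out * c_in * k * k + (c_out if bias else 0)


def cls_subnet_params(C=256, A=9, K=80):
    """Four C->C 3x3 convs followed by one C->(K*A) 3x3 conv.

    C and A follow the excerpt; K=80 is an assumed number of classes.
    """
    tower = 4 * conv_params(C, C)      # four 3x3 conv layers with C filters each
    head = conv_params(C, K * A)       # final 3x3 conv layer with K*A filters
    return tower + head


print(conv_params(256, 256))   # 590080 parameters in one C->C layer
print(cls_subnet_params())     # 4019920 parameters in total
```

Note that every filter in a layer has depth equal to the number of input channels, which is why the count multiplies by `c_in`.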

Filters and Convs#

C 2D filters of size h×w can be stacked along the channel axis to form one 3D filter of size C×h×w.

Suppose I say "3×3 conv", the input image has Cin channels, and I want the network's output to have Cout channels. Then:

image

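  • A total of Cout 3×3 convolution units are needed, one per output channel
  • Each convolution unit contains Cin filters
  • These Cin filters are all of size 3×3; each one operates on its own input channel, and the Cin results are summed to give one output channel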

image

  • Each 2D convolution unit is in fact a 3D weight tensor of size Cin×3×3

image

It is called 2D because the kernel strides along only two dimensions (height and width).
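The points above can be sketched with a naive convolution written in plain Python. This is an illustrative toy, not how any framework implements it: the function name `conv2d` and the nested-list layout are assumptions of the sketch, and it uses stride 1, no padding, and no bias. The only loops that move the kernel are the ones over `i` and `j`; the channel loop just accumulates into the same output value.

```python
def conv2d(x, weight):
    """Naive 2D convolution (cross-correlation, as used in deep learning).

    x:      nested lists of shape [C_in][H][W]
    weight: nested lists of shape [C_out][C_in][kh][kw]
    returns nested lists of shape [C_out][H-kh+1][W-kw+1]
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(weight[0][0]), len(weight[0][0][0])
    out = []
    for unit in weight:                      # one C_in x kh x kw unit per output channel
        fmap = []
        for i in range(h - kh + 1):          # slide along height
            row = []
            for j in range(w - kw + 1):      # slide along width
                s = 0.0
                for c in range(c_in):        # no sliding here: sum over the full depth
                    for di in range(kh):
                        for dj in range(kw):
                            s += unit[c][di][dj] * x[c][i + di][j + dj]
                row.append(s)
            fmap.append(row)
        out.append(fmap)
    return out


c_in, c_out = 3, 2
x = [[[1.0] * 5 for _ in range(5)] for _ in range(c_in)]             # 3 x 5 x 5 input of ones
weight = [[[[1.0] * 3 for _ in range(3)] for _ in range(c_in)]
          for _ in range(c_out)]                                      # 2 units of 3 x 3 x 3 ones
y = conv2d(x, weight)
print(len(y), len(y[0]), len(y[0][0]))   # 2 3 3
print(y[0][0][0])                        # 27.0, i.e. 3 channels x 3 x 3 ones
```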

So, is there a separate filter for each input channel?#

ref: https://ai.stackexchange.com/questions/5769/in-a-cnn-does-each-new-filter-have-different-weights-for-each-input-channel-or

YES, there are as many 2D filters as the number of input channels in the image. However, it helps if you think that for input matrices with more than one channel, there is only one 3D filter (as shown in the image above).
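The two views in this answer give the same numbers, which a small check can confirm. The sketch below is illustrative only: `filter2d` and `filter3d_as_one` are hypothetical helper names, the kernels are arbitrary integer examples, and stride 1 with no padding is assumed. It filters each channel with its own 2D kernel and sums the results, then compares that with treating the kernels as one 3D block.

```python
def filter2d(channel, kernel):
    """Slide one 2D kernel over one 2D channel (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(channel), len(channel[0])
    return [[sum(kernel[di][dj] * channel[i + di][j + dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


def filter3d_as_one(x, unit):
    """Treat the per-channel kernels as one C_in x kh x kw block that moves over H and W only."""
    c_in, kh, kw = len(unit), len(unit[0]), len(unit[0][0])
    h, w = len(x[0]), len(x[0][0])
    return [[sum(unit[c][di][dj] * x[c][i + di][j + dj]
                 for c in range(c_in) for di in range(kh) for dj in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


# Two input channels of size 4 x 4 with integer values, so the comparison is exact.
x = [[[c + 4 * i + j for j in range(4)] for i in range(4)] for c in range(2)]
unit = [[[1, 0, -1], [2, 0, -2], [1, 0, -1]],     # kernel for channel 0
        [[0, 1, 0], [1, -4, 1], [0, 1, 0]]]       # kernel for channel 1

per_channel = [filter2d(x[c], unit[c]) for c in range(2)]
summed = [[per_channel[0][i][j] + per_channel[1][i][j] for j in range(2)]
          for i in range(2)]

print(summed)                               # [[-8, -8], [-8, -8]]
print(summed == filter3d_as_one(x, unit))   # True
```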

Then why is this called 2D convolution (if the filter is 3D and the input matrix is 3D)?#

This is 2D convolution because the filter strides along the height and width dimensions only (NOT depth), and therefore the output produced by each filter is a 2D matrix. The number of directions in which the filter moves determines the dimensionality of the convolution.
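One way to see this is to count, for each axis, how many positions the filter can occupy. The helpers below (`valid_output_shape` and `sliding_dims`, both hypothetical names) are a minimal sketch assuming stride 1 and no padding. When the filter depth equals the input depth there is exactly one position along depth, so the filter can only move in two directions.

```python
def valid_output_shape(input_shape, kernel_shape):
    """Number of positions the kernel can take along each axis (stride 1, no padding)."""
    return tuple(n - k + 1 for n, k in zip(input_shape, kernel_shape))


def sliding_dims(input_shape, kernel_shape):
    """Count the axes along which the kernel has more than one position."""
    return sum(1 for n in valid_output_shape(input_shape, kernel_shape) if n > 1)


# Shapes are (depth, height, width).
image = (3, 32, 32)                               # 3-channel image
print(valid_output_shape(image, (3, 3, 3)))       # (1, 30, 30): one 2D map per filter
print(sliding_dims(image, (3, 3, 3)))             # 2

# If the kernel is shallower than the input, it can also move along depth,
# which is the situation in a 3D convolution over a volume.
volume = (16, 32, 32)
print(valid_output_shape(volume, (3, 3, 3)))      # (14, 30, 30)
print(sliding_dims(volume, (3, 3, 3)))            # 3
```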

Note: If you build up your understanding by visualizing a single 3D filter instead of multiple 2D filters (one for each input channel), then you will have an easy time understanding advanced CNN architectures like ResNet, InceptionV3, etc.

posted @ ZXYFrank