[阅读] 神经网络 1 /[Reading] Neural Networks 1

Preface/前言

I wrote this blog to record my reading of http://neuralnetworksanddeeplearning.com/chap6.html.

What's more, I want to practice my English writing, so I wrote the blog in English first. For the convenience of Chinese readers, I have appended a Google Translate version after the whole article instead of translating it myself, because translating it by hand might be a waste of time, and I think it is always good to read material in its original language. 😃 Thanks for your understanding. Written by HarrySong.

我写了这个博客来记录我的读物,该书位于http://neuralnetworksanddeeplearning.com/chap6.html。

而且,我想训练我的英语写作能力,所以我会先用英语写博客。另外,为了方便中国读者,我会用Google Translate翻译并附在整篇文章之后,而不是自己翻译,因为我觉得自己翻译可能会浪费时间,同时我认为阅读原语言的文字总是有好处的。:) 感谢您的理解。由HarrySong编写。

English Version

Core Question

MNIST handwritten digit classification: why don't we use fully connected layers to classify? Because it seems strange to treat pixels that are far apart and pixels that are close together in exactly the same way; in other words, we should take the spatial structure of the original pictures into account, which is what a CNN does. It is also worth noting that deep convolutional neural networks are used in most networks for image recognition.

Here, a new concept, the Convolutional Neural Network (CNN), is introduced.

CNN

CNNs are built on three basic ideas: local receptive fields, shared weights, and pooling.

Local receptive fields

First, it is worth noticing that the input, which was previously a single vertical column of neurons, is now arranged as a 28x28 square of neurons. In the next layer, the first hidden layer, the input neurons are not fully connected; instead, each hidden neuron is connected to one small region of the inputs, called its local receptive field. Each connection learns a weight, and the hidden neuron also learns an overall bias.

To sum it up in one sentence: each neuron in the hidden layer is in charge of a particular local receptive field.

Therefore, for a 28x28 image and 5x5 local receptive fields, the next layer is 24x24 (28 - 5 + 1 = 24), assuming the stride length is 1.
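
As a quick check, here is a minimal sketch of that size calculation (the function name conv_output_size is my own, not from the book):

```python
def conv_output_size(input_size, field_size, stride=1):
    """How many positions a field_size window can take along one dimension."""
    return (input_size - field_size) // stride + 1

print(conv_output_size(28, 5))  # 24, so the first hidden layer is 24x24
```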

Shared weights and biases

For the j,k-th hidden neuron, the output is

\[\sigma(b+\sum^{4}_{l=0}\sum^{4}_{m=0}w_{l,m}a_{j+l,k+m}) \]

Here \(b\) is the shared bias, \(w_{l,m}\) is a 5x5 array of shared weights, and \(a_{x,y}\) denotes the input activation at position \(x,y\).
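
As an illustration only (my own NumPy sketch under these definitions, not the book's code; the names sigmoid and feature_map are assumptions), the formula can be evaluated at every position like this:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_map(a, w, b):
    """sigma(b + sum_{l,m} w[l,m] * a[j+l, k+m]) at every position (j, k)."""
    size = a.shape[0] - w.shape[0] + 1        # 28 - 5 + 1 = 24
    out = np.empty((size, size))
    for j in range(size):
        for k in range(size):
            region = a[j:j + w.shape[0], k:k + w.shape[1]]  # local receptive field
            out[j, k] = sigmoid(b + np.sum(w * region))
    return out

fmap = feature_map(np.random.rand(28, 28), np.random.randn(5, 5), 0.0)
print(fmap.shape)  # (24, 24)
```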

The shared weights and bias are often said to define a kernel or filter, and each kernel detects one kind of feature. To do image recognition we need more than one kernel, so a convolutional layer maps the input layer to several different feature maps (the example in the book shows 3). In practice, LeNet-5 uses 6 feature maps, each associated with a 5x5 local receptive field, to recognize MNIST digits.

It is difficult to see what these feature detectors are learning just by looking at images of the learned feature maps.

The advantage is that the shared kernel reduces the number of parameters: a single feature map needs only 5x5 + 1 = 26 parameters. Assuming we have 20 feature maps, that makes 20 x 26 = 520 parameters. By contrast, a fully connected first layer with 30 hidden neurons would have 28x28x30 + 30 = 23550 parameters in all. The fully connected layer has more than 40 times as many parameters as the convolutional layer.
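
Spelled out as a quick arithmetic check (a trivial sketch of my own):

```python
conv_params = 20 * (5 * 5 + 1)      # 20 feature maps, 26 shared parameters each
fc_params = 28 * 28 * 30 + 30       # fully connected layer with 30 hidden neurons
print(conv_params, fc_params, fc_params / conv_params)  # 520 23550 ~45.3
```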

People sometimes write that equation as \(a_{1}=\sigma(b+w*a_{0})\), where \(a_{1}\) denotes the set of output activations from one feature map, \(a_{0}\) is the set of input activations, and \(*\) is called a convolution operation.
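
For illustration, the whole feature map can be computed in one call; this is my own sketch, and note that the indexing in the formula above is what SciPy calls cross-correlation, so correlate2d is the matching operation:

```python
import numpy as np
from scipy.signal import correlate2d

a0 = np.random.rand(28, 28)   # input activations
w = np.random.randn(5, 5)     # shared weights (the kernel)
b = 0.1                       # shared bias

# mode='valid' slides the 5x5 kernel over every position where it fully fits,
# computing sum_{l,m} w[l,m] * a0[j+l, k+m] -- the 24x24 pre-activations.
a1 = 1.0 / (1.0 + np.exp(-(b + correlate2d(a0, w, mode="valid"))))
print(a1.shape)  # (24, 24)
```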

Pooling Layer

Usually, pooling layers are used immediately after convolutional layers, in order to simplify the information in the output from the convolutional layer.

A pooling layer takes the output of each feature map (\(\text{input} \xrightarrow{\text{kernel}} \text{feature map}\)) from the convolutional layer and summarizes a region of the neurons in the previous layer. For example, max-pooling passes the maximum activation of a region (say 2x2) on to the corresponding neuron in the next layer.

With 2x2 pooling, the 24x24 output of the convolutional layer becomes 12x12 neurons. The pooling layer has two functions here: first, it tells us whether a feature was found somewhere in a region of the image; second, it drops the exact location and keeps only a rough location, which reduces the number of parameters needed in later layers.

Max-pooling is only one of the possible techniques. We can also pool by taking the square root of the sum of the squares of the activations in the region, \(\sqrt{\sum_{i,j} a_{i,j}^{2}}\), which is known as L2 pooling.
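
Here is a minimal NumPy sketch of both pooling variants (my own illustration; the function name pool_2x2 is an assumption):

```python
import numpy as np

def pool_2x2(fmap, mode="max"):
    """Reduce a 24x24 feature map to 12x12 by pooling non-overlapping 2x2 blocks."""
    h, w = fmap.shape
    blocks = fmap.reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))              # max-pooling
    return np.sqrt((blocks ** 2).sum(axis=(1, 3)))  # L2 pooling

pooled = pool_2x2(np.random.rand(24, 24))
print(pooled.shape)  # (12, 12)
```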

Putting it all together

With this architecture, the output of the pooling layer is fully connected to 10 output neurons, which represent the 10 possible digit values from 0 to 9.
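
Tying the pieces above together, here is a minimal, untrained forward pass under the assumptions used so far (a single feature map, random weights, names of my own choosing; this is a sketch, not the book's actual implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Parameters: one 5x5 kernel with a shared bias, then a 12*12 -> 10 output layer.
kernel, conv_b = rng.standard_normal((5, 5)), 0.0
fc_w, fc_b = rng.standard_normal((10, 12 * 12)), np.zeros(10)

def forward(image):  # image: 28x28 array of pixel intensities
    # Convolutional layer: 28x28 input -> 24x24 feature map.
    fmap = np.array([[sigmoid(conv_b + np.sum(kernel * image[j:j + 5, k:k + 5]))
                      for k in range(24)] for j in range(24)])
    # Max-pooling: 24x24 -> 12x12.
    pooled = fmap.reshape(12, 2, 12, 2).max(axis=(1, 3))
    # Fully connected output layer: 144 activations -> 10 digit scores.
    return sigmoid(fc_w @ pooled.reshape(-1) + fc_b)

print(forward(rng.random((28, 28))).shape)  # (10,)
```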

中文版

核心问题

MNIST手写数字数据分类:为什么我们不使用全连接层进行分类?因为把相距很远和很近的像素同等对待似乎很奇怪,这意味着必须考虑原始图片的空间结构,因此我们要使用CNN。还需要注意的是,大多数图像识别网络都使用深度卷积神经网络。

在这里,介绍了一个新概念,即卷积神经网络(CNN)。

CNN

CNN基于3个基本概念:局部感受野、共享权重和池化。

局部感受野

首先,应该注意到,以前是一条竖直列的输入将变为28x28的方形神经元排列。至于下一层,即第一个隐藏层,它的神经元不会与输入完全连接,而是每个隐藏神经元连接到输入的一个区域,即局部感受野。每个连接学习一个权重,而神经元则学习一个整体偏置。

可以得出的一句话是,隐藏层中的一个神经元将负责特定的局部感受野。

因此,对于28x28图像和5x5局部感受野,下一层将是24x24(28-5+1=24)。(条件:步幅为1)

共享的权重和偏置

对于第j,k个隐藏神经元,输出为

\[\sigma(b+\sum^{4}_{l=0}\sum^{4}_{m=0}w_{l,m}a_{j+l,k+m}) \]

b是共享偏置,\(w_{l,m}\)是共享权重的5x5数组,\(a_{x,y}\)表示位置\(x,y\)处的输入激活。

通常说,共享权重和偏置定义了一个核(kernel)或滤波器(filter),其中每个核对应一种特征。为了进行图像识别,我们需要不止一个核,因此卷积层从输入层到下一层有多个不同的特征图(书中示例为3个)。在实践中,LeNet-5使用6个特征图,每个特征图与一个5x5的局部感受野相关联,用于识别MNIST。

仅从特征图的图像来看,很难看出这些特征检测器学到了什么。

好处是共享内核会减少参数的数量,在特征图中,我们只需要5 * 5 + 1 = 26个参数。假设我们有20个特征图,并且我们有20x26 = 520个参数。但是,对于隐藏层有30个神经元的完全连接的网络,我们总共需要28x28x30 + 30 = 23550个参数。全连接网络的参数是卷积层的40倍以上。

人们有时将该等式写为\(a_{1}=\sigma(b+w*a_{0})\),其中\(a_{1}\)表示来自一个特征图的一组输出激活,\(a_{0}\)是一组输入激活,而\(*\)称为卷积运算。

池化层

通常,在卷积层之后立即使用池化层,以简化卷积层输出中的信息。

这里的池化层会获取卷积层输出的每个特征图(\(\text{input} \xrightarrow{\text{kernel}} \text{feature map}\)),并汇总上一层中一个区域内的神经元。例如,max-pooling(最大池化)是把一个区域(例如2x2)中的最大激活值传给后面的神经元。

通过2x2池化,卷积层的24x24输出变为12x12个神经元。池化层在这里有两个作用:首先,它可以告诉我们是否在图像的某个区域中找到了某个特征;其次,我们会丢弃精确位置而只保留大致位置,这会减少后续层所需的参数数量。

最大池化只是其中一种技术。我们也可以取区域内激活值平方和的平方根\(\sqrt{\sum_{i,j} a_{i,j}^{2}}\)来进行池化,这称为L2池化。

放在一起

有了这样的架构,池化层的输出将全连接到10个输出神经元,这10个神经元代表从0到9的10个可能的数字。

posted @ 2020-12-02 23:11  Harry666