Deep Convolutional Neural Networks with Theano

1. Introduction

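Convolutional Neural Networks (CNNs) are inspired by the observation that cells in the retina are sensitive only to a small sub-region of the visual field, called the receptive field. CNNs adopt the same mechanism: each neuron is connected to only a part of its input.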

2. Sparse Connectivity

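CNNs exploit spatially local correlation by enforcing a local connectivity pattern between neurons of adjacent layers. The input of a hidden unit in layer $m$ is a weighted sum over a subset of the units in layer $m-1$, namely the units in a spatially contiguous receptive field, as shown in the figure below: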

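Imagine that layer $m-1$ is the input retina. Each unit in layer $m$ has a receptive field of width 3, so it is connected to only 3 adjacent neurons in the retina layer. Units in layer $m+1$ are connected to the layer below in the same way. Each neuron is unresponsive to variations outside its receptive field, which ensures that the learned "filters" respond most strongly to spatially local input patterns.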
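The local connectivity described above can be sketched in plain Python. This is a minimal illustration that assumes a 1D input, a receptive field of width 3 and *unshared* weights (every output unit has its own weight vector). The function name `locally_connected` is hypothetical and not part of Theano.

```python
def locally_connected(x, weights, width=3):
    """Output unit i sees only x[i], ..., x[i + width - 1] (its receptive field).

    weights[i] is the private weight vector of output unit i (no sharing).
    """
    n_out = len(x) - width + 1
    assert len(weights) == n_out and all(len(w) == width for w in weights)
    return [sum(w_k * x[i + k] for k, w_k in enumerate(weights[i]))
            for i in range(n_out)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
# one (arbitrary) weight vector per output unit
weights = [[1.0, 0.0, -1.0],
           [0.5, 0.5, 0.5],
           [0.0, 1.0, 0.0]]
print(locally_connected(x, weights))   # [-2.0, 4.5, 4.0]

# x[0] lies only in the receptive field of unit 0, so only unit 0 changes
x2 = list(x)
x2[0] = 100.0
print(locally_connected(x2, weights))  # [97.0, 4.5, 4.0]
```

Changing an input affects only the units whose receptive field contains it. That is the property the paragraph above describes.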

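However, as the figure shows, stacking many such layers makes the filters increasingly global. Each unit in layer $m$ perceives only part of the input, while each unit in layer $m+1$ combines the outputs of several units in layer $m$ and therefore covers a larger region of the input (in the figure, all 5 input units). A hidden unit in layer $m+1$ can thus be seen as a non-linear encoding of a feature of width 5 in the input.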
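The growth from width 3 to width 5 can be checked by propagating receptive fields downward through the layers. This sketch assumes stride 1 and width-3 fields in every layer, as in the figure. `receptive_field` is a hypothetical helper.

```python
def receptive_field(unit, n_layers, width=3):
    """Input positions that can influence `unit` after `n_layers` stacked layers.

    Assumes each unit j sees units j .. j + width - 1 of the layer below.
    """
    positions = {unit}
    for _ in range(n_layers):
        positions = {p + k for p in positions for k in range(width)}
    return sorted(positions)


print(receptive_field(0, 1))  # layer m:   [0, 1, 2]        (width 3)
print(receptive_field(0, 2))  # layer m+1: [0, 1, 2, 3, 4]  (width 5)
```

In general, n stacked layers of width-w fields give a receptive field of width 1 + n(w - 1).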

3. Shared Weights

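In CNNs, each filter $h_{i}$ is replicated across the entire input layer. The replicated units share the same parameters (weight vector and bias) and together form a feature map.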

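In the figure above, the 3 hidden units belong to the same feature map. Weights of the same color are shared, that is, constrained to be identical.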

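Replicating filters in this way lets a feature be detected wherever it appears in the input image (the visible layer). Weight sharing also greatly reduces the number of free parameters that must be learned.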
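The two benefits above can be made concrete with a small sketch. It assumes a 1D input of length 100, receptive fields of width 3, stride 1, no padding and a single feature map. First it counts parameters for three connectivity schemes. Then it shows that, with shared weights, shifting the input by one position shifts the feature map by one position. `conv1d_shared` is a hypothetical helper.

```python
def conv1d_shared(x, w, b=0.0):
    """Every output unit reuses the same weights w and bias b (weight sharing)."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) + b
            for i in range(len(x) - k + 1)]


n, k = 100, 3
n_out = n - k + 1
fully_connected = n * n_out + n_out   # every output sees every input
local_unshared = k * n_out + n_out    # private weights and bias per output unit
local_shared = k + 1                  # one filter + one bias per feature map
print(fully_connected, local_unshared, local_shared)  # 9898 392 4

# shifting the input by one position shifts the feature map by one position
w = [1.0, -2.0, 1.0]
x = [0, 0, 1, 0, 0, 0, 0]
shifted = [0, 0, 0, 1, 0, 0, 0]
print(conv1d_shared(x, w))        # [1.0, -2.0, 1.0, 0.0, 0.0]
print(conv1d_shared(shifted, w))  # [0.0, 1.0, -2.0, 1.0, 0.0]
```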

4. Details and Notation

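A feature map is obtained by repeatedly applying a function across sub-regions of the entire image. In other words, the input image is convolved with a linear filter, a bias term is added, and a non-linear function is applied. If $h^{k}$ denotes the $k$-th feature map, whose filter is determined by the weights $W^{k}$ and the bias $b_{k}$, then $h^{k}$ is computed as follows (using tanh as the non-linearity):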

$h_{ij}^{k}=\tanh\left((W^{k}*x)_{ij}+b_{k}\right)$
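The formula can be sketched for a single input map and a single filter. This sketch assumes that $*$ means true 2D convolution (kernel flipped) evaluated only at "valid" positions, where the filter fits entirely inside the image. For learned filters, flipping only relabels the weights. The helper names are hypothetical.

```python
import math


def conv2d_valid(x, w):
    """True 2D convolution (kernel flipped), 'valid' positions only."""
    H, Wd = len(x), len(x[0])
    fh, fw = len(w), len(w[0])
    out = []
    for i in range(H - fh + 1):
        row = []
        for j in range(Wd - fw + 1):
            s = 0.0
            for u in range(fh):
                for v in range(fw):
                    # flipped kernel index turns correlation into convolution
                    s += w[fh - 1 - u][fw - 1 - v] * x[i + u][j + v]
            row.append(s)
        out.append(row)
    return out


def feature_map(x, w, b):
    """h_ij = tanh((w * x)_ij + b) for one input map and one filter."""
    return [[math.tanh(v + b) for v in row] for row in conv2d_valid(x, w)]


x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [[1, 0],
     [0, 0]]  # after flipping, the 1 sits at the bottom-right of the window
print(conv2d_valid(x, w))          # [[5.0, 6.0], [8.0, 9.0]]
print(feature_map(x, w, -5.0)[0])  # tanh(0), tanh(1)
```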

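To obtain a richer representation of the data, each hidden layer is usually composed of multiple feature maps, $\{h^{k}, k=0,\dots,K\}$. The weights $W$ of a hidden layer form a 4D tensor whose four dimensions are: destination feature map, source feature map, source vertical position and source horizontal position. The biases $b$ form a vector with one element per destination feature map. This can be depicted as follows: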

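In the figure above, $W_{ij}^{kl}$ denotes the weight connecting each pixel of the $k$-th feature map at layer $m$ with the pixel at coordinates $(i,j)$ of the $l$-th feature map at layer $m-1$.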
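With several source maps, output map $k$ sums the convolutions of every source map $l$ with its own filter $W^{kl}$ before the bias and tanh are applied. The sketch below assumes the (destination, source, row, column) layout described above, stores it as nested lists, and uses true ("valid") convolution. `conv_layer` is a hypothetical helper.

```python
import math


def conv_layer(x_maps, W, b):
    """h^k = tanh(sum_l W[k][l] * x^l + b[k]).

    x_maps: L source maps (layer m-1), each H x Wd
    W:      W[k][l] is the fh x fw filter from source map l to destination map k
    b:      b[k] is the bias of destination map k
    """
    K, L = len(W), len(W[0])
    assert len(x_maps) == L, "number of source maps must match W's 2nd dimension"
    fh, fw = len(W[0][0]), len(W[0][0][0])
    H, Wd = len(x_maps[0]), len(x_maps[0][0])
    out = []
    for k in range(K):
        h = []
        for i in range(H - fh + 1):
            row = []
            for j in range(Wd - fw + 1):
                s = b[k]
                for l in range(L):          # sum over source feature maps
                    for u in range(fh):
                        for v in range(fw):
                            # flipped index -> true convolution
                            s += W[k][l][fh - 1 - u][fw - 1 - v] * x_maps[l][i + u][j + v]
                row.append(math.tanh(s))
            h.append(row)
        out.append(h)
    return out


x_maps = [[[1, 2], [3, 4]],   # source map 0
          [[1, 1], [1, 1]]]   # source map 1
W = [[[[2.0]], [[-1.0]]],     # destination map 0: 2*x0 - x1
     [[[0.0]], [[1.0]]]]      # destination map 1: x1
b = [0.0, -1.0]
out = conv_layer(x_maps, W, b)
print(out[1])  # [[0.0, 0.0], [0.0, 0.0]]
```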

5. The Convolution Operator

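The convolution operator (ConvOp) is the workhorse for implementing a convolutional layer in Theano. The code in this post accesses it through theano.tensor.nnet.conv.conv2d (imported as conv.conv2d), which takes two inputs: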

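  • a 4D tensor holding a mini-batch of input images, with shape (mini-batch size, number of input feature maps, image height, image width);
  • a 4D tensor holding the weight matrix $W$, with shape (number of feature maps at layer $m$, number of feature maps at layer $m-1$, filter height, filter width).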
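The two shapes determine the shape of the result. This sketch assumes conv2d's default "valid" border mode, in which each spatial dimension shrinks by the filter size minus one. `conv2d_output_shape` is a hypothetical helper and not a Theano function.

```python
def conv2d_output_shape(input_shape, filter_shape):
    """Output shape of a 'valid' convolution.

    input_shape:  (batch, in_maps, height, width)
    filter_shape: (out_maps, in_maps, filter_h, filter_w)
    """
    batch, in_maps, h, w = input_shape
    out_maps, f_in_maps, fh, fw = filter_shape
    if in_maps != f_in_maps:
        raise ValueError("input has %d maps but filters expect %d" % (in_maps, f_in_maps))
    return (batch, out_maps, h - fh + 1, w - fw + 1)


# the two convolutional layers of the LeNet code in section 8
print(conv2d_output_shape((500, 1, 28, 28), (20, 1, 5, 5)))    # (500, 20, 24, 24)
print(conv2d_output_shape((500, 20, 12, 12), (50, 20, 5, 5)))  # (500, 50, 8, 8)
```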

Here we also introduce dimshuffle(*pattern), which is used in the code below:

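  For example, dimshuffle('x', 2, 'x', 0, 1) turns a 3D tensor into a 5D tensor. Dimensions 0 and 2 of the new tensor are new broadcastable dimensions of length 1 (written 'x'), and dimensions 1, 3 and 4 come from dimensions 2, 0 and 1 of the original tensor.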

  If the original tensor has shape (20, 30, 40), dimshuffle('x', 2, 'x', 0, 1) produces a tensor of shape (1, 40, 1, 20, 30).

  dimshuffle(0, 1) -> identity (same as the original)

  dimshuffle(1, 0) -> swaps dimensions 0 and 1 (a transpose)

  For more details, see the dimshuffle documentation.
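The shape rule in the examples above can be emulated in a few lines. This hypothetical helper only computes shapes, and it assumes every original dimension appears exactly once in the pattern. Theano's dimshuffle can also drop broadcastable dimensions, which this sketch does not handle.

```python
def dimshuffle_shape(shape, pattern):
    """Shape produced by dimshuffle(*pattern); 'x' inserts a length-1 dimension."""
    used = [p for p in pattern if p != 'x']
    assert sorted(used) == list(range(len(shape))), "each dimension must appear once"
    return tuple(1 if p == 'x' else shape[p] for p in pattern)


print(dimshuffle_shape((20, 30, 40), ('x', 2, 'x', 0, 1)))  # (1, 40, 1, 20, 30)
# how the bias vector of 2 elements is prepared for broadcasting in the code below
print(dimshuffle_shape((2,), ('x', 0, 'x', 'x')))           # (1, 2, 1, 1)
```

The second call shows why `b.dimshuffle('x', 0, 'x', 'x')` can be added to a (batch, maps, height, width) tensor: the length-1 dimensions broadcast across the batch and the spatial positions.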

The image used below: 3wolfmoon.jpg

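The code below convolves an input consisting of 3 feature maps (the R, G and B channels of the image) with 2 random 9x9 filters and plots the image before and after convolution: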

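# -*- coding: utf-8 -*-
"""
Created on Tue Apr 28 10:22:14 2015

@author: ZengJiulin
"""

import theano
from theano import tensor as T
from theano.tensor.nnet import conv
import pylab
from PIL import Image
import numpy

rng = numpy.random.RandomState(23455)

# instantiate 4D tensor for input
input = T.tensor4(name='input', dtype='float64')

# initialize shared variable for weights.
# 2 output feature maps
# 3 input feature maps (the RGB channels)
# filter size 9x9
w_shp = (2, 3, 9, 9)
w_bound = numpy.sqrt(3 * 9 * 9)
W = theano.shared(numpy.asarray(
            rng.uniform(
                low=-1.0 / w_bound,
                high=1.0 / w_bound,
                size=w_shp),
            dtype=input.dtype), name='W')

# initialize shared variable for bias (1D tensor) with random values
# IMPORTANT: biases are usually initialized to zero. However in this
# particular application, we simply apply the convolutional layer to
# an image without learning the parameters. We therefore initialize
# them to random values to "simulate" learning.
# there are 2 output feature maps, so the bias vector also has 2 elements
b_shp = (2,)
b = theano.shared(numpy.asarray(
            rng.uniform(low=-.5, high=.5, size=b_shp),
            dtype=input.dtype), name='b')

# build symbolic expression that computes the convolution of input with filters in W
conv_out = conv.conv2d(input, W)

# build symbolic expression to add bias and apply activation function, i.e. produce neural net layer output
# A few words on ``dimshuffle`` :
#   ``dimshuffle`` is a powerful tool in reshaping a tensor;
#   what it allows you to do is to shuffle dimension around
#   but also to insert new ones along which the tensor will be
#   broadcastable;
#   dimshuffle('x', 2, 'x', 0, 1)
#   This will work on 3d tensors with no broadcastable
#   dimensions. The first dimension will be broadcastable,
#   then we will have the third dimension of the input tensor as
#   the second of the resulting tensor, etc. If the tensor has
#   shape (20, 30, 40), the resulting tensor will have dimensions
#   (1, 40, 1, 20, 30). (AxBxC tensor is mapped to 1xCx1xAxB tensor)
#   More examples:
#    dimshuffle('x') -> make a 0d (scalar) into a 1d vector
#    dimshuffle(0, 1) -> identity
#    dimshuffle(1, 0) -> inverts the first and second dimensions
#    dimshuffle('x', 0) -> make a row out of a 1d vector (N to 1xN)
#    dimshuffle(0, 'x') -> make a column out of a 1d vector (N to Nx1)
#    dimshuffle(2, 0, 1) -> AxBxC to CxAxB
#    dimshuffle(0, 'x', 1) -> AxB to Ax1xB
#    dimshuffle(1, 'x', 0) -> AxB to Bx1xA

# add the bias to the convolution result, then apply a non-linearity (sigmoid here)
output = T.nnet.sigmoid(conv_out + b.dimshuffle('x', 0, 'x', 'x'))

# create theano function to compute filtered images
f = theano.function([input], output)

# open the example image (639x516 pixels); adjust the path as needed
img = Image.open('E:\\Python\\3wolfmoon.jpg')
# dimensions are (height, width, channel)
img = numpy.asarray(img, dtype='float64') / 256.

# put image in 4D tensor of shape (1, 3, height, width);
# take height and width from the array itself so the pixels are not scrambled
img_ = img.transpose(2, 0, 1).reshape(1, 3, img.shape[0], img.shape[1])
filtered_img = f(img_)

# plot original image and first and second components of output
pylab.subplot(1, 3, 1); pylab.axis('off'); pylab.imshow(img)
pylab.gray()
# recall that the convOp output (filtered image) is actually a "minibatch",
# of size 1 here, so we take index 0 in the first dimension:
pylab.subplot(1, 3, 2); pylab.axis('off'); pylab.imshow(filtered_img[0, 0, :, :])
pylab.subplot(1, 3, 3); pylab.axis('off'); pylab.imshow(filtered_img[0, 1, :, :])
pylab.show()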
42             dtype=input.dtype), name ='b')
43 
44 # build symbolic expression that computes the convolution of input with filters in w
45 conv_out = conv.conv2d(input, W)
46 
47 # build symbolic expression to add bias and apply activation function, i.e. produce neural net layer output
48 # A few words on ``dimshuffle`` :
49 #   ``dimshuffle`` is a powerful tool in reshaping a tensor;
50 #   what it allows you to do is to shuffle dimension around
51 #   but also to insert new ones along which the tensor will be
52 #   broadcastable;
53 #   dimshuffle('x', 2, 'x', 0, 1)
54 #   This will work on 3d tensors with no broadcastable
55 #   dimensions. The first dimension will be broadcastable,
56 #   then we will have the third dimension of the input tensor as
57 #   the second of the resulting tensor, etc. If the tensor has
58 #   shape (20, 30, 40), the resulting tensor will have dimensions
59 #   (1, 40, 1, 20, 30). (AxBxC tensor is mapped to 1xCx1xAxB tensor)
60 #   More examples:
61 #    dimshuffle('x') -> make a 0d (scalar) into a 1d vector
62 #    dimshuffle(0, 1) -> identity
63 #    dimshuffle(1, 0) -> inverts the first and second dimensions
64 #    dimshuffle('x', 0) -> make a row out of a 1d vector (N to 1xN)
65 #    dimshuffle(0, 'x') -> make a column out of a 1d vector (N to Nx1)
66 #    dimshuffle(2, 0, 1) -> AxBxC to CxAxB
67 #    dimshuffle(0, 'x', 1) -> AxB to Ax1xB
68 #    dimshuffle(1, 'x', 0) -> AxB to Bx1xA
69 
70 # 卷积后的结果加上偏置,然后进行一个非线性函数计算,这里采用的是sigmoid函数
71 output = T.nnet.sigmoid(conv_out + b.dimshuffle('x', 0, 'x', 'x'))
72 
73 # create theano function to compute filtered images
74 f = theano.function([input], output)
75 
76 
77 
78 # open random image of dimensions 639x516
79 img_file = open('E:\\Python\\3wolfmoon.jpg','rb')
80 img = Image.open(img_file)
81 # dimensions are (height, width, channel)
82 img = numpy.asarray(img, dtype='float64') / 256.
83 
84 # put image in 4D tensor of shape (1, 3, height, width)
85 cc = img.transpose(2, 0, 1)
86 img_ = img.transpose(2, 0, 1).reshape(1, 3, 639, 516)
87 filtered_img = f(img_)
88 
89 # plot original image and first and second components of output
90 pylab.subplot(1, 3, 1); pylab.axis('off'); pylab.imshow(img)
91 pylab.gray();
92 # recall that the convOp output (filtered image) is actually a "minibatch",
93 # of size 1 here, so we take index 0 in the first dimension:
94 pylab.subplot(1, 3, 2); pylab.axis('off'); pylab.imshow(filtered_img[0, 0, :, :])
95 pylab.subplot(1, 3, 3); pylab.axis('off'); pylab.imshow(filtered_img[0, 1, :, :])
96 pylab.show()

Notice that a randomly initialized filter behaves very much like an edge detector.

6. Max Pooling

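Max pooling is a form of non-linear down-sampling. It partitions the input image into non-overlapping rectangles and outputs the maximum value of each sub-region.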
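A minimal sketch of the operation, assuming square p x p windows and simply dropping incomplete windows at the border. `max_pool` is a hypothetical helper.

```python
def max_pool(img, p):
    """Non-overlapping p x p max pooling; incomplete border windows are dropped."""
    rows, cols = len(img) // p, len(img[0]) // p
    return [[max(img[r * p + i][c * p + j] for i in range(p) for j in range(p))
             for c in range(cols)]
            for r in range(rows)]


img = [[1, 3, 2, 0],
       [4, 2, 1, 1],
       [0, 0, 5, 6],
       [1, 2, 7, 8]]
print(max_pool(img, 2))  # [[4, 2], [2, 8]]
```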

Max pooling is useful for two reasons:

  • It eliminates non-maximal values, which reduces the computation for the upper layers.
  • (I have not fully understood this point yet; the following is the original tutorial's wording.) It provides a form of translation invariance. Imagine cascading a max-pooling layer with a convolutional layer. There are 8 directions in which one can translate the input image by a single pixel. If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8. Since it provides additional robustness to position, max-pooling is a “smart” way of reducing the dimensionality of intermediate representations.
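The "3 out of 8" claim can be checked with a small counting sketch. It assumes a single isolated positive activation on a zero background. In that case the pooled output stays the same exactly when a one-pixel shift keeps the activation inside the same pooling window. `unchanged_shifts` is a hypothetical helper.

```python
# the 8 possible one-pixel translations
NEIGHBOURS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def unchanged_shifts(p, r, c):
    """How many of the 8 one-pixel shifts keep an isolated activation at (r, c)
    inside the same p x p pooling window, i.e. leave the pooled output unchanged."""
    window = (r // p, c // p)
    return sum(((r + dr) // p, (c + dc) // p) == window for dr, dc in NEIGHBOURS)


# 2x2 pooling: every position in the window is a corner, so always 3 of 8
print([unchanged_shifts(2, r, c) for r in range(2) for c in range(2)])  # [3, 3, 3, 3]
# 3x3 pooling: the count depends on where the activation sits in the window
print(unchanged_shifts(3, 0, 0), unchanged_shifts(3, 0, 1), unchanged_shifts(3, 1, 1))  # 3 5 8
```

For 2x2 windows this matches the tutorial's 3/8 at every position. For 3x3 windows, 5/8 is the count at the middle of a window edge; a corner gives 3/8 and the centre gives 8/8.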

In Theano, max pooling is implemented by theano.tensor.signal.downsample.max_pool_2d, for example:

# -*- coding: utf-8 -*-
"""
Created on Tue Apr 28 15:17:23 2015

@author: ZengJiulin
"""
from __future__ import print_function

import theano
from theano import tensor as T
import numpy
from theano.tensor.signal import downsample

input = T.dtensor4('input')
maxpool_shape = (2, 2)
pool_out = downsample.max_pool_2d(input, maxpool_shape, ignore_border=True)
f = theano.function([input], pool_out)

invals = numpy.random.RandomState(1).rand(3, 2, 5, 5)
print('With ignore_border set to True:')
print('invals[0, 0, :, :] =\n', invals[0, 0, :, :])
print('output[0, 0, :, :] =\n', f(invals)[0, 0, :, :])

pool_out = downsample.max_pool_2d(input, maxpool_shape, ignore_border=False)
f = theano.function([input], pool_out)
print('With ignore_border set to False:')
print('invals[1, 0, :, :] =\n ', invals[1, 0, :, :])
print('output[1, 0, :, :] =\n ', f(invals)[1, 0, :, :])

Note the difference between ignore_border=True and ignore_border=False. With ignore_border=True the last row and column of the 5x5 input are dropped and the output is 2x2. With ignore_border=False the partial windows at the border are kept and the output is 3x3:

>>> runfile('E:/Python/downsample.py', wdir=r'E:/Python')
Using gpu device 0: GeForce GT 720M
With ignore_border set to True:
invals[0, 0, :, :] =
[[  4.17022005e-01   7.20324493e-01   1.14374817e-04   3.02332573e-01
    1.46755891e-01]
 [  9.23385948e-02   1.86260211e-01   3.45560727e-01   3.96767474e-01
    5.38816734e-01]
 [  4.19194514e-01   6.85219500e-01   2.04452250e-01   8.78117436e-01
    2.73875932e-02]
 [  6.70467510e-01   4.17304802e-01   5.58689828e-01   1.40386939e-01
    1.98101489e-01]
 [  8.00744569e-01   9.68261576e-01   3.13424178e-01   6.92322616e-01
    8.76389152e-01]]
output[0, 0, :, :] =
[[ 0.72032449  0.39676747]
 [ 0.6852195   0.87811744]]
With ignore_border set to False:
invals[1, 0, :, :] =
  [[ 0.01936696  0.67883553  0.21162812  0.26554666  0.49157316]
 [ 0.05336255  0.57411761  0.14672857  0.58930554  0.69975836]
 [ 0.10233443  0.41405599  0.69440016  0.41417927  0.04995346]
 [ 0.53589641  0.66379465  0.51488911  0.94459476  0.58655504]
 [ 0.90340192  0.1374747   0.13927635  0.80739129  0.39767684]]
output[1, 0, :, :] =
  [[ 0.67883553  0.58930554  0.69975836]
 [ 0.66379465  0.94459476  0.58655504]
 [ 0.90340192  0.80739129  0.39767684]]
>>> 
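The border behaviour can be reproduced in plain Python with the invals[1, 0, :, :] values copied from the console output above. This is a hypothetical stand-in written only to mirror the printed results, not Theano's implementation. With ignore_border=False, the number of windows per dimension is the ceiling of size / p instead of the floor.

```python
def max_pool_2d(img, p, ignore_border=True):
    """p x p max pooling; with ignore_border=False, partial windows at the
    bottom and right edges are kept (ceiling instead of floor division)."""
    h, w = len(img), len(img[0])
    if ignore_border:
        rows, cols = h // p, w // p
    else:
        rows, cols = -(-h // p), -(-w // p)  # ceiling division
    return [[max(img[i][j]
                 for i in range(r * p, min((r + 1) * p, h))
                 for j in range(c * p, min((c + 1) * p, w)))
             for c in range(cols)]
            for r in range(rows)]


# invals[1, 0, :, :] as printed above
a = [[0.01936696, 0.67883553, 0.21162812, 0.26554666, 0.49157316],
     [0.05336255, 0.57411761, 0.14672857, 0.58930554, 0.69975836],
     [0.10233443, 0.41405599, 0.69440016, 0.41417927, 0.04995346],
     [0.53589641, 0.66379465, 0.51488911, 0.94459476, 0.58655504],
     [0.90340192, 0.1374747, 0.13927635, 0.80739129, 0.39767684]]

for row in max_pool_2d(a, 2, ignore_border=False):  # 3x3, same as output[1, 0, :, :]
    print(row)
print(max_pool_2d(a, 2, ignore_border=True))        # 2x2: last row and column dropped
```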

 

7. The Full LeNet Model

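Sparse connectivity, convolutional layers and max pooling are at the heart of LeNet models, although the other details can vary greatly. The figure below gives a graphical depiction of a LeNet model: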

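The lower layers alternate between convolution and max-pooling (down-sampling) layers. The upper layers are fully connected, like a traditional MLP.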

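From an implementation point of view, the 4D output tensor of the last convolution/pooling layer is flattened into a 2D matrix (one row per example) that the MLP can process.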
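The shapes flowing through the network in the full code of section 8 can be traced by hand. This sketch assumes that code's defaults: batch size 500, 28x28 MNIST images, 20 and 50 kernels of size 5x5, "valid" convolution and 2x2 pooling with ignore_border=True. `lenet_shapes` is a hypothetical helper.

```python
def lenet_shapes(batch_size=500, image=(28, 28), nkerns=(20, 50), filt=5, pool=2):
    """Trace tensor shapes through the conv+pool layers and the flatten step."""
    h, w = image
    shapes = [(batch_size, 1, h, w)]
    for k in nkerns:
        h, w = h - filt + 1, w - filt + 1  # valid convolution
        h, w = h // pool, w // pool        # non-overlapping max pooling
        shapes.append((batch_size, k, h, w))
    flat = (batch_size, nkerns[-1] * h * w)  # flatten(2): keep the batch axis
    return shapes, flat


shapes, flat = lenet_shapes()
for s in shapes:
    print(s)  # (500, 1, 28, 28), (500, 20, 12, 12), (500, 50, 4, 4)
print(flat)   # (500, 800): the input matrix of the fully connected layer
```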

8. Full Code

# -*- coding: utf-8 -*-
"""
Created on Sat Apr 25 14:20:02 2015

@author: ZengJiulin
"""
from __future__ import print_function

"""This tutorial introduces the LeNet5 neural network architecture
using Theano.  LeNet5 is a convolutional neural network, good for
classifying images. This tutorial shows how to build the architecture,
and comes with all the hyper-parameters you need to reproduce the
paper's MNIST results.


This implementation simplifies the model in the following ways:

 - LeNetConvPool doesn't implement location-specific gain and bias parameters
 - LeNetConvPool doesn't implement pooling by average, it implements pooling
   by max.
 - Digit classification is implemented with a logistic regression rather than
   an RBF network
 - LeNet5 was not fully-connected convolutions at second layer

References:
 - Y. LeCun, L. Bottou, Y. Bengio and P. Haffner:
   Gradient-Based Learning Applied to Document
   Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998.
   http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf

"""
import os
import sys
import time

import numpy

import theano
import theano.tensor as T
from theano.tensor.signal import downsample
from theano.tensor.nnet import conv

from logistic_sgd import LogisticRegression, load_data
from mlp import HiddenLayer


class LeNetConvPoolLayer(object):
    """Pool Layer of a convolutional network """

    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
        """
        Allocate a LeNetConvPoolLayer with shared variable internal parameters.

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dtensor4
        :param input: symbolic image tensor, of shape image_shape

        :type filter_shape: tuple or list of length 4
        :param filter_shape: (number of filters, num input feature maps,
                              filter height, filter width)

        :type image_shape: tuple or list of length 4
        :param image_shape: (batch size, num input feature maps,
                             image height, image width)

        :type poolsize: tuple or list of length 2
        :param poolsize: the downsampling (pooling) factor (#rows, #cols)
        """

        assert image_shape[1] == filter_shape[1]
        self.input = input

        # there are "num input feature maps * filter height * filter width"
        # inputs to each hidden unit
        fan_in = numpy.prod(filter_shape[1:])
        # each unit in the lower layer receives a gradient from:
        # "num output feature maps * filter height * filter width" /
        #   pooling size
        fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) /
                   numpy.prod(poolsize))
        # initialize weights with random weights
        W_bound = numpy.sqrt(6. / (fan_in + fan_out))
        # the convolution kernels are exactly this weight tensor
        self.W = theano.shared(
            numpy.asarray(
                rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
                dtype=theano.config.floatX
            ),
            borrow=True
        )

        # the bias is a 1D tensor -- one bias per output feature map
        b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, borrow=True)

        # convolve input feature maps with filters
        conv_out = conv.conv2d(
            input=input,
            filters=self.W,
            filter_shape=filter_shape,
            image_shape=image_shape
        )

        # downsample each feature map individually, using maxpooling
        pooled_out = downsample.max_pool_2d(
            input=conv_out,
            ds=poolsize,
            ignore_border=True
        )

        # add the bias term. Since the bias is a vector (1D array), we first
        # reshape it to a tensor of shape (1, n_filters, 1, 1). Each bias will
        # thus be broadcasted across mini-batches and feature map
        # width & height
        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))

        # store parameters of this layer
        self.params = [self.W, self.b]


def evaluate_lenet5(learning_rate=0.1, n_epochs=200,
                    dataset='mnist.pkl.gz',
                    nkerns=[20, 50], batch_size=500):
    """ Demonstrates lenet on MNIST dataset

    :type learning_rate: float
    :param learning_rate: learning rate used (factor for the stochastic
                          gradient)

    :type n_epochs: int
    :param n_epochs: maximal number of epochs to run the optimizer

    :type dataset: string
    :param dataset: path to the dataset used for training /testing (MNIST here)

    :type nkerns: list of ints
    :param nkerns: number of kernels on each layer (two layers: 20 kernels
        in the first layer, 50 kernels in the second)
    """

    rng = numpy.random.RandomState(23455)

    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # compute number of minibatches for training, validation and testing
    n_train_batches = train_set_x.get_value(borrow=True).shape[0]
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
    n_test_batches = test_set_x.get_value(borrow=True).shape[0]
    # integer division keeps the batch counts ints under Python 3 as well
    n_train_batches //= batch_size
    n_valid_batches //= batch_size
    n_test_batches //= batch_size

    # allocate symbolic variables for the data
    index = T.lscalar()  # index to a [mini]batch

    # start-snippet-1
    x = T.matrix('x')   # the data is presented as rasterized images
    y = T.ivector('y')  # the labels are presented as 1D vector of
                        # [int] labels

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print('... building the model')

    # Reshape matrix of rasterized images of shape (batch_size, 28 * 28)
    # to a 4D tensor, compatible with our LeNetConvPoolLayer
    # (28, 28) is the size of MNIST images.
    # each image has a single input feature map (grayscale)
    layer0_input = x.reshape((batch_size, 1, 28, 28))

    # Construct the first convolutional pooling layer:
    # filtering reduces the image size to (28-5+1 , 28-5+1) = (24, 24)
    # maxpooling reduces this further to (24/2, 24/2) = (12, 12)
    # 4D output tensor is thus of shape (batch_size, nkerns[0], 12, 12)
    layer0 = LeNetConvPoolLayer(
        rng,
        input=layer0_input,
        image_shape=(batch_size, 1, 28, 28),
        filter_shape=(nkerns[0], 1, 5, 5),
        poolsize=(2, 2)
    )

    # Construct the second convolutional pooling layer
    # filtering reduces the image size to (12-5+1, 12-5+1) = (8, 8)
    # maxpooling reduces this further to (8/2, 8/2) = (4, 4)
    # 4D output tensor is thus of shape (batch_size, nkerns[1], 4, 4)
    # layer0 has nkerns[0] kernels, so it outputs nkerns[0] feature maps;
    # the input of layer1 is the output of layer0
    layer1 = LeNetConvPoolLayer(
        rng,
        input=layer0.output,
        image_shape=(batch_size, nkerns[0], 12, 12),
        filter_shape=(nkerns[1], nkerns[0], 5, 5),
        poolsize=(2, 2)
    )

    # the HiddenLayer being fully-connected, it operates on 2D matrices of
    # shape (batch_size, num_pixels) (i.e matrix of rasterized images).
    # This will generate a matrix of shape (batch_size, nkerns[1] * 4 * 4),
    # or (500, 50 * 4 * 4) = (500, 800) with the default values.
    layer2_input = layer1.output.flatten(2)

    # construct a fully-connected sigmoidal layer
    layer2 = HiddenLayer(
        rng,
        input=layer2_input,
        n_in=nkerns[1] * 4 * 4,
        n_out=500,
        activation=T.tanh
    )

    # classify the values of the fully-connected sigmoidal layer
    layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)

    # the cost we minimize during training is the NLL of the model
    cost = layer3.negative_log_likelihood(y)

    # create a function to compute the mistakes that are made by the model
    test_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    # create a list of all model parameters to be fit by gradient descent
    params = layer3.params + layer2.params + layer1.params + layer0.params

    # create a list of gradients for all model parameters
    grads = T.grad(cost, params)

    # train_model is a function that updates the model parameters by
    # SGD. Since this model has many parameters, it would be tedious to
    # manually create an update rule for each model parameter. We thus
    # create the updates list by automatically looping over all
    # (params[i], grads[i]) pairs.
    updates = [
        (param_i, param_i - learning_rate * grad_i)
        for param_i, grad_i in zip(params, grads)
    ]

    train_model = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
    # end-snippet-1

    ###############
    # TRAIN MODEL #
    ###############
    print('... training')
    # early-stopping parameters
    patience = 10000  # look at this many examples regardless
    patience_increase = 2  # wait this much longer when a new best is
                           # found
    improvement_threshold = 0.995  # a relative improvement of this much is
                                   # considered significant
    validation_frequency = min(n_train_batches, patience // 2)
                                  # go through this many
                                  # minibatches before checking the network
                                  # on the validation set; in this case we
                                  # check every epoch

    best_validation_loss = numpy.inf
    best_iter = 0
    test_score = 0.
    # wall-clock timing (time.clock was removed in Python 3.8)
    start_time = time.time()

    epoch = 0
    done_looping = False

    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in range(n_train_batches):

            iter = (epoch - 1) * n_train_batches + minibatch_index

            if iter % 100 == 0:
                print('training @ iter = ', iter)
            cost_ij = train_model(minibatch_index)

            if (iter + 1) % validation_frequency == 0:

                # compute zero-one loss on validation set
                validation_losses = [validate_model(i) for i
                                     in range(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)
                print('epoch %i, minibatch %i/%i, validation error %f %%' %
                      (epoch, minibatch_index + 1, n_train_batches,
                       this_validation_loss * 100.))

                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:

                    # improve patience if loss improvement is good enough
                    if this_validation_loss < best_validation_loss *  \
                       improvement_threshold:
                        patience = max(patience, iter * patience_increase)

                    # save best validation score and iteration number
                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    # test it on the test set
                    test_losses = [
                        test_model(i)
                        for i in range(n_test_batches)
                    ]
                    test_score = numpy.mean(test_losses)
                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            if patience <= iter:
                done_looping = True
                break

    end_time = time.time()
    print('Optimization complete.')
    print('Best validation score of %f %% obtained at iteration %i, '
          'with test performance %f %%' %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))
    print('The code for file ' +
          os.path.split(__file__)[1] +
          ' ran for %.2fm' % ((end_time - start_time) / 60.), file=sys.stderr)


def experiment(state, channel):
    evaluate_lenet5(state.learning_rate, dataset=state.dataset)


if __name__ == '__main__':
    evaluate_lenet5()
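LeNetConvPoolLayer draws its weights uniformly from $[-\sqrt{6/(\text{fan\_in}+\text{fan\_out})}, +\sqrt{6/(\text{fan\_in}+\text{fan\_out})}]$, where fan_out is divided by the pooling area. The sketch below simply repeats that arithmetic for the two layers of the default configuration. `w_bound` is a hypothetical helper. It uses float division, whereas the integer division in the original code truncates under Python 2 (312 instead of 312.5 for the second layer's fan_out). That changes the bound only slightly.

```python
import math


def w_bound(filter_shape, poolsize=(2, 2)):
    """sqrt(6 / (fan_in + fan_out)), with fan_in and fan_out as in LeNetConvPoolLayer."""
    n_filters, n_in_maps, fh, fw = filter_shape
    fan_in = n_in_maps * fh * fw
    fan_out = n_filters * fh * fw / float(poolsize[0] * poolsize[1])
    return math.sqrt(6.0 / (fan_in + fan_out))


print(w_bound((20, 1, 5, 5)))   # layer0: fan_in 25,  fan_out 125    -> about 0.2
print(w_bound((50, 20, 5, 5)))  # layer1: fan_in 500, fan_out 312.5  -> about 0.086
```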

On a GeForce GT 720M GPU, the full code ran for a little over 170 minutes.

9. Tips and Tricks

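  • Number of filters: computing the activations of a single convolutional filter is much more expensive than in a traditional MLP! Because feature maps shrink with depth, layers near the input usually have fewer filters, while layers higher up can have many more. To preserve information about the input, the total number of activations (number of feature maps times number of pixel positions) should not decrease from one layer to the next.
  • Filter shape: the best filter shape usually depends on the dataset. On MNIST, 5x5 filters in the first layer tend to work best, while natural-image datasets usually do better with larger first-layer filters of 12x12 or 15x15.
  • Max-pooling shape: the typical value is 2x2. Very large inputs may justify 4x4 pooling in the lower layers, but keep in mind that this reduces the dimension of the signal by a factor of 16 and may throw away too much information.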
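The tips above reason in terms of activation counts and pooling factors. This small sketch computes both, using the layer sizes of the code in section 8 (batch axis omitted) plus a hypothetical 64x64 input for the 4x4 pooling case. Both helper names are hypothetical.

```python
def activations(n_maps, h, w):
    """Total activations of a layer = number of feature maps x pixel positions."""
    return n_maps * h * w


def pooled_size(h, w, p):
    """Spatial size after non-overlapping p x p pooling (border ignored)."""
    return h // p, w // p


print(activations(1, 28, 28))   # input image:               784
print(activations(20, 24, 24))  # after the first conv:       11520
print(activations(20, 12, 12))  # after the first 2x2 pool:   2880

# 4x4 pooling keeps one value out of every 16
h, w = pooled_size(64, 64, 4)
print(activations(1, 64, 64), activations(1, h, w))  # 4096 256
```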

Source: http://deeplearning.net/tutorial/lenet.html#lenet

posted @ 2015-04-28 01:25  90Zeng