Convolutional Neural Networks (LeNet)
conv2d is the Theano function for computing convolutions (theano.tensor.nnet.conv.conv2d($input$, $W$)), where $W$ is the convolution kernel. $W$ must be a 4D tensor (T.tensor4), and $input$ must also be a 4D tensor.
$input \in (batches, feature, I_h, I_w)$: the dimensions are the batch size, the number of feature maps, the image height, and the image width.
$W \in (filters, feature, f_h, f_w)$: the dimensions are the number of filters, the number of feature maps, the filter height, and the filter width.
$W_{shape[1]}$ must equal $input_{shape[1]}$. $W_{shape[1]} = 1$ means the filter operates in 2D space, while $W_{shape[1]} > 1$ means it is a filter in 3D space. For example, $W_{shape[1]} = 3$ gives a filter over the 3 channels of an image, and the convolution runs across all 3 channels.
\begin{equation} input \in (batches, feature, I_h, I_w) \\ W \in (filters, feature, f_h, f_w) \end{equation}
\begin{equation} output = input \otimes W \\ output \in (batches, filters , I_h - f_h + 1, I_w - f_w + 1) \end{equation}
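The output shape rule above can be checked with a small NumPy sketch. This is not Theano's implementation: `conv2d_valid` is a hypothetical helper written only for illustration, and it assumes a "valid" convolution (no padding, stride 1) with spatially flipped kernels. The output shape is the same whether or not the kernels are flipped.

```python
import numpy as np

def conv2d_valid(inputs, W):
    """Naive 'valid' convolution of a 4D input with a 4D filter bank."""
    batches, feature, I_h, I_w = inputs.shape
    filters, w_feature, f_h, f_w = W.shape
    assert feature == w_feature, "W.shape[1] must equal input.shape[1]"
    out_h, out_w = I_h - f_h + 1, I_w - f_w + 1
    # Flip each kernel spatially so this is a convolution, not a correlation.
    W_flipped = W[:, :, ::-1, ::-1]
    output = np.zeros((batches, filters, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Window of shape (batches, feature, f_h, f_w)
            patch = inputs[:, :, i:i + f_h, j:j + f_w]
            # Sum over feature maps and the filter window, for every filter.
            output[:, :, i, j] = np.tensordot(
                patch, W_flipped, axes=([1, 2, 3], [1, 2, 3]))
    return output

rng = np.random.RandomState(0)
x = rng.rand(2, 3, 28, 28)   # (batches, feature, I_h, I_w)
W = rng.rand(20, 3, 5, 5)    # (filters, feature, f_h, f_w)
out = conv2d_valid(x, W)
print(out.shape)  # (2, 20, 24, 24)
```

Each output position sums over all input feature maps, which is why the feature dimension disappears and the filter count takes its place.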
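Pooling operates in 2D space. As the figure above shows, the features are partitioned by spatial position into larger blocks, and one feature is computed within each block. Max pooling takes the maximum over all positions in the block as the feature, while average pooling takes the mean of the features in the region.
1. Without pooling, the number of hidden-layer nodes produced by the convolution is a multiple of the number of filters. Take the $input$ and $W$ above as an example: each sample in $input$ has $feature \times I_h \times I_w$ input nodes, and after convolution with $W$ the $output$ has $filters \times (I_h - f_h + 1) \times (I_w - f_w + 1)$ nodes. With pooling, the number of $output$ nodes becomes $filters \times \frac{I_h - f_h + 1}{p_h} \times \frac{I_w - f_w + 1}{p_w}$, where $p_h, p_w$ are the height and width of each pooling region. Pooling therefore reduces the number of hidden-layer nodes and lowers the computational cost.
In Theano, pooling is computed by $\text{theano.tensor.signal.downsample.max_pool_2d}$. It takes an $N$-dimensional input tensor ($N \geq 2$) and pools its last two dimensions over blocks of size $(p_h, p_w)$.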
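As a rough illustration of non-overlapping max pooling, here is a NumPy sketch. `max_pool_2d_sketch` is a hypothetical name, not the Theano function, and the sketch assumes that incomplete blocks at the border are simply dropped.

```python
import numpy as np

def max_pool_2d_sketch(x, pool_size):
    """Non-overlapping max pooling over the last two dimensions of x."""
    p_h, p_w = pool_size
    h, w = x.shape[-2:]
    # Assumption: drop border rows/columns that do not fill a complete block.
    x = x[..., :h - h % p_h, :w - w % p_w]
    lead = x.shape[:-2]
    # Split each spatial axis into (number of blocks, block size).
    blocks = x.reshape(lead + (h // p_h, p_h, w // p_w, p_w))
    # Take the maximum inside every (p_h, p_w) block.
    return blocks.max(axis=(-3, -1))

x = np.arange(16).reshape(1, 1, 4, 4)
pooled = max_pool_2d_sketch(x, (2, 2))
print(pooled.shape)  # (1, 1, 2, 2)

# Node counts for 20 filters on a 24x24 convolution output, without and with 2x2 pooling
nodes_without_pooling = 20 * 24 * 24
nodes_with_pooling = 20 * (24 // 2) * (24 // 2)
print(nodes_without_pooling, nodes_with_pooling)  # 11520 2880
```

With a $2 \times 2$ pooling region the node count drops by a factor of four, which matches the $p_h \times p_w$ reduction described above.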
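The Convolutional Neural Network (LeNet) example in the Deep Learning tutorial recognizes characters in the MNIST dataset (10 classes, the Arabic digits). Each character is a $28\times28$-pixel input; 50000 samples are used for training, 10000 for validation, and another 10000 for testing. MNIST can be downloaded here. The model is optimized with mini-batch SGD.
Input layer: the raw images of each mini-batch, $image\_shape = (batch\_size, 1, 28, 28)$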
layer0_input = x.reshape((batch_size, 1, 28, 28))
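Convolutional layer 1: for each input mini-batch, the output is the result of convolution followed by pooling. The first layer has $nkerns[0]=20$ filters, each of size $f_h = 5, f_w = 5$.
Convolution with $filter\_shape=(nkerns[0],1,5,5)$ changes the image size to $(I_h - f_h + 1, I_w - f_w + 1)$: $(28, 28) \to (24, 24)$
Pooling then gives $(24, 24) \to (24/2, 24/2) = (12, 12)$
The number of feature maps becomes the number of filters, so $output\_shape=(batch\_size, nkerns[0], 12, 12)$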
layer0 = LeNetConvPoolLayer(rng, input=layer0_input, image_shape=(batch_size,1,28,28), filter_shape=(nkerns[0], 1, 5, 5), poolsize=(2,2))
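Convolutional layer 2: its input is the output of convolutional layer 1, so $input\_shape=(batch\_size, nkerns[0], 12, 12)$
Convolution with $filter\_shape=(nkerns[1],nkerns[0],5,5)$ changes the image size to $(I_h - f_h + 1, I_w - f_w + 1)$: $(12, 12) \to (8, 8)$
Pooling then gives $(8, 8) \to (8/2, 8/2) = (4, 4)$
The number of feature maps becomes the number of filters, so $output\_shape=(batch\_size, nkerns[1], 4, 4)$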
layer1 = LeNetConvPoolLayer(rng, input=layer0.output, image_shape=(batch_size, nkerns[0], 12, 12), filter_shape=(nkerns[1], nkerns[0], 5,5), poolsize=(2,2))
Layer2_input = layer1.output.flatten(2)
# construct a fully-connected sigmoidal layer
layer2 = HiddenLayer(rng, input=Layer2_input,
                     n_in=nkerns[1]*4*4, n_out=500,
                     activation=T.tanh)
# classify the values of the fully-connected sigmoidal layer
layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)
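The shape bookkeeping for the two convolution and pooling layers, and the resulting `n_in` of the fully-connected layer, can be traced in plain Python. `conv_pool_shape` is a hypothetical helper introduced only for this check; `batch_size=500` and `nkerns=[20, 50]` are the default arguments of `evaluate_lenet5`.

```python
def conv_pool_shape(image_shape, filter_shape, poolsize):
    """Output shape of a 'valid' convolution followed by non-overlapping pooling."""
    batch_size, in_maps, I_h, I_w = image_shape
    n_filters, f_maps, f_h, f_w = filter_shape
    assert in_maps == f_maps, "filter_shape[1] must equal image_shape[1]"
    conv_h, conv_w = I_h - f_h + 1, I_w - f_w + 1
    return (batch_size, n_filters, conv_h // poolsize[0], conv_w // poolsize[1])

batch_size, nkerns = 500, [20, 50]
layer0_shape = conv_pool_shape((batch_size, 1, 28, 28), (nkerns[0], 1, 5, 5), (2, 2))
layer1_shape = conv_pool_shape(layer0_shape, (nkerns[1], nkerns[0], 5, 5), (2, 2))
# flatten(2) keeps the batch axis and merges the remaining axes into one
n_in = layer1_shape[1] * layer1_shape[2] * layer1_shape[3]

print(layer0_shape)  # (500, 20, 12, 12)
print(layer1_shape)  # (500, 50, 4, 4)
print(n_in)          # 800
```

The value 800 is the `nkerns[1]*4*4` passed as `n_in` to `HiddenLayer`.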
(1) Imports
import os
import sys
import time
import theano
import theano.tensor as T
import numpy as np
from theano.tensor.nnet import conv
from theano.tensor.signal import downsample
from LogistRegression import LogisticRegression, load_data
from mlp import HiddenLayer
(2) Definition of LeNetConvPoolLayer
fan_in = np.prod(filter_shape[1:])
W_values = np.asarray(rng.uniform(
    low=-np.sqrt(3./fan_in),
    high=np.sqrt(3./fan_in),
    size=filter_shape), dtype=theano.config.floatX)
self.W = theano.shared(value=W_values, name='W')

# each hidden unit receives
# "num input feature maps * filter height * filter width" inputs
fan_in = np.prod(filter_shape[1:])
# each unit in the lower layer receives a gradient from:
# "num output feature maps * filter height * filter width" / pooling size
fan_out = (filter_shape[0] * np.prod(filter_shape[2:]) / np.prod(poolsize))
# initialize weights with random weights
W_bound = np.sqrt(6. / (fan_in + fan_out))
W_values = np.asarray(rng.uniform(
    low=-W_bound,
    high=W_bound,
    size=filter_shape), dtype=theano.config.floatX)
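To make the second initialization concrete, the following NumPy sketch evaluates `fan_in`, `fan_out` and `W_bound` for the first layer. It assumes `filter_shape=(20, 1, 5, 5)` and `poolsize=(2, 2)`, the values used for `layer0`, and it leaves out the Theano shared variable.

```python
import numpy as np

filter_shape = (20, 1, 5, 5)   # (filters, feature maps, f_h, f_w), as for layer0
poolsize = (2, 2)

# Inputs per hidden unit: 1 * 5 * 5 = 25
fan_in = np.prod(filter_shape[1:])
# Gradient sources per lower-layer unit: 20 * 5 * 5 / (2 * 2) = 125
fan_out = filter_shape[0] * np.prod(filter_shape[2:]) / np.prod(poolsize)
# sqrt(6 / (25 + 125)), approximately 0.2
W_bound = np.sqrt(6.0 / (fan_in + fan_out))

rng = np.random.RandomState(23455)
W_values = rng.uniform(low=-W_bound, high=W_bound, size=filter_shape)
print(fan_in, fan_out, W_bound)
print(W_values.shape)  # (20, 1, 5, 5)
```

For comparison, the first variant uses the bound `np.sqrt(3./fan_in)`, which is about 0.35 for the same filter shape, so the second variant draws the initial weights from a narrower interval here.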
(3) Definition of the LeNet network structure
n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_size
n_test_batches = test_set_x.get_value(borrow=True).shape[0] / batch_size
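A quick arithmetic check of the batch counts, assuming the 50000/10000/10000 split described earlier and the default `batch_size=500`. The sketch uses `//` so that the division stays an integer division in Python 3 as well. `minibatch_slice` is a hypothetical helper that mirrors the `index*batch_size:(index+1)*batch_size` slices used in the `givens` dictionaries.

```python
batch_size = 500
n_train, n_valid, n_test = 50000, 10000, 10000

n_train_batches = n_train // batch_size
n_valid_batches = n_valid // batch_size
n_test_batches = n_test // batch_size
print(n_train_batches, n_valid_batches, n_test_batches)  # 100 20 20

def minibatch_slice(index, batch_size):
    """Row range covered by mini-batch number `index`."""
    return index * batch_size, (index + 1) * batch_size

print(minibatch_slice(3, batch_size))  # (1500, 2000)
```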
(4) Mini-batch SGD optimization
\begin{equation} \mathcal{L}(\theta=\{W,b\},\mathcal{D}) = \sum_{i=0}^{|\mathcal{D}|} \log{P(Y=y^{(i)}|x^{(i)}, W, b)} \\ \ell (\theta=\{W,b\},\mathcal{D}) = - \frac{1}{|\mathcal{D}|}\mathcal{L}(\theta=\{W,b\},\mathcal{D}) \end{equation}
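The loss $\ell$ can be written in a few lines of NumPy. This sketch assumes the cost is the mean negative log-probability of the correct class, as the formula states. The function below is a stand-in for illustration and is not the `LogisticRegression.negative_log_likelihood` method itself. The probabilities in `p` are made-up example values.

```python
import numpy as np

def negative_log_likelihood(p_y_given_x, y):
    """Mean negative log-probability of the correct class.

    p_y_given_x: (n_samples, n_classes) array of class probabilities
    y:           (n_samples,) array of integer class labels
    """
    # Pick, for every sample, the probability assigned to its true label.
    log_p_correct = np.log(p_y_given_x[np.arange(y.shape[0]), y])
    return -np.mean(log_p_correct)

p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
y = np.array([0, 1])
nll = negative_log_likelihood(p, y)
print(nll)
```

Averaging instead of summing keeps the scale of the loss, and therefore a suitable learning rate, largely independent of the mini-batch size.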
# the cost we minimize during training is the NLL of the model
cost = layer3.negative_log_likelihood(y)

# create functions to compute the mistakes that are made by the model
test_model = theano.function([index], layer3.errors(y),
        givens={
            x: test_set_x[index*batch_size:(index+1)*batch_size],
            y: test_set_y[index*batch_size:(index+1)*batch_size]})
validate_model = theano.function([index], layer3.errors(y),
        givens={
            x: valid_set_x[index*batch_size:(index+1)*batch_size],
            y: valid_set_y[index*batch_size:(index+1)*batch_size]})

# create a list of all model parameters to be fit by gradient descent
params = layer3.params + layer2.params + layer1.params + layer0.params
# create a list of gradients for all model parameters
grads = T.grad(cost, params)

updates = []
for param_i, grad_i in zip(params, grads):
    updates.append((param_i, param_i - learning_rate * grad_i))
train_model = theano.function(inputs=[index], outputs=cost, updates=updates,
        givens={
            x: train_set_x[index*batch_size:(index + 1)*batch_size],
            y: train_set_y[index*batch_size:(index + 1)*batch_size]})
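The update rule `param_i - learning_rate * grad_i` can be tried out on a toy problem without Theano. In this sketch the cost $0.5\,\lVert w - target \rVert^2$ and its hand-written gradient are assumptions made purely for illustration; in the model above the gradients come from `T.grad`. `sgd_step` is a hypothetical helper.

```python
import numpy as np

def sgd_step(params, grads, learning_rate):
    """Apply one gradient-descent update to every parameter array."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

# Toy cost 0.5 * ||w - target||^2, whose gradient is (w - target)
target = np.array([1.0, -2.0])
params = [np.zeros(2)]
for _ in range(100):
    grads = [params[0] - target]
    params = sgd_step(params, grads, learning_rate=0.1)

print(params[0])  # close to target
```

Looping over `zip(params, grads)` is the same pattern the `updates` list uses, so new layers only need to add their parameters to `params`.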
def evaluate_lenet5(learning_rate=0.1, n_epochs=200, dataset='./data/mnist.pkl.gz',
                    nkerns=[20, 50], batch_size=500):
    """ Demonstrates LeNet on the MNIST dataset

    :type learning_rate: float
    :param learning_rate: learning rate used (factor for the stochastic gradient)

    :type n_epochs: int
    :param n_epochs: maximal number of epochs to run the optimizer

    :type dataset: string
    :param dataset: path to the dataset used for training / testing

    :type nkerns: list of ints
    :param nkerns: number of kernels on each LeNetConvPoolLayer

    :type batch_size: int
    :param batch_size: size of data in each batch
    """
    # random number generator used by LeNetConvPoolLayer to initialize the filter weights
    rng = np.random.RandomState(23455)

    datasets = load_data(dataset)
    print >> sys.stdout, '...load data is ok'

    # get the training, validation and test sets
    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # compute the number of mini-batches in each set
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_size
    n_test_batches = test_set_x.get_value(borrow=True).shape[0] / batch_size
    #print "n_train_batches = %d n_valid_batches = %d n_test_batches = %d" %(train_set_x.get_value(borrow=True).shape[0],
    #    valid_set_x.get_value(borrow=True).shape[0],test_set_x.get_value(borrow=True).shape[0])

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print '...building the model'

    index = T.lscalar()   # index to [mini]batches
    x = T.matrix('x')     # images
    y = T.ivector('y')    # the labels

    ishape = (28, 28)     # the size of MNIST images

    # Reshape matrix of images of shape (batches, 28 * 28)
    # to a 4D tensor, compatible with our LeNetConvPoolLayer
    layer0_input = x.reshape((batch_size, 1, 28, 28))

    # Construct the first convolutional pooling layer:
    # filtering reduces the image size to (I_h - f_h + 1, I_w - f_w + 1)
    # here that is (28, 28) ----> (28-5+1, 28-5+1) = (24, 24)
    # max pooling reduces this further to (24/2, 24/2) = (12, 12)
    # so the 4D output tensor is thus of shape (batch_size, nkerns[0], 12, 12)
    layer0 = LeNetConvPoolLayer(rng, input=layer0_input,
                                image_shape=(batch_size, 1, 28, 28),
                                filter_shape=(nkerns[0], 1, 5, 5),
                                poolsize=(2, 2))

    # Construct the second convolutional pooling layer:
    # filtering reduces the image size to (12-5+1, 12-5+1) = (8, 8)
    # max pooling reduces this further to (8/2, 8/2) = (4, 4)
    # 4D output tensor is thus of shape (batch_size, nkerns[1], 4, 4)
    layer1 = LeNetConvPoolLayer(rng, input=layer0.output,
                                image_shape=(batch_size, nkerns[0], 12, 12),
                                filter_shape=(nkerns[1], nkerns[0], 5, 5),
                                poolsize=(2, 2))

    # the TanhLayer being fully-connected, it operates on 2D matrices of
    # shape (batches, num_pixels) (i.e. matrix of rasterized images)
    # This will generate a matrix of shape (batches, nkerns[1]*4*4)
    Layer2_input = layer1.output.flatten(2)

    # construct a fully-connected sigmoidal layer
    layer2 = HiddenLayer(rng, input=Layer2_input,
                         n_in=nkerns[1]*4*4, n_out=500,
                         activation=T.tanh)

    # classify the values of the fully-connected sigmoidal layer
    layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)

    # the cost we minimize during training is the NLL of the model
    cost = layer3.negative_log_likelihood(y)

    # create functions to compute the mistakes that are made by the model
    test_model = theano.function([index], layer3.errors(y),
            givens={
                x: test_set_x[index*batch_size:(index+1)*batch_size],
                y: test_set_y[index*batch_size:(index+1)*batch_size]})
    validate_model = theano.function([index], layer3.errors(y),
            givens={
                x: valid_set_x[index*batch_size:(index+1)*batch_size],
                y: valid_set_y[index*batch_size:(index+1)*batch_size]})

    # create a list of all model parameters to be fit by gradient descent
    params = layer3.params + layer2.params + layer1.params + layer0.params

    # create a list of gradients for all model parameters
    grads = T.grad(cost, params)

    # train_model is a function that updates the model parameters by SGD.
    # Since this model has many parameters, it would be tedious to
    # manually create an update rule for each model parameter. We thus
    # create the updates list by automatically looping over all
    # (params[i], grads[i]) pairs
    updates = []
    for param_i, grad_i in zip(params, grads):
        updates.append((param_i, param_i - learning_rate * grad_i))
    train_model = theano.function(inputs=[index], outputs=cost, updates=updates,
            givens={
                x: train_set_x[index*batch_size:(index + 1)*batch_size],
                y: train_set_y[index*batch_size:(index + 1)*batch_size]})

    ###############
    # TRAIN MODEL #
    ###############
    print '... training'
    # early-stopping parameters
    patience = 10000               # look at this many examples regardless
    patience_increase = 2          # wait this much longer when a new best is found
    improvement_threshold = 0.995  # a relative improvement of this much is considered significant
    validation_frequency = min(n_train_batches, patience/2)

    best_params = None
    best_validation_loss = np.inf
    best_iter = 0
    test_score = 0
    start_time = time.clock()

    epoch = 0
    done_looping = False

    while epoch < n_epochs and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in xrange(n_train_batches):
            minibatch_avg_cost = train_model(minibatch_index)
            iter = (epoch - 1) * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:
                validation_losses = [validate_model(i) for i in xrange(n_valid_batches)]
                this_validation_loss = np.mean(validation_losses)
                print ('epoch %i, minibatch %i/%i, validation error %f %%' %
                       (epoch, minibatch_index + 1, n_train_batches, this_validation_loss * 100.))

                if this_validation_loss < best_validation_loss:
                    if this_validation_loss < best_validation_loss * improvement_threshold:
                        patience = max(patience, iter * patience_increase)
                    best_validation_loss = this_validation_loss
                    # test it on the test set
                    best_iter = iter
                    test_losses = [test_model(i) for i in xrange(n_test_batches)]
                    test_score = np.mean(test_losses)
                    print ' patience %d epoch %i, minibatch %i/%i , test error of best model %f %%' % (
                        patience, epoch, minibatch_index + 1, n_train_batches, test_score * 100.)

            if patience <= iter:
                done_looping = True
                break

    end_time = time.clock()
    print 'Optimization complete with best validation score of %f %% with the test performance %f %%' \
        % (best_validation_loss * 100., test_score * 100.)
    print 'The code ran for %d epochs with %f epochs/sec' % (epoch, 1. * epoch / (end_time - start_time))
    print >> sys.stderr, ('The code for file ' + os.path.split(__file__)[1] +
                          ' ran for %.1fs' % ((end_time - start_time)))
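The patience-based early stopping in the training loop can be isolated into a small simulation. The sketch below assumes, for simplicity, that a validation loss is available at every iteration (the listing validates only every `validation_frequency` iterations). The function name `early_stopping` and the list of losses are made up for illustration.

```python
def early_stopping(validation_losses, patience=4, patience_increase=2,
                   improvement_threshold=0.995):
    """Return (best_iter, best_loss, last_iter) for a sequence of validation losses."""
    best_loss = float("inf")
    best_iter = 0
    last_iter = 0
    for it, loss in enumerate(validation_losses):
        last_iter = it
        if loss < best_loss:
            # Only a sufficiently large relative improvement extends the patience.
            if loss < best_loss * improvement_threshold:
                patience = max(patience, it * patience_increase)
            best_loss = loss
            best_iter = it
        if patience <= it:
            break  # patience exhausted: stop training
    return best_iter, best_loss, last_iter

losses = [0.5, 0.4, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36]
result = early_stopping(losses)
print(result)  # (2, 0.3, 4)
```

In this run the best loss is reached at iteration 2, no later value beats it, and the loop stops at iteration 4 when the iteration counter reaches the patience.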