Building a Simple Convolutional Neural Network (CNN)
A convolutional neural network (Convolutional Neural Network, CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within a limited receptive field, which makes it perform very well on large-scale image processing. CNNs are very similar to ordinary neural networks: both are made of neurons with learnable weights and biases. Each neuron receives some inputs and computes a dot product, the final output is a score for each class, and most of the computational tricks used in ordinary neural networks still apply here.
A convolutional neural network typically contains the following kinds of layers:
- Convolutional layer: each convolutional layer consists of several convolutional units (filters), whose parameters are optimized via backpropagation. The convolution operation extracts different features of the input; the first convolutional layer may only capture low-level features such as edges, lines, and corners, while deeper layers iteratively build more complex features from those low-level ones.
- Rectified Linear Units layer (ReLU layer): this layer uses the rectified linear unit f(x) = max(0, x) as its activation function.
- Pooling layer: the features coming out of a convolutional layer are usually high-dimensional; pooling splits them into regions and keeps the maximum (or average) of each region, producing new features of smaller dimension (a small numeric sketch follows this list).
- Dropout: while training ConvNets we randomly drop a fraction of the units' activations, which helps prevent overfitting to some extent.
- Fully-connected layer: combines all the local features into global features and computes the final score for each class.
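Before moving to TensorFlow, here is a tiny NumPy sketch that makes ReLU and 2x2 max pooling concrete. The 4x4 feature map is made-up data purely for illustration:

```python
import numpy as np

# A made-up 4x4 feature map to illustrate ReLU and 2x2 max pooling.
feature_map = np.array([[ 1., -2.,  3., -4.],
                        [-5.,  6., -7.,  8.],
                        [ 9., -1.,  2., -3.],
                        [-6.,  5., -8.,  7.]])

# ReLU: negative activations are clamped to zero.
relu_out = np.maximum(feature_map, 0)

# 2x2 max pooling with stride 2: keep the largest value in each 2x2 block.
pooled = relu_out.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6. 8.]
#  [9. 7.]]
```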
Now for the code. Today I'll use a ConvNet to tackle a very, very simple image classification task: classifying the images in the CIFAR-10 dataset, which contains airplanes, cats, dogs, and other objects.
First, let's fetch the dataset (or download it directly from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz):
```python
from urllib.request import urlretrieve
from os.path import isfile, isdir
from tqdm import tqdm
import tarfile

cifar10_dataset_folder_path = 'cifar-10-batches-py'
tar_gz_path = 'cifar-10-python.tar.gz'

class DLProgress(tqdm):
    last_block = 0

    def hook(self, block_num=1, block_size=1, total_size=None):
        self.total = total_size
        self.update((block_num - self.last_block) * block_size)
        self.last_block = block_num

if not isfile(tar_gz_path):
    with DLProgress(unit='B', unit_scale=True, miniters=1, desc='CIFAR-10 Dataset') as pbar:
        urlretrieve(
            'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz',
            tar_gz_path,
            pbar.hook)

if not isdir(cifar10_dataset_folder_path):
    with tarfile.open(tar_gz_path) as tar:
        tar.extractall()
```
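After extraction, the folder should contain the five training batches plus the test batch. A quick sanity check, assuming the paths defined above:

```python
import os

# Expect to see batches.meta, data_batch_1 ... data_batch_5 and test_batch.
print(sorted(os.listdir(cifar10_dataset_folder_path)))
```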
After loading the data, we need to preprocess the images a bit: the pixel values currently lie in the range 0-255, and we rescale them to the range 0-1 to make the later computations easier.
```python
def normalize(x):
    """
    Normalize a list of sample image data in the range of 0 to 1
    : x: List of image data. The image shape is (32, 32, 3)
    : return: Numpy array of normalized data
    """
    a = 0
    b = 1
    grayscale_min = 0
    grayscale_max = 255
    return a + (((x - grayscale_min) * (b - a)) / (grayscale_max - grayscale_min))
```
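As a quick sanity check on this scaling, made-up pixel values of 0, 127.5, and 255 should map to 0, 0.5, and 1:

```python
import numpy as np

# Hypothetical test values, not real image data.
print(normalize(np.array([0., 127.5, 255.])))  # [0.  0.5 1. ]
```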
Since the CIFAR-10 dataset contains 10 different classes of images, we now need to one-hot encode the labels:
```python
import numpy as np

def one_hot_encode(x):
    """
    One hot encode a list of sample labels. Return a one-hot encoded vector for each label.
    : x: List of sample labels
    : return: Numpy array of one-hot encoded labels
    """
    d = {0: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         1: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
         2: [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
         3: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
         4: [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
         5: [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
         6: [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
         7: [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
         8: [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
         9: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]}

    map_list = []
    for item in x:
        map_list.append(d[item])
    target = np.array(map_list)

    return target
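The same mapping can be written more compactly with NumPy's identity matrix. This is just an equivalent sketch (the hypothetical name `one_hot_encode_v2` is mine), not the code used for the results below:

```python
import numpy as np

def one_hot_encode_v2(x):
    # Row i of the 10x10 identity matrix is exactly the one-hot vector for label i.
    return np.eye(10)[np.asarray(x)]
```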
Next, we start building the ConvNet itself. First, we need placeholders to hold the training images, the one-hot encoded labels, and the keep probability used for dropout:
```python
import tensorflow as tf

def neural_net_image_input(image_shape):
    """
    Return a Tensor for a batch of image input
    : image_shape: Shape of the images
    : return: Tensor for image input.
    """
    x = tf.placeholder(tf.float32, [None, image_shape[0], image_shape[1], image_shape[2]], 'x')
    return x


def neural_net_label_input(n_classes):
    """
    Return a Tensor for a batch of label input
    : n_classes: Number of classes
    : return: Tensor for label input.
    """
    y = tf.placeholder(tf.float32, [None, n_classes], 'y')
    return y


def neural_net_keep_prob_input():
    """
    Return a Tensor for keep probability
    : return: Tensor for keep probability.
    """
    keep_prob = tf.placeholder(tf.float32, None, 'keep_prob')
    return keep_prob
```
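The string passed as the last argument of `tf.placeholder` is the tensor's name, which is what lets us fetch these tensors back from a saved graph later (the same trick the post uses below when it names the logits tensor). A small sketch of the lookup in TF 1.x, assuming the helpers above:

```python
import tensorflow as tf

# Build a throwaway graph just to demonstrate name-based lookup.
with tf.Graph().as_default() as g:
    neural_net_image_input((32, 32, 3))
    neural_net_keep_prob_input()
    # The ':0' suffix selects the first output of the op with that name.
    x_tensor = g.get_tensor_by_name('x:0')
    keep_prob_tensor = g.get_tensor_by_name('keep_prob:0')
    print(x_tensor.shape, keep_prob_tensor.dtype)  # (?, 32, 32, 3) <dtype: 'float32'>
```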
Next we build the core of the ConvNet: a convolutional layer followed by a pooling layer (here we use max pooling):
```python
def conv2d_maxpool(x_tensor, conv_num_outputs, conv_ksize, conv_strides, pool_ksize, pool_strides):
    """
    Apply convolution then max pooling to x_tensor
    :param x_tensor: TensorFlow Tensor
    :param conv_num_outputs: Number of outputs for the convolutional layer
    :param conv_ksize: kernel size 2-D Tuple for the convolutional layer
    :param conv_strides: Stride 2-D Tuple for convolution
    :param pool_ksize: kernel size 2-D Tuple for pool
    :param pool_strides: Stride 2-D Tuple for pool
    : return: A tensor that represents convolution and max pooling of x_tensor
    """
    # Weights and bias
    weight = tf.Variable(tf.truncated_normal([conv_ksize[0], conv_ksize[1],
                                              x_tensor.get_shape().as_list()[-1], conv_num_outputs], stddev=0.1))
    bias = tf.Variable(tf.zeros(conv_num_outputs))
    # Apply convolution
    conv_layer = tf.nn.conv2d(x_tensor, weight, strides=[1, conv_strides[0], conv_strides[1], 1], padding='SAME')
    # Add bias
    conv_layer = tf.nn.bias_add(conv_layer, bias)
    # Apply ReLU
    conv_layer = tf.nn.relu(conv_layer)

    return tf.nn.max_pool(conv_layer,
                          ksize=[1, pool_ksize[0], pool_ksize[1], 1],
                          strides=[1, pool_strides[0], pool_strides[1], 1],
                          padding='SAME')
```
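To see how the spatial dimensions shrink, here is a rough shape trace, assuming TensorFlow 1.x and the helper above, for the first layer used later in `conv_net`:

```python
import tensorflow as tf

# conv2d_maxpool(x, 32, (5, 5), (2, 2), (4, 4), (2, 2)) with 'SAME' padding:
#   input:      (batch, 32, 32, 3)
#   after conv: (batch, 16, 16, 32)   # 32 / conv stride 2, rounded up
#   after pool: (batch,  8,  8, 32)   # 16 / pool stride 2, rounded up
x_demo = tf.placeholder(tf.float32, [None, 32, 32, 3])
layer1 = conv2d_maxpool(x_demo, 32, (5, 5), (2, 2), (4, 4), (2, 2))
print(layer1.get_shape().as_list())  # [None, 8, 8, 32]
```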
Now implement a flatten layer that reshapes x_tensor from a 4-D tensor down to 2-D. The output should have the shape (Batch Size, Flattened Image Size):
```python
def flatten(x_tensor):
    """
    Flatten x_tensor to (Batch Size, Flattened Image Size)
    : x_tensor: A tensor of size (Batch Size, ...), where ... are the image dimensions.
    : return: A tensor of size (Batch Size, Flattened Image Size).
    """
    # Get the static shape of the tensor
    shape = x_tensor.get_shape().as_list()
    # Compute the flattened image dimension
    dim = np.prod(shape[1:])
    # Reshape the tensor, keeping the batch dimension
    return tf.reshape(x_tensor, [-1, dim])
```
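TensorFlow 1.x also ships a built-in that does the same thing; assuming a release where `tf.contrib` is available, the body of the function can be a one-liner:

```python
# Equivalent one-liner (TF 1.x only): flatten everything except the batch dimension.
def flatten_v2(x_tensor):
    return tf.contrib.layers.flatten(x_tensor)
```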
For the final part of the network, we need a fully connected layer plus an output layer that produces a 1x10 result (the logits for the 10 classes):
```python
def fully_conn(x_tensor, num_outputs):
    """
    Apply a fully connected layer to x_tensor using weight and bias
    : x_tensor: A 2-D tensor where the first dimension is batch size.
    : num_outputs: The number of output that the new tensor should be.
    : return: A 2-D tensor where the second dimension is num_outputs.
    """
    weight = tf.Variable(tf.truncated_normal([x_tensor.get_shape().as_list()[-1], num_outputs], stddev=0.1))
    bias = tf.Variable(tf.zeros([num_outputs]))

    fc = tf.reshape(x_tensor, [-1, weight.get_shape().as_list()[0]])
    fc = tf.add(tf.matmul(fc, weight), bias)
    fc = tf.nn.relu(fc)

    return fc


def output(x_tensor, num_outputs):
    """
    Apply an output layer to x_tensor using weight and bias
    : x_tensor: A 2-D tensor where the first dimension is batch size.
    : num_outputs: The number of output that the new tensor should be.
    : return: A 2-D tensor where the second dimension is num_outputs.
    """
    weight_out = tf.Variable(tf.truncated_normal([x_tensor.get_shape().as_list()[-1], num_outputs], stddev=0.1))
    bias_out = tf.Variable(tf.zeros([num_outputs]))

    out = tf.reshape(x_tensor, [-1, weight_out.get_shape().as_list()[0]])
    out = tf.add(tf.matmul(out, weight_out), bias_out)

    return out
```
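If you prefer not to manage weights and biases by hand, most TF 1.x releases also provide `tf.layers.dense`, which covers both of these helpers. A minimal equivalent sketch (the `_v2` names are mine, not part of the original code):

```python
import tensorflow as tf

def fully_conn_v2(x_tensor, num_outputs):
    # Fully connected layer with ReLU; tf.layers manages the weights and biases.
    return tf.layers.dense(x_tensor, num_outputs, activation=tf.nn.relu)

def output_v2(x_tensor, num_outputs):
    # Linear output layer (no activation): these are the logits fed to softmax cross-entropy.
    return tf.layers.dense(x_tensor, num_outputs, activation=None)
```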
With all the basic building blocks in place, it's time to assemble the network:
```python
def conv_net(x, keep_prob):
    """
    Create a convolutional neural network model
    : x: Placeholder tensor that holds image data.
    : keep_prob: Placeholder tensor that holds dropout keep probability.
    : return: Tensor that represents logits
    """
    conv1 = conv2d_maxpool(x, 32, (5, 5), (2, 2), (4, 4), (2, 2))

    conv2 = conv2d_maxpool(conv1, 128, (5, 5), (2, 2), (2, 2), (2, 2))

    conv3 = conv2d_maxpool(conv2, 256, (5, 5), (2, 2), (2, 2), (2, 2))

    # flatten(x_tensor)
    flatten_layer = flatten(conv3)

    # fully_conn(x_tensor, num_outputs)
    fc = fully_conn(flatten_layer, 1024)

    # Apply dropout on the fully connected layer (keep_prob is fed as 1.0 at evaluation time)
    fc = tf.nn.dropout(fc, keep_prob)

    # output(x_tensor, num_outputs): num_outputs is the number of classes
    output_layer = output(fc, 10)

    return output_layer


##############################
## Build the Neural Network ##
##############################

# Remove previous weights, bias, inputs, etc.
tf.reset_default_graph()

# Inputs
x = neural_net_image_input((32, 32, 3))
y = neural_net_label_input(10)
keep_prob = neural_net_keep_prob_input()

# Model
logits = conv_net(x, keep_prob)

# Name logits Tensor, so that it can be loaded from disk after training
logits = tf.identity(logits, name='logits')

# Loss and Optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.AdamOptimizer().minimize(cost)

# Accuracy
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')
```
Once the network is built, we can feed in our data and train the model.
Here I just set the hyperparameters somewhat arbitrarily:
```python
epochs = 30
batch_size = 256
keep_probability = 0.5
```
We also need a helper that prints the training loss and the accuracy on the validation split (valid_features and valid_labels come from the preprocessed validation data), so we can monitor how training is going:
```python
def print_stats(session, feature_batch, label_batch, cost, accuracy):
    """
    Print information about loss and validation accuracy
    : session: Current TensorFlow session
    : feature_batch: Batch of Numpy image data
    : label_batch: Batch of Numpy label data
    : cost: TensorFlow cost function
    : accuracy: TensorFlow accuracy function
    """
    # Dropout is disabled (keep_prob = 1.0) when measuring loss and accuracy
    loss = session.run(cost, feed_dict={
        x: feature_batch,
        y: label_batch,
        keep_prob: 1.
    })

    valid_acc = session.run(accuracy, feed_dict={
        x: valid_features,
        y: valid_labels,
        keep_prob: 1.
    })

    print('Loss: {:>10.4f} Validation Accuracy: {:.6f}'.format(
        loss,
        valid_acc))
```
Model training:
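The loop below uses `helper.load_preprocess_training_batch` (which yields preprocessed CIFAR-10 training batches) and a `train_neural_network` function, neither of which is defined in this post. A minimal sketch of `train_neural_network`, assuming the placeholders x, y, and keep_prob built above, might look like:

```python
def train_neural_network(session, optimizer, keep_probability, feature_batch, label_batch):
    # Run a single optimization step on one batch, with dropout enabled.
    session.run(optimizer, feed_dict={
        x: feature_batch,
        y: label_batch,
        keep_prob: keep_probability
    })
```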
```python
save_model_path = './image_classification'

print('Training...')
with tf.Session() as sess:
    # Initializing the variables
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(epochs):
        # Loop over all 5 CIFAR-10 training batches
        n_batches = 5
        for batch_i in range(1, n_batches + 1):
            for batch_features, batch_labels in helper.load_preprocess_training_batch(batch_i, batch_size):
                train_neural_network(sess, optimizer, keep_probability, batch_features, batch_labels)
            print('Epoch {:>2}, CIFAR-10 Batch {}:  '.format(epoch + 1, batch_i), end='')
            print_stats(sess, batch_features, batch_labels, cost, accuracy)

    # Save Model
    saver = tf.train.Saver()
    save_path = saver.save(sess, save_model_path)
```
Here is the validation accuracy at the end of my training run:
```
Epoch 29, CIFAR-10 Batch 4:  Loss: 0.0139 Validation Accuracy: 0.625600
Epoch 29, CIFAR-10 Batch 5:  Loss: 0.0090 Validation Accuracy: 0.631000
Epoch 30, CIFAR-10 Batch 1:  Loss: 0.0138 Validation Accuracy: 0.638800
Epoch 30, CIFAR-10 Batch 2:  Loss: 0.0192 Validation Accuracy: 0.627400
Epoch 30, CIFAR-10 Batch 3:  Loss: 0.0055 Validation Accuracy: 0.633400
Epoch 30, CIFAR-10 Batch 4:  Loss: 0.0114 Validation Accuracy: 0.641800
Epoch 30, CIFAR-10 Batch 5:  Loss: 0.0050 Validation Accuracy: 0.647400
```
Not bad: over 60%, while random guessing would only get about 10%.
Of course, the model can be improved further, for example by choosing better hyperparameters or adding other tricks.
Here is a leaderboard of results people have achieved on this dataset: http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#43494641522d3130
The best result there is already 96.53%, so go take a look at how the top entries did it.