Understanding fine-tuning in Caffe

Source: Zhihu https://www.zhihu.com/question/40850491
 
 
For example, suppose you first design a CNN architecture.
You then train this CNN on a large dataset A and obtain network a.
On dataset B, however, network a may not predict well (a likely reason is that datasets A and B differ, e.g., different data sources make one unrepresentative of the other). Training directly on a portion of B does not work either: the amount of data is too small for a CNN.

Solution:
Split dataset B into a train set and a test set. Using network a's parameters as the initialization and a smaller learning rate, continue training on B's train set to obtain network b.

Network b then generally achieves good prediction accuracy on B's test set.
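A minimal sketch of this workflow in pycaffe, assuming the net referenced by the solver renames the final classification layer so it is re-initialized rather than copied; the file names solver_B.prototxt and net_a.caffemodel are placeholders, not from the original answer:

```python
import caffe

caffe.set_mode_gpu()

# Solver for dataset B's train split; its train/test net should rename the
# final classification layer so that layer starts from a fresh random init.
solver = caffe.SGDSolver('solver_B.prototxt')   # placeholder file name

# Initialize from network a: layers whose names match the pretrained model
# receive its weights; the renamed final layer keeps its random values.
solver.net.copy_from('net_a.caffemodel')        # placeholder file name

# Continue training on B's train set at the small base_lr given in the solver.
solver.solve()
```

The command-line equivalent is: caffe train -solver solver_B.prototxt -weights net_a.caffemodel.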


Author: 王艺程
Link: https://www.zhihu.com/question/40850491/answer/88651844
Source: Zhihu
Copyright belongs to the author; please contact the author for permission before reproducing.
 
——————————————————————————————————————————————————————————————————————————————
In short, take an existing trained model, modify it slightly, and then do a small amount of additional training. This is mainly useful when you do not have enough samples.
 
——————————————————————————————————————————————————————————————————————————————
Apply a model that has already been trained to a new dataset. The main advantage is that, compared with training from scratch, it reaches the same accuracy in much less time.
Example:
1. Fine-tuning: first train a CNN on CIFAR-100, then change only the number of output nodes in the final softmax layer (from 100 to 10) and continue training on CIFAR-10.
2. Train from scratch: train a CNN with the same architecture directly on CIFAR-10.
Result:
The first approach may reach 60% accuracy after only about 1,000 iterations, whereas the second needs about 4,000 iterations to reach 60% accuracy.
The Caffe website has a fine-tuning example with a more detailed explanation.
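As a hedged illustration of the "100 to 10" change, the net definition can be edited with Caffe's protobuf bindings. The layer name ip2 is an assumption borrowed from the CIFAR-10 example bundled with Caffe, and the file names are placeholders:

```python
from caffe.proto import caffe_pb2
from google.protobuf import text_format

# Load the CIFAR-100 network definition (placeholder file name).
net = caffe_pb2.NetParameter()
with open('cifar100_train_test.prototxt') as f:
    text_format.Merge(f.read(), net)

for layer in net.layer:
    if layer.name == 'ip2':                         # assumed name of the final InnerProduct layer
        layer.name = 'ip2_cifar10'                  # rename so pretrained weights are not copied into it
        layer.top[0] = 'ip2_cifar10'
        layer.inner_product_param.num_output = 10   # 100 classes -> 10 classes
    else:
        # Re-point loss/accuracy layers that consumed the old output blob.
        for i, bottom in enumerate(layer.bottom):
            if bottom == 'ip2':
                layer.bottom[i] = 'ip2_cifar10'

with open('cifar10_finetune.prototxt', 'w') as f:
    f.write(text_format.MessageToString(net))
```

Training then proceeds as in the earlier sketch, with copy_from pointing at the CIFAR-100 weights; because Caffe copies weights by layer name, renaming the last layer is what causes it to be skipped and trained from scratch.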


Author: 钱飞鸿
Link: https://www.zhihu.com/question/40850491/answer/88763800
Source: Zhihu
Copyright belongs to the author; please contact the author for permission before reproducing.
 
———————————————————————————————————————————————————————————————————————————————
This is transfer learning: roughly, take the parameters trained for one task and use them directly as the initial values of the neural network for another task, then continue training. This yields better accuracy than starting from random initialization. You can also keep the parameters of selected layers fixed, as needed.
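In Caffe, keeping a layer's parameters fixed is usually done by setting its learning-rate (and weight-decay) multipliers to zero. A sketch under the assumption that conv1 and conv2 are the layers to freeze and that train_val.prototxt is the net definition:

```python
from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
with open('train_val.prototxt') as f:        # placeholder file name
    text_format.Merge(f.read(), net)

frozen = {'conv1', 'conv2'}                  # assumed names of the layers to freeze
for layer in net.layer:
    if layer.name in frozen:
        # Make sure there is one ParamSpec per learnable blob (weights, bias),
        # then zero its multipliers so the blob is never updated.
        while len(layer.param) < 2:
            layer.param.add()
        for p in layer.param:
            p.lr_mult = 0
            p.decay_mult = 0

with open('train_val_frozen.prototxt', 'w') as f:
    f.write(text_format.MessageToString(net))
```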
 
———————————————————————————————————————————————————————————————————————————————
Reference: Angie K. Reyes (1), Juan C. Caicedo (2), and Jorge E. Camargo (1), "Fine-tuning Deep Convolutional Networks for Plant Recognition."
(1) Laboratory for Advanced Computational Science and Engineering Research, Universidad Antonio Nariño, Colombia ({angreyes, jorgecamargo}@uan.edu.co)
(2) Fundación Universitaria Konrad Lorenz, Colombia (juanc.caicedor@konradlorenz.edu.co)
 
The main idea: starting from a model trained on the very large ImageNet dataset (1,000 object categories), fine-tune it for a custom plant-recognition dataset.
Section 3.2, Fine-tuning the CNN:
 
      We initialized the CNN to recognize 1,000 categories of generic objects that are part of the ImageNet hierarchy following the procedure described in the previous section. Then, we proceed to finetune the network for the Plant Identification task.
      Fine-tuning a network is a procedure based on the concept of transfer learning [1, 3]. We start training a CNN to learn features for a broad domain with a classification function targeted at minimizing error in that domain. Then, we replace the classification function and optimize the network again to minimize error in another, more specific domain. Under this setting, we are transferring the features and the parameters of the network from the broad domain to the specific one.
     The classification function in the original CNN is a softmax classifier that computes the probability of 1,000 classes of the ImageNet dataset. To start the fine-tuning procedure, we remove this softmax classifier and initialize a new one with random values. The new softmax classifier is trained from scratch using the back-propagation algorithm with data from the Plant Identification task, which also has 1,000 different categories.
     In order to start the back-propagation algorithm for fine-tuning, it is key to set the learning rates of each layer appropriately. The classification layer, i.e., the new softmax classifier, needs a large learning rate because it has been just initialized with random values. The rest of the layers need a relatively small learning rate because we want to preserve the parameters of the previous network to transfer that knowledge into the new network. However, notice that the learning rate is not set to zero in the rest of the layers: they will be optimized again at a slower pace.
     In our experiments we set the learning rate of the top classification layer to 10, while leaving the learning rate of all the other seven layers to 0.1. We run the back-propagation algorithm for 50,000 iterations, which optimizes the network parameters using stochastic gradient descent (SGD). Figure 3 shows how the precision of classifying single images improves with more iterations. Our implementation is based on the open source Deep Learning library Caffe [7], and we run the experiments using a NVIDIA Titan Z GPU (5,760 cores and 12 GB of RAM).
 
 
Fig. 3. Evolution of image classification accuracy in a validation set during the finetuning process. Accuracy improves quickly during the first iterations and stabilizes after 40,000 iterations.
 
Summary:
The plant-identification task in this paper also has 1,000 categories, but they differ from ImageNet's (ImageNet contains far more than plants). The original softmax classifier is therefore removed and replaced with a new one, initialized with random values and trained from scratch with back-propagation on the plant data, so as to minimize the error on that task.
The key to starting back-propagation for fine-tuning is setting an appropriate learning rate for each layer. The top classification layer, i.e., the new softmax layer, needs a large learning rate because it has just been randomly initialized. All other layers need relatively small learning rates, because we want to preserve the parameters of the pre-trained network and transfer that knowledge. Note, however, that the learning rates of the other layers are not set to zero: those layers are still optimized, just at a slower pace.
In the experiments, the learning rate of the classification layer is therefore set to 10 and that of the other layers to 0.1. The network is optimized with stochastic gradient descent (SGD) for 50,000 iterations. As Figure 3 shows, accuracy rises quickly during the first iterations and stabilizes after about 40,000 iterations.
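A hedged sketch of how these settings map onto Caffe: the per-layer rates correspond to lr_mult multipliers in the net definition, and the 50,000 SGD iterations to the solver. The layer name fc8_plants, the file names, and the base_lr/momentum values are illustrative assumptions; only the 10 / 0.1 split and max_iter = 50000 come from the paper:

```python
from caffe.proto import caffe_pb2
from google.protobuf import text_format

# --- Net: large rate for the new softmax classifier, small rate elsewhere ---
net = caffe_pb2.NetParameter()
with open('train_val_plants.prototxt') as f:      # placeholder file name
    text_format.Merge(f.read(), net)

for layer in net.layer:
    if len(layer.param) == 0:
        continue   # assumes learnable layers already declare param { } blocks
    # 10 for the freshly initialized classifier, 0.1 for every transferred layer.
    rate = 10.0 if layer.name == 'fc8_plants' else 0.1
    for p in layer.param:
        p.lr_mult = rate

with open('train_val_plants_ft.prototxt', 'w') as f:
    f.write(text_format.MessageToString(net))

# --- Solver: SGD for 50,000 iterations, as reported in the paper ---
solver = caffe_pb2.SolverParameter()
solver.net = 'train_val_plants_ft.prototxt'
solver.base_lr = 0.001        # assumed; the paper does not give the base rate
solver.momentum = 0.9         # assumed
solver.max_iter = 50000       # from the paper
solver.snapshot = 10000
solver.snapshot_prefix = 'plants_finetune'
with open('solver_plants.prototxt', 'w') as f:
    f.write(text_format.MessageToString(solver))
```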
posted @ 2016-08-03 22:31 小猪童鞋