Training Caffe on Your Own Dataset
This assumes Caffe has already been built, including pycaffe.
1 Data Preparation
First, prepare the training and validation datasets. Here there are two classes of data, placed in folders 0 and 1 (naming the class folders 0 and 1 is convenient for labeling: the folder name itself serves as the label). So the training set lives in /data/train/0 and /data/train/1, and the validation set in /data/val/0 and /data/val/1.
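With hypothetical image file names, the resulting layout looks like this:

data/
├── train/
│   ├── 0/        # class 0 images, e.g. 000001.jpg, 000002.jpg, ...
│   └── 1/        # class 1 images, e.g. 100001.jpg, 100002.jpg, ...
└── val/
    ├── 0/
    └── 1/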
Once the data is in place, create txt files that record each data file and its corresponding label.
(1) Create train.txt for the training set
import os

# Walk data/train/<class_folder>/ and write one line per image:
# /<class_folder>/<file> <label>  (the folder name doubles as the label)
f = open(r'train.txt', "w")
path = os.getcwd() + '/data/train/'
for filename in os.listdir(path):
    count = 0
    for file in os.listdir(path + filename):
        count = count + 1
        ff = '/' + filename + "/" + file + " " + filename + "\n"
        f.write(ff)
    print('{} class: {}'.format(filename, count))
f.close()
(2) Create val.txt for the validation set
import os

# Same as above, but for data/val/, writing val.txt.
f = open(r'val.txt', "w")
path = os.getcwd() + '/data/val/'
for filename in os.listdir(path):
    count = 0
    for file in os.listdir(path + filename):
        count = count + 1
        ff = '/' + filename + "/" + file + " " + filename + "\n"
        f.write(ff)
    print('{} class: {}'.format(filename, count))
f.close()
Note the format of each line in the txt files: /class_folder_name/filename, then a single space (it must not be a tab), then the label.
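For instance, with the hypothetical file names above, train.txt would contain lines like:

/0/000001.jpg 0
/0/000002.jpg 0
/1/100001.jpg 1
/1/100002.jpg 1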
2 Create the LMDB Data Files
Create createlmdb.sh, which uses convert_imageset (shipped with Caffe under build/tools) to build the LMDB files. The main things to get right are the locations of the image data and of the txt files generated in the previous step, plus the RESIZE settings, which must match what training and testing expect later; everything else is just a matter of paths.
#!/usr/bin/env sh

CAFFE_ROOT=/home/caf/object/caffe
TOOLS=$CAFFE_ROOT/build/tools
TRAIN_DATA_ROOT=/home/caf/wk/learn/data/train
VAL_DATA_ROOT=/home/caf/wk/learn/data/val
DATA=/home/caf/wk/learn/data
EXAMPLE=/home/caf/wk/learn/data/lmdb
# Set RESIZE=true to resize the images to 227 x 227. Leave as false if images
# have already been resized using another tool.
RESIZE=true
if $RESIZE; then
  RESIZE_HEIGHT=227
  RESIZE_WIDTH=227
else
  RESIZE_HEIGHT=0
  RESIZE_WIDTH=0
fi

if [ ! -d "$TRAIN_DATA_ROOT" ]; then
  echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
  echo "Set the TRAIN_DATA_ROOT variable in createlmdb.sh to the path" \
       "where the training data is stored."
  exit 1
fi

if [ ! -d "$VAL_DATA_ROOT" ]; then
  echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
  echo "Set the VAL_DATA_ROOT variable in createlmdb.sh to the path" \
       "where the validation data is stored."
  exit 1
fi

echo "Creating train lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/face_train_lmdb

echo "Creating val lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $VAL_DATA_ROOT \
    $DATA/val.txt \
    $EXAMPLE/face_val_lmdb

echo "Done."
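Optionally, a dataset-specific mean can be computed from the training LMDB with Caffe's compute_image_mean tool. This step is not required here (the test script in section 5 uses the ImageNet mean that ships with Caffe), and the output name face_mean.binaryproto is just a placeholder chosen for this sketch:

#!/usr/bin/env sh
# Sketch: compute the per-pixel mean of the training LMDB.
CAFFE_ROOT=/home/caf/object/caffe
EXAMPLE=/home/caf/wk/learn/data/lmdb
$CAFFE_ROOT/build/tools/compute_image_mean \
    $EXAMPLE/face_train_lmdb \
    $EXAMPLE/face_mean.binaryproto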
3 Define the Network
Caffe takes its network model as a prototxt file, and the network-definition syntax is explained in detail in the Caffe documentation. This experiment uses AlexNet, saved as train_val.prototxt.
1 name: "AlexNet" 2 layer { 3 name: "data" 4 type: "Data" 5 top: "data" 6 top: "label" 7 include { 8 phase: TRAIN 9 } 10 data_param { 11 source: "/home/caf/wk/learn/data/lmdb/face_train_lmdb" 12 batch_size: 256 13 backend: LMDB 14 } 15 } 16 layer { 17 name: "data" 18 type: "Data" 19 top: "data" 20 top: "label" 21 include { 22 phase: TEST 23 } 24 data_param { 25 source: "/home/caf/wk/learn/data/lmdb/face_val_lmdb" 26 batch_size: 50 27 backend: LMDB 28 } 29 } 30 layer { 31 name: "conv1" 32 type: "Convolution" 33 bottom: "data" 34 top: "conv1" 35 param { 36 lr_mult: 1 37 decay_mult: 1 38 } 39 param { 40 lr_mult: 2 41 decay_mult: 0 42 } 43 convolution_param { 44 num_output: 96 45 kernel_size: 11 46 stride: 4 47 weight_filler { 48 type: "gaussian" 49 std: 0.01 50 } 51 bias_filler { 52 type: "constant" 53 value: 0 54 } 55 } 56 } 57 layer { 58 name: "relu1" 59 type: "ReLU" 60 bottom: "conv1" 61 top: "conv1" 62 } 63 layer { 64 name: "norm1" 65 type: "LRN" 66 bottom: "conv1" 67 top: "norm1" 68 lrn_param { 69 local_size: 5 70 alpha: 0.0001 71 beta: 0.75 72 } 73 } 74 layer { 75 name: "pool1" 76 type: "Pooling" 77 bottom: "norm1" 78 top: "pool1" 79 pooling_param { 80 pool: MAX 81 kernel_size: 3 82 stride: 2 83 } 84 } 85 layer { 86 name: "conv2" 87 type: "Convolution" 88 bottom: "pool1" 89 top: "conv2" 90 param { 91 lr_mult: 1 92 decay_mult: 1 93 } 94 param { 95 lr_mult: 2 96 decay_mult: 0 97 } 98 convolution_param { 99 num_output: 256 100 pad: 2 101 kernel_size: 5 102 group: 2 103 weight_filler { 104 type: "gaussian" 105 std: 0.01 106 } 107 bias_filler { 108 type: "constant" 109 value: 0.1 110 } 111 } 112 } 113 layer { 114 name: "relu2" 115 type: "ReLU" 116 bottom: "conv2" 117 top: "conv2" 118 } 119 layer { 120 name: "norm2" 121 type: "LRN" 122 bottom: "conv2" 123 top: "norm2" 124 lrn_param { 125 local_size: 5 126 alpha: 0.0001 127 beta: 0.75 128 } 129 } 130 layer { 131 name: "pool2" 132 type: "Pooling" 133 bottom: "norm2" 134 top: "pool2" 135 pooling_param { 136 pool: MAX 137 kernel_size: 3 138 stride: 2 139 } 140 } 141 layer { 142 name: "conv3" 143 type: "Convolution" 144 bottom: "pool2" 145 top: "conv3" 146 param { 147 lr_mult: 1 148 decay_mult: 1 149 } 150 param { 151 lr_mult: 2 152 decay_mult: 0 153 } 154 convolution_param { 155 num_output: 384 156 pad: 1 157 kernel_size: 3 158 weight_filler { 159 type: "gaussian" 160 std: 0.01 161 } 162 bias_filler { 163 type: "constant" 164 value: 0 165 } 166 } 167 } 168 layer { 169 name: "relu3" 170 type: "ReLU" 171 bottom: "conv3" 172 top: "conv3" 173 } 174 layer { 175 name: "conv4" 176 type: "Convolution" 177 bottom: "conv3" 178 top: "conv4" 179 param { 180 lr_mult: 1 181 decay_mult: 1 182 } 183 param { 184 lr_mult: 2 185 decay_mult: 0 186 } 187 convolution_param { 188 num_output: 384 189 pad: 1 190 kernel_size: 3 191 group: 2 192 weight_filler { 193 type: "gaussian" 194 std: 0.01 195 } 196 bias_filler { 197 type: "constant" 198 value: 0.1 199 } 200 } 201 } 202 layer { 203 name: "relu4" 204 type: "ReLU" 205 bottom: "conv4" 206 top: "conv4" 207 } 208 layer { 209 name: "conv5" 210 type: "Convolution" 211 bottom: "conv4" 212 top: "conv5" 213 param { 214 lr_mult: 1 215 decay_mult: 1 216 } 217 param { 218 lr_mult: 2 219 decay_mult: 0 220 } 221 convolution_param { 222 num_output: 256 223 pad: 1 224 kernel_size: 3 225 group: 2 226 weight_filler { 227 type: "gaussian" 228 std: 0.01 229 } 230 bias_filler { 231 type: "constant" 232 value: 0.1 233 } 234 } 235 } 236 layer { 237 name: "relu5" 238 type: "ReLU" 239 bottom: "conv5" 240 top: "conv5" 241 } 242 layer { 243 
name: "pool5" 244 type: "Pooling" 245 bottom: "conv5" 246 top: "pool5" 247 pooling_param { 248 pool: MAX 249 kernel_size: 3 250 stride: 2 251 } 252 } 253 layer { 254 name: "fc6" 255 type: "InnerProduct" 256 bottom: "pool5" 257 top: "fc6" 258 param { 259 lr_mult: 1 260 decay_mult: 1 261 } 262 param { 263 lr_mult: 2 264 decay_mult: 0 265 } 266 inner_product_param { 267 num_output: 4096 268 weight_filler { 269 type: "gaussian" 270 std: 0.005 271 } 272 bias_filler { 273 type: "constant" 274 value: 0.1 275 } 276 } 277 } 278 layer { 279 name: "relu6" 280 type: "ReLU" 281 bottom: "fc6" 282 top: "fc6" 283 } 284 layer { 285 name: "drop6" 286 type: "Dropout" 287 bottom: "fc6" 288 top: "fc6" 289 dropout_param { 290 dropout_ratio: 0.5 291 } 292 } 293 layer { 294 name: "fc7" 295 type: "InnerProduct" 296 bottom: "fc6" 297 top: "fc7" 298 param { 299 lr_mult: 1 300 decay_mult: 1 301 } 302 param { 303 lr_mult: 2 304 decay_mult: 0 305 } 306 inner_product_param { 307 num_output: 4096 308 weight_filler { 309 type: "gaussian" 310 std: 0.005 311 } 312 bias_filler { 313 type: "constant" 314 value: 0.1 315 } 316 } 317 } 318 layer { 319 name: "relu7" 320 type: "ReLU" 321 bottom: "fc7" 322 top: "fc7" 323 } 324 layer { 325 name: "drop7" 326 type: "Dropout" 327 bottom: "fc7" 328 top: "fc7" 329 dropout_param { 330 dropout_ratio: 0.5 331 } 332 } 333 layer { 334 name: "fc8" 335 type: "InnerProduct" 336 bottom: "fc7" 337 top: "fc8" 338 param { 339 lr_mult: 1 340 decay_mult: 1 341 } 342 param { 343 lr_mult: 2 344 decay_mult: 0 345 } 346 inner_product_param { 347 num_output: 2 348 weight_filler { 349 type: "gaussian" 350 std: 0.01 351 } 352 bias_filler { 353 type: "constant" 354 value: 0 355 } 356 } 357 } 358 layer { 359 name: "accuracy" 360 type: "Accuracy" 361 bottom: "fc8" 362 bottom: "label" 363 top: "accuracy" 364 include { 365 phase: TEST 366 } 367 } 368 layer { 369 name: "loss" 370 type: "SoftmaxWithLoss" 371 bottom: "fc8" 372 bottom: "label" 373 top: "loss" 374 } 375 layer { 376 name: "prob" 377 type: "Softmax" 378 bottom: "fc8" 379 top: "prob" 380 }
Create the solver file solver.prototxt, which defines the training hyperparameters: the number of iterations, how often to snapshot the model, the learning rate, and so on. net points to the network just defined; here training and testing use the same network file.
1 net: "train_val.prototxt" 2 test_iter: 2 3 test_interval: 10 4 base_lr: 0.001 5 lr_policy: "step" 6 gamma: 0.1 7 stepsize: 100 8 display: 20 9 max_iter: 100 10 momentum: 0.9 11 weight_decay: 0.005 12 solver_mode: GPU 13 snapshot: 20 14 snapshot_prefix: "model/"
4 Train the Model
Create train.sh and train on the GPU, otherwise training is far too slow!
#!/usr/bin/env sh
CAFFE_ROOT=/home/caf/object/caffe
SOLVER_ROOT=/home/caf/wk/learn
$CAFFE_ROOT/build/tools/caffe train --solver=$SOLVER_ROOT/solver.prototxt --gpu=0
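Alternatively, training can be driven from Python through pycaffe. A minimal sketch, assuming the same paths as above:

import sys
caffe_root = "/home/caf/object/caffe/"
sys.path.insert(0, caffe_root + 'python')
import caffe

# Same effect as train.sh: run the solver on GPU 0.
caffe.set_device(0)
caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')
solver.solve()  # snapshots are written to model/ as configured in the solver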
caffemodel files are written to the model folder (e.g. _iter_100.caffemodel); these are the files used later for image classification and other tasks.
5 Testing
Create deploy.prototxt for testing. It is essentially the same network as the training one, but a network used purely for classification no longer needs the training-only settings, so a separate model file is defined; test images are run through this network.
deploy.prototxt differs from train_val.prototxt in the following ways:
(1) The input no longer comes from LMDB and is no longer split into training and test sets. The input type is Input, and its declared dimensions must match those of the training data (227×227), otherwise an error is raised;
(2) weight_filler and bias_filler are removed; those parameters already exist in the caffemodel, which provides the initialization;
(3) The final Accuracy and loss layers are removed and replaced with a Softmax layer, which outputs the probability of each class.
1 name: "AlexNet" 2 layer { 3 name: "data" 4 type: "Input" 5 top: "data" 6 input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } } 7 } 8 layer { 9 name: "conv1" 10 type: "Convolution" 11 bottom: "data" 12 top: "conv1" 13 param { 14 lr_mult: 1 15 decay_mult: 1 16 } 17 param { 18 lr_mult: 2 19 decay_mult: 0 20 } 21 convolution_param { 22 num_output: 96 23 kernel_size: 11 24 stride: 4 25 } 26 } 27 layer { 28 name: "relu1" 29 type: "ReLU" 30 bottom: "conv1" 31 top: "conv1" 32 } 33 layer { 34 name: "norm1" 35 type: "LRN" 36 bottom: "conv1" 37 top: "norm1" 38 lrn_param { 39 local_size: 5 40 alpha: 0.0001 41 beta: 0.75 42 } 43 } 44 layer { 45 name: "pool1" 46 type: "Pooling" 47 bottom: "norm1" 48 top: "pool1" 49 pooling_param { 50 pool: MAX 51 kernel_size: 3 52 stride: 2 53 } 54 } 55 layer { 56 name: "conv2" 57 type: "Convolution" 58 bottom: "pool1" 59 top: "conv2" 60 param { 61 lr_mult: 1 62 decay_mult: 1 63 } 64 param { 65 lr_mult: 2 66 decay_mult: 0 67 } 68 convolution_param { 69 num_output: 256 70 pad: 2 71 kernel_size: 5 72 group: 2 73 } 74 } 75 layer { 76 name: "relu2" 77 type: "ReLU" 78 bottom: "conv2" 79 top: "conv2" 80 } 81 layer { 82 name: "norm2" 83 type: "LRN" 84 bottom: "conv2" 85 top: "norm2" 86 lrn_param { 87 local_size: 5 88 alpha: 0.0001 89 beta: 0.75 90 } 91 } 92 layer { 93 name: "pool2" 94 type: "Pooling" 95 bottom: "norm2" 96 top: "pool2" 97 pooling_param { 98 pool: MAX 99 kernel_size: 3 100 stride: 2 101 } 102 } 103 layer { 104 name: "conv3" 105 type: "Convolution" 106 bottom: "pool2" 107 top: "conv3" 108 param { 109 lr_mult: 1 110 decay_mult: 1 111 } 112 param { 113 lr_mult: 2 114 decay_mult: 0 115 } 116 convolution_param { 117 num_output: 384 118 pad: 1 119 kernel_size: 3 120 } 121 } 122 layer { 123 name: "relu3" 124 type: "ReLU" 125 bottom: "conv3" 126 top: "conv3" 127 } 128 layer { 129 name: "conv4" 130 type: "Convolution" 131 bottom: "conv3" 132 top: "conv4" 133 param { 134 lr_mult: 1 135 decay_mult: 1 136 } 137 param { 138 lr_mult: 2 139 decay_mult: 0 140 } 141 convolution_param { 142 num_output: 384 143 pad: 1 144 kernel_size: 3 145 group: 2 146 } 147 } 148 layer { 149 name: "relu4" 150 type: "ReLU" 151 bottom: "conv4" 152 top: "conv4" 153 } 154 layer { 155 name: "conv5" 156 type: "Convolution" 157 bottom: "conv4" 158 top: "conv5" 159 param { 160 lr_mult: 1 161 decay_mult: 1 162 } 163 param { 164 lr_mult: 2 165 decay_mult: 0 166 } 167 convolution_param { 168 num_output: 256 169 pad: 1 170 kernel_size: 3 171 group: 2 172 } 173 } 174 layer { 175 name: "relu5" 176 type: "ReLU" 177 bottom: "conv5" 178 top: "conv5" 179 } 180 layer { 181 name: "pool5" 182 type: "Pooling" 183 bottom: "conv5" 184 top: "pool5" 185 pooling_param { 186 pool: MAX 187 kernel_size: 3 188 stride: 2 189 } 190 } 191 layer { 192 name: "fc6" 193 type: "InnerProduct" 194 bottom: "pool5" 195 top: "fc6" 196 param { 197 lr_mult: 1 198 decay_mult: 1 199 } 200 param { 201 lr_mult: 2 202 decay_mult: 0 203 } 204 inner_product_param { 205 num_output: 4096 206 } 207 } 208 layer { 209 name: "relu6" 210 type: "ReLU" 211 bottom: "fc6" 212 top: "fc6" 213 } 214 layer { 215 name: "drop6" 216 type: "Dropout" 217 bottom: "fc6" 218 top: "fc6" 219 dropout_param { 220 dropout_ratio: 0.5 221 } 222 } 223 layer { 224 name: "fc7" 225 type: "InnerProduct" 226 bottom: "fc6" 227 top: "fc7" 228 param { 229 lr_mult: 1 230 decay_mult: 1 231 } 232 param { 233 lr_mult: 2 234 decay_mult: 0 235 } 236 inner_product_param { 237 num_output: 4096 238 } 239 } 240 layer { 241 name: "relu7" 242 type: "ReLU" 243 bottom: "fc7" 244 top: "fc7" 
245 } 246 layer { 247 name: "drop7" 248 type: "Dropout" 249 bottom: "fc7" 250 top: "fc7" 251 dropout_param { 252 dropout_ratio: 0.5 253 } 254 } 255 layer { 256 name: "fc8" 257 type: "InnerProduct" 258 bottom: "fc7" 259 top: "fc8" 260 param { 261 lr_mult: 1 262 decay_mult: 1 263 } 264 param { 265 lr_mult: 2 266 decay_mult: 0 267 } 268 inner_product_param { 269 num_output: 2 270 } 271 } 272 layer { 273 name: "prob" 274 type: "Softmax" 275 bottom: "fc8" 276 top: "prob" 277 }
Python code for testing, using Caffe's Python interface. The main things to set are the paths to your trained weights file, the model definition, and the mean file.
import numpy as np
import matplotlib.pyplot as plt

import sys
caffe_root = "/home/caf/object/caffe/"
sys.path.insert(0, caffe_root + 'python')
import caffe

caffe.set_device(0)
caffe.set_mode_gpu()

model_def = 'deploy.prototxt'
model_weights = 'model/_iter_100.caffemodel'
net = caffe.Net(model_def,      # network definition
                model_weights,  # trained weights
                caffe.TEST)     # test mode (no dropout)

# Per-channel (BGR) mean of the ImageNet training set that ships with Caffe.
mu = np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy')
mu = mu.mean(1).mean(1)
#print('mean-subtracted values:', zip('BGR', mu))

# Preprocessing: HxWxC [0,1] RGB -> CxHxW [0,255] BGR, mean-subtracted.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_mean('data', mu)
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2, 1, 0))

# Batch of one 3-channel 227x227 image.
net.blobs['data'].reshape(1, 3, 227, 227)

image = caffe.io.load_image('test.jpg')
transformed_image = transformer.preprocess('data', image)
#plt.imshow(image)
#plt.show()

net.blobs['data'].data[...] = transformed_image
output = net.forward()
output_prob = output['prob']
print(output_prob)
print('predicted class is:', output_prob.argmax())
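If you generated a dataset-specific mean with compute_image_mean (the optional sketch in section 2, where face_mean.binaryproto was a placeholder name), it can be converted into the per-channel mu expected by set_mean like this:

import sys
import numpy as np
caffe_root = "/home/caf/object/caffe/"
sys.path.insert(0, caffe_root + 'python')
import caffe
from caffe.proto import caffe_pb2

# Parse the binaryproto written by compute_image_mean.
blob = caffe_pb2.BlobProto()
with open('data/lmdb/face_mean.binaryproto', 'rb') as fp:
    blob.ParseFromString(fp.read())

# Convert to an array of shape (1, C, H, W), drop the batch axis,
# then average over height and width to get per-channel (BGR) means.
mean = caffe.io.blobproto_to_array(blob)[0]
mu = mean.mean(1).mean(1)
print(mu)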
Problems Encountered
(1) The label file must use spaces, not tabs, as separators; otherwise the data files will not be found.
(2) CUDA: an error mentioning cudaSuccess (a check failure, typically out of memory) means the GPU does not have enough free memory. Use nvidia-smi to see which process is occupying the GPU, then terminate it with kill -9 PID.
(3) Layer definitions differ across Caffe versions: there are both layer and layers. With layer, type must be a quoted string; with layers, type is unquoted and written in all capital letters.
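For example, the same layer in the two syntaxes (the all-caps enum names such as CONVOLUTION come from the legacy layer definitions):

# new-style definition: type is a quoted string
layer {
  name: "conv1"
  type: "Convolution"
}
# old-style definition: type is an unquoted, all-caps enum
layers {
  name: "conv1"
  type: CONVOLUTION
}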