Notes on training py-R-FCN + ResNet-101 on a custom dataset
Download the py-R-FCN source from:
https://github.com/Orpine/py-R-FCN
Environment setup is omitted here.
1. Test the demo
Download a pre-trained model (from my own cloud drive) and put it in data/rfcn_models, then run:
./tools/demo_rfcn.py --net ResNet-101
2. Prepare the dataset
For training I chose to add some extra data on top of VOC2007:
2.1 First, renumber the datasets to be added so their names continue from the existing ones: rename.py
import os

i = 16900  # starting number for the renamed files
for files in os.listdir("./2018/"):
    os.rename(os.path.join("./2018/", files),
              os.path.join("./2018/", "0" + str(i) + ".jpg"))
    # rename the matching annotation so image and XML keep the same stem
    for files1 in os.listdir("./20181/"):
        if files[:-4] == files1[:-4]:
            os.rename(os.path.join("./20181/", files1),
                      os.path.join("./20181/", "0" + str(i) + ".xml"))
    i = i + 1

Here i is the desired starting number. After renaming, check that the image and XML file names are contiguous and correspond one-to-one; otherwise training will fail.
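To verify that the renamed images and annotations still correspond one-to-one, a quick check along these lines can help (the directory layout is the assumed one from the rename script above):

```python
import os

def check_pairing(jpg_dir, xml_dir):
    """Return (image stems without an XML, XML stems without an image)."""
    jpg = {f[:-4] for f in os.listdir(jpg_dir) if f.endswith(".jpg")}
    xml = {f[:-4] for f in os.listdir(xml_dir) if f.endswith(".xml")}
    # Names present on only one side will break training, so report both lists.
    return sorted(jpg - xml), sorted(xml - jpg)
```

Both returned lists should be empty before moving on to the next step.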
2.2 If the XML files contain labels with inconsistent capitalization, convert the uppercase labels to lowercase.
Command format: find -name '<files to search>' | xargs perl -pi -e 's|<string to replace>|<replacement>|g'
2.3 If you only need to detect a few classes, you can delete the annotations of the unneeded classes from the dataset:

import os
import xml.etree.cElementTree as ET

path_root = ['Annotations']
CLASSES = ["aeroplane", "bicycle", "cat", "car", "dog",
           "motorbike", "person", "horse", "train", "bus"]

for anno_path in path_root:
    xml_list = os.listdir(anno_path)
    for axml in xml_list:
        path_xml = os.path.join(anno_path, axml)
        tree = ET.parse(path_xml)
        root = tree.getroot()
        # drop every <object> whose class is not in the keep list
        for child in root.findall('object'):
            name = child.find('name').text
            if name not in CLASSES:
                root.remove(child)
        tree.write(os.path.join('Annotations', axml))
The CLASSES list above holds the classes you want to keep. Be very careful here: back up (compress) the dataset first, and preferably rename the dataset before deleting classes, because this code rewrites every XML file under a matching path on your machine, with no undo. Then put all the XML files into Annotations under VOC2007.
2.4 Put all training images into the JPEGImages folder and regenerate the four txt files in ImageSets/Main: trainval.txt (training + validation combined), train.txt (training set), val.txt (validation set), and test.txt (test set).
import os
import random

trainval_percent = 0.66   # fraction of all images used for trainval
train_percent = 0.5       # fraction of trainval used for train

xmlfilepath = 'Annotations'
txtsavepath = 'ImageSets/Main'
total_xml = os.listdir(xmlfilepath)

num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

ftrainval = open('ImageSets/Main/trainval.txt', 'w')
ftest = open('ImageSets/Main/test.txt', 'w')
ftrain = open('ImageSets/Main/train.txt', 'w')
fval = open('ImageSets/Main/val.txt', 'w')

for i in indices:
    name = total_xml[i][:-4] + '\n'
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
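The split arithmetic above is easy to sanity-check before running it on real data. A small sketch (mirroring the script's formulas, not part of the original) of how the four subsets relate:

```python
def split_sizes(num, trainval_percent=0.66, train_percent=0.5):
    """Mirror the split script's arithmetic: trainval is drawn from all
    files, train is drawn from trainval, val is the rest of trainval,
    and everything outside trainval becomes test."""
    tv = int(num * trainval_percent)
    tr = int(tv * train_percent)
    return {"trainval": tv, "train": tr, "val": tv - tr, "test": num - tv}
```

Every file ends up in exactly one of train/val/test, and trainval is the union of train and val.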
3. Modify the configuration files
3.1 Modify class-aware/train_ohem.prototxt
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 16"  # cls_num
  }
}

layer {
  name: 'roi-data'
  type: 'Python'
  bottom: 'rpn_rois'
  bottom: 'gt_boxes'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'rpn.proposal_target_layer'
    layer: 'ProposalTargetLayer'
    param_str: "'num_classes': 16"  # cls_num
  }
}

layer {
  bottom: "conv_new_1"
  top: "rfcn_cls"
  name: "rfcn_cls"
  type: "Convolution"
  convolution_param {
    num_output: 784  # cls_num * (score_maps_size^2)
    kernel_size: 1
    pad: 0
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
}

layer {
  bottom: "conv_new_1"
  top: "rfcn_bbox"
  name: "rfcn_bbox"
  type: "Convolution"
  convolution_param {
    num_output: 3136  # 4 * cls_num * (score_maps_size^2)
    kernel_size: 1
    pad: 0
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
}

layer {
  bottom: "rfcn_cls"
  bottom: "rois"
  top: "psroipooled_cls_rois"
  name: "psroipooled_cls_rois"
  type: "PSROIPooling"
  psroi_pooling_param {
    spatial_scale: 0.0625
    output_dim: 16  # cls_num
    group_size: 7
  }
}

layer {
  bottom: "rfcn_bbox"
  bottom: "rois"
  top: "psroipooled_loc_rois"
  name: "psroipooled_loc_rois"
  type: "PSROIPooling"
  psroi_pooling_param {
    spatial_scale: 0.0625
    output_dim: 64  # 4 * cls_num
    group_size: 7
  }
}
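All of the magic numbers in these layers derive from cls_num (here 16 = 15 object classes + __background__) and the 7×7 position-sensitive score map grid. A quick sketch of the relationships, so you can recompute them for your own class count:

```python
cls_num = 16      # 15 object classes + __background__
group_size = 7    # k x k position-sensitive score map grid

rfcn_cls_num_output  = cls_num * group_size ** 2      # rfcn_cls num_output
rfcn_bbox_num_output = 4 * cls_num * group_size ** 2  # rfcn_bbox num_output (class-aware)
cls_pool_output_dim  = cls_num                        # psroipooled_cls_rois output_dim
loc_pool_output_dim  = 4 * cls_num                    # psroipooled_loc_rois output_dim

print(rfcn_cls_num_output, rfcn_bbox_num_output)  # -> 784 3136
```

If any one of these values disagrees with num_classes in the Python layers, training will fail with a shape mismatch.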
3.2 Modify class-aware/test.prototxt
layer {
  bottom: "conv_new_1"
  top: "rfcn_cls"
  name: "rfcn_cls"
  type: "Convolution"
  convolution_param {
    num_output: 784  # cls_num * (score_maps_size^2)
    kernel_size: 1
    pad: 0
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
}

layer {
  bottom: "conv_new_1"
  top: "rfcn_bbox"
  name: "rfcn_bbox"
  type: "Convolution"
  convolution_param {
    num_output: 3136  # 4 * cls_num * (score_maps_size^2)
    kernel_size: 1
    pad: 0
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
}

layer {
  bottom: "rfcn_cls"
  bottom: "rois"
  top: "psroipooled_cls_rois"
  name: "psroipooled_cls_rois"
  type: "PSROIPooling"
  psroi_pooling_param {
    spatial_scale: 0.0625
    output_dim: 16  # cls_num
    group_size: 7
  }
}

layer {
  bottom: "rfcn_bbox"
  bottom: "rois"
  top: "psroipooled_loc_rois"
  name: "psroipooled_loc_rois"
  type: "PSROIPooling"
  psroi_pooling_param {
    spatial_scale: 0.0625
    output_dim: 64  # 4 * cls_num
    group_size: 7
  }
}

layer {
  name: "cls_prob_reshape"
  type: "Reshape"
  bottom: "cls_prob_pre"
  top: "cls_prob"
  reshape_param {
    shape {
      dim: -1
      dim: 16  # cls_num
    }
  }
}

layer {
  name: "bbox_pred_reshape"
  type: "Reshape"
  bottom: "bbox_pred_pre"
  top: "bbox_pred"
  reshape_param {
    shape {
      dim: -1
      dim: 64  # 4 * cls_num
    }
  }
}
3.3 Modify train_agnostic.prototxt
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 16"  # cls_num
  }
}

layer {
  bottom: "conv_new_1"
  top: "rfcn_cls"
  name: "rfcn_cls"
  type: "Convolution"
  convolution_param {
    num_output: 784  # cls_num * (score_maps_size^2) ###
    kernel_size: 1
    pad: 0
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
}

layer {
  bottom: "rfcn_cls"
  bottom: "rois"
  top: "psroipooled_cls_rois"
  name: "psroipooled_cls_rois"
  type: "PSROIPooling"
  psroi_pooling_param {
    spatial_scale: 0.0625
    output_dim: 16  # cls_num ###
    group_size: 7
  }
}
3.4 Modify train_agnostic_ohem.prototxt
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 16"  # cls_num ###
  }
}

layer {
  bottom: "conv_new_1"
  top: "rfcn_cls"
  name: "rfcn_cls"
  type: "Convolution"
  convolution_param {
    num_output: 784  # cls_num * (score_maps_size^2) ###
    kernel_size: 1
    pad: 0
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
}

layer {
  bottom: "rfcn_cls"
  bottom: "rois"
  top: "psroipooled_cls_rois"
  name: "psroipooled_cls_rois"
  type: "PSROIPooling"
  psroi_pooling_param {
    spatial_scale: 0.0625
    output_dim: 16  # cls_num ###
    group_size: 7
  }
}
3.5 Modify test_agnostic.prototxt
layer {
  bottom: "conv_new_1"
  top: "rfcn_cls"
  name: "rfcn_cls"
  type: "Convolution"
  convolution_param {
    num_output: 784  # cls_num * (score_maps_size^2) ###
    kernel_size: 1
    pad: 0
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
}

layer {
  bottom: "rfcn_cls"
  bottom: "rois"
  top: "psroipooled_cls_rois"
  name: "psroipooled_cls_rois"
  type: "PSROIPooling"
  psroi_pooling_param {
    spatial_scale: 0.0625
    output_dim: 16  # cls_num ###
    group_size: 7
  }
}

layer {
  name: "cls_prob_reshape"
  type: "Reshape"
  bottom: "cls_prob_pre"
  top: "cls_prob"
  reshape_param {
    shape {
      dim: -1
      dim: 16  # cls_num ###
    }
  }
}
3.6 Modify $RFCN/lib/datasets/pascal_voc.py

class pascal_voc(imdb):
    def __init__(self, image_set, year, devkit_path=None):
        imdb.__init__(self, 'voc_' + year + '_' + image_set)
        self._year = year
        self._image_set = image_set
        self._devkit_path = self._get_default_path() if devkit_path is None \
                            else devkit_path
        self._data_path = os.path.join(self._devkit_path, 'VOC' + self._year)
        self._classes = ('__background__',  # always index 0
                        'your_label_1', 'your_label_2',
                        'your_label_3', 'your_label_4')
4. Start training
./experiments/scripts/rfcn_end2end_ohem.sh 0 ResNet-101 pascal_voc
A few uncommon pitfalls hit during training:
1. After training finished, an error came up while testing:
File "/home/nextcar/Py-rfcn/py-R-FCN/tools/../lib/datasets/voc_eval.py", line 20, in parse_rec
obj_struct['truncated'] = int(obj.find('truncated').text)
AttributeError: 'NoneType' object has no attribute 'text'
My fix was to comment out that line in the file, because some of the newly added images have no truncated tag in their annotations.
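Instead of commenting the line out, a more defensive variant can treat a missing truncated tag as 0. A sketch of the idea (the helper name is mine, not the upstream voc_eval.py code):

```python
import xml.etree.ElementTree as ET

def parse_object(obj):
    """Parse one <object> element, tolerating a missing <truncated> tag."""
    node = obj.find('truncated')
    return {
        'name': obj.find('name').text,
        # Newly added images may lack <truncated>; default to 0 instead of crashing.
        'truncated': int(node.text) if node is not None else 0,
    }
```

This keeps the field available for annotations that do carry it, while degrading gracefully for the ones that do not.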
2. int(bbox.find('ymin').text) raises: ValueError: invalid literal for int() with base 10: '45.70000076293945'
Fix: in lib/datasets/voc_eval.py, change the conversion to:
obj_struct['bbox'] = [int(float(bbox.find('xmin').text)),
int(float(bbox.find('ymin').text)),
int(float(bbox.find('xmax').text)),
int(float(bbox.find('ymax').text))]
This is because some of the added data has calibration boxes with floating-point coordinates.
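The failure and the fix are easy to reproduce: int() rejects a string containing a decimal point, while going through float() first truncates the value cleanly.

```python
coord = '45.70000076293945'   # a calibrated box coordinate from the new data

try:
    int(coord)                # what voc_eval.py originally did
except ValueError as e:
    print('int() fails:', e)

print(int(float(coord)))      # the two-step conversion from the fix -> 45
```

Note that int(float(...)) truncates toward zero rather than rounding; for VOC-style pixel coordinates that is the conventional behavior.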