Caffe data layer

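In a Caffe network the data layer is usually the bottom layer: data enters the network through a Data layer. For efficiency, data is usually read from a database (LevelDB or LMDB), or it can be fed directly from memory. If efficiency is not a concern, data can also be read from HDF5 files or from common image formats.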

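Common input preprocessing (mean subtraction, scaling, random cropping, mirroring) can be specified in the TransformationParameter (the `transform_param` field) of the layers that support it. When TransformationParameter is not available, the Bias, Scale and Crop layers can be used to transform the input.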
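To make the four preprocessing options concrete, here is a minimal sketch of what they do to one single-channel image during training, assuming the behaviour of Caffe's data transformer: a random crop offset (Caffe takes a center crop at test time instead), an optional random horizontal flip, then `(pixel - mean) * scale`, since the proto comments note that mean subtraction is always carried out before scaling. The function name `transform`, the scalar `mean`, and the nested-list image format are hypothetical simplifications chosen to keep the example dependency-free.

```python
import random

def transform(image, crop_size=0, mirror=False, mean=0.0, scale=1.0, rng=None):
    """Sketch of Caffe-style preprocessing on a 2-D (single-channel) image.

    image: list of rows, each row a list of numbers (hypothetical format).
    crop_size: 0 keeps the whole image, otherwise a random crop_size x crop_size patch.
    mirror: if True, flip the patch horizontally with probability 0.5.
    mean/scale: each output pixel is (pixel - mean) * scale.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    # Random crop offsets (training behaviour); no crop when crop_size == 0.
    if crop_size:
        h_off = rng.randrange(height - crop_size + 1)
        w_off = rng.randrange(width - crop_size + 1)
        out_h = out_w = crop_size
    else:
        h_off = w_off = 0
        out_h, out_w = height, width
    # Decide once per image whether to flip it horizontally.
    do_mirror = mirror and rng.randrange(2) == 1
    out = []
    for h in range(out_h):
        row = image[h_off + h][w_off:w_off + out_w]
        if do_mirror:
            row = row[::-1]
        # Mean subtraction happens before scaling.
        out.append([(p - mean) * scale for p in row])
    return out
```

For example, `transform([[10, 20], [30, 40]], mean=10, scale=0.5)` returns `[[0.0, 5.0], [10.0, 15.0]]`: each pixel first has the mean removed and is then halved.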

Here we only look at the commonly used Database layer, whose options are defined by the DataParameter message in caffe.proto:

message DataParameter {
  enum DB {
    LEVELDB = 0;
    LMDB = 1;
  }
  // Specify the data source.
  optional string source = 1;
  // Specify the batch size.
  optional uint32 batch_size = 4;
  // The rand_skip variable is for the data layer to skip a few data points
  // to avoid all asynchronous sgd clients to start at the same point. The skip
  // point would be set as rand_skip * rand(0,1). Note that rand_skip should not
  // be larger than the number of keys in the database.
  // DEPRECATED. Each solver accesses a different subset of the database.
  optional uint32 rand_skip = 7 [default = 0];
  optional DB backend = 8 [default = LEVELDB];
  // DEPRECATED. See TransformationParameter. For data pre-processing, we can do
  // simple scaling and subtracting the data mean, if provided. Note that the
  // mean subtraction is always carried out before scaling.
  optional float scale = 2 [default = 1];
  optional string mean_file = 3;
  // DEPRECATED. See TransformationParameter. Specify if we would like to randomly
  // crop an image.
  optional uint32 crop_size = 5 [default = 0];
  // DEPRECATED. See TransformationParameter. Specify if we want to randomly mirror
  // data.
  optional bool mirror = 6 [default = false];
  // Force the encoded image to have 3 color channels
  optional bool force_encoded_color = 9 [default = false];
  // Prefetch queue (Increase if data feeding bandwidth varies, within the
  // limit of device memory for GPU training)
  optional uint32 prefetch = 10 [default = 4];
}

The required parameters are:

source: path to the database

batch_size: number of images processed in one batch
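As an illustration of how the two required fields appear in a network definition, the following minimal sketch renders a Data layer in the usual `layer { ... data_param { ... } }` prototxt syntax. The helper name `data_layer_prototxt`, the layer and top names, and the example path are hypothetical; `backend` is only written when given, so an omitted backend falls back to the proto default (LEVELDB).

```python
def data_layer_prototxt(source, batch_size, backend=None, name="data"):
    """Render a Caffe Data layer as prototxt text (hypothetical helper)."""
    if not source:
        raise ValueError("source (database path) is required")
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    lines = [
        "layer {",
        f'  name: "{name}"',
        '  type: "Data"',
        '  top: "data"',
        '  top: "label"',
        "  data_param {",
        f'    source: "{source}"',
        f"    batch_size: {batch_size}",
    ]
    if backend is not None:
        # Only LEVELDB and LMDB are valid values of the DB enum.
        if backend not in ("LEVELDB", "LMDB"):
            raise ValueError("backend must be LEVELDB or LMDB")
        lines.append(f"    backend: {backend}")
    lines += ["  }", "}"]
    return "\n".join(lines) + "\n"
```

Calling `data_layer_prototxt("path/to/train_lmdb", 64, backend="LMDB")` yields a layer whose `data_param` contains `source`, `batch_size: 64` and `backend: LMDB`.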

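Optional parameters:

rand_skip: skip a random number of records (less than rand_skip) before reading starts, so that asynchronous SGD clients do not all start at the same point; the proto marks it as DEPRECATED

backend: database type, LEVELDB (default) or LMDB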
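The rand_skip behaviour can be sketched as follows, assuming the reading of the proto comment that the skip point is `rand_skip * rand(0,1)`, taken here as a uniform integer in `[0, rand_skip)`, and that the reader wraps back to the first record when it reaches the end of the database. The function names and the list standing in for the database cursor are hypothetical.

```python
import random

def start_position(rand_skip, num_keys, rng=None):
    """Pick how many records to skip before reading (sketch of rand_skip)."""
    # The proto comment says rand_skip should not exceed the number of keys.
    if rand_skip > num_keys:
        raise ValueError("rand_skip should not be larger than the number of keys")
    if rand_skip == 0:
        return 0
    rng = rng or random.Random()
    return rng.randrange(rand_skip)  # uniform integer in [0, rand_skip)

def read_batches(keys, batch_size, skip):
    """Yield batches of keys starting after `skip` records, wrapping at the end."""
    pos = skip
    while True:
        batch = []
        for _ in range(batch_size):
            batch.append(keys[pos])
            pos = (pos + 1) % len(keys)  # restart from the first record
        yield batch
```

With `keys = list("abcde")`, a skip of 3 and a batch size of 3, the first two batches are `["d", "e", "a"]` and `["b", "c", "d"]`; two clients that draw different skips therefore see the data in different phases.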

 

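The Database layer has no shuffle option among its parameters, so make sure the data is shuffled when it is converted to LMDB or LevelDB. Otherwise, if many samples with the same label are stored next to each other, the loss will oscillate strongly and cyclically during training.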
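A minimal sketch of shuffling before conversion, assuming the plain-text list format read by Caffe's `convert_imageset` tool (one `relative/path label` entry per line; that tool also has a `--shuffle` flag that does this for you). The file names and the sorted two-class list are hypothetical; `longest_label_run` measures the longest block of identical consecutive labels, which is what causes the loss oscillation described above.

```python
import random

def longest_label_run(labels):
    """Length of the longest block of consecutive identical labels."""
    best = run = 0
    prev = object()  # sentinel that never equals a real label
    for label in labels:
        run = run + 1 if label == prev else 1
        best = max(best, run)
        prev = label
    return best

def shuffle_entries(lines, seed=0):
    """Return the 'path label' lines in a reproducible random order."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled

# Hypothetical sorted list: 100 images of class 0 followed by 100 of class 1.
entries = [f"img_{i:03d}.jpg {i // 100}" for i in range(200)]
labels_before = [line.split()[1] for line in entries]
labels_after = [line.split()[1] for line in shuffle_entries(entries)]
```

Before shuffling the longest same-label run is 100 samples; after shuffling it drops to a handful. The shuffled lines would then be written to the list file used for the LMDB/LevelDB conversion.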

 

In deploy.prototxt, the input layer takes the parameter input_param { shape: { dim: 1 dim: 3 dim: 18 dim: 18 } }; the four dims are, in order, batch size, channels, rows (height) and cols (width).
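Because a Caffe blob stores its data in row-major N x C x H x W order, element (n, c, h, w) lives at offset ((n * C + c) * H + h) * W + w. The sketch below computes the blob size and that offset for the shape above, and reorders an image loaded as rows x cols x channels (a common layout when reading image files) into the channels x rows x cols order the input expects. The helper names are hypothetical.

```python
def blob_count(shape):
    """Total number of elements in a blob with the given (N, C, H, W) shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count

def offset(shape, n, c, h, w):
    """Row-major index of element (n, c, h, w) in an N x C x H x W blob."""
    _, channels, height, width = shape
    return ((n * channels + c) * height + h) * width + w

def hwc_to_chw(image):
    """Reorder an image stored as rows x cols x channels into channels x rows x cols."""
    rows, cols, channels = len(image), len(image[0]), len(image[0][0])
    return [[[image[h][w][c] for w in range(cols)] for h in range(rows)]
            for c in range(channels)]

shape = (1, 3, 18, 18)  # dims from input_param: batch, channels, rows, cols
```

For this shape `blob_count(shape)` is 1 * 3 * 18 * 18 = 972, and the last element `offset(shape, 0, 2, 17, 17)` is 971.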

posted on 2017-03-10 10:33  klitech
