Parameters in Caffe
base_lr: the initial learning rate
momentum: the weight given to the previous gradient update
weight_decay: the regularization coefficient
These three parameters are the core of SGD. On base_lr and momentum, see: http://caffe.berkeleyvision.org/tutorial/solver.html
On weight_decay, see: http://stats.stackexchange.com/questions/29130/difference-between-neural-net-weight-decay-and-learning-rate
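The interaction of the three parameters can be sketched as a single SGD update step. This is a minimal illustration (the function name `sgd_update` is mine, not Caffe's): weight_decay adds an L2 penalty term to the gradient, and momentum carries a fraction of the previous update into the current one.

```python
def sgd_update(w, grad, v, base_lr, momentum, weight_decay):
    # L2 weight decay: the effective gradient is grad + weight_decay * w
    g = grad + weight_decay * w
    # momentum: new update = momentum * previous update - base_lr * gradient
    v_new = momentum * v - base_lr * g
    return w + v_new, v_new
```

With momentum = 0 and weight_decay = 0 this reduces to plain gradient descent, w - base_lr * grad.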
lr_policy: the learning-rate update rule (parameterized by gamma, power, step); see the Caffe source:
// Return the current learning rate. The currently implemented learning rate
// policies are as follows:
//    - fixed: always return base_lr.
//    - step: return base_lr * gamma ^ (floor(iter / step))
//    - exp: return base_lr * gamma ^ iter
//    - inv: return base_lr * (1 + gamma * iter) ^ (- power)
//    - multistep: similar to step but it allows non uniform steps defined by
//      stepvalue
//    - poly: the effective learning rate follows a polynomial decay, to be
//      zero by the max_iter. return base_lr * (1 - iter/max_iter) ^ (power)
//    - sigmoid: the effective learning rate follows a sigmoid decay
//      return base_lr * ( 1/(1 + exp(-gamma * (iter - stepsize))))
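The policies in the comment above can be written out directly. This is a sketch, not the Caffe implementation (the function name `caffe_lr` is mine; multistep is omitted for brevity):

```python
import math

def caffe_lr(policy, base_lr, iteration, gamma=0.1, power=1.0,
             stepsize=1, max_iter=1):
    # each branch mirrors one formula from the Caffe source comment
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (iteration // stepsize)
    if policy == "exp":
        return base_lr * gamma ** iteration
    if policy == "inv":
        return base_lr * (1 + gamma * iteration) ** (-power)
    if policy == "poly":
        return base_lr * (1 - iteration / max_iter) ** power
    if policy == "sigmoid":
        return base_lr * (1.0 / (1 + math.exp(-gamma * (iteration - stepsize))))
    raise ValueError("unknown lr_policy: " + policy)
```

For example, "step" with gamma = 0.1 and stepsize = 10 multiplies the learning rate by 0.1 every 10 iterations, while "poly" decays it smoothly to zero at max_iter.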
lr_mult: each layer has two lr_mult parameters; the layer's effective learning rate is base_lr * lr_mult. The first applies to the weights, the second to the bias.
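In a prototxt layer definition the two `param` blocks appear in that order, weights first, bias second. A typical (illustrative, not from any particular model) convolution layer:

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 }  # weights: effective lr = base_lr * 1
  param { lr_mult: 2 }  # bias:    effective lr = base_lr * 2
  convolution_param {
    num_output: 20
    kernel_size: 5
    weight_filler { type: "xavier" }
  }
}
```

Giving the bias twice the learning rate of the weights, as above, is a common convention in Caffe example models.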
xavier: a parameter-initialization trick; see Understanding the difficulty of training deep feedforward neural networks (Glorot & Bengio)
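As a sketch of the idea (the function name `xavier_fill` is mine): Caffe's "xavier" filler draws weights from a uniform distribution Uniform(-a, a) with, by default, a = sqrt(3 / fan_in), which gives Var(w) = 1 / fan_in. The Glorot & Bengio paper itself proposes a = sqrt(6 / (fan_in + fan_out)) so the variance is balanced between the forward and backward passes.

```python
import math
import random

def xavier_fill(fan_in, fan_out, n):
    # default (fan_in-based) scale, as in Caffe's xavier filler;
    # fan_out is unused here but kept for the paper's variant
    a = math.sqrt(3.0 / fan_in)
    return [random.uniform(-a, a) for _ in range(n)]
```

Keeping the variance tied to the fan-in prevents activations and gradients from shrinking or exploding as they pass through many layers.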