TensorFlow / Torch tips
- Apply weight decay / L2 regularization via REGULARIZATION_LOSSES (assembled sketch at the end of this item)
import tensorflow as tf

# collect every trainable variable and attach an L2 penalty to it;
# apply_regularization also registers the result in GraphKeys.REGULARIZATION_LOSSES
weights = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
for w in weights:
    print(w)
l2r = tf.contrib.layers.l2_regularizer(0.001)
tf.contrib.layers.apply_regularization(l2r, weights)
tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
## cross_entropy loss
tf.add_to_collection('losses', cross_entropy_mean)
loss = tf.add_n(tf.get_collection('losses'), name='cross_entropy_loss')
# config optimizer
target_loss = target_loss + tf.add_n(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES), name='l2_loss')
train_step = tf.train.AdamOptimizer(learning_rate).minimize(target_loss, global_step)
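Putting the pieces above together, a minimal sketch of the whole objective (assuming cross_entropy_mean, learning_rate and global_step are already defined as in the surrounding snippets):
# data loss: sum of everything registered in the 'losses' collection
data_loss = tf.add_n(tf.get_collection('losses'), name='cross_entropy_loss')
# weight decay: sum of all registered L2 penalties
l2_loss = tf.add_n(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES), name='l2_loss')
target_loss = data_loss + l2_loss
train_step = tf.train.AdamOptimizer(learning_rate).minimize(target_loss, global_step=global_step)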
- learningRateDecay (learning rate decay)
global_step = tf.Variable(0, trainable=False, name='global_step')
# decay the learning rate by a factor of 0.96 every 10000 steps
learning_rate = tf.train.exponential_decay(opts.learning_rate, global_step, 10000, 0.96, staircase=True)
train_step = tf.train.AdamOptimizer(learning_rate).minimize(target_loss, global_step)
- Differences in learningRateDecay between Torch and TensorFlow
torch (https://github.com/torch/optim/blob/master/adam.lua):
-- (3) learning rate decay (annealing)
local clr = lr / (1 + state.t*lrd)
state.t = state.t + 1
tensorflow (https://www.tensorflow.org/versions/r0.11/api_docs/python/train/decaying_the_learning_rate):
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
In Torch the decay is applied once per batch; suppose lrd = 0.001.
The TensorFlow counterpart would then seem to be: set decay_steps = 1 and decay_rate = 1 - lrd = 0.999, which only roughly approximates the Torch schedule?
Actually no: TensorFlow already has an exact equivalent, tf.train.inverse_time_decay.
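A minimal sketch of how tf.train.inverse_time_decay reproduces the Torch annealing formula; decay_steps = 1 matches the per-batch update and decay_rate plays the role of Torch's lrd (0.001 here is just the example value from above):
# inverse_time_decay: lr_t = learning_rate / (1 + decay_rate * global_step / decay_steps)
# with decay_steps=1 and decay_rate=lrd this is exactly Torch's clr = lr / (1 + state.t*lrd)
global_step = tf.Variable(0, trainable=False, name='global_step')
learning_rate = tf.train.inverse_time_decay(opts.learning_rate, global_step,
                                            decay_steps=1, decay_rate=0.001)
train_step = tf.train.AdamOptimizer(learning_rate).minimize(target_loss, global_step)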
- softmax in TensorFlow vs LogSoftmax in Torch
tf.nn.softmax
exp(logits) / reduce_sum(exp(logits), dim)
tf.log(tf.nn.softmax(logits)) does not behave like Torch's LogSoftmax in practice; Torch's LogSoftMax is implemented differently (it subtracts the max and computes the log-sum-exp form directly, which is numerically stable):
https://github.com/torch/nn/blob/master/lib/THNN/generic/LogSoftMax.c
http://blog.csdn.net/lanchunhui/article/details/51248184
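TensorFlow ships the same numerically stable variant as tf.nn.log_softmax, so the direct translation of Torch's LogSoftmax is that op rather than log(softmax). A small sketch (the logit values are only chosen to make the difference visible):
import tensorflow as tf

logits = tf.constant([[1000.0, 0.0, -1000.0]])
naive = tf.log(tf.nn.softmax(logits))   # underflows to -inf for the small-probability entries
stable = tf.nn.log_softmax(logits)      # log-sum-exp form, matches Torch's LogSoftMax
with tf.Session() as sess:
    print(sess.run(naive))   # [[  0. -inf -inf]]
    print(sess.run(stable))  # [[    0. -1000. -2000.]]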
- Saver (tf.train.Saver, checkpointing)
http://www.jianshu.com/p/8487db911d9a
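A minimal tf.train.Saver sketch (the checkpoint path is a placeholder and global_step is assumed to be defined as above):
saver = tf.train.Saver(max_to_keep=5)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training ...
    saver.save(sess, './checkpoints/model.ckpt', global_step=global_step)

# later, restore the latest checkpoint
with tf.Session() as sess:
    ckpt = tf.train.latest_checkpoint('./checkpoints')
    saver.restore(sess, ckpt)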
- Differences in Dropout between TensorFlow and Torch
torch: "Furthermore, the outputs are scaled by a factor of 1/(1-p) during training."
tensorflow: "With probability keep_prob, outputs the input element scaled up by 1 / keep_prob, otherwise outputs 0. The scaling is so that the expected sum is unchanged."
So Torch's dropout_rate = p corresponds to TensorFlow's keep_prob = 1 - p.
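For example, a Torch nn.Dropout(0.2) layer (p = 0.2, drop 20% of units) corresponds to the following in TensorFlow (h and the rate 0.2 are just illustrative):
drop_p = 0.2                                  # Torch dropout probability p
h = tf.nn.dropout(h, keep_prob=1.0 - drop_p)  # TF keeps units with prob 1-p and scales by 1/(1-p)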
- Parameter ordering (weight layouts)
conv: torch outputs*inputs*kh*kw, tf kh*kw*inputs*outputs
deconv: torch inputs*outputs*kh*kw, tf kh*kw*outputs*inputs
mobile / MPS: outputs*kh*kw*inputs; note that for deconv the kh*kw kernel must additionally be rotated 180 degrees
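A small numpy sketch of converting Torch conv/deconv weights into the layouts listed above (the weight arrays and their shapes are hypothetical):
import numpy as np

torch_conv_w = np.random.randn(16, 3, 3, 3)     # hypothetical conv weight, (out, in, kh, kw)
torch_deconv_w = np.random.randn(3, 16, 4, 4)   # hypothetical deconv weight, (in, out, kh, kw)

# conv:   torch (out, in, kh, kw)  ->  tf (kh, kw, in, out)
tf_conv_w = np.transpose(torch_conv_w, (2, 3, 1, 0))
# deconv: torch (in, out, kh, kw)  ->  tf (kh, kw, out, in)
tf_deconv_w = np.transpose(torch_deconv_w, (2, 3, 1, 0))

# mobile / MPS layout: (out, kh, kw, in)
mps_conv_w = np.transpose(torch_conv_w, (0, 2, 3, 1))
# for deconv, additionally rotate the kernel 180 degrees in the spatial dims
mps_deconv_w = np.transpose(torch_deconv_w[:, :, ::-1, ::-1], (1, 2, 3, 0))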