TensorFlow and Torch tips

  • Applying weight decay (L2 regularization via REGULARIZATION_LOSSES)
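# collect all trainable variables and print them for inspection
weights = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
for w in weights:
    print(w)
# build an L2 regularizer with scale 0.001 and apply it to the weights;
# the summed penalty is added to GraphKeys.REGULARIZATION_LOSSES
l2r = tf.contrib.layers.l2_regularizer(0.001)
tf.contrib.layers.apply_regularization(l2r, weights)
tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)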

# cross-entropy loss

tf.add_to_collection('losses', cross_entropy_mean)

loss = tf.add_n(tf.get_collection('losses'), name='cross_entropy_loss')

# configure the optimizer: add the L2 penalty to the loss before minimizing
target_loss = target_loss + tf.add_n(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES), name='l2_loss')
train_step = tf.train.AdamOptimizer(learning_rate).minimize(target_loss, global_step)
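
As a rough sketch of what the regularization term above contributes, the snippet below computes an L2 penalty in plain Python. It assumes that tf.contrib.layers.l2_regularizer(scale) evaluates to scale * sum(w ** 2) / 2 per weight tensor (the tf.nn.l2_loss convention); the helper names and the toy values are hypothetical and not part of the original notes.

```python
def l2_penalty(weights, scale):
    """Sum of scale * sum(w**2) / 2 over all weight tensors.

    Assumption: this mirrors the tf.nn.l2_loss convention (note the 1/2 factor).
    Each "tensor" here is just a flat list of floats.
    """
    return sum(scale * sum(w * w for w in tensor) / 2.0 for tensor in weights)


def total_loss(data_loss, weights, scale=0.001):
    # The optimizer minimizes the data loss plus the L2 penalty.
    return data_loss + l2_penalty(weights, scale)


# Toy example: two flattened weight tensors (hypothetical values).
toy_weights = [[1.0, -2.0], [3.0]]
penalty = l2_penalty(toy_weights, scale=0.001)
print(penalty)
```

The 1/2 factor matters when porting a weightDecay value from another framework: the same coefficient can correspond to a penalty that differs by a factor of two.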



 

  • Learning rate decay
global_step = tf.Variable(0, trainable=False,name = 'global_step')
learning_rate = tf.train.exponential_decay(opts.learning_rate, global_step, 10000, 0.96, staircase=True)
train_step = tf.train.AdamOptimizer(learning_rate).minimize(target_loss,global_step)
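
To make the effect of staircase=True concrete, here is a minimal pure-Python sketch of the exponential decay schedule. It assumes the documented formula learning_rate * decay_rate ^ (global_step / decay_steps), with integer division when staircase is set; the base rate 0.001 is a hypothetical stand-in for opts.learning_rate.

```python
def exponential_decay(base_lr, global_step, decay_steps, decay_rate, staircase=False):
    """Sketch of an exponential decay schedule (not the TensorFlow implementation)."""
    if staircase:
        # Integer division: the rate drops in discrete steps every decay_steps.
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return base_lr * decay_rate ** exponent


base_lr = 0.001  # hypothetical value for opts.learning_rate
for step in (0, 9999, 10000, 25000):
    print(step, exponential_decay(base_lr, step, 10000, 0.96, staircase=True))
```

With staircase=True the rate stays constant within each block of 10000 steps and is multiplied by 0.96 at each block boundary.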

 

  • Differences in learningRateDecay between TensorFlow and Torch
torch:  
 -- (3) learning rate decay (annealing)
   local clr = lr / (1 + state.t*lrd)

   state.t = state.t + 1

https://github.com/torch/optim/blob/master/adam.lua

tensorflow:
decayed_learning_rate = learning_rate *
                        decay_rate ^ (global_step / decay_steps)

https://www.tensorflow.org/versions/r0.11/api_docs/python/train/decaying_the_learning_rate

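In Torch the decay is applied once per batch. Suppose lrd = 0.001.

The TensorFlow counterpart would then seem to be decay_steps = 1 and decay_rate = 1 - lrd = 0.999, which would approximate the Torch schedule?

That is not right: TensorFlow has an equivalent schedule, tf.train.inverse_time_decay (use decay_steps = 1 and decay_rate = lrd).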
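
The comparison can be checked numerically with a small pure-Python sketch. It assumes tf.train.inverse_time_decay computes learning_rate / (1 + decay_rate * global_step / decay_steps), as its documentation states; the function names and the base rate 0.01 are hypothetical.

```python
def torch_lr(lr, t, lrd):
    # Torch optim: clr = lr / (1 + state.t * lrd)
    return lr / (1 + t * lrd)


def inverse_time_decay(lr, global_step, decay_steps, decay_rate):
    # Assumed formula for tf.train.inverse_time_decay (no staircase).
    return lr / (1 + decay_rate * global_step / decay_steps)


def exponential_decay(lr, global_step, decay_steps, decay_rate):
    # Formula quoted above for tf.train.exponential_decay (no staircase).
    return lr * decay_rate ** (global_step / decay_steps)


lr, lrd = 0.01, 0.001
for step in (0, 10, 1000, 10000):
    print(step,
          torch_lr(lr, step, lrd),
          inverse_time_decay(lr, step, 1, lrd),
          exponential_decay(lr, step, 1, 1 - lrd))
```

The exponential schedule tracks the Torch one only while step * lrd is small; after a few thousand steps it decays far faster, whereas the inverse-time schedule matches at every step.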

 

  • softmax in TensorFlow vs. LogSoftmax in Torch

tf.nn.softmax 

 exp(logits) / reduce_sum(exp(logits), dim)

tf.log(tf.nn.softmax(logits)) is not the same as Torch's LogSoftmax; Torch implements LogSoftmax differently:

https://github.com/torch/nn/blob/master/lib/THNN/generic/LogSoftMax.c

http://blog.csdn.net/lanchunhui/article/details/51248184 
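
A NumPy sketch of why the two can differ in practice. It assumes the Torch kernel follows the usual log-sum-exp formulation with the maximum subtracted first (my reading of the linked source, not something these notes confirm); both function names are hypothetical. TensorFlow also provides tf.nn.log_softmax, which may be the closer counterpart.

```python
import numpy as np


def naive_log_softmax(logits):
    """log(softmax(x)) computed literally, as in tf.log(tf.nn.softmax(logits))."""
    exps = np.exp(logits)
    with np.errstate(divide="ignore"):  # silence the log(0) warning
        return np.log(exps / np.sum(exps))


def stable_log_softmax(logits):
    """x - logsumexp(x), with the maximum subtracted for numerical stability."""
    m = np.max(logits)
    log_sum = m + np.log(np.sum(np.exp(logits - m)))
    return logits - log_sum


small = np.array([1.0, 2.0, 3.0])
extreme = np.array([0.0, -1000.0])

print(naive_log_softmax(small), stable_log_softmax(small))
print(naive_log_softmax(extreme), stable_log_softmax(extreme))
```

For moderate logits the two agree; for extreme logits the naive form underflows to log(0), while the log-sum-exp form keeps a finite value.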

 

  • saver

http://www.jianshu.com/p/8487db911d9a 

 

  • Differences in Dropout between TensorFlow and Torch
torch:
Furthermore, the outputs are scaled by a factor of 1/(1-p) during training. 

tensorflow:
With probability keep_prob, outputs the input element scaled up by 1 / keep_prob, otherwise outputs 0. The scaling is so that the expected sum is unchanged.

So a dropout rate of p in Torch corresponds to keep_prob = 1 - p in TensorFlow.
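
The correspondence can be illustrated with a small NumPy sketch of inverted dropout. Both functions are hypothetical re-implementations written from the two descriptions quoted above, not the frameworks' own code; the seed and sizes are arbitrary.

```python
import numpy as np


def torch_style_dropout(x, p, rng):
    """Keep each element with probability 1 - p; scale survivors by 1 / (1 - p)."""
    keep_mask = rng.random(x.shape) < (1.0 - p)
    return x * keep_mask / (1.0 - p)


def tf_style_dropout(x, keep_prob, rng):
    """Keep each element with probability keep_prob; scale survivors by 1 / keep_prob."""
    keep_mask = rng.random(x.shape) < keep_prob
    return x * keep_mask / keep_prob


p = 0.3
x = np.ones(10000)
# Same seed for both generators, so both draw the same random numbers.
out_torch = torch_style_dropout(x, p, np.random.default_rng(0))
out_tf = tf_style_dropout(x, 1.0 - p, np.random.default_rng(0))

print(np.array_equal(out_torch, out_tf))
print(out_tf.mean())
```

With the same random draws the two outputs coincide, and the mean stays close to the input mean because of the 1 / keep_prob scaling.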

 

  • Parameter (weight) order

conv: Torch outputs*inputs*kh*kw, TF kh*kw*inputs*outputs

deconv: Torch inputs*outputs*kh*kw, TF kh*kw*outputs*inputs

Mobile & MPS: outputs*kh*kw*inputs; note that for deconv the kh*kw kernel must be rotated 180 degrees
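
A NumPy sketch of the corresponding axis permutations, starting from Torch-layout weights. The sizes are hypothetical, and the 180-degree rotation is assumed to mean flipping both spatial axes (kh and kw) of the deconv kernel; verify against the target runtime before relying on it.

```python
import numpy as np

# Hypothetical sizes for illustration.
n_out, n_in, kh, kw = 8, 3, 5, 4

# conv: Torch (outputs, inputs, kh, kw)
torch_conv = np.random.rand(n_out, n_in, kh, kw)
tf_conv = torch_conv.transpose(2, 3, 1, 0)   # -> (kh, kw, inputs, outputs)
mps_conv = torch_conv.transpose(0, 2, 3, 1)  # -> (outputs, kh, kw, inputs)

# deconv: Torch (inputs, outputs, kh, kw)
torch_deconv = np.random.rand(n_in, n_out, kh, kw)
tf_deconv = torch_deconv.transpose(2, 3, 1, 0)  # -> (kh, kw, outputs, inputs)
# -> (outputs, kh, kw, inputs), then rotate each kh*kw kernel by 180 degrees
# (assumed to be a flip along both spatial axes).
mps_deconv = torch_deconv.transpose(1, 2, 3, 0)[:, ::-1, ::-1, :]

print(tf_conv.shape, mps_conv.shape, tf_deconv.shape, mps_deconv.shape)
```

transpose only permutes axes, so every value is preserved; the flip is the only step that reorders elements within a kernel.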

posted @ 2017-06-14 15:39  mlj318