1. tf.data
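Advantages over feed_dict and placeholders: common data operations (e.g. shuffle/batch/repeat/map) are built into TensorFlow, so the input pipeline runs efficiently. It is also a high-level API and easy to use.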
Print the dataset's output types and shapes as a sanity check:
```python
print(xxxdataset.output_types)   # >> (tf.float32, tf.float32)
print(xxxdataset.output_shapes)  # >> (TensorShape([]), TensorShape([]))
```
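```python
# Prepare the data
dataset = tf.data.TFRecordDataset([file1, file2, file3, ...])

# Dataset transformations (each call returns a new dataset)
dataset = dataset.shuffle(1000)  # shuffle using a 1000-element buffer
dataset = dataset.repeat(100)    # iterate over the data 100 times
dataset = dataset.batch(128)     # group consecutive elements into batches of 128
dataset = dataset.map(lambda x: tf.one_hot(x, 10))  # convert to one-hot encoding
# Note: TFRecordDataset yields serialized records, so a real pipeline first maps a
# parsing function; tf.one_hot only applies to integer label tensors.

# Fetch data
iterator = dataset.make_one_shot_iterator()  # one way to get an iterator; a more general one follows
X, Y = iterator.get_next()  # assumes (x, y) elements: one batch per call if batched above, else one sample
```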
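tf.data runs these ops inside TensorFlow. The pure-Python sketch below makes their semantics concrete. The functions `shuffle`, `repeat`, `batch` and `one_hot` are hypothetical stand-ins, not the tf.data API. The buffered shuffle approximates the documented idea behind `Dataset.shuffle(buffer_size)`: only a window of `buffer_size` elements is randomized at a time. The sketch keeps the same order as the pipeline above, which has three consequences. Because `repeat` follows `shuffle`, each pass is shuffled separately. Because `repeat` precedes `batch`, a batch can span two passes. Because `batch` precedes `map`, the mapped function receives a whole batch.

```python
import random
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def shuffle(items: Iterable[T], buffer_size: int, rng: random.Random) -> Iterator[T]:
    """Buffered shuffle: hold up to buffer_size elements, emit a random one."""
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]  # move the pick to the end
            yield buffer.pop()
    rng.shuffle(buffer)  # input exhausted: drain the rest in random order
    yield from buffer


def repeat(make_pass: Callable[[], Iterable[T]], count: int) -> Iterator[T]:
    """Concatenate `count` passes; each pass is rebuilt, so it is reshuffled."""
    for _ in range(count):
        yield from make_pass()


def batch(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group consecutive elements; the final batch may be smaller."""
    current: List[T] = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


def one_hot(labels: List[int], depth: int = 10) -> List[List[int]]:
    # Applied per batch, because batch comes before map in this pipeline.
    return [[1 if j == y else 0 for j in range(depth)] for y in labels]


data = list(range(10))  # hypothetical integer labels
rng = random.Random(0)
# Same order as the tf.data pipeline: shuffle -> repeat -> batch -> map.
stream = repeat(lambda: shuffle(data, buffer_size=4, rng=rng), count=2)
result = list(map(one_hot, batch(stream, batch_size=3)))
```

With 10 labels repeated twice and a batch size of 3, this gives six full batches and a final batch of 2. The fourth batch contains elements from both passes.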
```python
# One iterator shared by several datasets that have the same structure
iterator = tf.data.Iterator.from_structure(train_data.output_types,
                                           train_data.output_shapes)
img, label = iterator.get_next()
train_init = iterator.make_initializer(train_data)  # initializer for train_data
test_init = iterator.make_initializer(test_data)    # initializer for test_data
# ...
sess.run(train_init)  # img, label now yield samples from the training set
# ...
sess.run(test_init)   # img, label now yield samples from the test set
# The single `img, label = iterator.get_next()` above serves both datasets with no
# naming conflict: whichever initializer ran last decides which dataset it reads from.
```
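The "no naming conflict" point can be illustrated with a toy Python analogue. `ReinitializableIterator` here is a hypothetical class, not `tf.data.Iterator`: it has one `get_next()` entry point, and each initializer re-points it at a different dataset. One difference from TensorFlow matters. In TF, `get_next()` is called once to build tensors, and each `sess.run` pulls a new element. In the toy version, `get_next()` is simply called directly.

```python
from typing import Any, Callable, Iterable, Iterator, Optional


class ReinitializableIterator:
    """Toy stand-in: one get_next() whose source is chosen by the last initializer."""

    def __init__(self) -> None:
        self._source: Optional[Iterator[Any]] = None

    def make_initializer(self, dataset: Iterable[Any]) -> Callable[[], None]:
        def init() -> None:
            self._source = iter(dataset)  # (re)start at the dataset's first element
        return init

    def get_next(self) -> Any:
        if self._source is None:
            raise RuntimeError("run an initializer first")
        return next(self._source)


train_data = [("train_img_0", 0), ("train_img_1", 1)]  # hypothetical samples
test_data = [("test_img_0", 7)]

iterator = ReinitializableIterator()
train_init = iterator.make_initializer(train_data)
test_init = iterator.make_initializer(test_data)

train_init()
img, label = iterator.get_next()            # read from train_data
test_init()
test_img, test_label = iterator.get_next()  # same iterator, now reading test_data
train_init()
restart_img, _ = iterator.get_next()        # re-running an initializer restarts it
```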
2. Optimizer cheat sheet
```python
# Create an optimizer.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# Compute the gradients for a list of variables
# (var_list defaults to all trainable variables if omitted).
grads_and_vars = optimizer.compute_gradients(loss, var_list=<list of variables>)

# grads_and_vars is a list of (gradient, variable) tuples. Do whatever you need
# to the gradient part, for example subtract 1 from each of them.
subtracted_grads_and_vars = [(g - 1.0, v) for g, v in grads_and_vars]

# Ask the optimizer to apply the modified gradients. This returns a training op,
# which still has to be run in a session.
train_op = optimizer.apply_gradients(subtracted_grads_and_vars)
```
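The compute, modify, apply pattern can be tried without TensorFlow. The sketch below uses plain functions named after the optimizer methods. Those functions, and the loss (w - 3)^2 + (b + 1)^2, are hypothetical. The gradient with respect to `w` is 2(w - 3), so subtracting 1 moves the point where the applied update vanishes from w = 3 to w = 3.5. Ending at 3.5 confirms that the modified gradients, not the raw ones, are what get applied. `b` is left out of `var_list`, so it never changes.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def grad_fn(p: Params) -> Params:
    # Analytic gradients of the hypothetical loss (w - 3)^2 + (b + 1)^2.
    return {"w": 2.0 * (p["w"] - 3.0), "b": 2.0 * (p["b"] + 1.0)}


def compute_gradients(grads: Callable[[Params], Params], params: Params,
                      var_list: List[str]) -> List[Tuple[float, str]]:
    g = grads(params)
    return [(g[name], name) for name in var_list]  # (gradient, variable) pairs


def apply_gradients(params: Params, grads_and_vars: List[Tuple[float, str]],
                    learning_rate: float) -> None:
    for grad, name in grads_and_vars:
        params[name] -= learning_rate * grad  # plain gradient-descent update


params = {"w": 0.0, "b": 0.0}
for _ in range(200):
    grads_and_vars = compute_gradients(grad_fn, params, var_list=["w"])
    modified = [(g - 1.0, v) for g, v in grads_and_vars]  # same edit as the notes
    apply_gradients(params, modified, learning_rate=0.1)
```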
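```python
tf.stop_gradient(input, name=None)
```
Returns `input` unchanged in the forward pass. During backpropagation it is treated as a constant, so no gradient flows back through it.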
- Example use cases:
- When you train a GAN (Generative Adversarial Network) where no backprop should happen through the adversarial example generation process.
- The EM algorithm where the M-step should not involve backpropagation through the output of the E-step
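TensorFlow computes gradients with reverse-mode autodiff. The sketch below swaps in forward-mode dual numbers only because they fit in a few lines. `Dual`, `stop_gradient` and `f` are hypothetical illustrations, not TF functions. The effect of stopping the gradient is the same in both modes: the forward value is unchanged, and the blocked path contributes nothing to the derivative.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """Forward-mode AD value: `val` and its derivative `der` with respect to x."""
    val: float
    der: float

    def __mul__(self, other: "Dual") -> "Dual":
        # Product rule.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)


def stop_gradient(v: Dual) -> Dual:
    # The forward value passes through unchanged; the derivative is cut to zero.
    return Dual(v.val, 0.0)


def f(x: Dual, block: bool) -> Dual:
    # f(x) = x * g(x) with g(x) = x * x, optionally blocking gradients through g.
    g = x * x
    if block:
        g = stop_gradient(g)
    return x * g


x = Dual(2.0, 1.0)          # seed dx/dx = 1
plain = f(x, block=False)   # derivative of x^3 is 3x^2 = 12 at x = 2
blocked = f(x, block=True)  # g is treated as the constant 4, so the derivative is 4
```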
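```python
tf.gradients(ys, xs, grad_ys=None, name='gradients',
             colocate_gradients_with_ops=False, gate_gradients=False,
             aggregation_method=None, stop_gradients=None)
```
Constructs the symbolic derivatives of `sum(ys)` with respect to each tensor in `xs`, and returns a list with one gradient per element of `xs`.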
Technical detail: this is especially useful when training only parts of a model, i.e. freezing some layers and training only the others, as in fine-tuning. For example, we can use tf.gradients() to take the derivative G of the loss w.r.t. the middle layer. Then we use an optimizer to minimize the difference between the middle-layer output M and the target M - G, with the target held fixed (e.g. wrapped in tf.stop_gradient). This updates only the lower half of the network.
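Here is a small numeric check of this trick on a hypothetical scalar model. The lower half has parameter `a`, the upper half has parameter `c`, and the loss is 0.5(y - t)^2. Gradients are approximated with central finite differences instead of tf.gradients, so the snippet runs without TensorFlow. The check shows three things. First, with the target M - G held fixed, the surrogate's gradient w.r.t. `a` matches the true loss gradient. Second, the surrogate does not depend on `c` at all, so only the lower half is updated. Third, a fixed target of M + G would flip the sign of that gradient.

```python
from typing import Callable, Tuple


def forward(a: float, c: float, x: float) -> Tuple[float, float]:
    m = a * x  # middle-layer output M; the lower half has parameter a
    y = c * m  # the upper half has parameter c
    return m, y


def loss(a: float, c: float, x: float, t: float) -> float:
    _, y = forward(a, c, x)
    return 0.5 * (y - t) ** 2


def numeric_grad(fn: Callable[[float], float], v: float, eps: float = 1e-6) -> float:
    # Central finite difference.
    return (fn(v + eps) - fn(v - eps)) / (2 * eps)


a, c, x, t = 0.7, -1.3, 2.0, 0.5  # hypothetical values
m, _ = forward(a, c, x)
G = numeric_grad(lambda m_: 0.5 * (c * m_ - t) ** 2, m)  # dL/dM


def surrogate(a_: float, c_: float, target: float) -> float:
    # Squared distance between M and a target passed in as a fixed number,
    # which is the role tf.stop_gradient plays in a TensorFlow graph.
    m_, _ = forward(a_, c_, x)
    return 0.5 * (m_ - target) ** 2


true_grad_a = numeric_grad(lambda a_: loss(a_, c, x, t), a)
minus_grad_a = numeric_grad(lambda a_: surrogate(a_, c, m - G), a)
plus_grad_a = numeric_grad(lambda a_: surrogate(a_, c, m + G), a)
surrogate_grad_c = numeric_grad(lambda c_: surrogate(a, c_, m - G), c)
```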