cs20_4-1
1. Quick notes on eager execution
- Most plain Python code works directly, not just tf.xxx ops
- Is compatible with Python debugging tools, e.g. pdb.set_trace()
- Provides immediate error reporting; no more waiting for sess.run(...) to surface the error
- Permits use of Python data structures, although I find tf.data quite good already, so this benefit remains to be seen
- Enables you to use and differentiate through Python control flow; this is worth looking forward to, since tf.cond is verbose and arguably redundant by design
- Eager is supported as of at least TF 1.8; enabling it only takes a few lines:
    import tensorflow as tf
    import tensorflow.contrib.eager as tfe
    tfe.enable_eager_execution()  # Call this at program start-up
- A minimal example:
    x = [[2.]]           # No need for placeholders!
    m = tf.matmul(x, x)
    print(m)             # No sessions!
    # tf.Tensor([[4.]], shape=(1, 1), dtype=float32)
2. Two examples
- Linear regression in eager mode (a minimal sketch follows this list)
- word2vec in eager mode
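A minimal sketch of linear regression in eager mode, assuming toy data and variable names of my own (w, b, xs, ys); the lecture's actual example differs in detail:
    import tensorflow as tf
    import tensorflow.contrib.eager as tfe
    tfe.enable_eager_execution()

    w = tfe.Variable(0.0)
    b = tfe.Variable(0.0)

    def loss(x, y):
        # mean squared error of the linear model w * x + b
        return tf.reduce_mean(tf.square(y - (w * x + b)))

    grad_fn = tfe.implicit_gradients(loss)            # returns (gradient, variable) pairs
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)

    xs = tf.constant([1.0, 2.0, 3.0, 4.0])            # toy data: y = 2x
    ys = tf.constant([2.0, 4.0, 6.0, 8.0])
    for step in range(200):
        optimizer.apply_gradients(grad_fn(xs, ys))    # no session, no placeholders
    print(w.numpy(), b.numpy())                       # w approaches 2, b approaches 0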
3. Some eager-mode features
- Gradients
    - Example 1:
        def square(x):
            return x ** 2

        grad = tfe.gradients_function(square)  # feels more like obtaining a set of derivative functions
        print(square(3.))  # tf.Tensor(9., shape=(), dtype=float32)
        print(grad(3.))    # [tf.Tensor(6., shape=(), dtype=float32)]
    - Example 2:
        x = tfe.Variable(2.0)                # the recommended way to define variables in eager mode
        def loss(y):
            return (y - x ** 2) ** 2         # write the formula in plain Python

        grad = tfe.implicit_gradients(loss)  # implicitly yields a set of partial-derivative functions (my analogy)
        print(loss(7.))  # tf.Tensor(9., shape=(), dtype=float32)
        print(grad(7.))  # [(<tf.Tensor: -24.0, shape=(), dtype=float32>, <tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>)]
    - Interestingly, even when eager mode is not enabled, the following gradient-related functions are still usable:
        tfe.gradients_function()
        tfe.value_and_gradients_function()  # returns the value in addition to the gradient
        tfe.implicit_gradients()
        tfe.implicit_value_and_gradients()
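    A small sketch of tfe.value_and_gradients_function, reusing the square function above; the printed values are my expectation, not output copied from the lecture:
        def square(x):
            return x ** 2

        val_grad = tfe.value_and_gradients_function(square)
        value, grads = val_grad(3.)  # value of square(3.) plus its gradients w.r.t. the inputs
        print(value)                 # expected: tf.Tensor(9., shape=(), dtype=float32)
        print(grads)                 # expected: [tf.Tensor(6., shape=(), dtype=float32)]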
4. Many topics illustrated with word2vec
- NCE loss (a hedged usage sketch follows the reference below)
- Understanding word2vec & embeddings
- Reference:
    - http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
    - Note that sample-based approaches, whether it's negative sampling or NCE, are only useful at training time -- during inference, the full softmax still needs to be computed to obtain a normalized probability. (In other words, NCE / negative sampling only speeds up training; at test time the full softmax is still needed to get probabilities over all words.)
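A hedged sketch of how NCE loss is typically wired into skip-gram training, contrasted with the full softmax needed at inference; the variable names and hyper-parameters (VOCAB_SIZE, NUM_SAMPLED, etc.) are placeholders of my own, not necessarily the lecture's:
    import tensorflow as tf

    VOCAB_SIZE, EMBED_SIZE, NUM_SAMPLED = 50000, 128, 64       # assumed hyper-parameters

    center_words = tf.placeholder(tf.int32, shape=[None])      # batch of center-word ids
    target_words = tf.placeholder(tf.int32, shape=[None, 1])   # batch of target-word ids

    embed_matrix = tf.get_variable('embed_matrix', [VOCAB_SIZE, EMBED_SIZE],
                                   initializer=tf.random_uniform_initializer())
    embed = tf.nn.embedding_lookup(embed_matrix, center_words)

    nce_weight = tf.get_variable('nce_weight', [VOCAB_SIZE, EMBED_SIZE],
                                 initializer=tf.truncated_normal_initializer())
    nce_bias = tf.get_variable('nce_bias', [VOCAB_SIZE], initializer=tf.zeros_initializer())

    # training time: sample-based loss, only NUM_SAMPLED negatives per step (cheap)
    loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight, biases=nce_bias,
                                         labels=target_words, inputs=embed,
                                         num_sampled=NUM_SAMPLED, num_classes=VOCAB_SIZE))

    # inference time: full softmax over the whole vocabulary (expensive but normalized)
    logits = tf.matmul(embed, nce_weight, transpose_b=True) + nce_bias
    probs = tf.nn.softmax(logits)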
- Building and training a network in an OOP style
    - Pipeline:
      Phase 1: assemble your graph
        - Import data (either with tf.data or with placeholders)
        - Define the weights
        - Define the inference model
        - Define the loss function
        - Define the optimizer
      Phase 2: execute the computation
        - Initialize all model variables for the first time
        - Initialize the iterator / feed in the training data
        - Execute the inference model on the training data, so it calculates, for each training input example, the output with the current model parameters
        - Compute the cost
        - Adjust the model parameters to minimize/maximize the cost depending on the model
    - Template code (graph-assembly skeleton; a sketch of the execution phase follows below):
        class SkipGramModel:
            """ Build the graph for the word2vec model """
            def __init__(self, params):
                pass

            def _import_data(self):
                """ Step 1: import data """
                pass

            def _create_embedding(self):
                """ Step 2: in word2vec, it's actually the weights that we care about """
                pass

            def _create_loss(self):
                """ Step 3 + 4: define the inference + the loss function """
                pass

            def _create_optimizer(self):
                """ Step 5: define the optimizer """
                pass
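    A hedged sketch of how Phase 2 could drive this class once the graph is assembled; the build_graph() wrapper, the model.loss / model.optimizer attributes, and num_train_steps are assumptions of mine, not the lecture's exact code:
        model = SkipGramModel(params)
        model.build_graph()  # assumed wrapper that calls the _create_* methods in order

        saver = tf.train.Saver()
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())  # initialize all model variables
            for step in range(num_train_steps):
                # running the optimizer executes the inference model, computes the loss,
                # and adjusts the parameters in one step
                loss_batch, _ = sess.run([model.loss, model.optimizer])
                if (step + 1) % 1000 == 0:
                    saver.save(sess, 'checkpoints/skip-gram', global_step=step)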
- Visualization techniques: t-SNE, plus tf.summary, tf.train.Saver / saver.restore; this is somewhat involved, so I'll revisit it after going through it one more time
- Name scope: its core function is that tensors under the same name scope are grouped together (collapsed into one super-node) in the computation graph, which makes the graph cleaner and easier to read (see the sketch below)
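A minimal sketch of grouping ops with tf.name_scope; the 'data' / 'loss' scope names are just for illustration:
    import tensorflow as tf

    with tf.name_scope('data'):
        x = tf.placeholder(tf.float32, shape=[None, 10], name='x')
        y = tf.placeholder(tf.float32, shape=[None, 1], name='y')

    with tf.name_scope('loss'):
        w = tf.Variable(tf.zeros([10, 1]), name='w')
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y), name='mse')

    # In the TensorBoard graph, 'data' and 'loss' each show up as one collapsible super-node.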
- The three kinds of edges in TensorBoard: (1) solid grey arrows: data flow; (2) solid orange arrows: reference/influence relationships, e.g. the optimizer node influences w and b through backprop; (3) dotted arrows: dependency/control relationships, e.g. the weights node only works after init has run
- Variable scope: different from name scope in purpose; it is mainly for variable sharing and better code reuse, while also providing name-scope-style grouping. An example:
    def fully_connected(x, output_dim, scope):
        with tf.variable_scope(scope) as scope:  # makes the variables below shareable/reusable
            w = tf.get_variable("weights", [x.shape[1], output_dim],
                                initializer=tf.random_normal_initializer())
            b = tf.get_variable("biases", [output_dim],
                                initializer=tf.constant_initializer(0.0))
            return tf.matmul(x, w) + b

    def two_hidden_layers(x):
        h1 = fully_connected(x, 50, 'h1')
        h2 = fully_connected(h1, 10, 'h2')
        return h2

    with tf.variable_scope('two_layers') as scope:
        logits1 = two_hidden_layers(x1)
        scope.reuse_variables()  # enable variable sharing, so two_hidden_layers can be called again with a different x
        logits2 = two_hidden_layers(x2)
5. Some other points:
- Graph collections
    - Accessing a group of variables: tf.get_collection(key, scope=None)  // key is the name of the collection, scope is the scope of the variables
    - By default all variables are put into the tf.GraphKeys.GLOBAL_VARIABLES collection: tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='my_scope')
    - By default all variables with trainable=True are collected in tf.GraphKeys.TRAINABLE_VARIABLES
    - Creating a custom collection: tf.add_to_collection(name, value); any op can be added, it does not have to be a variable (a small sketch follows)
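    A small sketch of creating and reading back a custom collection; the collection name 'my_losses' and the tensors are arbitrary examples of mine:
        import tensorflow as tf

        w = tf.Variable(tf.zeros([10]))
        reg_loss = tf.nn.l2_loss(w)              # any op/tensor can go into a collection
        tf.add_to_collection('my_losses', reg_loss)

        data_loss = tf.constant(1.0)
        tf.add_to_collection('my_losses', data_loss)

        total_loss = tf.add_n(tf.get_collection('my_losses'))  # read everything back as a list and sum it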
    - Some default system behaviors: there are more predefined graph keys under tf.GraphKeys, and TF attaches default behaviors to some of them. E.g. tf.train.Optimizer subclasses default to optimizing the variables collected under tf.GraphKeys.TRAINABLE_VARIABLES if none is specified, but it is also possible to pass an explicit list of variables (see the sketch below).
      tf.GraphKeys: GLOBAL_VARIABLES / LOCAL_VARIABLES / MODEL_VARIABLES / TRAINABLE_VARIABLES / SUMMARIES / QUEUE_RUNNERS / MOVING_AVERAGE_VARIABLES / REGULARIZATION_LOSSES
      ref-link: https://www.tensorflow.org/api_docs/python/tf/GraphKeys
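    A hedged sketch of passing an explicit variable list to an optimizer instead of relying on the TRAINABLE_VARIABLES default; the tiny two-layer model and the scope names are assumptions for illustration:
        import tensorflow as tf

        x = tf.placeholder(tf.float32, [None, 4])
        y = tf.placeholder(tf.float32, [None, 1])

        with tf.variable_scope('frozen_layer'):
            h = tf.layers.dense(x, 8)        # pretend this layer is pre-trained and should stay fixed
        with tf.variable_scope('output_layer'):
            pred = tf.layers.dense(h, 1)

        loss = tf.reduce_mean(tf.square(pred - y))

        # only optimize the variables created under 'output_layer'; 'frozen_layer' is never updated
        output_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='output_layer')
        train_op = tf.train.AdamOptimizer(0.001).minimize(loss, var_list=output_vars)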
- Manage experiments:
    - Motivation: (1) be able to stop training at any point, save a checkpoint, and resume from that checkpoint next time; (2) to make experimental conclusions reproducible, one key is to control the random factor in our models
    - tf.train.Saver()
        - tf.train.Saver() saves all variables by default
        - saver.restore() restores the most recently saved model info by default
        - tf.train.Saver.save(
              sess,
              save_path,
              global_step=None,
              latest_filename=None,
              meta_graph_suffix='meta',
              write_meta_graph=True,
              write_state=True
          )

          # an example
          # define model
          # create a saver object
          saver = tf.train.Saver()
          # launch a session to execute the computation
          with tf.Session() as sess:
              # actual training loop
              for step in range(training_steps):
                  sess.run([optimizer])
                  if (step + 1) % 1000 == 0:
                      # the checkpoint filename is automatically assembled as model_name-global_step
                      saver.save(sess, 'checkpoint_directory/model_name', global_step=global_step)
        - Core question: saver.save() saves all of the model's variables by default, and saver.restore() restores all their values by default. So I'm wondering: before restoring variable values, shouldn't the graph be rebuilt first? Who rebuilds the graph? [Answer: the user still has to create the graph themselves first, then restore the variables.] The remaining question is how that graph can be kept consistent with the graph that existed when the saver was created -- or is it enough not to overthink it and simply create the graph as usual?
        - This question is still open!!!
        - One answer to this question: https://www.jianshu.com/p/8850127ed25d
          https://cv-tricks.com/tensorflow-tutorial/save-restore-tensorflow-models-quick-complete-tutorial/ (a restore sketch follows below)
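        A hedged sketch of the usual restore pattern: re-run the same graph-building code, then restore the variable values into it. tf.train.get_checkpoint_state, tf.train.import_meta_graph and tf.train.latest_checkpoint are the standard APIs here, but the stand-in variable and the meta filename below are my own illustration, and the linked tutorials may structure this differently:
            import tensorflow as tf

            # rebuild the same graph as at training time (here just a stand-in variable)
            w = tf.get_variable('weights', shape=[10], initializer=tf.zeros_initializer())

            saver = tf.train.Saver()
            with tf.Session() as sess:
                ckpt = tf.train.get_checkpoint_state('checkpoint_directory')
                if ckpt and ckpt.model_checkpoint_path:
                    saver.restore(sess, ckpt.model_checkpoint_path)  # fills w with the saved value

            # alternative: let TF rebuild the graph from the saved .meta file instead of re-running the code
            # (the meta filename follows the model_name-global_step pattern and is hypothetical here)
            # saver = tf.train.import_meta_graph('checkpoint_directory/model_name-1000.meta')
            # saver.restore(sess, tf.train.latest_checkpoint('checkpoint_directory'))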
        - Specifying which variables to save:
          v1 = tf.Variable(..., name='v1')
          v2 = tf.Variable(..., name='v2')
          # pass the variables as a dict:
          saver = tf.train.Saver({'v1': v1, 'v2': v2})
          # pass them as a list
          saver = tf.train.Saver([v1, v2])
          # passing a list is equivalent to passing a dict with the variable op names
          # as keys (i.e. the dict and list forms amount to the same thing)
          saver = tf.train.Saver({v.op.name: v for v in [v1, v2]})
    - tf.summary
        - Personally I think there is no longer a need to use matplotlib for visualization; tf.summary is powerful and convenient enough (just two extra commands: tensorboard, ssh -L)
        - Example:
          # at the very start of the program, open a writer
          writer = tf.summary.FileWriter('graphs/word2vec/lr' + str(self.lr), sess.graph)

          def _create_summaries(self):
              with tf.name_scope("summaries"):
                  tf.summary.scalar("loss", self.loss)
                  tf.summary.scalar("accuracy", self.accuracy)
                  tf.summary.histogram("histogram loss", self.loss)
                  # because you have several summaries, we should merge them all
                  # into one op to make it easier to manage
                  self.summary_op = tf.summary.merge_all()

          # fetch this step's loss result into the summary
          loss_batch, _, summary = sess.run([model.loss, model.optimizer, model.summary_op],
                                            feed_dict=feed_dict)
          # write this step's summary into the writer
          writer.add_summary(summary, global_step=step)

          # once there are no more summaries to write, flush everything in the writer to disk
          writer.close()
        - The most valuable use so far: visually monitoring the tuning process. For example, run training twice (learning rates 0.5 and 1.0), then use the summaries to plot both loss curves in one chart and get an intuitive feel for the tuning
        - Another feature: displaying images (open a writer, add an image summary, then the usual two commands: tensorboard, ssh -L), which can replace matplotlib for showing images (see the sketch below):
          tf.summary.image(name, tensor, max_outputs=3, collections=None)
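        A minimal sketch of writing an image summary; the random 'input_images' batch is just a stand-in for real data:
            import tensorflow as tf

            input_images = tf.random_uniform([8, 28, 28, 1])  # stand-in batch of 8 grayscale 28x28 images
            image_summary = tf.summary.image('input_images', input_images, max_outputs=3)

            with tf.Session() as sess:
                writer = tf.summary.FileWriter('graphs/images', sess.graph)
                writer.add_summary(sess.run(image_summary), global_step=0)
                writer.close()
            # then: tensorboard --logdir graphs/images   (plus ssh -L when working on a remote machine)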
        - https://www.tensorflow.org/guide/summaries_and_tensorboard?hl=zh-cn
    - Control randomization
        - Motivation: control TensorFlow's random state to get stable results for your experiments
        - (1) Set the random seed at the operation level:
          c = tf.random_uniform([], -10, 10, seed=2)
          d = tf.random_uniform([], -10, 10, seed=2)
          with tf.Session() as sess:
              print(sess.run(c))  # >> 3.57493
              print(sess.run(d))  # >> 3.57493

          ###########################################
          # a few cases:
          # 1.
          c = tf.random_uniform([], -10, 10, seed=2)
          with tf.Session() as sess:
              print(sess.run(c))  # >> 3.57493  # every newly created session restarts from the same seed
          with tf.Session() as sess:
              print(sess.run(c))  # >> 3.57493  # a fresh session again, so the same value
          # 2.
          c = tf.random_uniform([], -10, 10, seed=2)
          with tf.Session() as sess:
              print(sess.run(c))  # >> 3.57493   # right after the session is created
              print(sess.run(c))  # >> -5.97319  # later runs within the same session draw new values
        - (2) Set the random seed at the graph level (tf.Graph.seed) with tf.set_random_seed(seed)
          The point of this level of seed: it ensures that someone else's graph produces the same random numbers as this graph, without caring about op-level randomness
        - Example:
          # a.py
          import tensorflow as tf
          tf.set_random_seed(2)  # only a graph-level seed, so other graphs can reproduce this graph's random numbers
          c = tf.random_uniform([], -10, 10)  # no op-level seed
          d = tf.random_uniform([], -10, 10)
          with tf.Session() as sess:
              print(sess.run(c))  # -4.00752
              print(sess.run(d))  # -2.98339  # op-level randomness is not made identical!

          # b.py
          import tensorflow as tf
          tf.set_random_seed(2)  # only a graph-level seed, so other graphs can reproduce this graph's random numbers
          c = tf.random_uniform([], -10, 10)  # no op-level seed
          d = tf.random_uniform([], -10, 10)
          with tf.Session() as sess:
              print(sess.run(c))  # -4.00752  # exactly the same random numbers as the graph in a.py
              print(sess.run(d))  # -2.98339  # exactly the same random numbers as the graph in a.py
- Autodiff
    - tf.gradients(), example:
      tf.gradients(ys, xs, grad_ys=None, name='gradients',
                   colocate_gradients_with_ops=False, gate_gradients=False,
                   aggregation_method=None)
      # tf.gradients computes derivatives with respect to explicitly chosen variables,
      # which is useful for freezing some layers and training only the specified ones
      # (a sketch of that follows below)

      x = tf.Variable(2.0)
      y = 2.0 * (x ** 3)
      # single-variable case: dy/dx
      grad_y = tf.gradients(y, x)
      with tf.Session() as sess:
          sess.run(x.initializer)
          print(sess.run(grad_y))  # >> 24.0

      x = tf.Variable(2.0)
      y = 2.0 * (x ** 3)
      z = 3.0 + y ** 2
      # multi-variable case: two partial derivatives, dz/dx and dz/dy
      grad_z = tf.gradients(z, [x, y])
      with tf.Session() as sess:
          sess.run(x.initializer)
          print(sess.run(grad_z))  # >> [768.0, 32.0]
          # 768 is the gradient of z with respect to x, 32 with respect to y
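    A hedged sketch of the "train only some layers" use of tf.gradients mentioned in the comment above; the tiny two-layer model and variable names are assumptions for illustration:
        import tensorflow as tf

        x = tf.placeholder(tf.float32, [None, 4])
        y = tf.placeholder(tf.float32, [None, 1])

        w1 = tf.Variable(tf.random_normal([4, 8]), name='w1')  # pretend this layer is pre-trained / frozen
        w2 = tf.Variable(tf.random_normal([8, 1]), name='w2')  # only this layer will be trained

        h = tf.nn.relu(tf.matmul(x, w1))
        pred = tf.matmul(h, w2)
        loss = tf.reduce_mean(tf.square(pred - y))

        # compute gradients only w.r.t. w2, then apply them; w1 is never updated
        grads = tf.gradients(loss, [w2])
        optimizer = tf.train.GradientDescentOptimizer(0.1)
        train_op = optimizer.apply_gradients(list(zip(grads, [w2])))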