TensorFlow tf.gradients: Detailed Explanation and Concrete Examples
tf.gradients
Official definition:
tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    stop_gradients=None,
)
Constructs symbolic derivatives of sum of ys w.r.t. x in xs.

ys and xs are each a Tensor or a list of tensors. grad_ys is a list of Tensor, holding the gradients received by the ys. The list must be the same length as ys.

gradients() adds ops to the graph to output the derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys.

grad_ys is a list of tensors of the same length as ys that holds the initial gradients for each y in ys. When grad_ys is None, we fill in a tensor of '1's of the shape of y for each y in ys. A user can provide their own initial grad_ys to compute the derivatives using a different initial gradient for each y (e.g., if one wanted to weight the gradient differently for each value in each y).

stop_gradients is a Tensor or a list of tensors to be considered constant with respect to all xs. These tensors will not be backpropagated through, as though they had been explicitly disconnected using stop_gradient. Among other things, this allows computation of partial derivatives as opposed to total derivatives.
Translation:
1. ys and xs can each be a single tensor or a list of tensors. tf.gradients(ys, xs) computes the derivative of ys (if ys is a list, of the sum of all its elements) with respect to xs (if xs is a list, a derivative is taken with respect to each element separately), and returns a list with the same length as xs.
For example, if ys=[y1,y2,y3] and xs=[x1,x2,x3,x4], then tf.gradients(ys,xs)=[d(y1+y2+y3)/dx1, d(y1+y2+y3)/dx2, d(y1+y2+y3)/dx3, d(y1+y2+y3)/dx4]. See lines 16-17 of the main code example below for a concrete case.
2. grad_ys is a list of weights for the ys, with the same length as ys. When grad_ys=[g1,g2,g3], tf.gradients(ys,xs,grad_ys)=[d(g1*y1+g2*y2+g3*y3)/dx1, d(g1*y1+g2*y2+g3*y3)/dx2, d(g1*y1+g2*y2+g3*y3)/dx3, d(g1*y1+g2*y2+g3*y3)/dx4]. See lines 19-21 of the main code example below for a concrete case.
3. stop_gradients excludes the specified tensors from differentiation, i.e. treats them as constants. The official documentation gives a concrete example; a minimal sketch of it follows right after this list.
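A minimal sketch of stop_gradients, following the example in the official tf.gradients documentation (the names g_partial and g_total are only illustrative; the values in the comments are the outputs reported by the official docs):

import tensorflow as tf

a = tf.constant(0.)
b = 2 * a

# a and b are held constant: only the direct (partial) dependence is kept
g_partial = tf.gradients(a + b, [a, b], stop_gradients=[a, b])
# total derivatives: the influence of a on b (via b = 2*a) is also counted
g_total = tf.gradients(a + b, [a, b])

with tf.Session() as sess:
    print(sess.run(g_partial))  # expected: [1.0, 1.0]
    print(sess.run(g_total))    # expected: [3.0, 1.0]

The main code example referenced in items 1 and 2 follows.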
1  import tensorflow as tf
2  w1 = tf.Variable([[1,2]])
3  w2 = tf.Variable([[3,4]])
4  res = tf.matmul(w1, [[2],[1]])
5
6  # ys must depend on xs, otherwise an error is raised:
7  # grads = tf.gradients(res,[w1,w2])
8  # TypeError: Fetch argument None has invalid type <class 'NoneType'>
9
10 # grads = tf.gradients(res,[w1])
11 # Result: [array([[2, 1]])]
12
13 res2a = tf.matmul(w1, [[2],[1]]) + tf.matmul(w2, [[3],[5]])
14 res2b = tf.matmul(w1, [[2],[4]]) + tf.matmul(w2, [[8],[6]])
15
16 # grads = tf.gradients([res2a,res2b],[w1,w2])
17 # Result: [array([[4, 5]]), array([[11, 11]])]
18
19 grad_ys = [tf.Variable([[1]]), tf.Variable([[2]])]
20 grads = tf.gradients([res2a,res2b],[w1,w2],grad_ys=grad_ys)
21 # Result: [array([[6, 9]]), array([[19, 17]])]
22
23 with tf.Session() as sess:
24     tf.global_variables_initializer().run()
25     re = sess.run(grads)
26     print(re)
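As a quick hand check of the results above: res2a depends on w1 through [[2],[1]] and on w2 through [[3],[5]], so d(res2a)/dw1 = [2, 1] and d(res2a)/dw2 = [3, 5]; likewise d(res2b)/dw1 = [2, 4] and d(res2b)/dw2 = [8, 6]. Summing gives [2+2, 1+4] = [4, 5] and [3+8, 5+6] = [11, 11], matching the result on line 17. With grad_ys = [1, 2], the weighted sums are 1*[2,1] + 2*[2,4] = [6, 9] and 1*[3,5] + 2*[8,6] = [19, 17], matching line 21. Also note that grad_ys must have the same length as ys, and each initial gradient must match the shape and dtype of the corresponding y (here 1x1 integer tensors).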