PyTorch [3] - Autograd

This is an introductory walkthrough; I will extend it over time.

This tutorial assumes PyTorch 1.3 or later.

Variable

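A Variable is a wrapper around a Tensor, so it exposes most of the Tensor's attributes and methods.

Variables are used to build the computation graph.

A Variable has three core attributes (data, grad, grad_fn) and two related flags (requires_grad, is_leaf):

  // data: the underlying Tensor value

  // grad: where the gradient is stored

  // grad_fn: the function that produced this Variable; backpropagation differentiates through it

  // requires_grad: whether gradients should be computed for it (see below)

  // is_leaf: whether it is a leaf node (see below)

Note: Variable has been deprecated since PyTorch 0.4, so in 1.0+ it still works but is no longer needed; its functionality has been merged into Tensor.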

requires_grad

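In PyTorch every Tensor has a requires_grad attribute. It defaults to False, meaning the tensor (and any subgraph built only from such tensors) is excluded from gradient computation, which saves work. [Subgraphs and computation graphs are similar to the TensorFlow concepts; I will add details later.]

Example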

import torch as t

x = t.randn(3, 3)
y = t.randn(3, 3)
z = t.randn((3, 3), requires_grad=True)

print(x.requires_grad)      # False     ### an output has requires_grad=False only if every input does; such a node takes no part in gradients
print(z.requires_grad)      # True
a = x + z
print(a.requires_grad)      # True      ### one input with requires_grad=True is enough to make the output require grad

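If any input of a node has requires_grad=True, the output of that node has requires_grad=True as well.

When no Tensor in a subgraph requires gradients, backpropagation skips the backward computation for that subgraph entirely. This is useful when you know in advance that some Tensors do not need gradients.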
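As a minimal sketch of that last point, assume a setup with one fixed tensor and one trainable tensor (the names frozen, trainable and features are made up for illustration). Only the tensor that requires gradients receives one:

```python
import torch

# Hypothetical setup: `frozen` is known in advance not to need gradients.
frozen = torch.tensor([1.0, 2.0])                        # requires_grad defaults to False
trainable = torch.tensor([3.0, 4.0], requires_grad=True)

features = frozen * 2                 # built only from tensors that do not require grad
loss = (features * trainable).sum()   # one input requires grad, so loss does too

print(features.requires_grad)   # False
print(loss.requires_grad)       # True

loss.backward()
print(frozen.grad)              # None
print(trainable.grad)           # tensor([2., 4.])  (d loss / d trainable = features)
```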

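Note: only floating-point Tensors can require gradients (recent PyTorch versions also allow complex dtypes).

If you create a Tensor like this

b = t.tensor(2, requires_grad=True)

the following error is raised immediately, at creation time rather than during backpropagation

RuntimeError: Only Tensors of floating point dtype can require gradients

The correct way is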

b = t.tensor(2., requires_grad=True)

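Computation graph

The computation graph is a directed acyclic graph that records operations. It consists of nodes and edges: nodes represent data (Tensors) and edges represent operations.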

is_leaf

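Leaf nodes are the tensors created directly by the user; think of them as the bottom-most nodes of the computation graph.

Why this matters: during backpropagation, the gradient of a leaf node is kept in its grad attribute.

For a non-leaf node the gradient is released as soon as it has been used, so its grad cannot be read afterwards.

To keep the gradient of a non-leaf node, call tensor.retain_grad() on it before backward. For example:

Suppose w and x are leaf nodes and a, b, y are non-leaf nodes, with retain_grad() called on a. After backward, w, x and a have gradients, while b and y do not.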
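The w, x, a, b, y situation can be sketched as follows. The concrete operations (a = w + x, b = w + 1, y = a * b) are my own assumption, chosen only so the gradients are easy to check by hand:

```python
import torch

w = torch.tensor([1.0], requires_grad=True)   # leaf
x = torch.tensor([2.0], requires_grad=True)   # leaf
a = w + x          # non-leaf
a.retain_grad()    # ask autograd to keep a.grad after backward
b = w + 1          # non-leaf
y = a * b          # non-leaf

y.backward()

print(w.is_leaf, x.is_leaf, a.is_leaf, b.is_leaf, y.is_leaf)
# True True False False False
print(w.grad)   # tensor([5.])   dy/dw = b + a = 2 + 3
print(x.grad)   # tensor([2.])   dy/dx = b
print(a.grad)   # tensor([2.])   kept because of retain_grad()
print(b.grad)   # None (PyTorch also emits a UserWarning on this access)
```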

Autograd

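autograd means automatic differentiation: given a function built from tensor operations, it computes the value of the derivative at the current inputs (it does not return a symbolic derivative function).

In PyTorch, autograd is a module (torch.autograd), and the Variable class belongs to it.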

backward

def backward(self, gradient=None, retain_graph=None, create_graph=False):
        r"""Computes the gradient of current tensor w.r.t. graph leaves.

        Arguments:
            gradient (Tensor or None): Gradient w.r.t. the
                tensor. If it is a tensor, it will be automatically converted
                to a Tensor that does not require grad unless ``create_graph`` is True.
                None values can be specified for scalar Tensors or ones that
                don't require grad. If a None value would be acceptable then
                this argument is optional.
            retain_graph (bool, optional): If ``False``, the graph used to compute
                the grads will be freed. Note that in nearly all cases setting
                this option to True is not needed and often can be worked around
                in a much more efficient way. Defaults to the value of
                ``create_graph``.
            create_graph (bool, optional): If ``True``, graph of the derivative will
                be constructed, allowing to compute higher order derivative
                products. Defaults to ``False``.
        """
        torch.autograd.backward(self, gradient, retain_graph, create_graph)

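A Variable (or Tensor) calls its backward method to run backpropagation and compute gradients automatically.

gradient: its shape must match that of the tensor on which backward is called.

In other words, if y is a scalar no gradient argument is needed; if y is a vector or matrix, gradient must be supplied.

y is a scalar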

import torch
from torch.autograd import Variable
x = Variable(torch.Tensor([16]), requires_grad=True) # we want the derivative w.r.t. x
y = x * x
print(y)        # tensor([256.], grad_fn=<MulBackward0>)
### y has a single element, so no extra argument is needed
y.backward()
print(x.grad)   # tensor([32.])  ### 2x = 32

y is not a scalar

torch.manual_seed(10000)
a = torch.ones(2, 2, requires_grad=True)
b = torch.ones(2, 2, requires_grad=True)
c = a + 2 * b
print(c)                ### c is not a scalar (it is a 2x2 matrix), and it is a non-leaf node
# tensor([[3., 3.],
#         [3., 3.]], grad_fn=<AddBackward0>)
### so the gradient argument is required
d = torch.randn(2, 2)   ### gradient must have the same shape as c
print(d)
# tensor([[ 2.0065,  1.9535],
#         [ 0.1517, -0.4269]])
c.backward(d)
print(a.grad)
# tensor([[ 2.0065,  1.9535],
#         [ 0.1517, -0.4269]])          ### a.grad = d * dc/da = d * 1 = d
print(b.grad)
# tensor([[ 4.0129,  3.9069],
#         [ 0.3034, -0.8538]])          ### b.grad = d * dc/db = d * 2 = 2d

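In effect, the gradient argument is multiplied elementwise with the local derivative.

It can be read as a set of weights on the output elements.

For example, when there are several losses, gradient assigns a weight to each loss.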
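A small sketch of the weighting view, with two made-up losses and assumed weights of 1.0 and 0.5. Passing the weights as gradient gives the same result as calling backward on the explicit weighted sum:

```python
import torch

p = torch.tensor([1.0, 2.0], requires_grad=True)
weights = torch.tensor([1.0, 0.5])     # assumed per-loss weights

# Two scalar losses stacked into a vector of shape (2,)
losses = torch.stack([(p ** 2).sum(), (3 * p).sum()])
losses.backward(gradient=weights)
grad_via_argument = p.grad.clone()

# Same computation written as an explicit weighted sum
p.grad.zero_()
losses_again = torch.stack([(p ** 2).sum(), (3 * p).sum()])
(weights * losses_again).sum().backward()

print(grad_via_argument)   # tensor([3.5000, 5.5000])  = 1.0 * 2p + 0.5 * 3
print(p.grad)              # tensor([3.5000, 5.5000])
```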

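If y is not a scalar and gradient is not given, the following error is raised

RuntimeError: grad can be implicitly created only for scalar outputs

retain_graph: if True, backward can be called again on the same graph; if False, a second call raises an error.

Backpropagation needs some intermediate results that were cached during the forward pass. They are freed once backward finishes; retain_graph=True keeps them. Gradients accumulate into grad on every backward call either way, so a second call adds to the first.

By default they are freed (retain_graph falls back to create_graph, which is False).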

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([2]), requires_grad=True)
y = x.pow(2)
print(y)        # tensor([4.], grad_fn=<PowBackward0>)

y.backward(retain_graph=True)
print(x.grad)   # tensor([4.])

### Without retain_graph=True on the first backward, this second call would fail with:
# RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
y.backward()
print(x.grad)   # tensor([8.])      ### gradients accumulate

A special case

In the example below backward can be called several times without retain_graph=True

import torch as t
from torch.autograd import Variable

x = Variable(t.ones(2, 2), requires_grad=True)  ### wrap the Tensor in an autograd Variable
print(x)
# tensor([[1., 1.],
#         [1., 1.]], requires_grad=True)

y = x.sum()     ### a Variable can call Tensor methods directly
print(y)                    # tensor(4., grad_fn=<SumBackward0>)
print(y.data)               # tensor(4.)
print(y.grad_fn)            # <SumBackward0 object at 0x0000000001E9F2E8>
print(y.requires_grad)      # True

y.backward()    ### backpropagate: computes the gradient and stores it in x.grad; needed once per gradient computation
### after one backward, x.grad can be read any number of times and gives the same result
print(x.grad)   ### the gradient at x
print(x.grad)
# tensor([[1., 1.],
#         [1., 1.]])

### grad accumulates across backward calls: each call adds its gradient to what is already stored
y.backward()    ### backpropagate again
print(x.grad)
# tensor([[2., 2.],
#         [2., 2.]])

### to avoid accumulating, zero the gradient first
x.grad.data.zero_()     ### reset the gradient to 0
y.backward()            ### backpropagate once more
print(x.grad)           ### no accumulation this time
# tensor([[1., 1.],
#         [1., 1.]])

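This works without retain_graph because sum() caches no intermediate tensors for its backward pass, so nothing that is needed gets freed after the first call. And since x is a leaf node, its gradient is kept and can be read as often as needed.

Points to note:

1. Every gradient computation requires an explicit call to backward.

2. Gradients accumulate automatically; to avoid that, zero them explicitly with grad.data.zero_().

3. The Variable above can be replaced by a plain Tensor:

x = t.ones(2, 2, requires_grad=True)        ### a plain Tensor works; no Variable needed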
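Points 1 and 2 are what a hand-written training loop has to respect. A minimal sketch, assuming a toy loss (w - 3)^2 and a learning rate of 0.1, both picked purely for illustration:

```python
import torch

w = torch.tensor(0.0, requires_grad=True)
lr = 0.1   # assumed learning rate

for step in range(50):
    loss = (w - 3.0) ** 2      # forward pass rebuilds the graph each iteration
    loss.backward()            # point 1: explicit backward call
    with torch.no_grad():
        w -= lr * w.grad       # update the leaf without recording the operation
    w.grad.zero_()             # point 2: clear the accumulated gradient

print(w.item())   # close to 3.0, the minimizer of the toy loss
```

Without the zero_() call, each step would use the sum of all earlier gradients and the updates would overshoot.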

Gradients

I won't go into the theory here; this is just a quick check that backward computes the right gradients.

Example

import torch as t

x = t.Tensor([1])
a = t.tensor(2., requires_grad=True)
b = t.tensor(2., requires_grad=True)
c = t.tensor(3., requires_grad=True)

loss = a * a * x + b * x + c

print(a.grad, b.grad, c.grad)       # None None None
loss.backward()
print(a.grad, b.grad, c.grad)       # tensor(4.) tensor(1.) tensor(1.)

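Computing the gradients by hand

With respect to a: the derivative of a^2 * x is 2 * a * x; with x = 1 and a = 2 this is 4.

With respect to b: the derivative of b * x is x; with x = 1 this is 1.

With respect to c: the derivative of c is 1.

Another way to compute gradients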

import torch as t
from torch import autograd

x = t.Tensor([1])
a = t.tensor(2., requires_grad=True)
b = t.tensor(2., requires_grad=True)
c = t.tensor(3., requires_grad=True)
loss = a * a * x + b * x + c

print(a.grad, b.grad, c.grad)       # None None None
grads = autograd.grad(loss, [a, b, c])
print(grads)                        # (tensor(4.), tensor(1.), tensor(1.))

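Note: do not mix this with backward on the same graph. autograd.grad returns the gradients instead of storing them in grad, and by default it frees the graph just as backward does.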

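Addendum, 2022-07-22

Read the comments carefully.

'''
PyTorch implements the computation graph in the autograd module. The core data structure of autograd is the Variable,
which wraps a tensor and builds the computation graph by recording the operations applied to it.

Variable attributes
1. data: the underlying tensor value
2. grad: the gradient with respect to data
3. grad_fn: records the operation that produced the Variable, which is how the graph is built; a Variable created by the user is a leaf node and its grad_fn is None

Creating a Variable
1. tensor: the tensor value to wrap
2. requires_grad: whether gradients are needed
3. volatile: if True, nothing in the graph built on this Variable is differentiated; it was designed for inference

A Variable supports most tensor operations, but not some in-place ones, because an in-place operation modifies the tensor itself.

'''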

import torch as t
from torch.autograd import Variable as V


a = V(t.ones(2, 3), requires_grad=True)
print(a.data, a.grad, a.grad_fn)
# tensor([[1., 1., 1.],
#         [1., 1., 1.]]) None None

b = V(t.ones(2, 3))
print(b.data, b.grad, b.grad_fn)
# tensor([[1., 1., 1.],
#         [1., 1., 1.]]) None None

c = t.add(a, b).pow(2)
print(c.data, c.grad, c.grad_fn, c.requires_grad)
# tensor([[4., 4., 4.],
#         [4., 4., 4.]]) None <PowBackward0 object at 0x000001BD538C9390>   True
'''
Summary
1. The leaf nodes a and b have grad_fn None, while c's grad_fn is the last operation that produced it (pow, applied to the result of add)
2. Before backward is called, grad is None for a, b and c: no backpropagation, no derivatives
'''

print(type(c.data.sum()))   # <class 'torch.Tensor'>
print(type(c.sum()))        # <class 'torch.Tensor'>
d = c.sum()     # reduce c to the scalar d

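'''
def backward(self, gradient=None, retain_graph=None, create_graph=False, inputs=None)
With intermediate variables the chain rule gives dloss/dw = dloss/dy * dy/ds * ds/dw; without them it is simply dloss/dw
retain_graph: keep the computation graph. The default None (treated as False unless create_graph is set) means the graph is freed after each backward.
    Usually each iteration needs one forward and one backward, but some setups, such as a complex custom loss, need one forward
    followed by backward on several different losses that accumulate grad in the same network.
    So to call backward again without running forward again, pass retain_graph=True to the first backward.
    *** If no operation in the graph cached intermediate results (for example, the graph only adds or sums leaf nodes), a second backward works without retaining the graph
'''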

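d.backward(retain_graph=True)
# d.backward()
print(a.grad)
# tensor([[4., 4., 4.],
#         [4., 4., 4.]])
print(c.grad)               # None: c is a non-leaf node, so its grad is not kept
'''
After backward, c.grad is None:
c is not a leaf node; its gradient is only needed to compute a's gradient and is released once used
'''
print(a.grad)
# tensor([[4., 4., 4.],
#         [4., 4., 4.]])

a.grad.data.zero_()     # zero the gradient so the next backward does not accumulate onto it
d.backward()
print(a.grad)   # leaf nodes keep their gradient
'''
1. If c = a + b, the gradient of a is 1 and no intermediate result is needed to compute it, so the first backward does not need retain_graph
2. If c = (a + b)^2, the gradient of a is 2(a + b), which needs the cached value of a + b, so the first backward must retain the graph
'''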
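The two cases in that last comment can be checked directly. A minimal sketch, assuming a recent PyTorch release; the helper second_backward_fails is a name invented for this example:

```python
import torch

def second_backward_fails(build):
    """Call backward twice without retain_graph; report whether the second call raises."""
    a = torch.ones(2, 3, requires_grad=True)
    b = torch.ones(2, 3)
    d = build(a, b).sum()
    d.backward()              # first call frees any cached intermediate results
    try:
        d.backward()          # second call on the same graph
    except RuntimeError:
        return True
    return False

print(second_backward_fails(lambda a, b: a + b))            # False: add caches nothing
print(second_backward_fails(lambda a, b: (a + b).pow(2)))   # True: pow needs its cached input
```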

torch.no_grad

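Disables gradient computation; commonly used at inference time

### t.no_grad disables gradient tracking inside its block
# torch.no_grad replaces volatile=True from old versions
a = t.tensor([1.0, 2.0], requires_grad=True)
with t.no_grad():
    b = a.pow(2).sum()      # a has requires_grad=True, yet b ends up with requires_grad=False

print(b.requires_grad)  # False

### torch.enable_grad() turns gradient tracking back on; it is the opposite of no_grad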
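A short sketch of how the two context managers nest, plus no_grad used as a decorator. The function name predict is made up for illustration:

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)

with torch.no_grad():
    b = a * 2                    # not tracked
    with torch.enable_grad():
        c = a * 2                # tracking is switched back on inside the inner block

@torch.no_grad()                 # no_grad also works as a decorator
def predict(x):
    return x.pow(2).sum()

print(b.requires_grad)           # False
print(c.requires_grad)           # True
print(predict(a).requires_grad)  # False
```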


Posted 2020-01-15 17:20 by 努力的孔子
