torch.nn ------ Parameter and the Module Container

Author: elfin. Reference: torch.nn

1. Parameter
2. Containers in torch.nn
- 2.1 Module

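torch.nn is the basic module for building computation graphs. model.train() and model.eval() switch a model into training mode and evaluation mode respectively.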

1. Parameter

nn.parameter.Parameter(data=None, requires_grad=True)

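Wrapping a tensor in Parameter adds it to the model, and requires_grad=True controls whether it is updated during training. The difference from setting requires_grad=True directly on a torch.tensor is that a plain tensor is not added to model.parameters(), so it is likely to be left out when you save the model's parameters. The symptom: the model trains well, yet inference on the very same data is poor!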

Parameters:

data: the parameter tensor
requires_grad: whether to compute gradients so that the parameter is updated
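
As a minimal sketch of the difference described above, the toy module below (Scale is a hypothetical name used only for illustration) stores one tensor wrapped in Parameter and one plain tensor with requires_grad=True. Only the wrapped one is registered:

```python
import torch
from torch import nn
from torch.nn.parameter import Parameter


class Scale(nn.Module):
    """Hypothetical toy module used only for this comparison."""

    def __init__(self):
        super().__init__()
        # Wrapped in Parameter: registered on the module automatically
        self.weight = Parameter(torch.ones(3))
        # Plain tensor: tracks gradients but is NOT registered
        self.offset = torch.zeros(3, requires_grad=True)

    def forward(self, x):
        return self.weight * x + self.offset


scale = Scale()
param_names = [name for name, _ in scale.named_parameters()]
state_keys = list(scale.state_dict().keys())
print(param_names)  # ['weight']
print(state_keys)   # ['weight']
```

The offset tensor never reaches the optimizer or the checkpoint, which is exactly the silent loss the paragraph above warns about.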

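The other two classes, UninitializedParameter and UninitializedBuffer, matter little in everyday code: they back the lazy modules (such as nn.LazyLinear), and most of the time all we need is from torch.nn.parameter import Parameter. If you know of a special use for them, please let me know!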


2. Containers in torch.nn

2.1 Module

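Module is the base class of all neural network models; your model should subclass it. Modules can contain other modules, which lets you nest them in a tree structure. Submodules are assigned as regular attributes. Below we use a depth-wise convolution as the running example: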

import torch
from torch import nn


class DepthWiseConv(nn.Module):
    """Depth-wise convolution implemented with grouped convolution (groups=dim)"""

    def __init__(self, dim=768):
        super(DepthWiseConv, self).__init__()
        self.DWConv = nn.Conv2d(
            in_channels=dim, out_channels=dim,
            kernel_size=3, stride=1, padding=1,
            bias=True, groups=dim
        )

    def forward(self, x):
        x = self.DWConv(x)
        return x

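Modules assigned as attributes in __init__ like this are registered automatically, and their parameters show up in model.parameters(). Note that if you create a trainable tensor yourself, you must wrap it in Parameter!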

The training attribute

Controls whether the module is in training mode or evaluation mode

>>> DepthWiseConv.training
AttributeError: type object 'DepthWiseConv' has no attribute 'training'
>>> DepthWiseConv.training = False
>>> DepthWiseConv.training
False

The code above tells us that the class itself does not have this attribute! Our code clearly never declares it, so does it appear after instantiation?

>>> model = DepthWiseConv()
>>> model.training
True

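Testing shows that an instance has a training attribute by default, and its default value is True. Note that even though I set the attribute to False on the DepthWiseConv class, a newly created instance still reports True. This is because the attribute is set in Module.__init__(): even if we declare it when writing our own module, calling the parent class's __init__() initializes it to True on the instance.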
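
In practice the flag is usually toggled through train() and eval() rather than by assignment. The sketch below assumes a small nn.Sequential stand-in instead of DepthWiseConv so that it runs on its own; it shows that both calls propagate to every submodule:

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

net.eval()  # sets training=False on net and on every submodule
flags_eval = [m.training for m in net.modules()]

# In evaluation mode Dropout passes its input through unchanged
x = torch.ones(2, 4)
dropout_is_identity = torch.equal(net[1](x), x)

net.train()  # sets training=True everywhere again
flags_train = [m.training for m in net.modules()]

print(flags_eval)   # [False, False, False]
print(flags_train)  # [True, True, True]
```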


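2.2.1 add_module

Parameters:

name: name of the child module to add
module: an nn.Module instance (that is, an instance of a Module subclass)

Adds a child module to the current module; the child can then be accessed as an attribute under the given name. When defining a custom module we sometimes have a series of layers. If we store them in a plain Python list as below, the submodules are not registered: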

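class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        # A plain Python list: the Linear layers are NOT registered
        self.layer = [nn.Linear(64, 64) for _ in range(5)]

    def forward(self, x):
        # A list is not callable, so apply the layers one by one
        for layer in self.layer:
            x = layer(x)
        return x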

Let's instantiate this module:

>>> test = Test()
>>> list(test.modules())
[Test()]

Clearly the layers stored in layer are not registered! In this situation we can either wrap them in ModuleList or add them one by one with add_module:

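class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self.layer = [nn.Linear(64, 64) for _ in range(5)]
        # enumerate supplies the index used in each registered name
        for i, layer in enumerate(self.layer):
            self.add_module(f"layer_{i}", layer)

    def forward(self, x):
        # A list is not callable, so apply the layers one by one
        for layer in self.layer:
            x = layer(x)
        return x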
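
The ModuleList alternative mentioned above can be sketched as follows. The class name TestWithModuleList is made up for this illustration; nn.ModuleList itself is the standard container:

```python
import torch
from torch import nn


class TestWithModuleList(nn.Module):
    """Illustrative variant of Test that stores its layers in nn.ModuleList."""

    def __init__(self):
        super().__init__()
        # ModuleList registers every layer it holds
        self.layers = nn.ModuleList([nn.Linear(64, 64) for _ in range(5)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


test = TestWithModuleList()
num_modules = len(list(test.modules()))    # root + ModuleList + 5 Linear
num_params = len(list(test.parameters()))  # 5 weights + 5 biases
out = test(torch.randn(2, 64))
print(num_modules, num_params, tuple(out.shape))  # 7 10 (2, 64)
```

With ModuleList there is no second bookkeeping list to keep in sync, which is why it tends to be the more common choice for a homogeneous stack of layers.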

2.2.2 apply

Applies fn to every submodule and to the module itself; fn is the function we pass to apply!

import math

from torch import nn
# The original snippet does not show where trunc_normal_ comes from;
# torch.nn.init provides a function with this name.
from torch.nn.init import trunc_normal_


class Mlp(nn.Module):
    """
    A customized multi-layer perceptron
    """

    def __init__(self, in_features, hidden_features=None, out_features=None,
                 act_layer=nn.GELU, drop=0., linear=False):
        super(Mlp, self).__init__()
        # Default the hidden and output channel counts to the input channel count
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.FC1 = nn.Conv2d(in_features, hidden_features, 1)
        # The depth-wise conv runs on FC1's output, so it needs hidden_features channels
        self.DW_Conv = DepthWiseConv(hidden_features)
        self.ACT = act_layer()
        self.FC2 = nn.Conv2d(hidden_features, out_features, 1)
        self.DROP = nn.Dropout(drop)
        self.LINEAR = linear
        if self.LINEAR:
            self.ReLU = nn.ReLU(inplace=True)
        # Run _init_weights on every submodule and on this module itself
        self.apply(self._init_weights)

    @staticmethod
    def _init_weights(m):
        if isinstance(m, nn.Linear):
            trunc_normal_(m.weight, std=.02)
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.Conv2d):
            # He-style initialization based on the per-group fan-out
            fan_out = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
            fan_out //= m.groups
            m.weight.data.normal_(0, math.sqrt(2.0 / fan_out))
            if m.bias is not None:
                m.bias.data.zero_()
        elif isinstance(m, nn.LayerNorm):
            nn.init.constant_(m.bias, 0)
            nn.init.constant_(m.weight, 1.)

    def forward(self, x):
        x = self.FC1(x)
        if self.LINEAR:
            x = self.ReLU(x)
        x = self.DW_Conv(x)
        x = self.ACT(x)
        x = self.DROP(x)
        x = self.FC2(x)
        x = self.DROP(x)
        return x

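From the weight-initialization function we can see that fn handles one module m per call. apply recurses through the whole tree, so fn eventually sees every module returned by model.modules(); model.children() lets you inspect the direct submodules only!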
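
To make the traversal concrete, here is a small sketch that records the order in which apply visits modules. The record function and the nested nn.Sequential are assumptions made for the illustration, not part of the Mlp example:

```python
from torch import nn

visited = []


def record(m):
    # Called once per module; children are visited before their parent
    visited.append(type(m).__name__)


net = nn.Sequential(
    nn.Linear(2, 2),
    nn.Sequential(nn.ReLU(), nn.Linear(2, 1)),
)
returned = net.apply(record)  # apply returns the module itself

num_children = len(list(net.children()))  # direct children only
num_modules = len(list(net.modules()))    # the whole tree, root included
print(visited)
# ['Linear', 'ReLU', 'Linear', 'Sequential', 'Sequential']
```

The function is called five times although the root has only two direct children, which shows that apply walks the full tree and finishes with the root.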


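2.2.3 buffers: model buffers

Buffers exist mainly to store tensors that are not trained in the state_dict. If a plain tensor is not registered through register_buffer(), it will be missing when we save and load the model.

The buffer-related methods are:

buffers(): returns all registered buffers;
get_buffer(target): returns one specific buffer, where target is the buffer's name;
named_buffers(prefix='', recurse=True): returns an iterator over buffers with prefixed names. recurse controls whether submodules are searched; if False, only the buffers of the current module are returned;
register_buffer(name, tensor, persistent=True): registers a tensor as a buffer. persistent controls whether the model's state_dict includes it.

These are all of the module's buffer interfaces. Let's run a simple test: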

class BuffersTest(nn.Module):
    def __init__(self):
        super(BuffersTest, self).__init__()
        self.data1 = torch.tensor([1, 2, 3, 4])
        self.register_buffer("data2", torch.tensor([5, 6, 7, 8]))
        
    def forward(self, x):
        x = self.data1 * x + self.data2
        return x

Now we instantiate it and inspect the model's parameters, buffers, and state_dict:

>>> buffer_test = BuffersTest()      # instantiate
>>> list(buffer_test.buffers())      # inspect the buffers
[tensor([5, 6, 7, 8])]
>>> list(buffer_test.parameters())
[]
>>> list(buffer_test.state_dict())
['data2']

Let's add a trainable parameter and see how the three change:

>>> from torch.nn.parameter import Parameter
>>> buffer_test.data3 = Parameter(data=torch.tensor([9, 10]).float())
>>> list(buffer_test.buffers())      # inspect the buffers
[tensor([5, 6, 7, 8])]
>>> list(buffer_test.parameters())
[Parameter containing:
 tensor([ 9., 10.], requires_grad=True)]
>>> list(buffer_test.state_dict())
['data3', 'data2']

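Comparing the two experiments, it is clear that both parameters and buffers appear in state_dict, but buffers() and parameters() return different objects. Trainable data should be wrapped in Parameter; tensors that are not updated but must be saved with the model should be registered with register_buffer!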
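
The persistent flag of register_buffer was listed above but not exercised. As a small sketch (the Stats class and its buffer names are hypothetical), a non-persistent buffer is still returned by named_buffers() yet stays out of state_dict:

```python
import torch
from torch import nn


class Stats(nn.Module):
    """Hypothetical module holding one persistent and one scratch buffer."""

    def __init__(self):
        super().__init__()
        self.register_buffer("mean", torch.zeros(3))
        # persistent=False: a buffer that is not saved in state_dict
        self.register_buffer("scratch", torch.zeros(3), persistent=False)


stats = Stats()
buffer_names = [name for name, _ in stats.named_buffers()]
state_keys = list(stats.state_dict().keys())
print(buffer_names)  # ['mean', 'scratch']
print(state_keys)    # ['mean']
```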


2.2.4 Parameter-related methods

The parameter-related methods are a group of methods for data that needs gradients and is updated during training.

parameters(recurse=True)
get_parameter(target)
named_parameters(prefix='', recurse=True)
register_parameter(name, param)

We won't explain these one by one; they mirror the buffer methods described earlier!
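
For completeness, a short sketch of the four methods in use. The container and the parameter name scale are assumptions chosen for the illustration, and get_parameter requires a reasonably recent PyTorch release:

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(3, 2))
# register_parameter is the explicit form of `net.scale = nn.Parameter(...)`
net.register_parameter("scale", nn.Parameter(torch.ones(1)))

all_names = [name for name, _ in net.named_parameters()]
own_names = [name for name, _ in net.named_parameters(recurse=False)]
weight = net.get_parameter("0.weight")  # dotted path into the submodule

print(all_names)            # ['scale', '0.weight', '0.bias']
print(own_names)            # ['scale']
print(tuple(weight.shape))  # (2, 3)
```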


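2.2.5 Module-related methods

All the related methods:

modules: mentioned several times already; a basic method we must master
get_submodule(target)
register_module(name, module): an alias of add_module
named_modules(memo=None, prefix='', remove_duplicate=True)

The module-related methods resemble the buffer ones, but the rules are not identical. The main difference lies in retrieval: since the module is the current node, a lookup searches its submodules, so the interface is get_submodule(target). named_modules can also reach every submodule, but it returns all of them, so to look up a single one use get_submodule(target), which is cheaper.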
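
A minimal sketch of a single lookup with get_submodule, assuming a small nested nn.Sequential as the model. The target is a dotted path built from the child names:

```python
from torch import nn

net = nn.Sequential(
    nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()),
    nn.Flatten(),
)

# "0.0" means: child "0" of the root, then child "0" of that child
conv = net.get_submodule("0.0")
same_object = conv is net[0][0]

# An empty target returns the module itself
root = net.get_submodule("")
print(type(conv).__name__, same_object, root is net)  # Conv2d True True
```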

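named_modules also differs from the other named_* interfaces. Besides the prefix, it takes two additional parameters:

memo=None: a set of modules that have already been yielded, used internally to skip duplicates; we can ignore it
remove_duplicate=True: whether to remove duplicated module instances from the result

The plain named_modules call returns: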

>>> for i, m in enumerate(model.named_modules()):
...:    print(i, "-->", m)
...:
0 --> ('', DepthWiseConv(
  (DWConv): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
))
1 --> ('DWConv', Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768))

named_modules with a prefix returns:

>>> for i, m in enumerate(model.named_modules(prefix="DW")):
...:    print(i, "-->", m)
...:
0 --> ('DW', DepthWiseConv(
  (DWConv): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
))
1 --> ('DW.DWConv', Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768))

Clearly the prefix has been prepended to every module name!
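
The effect of remove_duplicate only shows when the same instance is registered more than once. A sketch, assuming one Linear layer placed twice in an nn.Sequential:

```python
from torch import nn

shared = nn.Linear(2, 2)
net = nn.Sequential(shared, shared)  # the same instance under names "0" and "1"

unique_names = [name for name, _ in net.named_modules()]
all_names = [name for name, _ in net.named_modules(remove_duplicate=False)]

print(unique_names)  # ['', '0']
print(all_names)     # ['', '0', '1']
```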


2.2.6 children and named_children

>>> list(model.children())
[Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)]
>>> list(model.named_children())
[('DWConv',
  Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768))]

Compared with children, named_children yields tuples that also contain each child module's name!


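2.2.7 Changing a module's data type

Method	Description	In-place
bfloat16()	Casts floating-point parameters and buffers to bfloat16, a 16-bit format common on TPUs (and supported on other hardware) that corresponds to the top 16 bits of a float32	True
double()	Casts floating-point parameters and buffers to double precision (float64)	True
float()	Casts floating-point parameters and buffers to float32	True
half()	Casts floating-point parameters and buffers to half precision (float16)	True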
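
A quick sketch of these casts on two small layers. It illustrates that each call modifies the module in place and returns the module itself, and that only floating-point tensors are converted (the integer counter of BatchNorm1d keeps its dtype):

```python
import torch
from torch import nn

layer = nn.Linear(2, 2)
returned = layer.half()  # in place; the same module object comes back
dtype_half = layer.weight.dtype

layer.double()
dtype_double = layer.weight.dtype

layer.bfloat16()
dtype_bf16 = layer.weight.dtype

layer.float()
dtype_float = layer.weight.dtype

# Only floating-point tensors are cast: the integer counter is untouched
bn = nn.BatchNorm1d(4).half()
counter_dtype = bn.num_batches_tracked.dtype
mean_dtype = bn.running_mean.dtype
print(dtype_half, dtype_double, dtype_bf16, dtype_float)
print(counter_dtype, mean_dtype)
```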

2.2.8 Device selection

model.cpu() places the module on the CPU; model.cuda(device=None) places it on the selected GPU!
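
A common pattern, sketched here under the assumption that the code should also run on machines without a GPU, is to pick the device once and fall back to the CPU:

```python
import torch
from torch import nn

# Use the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

layer = nn.Linear(2, 2).to(device)
x = torch.ones(1, 2, device=device)  # inputs must live on the same device
y = layer(x)

layer.cpu()  # move the module back to the CPU in place
device_after = next(layer.parameters()).device.type
print(device_after)  # cpu
```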

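2.2.9 Hook methods

register_backward_hook: registers a backward hook on the module (deprecated in favor of register_full_backward_hook)
register_forward_hook
register_forward_pre_hook
register_full_backward_hook

For explanations of these methods, see:

https://blog.csdn.net/foneone/article/details/107099060 (understanding the hook mechanism and intermediate-layer outputs)
https://blog.csdn.net/winycg/article/details/100695373 (getting intermediate-layer information in torch)

Hooks let us capture information from a model's intermediate layers, which makes inspection convenient!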
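
As a minimal sketch of capturing an intermediate output, the example below registers a forward hook on the first layer of a toy network. The features dictionary and the save_output function are names invented for this illustration:

```python
import torch
from torch import nn

features = {}


def save_output(module, inputs, output):
    # A forward hook receives the module, its positional inputs and its output
    features["hidden"] = output.detach()


net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
handle = net[0].register_forward_hook(save_output)

y = net(torch.randn(2, 4))  # the hook fires during this forward pass
handle.remove()             # detach the hook once it is no longer needed

hidden_shape = tuple(features["hidden"].shape)
print(hidden_shape)  # (2, 3)
```

Keeping the returned handle and calling remove() avoids hooks piling up when the same model is inspected repeatedly.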


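2.2.10 Other methods

set_extra_state

Sets extra state information, for example model.set_extra_state(state), where state is a dict. The matching get_extra_state() returns the extra state. The base-class versions only raise an error: a module has to override both methods, after which state_dict() and load_state_dict() call them automatically.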
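
The round trip can be sketched as follows, assuming PyTorch 1.10 or later (where these two methods are available). The Counter class and its steps attribute are hypothetical:

```python
from torch import nn


class Counter(nn.Module):
    """Hypothetical module that checkpoints a plain Python counter."""

    def __init__(self):
        super().__init__()
        self.steps = 0

    def get_extra_state(self):
        # Called by state_dict(); the result is stored under "_extra_state"
        return {"steps": self.steps}

    def set_extra_state(self, state):
        # Called by load_state_dict() with the object saved above
        self.steps = state["steps"]


src = Counter()
src.steps = 7
state = src.state_dict()
state_keys = list(state.keys())

dst = Counter()
dst.load_state_dict(state)
print(state_keys, dst.steps)  # ['_extra_state'] 7
```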

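extra_repr

To show custom extra information when the module is printed, re-implement this method in your own module. It must return a string; both single-line and multi-line strings are acceptable.

def extra_repr(self):
    # Return the text instead of printing it; Module.__repr__ inserts it
    return "MODEL: DepthWiseConv"

The returned string is inserted into the module's representation whenever the module is printed:

>>> model = DepthWiseConv(6)
>>> print(model)
DepthWiseConv(
  MODEL: DepthWiseConv
  (DWConv): Conv2d(6, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=6)
)

The default output already lists every child module, so overriding this method is rarely necessary!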

requires_grad_

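model.requires_grad_(requires_grad=True) sets whether autograd records operations on the module's parameters. By default it enables gradient computation, and therefore parameter updates. It returns the model itself!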
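
A typical use is freezing part of a model. A sketch, assuming a two-layer nn.Sequential in which only the last layer should stay trainable:

```python
from torch import nn

net = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 1))

# Freeze the first layer; the call returns that layer itself
returned = net[0].requires_grad_(False)

trainable = [name for name, p in net.named_parameters() if p.requires_grad]
print(trainable)  # ['1.weight', '1.bias']
```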

zero_grad

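Resets the gradients of all model parameters. set_to_none controls whether the gradients are set to None instead of being filled with zeros; the default was False when this post was written and has been True since PyTorch 2.0.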
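
Both behaviours can be seen in a short sketch. set_to_none is passed explicitly here so that the result does not depend on the installed version's default:

```python
import torch
from torch import nn

layer = nn.Linear(2, 1)
layer(torch.ones(1, 2)).sum().backward()
grad_before = layer.weight.grad.clone()  # d(sum)/d(weight) equals the input

layer.zero_grad(set_to_none=False)       # keep the tensors, fill with zeros
grad_zeroed = layer.weight.grad.clone()

layer.zero_grad(set_to_none=True)        # drop the gradient tensors entirely
grad_is_none = layer.weight.grad is None

print(grad_before)   # tensor([[1., 1.]])
print(grad_zeroed)   # tensor([[0., 0.]])
print(grad_is_none)  # True
```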

type

Casts the model's parameters and buffers to the given data type, in place

>>> model.type(torch.int32)
DepthWiseConv(
  (DWConv): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=768)
)

to

Changes the device and data type of the model's parameters and buffers. It can be called in several ways:

to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
to(memory_format=torch.channels_last)

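Parameters of this method:

device: a torch.device object
dtype: a torch.dtype object
tensor: a tensor whose dtype and device are the desired dtype and device for all parameters and buffers
memory_format: the desired memory format for the 4D parameters and buffers in this module

Example: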

>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)

to_empty

Moves the parameters and buffers to the specified device without copying their storage
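
One place where this is useful, sketched here under the assumption of a PyTorch version that supports the "meta" device, is building a layer without allocating real memory and materializing it afterwards. The values are uninitialized after to_empty, so they still have to be initialized or loaded:

```python
import torch
from torch import nn

# On the "meta" device only shapes and dtypes exist, no real storage
meta_layer = nn.Linear(2, 2, device="meta")
was_meta = meta_layer.weight.is_meta

# Allocate uninitialized storage on the CPU; nothing is copied
real_layer = meta_layer.to_empty(device="cpu")
real_layer.reset_parameters()  # give the fresh storage proper initial values

device_type = real_layer.weight.device.type
print(was_meta, device_type, tuple(real_layer.weight.shape))  # True cpu (2, 2)
```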

dump_patches

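A class-level flag. The accompanying note in the documentation concerns the module's version number: if parameters or buffers are added to or removed from a module, the version number should be bumped, and the module's _load_from_state_dict method can then compare version numbers and make appropriate changes if the state dict comes from before the change.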

share_memory

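Moves the underlying storage to shared memory. This is a no-op for CUDA tensors and when the underlying storage is already in shared memory. Tensors in shared memory cannot be resized.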

xpu

Moves all model parameters and buffers to the XPU. For background on XPU see: https://blog.csdn.net/ybhuangfugui/article/details/116616954

It works like the cuda() and cpu() methods!


The end!

posted @ 2022-03-31 16:23 巴蜀秀才
