Building Models with the PyTorch torch.nn.Module Class

Official documentation: PyTorch torch.nn.Module class

Used to build models
Used to define custom network layers (which may be modules, layers, loss functions, activation functions, etc.)

1. Basic Model Skeleton

The skeleton is shown below. The __init__() initializer and the forward() method are required; other methods such as loss_func() can be added as needed.

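import torch

class MyModel(torch.nn.Module):
    def __init__(self, in_dim, out_dim):
        # in_dim, out_dim: example hyperparameters passed to the model
        super(MyModel, self).__init__()
        # Register the learnable parameters here, usually via layer classes
        # (e.g. nn.Linear()) or via nn.Parameter()
        self.linear = torch.nn.Linear(in_dim, out_dim)

    # Forward pass
    def forward(self, x):
        # x: input data; its first dimension (x.size(0)) is the batch size
        # Pass the data through the parameters defined above, in order, to build the model
        out = self.linear(x)
        return out

    # Loss function
    def loss_func(self, out, target):
        # out: model output, usually the predictions
        # target: ground-truth values corresponding to the output
        loss = torch.nn.functional.mse_loss(out, target)  # example choice; replace as needed
        return loss

    # Predict class labels (for classification models)
    def pred_label(self, prob):
        # For classification, out usually holds per-class probabilities in [0, 1];
        # torch.argmax() returns the label with the highest probability
        label = torch.argmax(prob, dim=1, keepdim=False)  # dim=1: the class dimension
        return label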

2. Methods

2.1 Main methods of the `torch.nn.Module` class

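.parameters(recurse=True): returns an iterator over all parameters of the model; typically passed to an optimizer

.modules(): returns an iterator over all modules of the model, i.e., the model itself and all of its submodules

.zero_grad(set_to_none=False): resets the gradients of the model's parameters (in PyTorch 2.0 and later the default is set_to_none=True, which sets them to None instead of zero)

.to(device=None): moves the model to the specified device

.cpu(): equivalent to .to('cpu')
.cuda(): moves the model to the current CUDA device; with the default current device 0 it is equivalent to .to(0), .to('cuda:0'), .cuda(0), .cuda('cuda:0')

.train(mode=True): switches the model to training mode

.eval(): switches the model to evaluation mode; equivalent to .train(mode=False)

.requires_grad_(requires_grad=True): sets whether autograd tracks gradients for the model's parameters

Calling .requires_grad_(False) on certain layers disables gradient computation for their learnable parameters. This freezes those layers: their parameters are not changed during training.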

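2.2 Using the `.zero_grad()` method

.zero_grad(set_to_none=False): resets the gradients of the model's parameters

Note: when an optimizer (torch.optim.Optimizer) is used, the model's parameters (model.parameters()) are passed to it, and optimizer.zero_grad() then resets the gradients of the parameters that were passed to that optimizer. If the optimizer received all of model.parameters(), this has the same effect as model.zero_grad().

For an example, see section 1.2.1 of the blog post Pytorch torch.optim module optimizers, which compares optimizer.zero_grad() and model.zero_grad().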
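
A minimal sketch (the nn.Linear(2, 1) model, the SGD optimizer and the random inputs are arbitrary illustration choices) showing that model.zero_grad() and optimizer.zero_grad() clear the same gradients when the optimizer was built from model.parameters(). set_to_none is passed explicitly so the result does not depend on the default of the installed PyTorch version:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(4, 2)

# A backward pass fills .grad on every parameter
model(x).sum().backward()
has_grad_after_backward = all(p.grad is not None for p in model.parameters())

# Clearing through the model...
model.zero_grad(set_to_none=True)
cleared_by_model = all(p.grad is None for p in model.parameters())

# ...or through the optimizer that holds the same parameters
model(x).sum().backward()
optimizer.zero_grad(set_to_none=True)
cleared_by_optimizer = all(p.grad is None for p in model.parameters())

# With set_to_none=False the gradient tensors are kept but filled with zeros
model(x).sum().backward()
model.zero_grad(set_to_none=False)
zeroed = all(bool(torch.all(p.grad == 0)) for p in model.parameters())

print(has_grad_after_backward, cleared_by_model, cleared_by_optimizer, zeroed)  # True True True True
```

Setting gradients to None rather than zero avoids a memory write and lets the optimizer skip parameters that received no gradient.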

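2.3 Training mode and evaluation mode

.train(mode=True): switches the model to training mode; Dropout and BatchNorm use their training-time behavior (Dropout randomly zeroes elements, BatchNorm normalizes with batch statistics and updates its running statistics)

.eval(): switches the model to evaluation mode; Dropout is disabled and BatchNorm normalizes with its stored running statistics instead of batch statistics

Equivalent to .train(mode=False). Evaluation mode does not turn off gradient tracking; see section 4 for that.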
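
A short sketch, assuming a toy nn.Sequential made of a Linear layer followed by nn.Dropout(p=0.5), showing what switching modes changes: in training mode each element is either zeroed or scaled by 1 / (1 - p), while in evaluation mode Dropout passes its input through unchanged:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
x = torch.randn(4, 8)

model.train()                          # training mode: Dropout is active
out_train = model(x)
linear_out = model[0](x)               # output before Dropout, for comparison
# Each element is either dropped (0) or kept and scaled by 1 / (1 - p) = 2
dropout_pattern_ok = bool(torch.all((out_train == 0) | torch.isclose(out_train, 2 * linear_out)))

model.eval()                           # evaluation mode: Dropout is an identity mapping
out_eval = model(x)
print(model.training, model[1].training)   # False False
print(torch.equal(out_eval, linear_out))   # True
```

.train() and .eval() set the training flag recursively, which is why the Dropout submodule also reports False.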

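2.4 GPU deployment

For details on running PyTorch on a GPU, see this blog post: Pytorch GPU acceleration

.to(device=None): moves the model to the specified device, typically a GPU; for a module this happens in place and the module itself is returned, whereas Tensor.to() returns a new tensor

.cpu(): equivalent to .to('cpu')
.cuda(): moves the model to the current CUDA device; with the default current device 0 it is equivalent to .to(0), .to('cuda:0'), .cuda(0), .cuda('cuda:0')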
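
A common device-selection pattern, sketched with an arbitrary nn.Linear(3, 1) model: fall back to the CPU when no GPU is present, and move the input data to the same device as the model, since a forward pass fails when the parameters and the inputs live on different devices:

```python
import torch
import torch.nn as nn

# Use a GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(3, 1)
returned = model.to(device)   # Module.to() moves the parameters in place and returns the module

x = torch.randn(5, 3)
x = x.to(device)              # Tensor.to() returns a new tensor, so reassign it

out = model(x)                # parameters and input are on the same device
print(returned is model)                                    # True
print(next(model.parameters()).device.type == device.type)  # True
```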

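2.5 Getting model parameters and model structure

To understand how the torch.nn module is organized, and the concepts of modules, containers, learnable parameters and models, see section 0 of this blog post: Pytorch torch.nn module - Layer summary

2.5.1 Getting parameters

A. Iterating over all parameters

.parameters(): returns an iterator over all learnable parameters (nn.Parameter objects); take the .data attribute of each one (or call .detach()) to get its value as a plain tensor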

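import torch
import torch.nn as nn
import torch.nn.functional as F

layer_linear = nn.Linear(3, 5)  # fully connected layer with a learnable weight and bias
layer_act = nn.Sigmoid()        # activation layer without learnable parameters

# Get the parameter sizes of the nn.Linear() layer
for param in layer_linear.parameters():
    print(type(param))
    print(param.data.size())
# Output:  <class 'torch.nn.parameter.Parameter'>
# Output:  torch.Size([5, 3])
# Output:  <class 'torch.nn.parameter.Parameter'>
# Output:  torch.Size([5])

# The nn.Sigmoid() layer has no learnable parameters, so the loop body never runs
for tensor in layer_act.parameters():
    print(tensor.data.size())
# Prints nothing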
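
Since .parameters() yields every learnable tensor, it is also a convenient way to count a model's parameters. A small sketch using the same nn.Linear(3, 5) layer:

```python
import torch.nn as nn

layer_linear = nn.Linear(3, 5)

# weight: 5 x 3 = 15 values, bias: 5 values
total = sum(p.numel() for p in layer_linear.parameters())
trainable = sum(p.numel() for p in layer_linear.parameters() if p.requires_grad)
print(total, trainable)   # 20 20
```

After freezing layers (section 2.6), the two counts diverge, which is a quick way to confirm which part of a model will actually be trained.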

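.named_parameters(): returns an iterator over (parameter name, parameter) pairs

Parameter names follow the pattern <module_name>.<parameter_name>; for a single layer such as nn.Linear there is no module prefix, so the names are just weight and bias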

# Get the parameter names and sizes of the nn.Linear() layer
layer_linear = nn.Linear(3, 5)
for name, param in layer_linear.named_parameters():
    print(name, param.data.size())
# Output: weight torch.Size([5, 3])
# Output: bias torch.Size([5])

# The nn.Sigmoid() layer has no learnable parameters, so nothing is yielded
layer_act = nn.Sigmoid()
for name, param in layer_act.named_parameters():
    print(name, param.data.size())
# Prints nothing

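.state_dict(): returns a collections.OrderedDict whose keys are parameter names and whose values are the parameter values as tensors; it also contains registered buffers, such as the running statistics of BatchNorm layers

Iterate over the name/tensor pairs with the .items() method: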

for name, tensor in layer_linear.state_dict().items():
    print(type(tensor))
    print(name, tensor.size())
# Output: <class 'torch.Tensor'>
# Output: weight torch.Size([5, 3])
# Output: <class 'torch.Tensor'>
# Output: bias torch.Size([5])

# Iterate over the parameter names only
for name in layer_linear.state_dict():
    print(name)
# Output: weight
# Output: bias
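
A quick check, using the same nn.Linear(3, 5) layer, of how state_dict() values relate to the Parameter objects: with the default arguments, state_dict() stores detached tensors, so they do not require gradients but still share storage with the parameters:

```python
import torch.nn as nn

layer_linear = nn.Linear(3, 5)

param = dict(layer_linear.named_parameters())["weight"]
state = layer_linear.state_dict()["weight"]

print(type(param).__name__, param.requires_grad)   # Parameter True
print(type(state).__name__, state.requires_grad)   # Tensor False

# Same underlying memory: an in-place change to one is visible through the other
print(state.data_ptr() == param.data_ptr())        # True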

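B. Getting a specific learnable parameter

A specific parameter can be retrieved by its position in the iterator or by its name, in the following three ways: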

# Index into the result of .parameters()
list(layer_linear.parameters())[0].data

# Look up a parameter name in the result of .named_parameters()
dict(layer_linear.named_parameters())['weight'].data

# Look up a parameter name in the result of .state_dict()
layer_linear.state_dict()['weight']
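
The three lookups return the same values. A sketch that checks this, with plain attribute access (layer_linear.weight) included as a fourth route for comparison:

```python
import torch
import torch.nn as nn

layer_linear = nn.Linear(3, 5)

w_by_index = list(layer_linear.parameters())[0].data         # weight is registered before bias
w_by_name = dict(layer_linear.named_parameters())["weight"].data
w_by_state = layer_linear.state_dict()["weight"]

print(torch.equal(w_by_index, w_by_name), torch.equal(w_by_name, w_by_state))  # True True
print(torch.equal(w_by_state, layer_linear.weight.data))                        # True
```

Index-based access depends on registration order, so name-based lookups are the more robust choice once a model has many layers.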

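2.5.2 Getting modules

A. Iterating over all modules

.modules(): returns an iterator over all modules, meaning every nn.Module in the model: the model itself, the layer classes and container classes it contains, and their submodules, recursively.

.named_modules(): returns an iterator over (module name, module) pairs for all modules

.named_children(): returns an iterator over (name, submodule) pairs for the model's immediate submodules

Difference from .named_modules(): the model itself is not included, and submodules are not searched recursively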

contain_seq = nn.Sequential(
    nn.Linear(3, 5), 
    nn.Sigmoid(), 
    nn.Sequential(
        nn.Linear(5, 2), 
        nn.Sigmoid()
    ))

# Get all parameters in the model
for param in contain_seq.parameters():
    print(type(param))
    print(param.data.size())

for name, param in contain_seq.named_parameters():
    print(type(param))
    print(name, param.data.size())

for name, tensor in contain_seq.state_dict().items():
    print(type(tensor))
    print(name, tensor.size())

# Get all modules in the model
for module in contain_seq.modules():
    print(type(module))
    print(module)

# Get all module names and modules in the model
for name, module in contain_seq.named_modules():
    print(type(module))
    print(name, module)

# Get the immediate child module names and child modules
for name, module in contain_seq.named_children():
    print(type(module))
    print(name, module)
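
The loops above print a lot of text; collecting only the names makes the differences easier to see. A sketch using the same contain_seq: .named_modules() starts with the model itself under the empty name '' and uses dotted names for nested modules, while .named_children() stops at the first level:

```python
import torch.nn as nn

contain_seq = nn.Sequential(
    nn.Linear(3, 5),
    nn.Sigmoid(),
    nn.Sequential(nn.Linear(5, 2), nn.Sigmoid()),
)

module_names = [name for name, _ in contain_seq.named_modules()]
child_names = [name for name, _ in contain_seq.named_children()]
param_names = [name for name, _ in contain_seq.named_parameters()]

print(module_names)  # ['', '0', '1', '2', '2.0', '2.1']
print(child_names)   # ['0', '1', '2']
print(param_names)   # ['0.weight', '0.bias', '2.0.weight', '2.0.bias']
```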

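B. Getting a specific module

For a model class, the modules (layers) defined in the class can be accessed directly as attributes of the model.

A specific module can also be retrieved by its position in the iterator or by its module name, in the following two ways:

# Index into the result of .modules() (index 0 is the model itself)
list(contain_seq.modules())[0]

# Look up a module name in the result of .named_modules()
dict(contain_seq.named_modules())['2.0']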

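C. Getting the parameters of a specific module

Once the module object has been obtained, its parameters can be retrieved as described in section 2.5.1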
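
For example, a sketch that looks up the inner nn.Linear(5, 2) of contain_seq by its dotted name '2.0' and lists its parameters; because contain_seq is an nn.Sequential, indexing it reaches the same module object:

```python
import torch.nn as nn

contain_seq = nn.Sequential(
    nn.Linear(3, 5),
    nn.Sigmoid(),
    nn.Sequential(nn.Linear(5, 2), nn.Sigmoid()),
)

# Look up the inner Linear layer by its dotted name, then list its parameters
inner_linear = dict(contain_seq.named_modules())["2.0"]
shapes = {name: tuple(param.shape) for name, param in inner_linear.named_parameters()}
print(shapes)                              # {'weight': (2, 5), 'bias': (2,)}

# Indexing an nn.Sequential reaches the same module object
print(contain_seq[2][0] is inner_linear)   # True
```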

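2.6 Disabling gradients (freezing layers)

.requires_grad_(requires_grad=True): sets whether autograd tracks gradients for the model's parameters

Calling .requires_grad_(False) on certain modules disables gradient computation for their learnable parameters. This freezes those layers (or parameters): their values are not updated during training. It is commonly used when fine-tuning a model.
The method is available on every nn.Module (models, layers, containers); a learnable parameter (nn.Parameter) is a tensor and has the corresponding Tensor.requires_grad_() method.

Another way to freeze parameters is to pass only the parameters that should be learned to the optimizer; see section 1.2.2 of this blog post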
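
A sketch combining both approaches on a toy model (the layer sizes, random data, MSE loss and learning rate are arbitrary): the first Linear layer is frozen with .requires_grad_(False), only parameters that still require gradients are handed to the optimizer, and after one training step the frozen weights are unchanged while the last layer's weights have moved:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.Sigmoid(), nn.Linear(8, 1))

# Freeze the first Linear layer (standing in for a "pretrained" part)
model[0].requires_grad_(False)
frozen_before = model[0].weight.detach().clone()
head_before = model[2].weight.detach().clone()

# Only pass parameters that still require gradients to the optimizer
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)

x, y = torch.randn(16, 4), torch.randn(16, 1)
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(torch.equal(model[0].weight, frozen_before))  # True: frozen layer unchanged
print(model[0].weight.grad is None)                  # True: no gradient was computed
print(torch.equal(model[2].weight, head_before))    # False: trainable layer was updated
```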

3. Example: Four Ways to Build a BPNN (Back-Propagation Neural Network) and How They Differ

3.1 Four ways to build a BPNN

Method 1: using `nn.Sequential()`

import torch
import torch.nn as nn

class BPNNModeler(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(BPNNModeler, self).__init__()

        self.model = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), 
            nn.Sigmoid(),
            nn.Linear(hidden_dim, output_dim), 
            nn.Sigmoid(),
            nn.Flatten(0, -1)
        )
    
    def forward(self, x):
        out = self.model(x)
        return out

Method 2: using layer classes only

class BPNNModeler2(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(BPNNModeler2, self).__init__()
        # Layer 1
        self.layer1_linear = nn.Linear(input_dim, hidden_dim)
        self.layer1_sigmoid = nn.Sigmoid()
        # Layer 2
        self.layer2_linear = nn.Linear(hidden_dim, output_dim)
        self.layer2_sigmoid = nn.Sigmoid()
        # Output
        self.layer2_flatten = nn.Flatten(0, -1)
    
    def forward(self, x):
        # Layer 1
        out_layer1 = self.layer1_linear(x)
        out_layer1 = self.layer1_sigmoid(out_layer1)
        # Layer 2
        out_layer2 = self.layer2_linear(out_layer1)
        out_layer2 = self.layer2_sigmoid(out_layer2)
        # Output
        out = self.layer2_flatten(out_layer2)
        return out

Method 3: using the functional forms `torch.sigmoid()` and `torch.flatten()`

class BPNNModeler3(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(BPNNModeler3, self).__init__()
        # Layer 1
        self.layer1_linear = nn.Linear(input_dim, hidden_dim)
        # Layer 2
        self.layer2_linear = nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        # Layer 1
        out_layer1 = torch.sigmoid(self.layer1_linear(x))
        # Layer 2
        out_layer2 = torch.sigmoid(self.layer2_linear(out_layer1))
        # Output
        out = torch.flatten(out_layer2)
        return out

Method 4: using `nn.Parameter()`

class BPNNModeler4(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(BPNNModeler4, self).__init__()
        # Layer 1
        self.w1 = nn.Parameter(torch.rand((input_dim, hidden_dim)))
        self.b1 = nn.Parameter(torch.rand(hidden_dim))
        # Layer 2
        self.w2 = nn.Parameter(torch.rand((hidden_dim, output_dim)))
        self.b2 = nn.Parameter(torch.rand(output_dim))  # bias size must match the layer's output size

    def forward(self, x):
        # Layer 1
        out_layer1 = torch.mm(x, self.w1) + self.b1
        out_layer1 = torch.sigmoid(out_layer1)
        # Layer 2
        out_layer2 = torch.mm(out_layer1, self.w2) + self.b2
        out_layer2 = torch.sigmoid(out_layer2)
        # Output
        out = torch.flatten(out_layer2)
        return out
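
One difference between methods 2/3 and method 4 concerns the weight layout: nn.Linear stores its weight with shape (out_features, in_features) and computes x @ weight.T + bias, whereas method 4 stores w1 as (input_dim, hidden_dim) and computes torch.mm(x, w1). A sketch with arbitrary sizes 5 and 10, checking that both conventions give the same result once the weight is transposed:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(5, 10)
print(tuple(linear.weight.shape))   # (10, 5): (out_features, in_features)

# Method-4 style parameters built from the same values, stored as (in, out)
w = nn.Parameter(linear.weight.detach().t().clone())
b = nn.Parameter(linear.bias.detach().clone())

x = torch.randn(3, 5)
out_layer = linear(x)
out_manual = torch.mm(x, w) + b
print(torch.allclose(out_layer, out_manual, atol=1e-6))  # True
```

Note also that method 4 initializes with torch.rand (uniform on [0, 1)), while nn.Linear uses its own scaled initialization, so freshly built models of the two styles will not produce the same outputs.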

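3.2 Differences

Pay attention to the distinction between the modules and the learnable parameters defined inside each model.

Define the following function: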

def print_info(model):
    print("---------.parameters() method-------------------")
    for param in model.parameters():
        print(type(param), param.data.size())
    print("---------.named_parameters() method-------------")
    for name, param in model.named_parameters():
        print(type(param), name, param.data.size())
    print("---------.state_dict() method-------------------")
    for name, tensor in model.state_dict().items():
        print(type(tensor), name, tensor.size())
    print("---------.modules() method----------------------")
    for module in model.modules():
        print(type(module), module)
    print("---------.named_modules() method----------------")
    for name, module in model.named_modules():
        print(type(module), name, module)
    print("---------.named_children() method---------------")
    for name, module in model.named_children():
        print(type(module), name, module)

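Printing each model's parameters and modules shows how the four models differ, and illustrates the usage and meaning of the parameter- and module-related methods above.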

# Neural network hyperparameters
HIDDEN_DIM = 10
INPUT_DIM = 5     
OUTPUT_DIM = 1

# Instantiate the neural network models
model = BPNNModeler(input_dim=INPUT_DIM, hidden_dim=HIDDEN_DIM, output_dim=OUTPUT_DIM)
model2 = BPNNModeler2(input_dim=INPUT_DIM, hidden_dim=HIDDEN_DIM, output_dim=OUTPUT_DIM)
model3 = BPNNModeler3(input_dim=INPUT_DIM, hidden_dim=HIDDEN_DIM, output_dim=OUTPUT_DIM)
model4 = BPNNModeler4(input_dim=INPUT_DIM, hidden_dim=HIDDEN_DIM, output_dim=OUTPUT_DIM)

print_info(model)
print_info(model2)
print_info(model3)
print_info(model4)
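
The print_info() output is long. As a condensed sketch, the two hypothetical minimal classes below (SeqStyle and ParamStyle, not part of the original code) mirror the naming patterns of method 1 and method 4: modules registered as attributes contribute dotted prefixes to parameter names, while parameters registered directly with nn.Parameter keep their attribute names and create no child modules:

```python
import torch
import torch.nn as nn

class SeqStyle(nn.Module):    # hypothetical minimal copy of the method 1 pattern
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(nn.Linear(5, 10), nn.Sigmoid(), nn.Linear(10, 1), nn.Sigmoid())

    def forward(self, x):
        return self.model(x)

class ParamStyle(nn.Module):  # hypothetical minimal copy of the method 4 pattern
    def __init__(self):
        super().__init__()
        self.w1 = nn.Parameter(torch.rand(5, 10))
        self.b1 = nn.Parameter(torch.rand(10))

    def forward(self, x):
        return torch.sigmoid(torch.mm(x, self.w1) + self.b1)

seq_names = [n for n, _ in SeqStyle().named_parameters()]
param_names = [n for n, _ in ParamStyle().named_parameters()]
seq_children = [n for n, _ in SeqStyle().named_children()]
param_children = [n for n, _ in ParamStyle().named_children()]

print(seq_names)       # ['model.0.weight', 'model.0.bias', 'model.2.weight', 'model.2.bias']
print(param_names)     # ['w1', 'b1']
print(seq_children)    # ['model']
print(param_children)  # []
```

The same reasoning explains why method 3 shows fewer modules than method 2: functions such as torch.sigmoid() are not registered as modules, so only the two nn.Linear layers appear.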

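4. Evaluating the Model

The model's learnable parameters track gradients (.requires_grad=True), so any output the model computes from input data tracks gradients as well. When evaluating the model, turn gradient tracking off to reduce overhead. There are two ways to do this:

Tensor.detach(): Tensor stands for a concrete tensor instance. The method returns a new tensor that is detached from the computation graph and shares memory with the original, so gradients are no longer tracked from that point on. The forward pass has still built the graph; detach() only cuts the result off from it.
- If the tensors in the Dataset do not require gradients (as they should not), the tensors loaded from the DataLoader do not require them either (.requires_grad=False)
torch.no_grad(): a context manager inside which autograd records nothing, so no computation graph is built at all; this is the option that actually saves computation and memory.

Example: computing the loss on the validation set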

# Approach 1:
valid_total_loss = 0.
with torch.no_grad():  # disable gradient tracking
    for X_valid, Y_valid in dl_valid:      # dl_valid: a DataLoader you need to define
        out = model(X_valid)               # model: an nn.Module you need to define
        loss = loss_func(out, Y_valid)     # loss_func: the loss function
        valid_total_loss += loss.item()    # .item(): converts a one-element tensor to a Python number

# Approach 2:
valid_total_loss = 0.
for X_valid, Y_valid in dl_valid:   # tensors from the DataLoader do not require gradients (.requires_grad=False)
    out = model(X_valid).detach()   # the model parameters require gradients, so the output does too; .detach() returns a tensor that does not
    loss = loss_func(out, Y_valid.flatten())
    valid_total_loss += loss.item()
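
A self-contained sketch of approach 1 combined with model.eval() from section 2.3. The model, the MSE loss and the random validation data wrapped in a TensorDataset are hypothetical stand-ins for model, loss_func and dl_valid above:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-ins for the model, loss function and validation DataLoader
model = nn.Sequential(nn.Linear(5, 10), nn.Sigmoid(), nn.Linear(10, 1), nn.Flatten(0, -1))
loss_func = nn.MSELoss()
dl_valid = DataLoader(TensorDataset(torch.randn(32, 5), torch.randn(32)), batch_size=8)

model.eval()                          # Dropout / BatchNorm switch to evaluation behavior
valid_total_loss = 0.
with torch.no_grad():                 # no computation graph is built inside this block
    for X_valid, Y_valid in dl_valid:
        out = model(X_valid)          # shape (8,) after Flatten, matching Y_valid
        loss = loss_func(out, Y_valid)
        valid_total_loss += loss.item()
model.train()                         # switch back before resuming training

print(out.requires_grad)                   # False
print(valid_total_loss / len(dl_valid))    # average loss per batch
```

model.eval() and torch.no_grad() are complementary: the first changes how layers behave, the second stops autograd from recording, and an evaluation loop usually needs both.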


posted @ 2022-05-24 12:56 veager
