[NLP Prerequisites] 5. Convolutional Neural Networks

Learning roadmap references:

https://blog.51cto.com/u_15298598/3121189

https://github.com/Ailln/nlp-roadmap

https://juejin.cn/post/7113066539053482021

Tools used & articles read for this section:

https://easyai.tech/ai-definition/cnn/

https://blog.csdn.net/weixin_44912159/article/details/105345760

1. Overview

Convolutional neural networks (CNNs) are a branch of neural networks whose defining feature is the convolution operation. CNNs support both supervised and unsupervised learning, have strong feature-extraction ability, and give stable results without relying on hand-crafted feature engineering. They are most often used for image processing.

Besides fully connected layers, a CNN contains two characteristic layer types: convolutional layers and pooling layers.

(Figure: CNN architecture)

2. Convolutional Layer

The convolutional layer extracts features from the input matrix.

A convolutional layer works by sliding a convolution kernel over the input matrix; at each position, the elements of the covered region are multiplied element-wise with the kernel and summed, producing the corresponding element of the output matrix.

The convolution kernel is a learnable parameter and plays the role of the weights.

(Figure: convolutional layer computation)

Suppose the input matrix has size \(m \times n\), the stride is \(t\), and the kernel has size \(k \times k\). With no padding, the output of the convolutional layer has size \((\lfloor\frac{m-k}{t}\rfloor+1) \times (\lfloor\frac{n-k}{t}\rfloor+1)\). The stride is usually 1, in which case this reduces to \((m-k+1) \times (n-k+1)\).
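
As a quick check of this output-size formula, here is a snippet of my own (not from the referenced articles) that applies torch.nn.Conv2d to a random input at two different strides:

    import torch
    from torch import nn

    x = torch.randn(1, 1, 28, 28)   # one 28x28 single-channel input

    # stride 1, no padding: (28 - 3)/1 + 1 = 26
    print(nn.Conv2d(1, 1, kernel_size=3, stride=1)(x).shape)  # torch.Size([1, 1, 26, 26])

    # stride 2, no padding: floor((28 - 3)/2) + 1 = 13
    print(nn.Conv2d(1, 1, kernel_size=3, stride=2)(x).shape)  # torch.Size([1, 1, 13, 13])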

  • Forward propagation

    Suppose the set of input channels is \(X=\{X_1,X_2,\dots,X_n\}\), the corresponding set of convolution kernels is \(W=\{W_1,W_2,\dots,W_n\}\), the convolution output matrix is \(S\) with size \(i \times j\), and the bias is \(b\). The convolution can then be written as \(S=(X*W)+b=\sum_{k=1}^{n}(X_k*W_k)+b\).

    If the convolutional layer has an activation function \(\theta(x)\), its output is \(\theta(S)=\theta\left(\sum_{k=1}^{n}(X_k*W_k)+b\right)\). A short sketch illustrating this summation follows below.
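
To make the channel summation \(S=\sum_{k=1}^{n}(X_k*W_k)+b\) concrete, here is a minimal sketch of my own (the helper naive_conv2d is purely illustrative, not part of the original post) that loops over the sliding window explicitly and compares the result with torch.nn.functional.conv2d:

    import torch
    import torch.nn.functional as F

    def naive_conv2d(X, W, b, stride=1):
        """X: (n, H, W) input channels; W: (n, k, k) kernels; b: scalar bias."""
        n, H, width = X.shape
        k = W.shape[-1]
        out_h = (H - k) // stride + 1
        out_w = (width - k) // stride + 1
        S = torch.zeros(out_h, out_w)
        for i in range(out_h):
            for j in range(out_w):
                # element-wise product with the kernels, summed over all channels
                region = X[:, i*stride:i*stride+k, j*stride:j*stride+k]
                S[i, j] = (region * W).sum() + b
        return S

    X = torch.randn(3, 5, 5)   # n = 3 input channels, each 5x5
    W = torch.randn(3, 3, 3)   # one 3x3 kernel per input channel
    b = torch.tensor(0.5)      # scalar bias

    ours = naive_conv2d(X, W, b)
    ref = F.conv2d(X.unsqueeze(0), W.unsqueeze(0), bias=b.unsqueeze(0)).squeeze()
    print(torch.allclose(ours, ref, atol=1e-5))   # True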

3. Pooling Layer

The pooling layer subsamples the information, simplifying the input while preserving feature invariance.

A pooling layer works by sliding a pooling window over the input matrix and computing an output value according to the pooling rule. The two common pooling methods are max pooling and average pooling.

(Figure: pooling layer computation)
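
As a minimal illustration (my own example, not from the referenced articles), the two pooling methods can be compared on a small 4×4 input with a 2×2 window:

    import torch
    from torch import nn

    x = torch.arange(1., 17.).reshape(1, 1, 4, 4)   # values 1..16 as a 1x1x4x4 tensor

    max_pool = nn.MaxPool2d(2)   # 2x2 window, stride 2
    avg_pool = nn.AvgPool2d(2)

    print(max_pool(x))   # [[ 6.,  8.], [14., 16.]]   -- largest value in each 2x2 block
    print(avg_pool(x))   # [[ 3.5,  5.5], [11.5, 13.5]] -- mean of each 2x2 block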

4. PyTorch Implementation

  1. Data preparation (using the MNIST dataset)

    import torch
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets
    from torchvision.transforms import ToTensor, Lambda, Compose
    import matplotlib.pyplot as plt

    # Load the training set
    training_data = datasets.MNIST(
        root="data",
        train=True,
        download=True,
        transform=ToTensor()
    )

    # Load the test set
    test_data = datasets.MNIST(
        root="data",
        train=False,
        download=True,
        transform=ToTensor()
    )

    print(training_data.data.size())          # torch.Size([60000, 28, 28])
    plt.imshow(training_data.data[0].numpy()) # display the first image
    plt.show()

    (Figure: the first image in the training set)

    batch_size = 128

    # Create the data loaders
    train_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True)
    test_dataloader = DataLoader(test_data, batch_size=batch_size)

    # Check the data shapes
    for X, y in test_dataloader:
        print("Shape of X [N, C, H, W]: ", X.shape, X.dtype)
        print("Shape of y: ", y.shape, y.dtype)
        break

    # N: number of samples in a batch
    # C: number of channels
    # [H, W]: image height and width
    
  2. Building the network

    class CNN(nn.Module):
        def __init__(self):
            super(CNN, self).__init__()
            self.conv1 = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1),
                # input: 1 * 28 * 28, output: 16 * 28 * 28
                nn.ReLU(),
                nn.MaxPool2d(2) # input: 16 * 28 * 28, output: 16 * 14 * 14
                )

            self.conv2 = nn.Sequential(
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                # input: 16 * 14 * 14, output: 32 * 14 * 14
                nn.ReLU(),
                nn.MaxPool2d(2) # input: 32 * 14 * 14, output: 32 * 7 * 7
                )

            self.classifier = nn.Sequential(
                nn.Linear(32 * 7 * 7, 10) # map the features to the 10 classes
                )

        def forward(self, x):
            x = self.conv1(x)
            x = self.conv2(x)
            x = x.view(x.size(0), -1) # [batch, 32, 7, 7] → [batch, 32*7*7]
            out = self.classifier(x)
            return out

    model = CNN()
    print(model)
    
    CNN(
      (conv1): Sequential(
        (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU()
        (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (conv2): Sequential(
        (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU()
        (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (classifier): Sequential(
        (0): Linear(in_features=1568, out_features=10, bias=True)
      )
    )
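
    As a sanity check on the shape comments in the network definition, a random dummy batch can be pushed through the layers (my own addition, reusing the model defined above):

      # verify the intermediate and final shapes with a random dummy batch
      dummy = torch.randn(4, 1, 28, 28)              # [N, C, H, W]
      features = model.conv2(model.conv1(dummy))
      print(features.shape)                          # torch.Size([4, 32, 7, 7])
      print(model(dummy).shape)                      # torch.Size([4, 10])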
    
  3. Loss function and optimizer

    loss_fn = nn.CrossEntropyLoss() # cross-entropy loss

    optimizer = torch.optim.Adam(model.parameters()) # Adam optimizer
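
    Note that nn.CrossEntropyLoss applies the softmax internally: it takes raw logits and integer class labels, and by default returns the mean loss over the batch. A tiny example of my own (made-up numbers, using the imports already in place):

      logits = torch.tensor([[2.0, 0.5, -1.0],
                             [0.1, 0.2,  3.0]])      # a batch of 2 samples, 3 classes
      targets = torch.tensor([0, 2])                 # ground-truth class indices
      print(nn.CrossEntropyLoss()(logits, targets))  # mean cross-entropy over the batch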
    
  4. Model training

    # epochs: number of training epochs
    epochs = 10

    for i in range(epochs): # loop over epochs
        model.train() # training mode
        train_loss = 0
        for j, (X, y) in enumerate(train_dataloader): # loop over batches
            # forward pass
            pred = model(X)
            # compute the loss
            loss = loss_fn(pred, y)
            train_loss += loss.item()
            # backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # print the loss every 100 batches
            if j % 100 == 0:
                loss = loss.item()
                print(f"epoch {i} batch {j} loss: {loss/batch_size:>7f}")

        # evaluate on the test set at the end of each epoch
        with torch.no_grad():
            model.eval() # evaluation mode
            test_loss = 0
            hit = 0
            for (X, y) in test_dataloader:
                pred = model(X)
                test_loss += loss_fn(pred, y).item()
                hit += (pred.argmax(1) == y).sum().item()
            print(f"epoch {i}, train loss: {train_loss/len(train_dataloader.dataset):>7f} test loss: {test_loss/len(test_dataloader.dataset):>7f} accuracy: {hit/len(test_dataloader.dataset):>7f}")
    
    • optimizer.zero_grad()

      Clears the gradients left over from the previous step.

      In PyTorch, backward() accumulates gradients into the parameters rather than overwriting them. Gradients should not be carried over between batches, so zero_grad() is called once per batch.

    • loss.backward()

      Runs backpropagation and computes the gradients.

    • optimizer.step()

      The optimizer updates the weights using the computed gradients.

    • with torch.no_grad()

      Disables the autograd engine, which speeds up computation and saves memory. Nothing inside the with block has its gradients tracked or computed. A short sketch after this list illustrates both the accumulation and the no_grad behaviour.
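
    The accumulation behaviour of backward() and the effect of torch.no_grad() can be seen on a single tensor (an illustrative sketch of my own, not part of the original training code):

      w = torch.ones(1, requires_grad=True)

      (w * 2).sum().backward()
      print(w.grad)            # tensor([2.])

      (w * 2).sum().backward() # gradients accumulate across backward() calls: 2 + 2
      print(w.grad)            # tensor([4.])

      w.grad.zero_()           # clear the gradient, as optimizer.zero_grad() does per batch
      print(w.grad)            # tensor([0.])

      with torch.no_grad():
          y = w * 2
      print(y.requires_grad)   # False -- no computation graph is built inside the block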

  5. Results

    epoch 0 batch 0 loss: 0.017991
    epoch 0 batch 100 loss: 0.018041
    epoch 0 batch 200 loss: 0.018068
    epoch 0 batch 300 loss: 0.018066
    epoch 0 batch 400 loss: 0.018037
    epoch 0, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
    epoch 1 batch 0 loss: 0.018085
    epoch 1 batch 100 loss: 0.017887
    epoch 1 batch 200 loss: 0.018049
    epoch 1 batch 300 loss: 0.018109
    epoch 1 batch 400 loss: 0.018097
    epoch 1, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
    epoch 2 batch 0 loss: 0.017934
    epoch 2 batch 100 loss: 0.017987
    epoch 2 batch 200 loss: 0.018088
    epoch 2 batch 300 loss: 0.017946
    epoch 2 batch 400 loss: 0.018093
    epoch 2, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
    epoch 3 batch 0 loss: 0.017929
    epoch 3 batch 100 loss: 0.017884
    epoch 3 batch 200 loss: 0.018053
    epoch 3 batch 300 loss: 0.017972
    epoch 3 batch 400 loss: 0.017919
    epoch 3, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
    epoch 4 batch 0 loss: 0.018011
    epoch 4 batch 100 loss: 0.017994
    epoch 4 batch 200 loss: 0.017952
    epoch 4 batch 300 loss: 0.018021
    epoch 4 batch 400 loss: 0.018020
    epoch 4, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
    epoch 5 batch 0 loss: 0.018102
    epoch 5 batch 100 loss: 0.018018
    epoch 5 batch 200 loss: 0.018042
    epoch 5 batch 300 loss: 0.018090
    epoch 5 batch 400 loss: 0.017981
    epoch 5, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
    epoch 6 batch 0 loss: 0.017944
    epoch 6 batch 100 loss: 0.017992
    epoch 6 batch 200 loss: 0.018033
    epoch 6 batch 300 loss: 0.018052
    epoch 6 batch 400 loss: 0.018094
    epoch 6, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
    epoch 7 batch 0 loss: 0.018016
    epoch 7 batch 100 loss: 0.018022
    epoch 7 batch 200 loss: 0.018112
    epoch 7 batch 300 loss: 0.018066
    epoch 7 batch 400 loss: 0.018044
    epoch 7, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
    epoch 8 batch 0 loss: 0.018011
    epoch 8 batch 100 loss: 0.018112
    epoch 8 batch 200 loss: 0.018002
    epoch 8 batch 300 loss: 0.018023
    epoch 8 batch 400 loss: 0.018140
    epoch 8, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
    epoch 9 batch 0 loss: 0.018014
    epoch 9 batch 100 loss: 0.017930
    epoch 9 batch 200 loss: 0.018003
    epoch 9 batch 300 loss: 0.017927
    epoch 9 batch 400 loss: 0.017920
    epoch 9, train loss: 0.018034 test loss: 0.018228 accuracy: 0.099000
    

posted @ 2023-03-07 21:47 无机呱子