[Repost] Fine-Tuning a Model, Step by Step


Original post: https://blog.csdn.net/Ying_M/article/details/117932055

Original post (more detailed): https://blog.csdn.net/qq_36825778/article/details/104213056

Official code: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

Fine-tuning steps

  • 1 Obtain the pretrained model parameters (the knowledge learned on the source task)
  • 2 Load them into the model (load_state_dict)
  • 3 Replace the output layer to fit the new task

Fine-tuning training strategies

  • 1 Freeze the pretrained parameters (requires_grad = False; lr = 0)

    For example, in the convolutional network shown in the original post's figure, we sometimes freeze the feature-extraction part, i.e. the parameters of the series of convolutional layers, and exclude them from training. There are two reasons: the new task's dataset is often too small to train that many parameters, and the early feature-extraction layers learn fairly generic features, so their parameters can safely be kept fixed. PyTorch offers two ways to do this:

    ① set requires_grad = False, so no gradients are computed for these parameters and they are never updated;

    ② set the learning rate lr to 0, so the update step is zero and the parameters never change. Either way, the parameters are effectively frozen.

  • 2 Use a smaller learning rate for the feature extractor (params_group)
    Give the feature-extraction part a smaller learning rate. This relies on parameter groups (introduced in the optimizer section): an optimizer can apply different hyperparameters to different parameter groups, so here we assign a small learning rate to the feature extractor and a larger one to the fully connected layers, i.e. different learning rates for different sets of parameters.
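The two strategies above can be sketched on a toy model (the ToyNet class is hypothetical, just to show the mechanics; in practice you would pick one strategy, they are shown together only for brevity):

```python
import torch.nn as nn
import torch.optim as optim

# Toy stand-in for a convnet: a "features" part and a classifier head.
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(self.features(x))

model = ToyNet()

# Strategy 1: freeze the feature extractor -- no gradients, no updates.
for p in model.features.parameters():
    p.requires_grad = False

# Strategy 2: parameter groups with different learning rates
# (small lr for the feature extractor, larger lr for the new head).
optimizer = optim.SGD([
    {"params": model.features.parameters(), "lr": 1e-4},
    {"params": model.fc.parameters(), "lr": 1e-2},
], momentum=0.9)
```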

[Example]
How do we finetune a model in PyTorch? The example below finetunes a ResNet-18 that was pretrained on ImageNet, applying it to a new target task: binary classification of ants vs. bees.
Ants-vs-bees dataset:

Training set: ~120 images per class
Validation set: ~70 images per class
Clearly this is a very small dataset, so we finetune a pretrained ResNet-18 rather than train from scratch.

Data: https://download.pytorch.org/tutorial/hymenoptera_data.zip — download into the data directory and unzip.

Model: Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /Users/xuehuiping/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth — the weights are downloaded automatically on first use.

Full code:

https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

(1)Finetuning the convnet

Load a pretrained model and reset final fully connected layer.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Fine-tune the model: change the original 1000-class head to 2 classes
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
# Here the size of each output sample is set to 2.
# Alternatively, it can be generalized to nn.Linear(num_ftrs, len(class_names)).
model_ft.fc = nn.Linear(num_ftrs, 2)

model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

It should take around 15-25 min on CPU. On GPU though, it takes less than a minute.


# 训练模型
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=25)
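The train_model function is defined in the official tutorial, where it reads dataloaders, dataset_sizes and device from module-level variables. A minimal self-contained sketch of what it does (those three are passed in explicitly here, a simplification of my own):

```python
import copy
import torch

def train_model(model, criterion, optimizer, scheduler,
                dataloaders, dataset_sizes, device, num_epochs=25):
    best_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print(f"Epoch {epoch}/{num_epochs - 1}")
        print("-" * 10)
        for phase in ["train", "val"]:
            model.train() if phase == "train" else model.eval()
            running_loss, running_corrects = 0.0, 0
            for inputs, labels in dataloaders[phase]:
                inputs, labels = inputs.to(device), labels.to(device)
                optimizer.zero_grad()
                # Only track gradients during the training phase.
                with torch.set_grad_enabled(phase == "train"):
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    if phase == "train":
                        loss.backward()
                        optimizer.step()
                running_loss += loss.item() * inputs.size(0)
                running_corrects += (outputs.argmax(1) == labels).sum().item()
            if phase == "train":
                scheduler.step()
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects / dataset_sizes[phase]
            print(f"{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}")
            # Remember the best weights seen on the validation set.
            if phase == "val" and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_wts = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_wts)
    return model
```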

Running this on my laptop took about an hour:

Epoch 0/24
----------
train Loss: 0.6266 Acc: 0.7049
val Loss: 0.2417 Acc: 0.9281

Epoch 1/24
----------
train Loss: 0.5094 Acc: 0.7664
val Loss: 0.2195 Acc: 0.9216

Epoch 2/24
----------
train Loss: 0.5164 Acc: 0.8033
val Loss: 1.4429 Acc: 0.7255

Epoch 3/24
----------
train Loss: 0.6872 Acc: 0.8115
val Loss: 0.8149 Acc: 0.7190

Epoch 4/24
----------
train Loss: 0.5251 Acc: 0.8033
val Loss: 0.6797 Acc: 0.7908

Epoch 5/24
----------
train Loss: 0.4976 Acc: 0.8115
val Loss: 0.2709 Acc: 0.9281

Epoch 6/24
----------
train Loss: 0.4489 Acc: 0.8361
val Loss: 1.1289 Acc: 0.6471

Epoch 7/24
----------
train Loss: 0.4221 Acc: 0.8361
val Loss: 0.2723 Acc: 0.8954

Epoch 8/24
----------
train Loss: 0.3519 Acc: 0.8443
val Loss: 0.2639 Acc: 0.9150

Epoch 9/24
----------
train Loss: 0.3226 Acc: 0.8770
val Loss: 0.2624 Acc: 0.9216

Epoch 10/24
----------
train Loss: 0.4007 Acc: 0.8074
val Loss: 0.2254 Acc: 0.9085

Epoch 11/24
----------
train Loss: 0.3117 Acc: 0.8607
val Loss: 0.2576 Acc: 0.9150

Epoch 12/24
----------
train Loss: 0.3058 Acc: 0.8689
val Loss: 0.2468 Acc: 0.9150

Epoch 13/24
----------
train Loss: 0.3937 Acc: 0.8402
val Loss: 0.2383 Acc: 0.9150

Epoch 14/24
----------
train Loss: 0.3074 Acc: 0.8689
val Loss: 0.2310 Acc: 0.9281

Epoch 15/24
----------
train Loss: 0.2507 Acc: 0.8893
val Loss: 0.2339 Acc: 0.9150

Epoch 16/24
----------
train Loss: 0.2654 Acc: 0.8934
val Loss: 0.2625 Acc: 0.9150

Epoch 17/24
----------
train Loss: 0.3485 Acc: 0.8607
val Loss: 0.2403 Acc: 0.9150

Epoch 18/24
----------
train Loss: 0.2292 Acc: 0.8893
val Loss: 0.2235 Acc: 0.9085

Epoch 19/24
----------
train Loss: 0.2193 Acc: 0.9139
val Loss: 0.2482 Acc: 0.9216

Epoch 20/24
----------
train Loss: 0.2983 Acc: 0.8484
val Loss: 0.2230 Acc: 0.9216

Epoch 21/24
----------
train Loss: 0.3287 Acc: 0.8320
val Loss: 0.2266 Acc: 0.9150

Epoch 22/24
----------
train Loss: 0.2779 Acc: 0.8689
val Loss: 0.2431 Acc: 0.9085

Epoch 23/24
----------
train Loss: 0.2427 Acc: 0.8934
val Loss: 0.2430 Acc: 0.9150

Epoch 24/24
----------
train Loss: 0.1997 Acc: 0.9057
val Loss: 0.2354 Acc: 0.9150

Training complete in 54m 17s
Best val Acc: 0.928105

(2) ConvNet as fixed feature extractor

model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)

model_conv = model_conv.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that only parameters of final layer are being optimized as
# opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)

This takes about half the time of the previous method, since gradients don't need to be computed for most of the network.

model_conv = train_model(model_conv, criterion, optimizer_conv,
                         exp_lr_scheduler, num_epochs=25)

Going further

https://pytorch.org/tutorials/intermediate/quantized_transfer_learning_tutorial.html

Posted on 2022-04-20 11:09 by 宋岳庭