


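  2. torch.load: Uses pickle's unpickling facilities to deserialize pickled object files into memory. It also lets you choose the device the data is loaded onto.
  3. torch.nn.Module.load_state_dict: Loads a model's parameter dictionary from a deserialized state_dict.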
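The functions listed above can be combined into a save/load round trip. The following is a minimal sketch, assuming a tiny `nn.Linear` stand-in model and an in-memory buffer instead of a real file path (both are illustrative choices, not part of the original tutorial):

```python
import io

import torch
import torch.nn as nn

# A tiny stand-in model (hypothetical, for illustration only).
model = nn.Linear(4, 2)

# torch.save pickles the state_dict; an in-memory buffer stands in for PATH.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)

# torch.load deserializes it again; map_location selects the target device.
buffer.seek(0)
state = torch.load(buffer, map_location="cpu")

# load_state_dict copies the deserialized tensors into a fresh instance.
restored = nn.Linear(4, 2)
restored.load_state_dict(state)

same = all(torch.equal(p, q)
           for p, q in zip(model.parameters(), restored.parameters()))
print("parameters match:", same)
```

Replacing the buffer with a file path gives the on-disk variant used in the rest of this section.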





import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Define the model
class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Initialize the model
model = TheModelClass()

# Initialize the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Print the model's state_dict
print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

# Print the optimizer's state_dict
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])
  • Output
Model's state_dict:
conv1.weight     torch.Size([6, 3, 5, 5])
conv1.bias       torch.Size([6])
conv2.weight     torch.Size([16, 6, 5, 5])
conv2.bias       torch.Size([16])
fc1.weight       torch.Size([120, 400])
fc1.bias         torch.Size([120])
fc2.weight       torch.Size([84, 120])
fc2.bias         torch.Size([84])
fc3.weight       torch.Size([10, 84])
fc3.bias         torch.Size([10])

Optimizer's state_dict:
state            {}
param_groups     [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [4675713712, 4675713784, 4675714000, 4675714072, 4675714216, 4675714288, 4675714432, 4675714504, 4675714648, 4675714720]}]


2.1 Save/Load state_dict (Recommended)

  • Save
torch.save(model.state_dict(), PATH)
  • Load
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()


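A common PyTorch convention is to save models with a '.pt' or '.pth' file extension.

Remember that you must call model.eval() before running inference to set dropout and batch normalization layers to evaluation mode. Failing to do so can lead to inconsistent inference results.

  • Note

load_state_dict() takes a dictionary object, not a path to a saved object, so you must first deserialize your saved state_dict. For example, you cannot load a model by calling model.load_state_dict(PATH).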
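The note about `load_state_dict()` can be checked directly. This is a small sketch, assuming a throwaway `nn.Linear` model and a temporary file name chosen only for illustration; the exact exception type raised for a path argument depends on the PyTorch release, so the sketch catches it broadly:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "model_state.pt")  # hypothetical file name
torch.save(model.state_dict(), path)

fresh = nn.Linear(4, 2)

# Wrong: a path string is not a state_dict, so this call is rejected.
try:
    fresh.load_state_dict(path)
    passing_path_failed = False
except Exception as err:  # exception type varies between PyTorch releases
    passing_path_failed = True
    print("rejected:", type(err).__name__)

# Right: deserialize with torch.load first, then hand over the dictionary.
fresh.load_state_dict(torch.load(path))
weights_match = torch.equal(fresh.weight, model.weight)
print("weights match:", weights_match)
```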

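2.2 Save/Load Entire Model

  • Save
torch.save(model, PATH)
  • Load
# The model class must be defined somewhere before this point
# (recent PyTorch releases default to weights_only=True, in which case
#  unpickling a full model may require torch.load(PATH, weights_only=False))
model = torch.load(PATH)
model.eval()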

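This save/load process uses the most intuitive syntax and involves the least amount of code. Saving a model this way stores the entire module using Python's pickle module. The disadvantage of this approach is that the serialized data is bound to the specific classes and directory structure used when the model was saved.

A common PyTorch convention is to save models with a '.pt' or '.pth' file extension.

Remember that you must call model.eval() before running inference to set dropout and batch normalization layers to evaluation mode. Failing to do so can lead to inconsistent inference results.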
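To make the effect of `model.eval()` concrete, here is a minimal sketch that uses a single `nn.Dropout` layer in isolation (the layer, input size, and seed are illustrative assumptions). In training mode the layer randomly zeroes elements and rescales the survivors by 1/(1-p); in evaluation mode it passes the input through unchanged:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: elements are dropped at random, survivors are scaled by 2.
layer.train()
train_out = layer(x)

# Evaluation mode: dropout is the identity, so results are deterministic.
layer.eval()
eval_out = layer(x)

print("zeros in train mode:", int((train_out == 0).sum()))
print("eval output equals input:", torch.equal(eval_out, x))
```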

3. Saving & Loading a Checkpoint for Inference and/or Resuming Training

  • Save
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
    ...
}, PATH)
  • Load
model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()
# - or -
model.train()

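When saving a checkpoint for either inference or resuming training, you must save more than just the model's state_dict. It is also important to save the optimizer's state_dict, because it contains buffers and parameters that are updated as the model trains.

To save multiple components, organize them in a dictionary and use torch.save() to serialize the dictionary. A common PyTorch convention is to save checkpoints with a .tar file extension.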
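The checkpoint pattern can be sketched end to end as follows. The tiny `nn.Linear` model, the single training step, the temporary path, and the names `resumed_model` and `start_epoch` are assumptions made for illustration; the dictionary keys mirror the ones used in this section:

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(4, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One training step so the optimizer holds momentum buffers worth saving.
loss = model(torch.randn(8, 4)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

path = os.path.join(tempfile.mkdtemp(), "checkpoint.tar")  # hypothetical path
torch.save({
    "epoch": 1,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "loss": loss.item(),
}, path)

# Resume: rebuild the objects first, then restore their saved state.
resumed_model = nn.Linear(4, 1)
resumed_optimizer = optim.SGD(resumed_model.parameters(), lr=0.01, momentum=0.9)
checkpoint = torch.load(path)
resumed_model.load_state_dict(checkpoint["model_state_dict"])
resumed_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1

print("resuming at epoch", start_epoch)
```

Without the optimizer's state_dict, the momentum buffers would start from zero again after resuming, which is why the section recommends saving both.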


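Remember that you must call model.eval() before running inference to set dropout and batch normalization layers to evaluation mode. Failing to do so can lead to inconsistent inference results.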

4. Saving Multiple Models in One File

  • Save
torch.save({
    'modelA_state_dict': modelA.state_dict(),
    'modelB_state_dict': modelB.state_dict(),
    'optimizerA_state_dict': optimizerA.state_dict(),
    'optimizerB_state_dict': optimizerB.state_dict(),
    ...
}, PATH)
  • Load
modelA = TheModelAClass(*args, **kwargs)
modelB = TheModelBClass(*args, **kwargs)
optimizerA = TheOptimizerAClass(*args, **kwargs)
optimizerB = TheOptimizerBClass(*args, **kwargs)

checkpoint = torch.load(PATH)
modelA.load_state_dict(checkpoint['modelA_state_dict'])
modelB.load_state_dict(checkpoint['modelB_state_dict'])
optimizerA.load_state_dict(checkpoint['optimizerA_state_dict'])
optimizerB.load_state_dict(checkpoint['optimizerB_state_dict'])

modelA.eval()
modelB.eval()
# - or -
modelA.train()
modelB.train()

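When saving a model made up of multiple torch.nn.Modules, such as a GAN (generative adversarial network), a sequence-to-sequence model, or an ensemble of models, you can follow the same approach as for a regular checkpoint. In other words, save a dictionary containing each model's state_dict and the corresponding optimizer's state_dict. As mentioned before, you can also save any other items that help you resume training by adding them to the dictionary.

A common PyTorch convention is to save checkpoints with a .tar file extension.

Remember that you must call model.eval() before running inference to set dropout and batch normalization layers to evaluation mode. Failing to do so can lead to inconsistent inference results.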

5. Warmstarting a Model Using Parameters from a Different Model

  • Save
torch.save(modelA.state_dict(), PATH)
  • Load
modelB = TheModelBClass(*args, **kwargs)
modelB.load_state_dict(torch.load(PATH), strict=False)


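Whether you are loading from a partial state_dict that is missing some keys, or from a state_dict that has more keys than the model you are loading into, you can set the strict argument of load_state_dict() to False to ignore the non-matching keys.

If you want to load parameters from one layer into another but some keys do not match, simply rename the parameter keys in the state_dict you are loading so that they match the keys of the model you are loading into.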
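Both techniques described above can be sketched with two small models. `ModelA`, `ModelB`, and their layer names are hypothetical and exist only for this illustration; note that `strict=False` ignores non-matching keys but still requires matching keys to have matching shapes:

```python
import torch
import torch.nn as nn

class ModelA(nn.Module):  # hypothetical source model
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(4, 8)
        self.head = nn.Linear(8, 2)

class ModelB(nn.Module):  # hypothetical target model
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(4, 8)    # same name as in ModelA
        self.classifier = nn.Linear(8, 2)  # same shape as head, different name

state = ModelA().state_dict()
model_b = ModelB()

# strict=False loads the matching keys and reports the rest instead of failing.
result = model_b.load_state_dict(state, strict=False)
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)

# Renaming keys lets the remaining parameters be loaded as well.
renamed = {key.replace("head.", "classifier."): value
           for key, value in state.items()}
model_b.load_state_dict(renamed)  # strict by default; all keys now match
```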

6. Saving & Loading a Model Across Devices

6.1 Save on GPU, Load on CPU

  • Save
torch.save(model.state_dict(), PATH)
  • Load
device = torch.device('cpu')
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH, map_location=device))

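When loading a model on a CPU that was trained on a GPU, pass torch.device('cpu') to the map_location argument of torch.load(). In this case, map_location is used to dynamically remap the storages underlying the tensors to the CPU device.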
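The `map_location` behaviour can be sketched as below. The example assumes a small `nn.Linear` model and an in-memory buffer; it saves from a GPU when one is available and otherwise falls back to the CPU, so it also runs on CPU-only machines:

```python
import io

import torch
import torch.nn as nn

# Save from the GPU if there is one; otherwise the sketch still runs on CPU.
source_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2).to(source_device)

buffer = io.BytesIO()  # stands in for PATH
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# map_location remaps every stored tensor to the CPU while loading.
state = torch.load(buffer, map_location=torch.device("cpu"))
cpu_model = nn.Linear(4, 2)
cpu_model.load_state_dict(state)

devices = {tensor.device.type for tensor in state.values()}
print("loaded tensors live on:", devices)
```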

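6.2 Save on GPU, Load on GPU

  • Save
torch.save(model.state_dict(), PATH)
  • Load
device = torch.device("cuda")
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.to(device)
# Make sure to call input = input.to(device) on any input tensors that you feed to the model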

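When loading a model on a GPU that was trained and saved on a GPU, simply convert the initialized model to a CUDA-optimized model with model.to(torch.device('cuda')). Also be sure to call .to(torch.device('cuda')) on all model inputs.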

6.3 Save on CPU, Load on GPU

  • Save
torch.save(model.state_dict(), PATH)
  • Load
device = torch.device("cuda")
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH, map_location="cuda:0"))  # Choose whatever GPU device number you want
model.to(device)
# Make sure to call input = input.to(device) on any input tensors that you feed to the model

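When loading a model on a GPU that was trained and saved on a CPU, set the map_location argument of torch.load() to cuda:device_id. This loads the model onto the specified GPU device. Next, be sure to call model.to(torch.device('cuda')) to convert the model's parameter tensors to CUDA tensors. Finally, make sure to call .to(torch.device('cuda')) on all model inputs. Note that my_tensor.to(device) returns a new copy of my_tensor on the GPU and does not overwrite my_tensor. Therefore, remember to overwrite the tensor manually: my_tensor = my_tensor.to(torch.device('cuda'))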

6.4 Saving torch.nn.DataParallel Models

  • Save
torch.save(model.module.state_dict(), PATH)
  • Load
# Load to whatever device you want

torch.nn.DataParallel is a model wrapper that enables parallel GPU utilization. To save a DataParallel model generically, save model.module.state_dict().
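The reason for saving `model.module.state_dict()` becomes visible when the two sets of keys are compared. This is a minimal sketch with a wrapped `nn.Linear` (an illustrative stand-in); the wrapper can be constructed on a CPU-only machine as well, since only the key names matter here:

```python
import torch
import torch.nn as nn

# Wrap a small model; the wrapper stores it under the attribute `module`.
model = nn.DataParallel(nn.Linear(4, 2))

# Keys of the wrapper carry a "module." prefix, keys of the inner model do not.
wrapped_keys = list(model.state_dict().keys())
plain_keys = list(model.module.state_dict().keys())
print("wrapper keys:", wrapped_keys)
print("inner keys:  ", plain_keys)

# The inner state_dict loads directly into an unwrapped model.
plain = nn.Linear(4, 2)
plain.load_state_dict(model.module.state_dict())
```

Saving the inner state_dict therefore keeps the file loadable whether or not the model is wrapped again later.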



from __future__ import print_function
#%matplotlib inline
import argparse
import os
import random
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML

# Set a random seed for reproducibility
manualSeed = 999
#manualSeed = random.randint(1, 10000)  # use this line instead if you want new results
print("Random Seed: ", manualSeed)
random.seed(manualSeed)
torch.manual_seed(manualSeed)
  • Output:
Random Seed: 999
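As a quick illustration of what fixing the seed buys, the sketch below draws random values twice with the same seed and once with a different one. The helper name `sample` is a hypothetical choice for this example:

```python
import random

import torch

def sample(seed):
    """Seed both generators, then draw one Python float and one tensor."""
    random.seed(seed)
    torch.manual_seed(seed)
    return random.random(), torch.randn(3)

py_a, t_a = sample(999)
py_b, t_b = sample(999)   # same seed: identical draws
py_c, t_c = sample(1000)  # different seed: different draws

print("same seed reproduces:", py_a == py_b and torch.equal(t_a, t_b))
print("different seed differs:", not torch.equal(t_a, t_c))
```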
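# Root directory for the dataset
dataroot = "data/celeba"
# Number of worker processes for the dataloader
workers = 2
# Batch size during training
batch_size = 128
# Spatial size of the training images. All images are resized to this size by a transform.
image_size = 64
# Number of channels in the training images. For color images this is 3
nc = 3
# Size of the latent vector z (i.e. the size of the generator input)
nz = 100
# Size of the feature maps in the generator
ngf = 64
# Size of the feature maps in the discriminator
ndf = 64
# Number of training epochs
num_epochs = 5
# Learning rate for the optimizers
lr = 0.0002
# Beta1 hyperparameter for the Adam optimizers
beta1 = 0.5
# Number of GPUs available. Use 0 for CPU mode.
ngpu = 1

# We can use an image folder dataset the way we have it set up.
# Create the dataset
dataset = dset.ImageFolder(root=dataroot,
                           transform=transforms.Compose([
                               transforms.Resize(image_size),
                               transforms.CenterCrop(image_size),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                           ]))
# Create the dataloader
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                         shuffle=True, num_workers=workers)

# Decide which device we want to run on
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")

# Plot some of the training images
real_batch = next(iter(dataloader))
plt.figure(figsize=(8,8))
plt.axis("off")
plt.title("Training Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=2, normalize=True).cpu(),(1,2,0)))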

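3.3.1 Weight Initialization

In the DCGAN paper, the authors specify that all model weights should be randomly initialized from a normal distribution with mean = 0 and stdev = 0.02. The weights_init function takes an initialized model as input and reinitializes its convolutional and batch normalization layers to meet this criterion.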

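# custom weights initialization called on netG and netD
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        # Conv and ConvTranspose layers: weights ~ N(0, 0.02)
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        # BatchNorm layers: scale ~ N(1, 0.02), bias = 0
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)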
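A quick way to check that the initialization behaves as described is to apply it to a small network and inspect the statistics. The sketch below restates `weights_init` so that it runs on its own (the `m.weight.data` and `m.bias.data` arguments are an assumption about the arguments elided in the listing above) and uses a hypothetical two-layer `nn.Sequential`:

```python
import torch
import torch.nn as nn

def weights_init(m):
    """Reinitialize conv layers to N(0, 0.02) and batch norm to N(1, 0.02)."""
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find("BatchNorm") != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)

torch.manual_seed(0)
net = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1, bias=False), nn.BatchNorm2d(64))
net.apply(weights_init)  # apply() visits every submodule recursively

conv, bn = net[0], net[1]
conv_mean = conv.weight.mean().item()
conv_std = conv.weight.std().item()
print("conv mean %.4f, conv std %.4f" % (conv_mean, conv_std))
print("bn scale mean %.4f" % bn.weight.mean().item())
```

Because the check is statistical, the printed values are only close to 0, 0.02, and 1 rather than exactly equal.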

3.3.2 Generator

  • Generator code
# Generator code
class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a transposed convolution
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )

    def forward(self, input):
        return self.main(input)


# Create the generator
netG = Generator(ngpu).to(device)

# Handle multi-GPU if desired
if (device.type == 'cuda') and (ngpu > 1):
    netG = nn.DataParallel(netG, list(range(ngpu)))

# Apply the weights_init function to randomly initialize all weights to mean = 0, stdev = 0.02.
netG.apply(weights_init)

# Print the model
print(netG)
  • Output:
Generator(
  (main): Sequential(
    (0): ConvTranspose2d(100, 512, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU(inplace=True)
    (9): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (11): ReLU(inplace=True)
    (12): ConvTranspose2d(64, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (13): Tanh()
  )
)
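The "state size" comments in the generator follow from the output-size formula of a transposed convolution, out = (in - 1) * stride - 2 * padding + kernel (assuming no output padding and no dilation, as in the layers above). A small sketch in plain Python, with a hypothetical helper name, traces the spatial size through the five layers:

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Spatial output size of a transposed convolution (no output_padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) of the five ConvTranspose2d layers in the generator
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # the latent vector z enters as a 1 x 1 feature map
sizes = []
for kernel, stride, padding in layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [4, 8, 16, 32, 64]
```

Each stride-2 layer doubles the spatial size, which is how a 1 x 1 input grows into a 64 x 64 image.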

3.3.3 Discriminator

  • Discriminator code
class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)


# Create the discriminator
netD = Discriminator(ngpu).to(device)

# Handle multi-GPU if desired
if (device.type == 'cuda') and (ngpu > 1):
    netD = nn.DataParallel(netD, list(range(ngpu)))

# Apply the weights_init function to randomly initialize all weights to mean = 0, stdev = 0.02.
netD.apply(weights_init)

# Print the model
print(netD)
  • Output:
Discriminator(
  (main): Sequential(
    (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (1): LeakyReLU(negative_slope=0.2, inplace=True)
    (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): LeakyReLU(negative_slope=0.2, inplace=True)
    (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): LeakyReLU(negative_slope=0.2, inplace=True)
    (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (9): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): LeakyReLU(negative_slope=0.2, inplace=True)
    (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (12): Sigmoid()
  )
)
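The discriminator mirrors the generator: each strided convolution shrinks the feature map according to out = floor((in + 2 * padding - kernel) / stride) + 1 (assuming dilation 1, as in the layers above). The following plain-Python sketch, with a hypothetical helper name, follows a 64 x 64 image down to a single score:

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a standard convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) of the five Conv2d layers in the discriminator
layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]

size = 64  # input images are 64 x 64
sizes = []
for kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [32, 16, 8, 4, 1]
```

The final 1 x 1 map is passed through the sigmoid, giving one probability per image.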

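3.3.4 Loss Functions and Optimizers

# Initialize the BCELoss function
criterion = nn.BCELoss()

# Create a batch of latent vectors that we will use to visualize the generator's progress
fixed_noise = torch.randn(64, nz, 1, 1, device=device)

# Establish the convention for real and fake labels during training
# (floats, because BCELoss expects floating-point targets)
real_label = 1.
fake_label = 0.

# Set up Adam optimizers for both G and D
optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))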
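The choice of `nn.BCELoss` with labels 1 and 0 ties directly to the GAN objective: with a target of 1 the loss reduces to -log(x), and with a target of 0 it reduces to -log(1 - x). The sketch below checks this against a manual computation, using made-up discriminator outputs for illustration:

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()

# Made-up discriminator outputs (probabilities), for illustration only.
output = torch.tensor([0.9, 0.2, 0.6])
real = torch.ones(3)   # plays the role of real_label
fake = torch.zeros(3)  # plays the role of fake_label

# Target 1: BCE reduces to the mean of -log(output).
loss_real = criterion(output, real)
manual_real = -torch.log(output).mean()

# Target 0: BCE reduces to the mean of -log(1 - output).
loss_fake = criterion(output, fake)
manual_fake = -torch.log(1 - output).mean()

print("real-label loss:", loss_real.item(), manual_real.item())
print("fake-label loss:", loss_fake.item(), manual_fake.item())
```

Minimizing these two terms for D corresponds to maximizing log(D(x)) + log(1 - D(G(z))), and minimizing the real-label loss on fake images for G corresponds to maximizing log(D(G(z))).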

3.3.5 Training

# Training Loop

# Lists to keep track of progress
img_list = []
G_losses = []
D_losses = []
iters = 0

print("Starting Training Loop...")
# For each epoch
for epoch in range(num_epochs):
    # For each batch in the dataloader
    for i, data in enumerate(dataloader, 0):

        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        ## Train with all-real batch
        netD.zero_grad()
        # Format batch
        real_cpu = data[0].to(device)
        b_size = real_cpu.size(0)
        # BCELoss expects floating-point targets, so set the dtype explicitly
        label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
        # Forward pass real batch through D
        output = netD(real_cpu).view(-1)
        # Calculate loss on all-real batch
        errD_real = criterion(output, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()
        D_x = output.mean().item()

        ## Train with all-fake batch
        # Generate batch of latent vectors
        noise = torch.randn(b_size, nz, 1, 1, device=device)
        # Generate fake image batch with G
        fake = netG(noise)
        label.fill_(fake_label)
        # Classify all fake batch with D
        output = netD(fake.detach()).view(-1)
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output, label)
        # Calculate the gradients for this batch
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        # Sum the losses of the all-real and all-fake batches
        # (their gradients were already accumulated by the two backward calls)
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()
        label.fill_(real_label)  # fake labels are real for generator cost
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output = netD(fake).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output.mean().item()
        # Update G
        optimizerG.step()

        # Output training stats
        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))

        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())

        # Check how the generator is doing by saving G's output on fixed_noise
        if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
            with torch.no_grad():
                fake = netG(fixed_noise).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))

        iters += 1
  • Output:
Starting Training Loop...
[0/5][0/1583]    Loss_D: 2.0937  Loss_G: 5.2059   D(x): 0.5704  D(G(z)): 0.6680 / 0.0090
[0/5][50/1583]   Loss_D: 0.3567  Loss_G: 12.2064  D(x): 0.9364  D(G(z)): 0.1409 / 0.0000

3.3.6 Results



plt.figure(figsize=(10,5))
plt.title("Generator and Discriminator Loss During Training")
plt.plot(G_losses,label="G")
plt.plot(D_losses,label="D")
plt.xlabel("iterations")
plt.ylabel("Loss")
plt.legend()


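Recall that we saved the generator's output on the fixed_noise batch at regular intervals during training (every 500 iterations, plus the final iteration). Now we can visualize G's training progress with an animation. Press the play button to start the animation.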

#%%capture
fig = plt.figure(figsize=(8,8))
plt.axis("off")
ims = [[plt.imshow(np.transpose(i,(1,2,0)), animated=True)] for i in img_list]
ani = animation.ArtistAnimation(fig, ims, interval=1000, repeat_delay=1000, blit=True)

HTML(ani.to_jshtml())

Real Images vs. Fake Images


# Grab a batch of real images from the dataloader
real_batch = next(iter(dataloader))

# Plot the real images
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.axis("off")
plt.title("Real Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=5, normalize=True).cpu(),(1,2,0)))

# Plot the fake images from the last epoch
plt.subplot(1,2,2)
plt.axis("off")
plt.title("Fake Images")
plt.imshow(np.transpose(img_list[-1],(1,2,0)))


