PyTorch common operations: tensor expansion, dimension expansion, gradient reversal, computing gradients, the meaning of CNN and LSTM input/output dimensions, and more
0, If you have time, do read the source code, otherwise you will stay a beginner forever... although you may well still be one after reading it...
0, Summary of commonly used methods
'''========================================1, Resource configuration========================================'''
torch.cuda.is_available()
.to(torch.device('cuda:0'))
.cuda()
'''========================================2,tensor========================================'''
torch.tensor(**, requires_grad=True)
tensor.to(torch.float)
tensor[None, None, :]
torch.randn(2, 10)
.item()
'''========================================3, Data loading========================================'''
torch.utils.data.Dataset/DataLoader
class myDataset(Dataset):
    def __init__(self, files):
        ...
    def __len__(self):
        return len(...)  # or ...shape[0]
    def __getitem__(self, index):
        ...
        return data, label
trainingData = myDataset(files)
trainingDataloader = DataLoader(trainingData, batch_size=, shuffle=True)
#************************************Fetch data and labels************************************
data, label = next(iter(trainingDataloader))
for (data, label) in trainingDataloader:
for batch, (data, label) in enumerate(trainingDataloader):
'''========================================4, Basic model building blocks========================================'''
#************************************Model construction method 1: basic nn modules************************************
myNet = torch.nn.Linear(10, 10)
myNet.weight
myNet.weight.grad
myNet.bias
myNet.bias.grad
myNet.parameters()
#************************************Model construction method 2.1: nn.Sequential************************************
myNet = torch.nn.Sequential(
    nn.Linear(10, 10),
    nn.Tanh(),
    nn.Linear(10, 10)
)
[param.shape for param in myNet.parameters()]
[(name, param.shape) for (name, param) in myNet.named_parameters()]
#************************************Model construction method 2.2: collections.OrderedDict combined with nn.Sequential to name the submodules************************************
from collections import OrderedDict
myNet = nn.Sequential(OrderedDict([
    ('hidden_linear', nn.Linear(10, 10)),
    ('hidden_activation', nn.Tanh()),
    ('output_linear', nn.Linear(10, 10))
]))
[param.shape for param in myNet.parameters()]
[(name, param.shape) for (name, param) in myNet.named_parameters()]
myNet.hidden_linear.weight
myNet.hidden_linear.weight.grad
myNet.hidden_linear.bias
myNet.hidden_linear.bias.grad
#************************************Dynamically adding submodules************************************
nn.Sequential().add_module()
#************************************Model construction method 3: nn.Module************************************
torch.nn.Module
class myModule(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self, inputs):
        ...
myNet = myModule()
[(name, param.shape) for (name, param) in myNet.named_parameters()]
'''========================================5, Optimizers========================================'''
#************************************The four classic optimizer families************************************
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)  # SGD with momentum (there is no torch.optim.Momentum)
torch.optim.Adagrad(model.parameters())
torch.optim.Adam(model.parameters())
#************************************5.1: different learning rates for different layers; the optimizer receives parameters, and the filtering uses the names from named_parameters************************************
model = torchvision.models.vgg16()
my_list = ['classifier.3.weight', 'classifier.3.bias']
params = [p[1] for p in filter(lambda kv: kv[0] in my_list,
                               model.named_parameters())]
base_params = [p[1] for p in filter(lambda kv: kv[0] not in my_list,
                                    model.named_parameters())]
optimizer = torch.optim.SGD([{'params': base_params},
                             {'params': params, 'lr': 1e-4}],
                            lr=3e-6)
#************************************5.2: custom learning-rate decay by epoch************************************
def adjust_learning_rate(optimizer, epoch, init_lr):
    """Sets the learning rate to the initial LR (init_lr) decayed by 10 every 30 epochs"""
    lr = init_lr * (0.1 ** (epoch // 30))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
for epoch in range(10):
    adjust_learning_rate(optimizer, epoch, init_lr)
    train(...)
    validate(...)
#************************************5.3: manually set learning-rate decay milestones************************************
def adjust_learning_rate(optimizer, lr):
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
for epoch in range(60):
    lr = 30e-5
    if epoch > 25:
        lr = 15e-5
    if epoch > 30:
        lr = 7.5e-5
    if epoch > 35:
        lr = 3e-5
    if epoch > 40:
        lr = 1e-5
    adjust_learning_rate(optimizer, lr)
#************************************5.4: built-in learning-rate scheduler API************************************
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                       T_max=(epochs // 9) + 1)
for epoch in range(epochs):
    train(...)
    scheduler.step()  # step once per epoch; passing the epoch index explicitly is deprecated
'''========================================6, Loss functions========================================'''
torch.nn.CrossEntropyLoss()
torch.nn.functional.cross_entropy()
'''========================================7, Training========================================'''
myNet.train()
loss.backward()
optimizer.step()
optimizer.zero_grad()
scheduler.step()
'''========================================8, Saving========================================'''
#************************************Method 1************************************
torch.save(optimizer.state_dict(), '')
torch.save(myNet.state_dict(), '')
#************************************Method 2************************************
torch.save(optimizer, '')
torch.save(myNet, '')
'''========================================9, Loading========================================'''
# loading counterpart of saving method 1
myNet = myModule()
model_dict = torch.load()
myNet.load_state_dict(model_dict)
# loading counterpart of saving method 2
myNet = torch.load()
optimizer = torch.load()
'''========================================10, Testing========================================'''
myNet.eval()
'''========================================11, Computing accuracy========================================'''
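This heading had no snippet; a common sketch for classification accuracy, assuming outputs are raw logits and label holds integer class indices (myNet, data, label as in the data-loading section above):
with torch.no_grad():
    outputs = myNet(data)                                  # (batch, num_classes) logits
    predictions = outputs.argmax(dim=1)                    # predicted class per sample
    accuracy = (predictions == label).float().mean().item()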
'''========================================12, Vision library========================================'''
torchvision
torchvision.models
torchvision.transforms
'''========================================Miscellaneous========================================'''
model.parameters()
model.named_parameters()
model.state_dict()
optimizer.param_groups  # an attribute, not a method
optimizer.state_dict()
1, Tensor expansion (expand, repeat)
expand expands the tensor as a whole: you pass in the target sizes after expansion, and unless a dimension of the tensor has size 1 (in which case that dimension can be grown), it can only add new leading dimensions. repeat also expands the tensor as a whole, but you pass in the repetition factor per dimension, which makes it much more flexible to use.
>>> a = torch.randn(2, 4)
>>> a
tensor([[-0.1346, 0.3429, -1.3040, -0.6949],
[-0.0433, 1.7080, -1.8213, -1.6689]])
>>> a.expand(2,2,4)
tensor([[[-0.1346, 0.3429, -1.3040, -0.6949],
[-0.0433, 1.7080, -1.8213, -1.6689]],
[[-0.1346, 0.3429, -1.3040, -0.6949],
[-0.0433, 1.7080, -1.8213, -1.6689]]])
>>> a.repeat(1,2)
tensor([[-0.1346, 0.3429, -1.3040, -0.6949, -0.1346, 0.3429, -1.3040, -0.6949],
[-0.0433, 1.7080, -1.8213, -1.6689, -0.0433, 1.7080, -1.8213, -1.6689]])
>>> a.repeat(2,1,1)
tensor([[[-0.1346, 0.3429, -1.3040, -0.6949],
[-0.0433, 1.7080, -1.8213, -1.6689]],
[[-0.1346, 0.3429, -1.3040, -0.6949],
[-0.0433, 1.7080, -1.8213, -1.6689]]])
Expanding the tensor block-wise vs. interleaving the rows
>>> a = torch.randn(2,2)
>>> a
tensor([[ 0.2356, 0.0189],
[-0.3703, -0.0547]])
>>> a.repeat(2,1)
tensor([[ 0.2356, 0.0189],
[-0.3703, -0.0547],
[ 0.2356, 0.0189],
[-0.3703, -0.0547]])
>>> a.repeat(1,2).reshape(-1, a.shape[1])
tensor([[ 0.2356, 0.0189],
[ 0.2356, 0.0189],
[-0.3703, -0.0547],
[-0.3703, -0.0547]])
Note that repeat behaves differently in torch and numpy; numpy has tile while the torch version used here does not (newer PyTorch releases do provide torch.tile); numpy's tile behaves like torch's repeat.
'''numpy'''
>>> a = np.array([1,2,3,4])
>>> np.tile(a, 10)
array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2,
3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
>>> a.repeat(10)
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
>>> a.expand(10)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'expand'
'''torch'''
>>> a = torch.arange(5)
>>> a
tensor([0, 1, 2, 3, 4])
>>> a.repeat(10)
tensor([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3,
4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2,
3, 4])
>>> torch.tile(a, 10)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'torch' has no attribute 'tile'
2, Adding dimensions (unsqueeze, slicing with None)
>>> a = torch.randn(2)
>>> a
tensor([2.0488, 0.5997])
>>> b = a.unsqueeze(-1)
>>> b
tensor([[2.0488],
[0.5997]])
>>> b.shape
torch.Size([2, 1])
>>> b = a[:, None]
>>> b
tensor([[2.0488],
[0.5997]])
>>> b.shape
torch.Size([2, 1])
'''Slicing can also add several dimensions at once'''
>>> b = a[None, :, None]
>>> b
tensor([[[2.0488],
[0.5997]]])
>>> b.shape
torch.Size([1, 2, 1])
3, Gradient reversal (Function)
import torch
from torch.autograd import Function
import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm
from matplotlib import pyplot as plt
class ReverseLayer(Function):
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg()

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.parameter1 = nn.Parameter(torch.ones(10, 10))
        self.parameter2 = nn.Parameter(torch.ones(10, 10))
        self.parameter3 = nn.Parameter(torch.ones(10, 10))
    def forward(self, x):
        return x @ self.parameter1 @ self.parameter2 @ self.parameter3

class ReverseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.parameter1 = nn.Parameter(torch.ones(10, 10))
        self.parameter2 = nn.Parameter(torch.ones(10, 10))
        self.parameter3 = nn.Parameter(torch.ones(10, 10))
    def forward(self, x):
        x1 = x @ self.parameter1
        x2 = ReverseLayer.apply(x1 @ self.parameter2)
        return x2 @ self.parameter3
dataInput = torch.randn(2, 10)
dataTarget = torch.randn(2, 10)
net1 = Net()
net2 = ReverseNet()
loss1 = torch.mean(net1(dataInput) - dataTarget)
loss1.backward()
loss2 = torch.mean(net2(dataInput) - dataTarget)
loss2.backward()
print('=======================PARAMETER1============================')
print(net1.parameter1.grad[0])
print(net2.parameter1.grad[0])
print('=======================PARAMETER2============================')
print(net1.parameter2.grad[0])
print(net2.parameter2.grad[0])
print('=======================PARAMETER3============================')
print(net1.parameter3.grad[0])
print(net2.parameter3.grad[0])
'''
It can be seen that, due to the chain rule,
the gradients of all layers before the reverse layer are negated
'''
optim1 = optim.Adam(net1.parameters())
optim2 = optim.Adam(net2.parameters())
loss1List = []
loss2List = []
epoch = 100
for i in tqdm(range(epoch)):
    net1.zero_grad()
    net2.zero_grad()
    loss1 = torch.mean(net1(dataInput) - dataTarget)
    loss1List.append(loss1.item())
    loss1.backward()
    optim1.step()
    loss2 = torch.mean(net2(dataInput) - dataTarget)
    loss2List.append(loss2.item())
    loss2.backward()
    optim2.step()
plt.subplot(2, 1, 1)
plt.plot(loss1List)
plt.subplot(2, 1, 2)
plt.plot(loss2List)
plt.show()
'''
It can be seen that
without the reverse layer, the loss decreases (minimization);
with the reverse layer, the loss increases (maximization)
'''
'''========================Application: joining two networks========================'''
'''========================Without gradient reversal========================'''
import torch
import torch.nn as nn
myNet1 = nn.Linear(10, 10)
myNet2 = nn.Linear(10, 10)
loss = nn.PairwiseDistance(p=2)
optimizer = torch.optim.Adam(myNet1.parameters(), lr=1e-2)
epoch = 500
dataIn = torch.randn(1, 10)
dataOut = torch.ones(1, 10)
print(myNet2(myNet1(dataIn)))
for i in range(epoch):
    optimizer.zero_grad()
    l = loss(myNet2(myNet1(dataIn)), dataOut)
    l.backward()
    optimizer.step()
print(myNet2(myNet1(dataIn)))
'''========================Application: with gradient reversal========================'''
import torch
import torch.nn as nn
from torch.autograd import Function
class ReverseLayerF(Function):
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg()
myNet1 = nn.Linear(10, 10)
myNet2 = nn.Linear(10, 10)
loss = nn.PairwiseDistance(p=2)
optimizer = torch.optim.Adam(myNet1.parameters(), lr=1e-2)
epoch = 500
dataIn = torch.randn(1, 10)
dataOut = torch.ones(1, 10)
print(myNet2(myNet1(dataIn)))
for i in range(epoch):
    optimizer.zero_grad()
    l = loss(myNet2(ReverseLayerF.apply(myNet1(dataIn))), dataOut)
    l.backward()
    optimizer.step()
print(myNet2(myNet1(dataIn)))
4, Computing gradients
'''v1'''
gradients = autograd.grad(outputs=dataOut, inputs=dataIn,                    # from torch import autograd
                          grad_outputs=torch.ones(dataOut.size()).cuda(),    # grad_outputs must match the shape of outputs
                          create_graph=True, retain_graph=True, only_inputs=True)[0]
'''v2'''
dataOut.backward(torch.ones_like(dataOut))
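Both patterns above are fragments; a minimal self-contained sketch (dataIn/dataOut are made-up names here, and dataOut is just some differentiable function of dataIn):
import torch
from torch import autograd

dataIn = torch.randn(2, 10, requires_grad=True)
dataOut = (dataIn ** 2).sum(dim=1)                  # any differentiable function of dataIn

# v1: autograd.grad returns the gradient directly, without touching .grad attributes
grads = autograd.grad(outputs=dataOut, inputs=dataIn,
                      grad_outputs=torch.ones_like(dataOut),
                      create_graph=True, retain_graph=True)[0]

# v2: backward() accumulates the same gradient into dataIn.grad
dataOut.backward(torch.ones_like(dataOut))
print(torch.allclose(grads, dataIn.grad))           # True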
5, Meaning of CNN and LSTM input/output dimensions (a shape check follows the list)
- CNN
  The four dimensions of the conv input data: batch, input channels, height, width
  The four main Conv2d arguments: input channels, output channels, kernel size, stride
- LSTM
  The three dimensions of the time-series input data: sequence length (roughly, how many words in a sentence), batch, input size (how many letters per word)
  The three main LSTM arguments: input size, hidden size, number of layers
  The three dimensions of h0: layers, batch, hidden size
  The three dimensions of c0: layers, batch, hidden size
  The three dimensions of the output: sequence length, batch, hidden size
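A minimal shape check for both; the sizes below are arbitrary examples:
import torch
import torch.nn as nn

# CNN: input is (batch, in_channels, height, width)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1)
img = torch.randn(8, 3, 32, 32)
print(conv(img).shape)                    # torch.Size([8, 16, 30, 30])

# LSTM: input is (seq_len, batch, input_size) unless batch_first=True
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
seq = torch.randn(5, 8, 10)
h0 = torch.zeros(2, 8, 20)                # (num_layers, batch, hidden_size)
c0 = torch.zeros(2, 8, 20)
out, (hn, cn) = lstm(seq, (h0, c0))
print(out.shape)                          # torch.Size([5, 8, 20]) = (seq_len, batch, hidden_size)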
6, Converting a 1-D vector: diagonal matrix (diag), one-hot labels (torch.nn.functional.one_hot)
To a diagonal matrix
diagonalMatrix = torch.diag(tensor)
To one-hot labels
torch.nn.functional.one_hot(longTensor, num_classes)  # the input must be an integer (Long) tensor
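A quick illustration of both calls (the values are arbitrary):
import torch
import torch.nn.functional as F

v = torch.tensor([1, 2, 3])
print(torch.diag(v))                  # 3x3 matrix with 1, 2, 3 on the diagonal
print(F.one_hot(v, num_classes=5))    # shape (3, 5); input must be an integer (Long) tensor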
7, Manually modifying network parameters (load_state_dict)
- model.state_dict() only gives you a copy of the module's internal state dict; modifying that copy alone has no effect on the original model
- The fix is to assign model.state_dict() to a variable model_dict, modify model_dict, and finally call model.load_state_dict(model_dict)
'''Modifying the copy directly: no change'''
import torch
import torch.nn as nn
net = nn.Linear(10, 10)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)
loss = nn.PairwiseDistance(p=2)
dataIn = torch.randn(2, 10)
dataOut = torch.ones(2, 10)
epoch = 200
for i in range(epoch):
    optimizer.zero_grad()
    l = loss(net(dataIn), dataOut).mean()
    l.backward()
    optimizer.step()
print(f'\033[33m{net(dataIn)}\033[0m')
for key in net.state_dict():
    print(net.state_dict()[key])
    net.state_dict()[key].data = torch.randn(net.state_dict()[key].shape)
for key in net.state_dict():
    print(net.state_dict()[key])
print(f'\033[34m{net(dataIn)}\033[0m')
'''Assign to a variable, modify, then load'''
import torch
import torch.nn as nn
net = nn.Linear(10, 10)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)
loss = nn.PairwiseDistance(p=2)
dataIn = torch.randn(2, 10)
dataOut = torch.ones(2, 10)
epoch = 200
for i in range(epoch):
    optimizer.zero_grad()
    l = loss(net(dataIn), dataOut).mean()
    l.backward()
    optimizer.step()
print(f'\033[33m{net(dataIn)}\033[0m')
model_dict = net.state_dict()
for key in model_dict:
    print(model_dict[key])
    model_dict[key] = torch.randn(net.state_dict()[key].shape)
net.load_state_dict(model_dict)
for key in net.state_dict():
    print(net.state_dict()[key])
print(f'\033[34m{net(dataIn)}\033[0m')
8, Ways to display the model structure
'''Show the modules'''
myNet
list(myNet.children())
'''Show parameter names and values'''
myNet.state_dict().keys()
myNet.state_dict()
'''Show parameter values only'''
list(myNet.parameters())
9, Indices of the top-k largest values
'''k=1'''
tensor.argmax()
'''k>1'''
tensor.argsort(descending=True)[:k]  # or torch.topk(tensor, k).indices
10, Shuffling (tensor[torch.randperm(tensor.shape[0])])
dataRandomIndex = torch.randperm(data.shape[0])
data[dataRandomIndex]
11, Visualization: feature maps, convolution kernel weights, samples that best match a kernel, Class Activation Maps (CAM), network structure
https://www.cnblogs.com/tensorzhang/p/15053885.html
12, Training steps without using optim
data = nn.Parameter(torch.randn(2, 10))            # the tensor being optimized
for i in range(epoch):
    l = loss(...)                                   # any differentiable loss built from data
    l.backward()
    data = nn.Parameter((data - lr * data.grad).detach())  # manual gradient-descent step, detached from the old graph
13, Generating one-hot labels: nn.functional.one_hot(label, num_classes=N)
oneHotLabel = torch.nn.functional.one_hot(label, num_classes=N).to(torch.float)
'''labels start from 0'''
np.eye(num_classes)[arr]
np.eye(7)[np.ones(10, dtype=int)]
14, Automatic GPU selection: gpu = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
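The resulting device object is then used to move both the model and the data; a minimal sketch:
import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = nn.Linear(10, 10).to(device)   # move the parameters to the device
x = torch.randn(4, 10).to(device)      # move the inputs to the same device
y = model(x)                           # runs on CPU or GPU, whichever was selected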
15, Matrix multiplication
a * b: element-wise multiplication; the two tensors must have the same (or broadcastable) shapes, and the output keeps that shape
torch.mul(a, b): element-wise multiplication of a and b with the same shape requirement; e.g. if a is (1, 2) and b is (1, 2), the result is also (1, 2)
a @ b: ordinary 2-D matrix multiplication, requiring shapes (n×m) and (m×p)
torch.mm(a, b): ordinary 2-D matrix multiplication, requiring shapes (n×m) and (m×p)
torch.matmul(a, b): general tensor multiplication; the inputs may be higher-dimensional, in which case the extra leading dimensions are treated as a batch and matrix multiplication is applied to the last two
For 2-D inputs, torch.mm(a, b) and torch.matmul(a, b) are equivalent (a small demonstration follows the reference)
Reference blog: https://blog.csdn.net/lijiaming_99/article/details/114642093
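A small shape demonstration of the four operations (sizes are arbitrary):
import torch

a = torch.randn(2, 3)
b = torch.randn(2, 3)
c = torch.randn(3, 4)

print((a * b).shape)                    # torch.Size([2, 3])  element-wise
print(torch.mul(a, b).shape)            # torch.Size([2, 3])  element-wise
print(torch.mm(a, c).shape)             # torch.Size([2, 4])  2-D matrix multiplication
print(torch.matmul(a, c).shape)         # torch.Size([2, 4])  same as mm for 2-D inputs

batched = torch.randn(5, 2, 3)          # extra leading dim is treated as a batch by matmul
print(torch.matmul(batched, c).shape)   # torch.Size([5, 2, 4])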
16, Loading model parameters trained on the CPU onto the GPU, or GPU -> CPU
Suppose only the model parameters (model.state_dict()) were saved to a file named modelparameters.pth, and model = Net()
1. cpu -> cpu or gpu -> gpu:
checkpoint = torch.load('modelparameters.pth')
model.load_state_dict(checkpoint)
2. cpu -> gpu 1
torch.load('modelparameters.pth', map_location=lambda storage, loc: storage.cuda(1))
3. gpu 1 -> gpu 0
torch.load('modelparameters.pth', map_location={'cuda:1':'cuda:0'})
4. gpu -> cpu
torch.load('modelparameters.pth', map_location=lambda storage, loc: storage)
17, Creating submodules in a loop and registering them automatically with the parent network: nn.ModuleList (full sketch below)
self.LayerList = nn.ModuleList()  # automatically registered with the parent network
for i in range(k):
    self.LayerList.append(...)
https://zhuanlan.zhihu.com/p/75206669
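A minimal sketch of the pattern; the class name StackedNet and the layer sizes are made up for illustration:
import torch
import torch.nn as nn

class StackedNet(nn.Module):
    def __init__(self, k):
        super().__init__()
        self.LayerList = nn.ModuleList()            # registered, so its parameters show up in parameters() and state_dict()
        for i in range(k):
            self.LayerList.append(nn.Linear(10, 10))
    def forward(self, x):
        for layer in self.LayerList:
            x = layer(x)
        return x

net = StackedNet(3)
print(len(list(net.parameters())))      # 6: weight + bias for each of the 3 layers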
18, Adding custom parameters to the parent network: nn.Parameter / register_parameter
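A minimal sketch showing both ways of registering a custom learnable tensor; the class name MyNet and the parameter names scale/shift are made up for illustration:
import torch
import torch.nn as nn

class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # assigning an nn.Parameter attribute registers it automatically
        self.scale = nn.Parameter(torch.ones(10))
        # register_parameter does the same thing explicitly, by name
        self.register_parameter('shift', nn.Parameter(torch.zeros(10)))
    def forward(self, x):
        return x * self.scale + self.shift

net = MyNet()
print([name for name, _ in net.named_parameters()])   # ['scale', 'shift']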
19, Multiplication in PyTorch
There are four kinds of multiplication in PyTorch: *, torch.mul, torch.mm, and torch.matmul. Both * and torch.mul are element-wise multiplications (corresponding elements are multiplied), while the latter two are mathematical matrix multiplications.
20, Swapping tensor dimensions
transpose(): swaps any two dimensions of a tensor, but takes exactly two arguments, so it can only swap one pair of dimensions per call.
import torch
x = torch.randn(8, 3, 5, 4)
y = x.transpose(1, 2)  # swap the second and third dimensions
print(y.shape)
permute(): rearranges all dimensions at once, in any order.
z1 = x.permute(0, 2, 1, 3)  # swap the second and third dimensions
print(z1.shape)
z2 = x.permute(3, 0, 2, 1)  # rearrange all the original dimensions
print(z2.shape)
21, Loading and calling models back and forth between CPU and GPU in PyTorch
22, Inspecting a model's intermediate computation results
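For a purely sequential model, one simple option (without hooks, which the next item covers) is to apply the submodules one by one and keep every intermediate output; a minimal sketch:
import torch
import torch.nn as nn

myNet = nn.Sequential(nn.Linear(10, 10), nn.Tanh(), nn.Linear(10, 10))
x = torch.randn(2, 10)

intermediates = []
out = x
for layer in myNet.children():      # walk the submodules in forward order
    out = layer(out)
    intermediates.append(out)       # output after each layer
print([t.shape for t in intermediates])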
23, Getting intermediate results and gradients: hooks
- https://www.jianshu.com/p/69e57e3526b3
- Hook programming in PyTorch lets you obtain and modify intermediate variables and gradients without changing the network structure.
- Hooks can extract or modify the gradient of a Tensor, and can also obtain the output and gradient of an nn.Module (the latter cannot be modified). Three hook functions provide this (see the sketch after this list):
- Tensor.register_hook(hook_fn),
- nn.Module.register_forward_hook(hook_fn),
- nn.Module.register_backward_hook(hook_fn).
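A minimal sketch of the Tensor hook and the forward hook (the module and shapes are arbitrary examples):
import torch
import torch.nn as nn

net = nn.Linear(10, 10)
x = torch.randn(2, 10, requires_grad=True)

captured = {}
def forward_hook(module, inputs, output):
    captured['output'] = output.detach()            # grab the module's output during forward

handle = net.register_forward_hook(forward_hook)
x.register_hook(lambda grad: print('grad of x:', grad.shape))   # fires when gradients reach x

y = net(x).sum()
y.backward()
print(captured['output'].shape)                     # torch.Size([2, 10])
handle.remove()                                     # remove the hook when no longer needed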
24, Releasing GPU memory
import gc
torch.cuda.empty_cache()  # return cached GPU memory to the driver
gc.collect()              # also release unreferenced Python objects