[Deep Learning] PyTorch Study Notes
Course video: https://www.youtube.com/watch?v=ogZi5oIo4fI
Youdao Cloud Notes: http://note.youdao.com/noteshare?id=d86bd8fc60cb4fe87005a2d2e2d5b70d&sub=6911732F9FA44C68AD53A09072155ED3
PyTorch Lecture 05: Linear Regression in the PyTorch Way
Part 1: build your model as a class; you need to write the forward function
import torch
from torch.autograd import Variable
import matplotlib.pyplot as plt
x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0]]))
y_data = Variable(torch.Tensor([[2.0], [4.0], [6.0]]))
class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate one nn.Linear module
        """
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(1, 1)  # One in and one out

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Variables.
        """
        y_pred = self.linear(x)
        return y_pred
# our model
model = Model()
Part 2: construct the loss and the optimizer that will update the parameters
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the
# nn.Linear module which is a member of the model.
# criterion: the measure used to compute the loss
# (reduction='sum' replaces the deprecated size_average=False)
criterion = torch.nn.MSELoss(reduction='sum')
# Optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
Part 3: training, forward -> backward -> update parameters
# Training loop
for epoch in range(1000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)
    # Compute and print loss (loss is a 0-dim tensor, so use .item())
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())
    # Zero gradients, perform a backward pass, and update the weights.
    # Reset the gradients accumulated in the previous step
    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    # Update the weights held by the optimizer, i.e. model.parameters()
    optimizer.step()
Part 4: testing
# After training
hour_var = Variable(torch.Tensor([[4.0]]))
y_pred = model(hour_var)
print("predict (after training)", 4, y_pred.item())
To summarize the basic training framework:
- Define your model by writing a class
- Construct the loss and the optimizer
- Train: forward -> compute loss -> backward -> update
- Forward: y_pred = model(x_data)
- Compute loss: loss = criterion(y_pred, y_data)
- Backward: optimizer.zero_grad(), then loss.backward()
- Update: optimizer.step()
Exercise: try other optimizers:
- torch.optim.Adagrad
- torch.optim.Adam
- torch.optim.Adamax
- torch.optim.ASGD
- torch.optim.LBFGS
- torch.optim.RMSprop
- torch.optim.Rprop
- torch.optim.SGD
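One way to approach the exercise is sketched below. It is not the lecture's solution: the helper `train_with`, the learning rates and the epoch counts are assumptions chosen for illustration. Every optimizer is driven through a closure, because `torch.optim.LBFGS` requires one and the other built-in optimizers accept it as well.

```python
import torch

# Same toy data as the linear regression example.
x_data = torch.tensor([[1.0], [2.0], [3.0]])
y_data = torch.tensor([[2.0], [4.0], [6.0]])


def train_with(optimizer_cls, epochs=200, **kwargs):
    """Hypothetical helper: train a fresh 1-in/1-out linear model, return the last loss."""
    torch.manual_seed(0)  # same initial weights for every optimizer
    model = torch.nn.Linear(1, 1)
    criterion = torch.nn.MSELoss(reduction='sum')
    optimizer = optimizer_cls(model.parameters(), **kwargs)

    def closure():
        # Recomputes the loss and its gradients; LBFGS may call this several times per step.
        optimizer.zero_grad()
        loss = criterion(model(x_data), y_data)
        loss.backward()
        return loss

    for _ in range(epochs):
        loss = optimizer.step(closure)  # returns the loss evaluated by the closure
    return loss.item()


# Learning rates are illustrative guesses, not tuned values.
results = {
    'SGD': train_with(torch.optim.SGD, lr=0.01),
    'Adam': train_with(torch.optim.Adam, lr=0.1),
    'RMSprop': train_with(torch.optim.RMSprop, lr=0.01),
    'LBFGS': train_with(torch.optim.LBFGS, epochs=20, lr=0.1),
}
for name, final_loss in results.items():
    print(name, final_loss)
```

Resetting the seed inside the helper keeps the comparison fair, since each optimizer then starts from identical weights.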
Logistic Regression: binary classification
Before:
graph LR
x-->Linear
Linear-->y
\hat{y} = x * w + b
loss = \frac{1}{N}\sum_{n=1}^{N}(\hat{y_n}-y_n)^2
Activation function:
Using the sigmoid function:
graph LR
x --> Linear
Linear --> Sigmoid
Sigmoid --> y
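The output lies between 0 and 1: the sigmoid squashes the linear output into that range, which makes it easy to work with as a probability.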
\sigma(z) = \frac{1}{1+e^{-z}}
\hat{y} = \sigma(x*w+b)
loss = -\frac{1}{N}\sum_{n=1}^{N}\left[y_n\log\hat{y_n} + (1-y_n)\log(1-\hat{y_n})\right]
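The sigmoid and the cross-entropy loss above can be checked by hand. The sketch below is only illustrative: the weight `w`, the bias `b` and the four data points are made-up values, and the hand-written result is compared against `torch.nn.BCELoss`.

```python
import math
import torch


def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + math.exp(-z))


def bce(y_true, y_prob):
    # Mean binary cross entropy, written term by term as in the formula.
    total = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_prob))
    return -total / len(y_true)


w, b = 1.0, -2.5            # hypothetical parameters
xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical inputs
ys = [0.0, 0.0, 1.0, 1.0]   # hypothetical labels

probs = [sigmoid(x * w + b) for x in xs]
manual = bce(ys, probs)
reference = torch.nn.BCELoss(reduction='mean')(
    torch.tensor(probs), torch.tensor(ys)).item()
print(manual, reference)
```

The two numbers should agree up to float32 rounding, which suggests that `BCELoss` with `reduction='mean'` computes the same quantity as the formula.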
Code:
import torch
from torch.autograd import Variable
import torch.nn.functional as F
x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0], [4.0],[5.0]]))
y_data = Variable(torch.Tensor([[0.], [0.], [1.], [1.],[1.]]))
class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate nn.Linear module
        """
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(1, 1)  # One in and one out

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data.
        """
        y_pred = F.sigmoid(self.linear(x))
        return y_pred
# our model
model = Model()
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the
# nn.Linear module which is a member of the model.
# (reduction='mean' replaces the deprecated size_average=True)
criterion = torch.nn.BCELoss(reduction='mean')
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Training loop
for epoch in range(400):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)
    # Compute and print loss (loss is a 0-dim tensor, so use .item())
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())
    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# After training
hour_var = Variable(torch.Tensor([[0.0]]))
print("predict 0 hours", 0.0, model(hour_var).item() > 0.5)
hour_var = Variable(torch.Tensor([[7.0]]))
print("predict 7 hours", 7.0, model(hour_var).item() > 0.5)
What changes when the activation function is added:
- Design your model using class
y_pred = F.sigmoid(self.linear(x))
- Construct loss and optimizer
change loss into:
criterion = torch.nn.BCELoss(reduction='mean')
- Training cycle (forward, backward, update)
Exercise: try other activation functions:
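- ReLU
ReLU is short for the Rectified Linear Unit. It has been used heavily in deep learning in recent years and mitigates the vanishing-gradient problem, because its derivative is either 1 or 0. Compared with the sigmoid and tanh activations, the gradient of ReLU is trivial to compute and the function itself is cheap, which can greatly speed up the convergence of stochastic gradient descent (ReLU is piecewise linear and does not saturate for positive inputs, whereas sigmoid and tanh saturate at both ends). The drawback of ReLU is that it is fragile: as training progresses, neurons can die. For example, after a very large gradient flows through a ReLU unit, the weight update may leave it in a state where no data point can ever activate it again. If that happens, the gradient flowing through the neuron is 0 from that point on. In other words, the ReLU neuron has died irreversibly during training.
- ReLU6
- ELU
For positive inputs ELU returns x itself, which mitigates the vanishing-gradient problem (the derivative is 1 everywhere for x > 0), much like ReLU and Leaky ReLU. For negative inputs, ELU saturates softly as the input becomes very negative, which improves robustness to noise.
- SELU
- PReLU
- LeakyReLU
Leaky ReLU is mainly meant to avoid vanishing gradients: when the neuron is inactive it still allows a small non-zero gradient, so the gradient does not vanish and convergence is fast. Its other strengths and weaknesses are similar to those of ReLU.
- Threshold
- Hardtanh
The tanh function squashes its input to between -1 and 1 (Hardtanh is its piecewise-linear counterpart, clamping the input to [-1, 1] by default). Like sigmoid, tanh suffers from vanishing or saturating gradients.
- Sigmoid
This is probably the most frequently used activation function in neural networks. It squashes a real number into the range 0 to 1: for a very large input the result approaches 1, and for a very large negative input it approaches 0. It was used a great deal in early neural networks because it nicely models whether a neuron fires and passes the signal on after being stimulated (0: barely activated, 1: fully activated). In recent years it has become less common in deep learning, because sigmoid is prone to vanishing or saturating gradients. In a network with many layers, using sigmoid in every layer leads to vanishing gradients: backpropagation multiplies by its derivative at each layer, so the gradient keeps shrinking. If the input is very large or very small (for example, an input of 100 gives an output close to 1 and a gradient close to 0), the unit saturates and the neuron behaves almost as if it were dead.
- Tanh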
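To get a feel for how these activations differ, it may help to apply several of them to the same inputs. The snippet below is a small sketch using functions from `torch.nn.functional`; the sample inputs and the `negative_slope` and `alpha` values are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

# A negative value, zero, a moderate positive value and a large positive value.
x = torch.tensor([-2.0, 0.0, 3.0, 8.0])

outputs = {
    'relu': F.relu(x),                                   # max(0, x)
    'relu6': F.relu6(x),                                 # min(max(0, x), 6)
    'leaky_relu': F.leaky_relu(x, negative_slope=0.01),  # small slope for x < 0
    'elu': F.elu(x, alpha=1.0),                          # alpha * (exp(x) - 1) for x < 0
    'hardtanh': F.hardtanh(x),                           # clamp to [-1, 1]
    'tanh': torch.tanh(x),                               # smooth squashing to (-1, 1)
    'sigmoid': torch.sigmoid(x),                         # smooth squashing to (0, 1)
}
for name, value in outputs.items():
    print(name, value)
```

Only the ReLU-style functions pass large positive inputs through unchanged, which is why their gradient does not shrink there.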
Lecture 07: How to make a neural network wide and deep?
graph LR
a-->Linear
b-->Linear
Linear-->Sigmoid
Sigmoid-->y
For a network with multi-dimensional input and more layers, the changes are mainly in the "Design your model using class" step
import torch
from torch.autograd import Variable
import numpy as np
xy = np.loadtxt('./data/diabetes.csv.gz', delimiter=',', dtype=np.float32)
x_data = Variable(torch.from_numpy(xy[:, 0:-1]))
y_data = Variable(torch.from_numpy(xy[:, [-1]]))
print(x_data.data.shape)
print(y_data.data.shape)
class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate three nn.Linear modules
        """
        super(Model, self).__init__()
        self.l1 = torch.nn.Linear(8, 6)
        self.l2 = torch.nn.Linear(6, 4)
        self.l3 = torch.nn.Linear(4, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Variables.
        """
        out1 = self.sigmoid(self.l1(x))
        out2 = self.sigmoid(self.l2(out1))
        y_pred = self.sigmoid(self.l3(out2))
        return y_pred
# our model
model = Model()
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the three
# nn.Linear modules which are members of the model.
# (reduction='mean' replaces the deprecated size_average=True / 'elementwise_mean')
criterion = torch.nn.BCELoss(reduction='mean')
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Training loop
for epoch in range(1200000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)
    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())
    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Exercise:
- Train a deeper network with more than 10 layers
Finding: going deeper did not improve the results
- Change the activation function
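A possible starting point for both parts of the exercise is sketched below. It is an assumption-laden sketch, not the notes' own solution: `make_deep_model` is a hypothetical helper, the hidden width of 16 and depth of 10 are arbitrary, ReLU replaces sigmoid in the hidden layers, and random synthetic data stands in for the diabetes file so the snippet runs on its own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def make_deep_model(in_features=8, hidden=16, depth=10):
    """Hypothetical helper: `depth` hidden Linear+ReLU layers, then a sigmoid output."""
    layers = []
    width = in_features
    for _ in range(depth):
        layers += [nn.Linear(width, hidden), nn.ReLU()]
        width = hidden
    layers += [nn.Linear(width, 1), nn.Sigmoid()]  # binary classification head
    return nn.Sequential(*layers)


# Synthetic stand-in for the 8-feature dataset (assumption, not real data).
x = torch.randn(64, 8)
y = (x.sum(dim=1, keepdim=True) > 0).float()

model = make_deep_model()
criterion = nn.BCELoss(reduction='mean')
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(50):
    y_pred = model(x)
    loss = criterion(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(loss.item())
```

Swapping `nn.ReLU()` for `nn.Sigmoid()` in the helper gives a direct way to compare how the two activations behave as the depth grows.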
Lecture 08: PyTorch DataLoader
Building a dataset mainly involves three steps:
Inherit from Dataset, then:
- download, read data, etc.
- return one item for a given index
- return the data length
Instantiate the dataset and use it in a DataLoader:
train_loader = DataLoader(dataset=dataset,
                          batch_size=1,
                          shuffle=True,
                          num_workers=1)
Code:
# References
# https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/01-basics/pytorch_basics/main.py
# http://pytorch.org/tutorials/beginner/data_loading_tutorial.html#dataset-class
import torch
import numpy as np
from torch.autograd import Variable
from torch.utils.data import Dataset, DataLoader
class DiabetesDataset(Dataset):
    """ Diabetes dataset."""

    # Initialize your data, download, etc.
    def __init__(self):
        xy = np.loadtxt('./data/diabetes.csv.gz',
                        delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]
        self.x_data = torch.from_numpy(xy[:, 0:-1])
        self.y_data = torch.from_numpy(xy[:, [-1]])

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len
dataset = DiabetesDataset()
train_loader = DataLoader(dataset=dataset,
batch_size=1,
shuffle=True,
num_workers=1)
for epoch in range(2):
    for i, data in enumerate(train_loader, 0):
        # get the inputs
        inputs, labels = data
        # wrap them in Variable
        inputs, labels = Variable(inputs), Variable(labels)
        # Run your training process
        print(epoch, i, "inputs", inputs.data, "labels", labels.data)
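Exercise:
Use another dataset, such as MNIST; this followed the code on the official website:
To summarize the training approach:
- Write your own dataset class that inherits from Dataset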
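The three steps can be tried without any data file. The sketch below uses a hypothetical `ToyDataset` filled with random numbers (an assumption made so the example is self-contained) and shows how `batch_size` controls what each iteration of the loader returns.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical in-memory dataset standing in for a real file-backed one."""

    def __init__(self, n_samples=10, n_features=8):
        # Step 1: load (here: generate) the data once.
        generator = torch.Generator().manual_seed(0)
        self.x_data = torch.randn(n_samples, n_features, generator=generator)
        self.y_data = (self.x_data.sum(dim=1, keepdim=True) > 0).float()
        self.len = n_samples

    def __getitem__(self, index):
        # Step 2: return one (input, label) pair for the given index.
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        # Step 3: return the number of samples.
        return self.len


dataset = ToyDataset()
# num_workers=0 loads in the main process, which avoids multiprocessing setup.
loader = DataLoader(dataset=dataset, batch_size=4, shuffle=True, num_workers=0)

batch_shapes = [(inputs.shape, labels.shape) for inputs, labels in loader]
print(len(dataset), len(loader), batch_shapes)
```

With 10 samples and a batch size of 4 the loader yields three batches per epoch, the last one holding the remaining 2 samples.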
Lecture 09: Softmax Classifier
Part one
MNIST softmax
before:
graph LR
x{x} --> Linear
Linear --> Activation
Activation --> ...
... --> Linear2
Linear2-->Activation2
Activation2-->h{y}
now:
graph LR
x{x} --> Linear
Linear --> Activation
Activation --> ...
... --> Linear2
Linear2-->Activation2
Activation2-->P_y=0
Activation2-->P_y=1
Activation2-->....
Activation2-->P_y=9
What is softmax?
\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K}e^{z_k}} \quad \text{for } j=1,2,\dots,K
Softmax turns the scores into probabilities.
What is cross entropy?
loss = \frac{1}{N}\sum_i D(Softmax(wx_i+b),Y_i)
D(\hat{Y},Y) = -Y\log\hat{Y}
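The two formulas can be worked through in plain Python. This is a minimal sketch: the logits are example values, and subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result.

```python
import math


def softmax(scores):
    # Shift by the max score for numerical stability; the ratios are unchanged.
    shifted = [s - max(scores) for s in scores]
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot, probs):
    # D(Y_hat, Y) = -sum_j Y_j * log(Y_hat_j); only the true class contributes.
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))


logits = [2.0, 1.0, 0.1]                          # example scores for 3 classes
probs = softmax(logits)
loss_correct = cross_entropy([1, 0, 0], probs)    # true class has the top score
loss_wrong = cross_entropy([0, 0, 1], probs)      # true class has the lowest score
print(probs, loss_correct, loss_wrong)
```

The loss is small when the true class receives the highest probability and grows as that probability shrinks.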
The whole pipeline:
graph LR
x--LinearModel-->Z
Z--Softmax-->y'
y'--Cross_Entropy-->Y
Implementation in PyTorch:
loss = torch.nn.CrossEntropyLoss()
This includes both the softmax and the cross entropy
graph LR
X--Softmax-->y'
y'--Cross_Entropy-->Y
Code:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
# Cross entropy example
import numpy as np
# One hot
# 0: 1 0 0
# 1: 0 1 0
# 2: 0 0 1
Y = np.array([1, 0, 0])
Y_pred1 = np.array([0.7, 0.2, 0.1])
Y_pred2 = np.array([0.1, 0.3, 0.6])
print("loss1 = ", np.sum(-Y * np.log(Y_pred1)))
print("loss2 = ", np.sum(-Y * np.log(Y_pred2)))
################################################################################
# Softmax + CrossEntropy (logSoftmax + NLLLoss)
loss = nn.CrossEntropyLoss()
# target is of size nBatch
# each element in target has to have 0 <= value < nClasses (0-2)
# Target is a class index, not one-hot
Y = Variable(torch.LongTensor([0]), requires_grad=False)
# input is of size nBatch x nClasses = 1 x 3
# Y_pred are logits (not softmax)
Y_pred1 = Variable(torch.Tensor([[2.0, 1.0, 0.1]]))
Y_pred2 = Variable(torch.Tensor([[0.5, 2.0, 0.3]]))
l1 = loss(Y_pred1, Y)
l2 = loss(Y_pred2, Y)
print("PyTorch Loss1 = ", l1.data, "\nPyTorch Loss2=", l2.data)
print("Y_pred1=", torch.max(Y_pred1.data, 1)[1])
print("Y_pred2=", torch.max(Y_pred2.data, 1)[1])
################################################################################
"""Batch loss"""
# target is of size nBatch
# each element in target has to have 0 <= value < nClasses (0-2)
# Target is a class index, not one-hot
Y = Variable(torch.LongTensor([2, 0, 1]), requires_grad=False)
# input is of size nBatch x nClasses = 3 x 3
# Y_pred are logits (not softmax)
Y_pred1 = Variable(torch.Tensor([[0.1, 0.2, 0.9],
[1.1, 0.1, 0.2],
[0.2, 2.1, 0.1]]))
Y_pred2 = Variable(torch.Tensor([[0.8, 0.2, 0.3],
[0.2, 0.3, 0.5],
[0.2, 0.2, 0.5]]))
l1 = loss(Y_pred1, Y)
l2 = loss(Y_pred2, Y)
print("Batch Loss1 = ", l1.data, "\nBatch Loss2=", l2.data)
Exercise: CrossEntropyLoss vs. NLLLoss?
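A quick way to explore the question is to compute both losses on the same logits. The sketch below uses example values only; it applies `log_softmax` by hand before `NLLLoss` and compares the result with `CrossEntropyLoss` applied to the raw logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Example logits for a batch of 2 samples and 3 classes, with class-index targets.
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.0, 0.3]])
target = torch.tensor([0, 1])

# CrossEntropyLoss expects raw logits.
ce = nn.CrossEntropyLoss()(logits, target)
# NLLLoss expects log-probabilities, so log_softmax is applied first.
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
print(ce.item(), nll.item())
```

The two values match, which is consistent with the comment in the code above that CrossEntropyLoss combines LogSoftmax and NLLLoss.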
part two : real problem - MNIST input
MNIST Network
graph LR
inputLayer -.-> HiddenLayer
HiddenLayer -.-> OutputLayer
Code:
# https://github.com/pytorch/examples/blob/master/mnist/main.py
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
# Training settings
batch_size = 16
# MNIST Dataset
train_dataset = datasets.MNIST(root='./mnist_data/',
train=True,
transform=transforms.ToTensor(),
download=True)
test_dataset = datasets.MNIST(root='./mnist_data/',
train=False,
transform=transforms.ToTensor())
# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
batch_size=batch_size,
shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
batch_size=batch_size,
shuffle=False)
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = nn.Linear(784, 520)
        self.l2 = nn.Linear(520, 320)
        self.l3 = nn.Linear(320, 240)
        self.l4 = nn.Linear(240, 120)
        self.l5 = nn.Linear(120, 10)

    def forward(self, x):
        x = x.view(-1, 784)  # Flatten the data (n, 1, 28, 28)-> (n, 784)
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        return self.l5(x)
model = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


def test():
    model.eval()
    test_loss = 0
    correct = 0
    # torch.no_grad() replaces the removed volatile=True flag
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            # sum up batch loss; summing per sample makes the division
            # by the dataset size below a true per-sample average
            test_loss += F.cross_entropy(output, target, reduction='sum').item()
            # get the index of the max
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


for epoch in range(1, 10):
    train(epoch)
    test()
Exercise:
Use DataLoader
Lecture 10: Basic CNN
Simple convolution layer
For example:
graph LR
3*3*1_image-->2*2*1_filter_W
3*3*1_image-->1*1_Stride
3*3*1_image-->NoPadding
NoPadding-->2*2_featureMap
2*2*1_filter_W-->2*2_featureMap
1*1_Stride-->2*2_featureMap
How to compute on multi-channel images?
- 32 * 32 * 3 image
- 5 * 5 * 3 filter W
w^T x + b
Result: one 28 * 28 * 1 feature map per filter, so N filters give N feature maps
Formula:
OutputSize = \frac{InputSize + 2 \times PaddingSize - FilterSize}{Stride} + 1
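The formula can be checked against what `nn.Conv2d` actually produces. In the sketch below, `conv_output_size` is a hypothetical helper that implements the formula with integer division, and the choice of 6 filters is an arbitrary stand-in for N.

```python
import torch
import torch.nn as nn


def conv_output_size(input_size, filter_size, padding=0, stride=1):
    # OutputSize = (InputSize + 2 * PaddingSize - FilterSize) / Stride + 1
    return (input_size + 2 * padding - filter_size) // stride + 1


# A 32 * 32 * 3 image in PyTorch's (batch, channels, height, width) layout.
image = torch.randn(1, 3, 32, 32)
# Each of the 6 filters spans all 3 input channels with a 5 * 5 window.
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
feature_maps = conv(image)

print(conv_output_size(32, 5), feature_maps.shape)
```

The helper gives 28 for a 32-pixel input with a 5-pixel filter, and the layer output has one 28 * 28 map per filter.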
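A few building blocks that need explaining:
CONV
Convolutional layer; it is used together with an activation function
Use the formula above with the input size, filter size, padding and stride to compute the output size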
torch.nn.Conv2d(in_channels,out_channels,kernel_size)
self.conv1=nn.Conv2d(1,10,kernel_size=5)
Activation function
activation functions
Max Pooling
Takes the largest value inside each n*m window as the pooling result
Average pooling works the same way, using the mean instead
nn.MaxPool2d(kernel_size)
self.mp = nn.MaxPool2d(2)
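A tiny worked example may make pooling concrete. The sketch below applies 2 * 2 max pooling and average pooling to a hand-written 4 * 4 input; the numbers are arbitrary.

```python
import torch
import torch.nn as nn

# One image, one channel, 4 * 4 pixels: shape (batch, channels, height, width).
x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 10., 13., 14.],
                  [11., 12., 15., 16.]]).view(1, 1, 4, 4)

max_pooled = nn.MaxPool2d(2)(x)   # largest value in each 2 * 2 window
avg_pooled = nn.AvgPool2d(2)(x)   # mean of each 2 * 2 window

print(max_pooled.view(2, 2))
print(avg_pooled.view(2, 2))
```

Both layers halve the height and width here, because the stride defaults to the kernel size.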
Fully connected layer
self.fc = nn.Linear(320,10)
Difference between a CNN and a fully connected network
A neuron in a CNN is not connected to every pixel
A neuron in a fully connected network is connected to every pixel.
Implementation of a simple CNN
graph TB
ConvolutionalLayer1 --> PoolingLayer1
PoolingLayer1 --> ConvolutionalLayer2
ConvolutionalLayer2 --> PoolingLayer2
PoolingLayer2 --> Fully-ConnectedLayer
Model:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.mp = nn.MaxPool2d(2)
        self.fc = nn.Linear(???, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.mp(self.conv1(x)))
        x = F.relu(self.mp(self.conv2(x)))
        x = x.view(in_size, -1)  # flatten the tensor
        x = self.fc(x)
        return F.log_softmax(x, dim=1)  # log-probabilities over the class dimension
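How to fill in the ???:
- Put any number in place of ??? first, then correct it using the error message the program raises
- Alternatively, call print(x.size()) inside the forward function to get the dimensions of the tensor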
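The second approach can also be run outside the model. The sketch below assumes 28 * 28 single-channel MNIST input and pushes a dummy tensor through the same convolution and pooling layers to read off the flattened size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same layer configuration as the model above.
conv1 = nn.Conv2d(1, 10, kernel_size=5)
conv2 = nn.Conv2d(10, 20, kernel_size=5)
mp = nn.MaxPool2d(2)

# Assumption: one 28 * 28 grayscale image, as in MNIST.
dummy = torch.zeros(1, 1, 28, 28)
with torch.no_grad():
    x = F.relu(mp(conv1(dummy)))
    print(x.size())   # shape after the first conv + pool
    x = F.relu(mp(conv2(x)))
    print(x.size())   # shape after the second conv + pool
    n_features = x.view(1, -1).size(1)

print(n_features)     # number of inputs the fully connected layer needs
```

Under that input-size assumption the value agrees with the `nn.Linear(320,10)` shown earlier in the notes.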
Exercise:
Try a deeper network with more fully connected layers
Lecture 11: Advanced CNN
Why 1*1 convolution?
Using 32 1*1 filters turns a 64-channel feature map into a 32-channel one.
Using 1*1 filters this way can significantly reduce the amount of computation.
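Both claims can be illustrated with a short sketch. The 28 * 28 spatial size, the 5 * 5 follow-up convolution and the 16-channel bottleneck are assumptions chosen for the example, and `conv_multiplies` is a hypothetical helper that counts multiplications while ignoring biases.

```python
import torch
import torch.nn as nn

# A 64-channel feature map; the 28 * 28 spatial size is an assumption.
x = torch.randn(1, 64, 28, 28)
reduce = nn.Conv2d(64, 32, kernel_size=1)   # 32 filters of size 1 * 1
y = reduce(x)
print(x.shape, '->', y.shape)               # channels shrink, height and width stay


def conv_multiplies(height, width, in_channels, out_channels, kernel_size):
    """Hypothetical helper: multiplications for a same-size convolution, biases ignored."""
    return kernel_size * kernel_size * height * width * in_channels * out_channels


# Direct 5 * 5 convolution from 64 to 32 channels ...
direct = conv_multiplies(28, 28, 64, 32, 5)
# ... versus a 1 * 1 reduction to 16 channels followed by the 5 * 5 convolution.
bottleneck = (conv_multiplies(28, 28, 64, 16, 1)
              + conv_multiplies(28, 28, 16, 32, 5))
print(direct, bottleneck)
```

The saving comes from running the expensive 5 * 5 convolution on fewer input channels.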
Inception Module
graph LR
Filter_concat_in --> 1*1Conv0_16
Filter_concat_in --> 1*1Conv1_16
Filter_concat_in --> 1*1Conv2_16
Filter_concat_in --> AvgPooling
AvgPooling --> 1*1Conv3_24
1*1Conv0_16 --> 3*3Conv0_24
3*3Conv0_24 --> 3*3Conv1_24
3*3Conv1_24 --> Filter_Concat_out
1*1Conv1_16 --> 5*5Conv_24
5*5Conv_24 --> Filter_Concat_out
1*1Conv3_24 --> Filter_Concat_out
1*1Conv2_16 --> Filter_Concat_out
Implementation
- Bottom branch (the fourth)
self.branch1x1 = nn.Conv2d(in_channels,16,kernel_size=1)
branch1x1 = self.branch1x1(x)
- Second branch from the bottom
self.branch_pool = nn.Conv2d(in_channels,24,kernel_size=1)
branch_pool = F.avg_pool2d(x,kernel_size=3,stride=1,padding=1)
branch_pool = self.branch_pool(branch_pool)
- Second branch from the top
self.branch5x5_1 = nn.Conv2d(in_channels,16,kernel_size=1)
self.branch5x5_2 = nn.Conv2d(16,24,kernel_size=5,padding=2)
branch5x5 = self.branch5x5_1(x)
branch5x5 = self.branch5x5_2(branch5x5)
- First branch
self.branch3x3_1=nn.Conv2d(in_channels,16,kernel_size=1)
self.branch3x3_2=nn.Conv2d(16,24,kernel_size=3,padding=1)
self.branch3x3_3=nn.Conv2d(24,24,kernel_size=3,padding=1)
branch3x3 = self.branch3x3_1(x)
branch3x3 = self.branch3x3_2(branch3x3)
branch3x3 = self.branch3x3_3(branch3x3)
- Output
outputs = [branch1x1,branch_pool,branch5x5,branch3x3]
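The branch outputs are then joined along the channel dimension. The sketch below uses zero tensors as stand-ins for the four branch outputs (the batch size and spatial size are arbitrary) to show how the channel counts add up.

```python
import torch

# Stand-ins for the four branch outputs; only the channel counts matter here.
batch, height, width = 2, 12, 12
branch1x1 = torch.zeros(batch, 16, height, width)
branch_pool = torch.zeros(batch, 24, height, width)
branch5x5 = torch.zeros(batch, 24, height, width)
branch3x3 = torch.zeros(batch, 24, height, width)

outputs = [branch1x1, branch_pool, branch5x5, branch3x3]
# dim=1 is the channel dimension in (batch, channels, height, width).
merged = torch.cat(outputs, dim=1)
print(merged.shape)   # 16 + 24 + 24 + 24 channels
```

Concatenation only works because every branch keeps the same height and width, which is what the padding values in the branches are for. The resulting 88 channels are why the layer that follows an Inception block takes 88 input channels.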
ALL CODE:
# https://github.com/pytorch/examples/blob/master/mnist/main.py
from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
# Training settings
batch_size = 64
# MNIST Dataset
train_dataset = datasets.MNIST(root='./data/',
train=True,
transform=transforms.ToTensor(),
download=True)
test_dataset = datasets.MNIST(root='./data/',
train=False,
transform=transforms.ToTensor())
# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
batch_size=batch_size,
shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
batch_size=batch_size,
shuffle=False)
class InceptionA(nn.Module):
    def __init__(self, in_channels):
        super(InceptionA, self).__init__()
        self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch5x5_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch5x5_2 = nn.Conv2d(16, 24, kernel_size=5, padding=2)
        self.branch3x3dbl_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch3x3dbl_2 = nn.Conv2d(16, 24, kernel_size=3, padding=1)
        self.branch3x3dbl_3 = nn.Conv2d(24, 24, kernel_size=3, padding=1)
        self.branch_pool = nn.Conv2d(in_channels, 24, kernel_size=1)

    def forward(self, x):
        branch1x1 = self.branch1x1(x)
        branch5x5 = self.branch5x5_1(x)
        branch5x5 = self.branch5x5_2(branch5x5)
        branch3x3dbl = self.branch3x3dbl_1(x)
        branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
        branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl)
        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)
        outputs = [branch1x1, branch5x5, branch3x3dbl, branch_pool]
        return torch.cat(outputs, 1)
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(88, 20, kernel_size=5)
        self.incept1 = InceptionA(in_channels=10)
        self.incept2 = InceptionA(in_channels=20)
        self.mp = nn.MaxPool2d(2)
        self.fc = nn.Linear(1408, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.mp(self.conv1(x)))
        x = self.incept1(x)
        x = F.relu(self.mp(self.conv2(x)))
        x = self.incept2(x)
        x = x.view(in_size, -1)  # flatten the tensor
        x = self.fc(x)
        return F.log_softmax(x, dim=1)  # log-probabilities over the class dimension
model = Net()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


def test():
    model.eval()
    test_loss = 0
    correct = 0
    # torch.no_grad() replaces the removed volatile=True flag
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            # sum up batch loss (reduction='sum' replaces size_average=False)
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


for epoch in range(1, 10):
    train(epoch)
    test()
Lecture 12: RNN
Recurrent NN
graph LR
X1 --> A1
A1 --> h1
X2 --> A2
A2 --> h2
X3 --> A3
A3 --> h3
X4 --> A4
A4 --> h4
A1 --> A2
A2 --> A3
A3 --> A4
PyTorch provides ready-made RNN modules that can be used directly
Different RNN implementations
cell = nn.RNN(input_size=4,hidden_size=2,batch_first=True)
cell = nn.GRU(input_size=4,hidden_size=2,batch_first=True)
cell = nn.LSTM(input_size=4,hidden_size=2,batch_first=True)
How to use RNN?
cell = nn.RNN(input_size=4,hidden_size=2,batch_first=True)
inputs = ... # batch_size, seq_len,inputSize
hidden = (...) # numLayers,batch_size, hidden_size
out, hidden = cell(inputs,hidden)
There are two outputs: out holds the output at every time step, and hidden is the final hidden state
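The shapes involved are easier to see with concrete tensors. The sketch below fills in the elided inputs with random values; the batch size of 3 and sequence length of 5 are arbitrary assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

cell = nn.RNN(input_size=4, hidden_size=2, batch_first=True)

inputs = torch.randn(3, 5, 4)    # (batch_size, seq_len, input_size)
hidden = torch.zeros(1, 3, 2)    # (num_layers, batch_size, hidden_size)

out, hidden = cell(inputs, hidden)
print(out.shape)      # one hidden_size vector per time step
print(hidden.shape)   # the hidden state after the last time step
```

For a single-layer, one-directional RNN like this one, the last time step of `out` is the same as the returned `hidden`.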
# Lab 12 RNN
import sys
import torch
import torch.nn as nn
from torch.autograd import Variable
torch.manual_seed(777) # reproducibility
# 0 1 2 3 4
idx2char = ['h', 'i', 'e', 'l', 'o']
# Teach hihell -> ihello
x_data = [0, 1, 0, 2, 3, 3] # hihell
one_hot_lookup = [[1, 0, 0, 0, 0], # 0
[0, 1, 0, 0, 0], # 1
[0, 0, 1, 0, 0], # 2
[0, 0, 0, 1, 0], # 3
[0, 0, 0, 0, 1]] # 4
y_data = [1, 0, 2, 3, 3, 4] # ihello
x_one_hot = [one_hot_lookup[x] for x in x_data]
# As we have one batch of samples, we will change them to variables only once
inputs = Variable(torch.Tensor(x_one_hot))
labels = Variable(torch.LongTensor(y_data))
num_classes = 5
input_size = 5 # one-hot size
hidden_size = 5 # output from the RNN. 5 to directly predict one-hot
batch_size = 1 # one sentence
sequence_length = 1 # One by one
num_layers = 1 # one-layer rnn
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.rnn = nn.RNN(input_size=input_size,
                          hidden_size=hidden_size, batch_first=True)

    def forward(self, hidden, x):
        # Reshape input (batch first)
        x = x.view(batch_size, sequence_length, input_size)
        # Propagate input through RNN
        # Input: (batch, seq_len, input_size)
        # hidden: (num_layers * num_directions, batch, hidden_size)
        out, hidden = self.rnn(x, hidden)
        return hidden, out.view(-1, num_classes)

    def init_hidden(self):
        # Initialize hidden and cell states
        # (num_layers * num_directions, batch, hidden_size)
        return Variable(torch.zeros(num_layers, batch_size, hidden_size))
# Instantiate RNN model
model = Model()
print(model)
# Set loss and optimizer function
# CrossEntropyLoss = LogSoftmax + NLLLoss
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
# Train the model
for epoch in range(100):
    optimizer.zero_grad()
    loss = 0
    hidden = model.init_hidden()
    sys.stdout.write("predicted string: ")
    for input, label in zip(inputs, labels):
        # print(input.size(), label.size())
        hidden, output = model(hidden, input)
        val, idx = output.max(1)
        sys.stdout.write(idx2char[idx.item()])
        # label is a 0-dim tensor here; reshape it to (1,) to match the batch of 1
        loss += criterion(output, label.view(1))
    print(", epoch: %d, loss: %1.3f" % (epoch + 1, loss.item()))
    loss.backward()
    optimizer.step()
print("Learning finished!")