PyTorch nn.Linear: Basic Usage and How It Works

Original article: "Pytorch nn.Linear的基本用法与原理详解", iioSnail's blog, CSDN

Basic definition of nn.Linear

nn.Linear defines a linear layer of a neural network. Its signature is:

torch.nn.Linear(in_features,   # number of input neurons (features)
                out_features,  # number of output neurons (features)
                bias=True      # whether to include a bias term
                )
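As a minimal sketch of what the bias flag changes (the sizes 4 and 3 are arbitrary values chosen for illustration), the snippet below builds one layer with a bias and one without, then counts the learnable parameters of each:

```python
import torch
from torch import nn

# Two layers with the same (illustrative) sizes, differing only in the bias flag
with_bias = nn.Linear(in_features=4, out_features=3)             # bias=True is the default
without_bias = nn.Linear(in_features=4, out_features=3, bias=False)

print(with_bias.bias.shape)   # torch.Size([3]): one bias per output neuron
print(without_bias.bias)      # None: no bias parameter is created

# Count learnable parameters: weights (4 * 3) plus, optionally, 3 biases
n_with = sum(p.numel() for p in with_bias.parameters())
n_without = sum(p.numel() for p in without_bias.parameters())
print(n_with, n_without)      # 15 12
```

With bias=False the layer computes only the matrix product, which can be useful when a following layer already adds its own offset.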

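Linear simply applies a linear transformation to the input $X_{n \times i}$, namely:

$$Y_{n \times o} = X_{n \times i} W_{i \times o} + b$$

Here $W$ is the parameter the model learns, with shape $i \times o$; $b$ is an $o$-dimensional bias vector that is added to every row; $n$ is the number of rows of the input (for example, feeding 10 samples at once, i.e. a batch_size of 10, gives $n = 10$); $i$ is the number of input neurons (for example, 5 features per sample gives $i = 5$); and $o$ is the number of output neurons. Note that PyTorch stores the weight with shape $o \times i$, so the $W$ in this formula corresponds to the transpose of the layer's weight attribute.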
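The formula can be checked numerically. The following is a small sketch, assuming illustrative sizes $n = 10$, $i = 5$, $o = 3$ and a fixed random seed; it compares the layer's output with a hand-written $XW + b$, where $W$ is taken as the transpose of the stored weight:

```python
import torch
from torch import nn

torch.manual_seed(0)          # fixed seed so the run is repeatable
n, i, o = 10, 5, 3            # illustrative sizes: batch, input features, output features

layer = nn.Linear(i, o)
X = torch.rand(n, i)

print(layer.weight.shape)     # torch.Size([3, 5]): stored as (o, i)
print(layer.bias.shape)       # torch.Size([3])

W = layer.weight.T            # shape (i, o), the W of the formula
Y_manual = X @ W + layer.bias # bias is broadcast over the n rows
Y_layer = layer(X)

print(Y_layer.shape)          # torch.Size([10, 3])
print(torch.allclose(Y_manual, Y_layer, atol=1e-6))
```

The comparison uses a small tolerance because the two code paths may round floating-point results slightly differently.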

Usage demo:

import torch
from torch import nn

model = nn.Linear(2, 1)  # 2 input features, 1 output feature

x = torch.tensor([1.0, 2.0])  # one sample with 2 features (values 1 and 2)
output = model(x)
output

tensor([-1.4166], grad_fn=<AddBackward0>)

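Our input is [1, 2] and the output is [-1.4166]. The weights are initialized randomly, so the exact value differs from run to run. We can inspect the model parameters to verify the formula above: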

# Inspect the model parameters
for param in model.parameters():
    print(param)

Parameter containing:
tensor([[ 0.1098, -0.5404]], requires_grad=True)
Parameter containing:
tensor([-0.4456], requires_grad=True)

As shown, the model has 3 parameters: two weights and one bias. Working it out by hand:

$$y = [1, 2] \, [0.1098, -0.5404]^{T} - 0.4456 = -1.4166$$
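The hand calculation can be reproduced in code. This is a sketch that copies the printed parameter values into a fresh layer (an assumption made only so the example is repeatable, since a real layer starts from random weights) and compares the layer's output with plain arithmetic:

```python
import torch
from torch import nn

layer = nn.Linear(2, 1)

# Overwrite the random initial values with the ones printed in the text
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[0.1098, -0.5404]]))
    layer.bias.copy_(torch.tensor([-0.4456]))

x = torch.tensor([1.0, 2.0])
y_layer = layer(x).item()                          # output computed by nn.Linear
y_hand = 1 * 0.1098 + 2 * (-0.5404) + (-0.4456)    # same sum written out by hand

print(round(y_layer, 4), round(y_hand, 4))         # -1.4166 -1.4166
```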

Hands-on example

Suppose one input consists of three samples A, B, and C (i.e. batch_size is 3), each with 5 features:

 A: [0.1,0.2,0.3,0.3,0.3]
B: [0.4,0.5,0.6,0.6,0.6]
C: [0.7,0.8,0.9,0.9,0.9]

Then our input matrix $X_{3 \times 5}$ is:

X = torch.tensor([
    [0.1,0.2,0.3,0.3,0.3],
    [0.4,0.5,0.6,0.6,0.6],
    [0.7,0.8,0.9,0.9,0.9],
])
X

 tensor([[0.1000, 0.2000, 0.3000, 0.3000, 0.3000],
        [0.4000, 0.5000, 0.6000, 0.6000, 0.6000],
        [0.7000, 0.8000, 0.9000, 0.9000, 0.9000]])

Define the linear layer. Each sample has 5 features, so in_features=5; we want the next layer to have 10 neurons, so out_features=10. The weight matrix is therefore $W_{5 \times 10}$:

 model = nn.Linear(in_features=5, out_features=10, bias=True)

Passing the input through the linear layer does exactly one thing:

$$Y_{3 \times 10} = X_{3 \times 5} W_{5 \times 10} + b$$

Written out in full:

$$
\begin{bmatrix} Y_{00} & Y_{01} & \dots & Y_{08} & Y_{09} \\ Y_{10} & Y_{11} & \dots & Y_{18} & Y_{19} \\ Y_{20} & Y_{21} & \dots & Y_{28} & Y_{29} \end{bmatrix}
=
\begin{bmatrix} X_{00} & X_{01} & X_{02} & X_{03} & X_{04} \\ X_{10} & X_{11} & X_{12} & X_{13} & X_{14} \\ X_{20} & X_{21} & X_{22} & X_{23} & X_{24} \end{bmatrix}
\begin{bmatrix} W_{00} & W_{01} & \dots & W_{08} & W_{09} \\ W_{10} & W_{11} & \dots & W_{18} & W_{19} \\ W_{20} & W_{21} & \dots & W_{28} & W_{29} \\ W_{30} & W_{31} & \dots & W_{38} & W_{39} \\ W_{40} & W_{41} & \dots & W_{48} & W_{49} \end{bmatrix}
+ b
$$

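Here $X_{i \cdot}$ denotes the $i$-th sample (row $i$ of $X$), and $W_{\cdot j}$ denotes the weights from all input neurons to the $j$-th output neuron (column $j$ of $W$).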

Note: the original figure has a small error; the first row of $W$ should read $W_{00}, W_{01}, W_{02}, \dots, W_{07}, W_{08}, W_{09}$.

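Since there are three samples, this amounts to computing $Y_{1 \times 10} = X_{1 \times 5} W_{5 \times 10} + b$ three times, once per sample, and then stacking the three $Y_{1 \times 10}$ rows. After the linear layer we end up with a $3 \times 10$ matrix: 3 samples of dimension 5 go in, and 3 samples come out, each expanded to 10 dimensions.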

 model(X).size()
# torch.Size([3, 10])
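The "one sample at a time, then stack" view can be checked directly. The sketch below, which assumes a fixed seed for repeatability, feeds the three samples through the layer individually and compares the stacked result with the batched call:

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so the run is repeatable

X = torch.tensor([
    [0.1, 0.2, 0.3, 0.3, 0.3],
    [0.4, 0.5, 0.6, 0.6, 0.6],
    [0.7, 0.8, 0.9, 0.9, 0.9],
])
model = nn.Linear(in_features=5, out_features=10, bias=True)

Y_batch = model(X)                        # all three samples at once: shape (3, 10)
rows = [model(sample) for sample in X]    # one sample at a time: each has shape (10,)
Y_stacked = torch.stack(rows)             # stack the three rows back into (3, 10)

print(Y_batch.shape, Y_stacked.shape)
print(torch.allclose(Y_batch, Y_stacked, atol=1e-6))
```

In practice the batched call is preferred, since a single matrix multiplication is much faster than a Python loop over samples.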

A linear regression model in PyTorch

import torch
from torch import nn
from torch import optim
from matplotlib import pyplot as plt

# 1. Define the data: y = 3x + 0.8
x = torch.rand([50, 1])
y = x * 3 + 0.8

# 2. Define the model
class Lr(nn.Module):
    def __init__(self):
        super().__init__()
        # Simple 1-D linear regression: x has one feature and the predicted y has one feature
        self.linear = nn.Linear(1, 1)

    # Define the forward pass
    def forward(self, x):
        out = self.linear(x)
        return out

# 3. Instantiate the model, the loss, and the optimizer
model = Lr()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3)

# 4. Train the model
for i in range(30000):
    out = model(x)              # 4.1 get the predictions
    loss = criterion(out, y)    # 4.2 compute the loss (prediction first, target second)
    optimizer.zero_grad()       # 4.3 reset the gradients
    loss.backward()             # 4.4 compute the gradients
    optimizer.step()            # 4.5 update the parameters
    if (i + 1) % 20 == 0:
        print('Epoch[{}/{}], loss: {:.6f}'.format(i + 1, 30000, loss.item()))

# 5. Evaluate the model
model.eval()  # switch the model to evaluation (inference) mode
with torch.no_grad():  # no gradients are needed for prediction
    predict = model(x).numpy()
plt.scatter(x.numpy(), y.numpy(), c="r")
plt.plot(x.numpy(), predict)
plt.show()
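Besides the plot, the fit can be judged by reading the learned parameters, which should approach the values used to generate the data (3 and 0.8). The following is a compact sketch of that check; it uses a bare nn.Linear(1, 1), a fixed seed, and a larger learning rate with fewer steps than the listing above, all of which are assumptions made to keep the example short and repeatable:

```python
import torch
from torch import nn, optim

torch.manual_seed(0)              # fixed seed so the run is repeatable
x = torch.rand(50, 1)
y = 3 * x + 0.8                   # ground truth: slope 3, intercept 0.8

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
# lr=0.1 is an illustrative choice so that 2000 steps are enough to converge
optimizer = optim.SGD(model.parameters(), lr=0.1)

for _ in range(2000):
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

w = model.weight.item()           # learned slope
b = model.bias.item()             # learned intercept
print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss.item():.2e}")
```

Because the data are noise-free, the learned slope and intercept should land very close to 3 and 0.8; with noisy data they would only approximate them.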

posted @ 2023-10-03 16:19 饮一杯天上水

