PyTorch Loss Functions

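0 Overview

In PyTorch, loss functions come in two forms, just like activation functions:

Layer (module) form: defined first, then called. These are classes in the torch.nn module; see the official documentation.
Functional form: called directly. These are functions in the torch.nn.functional module; see the official documentation.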

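Loss layers in PyTorch have the following property:

They have no learnable parameters.

PyTorch loss functions take y_pred first and y_true second, the opposite of TensorFlow.

Example: importing the relevant modules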

import torch
import torch.nn as nn
import torch.nn.functional as F
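
As a quick check of the two forms described above, the following minimal sketch computes the same loss through the layer form and through the functional form, and counts the learnable parameters of the loss layer. The tensor shapes and variable names are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
y_pred = torch.randn(4, 3)  # model output (prediction), passed first
y_true = torch.randn(4, 3)  # actual values, passed second

# Layer (module) form: construct the loss layer, then call it
loss_layer = nn.MSELoss()
loss_from_layer = loss_layer(y_pred, y_true)

# Functional form: call the function directly
loss_from_func = F.mse_loss(y_pred, y_true)

same = torch.allclose(loss_from_layer, loss_from_func)
n_params = len(list(loss_layer.parameters()))  # number of learnable tensors
print(same, n_params)
```

Both forms should agree, and the parameter list of the loss layer should be empty.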

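0.1 Main parameters

input: the model output, i.e. the predicted values
target: the target, i.e. the actual values. The first dimension of input and target should match and equal the batch size: input.size()[0] == target.size()[0] == batch_size
reduction: one of {'none', 'mean', 'sum'}; defaults to 'mean'
- 'none': no reduction; returns the individual losses (one per element for element-wise losses such as MSE, one per sample for classification losses such as cross entropy)
- 'mean': returns the mean of the individual losses (the default)
- 'sum': returns the sum of the individual losses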
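
The three reduction modes can be compared on a tiny hand-checkable input. This is a minimal sketch using F.mse_loss; the numbers are made up so that the squared errors are easy to verify mentally.

```python
import torch
import torch.nn.functional as F

input = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[0.0, 2.0], [3.0, 2.0]])

# Squared errors per element: [[1, 0], [0, 4]]
loss_none = F.mse_loss(input, target, reduction='none')  # same shape as input
loss_mean = F.mse_loss(input, target, reduction='mean')  # 5 / 4 = 1.25
loss_sum = F.mse_loss(input, target, reduction='sum')    # 1 + 0 + 0 + 4 = 5

print(loss_none)
print(loss_mean.item(), loss_sum.item())
```

Note that for an element-wise loss, 'mean' divides by the total number of elements (4 here), not by the batch size (2 here).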

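0.2 Batch loss and total loss

Note: during training, data is usually loaded in batches and the loss is computed per batch; the loss of one batch is called the batch loss. Pay attention to what the total loss of an epoch actually means. For example, with reduction='mean' the total loss is the sum of the mean losses of all batches, not the mean per-sample loss over the training set.

Example: batch loss versus total loss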

from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader
torch.manual_seed(0)

n_sample = 100  # randomly generate 100 (prediction, target) pairs
y_pred = torch.rand(n_sample)
y_true = torch.rand(n_sample)
loss_func = F.mse_loss  # loss function applied to every batch below
print('MSE', F.mse_loss(y_pred, y_true).item())

dl = DataLoader(TensorDataset(y_pred, y_true), batch_size=17)
# Per-batch loss with reduction='mean'
total_loss = 0.
for y_pred_batch, y_true_batch in dl:
    batch_loss = loss_func(y_pred_batch, y_true_batch, reduction='mean')  # mean loss of one batch
    total_loss += batch_loss.item()  # total loss = sum of the per-batch mean losses, not the mean loss over the whole training set
print('Sum of batch loss:', total_loss)
# Per-batch loss with reduction='sum'
total_loss = 0.
for y_pred_batch, y_true_batch in dl:
    batch_loss = loss_func(y_pred_batch, y_true_batch, reduction='sum')  # summed loss of one batch
    total_loss += batch_loss.item()  # total loss = sum of the per-batch summed losses
total_loss = total_loss / n_sample  # dividing by the sample count recovers the dataset MSE
print('MSE:', total_loss)
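
If reduction='mean' has to stay in place (for instance because the same batch loss is used for backpropagation), one possible way to still obtain the per-sample mean over the epoch is to weight each batch loss by its batch size. The sketch below assumes the same synthetic data as above; the names running and epoch_mse are illustrative only.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
n_sample = 100
y_pred = torch.rand(n_sample)
y_true = torch.rand(n_sample)
dl = DataLoader(TensorDataset(y_pred, y_true), batch_size=17)  # last batch has 15 samples

running = 0.0
for y_pred_batch, y_true_batch in dl:
    batch_loss = F.mse_loss(y_pred_batch, y_true_batch, reduction='mean')
    # Multiply by the batch size to undo the per-batch averaging
    running += batch_loss.item() * y_pred_batch.size(0)

epoch_mse = running / n_sample                 # mean loss per sample over the epoch
full_mse = F.mse_loss(y_pred, y_true).item()   # reference value on the full data
print(epoch_mse, full_mse)
```

The weighting matters here because the batches are not all the same size; with equal batch sizes, a plain average of the batch means would give the same result.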

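1. Regression losses

1.0 Notation

Define the following variables:

$n=1,2,\cdots,N$: sample index; $N$ is the total number of samples
$x_n$: predicted value for sample $n$ (the model output)
$y_n$: actual value for sample $n$
$l_n$: some error measure for sample $n$, $l_n = l(x_n,y_n)$
$w_n$: weight of sample $n$
$L$: overall error over the sample set (mean error, summed error, etc.)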

1.1 `nn.MSELoss`

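nn.MSELoss: the mean squared error (MSE) loss, also called the squared L2 norm loss

Equivalent functional form: F.mse_loss()

A. Formula

The loss (or error) of each sample is:

\[l_n \triangleq l(x_n,y_n) = (x_n-y_n)^2 \]

where $x_n$ and $y_n$ are the output (predicted value) and the actual value of the $n$-th sample.

B. Main parameters

reduction: one of 'none', 'mean', 'sum'; defaults to 'mean'

'none': keeps the loss of every sample without further processing
'mean': the default; the result is the mean of all per-sample errors: $L = \frac{1}{N} \sum_{n} l_n $
'sum': the result is the sum of all per-sample errors: $L = \sum_{n} l_n $

Example: reduction='mean' versus reduction='none'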

input = torch.randn(3, 5)
target = torch.randn(3, 5)

# Loss 1: reduction='mean' (the default)
output_1 = nn.MSELoss()(input, target)
print(output_1.size(), output_1)
# Output 1: torch.Size([]) tensor(1.1773)  (the value depends on the random draw)

# Loss 2: reduction='none'
output_2 = nn.MSELoss(reduction='none')(input, target)
print(output_2.size(), output_2)
# Output 2: torch.Size([3, 5]) tensor([[...

1.2 `nn.L1Loss`

nn.L1Loss: the mean absolute error (MAE) loss, also called the L1 norm loss.

Equivalent functional form: F.l1_loss()

A. Formula

\[l_n \triangleq l(x_n,y_n) = |x_n-y_n| \]

B. Main parameters

reduction: same as above
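
The L1 formula can be reproduced by hand and compared with nn.L1Loss. This is a minimal sketch with made-up values chosen so the absolute errors are easy to read off.

```python
import torch
import torch.nn as nn

x = torch.tensor([2.5, 0.0, -1.0])   # predictions
y = torch.tensor([3.0, -0.5, 1.0])   # actual values

abs_err = (x - y).abs()              # per-sample |x_n - y_n|: [0.5, 0.5, 2.0]
manual_mean = abs_err.mean()         # (0.5 + 0.5 + 2.0) / 3 = 1.0
manual_sum = abs_err.sum()           # 3.0

builtin_mean = nn.L1Loss()(x, y)
builtin_sum = nn.L1Loss(reduction='sum')(x, y)
print(manual_mean.item(), builtin_mean.item())
print(manual_sum.item(), builtin_sum.item())
```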

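2 Binary classification losses

When designing a neural network, choose the loss function according to what the model outputs:

nn.BCELoss() requires $x_n$ (the final model output) to lie between 0 and 1
nn.BCEWithLogitsLoss() first applies a sigmoid to $x_n$, mapping it into the range 0 to 1, so $x_n$ can take any real value

Inputs:

Input: $x_n$
Target: $y_n$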

2.1 Binary cross entropy `nn.BCELoss()`

nn.BCELoss: the binary cross entropy loss

A. Formula

\[\begin{aligned} l_n(x_n, y_n) &= -w_n \cdot [y_n \cdot \ln x_n + (1-y_n) \cdot \ln(1-x_n)] \\ &= \begin{cases} -w_n \cdot \ln x_n, & \text{when } y_n=1 \\ -w_n \cdot \ln(1-x_n), & \text{when } y_n=0 \end{cases} \end{aligned} \]

where the predicted value $x_n \in (0, 1)$ is a probability and the actual value $y_n \in \{0, 1\}$ is a binary 0-1 label.

B. Main parameters

weight: per-sample weights $w_n$
reduction: same as above
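
To make the role of the weights concrete, the following sketch evaluates the binary cross entropy formula by hand and compares it with nn.BCELoss(weight=...). The probabilities, labels and weights are illustrative values, not taken from any dataset.

```python
import torch
import torch.nn as nn

x = torch.tensor([0.9, 0.2, 0.6])  # predicted probabilities in (0, 1)
y = torch.tensor([1.0, 0.0, 1.0])  # binary targets (float, as BCELoss expects)
w = torch.tensor([1.0, 2.0, 0.5])  # per-sample weights (illustrative)

# l_n = -w_n * [y_n * ln(x_n) + (1 - y_n) * ln(1 - x_n)]
manual = -w * (y * torch.log(x) + (1 - y) * torch.log(1 - x))

builtin = nn.BCELoss(weight=w, reduction='none')(x, y)
print(manual)
print(builtin)
```

With reduction='none' the two tensors can be compared sample by sample; the default 'mean' would average the three weighted losses.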

2.2 `nn.BCEWithLogitsLoss()`

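nn.BCEWithLogitsLoss: applies a sigmoid to the input, then computes the binary cross entropy loss.

A. Formula

\[\begin{aligned} l_n(x_n, y_n) &= -w_n \cdot \left[y_n \cdot \ln \sigma(x_n) + (1-y_n) \cdot \ln(1-\sigma(x_n)) \right] \\ &= \begin{cases} -w_n \cdot \ln \sigma(x_n), & \text{when } y_n=1 \\ -w_n \cdot \ln \sigma(-x_n), & \text{when } y_n=0 \end{cases} \end{aligned} \]

where $\sigma(x) = 1/(1+e^{-x})$ is the sigmoid function, and the second case uses the identity $1-\sigma(x_n) = \sigma(-x_n)$.

B. Main parameters

weight: per-sample weights
reduction: same as above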
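
The sigmoid identity $1-\sigma(x) = \sigma(-x)$ means the loss can be written entirely in terms of the log-sigmoid. The sketch below checks that identity numerically and reproduces F.binary_cross_entropy_with_logits by hand; the logits and labels are arbitrary, and no weights are used.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 3.0])  # raw logits, any real value is allowed
y = torch.tensor([0.0, 1.0, 1.0])   # binary targets

# Identity: ln(1 - sigmoid(x)) == ln(sigmoid(-x))
lhs = torch.log(1 - torch.sigmoid(x))
rhs = F.logsigmoid(-x)

# l_n = -[y_n * ln(sigmoid(x_n)) + (1 - y_n) * ln(sigmoid(-x_n))]
manual = -(y * F.logsigmoid(x) + (1 - y) * F.logsigmoid(-x))
builtin = F.binary_cross_entropy_with_logits(x, y, reduction='none')
print(manual)
print(builtin)
```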

Example: four ways to use a binary classification loss

torch.manual_seed(0)  # fix the random seed
input = torch.rand(5)  # 5 random numbers in [0, 1), used here as logits
target = torch.empty(5).random_(2)  # random 0/1 labels, stored as floats
print(input, target)

# Way 1: layer class nn.BCEWithLogitsLoss()
loss_layer1 = nn.BCEWithLogitsLoss()
output_1 = loss_layer1(input, target)
print(output_1)

# Way 2: layer classes nn.Sigmoid() + nn.BCELoss()
loss_layer2 = nn.BCELoss()
sigmoid_layer = nn.Sigmoid()
input_2 = sigmoid_layer(input)
output_2 = loss_layer2(input_2, target)
print(output_2)

# Way 3: function F.binary_cross_entropy_with_logits()
output_3 = F.binary_cross_entropy_with_logits(input, target)
print(output_3)

# Way 4: functions torch.sigmoid() + F.binary_cross_entropy()
input_4 = torch.sigmoid(input)
output_4 = F.binary_cross_entropy(input_4, target)
print(output_4)
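
The four ways agree for moderate inputs, but the fused logits version is generally preferable for numerical reasons. The sketch below uses one deliberately extreme logit (an illustrative value) to show that a separate sigmoid step can lose the information needed to compute the loss, while the fused version stays accurate.

```python
import torch
import torch.nn.functional as F

logit = torch.tensor([-200.0])  # extreme logit, chosen only for illustration
target = torch.tensor([1.0])

# Fused version: works on the logit directly; the true loss is about 200
stable = F.binary_cross_entropy_with_logits(logit, target)

# Two-step version: sigmoid(-200) underflows to exactly 0 in float32,
# so the loss computed from the probability no longer matches the true value
prob = torch.sigmoid(logit)
two_step = F.binary_cross_entropy(prob, target)

print(prob.item(), stable.item(), two_step.item())
```

The two-step result stays finite only because BCELoss is documented to clamp its log outputs; it is still far from the correct value.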

3. Multi-class classification losses

3.0 Notation

$C$: total number of classes
$x_n = \left[x_n^{(0)}, \cdots, x_n^{(C-1)} \right]^{\top} \in \mathbb{R}^{C}$: the model output for sample $n$, one score per class (unnormalized logits for nn.CrossEntropyLoss, log-probabilities for nn.NLLLoss)
$y_n = \left[y_n^{(0)}, \cdots, y_n^{(C-1)} \right]^{\top} \in \mathbb{R}^{C}$: the actual class of sample $n$ as a 0-1 (one-hot) vector, i.e. $y_n^{(c)} \in \{0, 1 \}$ and $\sum \limits_{c=0}^{C-1} y_n^{(c)} = 1$
In PyTorch, $y_n$ is a categorical variable, i.e. $y_n \in \{0,1,2,\cdots, C-1\}$
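
The two label representations are easy to convert between. As a small sketch, F.one_hot turns class indices into one-hot vectors and argmax goes back; the three labels below are arbitrary.

```python
import torch
import torch.nn.functional as F

y_index = torch.tensor([2, 0, 1])             # categorical labels, C = 3
y_onehot = F.one_hot(y_index, num_classes=3)  # one row per sample
recovered = y_onehot.argmax(dim=1)            # back to class indices

print(y_onehot)
print(recovered)
```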

3.1 `nn.CrossEntropyLoss`

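nn.CrossEntropyLoss: the multi-class cross entropy loss.

Equivalent functional form: F.cross_entropy()

A. Formula

The cross entropy loss $l_n$ is defined as:

\[l_n = l_n(\boldsymbol{x}_n,\boldsymbol{y}_n) = - \sum_{c=0}^{C-1} y_n^{(c)} \ln p_n^{(c)} = - \ln p_n^{(c^*)} \]

where $c^*$ is the index in $\boldsymbol{y}_n$ with $y_{n}^{(c)} = 1$, and $p_n^{(c)}$ is the probability that sample $n$ belongs to class $c$. For a linear softmax classifier (here $\boldsymbol{x}_n$ denotes the input features, with class weights $\boldsymbol{w}_c$ and biases $b_c$):

\[p_{n}^{(c)} = P(Y=c \mid \boldsymbol{x}_n) = \frac{ \exp(\boldsymbol{w}^{\mathrm T}_{c} \boldsymbol{x}_{n} + b_c)}{\sum \limits_{i=0}^{C-1} \exp( \boldsymbol{w}^{\mathrm T}_{i} \boldsymbol{x}_{n} + b_i)}, \qquad c = 0,1,\cdots,C-1 \]

In PyTorch's nn.CrossEntropyLoss class, the true label $y_n$ is a categorical variable, i.e. $y_n \in \{0,1,2,\cdots, C-1\}$, and the input holds the raw scores (logits) $x_{n,c}$. The loss $l_n$ is then computed as: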

\[l_n = -w_{y_n} \cdot \ln \left[ \frac{\exp(x_{n,y_n})}{\sum_{c=0}^{C-1} \exp(x_{n,c})} \right] \cdot 1 \{y_n \neq \text{ignore_index} \} \]

In the formula above, $w_{y_n}$ is the weight of class $y_n$; $\frac{\exp(x_{n,y_n})}{\sum_{c=0}^{C-1} \exp(x_{n,c})}$ is the probability that sample $n$ belongs to class $y_n$; ignore_index is a parameter of the function that specifies a label value to be ignored. The total loss over the dataset is:

\[L = \begin{cases} \dfrac{\sum \limits_{n=1}^{N} l_n}{\sum \limits_{n=1}^{N} w_{y_n} \cdot 1 \{y_n \neq \text{ignore_index} \}}, & \text{if reduction='mean'} \\ \sum \limits_{n=1}^{N} l_n, & \text{if reduction='sum'} \end{cases} \]
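
The weighted 'mean' reduction is easy to get wrong, because it divides by the sum of the class weights of the non-ignored samples rather than by $N$. The following sketch recomputes both reductions by hand and compares them with F.cross_entropy. The logits, labels and class weights are illustrative; -100 is used as the ignored label.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(5, 3)                  # logits for N = 5 samples, C = 3 classes
y = torch.tensor([0, 2, 1, -100, 2])   # the 4th sample carries the ignored label
w = torch.tensor([1.0, 2.0, 0.5])      # class weights (illustrative)

log_p = torch.log_softmax(x, dim=1)
keep = (y != -100).float()             # indicator 1{y_n != ignore_index}
y_safe = y.clamp(min=0)                # placeholder index for ignored rows
w_y = w[y_safe] * keep                 # w_{y_n}, zeroed for ignored samples

l_n = -w_y * log_p[torch.arange(5), y_safe]
manual_sum = l_n.sum()
manual_mean = l_n.sum() / w_y.sum()    # divide by the summed weights, not by N

builtin_sum = F.cross_entropy(x, y, weight=w, ignore_index=-100, reduction='sum')
builtin_mean = F.cross_entropy(x, y, weight=w, ignore_index=-100, reduction='mean')
print(manual_mean.item(), builtin_mean.item())
```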

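B. Main parameters

weight: tensor of size $C$; the class weights
ignore_index: int; a target value that is ignored and does not contribute to the loss (default -100)
reduction: same as before
label_smoothing: float in [0, 1]; the amount of label smoothing (default 0.0)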
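
As a rough illustration of label_smoothing, the sketch below builds the smoothed target distribution by hand, assuming the usual definition in which the one-hot target is mixed with a uniform distribution over the $C$ classes, and compares the result with F.cross_entropy. It assumes a PyTorch version recent enough to provide the label_smoothing argument; eps and the tensors are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
eps = 0.1                              # smoothing amount (illustrative)
x = torch.randn(4, 3)                  # logits, N = 4, C = 3
y = torch.tensor([0, 2, 1, 1])         # class indices
C = x.size(1)

log_p = torch.log_softmax(x, dim=1)

# Smoothed target: (1 - eps) on the true class plus eps / C on every class
q = torch.full_like(log_p, eps / C)
q[torch.arange(4), y] += 1 - eps

manual = -(q * log_p).sum(dim=1).mean()
builtin = F.cross_entropy(x, y, label_smoothing=eps)
print(manual.item(), builtin.item())
```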

Inputs and outputs:

Input:
- input: shape $(C)$, $(N,C)$, or $(N, C, d_1, d_2, \cdots, d_K)$
- target: shape $()$ (a scalar), $(N)$, or $(N, d_1, d_2, \cdots, d_K)$
  - target holds class indices (note: not one-hot vectors)
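
The higher-dimensional shapes typically show up in per-pixel classification. As a minimal sketch, the class dimension of the input sits at position 1, while the target has no class dimension at all; the sizes below are arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, C, H, W = 2, 4, 3, 3
logits = torch.randn(N, C, H, W)         # input:  (N, C, d_1, d_2)
target = torch.randint(0, C, (N, H, W))  # target: (N, d_1, d_2), class indices

loss_map = F.cross_entropy(logits, target, reduction='none')  # one loss per position
loss = F.cross_entropy(logits, target)                        # mean over all positions
print(loss_map.shape, loss.item())
```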

3.2 `torch.nn.NLLLoss`

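torch.nn.NLLLoss: the negative log likelihood loss

Equivalent functional form: F.nll_loss()

A. Formula

\[l_n = - w_{y_n} \cdot x_{n,y_n} \cdot 1 \{y_n \neq \text{ignore_index} \} \]

As the formula shows, torch.nn.NLLLoss is similar to nn.CrossEntropyLoss. The difference is that the latter first passes its input through log-softmax, whereas torch.nn.NLLLoss expects $x_{n,y_n}$ to already be a log-probability.

B. Main parameters

weight, ignore_index, reduction: same as nn.CrossEntropyLoss (NLLLoss has no label_smoothing parameter)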
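
Without weights or ignored labels, the NLL loss simply picks the log-probability of the true class and flips its sign. The sketch below shows this with explicit probabilities (illustrative values whose rows sum to 1).

```python
import torch
import torch.nn.functional as F

probs = torch.tensor([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]])
log_p = torch.log(probs)               # NLLLoss expects log-probabilities
y = torch.tensor([0, 1])               # true classes

manual = -log_p[torch.arange(2), y]    # [-ln 0.7, -ln 0.8]
builtin = F.nll_loss(log_p, y, reduction='none')
print(manual)
print(builtin)
```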

Example: four ways to use a multi-class classification loss

input = torch.randn(7, 3)  # logits for 7 samples and 3 classes
target = torch.empty(7, dtype=torch.long).random_(3)  # random class indices in {0, 1, 2}
print(input)
print(target)

# Way 1: nn.CrossEntropyLoss()
output_1 = nn.CrossEntropyLoss()(input, target)

# Way 2: nn.LogSoftmax() + nn.NLLLoss()
input_2 = nn.LogSoftmax(dim=1)(input)
output_2 = nn.NLLLoss()(input_2, target)

# Way 3: F.cross_entropy()
output_3 = F.cross_entropy(input, target)

# Way 4: torch.log_softmax() + F.nll_loss()
input_4 = torch.log_softmax(input, dim=1)
output_4 = F.nll_loss(input_4, target)

print(output_1, output_2, output_3, output_4)  # the four values should agree

References

Code for this post: Colab, Github

Posted 2022-05-26 19:23 by veager
