Keras (2): Loss Functions
This chapter introduces Keras losses (keras.losses). Functionally, loss functions fall into three categories:
- Probabilistic losses: mainly used for classification
- Regression losses: used for regression problems
- Hinge losses: also known as "maximum-margin" losses, mainly used for SVMs to maximize the margin around the separating hyperplane
Using loss functions
With compile() and fit()
A loss is usually passed as the loss argument when the model is compiled. There are two ways to write it:
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential()
model.add(layers.Dense(64, kernel_initializer='uniform', input_shape=(10,)))
model.add(layers.Activation('softmax'))
loss_fn = keras.losses.SparseCategoricalCrossentropy()
model.compile(loss=loss_fn, optimizer='adam')
# Or pass the loss by its string name: default parameters will be used
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
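With the loss configured, training proceeds through fit() as usual. A minimal sketch with random NumPy data (the shapes and the 64 classes here are made-up values matching the toy model above):
import numpy as np
# Hypothetical toy data: 100 samples, 10 features, integer labels for 64 classes
x = np.random.random((100, 10))
y = np.random.randint(0, 64, size=(100,))
# The loss passed to compile() is evaluated on every batch during fit()
model.fit(x, y, batch_size=32, epochs=2)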
Standalone usage
A loss function can also be used on its own: simply pass in two tensors as y_true and y_pred. There is also an optional sample_weight argument.
tf.keras.losses.mean_squared_error(tf.ones((2, 2,)), tf.zeros((2, 2)))
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([1., 1.], dtype=float32)>
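The class form of a loss also accepts the optional sample_weight argument when called; a small sketch (the weight values are arbitrary):
mse = tf.keras.losses.MeanSquaredError()
# Per-sample weights (arbitrary values); the weighted per-sample losses are
# reduced to a single scalar according to the loss's reduction setting.
mse(tf.ones((2, 2)), tf.zeros((2, 2)), sample_weight=tf.constant([0.7, 0.3])).numpy()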
Custom losses
You can also pass a custom function as the loss argument. The definition is simple: it takes the two tensors y_true and y_pred as inputs and returns a tensor of per-sample loss values.
def my_loss_fn(y_true, y_pred):
    squared_difference = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_difference, axis=-1)  # Note the `axis=-1`

model.compile(optimizer='adam', loss=my_loss_fn)
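If the loss needs its own parameters, it can also be written as a class by subclassing tf.keras.losses.Loss and implementing call(); a minimal sketch (the class name and the factor parameter are invented for illustration):
import tensorflow as tf

class ScaledMSE(tf.keras.losses.Loss):
    """Hypothetical MSE variant scaled by a constant factor."""
    def __init__(self, factor=1.0, name="scaled_mse"):
        super().__init__(name=name)
        self.factor = factor

    def call(self, y_true, y_pred):
        # Return per-sample losses; Keras applies the reduction afterwards.
        return self.factor * tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

model.compile(optimizer='adam', loss=ScaledMSE(factor=0.5))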
add_loss()
add_loss() is a method of the Layer class (tf.keras.layers.Layer). When writing a custom layer, you can call it inside call() to add extra loss terms to the training process.
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers import Layer

class MyActivityRegularizer(Layer):
    """Layer that creates an activity sparsity regularization loss."""

    def __init__(self, rate=1e-2):
        super(MyActivityRegularizer, self).__init__()
        self.rate = rate

    def call(self, inputs):
        # We use `add_loss` to create a regularization loss
        # that depends on the inputs.
        self.add_loss(self.rate * tf.reduce_sum(tf.square(inputs)))
        return inputs
class SparseMLP(Layer):
    """Stack of Linear layers with a sparsity regularization loss."""

    def __init__(self, output_dim):
        super(SparseMLP, self).__init__()
        self.dense_1 = layers.Dense(32, activation=tf.nn.relu)
        self.regularization = MyActivityRegularizer(1e-2)
        self.dense_2 = layers.Dense(output_dim)

    def call(self, inputs):
        x = self.dense_1(inputs)
        x = self.regularization(x)
        return self.dense_2(x)
mlp = SparseMLP(1)
y = mlp(tf.ones((10, 10)))
print(mlp.losses) # List containing one float32 scalar
[<tf.Tensor: shape=(), dtype=float32, numpy=0.800574>]
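Losses registered through add_loss() are collected in the layer's (or model's) losses property. In a custom training loop they are typically summed into the main loss; a minimal sketch, assuming a regression target and MSE as the main loss:
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam()

x = tf.ones((10, 10))
y = tf.zeros((10, 1))

with tf.GradientTape() as tape:
    y_pred = mlp(x)
    # Main loss plus the regularization terms registered via add_loss()
    loss = loss_fn(y, y_pred) + sum(mlp.losses)

grads = tape.gradient(loss, mlp.trainable_weights)
optimizer.apply_gradients(zip(grads, mlp.trainable_weights))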
Loss function types
Probabilistic losses
For classification problems over probabilities, cross-entropy is the usual choice of loss function.
BinaryCrossentropy(BCE)
BinaryCrossentropy is the cross-entropy for binary (0/1) labels. Its formula is:
loss = -mean(y_true * log(y_pred) + (1 - y_true) * log(1 - y_pred))
Cross-entropy measures the distance between two probability distributions: the smaller it is, the closer the two distributions are.
For more background on BinaryCrossentropy, see:
https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181a
Example:
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
y_true = [0, 1, 0, 0]
y_pred = [0.1, 1.29, -1, -6.2]  # raw logits
bce(y_true, y_pred).numpy()
0.32571107
Here from_logits=True means y_pred holds raw logits in (-inf, inf), whose sign indicates whether the prediction leans toward 0 or 1; use from_logits=False when y_pred holds probabilities in [0, 1].
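As a quick sanity check, passing logits with from_logits=True should give (up to internal clipping) the same value as applying sigmoid first and using from_logits=False:
logits = tf.constant([0.1, 1.29, -1., -6.2])
labels = tf.constant([0., 1., 0., 0.])
bce_logits = tf.keras.losses.BinaryCrossentropy(from_logits=True)
bce_probs = tf.keras.losses.BinaryCrossentropy(from_logits=False)
print(bce_logits(labels, logits).numpy())            # on raw logits
print(bce_probs(labels, tf.sigmoid(logits)).numpy())  # on probabilities, nearly identical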
CategoricalCrossentropy(CCE)
Both CategoricalCrossentropy and SparseCategoricalCrossentropy are used for multi-class classification; CategoricalCrossentropy is used when the labels are one-hot encoded.
cce = tf.keras.losses.CategoricalCrossentropy()
y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
print(cce(y_true, y_pred).numpy())
y_pred = [[0.05, 0.95, 0], [0.1, 0.1, 0.8]]
print(cce(y_true, y_pred).numpy())
1.1769392
0.13721842
SparseCategoricalCrossentropy(SCCE)
SparseCategoricalCrossentropy is used for multi-class classification with integer labels.
y_true = [1, 2]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
# Using 'auto'/'sum_over_batch_size' reduction type.
scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(scce(y_true, y_pred).numpy())
y_pred = [[0.05, 0.95, 0], [0.1, 0.1, 0.8]]
print(scce(y_true, y_pred).numpy())
1.1769392
0.13721849
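The two losses compute the same quantity; only the label format differs. A small check showing that integer labels with SCCE match one-hot labels with CCE:
y_int = [1, 2]
y_onehot = tf.keras.utils.to_categorical(y_int, num_classes=3)
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
scce = tf.keras.losses.SparseCategoricalCrossentropy()
cce = tf.keras.losses.CategoricalCrossentropy()
print(scce(y_int, y_pred).numpy())    # integer labels
print(cce(y_onehot, y_pred).numpy())  # one-hot labels, same value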
Poisson
Poisson loss: loss = y_pred - y_true * log(y_pred).
It is the negative log-likelihood of the true labels under a Poisson distribution.
y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.
p = tf.keras.losses.Poisson()
print(p(y_true, y_pred).numpy())
y_pred = [[0., 1.], [0., 0.]]
print(p(y_true, y_pred).numpy())
0.49999997
0.24999997
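The value can be reproduced directly from the formula (Keras adds a small epsilon inside the log for numerical stability); a quick NumPy check:
import numpy as np
eps = 1e-7  # small constant standing in for Keras' internal epsilon
y_true = np.array([[0., 1.], [0., 0.]])
y_pred = np.array([[1., 1.], [0., 0.]])
print(np.mean(y_pred - y_true * np.log(y_pred + eps)))  # ~0.5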
KLDivergence
Kullback-Leibler divergence loss: loss = y_true * log(y_true / y_pred)
y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.
p = tf.keras.losses.KLDivergence()
print(p(y_true, y_pred).numpy())
y_pred = [[0., 1.], [0., 0.]]
print(p(y_true, y_pred).numpy())
-8.059048e-07
0.0
Regression losses
The following are loss functions commonly used in regression problems.
MeanSquaredError(MSE)
The most common loss function, the mean squared error: loss = square(y_true - y_pred)
y_true = [0.8, 0.1, 0.3]
y_pred = [0.81, 0.101, 0.45]
mse = tf.keras.losses.MeanSquaredError()
print(mse(y_true, y_pred).numpy())
0.0075336643
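The same value can be reproduced with NumPy, which makes the formula concrete:
import numpy as np
y_true = np.array([0.8, 0.1, 0.3])
y_pred = np.array([0.81, 0.101, 0.45])
print(np.mean(np.square(y_true - y_pred)))  # ~0.0075337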
MeanAbsoluteError(MAE)
loss = abs(y_true - y_pred)
y_true = [0.8, 0.1, 0.3]
y_pred = [0.81, 0.101, 0.45]
mae = tf.keras.losses.MeanAbsoluteError()
print(mae(y_true, y_pred).numpy())
0.053666655
MeanAbsolutePercentageError(MAPE)
loss = 100 * abs((y_true - y_pred) / y_true)
y_true = [0.8, 0.1, 0.3]
y_pred = [0.81, 0.101, 0.45]
mape = tf.keras.losses.MeanAbsolutePercentageError()
print(mape(y_true, y_pred).numpy())
17.416664
MeanSquaredLogarithmicError(MSLE)
loss = square(log(y_true + 1.) - log(y_pred + 1.))
y_true = [0.8, 0.1, 0.3]
y_pred = [0.81, 0.101, 0.45]
msle = tf.keras.losses.MeanSquaredLogarithmicError()
print(msle(y_true, y_pred).numpy())
0.003985339
CosineSimilarity
loss = -sum(l2_norm(y_true) * l2_norm(y_pred))
Note that the result is a number between -1 and 1. When it is a negative number between -1 and 0, 0 indicates orthogonality and values closer to -1 indicate greater similarity; values closer to 1 indicate greater dissimilarity.
y_true = [0.8, 0.1, 0.3]
y_pred = [0.81, 0.101, 0.45]
cs = tf.keras.losses.CosineSimilarity()
print(cs(y_true, y_pred).numpy())
-0.9891268
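The value follows directly from the formula: L2-normalize both vectors and take the negative dot product. A quick NumPy check:
import numpy as np
y_true = np.array([0.8, 0.1, 0.3])
y_pred = np.array([0.81, 0.101, 0.45])
a = y_true / np.linalg.norm(y_true)
b = y_pred / np.linalg.norm(y_pred)
print(-np.sum(a * b))  # ~ -0.9891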
Huber
Huber loss is a parameterized loss function for regression; its advantage over the squared-error loss (MSE) is better robustness to outliers. With the residual a = y_pred - y_true and threshold δ (1.0 by default in Keras):
loss = 0.5 * a^2                    if |a| <= δ
loss = 0.5 * δ^2 + δ * (|a| - δ)    if |a| > δ
y_true = [0.8, 0.1, 0.3]
y_pred = [0.81, 0.101, 0.45]
huber = tf.keras.losses.Huber()
print(huber(y_true, y_pred).numpy())
0.0037668322
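A small NumPy sketch of the piecewise formula with the default δ = 1.0:
import numpy as np
delta = 1.0
y_true = np.array([0.8, 0.1, 0.3])
y_pred = np.array([0.81, 0.101, 0.45])
a = y_pred - y_true
loss = np.where(np.abs(a) <= delta,
                0.5 * a ** 2,
                0.5 * delta ** 2 + delta * (np.abs(a) - delta))
print(np.mean(loss))  # ~0.0037668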
LogCosh
logcosh = log((exp(x) + exp(-x)) / 2), where x = y_pred - y_true
y_true = [0.8, 0.1, 0.3]
y_pred = [0.81, 0.101, 0.45]
lc = tf.keras.losses.LogCosh()
print(lc(y_true, y_pred).numpy())
0.0037528474
Hinge losses
Hinge
The Hinge loss is defined as:
loss = maximum(1 - y_true * y_pred, 0)
y_true values are expected to be -1 or 1; binary 0/1 labels are converted to -1/1 internally.
y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
h = tf.keras.losses.Hinge()
print(h(y_true, y_pred).numpy())
1.3
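A quick NumPy check that reproduces the value once the 0/1 labels are converted to -1/1:
import numpy as np
y_true = np.array([[0., 1.], [0., 0.]])
y_pred = np.array([[0.6, 0.4], [0.4, 0.6]])
y_signed = 2. * y_true - 1.  # convert 0/1 labels to -1/1
print(np.mean(np.maximum(1. - y_signed * y_pred, 0.)))  # ~1.3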
SquaredHinge
The element-wise square of the Hinge loss:
loss = square(maximum(1 - y_true * y_pred, 0))
y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
h = tf.keras.losses.SquaredHinge()
h(y_true, y_pred).numpy()
1.86
CategoricalHinge
neg = max((1 - y_true) * y_pred)
pos = sum(y_true * y_pred)
loss = max(neg - pos + 1, 0)
import numpy as np

y_true = np.random.randint(0, 3, size=(2,))
y_true = tf.keras.utils.to_categorical(y_true, num_classes=3)
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.categorical_hinge(y_true, y_pred)
print(loss.numpy())
assert loss.shape == (2,)
pos = np.sum(y_true * y_pred, axis=-1)
neg = np.amax((1. - y_true) * y_pred, axis=-1)
assert np.array_equal(loss.numpy(), np.maximum(0., neg - pos + 1.))