Loss Functions and Their Gradients

Typical Loss
MSE
- Derivative
- MSE Gradient
Softmax
- Derivative

Typical Loss

Mean Squared Error
Cross Entropy Loss
- binary
- multi-class
- +softmax

MSE

$loss = \sum [y - (xw + b)]^{2}$
$L_{2\text{-}norm} = \| y - (xw + b) \|_{2}$
$loss = norm(y - (xw + b))^{2}$
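
The three formulas above describe the same quantity: the sum of squared residuals equals the squared L2 norm of the residual vector. A minimal sketch in plain Python (no TensorFlow, so it runs without dependencies); the data, the scalar model, and the helper names `predict`, `sum_squared_error`, and `l2_norm` are made up for illustration:

```python
import math

def predict(x, w, b):
    """Scalar linear model f(x) = x*w + b, applied to each sample."""
    return [xi * w + b for xi in x]

def sum_squared_error(y, y_hat):
    """loss = sum over samples of (y - f(x))^2."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))

def l2_norm(v):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(vi ** 2 for vi in v))

# Illustrative data generated by y = 2x + 1; w and b are deliberately off.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
w, b = 1.5, 0.5

y_hat = predict(x, w, b)
residual = [yi - pi for yi, pi in zip(y, y_hat)]

loss_sum = sum_squared_error(y, y_hat)   # sum of squares
loss_norm = l2_norm(residual) ** 2       # squared L2 norm, same value
mse = loss_sum / len(y)                  # the "mean" in mean squared error

print(loss_sum, loss_norm, mse)
```

Note that `tf.losses.MSE` averages the squared residuals over the last axis rather than summing them, so its value differs from the sum form by a constant factor.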

Derivative

$loss = \sum [y - f_{\theta}(x)]^{2}$
$\frac{\nabla loss}{\nabla \theta} = -2 \sum [y - f_{\theta}(x)] * \frac{\nabla f_{\theta}(x)}{\nabla \theta}$
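
A quick way to sanity-check a chain-rule gradient like this is to compare it against central finite differences. The sketch below assumes a scalar linear model $f_{\theta}(x) = xw + b$ with made-up data. Because the residual is $y - f_{\theta}(x)$, differentiating it with respect to the parameters brings out a factor of $-1$, which the numerical check confirms. The function names are illustrative only.

```python
def loss(w, b, x, y):
    """Sum-of-squares loss for the scalar linear model f(x) = x*w + b."""
    return sum((yi - (xi * w + b)) ** 2 for xi, yi in zip(x, y))

def analytic_grad(w, b, x, y):
    """Chain rule: d(loss)/d(theta) = -2 * sum (y - f) * d(f)/d(theta)."""
    residuals = [yi - (xi * w + b) for xi, yi in zip(x, y)]
    dw = -2.0 * sum(r * xi for r, xi in zip(residuals, x))  # df/dw = x
    db = -2.0 * sum(residuals)                              # df/db = 1
    return dw, db

def numeric_grad(w, b, x, y, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
    db = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
    return dw, db

# Illustrative data generated by y = 2x + 1.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
w, b = 1.5, 0.5

dw_a, db_a = analytic_grad(w, b, x, y)
dw_n, db_n = numeric_grad(w, b, x, y)
print(dw_a, dw_n)
print(db_a, db_n)
```

Both gradients come out negative here, so a gradient-descent step increases `w` and `b` toward the values that generated the data.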

MSE Gradient

import tensorflow as tf

x = tf.random.normal([2, 4])   # 2 samples, 4 features
w = tf.random.normal([4, 3])   # 4 features -> 3 classes
b = tf.zeros([3])
y = tf.constant([2, 0])        # integer class labels

with tf.GradientTape() as tape:
    # w and b are plain tensors, not tf.Variable, so they must be watched explicitly
    tape.watch([w, b])
    prob = tf.nn.softmax(x @ w + b, axis=1)
    loss = tf.reduce_mean(tf.losses.MSE(tf.one_hot(y, depth=3), prob))

grads = tape.gradient(loss, [w, b])

grads[0]

<tf.Tensor: id=92, shape=(4, 3), dtype=float32, numpy=
array([[ 0.01156707, -0.00927749, -0.00228957],
       [ 0.03556816, -0.03894382,  0.00337564],
       [-0.02537526,  0.01924876,  0.00612648],
       [-0.0074787 ,  0.00161515,  0.00586352]], dtype=float32)>

grads[1]

<tf.Tensor: id=90, shape=(3,), dtype=float32, numpy=array([-0.01552947,  0.01993286, -0.00440337], dtype=float32)>

Softmax

A soft version of max.
It widens the gaps: large values become relatively larger, while small values shrink and crowd closer together.

[Figure: softmax]
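
To make the "soft version of max" idea concrete, here is a minimal sketch of softmax in plain Python. The logits are made up, and subtracting the maximum before exponentiating is a common stability trick added here rather than something the post itself shows. With logits 2.0 and 1.0 the input ratio is 2, but the output ratio is $e \approx 2.718$, so the larger entry gains relative weight.

```python
import math

def softmax(logits):
    """Numerically stable softmax: shift by the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# Adding a constant to every logit leaves the result unchanged.
shifted = softmax([a + 1000.0 for a in logits])

print(probs)
print(probs[0] / probs[1])   # ratio of the two largest outputs
print(shifted)
```

The outputs are positive and sum to 1, and the ordering of the inputs is preserved, which is what lets them be read as class probabilities.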

Derivative

$p_{i} = \frac{e^{a_{i}}}{\sum_{k = 1}^{N} e^{a_{k}}}$

$i = j$

$\frac{\partial p_{i}}{\partial a_{j}} = \frac{\partial \frac{e^{a_{i}}}{\sum_{k = 1}^{N} e^{a_{k}}}}{\partial a_{j}} = p_{i} (1 - p_{j})$

$i \neq j$

$\frac{\partial p_{i}}{\partial a_{j}} = \frac{\partial \frac{e^{a_{i}}}{\sum_{k = 1}^{N} e^{a_{k}}}}{\partial a_{j}} = - p_{j} * p_{i}$
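
Both cases can be written as a single expression, $\frac{\partial p_{i}}{\partial a_{j}} = p_{i} (\delta_{ij} - p_{j})$, where $\delta_{ij}$ is 1 when $i = j$ and 0 otherwise. The sketch below builds that Jacobian in plain Python and compares it with central finite differences. The logits and the helper names are illustrative assumptions.

```python
import math

def softmax(a):
    """Numerically stable softmax."""
    m = max(a)
    exps = [math.exp(ai - m) for ai in a]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(a):
    """J[i][j] = dp_i/da_j = p_i * (delta_ij - p_j)."""
    p = softmax(a)
    n = len(p)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
            for i in range(n)]

def numeric_jacobian(a, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    n = len(a)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, down = list(a), list(a)
        up[j] += eps
        down[j] -= eps
        p_up, p_down = softmax(up), softmax(down)
        for i in range(n):
            jac[i][j] = (p_up[i] - p_down[i]) / (2 * eps)
    return jac

a = [2.0, 1.0, 0.1]
J = softmax_jacobian(a)
J_num = numeric_jacobian(a)
max_err = max(abs(J[i][j] - J_num[i][j]) for i in range(3) for j in range(3))
print(J)
print(max_err)
```

The diagonal entries are positive and the off-diagonal entries negative: raising one logit raises its own probability and lowers all the others, and each column sums to zero because the probabilities must still total 1.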

x = tf.random.normal([2, 4])   # 2 samples, 4 features
w = tf.random.normal([4, 3])   # 4 features -> 3 classes
b = tf.zeros([3])
y = tf.constant([2, 0])        # integer class labels

with tf.GradientTape() as tape:
    tape.watch([w, b])
    logits = x @ w + b
    # from_logits=True: softmax is applied inside the loss, so pass raw logits
    loss = tf.reduce_mean(
        tf.losses.categorical_crossentropy(tf.one_hot(y, depth=3),
                                           logits,
                                           from_logits=True))

grads = tape.gradient(loss, [w, b])

grads[0]

<tf.Tensor: id=226, shape=(4, 3), dtype=float32, numpy=
array([[-0.38076094,  0.33844548,  0.04231545],
       [-1.0262716 , -0.6730384 ,  1.69931   ],
       [ 0.20613424, -0.50421923,  0.298085  ],
       [ 0.5800004 , -0.22329211, -0.35670823]], dtype=float32)>

grads[1]

<tf.Tensor: id=224, shape=(3,), dtype=float32, numpy=array([-0.3719653 ,  0.53269935, -0.16073406], dtype=float32)>
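
When softmax is combined with cross entropy, the gradient with respect to the logits reduces to $p - y$, where $y$ is the one-hot target. The following plain-Python sketch checks this for a single sample against finite differences; the logits, the target index, and the function names are illustrative, and it is not a reimplementation of the TensorFlow call above.

```python
import math

def softmax(a):
    """Numerically stable softmax."""
    m = max(a)
    exps = [math.exp(ai - m) for ai in a]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits, target):
    """-log softmax(logits)[target], computed via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(a - m) for a in logits))
    return log_sum_exp - logits[target]

def grad_wrt_logits(logits, target):
    """Analytic gradient: softmax(logits) minus the one-hot target."""
    p = softmax(logits)
    return [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]

logits = [2.0, 1.0, 0.1]
target = 2

g = grad_wrt_logits(logits, target)

# Independent check with central finite differences.
eps = 1e-6
g_num = []
for j in range(len(logits)):
    up, down = list(logits), list(logits)
    up[j] += eps
    down[j] -= eps
    g_num.append((cross_entropy_from_logits(up, target)
                  - cross_entropy_from_logits(down, target)) / (2 * eps))

print(g)
print(g_num)
```

Since both $p$ and $y$ sum to 1, the entries of $p - y$ sum to zero. The bias gradient `grads[1]` printed above shows the same property: its three entries add up to approximately 0.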

posted @ 2020-12-11 23:02 ABDM
