Derivative of Softmax Loss Function

A softmax classifier maps a vector of logits \(o\) to class probabilities:

\[p_j = \frac{\exp(o_j)}{\sum_{k}\exp(o_k)} \]

Together with a target vector \(y\), it is used in a cross-entropy loss of the form

\[L = - \sum_{j} y_j \log p_j \]
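As a quick sanity check on these two definitions, here is a minimal NumPy sketch (the names `softmax` and `cross_entropy_loss` are just illustrative; subtracting `max(o)` is a standard numerical trick and plays no role in the derivation):

```python
import numpy as np

def softmax(o):
    """p_j = exp(o_j) / sum_k exp(o_k); shifting by max(o) avoids overflow."""
    e = np.exp(o - np.max(o))
    return e / e.sum()

def cross_entropy_loss(o, y):
    """L = -sum_j y_j * log(p_j)."""
    return -np.sum(y * np.log(softmax(o)))

o = np.array([2.0, 1.0, 0.1])    # logits
y = np.array([0.0, 1.0, 0.0])    # one-hot target
print(softmax(o))                # approx [0.659, 0.242, 0.099]
print(cross_entropy_loss(o, y))  # -log(0.242), approx 1.42
```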

We need the derivative of \(L\) with respect to the logits \(o\). First, the partial derivative of \(p_j\) with respect to \(o_i\):

\[\frac{\partial p_j}{\partial o_i} =
\begin{cases}
p_i (1 - p_i), & i = j \\
- p_i p_j, & i \ne j
\end{cases}\]
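Both cases can be checked numerically. Below is a small self-contained sketch (function names are illustrative) that builds the Jacobian \(\partial p_j / \partial o_i\) as \(\operatorname{diag}(p) - p p^\top\) and compares it against central finite differences:

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))
    return e / e.sum()

def softmax_jacobian(o):
    """J[i, j] = dp_j/do_i: p_i(1 - p_i) on the diagonal, -p_i p_j off it."""
    p = softmax(o)
    return np.diag(p) - np.outer(p, p)

o = np.array([0.5, -1.2, 2.0])
eps = 1e-6
J_num = np.zeros((3, 3))
for i in range(3):
    d = np.zeros(3)
    d[i] = eps
    J_num[i] = (softmax(o + d) - softmax(o - d)) / (2 * eps)  # row i: dp_j/do_i
print(np.allclose(J_num, softmax_jacobian(o), atol=1e-6))     # True
```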

Hence the derivative of the loss with respect to \(o\) is:

\[\begin{align}
\frac{\partial L}{\partial o_i}
& = - \sum_k y_k \frac{\partial \log p_k}{\partial o_i} \\
& = - \sum_k y_k \frac{1}{p_k} \frac{\partial p_k}{\partial o_i} \\
& = - y_i (1 - p_i) - \sum_{k \ne i} y_k \frac{1}{p_k} (- p_k p_i) \\
& = - y_i + y_i p_i + \sum_{k \ne i} y_k p_i \\
& = p_i \Big( \sum_k y_k \Big) - y_i
\end{align}\]

Since \(y\) is a one-hot vector (exactly one element equal to 1 and the rest 0), \(\sum_k y_k = 1\); in other words, this is a classification problem. The gradient therefore simplifies to

\[\frac{\partial L}{\partial o_i} = p_i - y_i \]
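A quick way to convince yourself of this result is to compare \(p - y\) with a finite-difference gradient of the loss. A minimal sketch, assuming a one-hot \(y\) (names are illustrative):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))
    return e / e.sum()

def loss(o, y):
    return -np.sum(y * np.log(softmax(o)))

o = np.array([0.3, -0.7, 1.5, 0.0])  # logits
y = np.array([0.0, 0.0, 1.0, 0.0])   # one-hot label
eps = 1e-6
grad_num = np.array([
    (loss(o + eps * np.eye(4)[i], y) - loss(o - eps * np.eye(4)[i], y)) / (2 * eps)
    for i in range(4)
])
print(np.allclose(grad_num, softmax(o) - y, atol=1e-6))  # True: dL/do = p - y
```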

Reference

Derivative of Softmax loss function
