Computing the derivative of softmax
This article is based on (https://zhuanlan.zhihu.com/p/105758059)
See the link above for a more detailed treatment.
- The input to softmax is
\(z = [z_1,z_2,...,z_n]\)
- After softmax,
\(a_i = \frac{e^{z_i}}{\sum_{k=1}^{n}e^{z_k}}\)
which gives the vector \(a = [\frac{e^{z_1}}{\sum_{k=1}^{n}e^{z_k}},\frac{e^{z_2}}{\sum_{k=1}^{n}e^{z_k}},...,\frac{e^{z_n}}{\sum_{k=1}^{n}e^{z_k}}]\)
- The target vector is
\(y = [0,0,...,1,...,0]\), where \(y_j=1\) and every other entry is 0
- The loss function is the cross-entropy loss
\(L = -\sum_{i=1}^{n}y_i\cdot\ln a_i\). Since every \(y_i\) other than \(y_j\) is 0, this simplifies to \(L = -y_j\cdot\ln a_j = -\ln a_j\)
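The setup above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the example values of `z`, the 0-based class index `j`, and the function names are assumptions made here, and subtracting `max(z)` before exponentiating is an added numerical-stability step that does not change the result.

```python
import math

def softmax(z):
    """Return a_i = exp(z_i) / sum_k exp(z_k) for a list of floats z."""
    # Shifting by max(z) cancels in the ratio but keeps exp() from overflowing.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(a, y):
    """Return L = -sum_i y_i * ln(a_i)."""
    return -sum(yi * math.log(ai) for ai, yi in zip(a, y))

z = [1.0, 2.0, 0.5]   # example input (made up for illustration)
j = 1                 # index of the true class, 0-based in code
y = [1.0 if i == j else 0.0 for i in range(len(z))]  # one-hot target

a = softmax(z)
loss = cross_entropy(a, y)
print("a =", a)
print("L =", loss, " -ln(a_j) =", -math.log(a[j]))
```

The two printed loss values should agree, which is the simplification \(L = -\ln a_j\) used in the rest of the derivation.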
The goal is to differentiate the scalar \(L\) with respect to the vector \(z\). By the chain rule, \(\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a}\cdot\frac{\partial a}{\partial z}\)
1 Compute \(\frac{\partial L}{\partial a}\)
Since \(L = -\ln a_j\), the loss depends only on \(a_j\), so
\(\frac{\partial L }{\partial a} = [0,0,...,-\frac{1}{a_j},...,0]\)
2 Compute \(\frac{\partial a}{\partial z}\)
Both \(a\) and \(z\) are vectors, so the derivative is the Jacobian matrix \(\frac{\partial a}{\partial z} =
\left[
\begin{matrix}
\frac{\partial a_1}{\partial z_1} & \frac{\partial a_1}{\partial z_2} & \cdots & \frac{\partial a_1}{\partial z_n}\\
\frac{\partial a_2}{\partial z_1} & \frac{\partial a_2}{\partial z_2} & \cdots & \frac{\partial a_2}{\partial z_n}\\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial a_n}{\partial z_1} & \frac{\partial a_n}{\partial z_2} & \cdots & \frac{\partial a_n}{\partial z_n}\\
\end{matrix}
\right]
\)
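To make the layout of this matrix concrete, here is a rough sketch that approximates it with central finite differences, so that `J[i][k]` holds \(\frac{\partial a_i}{\partial z_k}\). The helper name `numerical_jacobian`, the step size `eps`, and the example input are assumptions chosen for illustration; this is a numerical check, not part of the derivation itself.

```python
import math

def softmax(z):
    """Return a_i = exp(z_i) / sum_k exp(z_k) for a list of floats z."""
    m = max(z)  # shift for numerical stability; the result is unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def numerical_jacobian(f, z, eps=1e-6):
    """Approximate J[i][k] = d f(z)[i] / d z[k] by central differences."""
    n = len(z)
    rows = len(f(z))
    J = [[0.0] * n for _ in range(rows)]
    for k in range(n):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[k] += eps
        z_minus[k] -= eps
        f_plus, f_minus = f(z_plus), f(z_minus)
        for i in range(rows):
            J[i][k] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

z = [1.0, 2.0, 0.5]  # example input (made up for illustration)
J = numerical_jacobian(softmax, z)
for row in J:
    # Row i lists the derivatives of a_i with respect to z_1..z_n.
    print(["%.4f" % v for v in row])
```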
Since only the j-th entry of \(\frac{\partial L}{\partial a}\) is nonzero, we only need the j-th row of \(\frac{\partial a}{\partial z}\), namely \(\frac{\partial a_j}{\partial z}\)
\(\frac{\partial L}{\partial z} = -\frac{1}{a_j}\cdot\frac{\partial a_j}{\partial z}\), where \(a_j = \frac{e^{z_j}}{\sum_{k=1}^{n}e^{z_k}}\)
- When \(i \neq j\)
\(\frac{\partial a_j}{\partial z_i} = \frac{0-e^{z_j}\cdot e^{z_i}}{(\sum_{k=1}^{n}e^{z_k})^2} = -a_j\cdot a_i\)
\(\frac{\partial L}{\partial z_i} = -\frac{1}{a_j}\cdot\frac{\partial a_j}{\partial z_i} = -\frac{1}{a_j}\cdot(-a_j\cdot a_i) = a_i\)
- When \(i = j\)
\(\frac{\partial a_j}{\partial z_j} = \frac{e^{z_j}\cdot\sum_{k=1}^{n}e^{z_k}-e^{z_j}\cdot e^{z_j}}{(\sum_{k=1}^{n}e^{z_k})^2} = a_j- a_j^2\)
\(\frac{\partial L}{\partial z_j} = (a_j-a_j^2)\cdot(-\frac{1}{a_j}) = a_j-1\)
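The two cases for \(\frac{\partial a_j}{\partial z_i}\) can be checked numerically. The sketch below compares them with a central finite-difference estimate; the function name `softmax_row_derivative`, the example input, and the step size `eps` are assumptions made for illustration, and indices are 0-based in the code.

```python
import math

def softmax(z):
    """Return a_i = exp(z_i) / sum_k exp(z_k) for a list of floats z."""
    m = max(z)  # shift for numerical stability; the result is unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_row_derivative(a, j):
    """Return [d a_j / d z_i for every i] using the two cases:
    a_j - a_j^2 when i == j, and -a_j * a_i otherwise."""
    return [a[j] - a[j] ** 2 if i == j else -a[j] * a[i]
            for i in range(len(a))]

z = [1.0, 2.0, 0.5]  # example input (made up for illustration)
j = 1                # row of the Jacobian to check, 0-based
a = softmax(z)
analytic = softmax_row_derivative(a, j)

eps = 1e-6
numeric = []
for i in range(len(z)):
    z_plus = list(z)
    z_minus = list(z)
    z_plus[i] += eps
    z_minus[i] -= eps
    # Central difference of a_j with respect to z_i.
    numeric.append((softmax(z_plus)[j] - softmax(z_minus)[j]) / (2 * eps))

print("analytic:", analytic)
print("numeric: ", numeric)
```

If the derivation is right, the two printed lists should agree to within the finite-difference error.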
Therefore \(\frac{\partial L}{\partial z} = [a_1,a_2,...,a_j-1,...,a_n] = [a_1,a_2,...,a_j,...,a_n] - [0,0,...,1,...,0] = a - y\)
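As a final sanity check, the result \(a - y\) can be compared against a finite-difference gradient of the loss. This is a minimal sketch under assumed example values; the names `loss` and `grad_loss` and the step size `eps` are illustrative choices, and the class index `j` is 0-based in the code.

```python
import math

def softmax(z):
    """Return a_i = exp(z_i) / sum_k exp(z_k) for a list of floats z."""
    m = max(z)  # shift for numerical stability; the result is unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss(z, j):
    """Cross-entropy loss L = -ln(a_j) for true class j."""
    return -math.log(softmax(z)[j])

def grad_loss(z, j):
    """Analytic gradient dL/dz = a - y, with y one-hot at index j."""
    a = softmax(z)
    return [a[i] - (1.0 if i == j else 0.0) for i in range(len(z))]

z = [1.0, 2.0, 0.5]  # example input (made up for illustration)
j = 1                # index of the true class, 0-based
analytic = grad_loss(z, j)

eps = 1e-6
numeric = []
for i in range(len(z)):
    z_plus = list(z)
    z_minus = list(z)
    z_plus[i] += eps
    z_minus[i] -= eps
    # Central difference of L with respect to z_i.
    numeric.append((loss(z_plus, j) - loss(z_minus, j)) / (2 * eps))

print("a - y:  ", analytic)
print("numeric:", numeric)
```

Only the entry at the true class is negative (it equals \(a_j - 1\)); every other entry is just \(a_i\).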