Computing the derivative of softmax

This post is based on (https://zhuanlan.zhihu.com/p/105758059).
See that link for a more detailed treatment.

  • The input to softmax is
    \(z = [z_1,z_2,...,z_n]\)
  • After softmax,
    \(a_i = \frac{e^{z_i}}{\sum_{k=1}^{n}e^{z_k}}\)
    which gives the vector \(a = [\frac{e^{z_1}}{\sum_{k=1}^{n}e^{z_k}},\frac{e^{z_2}}{\sum_{k=1}^{n}e^{z_k}},...,\frac{e^{z_n}}{\sum_{k=1}^{n}e^{z_k}}]\)
  • The target is the one-hot vector
    \(y = [0,0,...,1,...,0]\), where \(y_j=1\) and all other entries are 0
  • The loss function is the cross-entropy loss
    \(L = -\sum_{i=1}^{n}y_i \ln a_i\). Since \(y_i = 0\) for every \(i \neq j\), this simplifies to \(L = -y_j \ln a_j = -\ln a_j\)
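As a minimal sketch of the forward pass defined above, the following Python computes \(a\) and \(L\) from plain lists of floats, given \(z\) and the target index \(j\). The function names `softmax` and `cross_entropy` are hypothetical. Subtracting \(\max_k z_k\) before exponentiating is an added numerical-stability choice not in the original; it does not change \(a\), because the common factor \(e^{-\max_k z_k}\) cancels between numerator and denominator.

```python
import math


def softmax(z):
    """Return a_i = exp(z_i) / sum_k exp(z_k) for a list of floats z."""
    m = max(z)  # shift by the max so exp() cannot overflow; the shift cancels in the ratio
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(a, j):
    """L = -sum_i y_i * ln(a_i) with one-hot y (y_j = 1), which reduces to -ln(a_j)."""
    return -math.log(a[j])


z = [1.0, 2.0, 3.0]
j = 2  # index of the true class, so y = [0, 0, 1]
a = softmax(z)
print(a, sum(a))
print(cross_entropy(a, j))
```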

The goal is to differentiate the scalar \(L\) with respect to the vector \(z\). By the chain rule, \(\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a}\cdot\frac{\partial a}{\partial z}\)

1 Compute \(\frac{\partial L}{\partial a}\)

From \(L = -\ln a_j\), the loss depends only on \(a_j\), so
\(\frac{\partial L}{\partial a} = [0,0,...,-\frac{1}{a_j},...,0]\)
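This vector can be sanity-checked numerically by treating the entries of \(a\) as free variables and comparing against central finite differences of \(L(a) = -\ln a_j\). The helper names and the step size `h = 1e-6` below are illustrative assumptions, not part of the original derivation.

```python
import math


def loss_from_a(a, j):
    # L depends on a only through a_j
    return -math.log(a[j])


def grad_L_wrt_a(a, j):
    # Analytic result: -1/a_j at position j, 0 everywhere else
    g = [0.0] * len(a)
    g[j] = -1.0 / a[j]
    return g


def numeric_grad(f, x, h=1e-6):
    # Central differences: (f(x + h*e_i) - f(x - h*e_i)) / (2h) for each coordinate i
    g = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        xm = list(x)
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


a = [0.2, 0.3, 0.5]
j = 1
print(grad_L_wrt_a(a, j))
print(numeric_grad(lambda v: loss_from_a(v, j), a))
```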

2 Compute \(\frac{\partial a}{\partial z}\)

Since \(a\) and \(z\) are both vectors, \(\frac{\partial a}{\partial z}\) is the \(n \times n\) Jacobian matrix \(\frac{\partial a}{\partial z} = \left[ \begin{matrix} \frac{\partial a_1}{\partial z_1} & \frac{\partial a_1}{\partial z_2} & \cdots & \frac{\partial a_1}{\partial z_n}\\ \frac{\partial a_2}{\partial z_1} & \frac{\partial a_2}{\partial z_2} & \cdots & \frac{\partial a_2}{\partial z_n}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial a_n}{\partial z_1} & \frac{\partial a_n}{\partial z_2} & \cdots & \frac{\partial a_n}{\partial z_n}\\ \end{matrix} \right] \)
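The Jacobian can also be approximated numerically, which is handy for checking analytic entries. The sketch below (function names and the step size `h = 1e-6` are assumptions) perturbs one \(z_k\) at a time, so row \(i\) holds \(\partial a_i/\partial z_k\) across columns \(k\). Because \(\sum_i a_i = 1\) for every \(z\), each column of this matrix sums to 0.

```python
import math


def softmax(z):
    m = max(z)  # numerical-stability shift; cancels in the ratio
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def numeric_jacobian(f, z, h=1e-6):
    """J[i][k] approximates d f_i / d z_k using central differences."""
    n = len(z)
    out_dim = len(f(z))
    J = [[0.0] * n for _ in range(out_dim)]
    for k in range(n):
        zp = list(z)
        zp[k] += h
        zm = list(z)
        zm[k] -= h
        fp, fm = f(zp), f(zm)
        for i in range(out_dim):
            J[i][k] = (fp[i] - fm[i]) / (2 * h)
    return J


z = [1.0, 2.0, 3.0]
J = numeric_jacobian(softmax, z)
for row in J:
    print([round(v, 4) for v in row])
# Each column sums to (approximately) 0 because the outputs always sum to 1
print([sum(J[i][k] for i in range(len(z))) for k in range(len(z))])
```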
Since \(\frac{\partial L}{\partial a}\) is nonzero only in its \(j\)-th entry, we only need the \(j\)-th row of \(\frac{\partial a}{\partial z}\), i.e. \(\frac{\partial a_j}{\partial z}\):
\(\frac{\partial L}{\partial z} = -\frac{1}{a_j}\cdot\frac{\partial a_j}{\partial z}\), where \(a_j = \frac{e^{z_j}}{\sum_{k=1}^{n}e^{z_k}}\)
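Written out component by component (an expansion added here for clarity), the product of the row vector \(\frac{\partial L}{\partial a}\) with the Jacobian keeps only the \(k = j\) term, because every other entry of \(\frac{\partial L}{\partial a}\) is 0:

```latex
\frac{\partial L}{\partial z_i}
  = \sum_{k=1}^{n} \frac{\partial L}{\partial a_k}\,\frac{\partial a_k}{\partial z_i}
  = -\frac{1}{a_j}\,\frac{\partial a_j}{\partial z_i}
    + \sum_{k \neq j} 0 \cdot \frac{\partial a_k}{\partial z_i}
  = -\frac{1}{a_j}\,\frac{\partial a_j}{\partial z_i},
  \qquad i = 1, \dots, n
```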

  • \(i \not= j\)
    \(\frac{\partial a_j}{\partial z_i} = \frac{0-e^{z_j}\cdot e^{z_i}}{(\sum_{k=1}^{n}e^{z_k})^2} = -a_j a_i\)
    \(\frac{\partial L}{\partial z_i} = -\frac{1}{a_j}\cdot\frac{\partial a_j}{\partial z_i} = -\frac{1}{a_j}\cdot(-a_j a_i) = a_i\)
  • \(i = j\)
    \(\frac{\partial a_j}{\partial z_j} = \frac{e^{z_j}\cdot\sum_{k=1}^{n}e^{z_k}-e^{z_j}\cdot e^{z_j}}{(\sum_{k=1}^{n}e^{z_k})^2} = a_j - a_j^2\)
    \(\frac{\partial L}{\partial z_j} = (a_j-a_j^2)\cdot(-\frac{1}{a_j}) = a_j-1\)
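The two cases combine into \(\frac{\partial a_j}{\partial z_i} = a_j(\delta_{ij} - a_i)\), where \(\delta_{ij}\) (Kronecker delta, notation added here) is 1 if \(i = j\) and 0 otherwise. A minimal Python check builds the full analytic Jacobian from this formula and compares it with central finite differences. The function names and `h = 1e-6` are assumptions made for illustration.

```python
import math


def softmax(z):
    m = max(z)  # numerical-stability shift; cancels in the ratio
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(z):
    """Analytic J[j][i] = d a_j / d z_i = a_j * (delta_ij - a_i)."""
    a = softmax(z)
    n = len(a)
    return [[a[j] * ((1.0 if i == j else 0.0) - a[i]) for i in range(n)]
            for j in range(n)]


def max_abs_diff_vs_numeric(z, h=1e-6):
    """Largest gap between the analytic entries and central differences."""
    J = softmax_jacobian(z)
    worst = 0.0
    for i in range(len(z)):
        zp = list(z)
        zp[i] += h
        zm = list(z)
        zm[i] -= h
        ap, am = softmax(zp), softmax(zm)
        for j in range(len(z)):
            numeric = (ap[j] - am[j]) / (2 * h)
            worst = max(worst, abs(J[j][i] - numeric))
    return worst


z = [0.5, -1.0, 2.0, 0.0]
print(max_abs_diff_vs_numeric(z))  # expected to be tiny (finite-difference error only)
```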

Therefore \(\frac{\partial L}{\partial z} = [a_1,a_2,...,a_j-1,...,a_n] = [a_1,a_2,...,a_j,...,a_n] - [0,0,...,1,...,0] = a - y\)
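The final result can be checked end to end. The sketch below (hypothetical helper names; central differences with an assumed step `h = 1e-6`) compares \(a - y\) with a numerical gradient of \(L(z) = -\ln a_j\) taken directly with respect to \(z\).

```python
import math


def softmax(z):
    m = max(z)  # numerical-stability shift; cancels in the ratio
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def loss(z, j):
    # L = -ln a_j, with a = softmax(z)
    return -math.log(softmax(z)[j])


def analytic_grad(z, j):
    # dL/dz = a - y, where y is one-hot with y_j = 1
    a = softmax(z)
    return [a_i - (1.0 if i == j else 0.0) for i, a_i in enumerate(a)]


def numeric_grad(z, j, h=1e-6):
    # Central differences of L with respect to each z_i
    g = []
    for i in range(len(z)):
        zp = list(z)
        zp[i] += h
        zm = list(z)
        zm[i] -= h
        g.append((loss(zp, j) - loss(zm, j)) / (2 * h))
    return g


z = [0.5, -1.0, 2.0, 0.0]
j = 0
print(analytic_grad(z, j))
print(numeric_grad(z, j))
```

Since \(\sum_i a_i = 1 = \sum_i y_i\), the components of \(a - y\) always sum to 0, and only the true class \(j\) receives a negative gradient \(a_j - 1\).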

posted @ 2021-10-18 11:01  爱吃西瓜的菜鸟