Neural Networks: The Backpropagation Algorithm
The cost function of a neural network (with L layers, s_l units in layer l, and K output units) is
\[J(\Theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)}\log\left(h_\Theta(x^{(i)})\right)_k + \left(1 - y_k^{(i)}\right)\log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2\]
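As a minimal NumPy sketch of evaluating this cost (the names `nn_cost`, `H`, `Y`, `Thetas`, and `lam` are illustrative, not from the original; it assumes the hypothesis outputs have already been computed by forward propagation, and the bias column of each Θ is excluded from the regularization term, as in the formula above):

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    """Regularized cross-entropy cost J(Theta).

    H      : (m, K) hypothesis outputs, row i is h_Theta(x^(i))
    Y      : (m, K) one-hot labels, row i is y^(i)
    Thetas : list of weight matrices Theta^(l); column 0 multiplies the bias unit
    lam    : regularization parameter lambda
    """
    m = Y.shape[0]
    # Unregularized part: sum over all examples i and output units k.
    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Regularization: squared weights of every layer, excluding the bias columns.
    reg = sum(np.sum(Theta[:, 1:] ** 2) for Theta in Thetas)
    return cost + lam / (2 * m) * reg
```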
We want to minimize J(Θ):
\[\min_\Theta J(\Theta)\]
To do so, we need to be able to compute
\[J\left( \Theta \right)\]
\[\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)\]
The key difficulty is computing
\[\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)\]
Consider the following four-layer neural network (the connections between the layers are omitted in the figure).
Take a single training example (x, y) as an illustration.
First, compute the forward propagation:
\[\begin{array}{l}
a^{(1)} = x\\
z^{(2)} = \Theta^{(1)} a^{(1)}\\
a^{(2)} = g(z^{(2)}) \;\;(\text{add } a_0^{(2)})\\
z^{(3)} = \Theta^{(2)} a^{(2)}\\
a^{(3)} = g(z^{(3)}) \;\;(\text{add } a_0^{(3)})\\
z^{(4)} = \Theta^{(3)} a^{(3)}\\
a^{(4)} = h_\Theta(x) = g(z^{(4)})
\end{array}\]
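A minimal NumPy sketch of this forward pass, assuming the four-layer network above, a sigmoid activation g, and weight matrices `Theta1`, `Theta2`, `Theta3` whose first column multiplies the bias unit (all names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2, Theta3):
    """Forward propagation for a single example x (1-D array)."""
    a1 = np.concatenate(([1.0], x))            # a^(1) = x, with bias unit a_0
    z2 = Theta1 @ a1                           # z^(2) = Theta^(1) a^(1)
    a2 = np.concatenate(([1.0], sigmoid(z2)))  # a^(2) = g(z^(2)), add a_0^(2)
    z3 = Theta2 @ a2                           # z^(3) = Theta^(2) a^(2)
    a3 = np.concatenate(([1.0], sigmoid(z3)))  # a^(3) = g(z^(3)), add a_0^(3)
    z4 = Theta3 @ a3                           # z^(4) = Theta^(3) a^(3)
    a4 = sigmoid(z4)                           # a^(4) = h_Theta(x), no bias on output
    return a1, z2, a2, z3, a3, z4, a4
```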
The backpropagation algorithm
Define
\[\delta_j^{(l)} = \text{“error” of node } j \text{ in layer } l.\]
Thus, for each unit of the output layer (layer L = 4):
\[\delta_j^{(4)} = a_j^{(4)} - y_j\]
where $y_j$ is the true label.
Next, compute the “errors” of the preceding layers:
\[\begin{array}{l}
{\delta ^{\left( 3 \right)}} = {\left( {{\Theta ^{\left( 3 \right)}}} \right)^T}{\delta ^{\left( 4 \right)}}. * g'\left( {{z^{\left( 3 \right)}}} \right)\\
{\delta ^{\left( 2 \right)}} = {\left( {{\Theta ^{\left( 2 \right)}}} \right)^T}{\delta ^{\left( 3 \right)}}. * g'\left( {{z^{\left( 2 \right)}}} \right)
\end{array}\]
The first layer (the input layer) has no “error” term.
It can also be shown (I have not proven it here) that, equivalently,
\[\begin{array}{l}
{\delta ^{\left( 3 \right)}} = {\left( {{\Theta ^{\left( 3 \right)}}} \right)^T}{\delta ^{\left( 4 \right)}}. * \left( {{a^{\left( 3 \right)}}. * \left( {1 - {a^{\left( 3 \right)}}} \right)} \right)\\
{\delta ^{\left( 2 \right)}} = {\left( {{\Theta ^{\left( 2 \right)}}} \right)^T}{\delta ^{\left( 3 \right)}}. * \left( {{a^{\left( 2 \right)}}. * \left( {1 - {a^{\left( 2 \right)}}} \right)} \right)
\end{array}\]
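The claim rests on the derivative of the logistic sigmoid (assuming g is the sigmoid, which is what the a .* (1 − a) form implies; this derivation is not in the original notes):

\[g'(z) = \frac{e^{-z}}{(1+e^{-z})^2} = g(z)\left(1 - g(z)\right)\]

Continuing the forward-pass sketch for a single example, the δ's can be computed as below. Stripping the first entry (the bias unit's “error”) before propagating further back is one common convention, not something spelled out in the original notes:

```python
def backprop_deltas(y, Theta2, Theta3, a2, a3, a4):
    """Errors delta^(l) for one example; y is a one-hot label vector."""
    d4 = a4 - y                              # delta^(4) = a^(4) - y
    # delta^(3) = (Theta^(3))^T delta^(4) .* a^(3) .* (1 - a^(3))
    d3 = (Theta3.T @ d4) * a3 * (1 - a3)
    d3 = d3[1:]                              # drop the bias unit's "error"
    # delta^(2) = (Theta^(2))^T delta^(3) .* a^(2) .* (1 - a^(2))
    d2 = (Theta2.T @ d3) * a2 * (1 - a2)
    d2 = d2[1:]                              # drop the bias unit's "error"
    return d2, d3, d4
```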
Furthermore, ignoring the regularization term for the moment,
\[\frac{\partial }{{\partial \Theta _{ij}^{\left( l \right)}}}J\left( \Theta \right) = a_j^{\left( l \right)}\delta _i^{\left( {l + 1} \right)}\]
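Collecting these entries for a whole layer, the same relation can be written as an outer product (an equivalent matrix restatement, not in the original notes; single example, regularization still ignored):

\[\frac{\partial}{\partial \Theta^{(l)}} J(\Theta) = \delta^{(l+1)}\left(a^{(l)}\right)^T\]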
Summary of the backpropagation algorithm
Given the training set
\[\left\{ {\left( {{x^{\left( 1 \right)}},{y^{\left( 1 \right)}}} \right),...,\left( {{x^{\left( m \right)}},{y^{\left( m \right)}}} \right)} \right\}\]
1. Set (for all l, i, j)
\[\Delta _{ij}^{\left( l \right)} = 0\]
(Δ is the uppercase of δ.)
2. Compute:
For i = 1 to m {
Set a^(1) = x^(i)
Perform forward propagation to compute a^(l) for l = 2, 3, ..., L
Using y^(i), compute δ^(L) = a^(L) - y^(i)
Compute δ^(L-1), δ^(L-2), ..., δ^(2)
\[\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)}\,\delta_i^{(l+1)}\]
}
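Putting steps 1 and 2 together as a sketch, reusing the hypothetical `forward` and `backprop_deltas` helpers from above (`X` is an m × n matrix of training examples and `Y` an m × K matrix of one-hot labels; both names are illustrative):

```python
import numpy as np

def accumulate_gradients(X, Y, Theta1, Theta2, Theta3):
    """Steps 1-2: accumulate Delta^(l) over all m training examples."""
    Delta1 = np.zeros_like(Theta1)
    Delta2 = np.zeros_like(Theta2)
    Delta3 = np.zeros_like(Theta3)
    for x, y in zip(X, Y):                     # i = 1 .. m
        a1, z2, a2, z3, a3, z4, a4 = forward(x, Theta1, Theta2, Theta3)
        d2, d3, d4 = backprop_deltas(y, Theta2, Theta3, a2, a3, a4)
        # Delta^(l)_{ij} := Delta^(l)_{ij} + a^(l)_j * delta^(l+1)_i
        Delta1 += np.outer(d2, a1)
        Delta2 += np.outer(d3, a2)
        Delta3 += np.outer(d4, a3)
    return Delta1, Delta2, Delta3
```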
3. Compute:
if j ≠ 0
\[D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)} + \frac{\lambda}{m}\Theta_{ij}^{(l)}\]
if j = 0
\[D_{ij}^{\left( l \right)}: = \frac{1}{m}\Delta _{ij}^{\left( l \right)}\]
where
\[D_{ij}^{\left( l \right)} = \frac{\partial }{{\partial \Theta _{ij}^{\left( l \right)}}}J\left( \Theta \right)\]
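Finally, step 3 as a sketch (names again illustrative), consistent with the λ/(2m) regularization in J(Θ): the accumulated Δ are averaged and the regularization term is added to every column except the bias column (j = 0).

```python
def gradients(Deltas, Thetas, lam, m):
    """Step 3: D^(l) = (1/m) Delta^(l), plus (lambda/m) Theta^(l) for j != 0."""
    Ds = []
    for Delta, Theta in zip(Deltas, Thetas):
        D = Delta / m
        D[:, 1:] += (lam / m) * Theta[:, 1:]   # regularize all but column j = 0
        Ds.append(D)
    return Ds
```

The resulting D^(l) are the gradients that gradient descent or another optimizer would use to carry out the minimization of J(Θ) stated at the top.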