Neural Networks: The Backpropagation Algorithm

The cost function of the neural network is

\[J\left( \Theta  \right) =  - \frac{1}{m}\left[ {\sum\limits_{i = 1}^m {\sum\limits_{k = 1}^K {y_k^{\left( i \right)}\log {{\left( {{h_\Theta }\left( {{x^{\left( i \right)}}} \right)} \right)}_k} + \left( {1 - y_k^{\left( i \right)}} \right)\log \left( {1 - {{\left( {{h_\Theta }\left( {{x^{\left( i \right)}}} \right)} \right)}_k}} \right)} } } \right] + \frac{\lambda }{{2m}}\sum\limits_{l = 1}^{L - 1} {\sum\limits_{i = 1}^{{s_l}} {\sum\limits_{j = 1}^{{s_{l + 1}}} {{{\left( {\Theta _{ji}^{\left( l \right)}} \right)}^2}} } } \]
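As a rough illustration (not from the original post), here is a vectorized NumPy sketch of this cost. The function names, the one-hot label matrix Y, and the convention that column 0 of each Theta^(l) multiplies the bias unit are my own assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Thetas, X, Y, lam):
    """Regularized cross-entropy cost J(Theta).

    Thetas : list of weight matrices Theta^(1..L-1), each of shape
             (s_{l+1}, s_l + 1); column 0 multiplies the bias unit.
    X      : (m, n) design matrix, one example per row.
    Y      : (m, K) matrix of one-hot label vectors y^(i).
    lam    : regularization strength lambda.
    """
    m = X.shape[0]
    # forward propagation for all examples at once
    A = X
    for Theta in Thetas:
        A = np.hstack([np.ones((m, 1)), A])   # add bias unit a_0 = 1
        A = sigmoid(A @ Theta.T)               # a^(l+1) = g(Theta^(l) a^(l))
    H = A                                      # h_Theta(x^(i)) for every i
    # unregularized part of J
    J = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # regularization: every weight except the bias column
    J += lam / (2 * m) * sum(np.sum(Theta[:, 1:] ** 2) for Theta in Thetas)
    return J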

We want to minimize J(Θ):

\[\mathop {\min }\limits_\Theta  J\left( \Theta  \right)\]

To do this we need to be able to compute

\[J\left( \Theta  \right)\]

\[\frac{\partial }{{\partial \Theta _{ij}^{\left( l \right)}}}J\left( \Theta  \right)\]


The key difficulty is computing the partial derivatives

\[\frac{\partial }{{\partial \Theta _{ij}^{\left( l \right)}}}J\left( \Theta  \right)\]


Suppose we have the following four-layer neural network (the connecting lines between layers are omitted in the figure).

Take a single training example (x, y) as an illustration.

First, compute forward propagation:

\[\begin{array}{l}
{a^{\left( 1 \right)}} = x\\
{z^{\left( 2 \right)}} = {\Theta ^{\left( 1 \right)}}{a^{\left( 1 \right)}}\\
{a^{\left( 2 \right)}} = g\left( {{z^{\left( 2 \right)}}} \right)\;\left( {\text{add bias unit } a_0^{\left( 2 \right)}} \right)\\
{z^{\left( 3 \right)}} = {\Theta ^{\left( 2 \right)}}{a^{\left( 2 \right)}}\\
{a^{\left( 3 \right)}} = g\left( {{z^{\left( 3 \right)}}} \right)\;\left( {\text{add bias unit } a_0^{\left( 3 \right)}} \right)\\
{z^{\left( 4 \right)}} = {\Theta ^{\left( 3 \right)}}{a^{\left( 3 \right)}}\\
{a^{\left( 4 \right)}} = {h_\Theta }\left( x \right) = g\left( {{z^{\left( 4 \right)}}} \right)
\end{array}\]
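A minimal NumPy sketch of this forward pass for one example, assuming sigmoid activations and that column 0 of each Theta^(l) is the bias weight (the function name and shapes are my own choices, not from the post):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2, Theta3):
    """Forward propagation of one example through the four-layer network.

    Each Theta^(l) has shape (s_{l+1}, s_l + 1); column 0 is the bias weight.
    Returns the activations a^(1..4) and the pre-activations z^(2..4).
    """
    a1 = np.concatenate([[1.0], x])            # a^(1) = x, with bias unit a_0 = 1
    z2 = Theta1 @ a1
    a2 = np.concatenate([[1.0], sigmoid(z2)])  # a^(2) = g(z^(2)), plus bias a_0^(2)
    z3 = Theta2 @ a2
    a3 = np.concatenate([[1.0], sigmoid(z3)])  # a^(3) = g(z^(3)), plus bias a_0^(3)
    z4 = Theta3 @ a3
    a4 = sigmoid(z4)                           # a^(4) = h_Theta(x), no bias on output
    return (a1, a2, a3, a4), (z2, z3, z4)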

The backpropagation algorithm

Define

\[\delta _j^{\left( l \right)} = \text{“error” of node } j \text{ in layer } l\]

Then, for each output-layer unit (here L = 4):

\[\delta _j^{\left( 4 \right)} = a_j^{\left( 4 \right)} - {y_j}\]

where y_j is the true (target) value.

Next, compute the “errors” of the earlier layers:

\[\begin{array}{l}
{\delta ^{\left( 3 \right)}} = {\left( {{\Theta ^{\left( 3 \right)}}} \right)^T}{\delta ^{\left( 4 \right)}} .* g'\left( {{z^{\left( 3 \right)}}} \right)\\
{\delta ^{\left( 2 \right)}} = {\left( {{\Theta ^{\left( 2 \right)}}} \right)^T}{\delta ^{\left( 3 \right)}} .* g'\left( {{z^{\left( 2 \right)}}} \right)
\end{array}\]

There is no “error” term for the first (input) layer.

It can also be shown (I have not derived it here) that, for the sigmoid activation, g'(z) = g(z)(1 - g(z)) = a .* (1 - a), so the above can be written as

\[\begin{array}{l}
{\delta ^{\left( 3 \right)}} = {\left( {{\Theta ^{\left( 3 \right)}}} \right)^T}{\delta ^{\left( 4 \right)}} .* \left( {{a^{\left( 3 \right)}} .* \left( {1 - {a^{\left( 3 \right)}}} \right)} \right)\\
{\delta ^{\left( 2 \right)}} = {\left( {{\Theta ^{\left( 2 \right)}}} \right)^T}{\delta ^{\left( 3 \right)}} .* \left( {{a^{\left( 2 \right)}} .* \left( {1 - {a^{\left( 2 \right)}}} \right)} \right)
\end{array}\]
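Continuing the sketch above, the per-example delta terms might be computed like this; the handling of the bias entry (dropping index 0 of (Θ^(l))^T δ^(l+1)) is one common convention, and the function name is my own:

import numpy as np

def backprop_deltas(y, a2, a3, a4, Theta2, Theta3):
    """delta^(4), delta^(3), delta^(2) for one training example.

    a2 and a3 include the bias unit at index 0 (as returned by the
    forward() sketch above); a4 is h_Theta(x); y is the label vector.
    """
    delta4 = a4 - y
    # (Theta^(3))^T delta^(4) also has an entry for the bias unit of layer 3;
    # drop it ([1:]) before multiplying by g'(z^(3)) = a^(3) .* (1 - a^(3)).
    delta3 = (Theta3.T @ delta4)[1:] * (a3[1:] * (1 - a3[1:]))
    delta2 = (Theta2.T @ delta3)[1:] * (a2[1:] * (1 - a2[1:]))
    return delta2, delta3, delta4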

We also have (ignoring the regularization term):

\[\frac{\partial }{{\partial \Theta _{ij}^{\left( l \right)}}}J\left( \Theta  \right) = a_j^{\left( l \right)}\delta _i^{\left( {l + 1} \right)}\]
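Written for a whole weight matrix at once (my notation, not the post's), the same relation is an outer product:

\[\frac{\partial }{{\partial {\Theta ^{\left( l \right)}}}}J\left( \Theta  \right) = {\delta ^{\left( {l + 1} \right)}}{\left( {{a^{\left( l \right)}}} \right)^T}\]

This is exactly the quantity accumulated into Δ^(l) in the summary below.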


Summary of the backpropagation algorithm

Given a training set

\[\left\{ {\left( {{x^{\left( 1 \right)}},{y^{\left( 1 \right)}}} \right),...,\left( {{x^{\left( m \right)}},{y^{\left( m \right)}}} \right)} \right\}\]

1. Set (for all l, i, j)

\[\Delta _{ij}^{\left( l \right)} = 0\]

(Δ is the uppercase form of δ.)

2. Compute:

For i = 1 to m {

  Set a(1) = x(i)

  Perform forward propagation to compute a(l) for l = 2, 3, ..., L

  Using y(i), compute δ(L) = a(L) - y(i)

  Compute δ(L-1), δ(L-2),...,δ(2)

  \[\Delta _{ij}^{\left( l \right)}: = \Delta _{ij}^{\left( l \right)} + a_j^{\left( l \right)}\delta _i^{\left( {l + 1} \right)}\]

 }

3. Finally, compute:

if j ≠ 0

\[D_{ij}^{\left( l \right)}: = \frac{1}{m}\Delta _{ij}^{\left( l \right)} + \frac{\lambda }{m}\Theta _{ij}^{\left( l \right)}\]

if j = 0

\[D_{ij}^{\left( l \right)}: = \frac{1}{m}\Delta _{ij}^{\left( l \right)}\]

Here,

\[D_{ij}^{\left( l \right)} = \frac{\partial }{{\partial \Theta _{ij}^{\left( l \right)}}}J\left( \Theta  \right)\]
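Putting steps 1 through 3 together, a rough sketch for the four-layer network above (reusing the forward() and backprop_deltas() sketches from earlier in this post; names and shapes are my own assumptions) might look like:

import numpy as np
# uses the forward() and backprop_deltas() sketches defined above

def backprop_gradients(X, Y, Theta1, Theta2, Theta3, lam):
    """Steps 1-3: returns D^(1), D^(2), D^(3), the partial derivatives of J(Theta).

    X is (m, n) with one example per row; Y is (m, K) with the label vectors.
    """
    m = X.shape[0]
    # step 1: Delta_ij^(l) = 0
    Deltas = [np.zeros_like(T) for T in (Theta1, Theta2, Theta3)]
    # step 2: accumulate over the m examples
    for i in range(m):
        (a1, a2, a3, a4), _ = forward(X[i], Theta1, Theta2, Theta3)
        d2, d3, d4 = backprop_deltas(Y[i], a2, a3, a4, Theta2, Theta3)
        for Delta, d, a in zip(Deltas, (d2, d3, d4), (a1, a2, a3)):
            Delta += np.outer(d, a)            # Delta^(l) += delta^(l+1) (a^(l))^T
    # step 3: average, and regularize every column except the bias column (j = 0)
    Ds = []
    for Delta, Theta in zip(Deltas, (Theta1, Theta2, Theta3)):
        D = Delta / m
        D[:, 1:] += (lam / m) * Theta[:, 1:]
        Ds.append(D)
    return Ds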
