循环神经网络进阶

循环神经网络进阶

GRU

RNN存在的问题:梯度较容易出现衰减或爆炸(BPTT)
⻔控循环神经⽹络:捕捉时间序列中时间步距离较⼤的依赖关系
RNN:

Image Name

\[H_{t} = ϕ(X_{t}W_{xh} + H_{t-1}W_{hh} + b_{h}) \]

GRU:

Image Name

\[R_{t} = σ(X_tW_{xr} + H_{t−1}W_{hr} + b_r)\\ Z_{t} = σ(X_tW_{xz} + H_{t−1}W_{hz} + b_z)\\ \widetilde{H}_t = tanh(X_tW_{xh} + (R_t ⊙H_{t−1})W_{hh} + b_h)\\ H_t = Z_t⊙H_{t−1} + (1−Z_t)⊙\widetilde{H}_t \]

  • 重置⻔有助于捕捉时间序列⾥短期的依赖关系;
  • 更新⻔有助于捕捉时间序列⾥⻓期的依赖关系。

LSTM

长短期记忆long short-term memory:
遗忘门:控制上一时间步的记忆细胞
输入门:控制当前时间步的输入
输出门:控制从记忆细胞到隐藏状态
记忆细胞:⼀种特殊的隐藏状态的信息的流动

Image Name

\[I_t = σ(X_tW_{xi} + H_{t−1}W_{hi} + b_i) \\ F_t = σ(X_tW_{xf} + H_{t−1}W_{hf} + b_f)\\ O_t = σ(X_tW_{xo} + H_{t−1}W_{ho} + b_o)\\ \widetilde{C}_t = tanh(X_tW_{xc} + H_{t−1}W_{hc} + b_c)\\ C_t = F_t ⊙C_{t−1} + I_t ⊙\widetilde{C}_t\\ H_t = O_t⊙tanh(C_t) \]

深度循环神经网络

Image Name

\[\boldsymbol{H}_t^{(1)} = \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(1)} + \boldsymbol{H}_{t-1}^{(1)} \boldsymbol{W}_{hh}^{(1)} + \boldsymbol{b}_h^{(1)})\\ \boldsymbol{H}_t^{(\ell)} = \phi(\boldsymbol{H}_t^{(\ell-1)} \boldsymbol{W}_{xh}^{(\ell)} + \boldsymbol{H}_{t-1}^{(\ell)} \boldsymbol{W}_{hh}^{(\ell)} + \boldsymbol{b}_h^{(\ell)})\\ \boldsymbol{O}_t = \boldsymbol{H}_t^{(L)} \boldsymbol{W}_{hq} + \boldsymbol{b}_q \]

双向循环神经网络

Image Name

\[\begin{aligned} \overrightarrow{\boldsymbol{H}}_t &= \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(f)} + \overrightarrow{\boldsymbol{H}}_{t-1} \boldsymbol{W}_{hh}^{(f)} + \boldsymbol{b}_h^{(f)})\\ \overleftarrow{\boldsymbol{H}}_t &= \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(b)} + \overleftarrow{\boldsymbol{H}}_{t+1} \boldsymbol{W}_{hh}^{(b)} + \boldsymbol{b}_h^{(b)}) \end{aligned} \]

\[\boldsymbol{H}_t=(\overrightarrow{\boldsymbol{H}}_{t}, \overleftarrow{\boldsymbol{H}}_t) \]

\[\boldsymbol{O}_t = \boldsymbol{H}_t \boldsymbol{W}_{hq} + \boldsymbol{b}_q \]

posted @ 2020-02-18 21:22  yu212223  阅读(121)  评论(0编辑  收藏  举报