[Deep Learning] Bi-RNN | GRU | LSTM

https://www.cnblogs.com/zhaopAC/p/10240968.html

Vanishing gradients in gradient-based neural network training (e.g. back propagation)

This is not a fundamental problem with neural networks themselves - it is a problem with gradient-based learning methods, caused by certain activation functions. Let's try to understand the problem and its cause intuitively.

Cause: many common activation functions (e.g. sigmoid or tanh) 'squash' their input into a very small output range.

-> Even a large change in the input produces only a small change in the output - hence the gradient is small.
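To make the 'squashing' point concrete, here is a small numpy sketch (my own illustration, not code from the linked post): the sigmoid maps any input into (0, 1), and its derivative never exceeds 0.25, so whatever gradient arrives from above is scaled down at this step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sigmoid squashes an unbounded input into the interval (0, 1).
x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
y = sigmoid(x)
dy_dx = y * (1.0 - y)   # derivative of the sigmoid; peaks at 0.25 when x = 0

print(y)       # outputs crowd toward 0 or 1 for large |x|
print(dy_dx)   # never larger than 0.25, and tiny once the unit saturates
```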

With multiple layers, this becomes much worse:

each layer maps its input into an ever smaller output region.

 As a result, even a large change in the parameters of the first layer doesn't change the output much.
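A minimal sketch of how those per-layer factors compound (my own illustration, assuming a plain stack of sigmoid layers rather than any particular network from the post): the gradient reaching the first layer is a product of per-layer sigmoid derivatives, each at most 0.25, so its norm collapses roughly exponentially with depth.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
depth, width = 20, 16
x = rng.normal(size=(1, width))
# Xavier-like weight scale (1 / sqrt(width)) so the forward pass stays reasonable.
weights = [rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
           for _ in range(depth)]

# Forward pass through a deep stack of sigmoid layers, keeping all activations.
activations = [x]
for W in weights:
    activations.append(sigmoid(activations[-1] @ W))

# Backward pass: push a unit gradient back and watch its norm shrink layer by layer.
grad = np.ones_like(activations[-1])
for W, a in zip(reversed(weights), reversed(activations[1:])):
    grad = (grad * a * (1.0 - a)) @ W.T   # local sigmoid derivative, then weight transpose
    print(np.linalg.norm(grad))           # norm decays as we move toward the first layer
```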


How does the GRU address vanishing and exploding gradients? https://www.cnblogs.com/bonelee/p/10475453.html
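For reference, a hedged numpy sketch of a standard GRU cell (my own sketch, not code from the linked article; the update-gate convention used here is one common choice, and bias terms are omitted): the last line mixes the previous state with the candidate state, so when the update gate is near 0 the cell passes h_prev through almost unchanged. That gated, nearly additive path keeps the Jacobian of h_t with respect to h_prev close to the identity, which is the usual explanation for why GRUs (and LSTMs) suffer far less from vanishing gradients than plain RNNs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One step of a GRU (one common formulation; biases omitted for brevity)."""
    z = sigmoid(x_t @ Wz + h_prev @ Uz)                 # update gate
    r = sigmoid(x_t @ Wr + h_prev @ Ur)                 # reset gate
    h_tilde = np.tanh(x_t @ Wh + (r * h_prev) @ Uh)     # candidate state
    # Key line: for z close to 0 the cell copies h_prev almost unchanged,
    # so gradients can flow through time without being squashed at every step.
    return (1.0 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
# Hypothetical random parameters, ordered Wz, Uz, Wr, Ur, Wh, Uh.
params = [rng.normal(scale=0.1, size=s) for s in [(d_in, d_h), (d_h, d_h)] * 3]

h = np.zeros((1, d_h))
for _ in range(5):                                      # run a few time steps
    h = gru_cell(rng.normal(size=(1, d_in)), h, *params)
print(h)
```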

posted @ 2019-09-20 16:58  SENTIMENT_SONNE