[Deep Learning] Bi-RNN | GRU | LSTM
https://www.cnblogs.com/zhaopAC/p/10240968.html
Vanishing gradients in gradient-based neural networks (e.g. those trained with back propagation)
This is not a fundamental problem with neural networks; it is a problem with gradient-based learning methods, caused by certain activation functions. Let's try to understand the problem and its cause intuitively.
Cause: many common activation functions (e.g. sigmoid or tanh) 'squash' their input into a very small output range
-> so even a large change in the input produces only a small change in the output; hence the gradient is small.
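A minimal numerical sketch of this squashing effect (my own illustration, not from the original post): for sigmoid, moving the input from 5 to 10 barely changes the output, and the derivative there is tiny.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    # Derivative of sigmoid: s * (1 - s), which peaks at 0.25 when x = 0
    return s * (1.0 - s)

# A large change in the input (5 -> 10) barely moves the output...
print(sigmoid(5.0), sigmoid(10.0))        # both very close to 1
# ...so the gradient in that region is tiny
print(sigmoid_grad(5.0), sigmoid_grad(10.0))
```

The same saturation happens with tanh: outside a narrow window around 0, the derivative is nearly zero.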
This becomes much worse when there are multiple layers: each layer's input is mapped into an ever smaller output region.
As a result, even a large change in the parameters of the first layer doesn't change the final output much.
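To see how this compounds with depth, here is a toy sketch (my own assumption: a chain of sigmoid layers with weight 1 and no bias). Backpropagating through the chain multiplies one sigmoid derivative per layer, and since each factor is at most 0.25, the gradient reaching the first layer shrinks roughly geometrically with depth.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

DEPTH = 10

# Forward pass through a deep chain of sigmoid "layers" (weight 1, no bias)
x = 0.5
activations = []
for _ in range(DEPTH):
    x = sigmoid(x)
    activations.append(x)

# Backward pass: the gradient w.r.t. the first layer's input is the
# product of per-layer sigmoid derivatives, each at most 0.25
grad = 1.0
for a in reversed(activations):
    grad *= a * (1.0 - a)

print(grad)  # shrinks toward zero as DEPTH grows
```

With 10 layers the product is already below 0.25^10 ≈ 1e-6, which is the vanishing-gradient effect the text describes.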
How does the GRU address vanishing and exploding gradients? https://www.cnblogs.com/bonelee/p/10475453.html