June 2018 Archives
Abstract: AlexNet (2012) The network had a very similar architecture to LeNet by Yann LeCun et al. but was deeper, with more filters per layer, and with stacked ...
Read more
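As a concrete illustration (not code from the post), below is a minimal PyTorch sketch of AlexNet's convolutional trunk. The layer sizes follow the widely used torchvision reference implementation, a single-GPU variant of the 2012 paper; the three back-to-back 3x3 convolutions are the "stacked" layers the summary alludes to.

```python
import torch
import torch.nn as nn

# Convolutional trunk only (classifier head omitted). Layer sizes follow
# the torchvision reference implementation of AlexNet, a common
# single-GPU variant of the original two-GPU 2012 architecture.
alexnet_features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    # Three stacked 3x3 convolutions with no pooling in between --
    # the "stacked" convolutional layers the summary refers to.
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

x = torch.randn(1, 3, 224, 224)   # one RGB image at AlexNet's input size
print(alexnet_features(x).shape)  # torch.Size([1, 256, 6, 6])
```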
Abstract: Momentum: speeds up convergence while also dampening SGD's oscillations. NAG: reins in Momentum's overly fast parameter updates. Adagrad: applies larger updates to infrequently occurring parameters and smaller updates to frequently occurring ones, rather than sharing a single learning rate across all parameters. Adadelta: fixes Adagrad's flaw of the learning rate eventually decaying to zero, and needs no default learning rate
Read more
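For concreteness, here is a minimal NumPy sketch of the four update rules the summary compares, run on a toy quadratic loss f(θ) = ½θ². The hyperparameter values (lr, gamma, rho, eps) are illustrative assumptions, not values from the post.

```python
import numpy as np

def grad(theta):
    # Gradient of the toy quadratic loss f(theta) = 0.5 * theta**2.
    return theta

def momentum(theta, steps=100, lr=0.1, gamma=0.9):
    v = 0.0
    for _ in range(steps):
        v = gamma * v + lr * grad(theta)   # accumulated velocity damps oscillation
        theta -= v
    return theta

def nag(theta, steps=100, lr=0.1, gamma=0.9):
    v = 0.0
    for _ in range(steps):
        # Evaluate the gradient at the look-ahead point theta - gamma*v,
        # which tempers Momentum's tendency to update too fast and overshoot.
        v = gamma * v + lr * grad(theta - gamma * v)
        theta -= v
    return theta

def adagrad(theta, steps=100, lr=0.5, eps=1e-8):
    G = 0.0
    for _ in range(steps):
        g = grad(theta)
        G += g ** 2                            # accumulated squared gradients
        theta -= lr / (np.sqrt(G) + eps) * g   # rare parameters get larger steps
    return theta

def adadelta(theta, steps=100, rho=0.95, eps=1e-6):
    Eg2, Edx2 = 0.0, 0.0
    for _ in range(steps):
        g = grad(theta)
        Eg2 = rho * Eg2 + (1 - rho) * g ** 2       # decaying average, so the
        dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * g  # step never decays to 0
        Edx2 = rho * Edx2 + (1 - rho) * dx ** 2
        theta += dx                                # note: no default learning rate
    return theta

for opt in (momentum, nag, adagrad, adadelta):
    print(opt.__name__, opt(5.0))
```

Note how Adadelta's step size is the ratio of two running averages, which is why it needs no learning rate and why, unlike Adagrad's ever-growing accumulator G, its effective rate does not shrink to zero.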