[Converge] Larger batch size?

Ref: Effect of batch size on training dynamics

Don’t Decay the Learning Rate, Increase the Batch Size

We can often achieve the benefits of decaying the learning rate by instead increasing the batch size during training.
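This minimal sketch shows why swapping the two can work. It assumes the approximation used in the "Don’t Decay the Learning Rate" line of work, where the SGD noise scale is roughly g ≈ lr · N / B (valid when B is much smaller than N). Under that assumption, dividing the learning rate by a factor and multiplying the batch size by the same factor give the same noise scale. The dataset size, learning rate, batch size, decay factor and function names below are all hypothetical.

```python
# Hypothetical settings for illustration only (not from the post).
N = 50_000          # training-set size
BASE_LR = 0.1       # initial learning rate
BASE_BATCH = 128    # initial batch size
FACTOR = 5          # per-stage "decay" factor
STAGES = 3

def noise_scale(lr: float, batch: int, n: int = N) -> float:
    """Approximate SGD noise scale g ~ lr * N / B (assumes B << N)."""
    return lr * n / batch

def decay_lr_schedule():
    # Classic approach: shrink the learning rate, keep the batch size fixed.
    return [(BASE_LR / FACTOR**s, BASE_BATCH) for s in range(STAGES)]

def grow_batch_schedule():
    # Alternative: keep the learning rate fixed, grow the batch size instead.
    return [(BASE_LR, BASE_BATCH * FACTOR**s) for s in range(STAGES)]

if __name__ == "__main__":
    for (lr_a, b_a), (lr_b, b_b) in zip(decay_lr_schedule(), grow_batch_schedule()):
        print(f"decay lr: lr={lr_a:.4f} B={b_a:5d} g={noise_scale(lr_a, b_a):8.3f} | "
              f"grow B: lr={lr_b:.4f} B={b_b:5d} g={noise_scale(lr_b, b_b):8.3f}")
```

In the growing-batch schedule the later stages also take fewer parameter updates per epoch, which is one practical attraction of the idea.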

Hypothesis: larger batch sizes don’t generalize as well because the model cannot travel far enough in a reasonable number of training epochs.
Finding: better solutions can lie far from the initial weights, and if the loss is averaged over the batch, large batch sizes simply do not let the model travel far enough to reach them within the same number of training epochs.
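The "cannot travel far enough" argument can be made concrete with a simple bound. Assume plain SGD on the batch-averaged loss and a gradient norm bounded by some G (an illustrative assumption). Each step then moves the weights by at most lr · G, and one epoch has ⌈N/B⌉ steps, so after E epochs the distance from the initial weights is at most E · ⌈N/B⌉ · lr · G. This is only an upper bound, not the actual distance travelled; the function name and numbers are hypothetical.

```python
import math

def max_travel_distance(n_samples: int, batch_size: int, epochs: int,
                        lr: float, grad_norm_bound: float) -> float:
    """Upper bound on ||theta_T - theta_0|| for plain SGD with a batch-averaged loss.

    Each step moves the weights by lr * ||mean gradient|| <= lr * G, and one
    epoch contains ceil(N / B) steps, so by the triangle inequality
    ||theta_T - theta_0|| <= epochs * ceil(N / B) * lr * G.
    """
    steps = epochs * math.ceil(n_samples / batch_size)
    return steps * lr * grad_norm_bound

if __name__ == "__main__":
    # Hypothetical setting: 50k samples, 10 epochs, lr 0.1, gradient-norm bound 1.0.
    for b in (32, 256, 2048):
        print(f"B={b:5d}  distance bound = {max_travel_distance(50_000, b, 10, 0.1, 1.0):.1f}")
```

With the same learning rate and epoch count, moving from B = 32 to B = 2048 shrinks the bound by roughly the batch-size ratio, because the number of steps shrinks by that much.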

It is commonly assumed that a larger batch size calls for a larger learning rate. Why?

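Large and small batches are usually compared at the same number of epochs, so large-batch training runs fewer iterations. If the learning rate is left unchanged, those fewer iterations leave the model less well fit and less accurate, so the learning rate has to be adjusted. A large batch also contains more samples, so a few extreme samples cannot dominate it the way they can dominate a small batch. The gradient estimate therefore has lower variance, which means the gradient direction computed from a large batch is more trustworthy, and a larger learning rate can be used.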
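A common way to act on this reasoning is the linear scaling rule from Goyal et al. (2017): when the batch size is multiplied by k, multiply the learning rate by k (in practice usually combined with a warmup period). A square-root rule is a more conservative heuristic that is sometimes used instead. This is only an illustration; the helper names and numbers are hypothetical, and neither rule is guaranteed to work for every model.

```python
import math

def iterations_per_epoch(n_samples: int, batch_size: int) -> int:
    # Parameter updates in one pass over the data (last partial batch kept).
    return math.ceil(n_samples / batch_size)

def scaled_lr(base_lr: float, base_batch: int, new_batch: int, rule: str = "linear") -> float:
    """Rescale a learning rate tuned at base_batch for use at new_batch.

    "linear": lr grows in proportion to the batch size.
    "sqrt":   lr grows with the square root of the batch-size ratio.
    """
    ratio = new_batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule!r}")

if __name__ == "__main__":
    N, BASE_LR, BASE_B = 50_000, 0.1, 256  # hypothetical values
    for b in (256, 1024, 8192):
        print(f"B={b:5d} iters/epoch={iterations_per_epoch(N, b):4d} "
              f"linear lr={scaled_lr(BASE_LR, BASE_B, b):.3f} "
              f"sqrt lr={scaled_lr(BASE_LR, BASE_B, b, 'sqrt'):.3f}")
```

The iteration count per epoch falls as B grows; scaling the learning rate up is meant to compensate for those lost updates.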

What are the downsides of blindly increasing Batch_Size?

Memory utilization goes up, but the memory capacity may not be able to hold the larger batch.

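Fewer iterations are needed to finish one epoch (one pass over the full dataset), so each epoch makes fewer parameter updates. Reaching the same accuracy can then take far more epochs and much more time, and the parameters are corrected more slowly.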

Beyond a certain Batch_Size, the descent direction it produces barely changes any more.
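Two of these downsides can be illustrated with a toy sketch. Both models are assumptions, not measurements: memory is modeled as a fixed cost plus a per-sample activation cost that grows linearly with batch size (real frameworks add workspace and allocator overhead), and per-sample gradients are modeled as a hypothetical true gradient plus independent Gaussian noise. All function names and numbers are made up.

```python
import math
import random

# --- Memory: a rough linear model (assumption) ---
def max_batch_size(budget_bytes: int, fixed_bytes: int, per_sample_bytes: int) -> int:
    """Largest B with fixed_bytes + B * per_sample_bytes <= budget_bytes."""
    if fixed_bytes >= budget_bytes:
        return 0
    return (budget_bytes - fixed_bytes) // per_sample_bytes

# --- Direction: alignment of a minibatch gradient with the full gradient ---
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def minibatch_gradient(true_grad, batch_size, noise_std, rng):
    """Mean of batch_size synthetic per-sample gradients: true_grad plus Gaussian noise."""
    total = [0.0] * len(true_grad)
    for _ in range(batch_size):
        for i, g in enumerate(true_grad):
            total[i] += g + rng.gauss(0.0, noise_std)
    return [t / batch_size for t in total]

def mean_alignment(batch_size, true_grad, noise_std=3.0, trials=10, seed=0):
    """Average cosine similarity between minibatch and full gradient over several draws."""
    rng = random.Random(seed)
    sims = [cosine(minibatch_gradient(true_grad, batch_size, noise_std, rng), true_grad)
            for _ in range(trials)]
    return sum(sims) / trials

if __name__ == "__main__":
    GiB, MiB = 1024**3, 1024**2
    # Hypothetical GPU: 16 GiB total, 2 GiB for weights/grads/optimizer state,
    # 48 MiB of activations per sample.
    print("max batch that fits:", max_batch_size(16 * GiB, 2 * GiB, 48 * MiB))

    g = [1.0] * 10  # hypothetical full-batch gradient in 10 dimensions
    for b in (1, 16, 256, 2048):
        print(f"B={b:5d}  mean cos(minibatch grad, full grad) = {mean_alignment(b, g):.3f}")
```

Under these assumptions the cosine similarity climbs quickly at small batch sizes and then flattens close to 1, because the noise in the averaged gradient shrinks only as 1/√B; past that point a bigger batch mostly costs memory and compute.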

Conclusion: use the default batch size.

posted @ 2021-08-16 16:12 郝壹贰叁
