Does batch size affect the computational efficiency of training?

We also use a smaller mini-batch size of 256 without any noticeable performance degradation. This is in contrast to CURL and DrQ that both use a larger batch size of 512 to attain more stable training in the expense of computational efficiency.

                                  ---- from DrQ-v2

 

I don't quite understand this part: does a larger batch size actually slow down learning?
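
One possible reading of the quoted sentence, sketched as simple arithmetic: if the number of gradient updates is fixed by the training budget, every update with a batch of 512 pushes twice as many samples through the networks as an update with a batch of 256. The snippet below only counts samples. `NUM_UPDATES` is a made-up figure for illustration and is not taken from the paper, and the assumption that per-update cost grows roughly in proportion to batch size will not hold exactly on every piece of hardware.

```python
def samples_processed(num_updates: int, batch_size: int) -> int:
    """Total samples pushed through the networks over a whole training run."""
    return num_updates * batch_size


# Hypothetical number of gradient updates (illustrative, not from the paper).
NUM_UPDATES = 1_000_000

small = samples_processed(NUM_UPDATES, 256)  # batch size used by DrQ-v2
large = samples_processed(NUM_UPDATES, 512)  # batch size used by CURL and DrQ

# Ratio of total samples processed, a rough proxy for relative compute.
ratio = large / small
print(f"batch 256: {small} samples, batch 512: {large} samples, ratio: {ratio}")
```

Under this reading, "computational efficiency" refers to wall-clock and compute cost per update rather than to how much the agent learns per environment step, so the larger batch buys more stable gradients at roughly double the compute.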

posted @ 2022-08-11 18:53  呦呦南山