Does batch size affect the computational efficiency of training?
We also use a smaller mini-batch size of 256 without any noticeable performance degradation. This is in contrast to CURL and DrQ that both use a larger batch size of 512 to attain more stable training in the expense of computational efficiency.
----from DrQ-v2
I don't quite follow this part. Does a larger batch size actually slow down learning?
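To make the question concrete, here is a rough sketch of how I am reading "computational efficiency". It rests on my own assumption, not on anything the paper states: the number of gradient updates is fixed (for example, tied to the number of environment steps), so the per-run cost scales with how many samples each update processes. The function name and the update count are hypothetical.

```python
def samples_processed(num_updates: int, batch_size: int) -> int:
    """Total samples pushed through the network over a training run."""
    return num_updates * batch_size


# Hypothetical number of gradient updates, assumed identical for both settings.
NUM_UPDATES = 1_000_000

small = samples_processed(NUM_UPDATES, 256)  # batch size quoted for DrQ-v2
large = samples_processed(NUM_UPDATES, 512)  # batch size quoted for CURL and DrQ

# Relative amount of forward/backward work, ignoring hardware parallelism.
ratio = large / small
print(f"batch 512 processes {ratio:.1f}x the samples of batch 256")
```

If that reading is right, the larger batch does not need more updates; it does more work per update, and whether that shows up as wall-clock time would depend on how well the hardware parallelizes the extra samples.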