Summary: Contents: Overview, Adam-mini, Code. Zhang Y., Chen C., Li Z., Ding T., Wu C., Ye Y., Luo Z. and Sun R. Adam-mini: Use fewer learning rates to gain more. arXiv preprint, 20
posted @ 2024-08-28 15:58 馒头and花卷 Views (32) Comments (0) Recommendations (0)