Post archive for July 5, 2018 - dwSun

July 5, 2018

1804.03235-Large scale distributed neural network training through online distillation.md

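Summary: Existing approaches to distributed model training. Distributed SGD. Parallel (synchronous) SGD: in large-scale training, the duration of each step is determined by the slowest machine. Asynchronous SGD: updates computed from stale, unsynchronized data may push the weights in unknown directions. Parallel multi-model training: several clusters train different models, which are then combined into a final model, but this increases the cost at inference time. Distillation: the pipeline is complex; choice of the student's training dataset; u ... Read more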

posted @ 2018-07-05 23:40 dwSun Views (613) Comments (0) Recommendations (0)
