Reading Notes: Sibylla, a Retry-Decision System for DLT Jobs
Summary:
These are reading notes on the paper Sibylla: To Retry or Not To Retry on Deep Learning Job Failure.
Reading Notes: Merak, a Parallel Training System for Large Models
Summary:
These are reading notes on the paper Merak: An Efficient Distributed DNN Training Framework With Automated 3D Parallelism for Giant Foundation Models.