August 2024 Archive

Transformer

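Summary: Paper: Attention Is All You Need. 1. Overall structure of the Transformer: This section first introduces the Transformer's overall structure. The figure below shows the overall structure of a Transformer used for Chinese-to-English translation. As the figure shows, the Transformer consists of an Encoder and a D…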

posted @ 2024-08-31 12:20 牛犁heart

ZeRO: A Data-Parallel Scheme That Eliminates Redundancy

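Summary: ZeRO: a data-parallel scheme that eliminates redundancy. There are currently two main technical routes for training very large language models: (1) TPU + XLA + TensorFlow/JAX and (2) GPU + PyTorch + Megatron + DeepSpeed. The former is led by Google. Because TPUs are tightly bound to Google's own cloud platform, GCP, for non-Googlers …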

posted @ 2024-08-04 19:24 牛犁heart

牛犁heart

Stay Hungry, Stay Foolish
