Parameter counts, model sizes, and other statistics for various language models
| Model | Parameters | Model size (PyTorch) | Training data | Context length (tokens) | Architecture | Training hardware | Training time | Release date | Source | Organization |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-2 | small: 124M, medium: 355M, large: 774M, XL: 1.5B | small: 548MB, medium: 1.52GB, large: 3.25GB, XL: 6.43GB | 8 million web pages (40GB of web text) collected from 45 million Reddit links; data up to 2017; vocabulary: 50,257 | 1024 | small: 12 layers, medium: 24 layers, large: 36 layers, XL: 48 layers; batch size: 512 | 256 TPU v3 cores | unknown | 2019 | | OpenAI |
| GPT | 117M | 479MB | 7,000 books (5GB); vocabulary: 40,000 | 512 | 12-layer decoder, 768 hidden size, 12 attention heads; batch size: 64 | 8 P600 GPUs | 1 month; 0.96 petaflop-days; 100 epochs | 2018 | | OpenAI |
| GPT-3 | 175B | | 45TB | 2048 | 96 layers; batch size: 3.2M | | | 2020 | | OpenAI |
| T5 | small: 60M, base: 220M, large: 770M, T5-3B: 3B, T5-11B: 11B | small: 242MB, base: 892MB, large: 2.95GB, T5-3B: 11.4GB, T5-11B: 45.2GB | 750GB | | encoder: 12 layers, decoder: 12 layers, 1024 hidden size | 1024 TPU v3 | | | | |
| BERT | 340M | | | | | | | | | Google AI |
| Turing-NLG | 17B | | | | | | | | | Microsoft Research |
| Switch-Transformer | 1600B | | | | | | | | | Google Brain |
| Megatron-LM | 8.3B | | | | | | | | | NVIDIA |
| OPT | 175B | | | | | 1000 A100 | 2 months | | | Meta |
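
As a quick sanity check on the "Parameters" and "Model size (PyTorch)" columns: a dense fp32 checkpoint stores roughly 4 bytes per parameter, so the on-disk size can be estimated directly from the parameter count. Below is a minimal sketch of this estimate (assuming fp32 weights and ignoring non-parameter buffers and serialization overhead; the names and counts are copied from the table above):

```python
# Estimate PyTorch checkpoint size from parameter count, assuming dense fp32
# weights (~4 bytes per parameter). Non-parameter tensors and file-format
# overhead are ignored, so real checkpoints come out slightly larger.
# Parameter counts are taken from the table above.

PARAM_COUNTS = {
    "GPT-2 small": 124e6,
    "GPT-2 XL": 1.5e9,
    "GPT-3": 175e9,
    "OPT": 175e9,
}

BYTES_PER_PARAM_FP32 = 4  # float32 = 4 bytes per weight

def estimated_checkpoint_gb(num_params: float) -> float:
    """Rough fp32 checkpoint size in GB (1 GB = 1024**3 bytes)."""
    return num_params * BYTES_PER_PARAM_FP32 / 1024**3

for name, n in PARAM_COUNTS.items():
    print(f"{name}: {n / 1e9:.3g}B params ~= {estimated_checkpoint_gb(n):.2f} GB")
```

The estimate for GPT-2 small (about 0.46GB) sits just below the 548MB listed in the table; the gap is non-parameter tensors and serialization overhead. For a loaded PyTorch model, the exact count can be read off with `sum(p.numel() for p in model.parameters())`.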
黄世宇/Shiyu Huang's Personal Page: https://huangshiyu13.github.io/