How to estimate model training T(FL)OPS efficiency
Naive method
Torch Vision ResNet50-v1.5 is used as the example.
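Step 1: Obtain the model's theoretical forward-pass MACs (Multiply-ACcumulate operations)
thop can report a model's forward MACs. The code below gives 4.112G forward MACs for Torch Vision ResNet50-v1.5.

    import torch
    from thop import clever_format, profile
    from torchvision.models import resnet50

    model = resnet50()
    # One 224x224 RGB image, so the result is MACs per example.
    dummy_input = torch.randn(1, 3, 224, 224)
    macs, params = profile(model, inputs=(dummy_input,))
    # clever_format turns the raw counts into strings such as "4.112G".
    print(clever_format([macs, params], "%.3f"))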
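To make the MAC count less of a black box, the sketch below counts the MACs of a single convolution by hand, using the first 7x7 convolution of ResNet50 as the example. The helper names `conv2d_out_size` and `conv2d_macs` are hypothetical, and the counting rule (one MAC per weight per output element, bias ignored) is an assumption about how such counters typically work, not a description of thop's internals.

```python
def conv2d_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_macs(c_in, c_out, kernel, out_h, out_w, groups=1):
    """MACs of one square-kernel conv layer for a single example, bias ignored."""
    # Each output element needs (c_in / groups) * kernel * kernel multiply-accumulates.
    return c_out * out_h * out_w * (c_in // groups) * kernel * kernel


# ResNet50 stem: 7x7 conv, 3 -> 64 channels, stride 2, padding 3, 224x224 input.
out = conv2d_out_size(224, kernel=7, stride=2, padding=3)
stem_macs = conv2d_macs(c_in=3, c_out=64, kernel=7, out_h=out, out_w=out)
print(out, stem_macs)  # 112 118013952
```

The stem alone accounts for roughly 0.118G of the 4.112G total; summing the same expression over every layer is, in essence, what a profiler automates.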
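Step 2: Estimate the T(FL)OPS the model requires per second at a given measured throughput
The estimate is based on the formula from OpenAI's "AI and Compute", applied to a per-second rate:

    required_T(FL)OPS = (MACs per forward pass)
                      * (2 (FL)OPs/MAC)
                      * (3 for forward and backward pass)
                      * (number of examples per second)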
The measured performance data is:

    accelerator   data type   batch size   IPS
    V100          FP16        256          1325
    V100          FP32        128          303.1
Taking V100 FP16 training as an example:

    MACs per forward pass = 4.112G
    number of examples per second = 1325
    required_T(FL)OPS = 4.112G * 2 * 3 * 1325 = 32.69 T
The results are summarized below:

    accelerator   data type   batch size   IPS     required T(FL)OPS
    V100          FP16        256          1325    32.69
    V100          FP32        128          303.1   7.478
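The arithmetic in Step 2 can be reproduced with a few lines of Python. This is a minimal sketch: the function name `required_tflops` and its parameter names are hypothetical, while the MAC count and throughput figures are the ones quoted above.

```python
def required_tflops(forward_macs, examples_per_second,
                    flops_per_mac=2, fwd_bwd_factor=3):
    """Required T(FL)OPS: MACs * 2 (FL)OPs/MAC * 3 (forward + backward) * examples/s."""
    flops_per_second = (forward_macs * flops_per_mac
                        * fwd_bwd_factor * examples_per_second)
    return flops_per_second / 1e12  # (FL)OPS -> T(FL)OPS


FORWARD_MACS = 4.112e9  # ResNet50-v1.5 forward MACs from Step 1

# (accelerator, data type, batch size, images per second) as measured above.
measurements = [
    ("V100", "FP16", 256, 1325),
    ("V100", "FP32", 128, 303.1),
]

results = {}
for accelerator, dtype, batch_size, ips in measurements:
    results[dtype] = required_tflops(FORWARD_MACS, ips)
    print(f"{accelerator} {dtype} bs={batch_size} IPS={ips}: "
          f"{results[dtype]:.4g} T(FL)OPS")
```

Note that batch size does not enter the formula directly; it only matters through the throughput it yields.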
Step 3: Estimate the model's utilization of theoretical peak compute
Theoretical peak compute
Utilization of theoretical peak compute

    peak utilization = required_T(FL)OPS / peak_T(FL)OPS

    accelerator   data type   batch size   IPS     required T(FL)OPS   peak utilization
    V100          FP16        256          1325    32.69               29.2%
    V100          FP32        128          303.1   7.478               53%
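The ratio can be checked with the sketch below. The peak figures are not given in the text, so the values in `ASSUMED_PEAK_TFLOPS` are assumptions back-computed from the ratios in the table (32.69 / 0.292 is about 112, 7.478 / 0.53 is about 14). They coincide with the commonly quoted V100 PCIe peaks, but substitute the datasheet numbers for the exact card and clock being measured. The function name `peak_utilization` is hypothetical.

```python
# ASSUMPTION: peak values inferred from the utilization ratios above,
# not stated in the original text. Replace with the datasheet values for your card.
ASSUMED_PEAK_TFLOPS = {"FP16": 112.0, "FP32": 14.0}

# Required T(FL)OPS from Step 2.
REQUIRED_TFLOPS = {"FP16": 32.69, "FP32": 7.478}


def peak_utilization(required_tflops, peak_tflops):
    """Fraction of theoretical peak compute that the workload needs."""
    if peak_tflops <= 0:
        raise ValueError("peak_tflops must be positive")
    return required_tflops / peak_tflops


utilization = {
    dtype: peak_utilization(REQUIRED_TFLOPS[dtype], ASSUMED_PEAK_TFLOPS[dtype])
    for dtype in REQUIRED_TFLOPS
}
for dtype, value in utilization.items():
    print(f"V100 {dtype}: {value:.1%}")
```

Under these assumed peaks the script gives about 29.2% for FP16 and 53.4% for FP32, in line with the table.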