How large is fp16 accumulation error?

Article URL: https://wanger-sjtu.github.io/fp16-err/

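A recent project required running the FFN computation in fp16. The colleague implementing the operator reported that its output deviated noticeably from the golden data generated on x86. At first I suspected the numerical simulation on the x86 side. After writing my own comparison, it turned out that the accumulated error really is large.

Measured results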

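#include <arm_neon.h>  // assumption: ARM target, where this header provides float16_t
#include <cstddef>
#include <fstream>
#include <iostream>
#include <random>

// Helper the original listing calls but does not show (sketched here):
// dump a raw byte buffer to a file.
static void write2file(const char* path, const char* data, std::size_t size)
{
    std::ofstream out(path, std::ios::binary);
    out.write(data, static_cast<std::streamsize>(size));
}

int main()
{
    // Seed with a real random value, if available
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_real_distribution<> dist(0, 0.01);

    float16_t lhs[4096] = {0};
    float16_t rhs[4096] = {0};
    for (int i = 0; i < 4096; i++) {
        lhs[i] = dist(gen);
        rhs[i] = dist(gen);
    }
    float16_t res_fp16 = 0;  // accumulator rounded to fp16 after every add
    float res_fp32 = 0;      // same products, accumulated in fp32

    for (int i = 0; i < 4096; i++) {
        res_fp16 += lhs[i] * rhs[i];
        res_fp32 += lhs[i] * rhs[i];
    }
    // Cast to float so the stream overload is unambiguous.
    std::cout << "fp16 " << static_cast<float>(res_fp16) << std::endl;
    std::cout << "fp32 " << res_fp32 << std::endl;
    // Save the inputs so the experiment can be repeated elsewhere.
    write2file("/data/local/tmp/lhs", reinterpret_cast<char*>(lhs), sizeof(lhs));
    write2file("/data/local/tmp/rhs", reinterpret_cast<char*>(rhs), sizeof(rhs));
}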

Output:

fp16 0.0942383
fp32 0.103176

The relative error is about 8.7% (|0.0942383 - 0.103176| / 0.103176). No wonder a problem was reported.

dim (vector length)	absolute error
100	1.63913e-07
1000	-0.00033829
2000	-0.000909835
4000	-0.00924221
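
One way to see why the error grows with the vector length is to look at the spacing of fp16 values near the running sum. The sketch below emulates fp16 in pure Python by round-tripping through the half-precision `'e'` format of the `struct` module. The helper names (`to_fp16`, `dot_fp16`), the fixed seed and the double-precision reference are assumptions made for illustration, not the original C++ test, so the numbers will not match the table above exactly.

```python
import math
import random
import struct


def to_fp16(x):
    """Round a Python float to the nearest fp16 value (returned as a float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Spacing of fp16 values in [2**-4, 2**-3), where a sum of about 0.1 lives.
ulp_near_result = 2.0 ** -14                 # about 6.1e-5
typical_product = to_fp16(0.005 * 0.005)     # about 2.5e-5, under half a ulp

# Adding a typical product to a sum near 0.1 leaves the fp16 sum unchanged.
absorbed = to_fp16(to_fp16(0.1) + typical_product) == to_fp16(0.1)


def dot_fp16(lhs, rhs):
    """Dot product with every product and partial sum rounded to fp16."""
    acc = 0.0
    for a, b in zip(lhs, rhs):
        acc = to_fp16(acc + to_fp16(a * b))
    return acc


rng = random.Random(0)  # fixed seed, illustrative data only
lhs = [to_fp16(rng.uniform(0.0, 0.01)) for _ in range(4096)]
rhs = [to_fp16(rng.uniform(0.0, 0.01)) for _ in range(4096)]

errors = {}
for dim in (100, 1000, 2000, 4096):
    # Double-precision reference for the same fp16 inputs.
    reference = math.fsum(a * b for a, b in zip(lhs[:dim], rhs[:dim]))
    errors[dim] = dot_fp16(lhs[:dim], rhs[:dim]) - reference
    print(dim, errors[dim])
print("typical addend absorbed near 0.1:", absorbed)
```

Once the running sum passes 2**-4, many individual products are smaller than half the spacing between neighbouring fp16 values, so they are rounded away entirely, which is why the error is not just random noise but a systematic underestimate.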

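Where does the discrepancy with the golden data come from?

The golden data was generated with the difference in numeric types already taken into account, so why is there still a discrepancy?

I compared the dot implementation against a direct element-by-element accumulation: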

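import numpy as np
import torch

# Load the fp16 inputs dumped by the C++ test (4096 values each).
lhs = torch.from_numpy(np.fromfile("lhs", dtype=np.float16))
rhs = torch.from_numpy(np.fromfile("rhs", dtype=np.float16))

# Accumulate element by element in an fp16 tensor, so that every
# partial sum is rounded to fp16.
res = torch.zeros(1, dtype=torch.float16)
for i in range(4096):
    res += lhs[i:i+1] * rhs[i:i+1]

print(res)                  # manual fp16 accumulation
print(torch.dot(lhs, rhs))  # library dot product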

tensor([0.0942], dtype=torch.float16)
tensor(0.1041, dtype=torch.float16)

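Now the results line up: the manual fp16 loop gives 0.0942, the same as the fp16 accumulation in the C++ test, while torch.dot gives 0.1041, close to the fp32 result. torch's dot implementation very likely accumulates in a higher-precision type.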
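
This hypothesis can be illustrated without torch by emulating both strategies. The sketch below is not torch's implementation; it only contrasts an fp16 accumulator with a wider accumulator that is rounded to fp16 once at the end. The wide accumulator is a Python double here, an assumption standing in for whatever type torch uses internally, and the helper names and seeded data are likewise illustrative.

```python
import math
import random
import struct


def to_fp16(x):
    """Round a Python float to the nearest fp16 value (returned as a float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_fp16_accumulate(lhs, rhs):
    """Every product and every partial sum is rounded to fp16."""
    acc = 0.0
    for a, b in zip(lhs, rhs):
        acc = to_fp16(acc + to_fp16(a * b))
    return acc


def dot_wide_accumulate(lhs, rhs):
    """Accumulate in double precision, round to fp16 only once at the end."""
    acc = 0.0
    for a, b in zip(lhs, rhs):
        acc += a * b
    return to_fp16(acc)


rng = random.Random(0)  # fixed seed, illustrative data only
lhs = [to_fp16(rng.uniform(0.0, 0.01)) for _ in range(4096)]
rhs = [to_fp16(rng.uniform(0.0, 0.01)) for _ in range(4096)]

reference = math.fsum(a * b for a, b in zip(lhs, rhs))
narrow = dot_fp16_accumulate(lhs, rhs)
wide = dot_wide_accumulate(lhs, rhs)

print("reference       ", reference)
print("fp16 accumulator", narrow, "error", narrow - reference)
print("wide accumulator", wide, "error", wide - reference)
```

With a wide accumulator the only fp16 rounding happens on the final value, so the result stays within one fp16 spacing of the reference, which is the pattern seen in the torch.dot output above.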

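How to fix it

If you do not want to bring in higher-precision arithmetic, you can add one extra variable that cancels out part of the error.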

https://oi-wiki.org/misc/kahan-summation/

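When computing $S_{new} = S_{old} + a$ (where $a$ is one element of the floating-point sequence), define the value actually added to $S$ as $a_{eff} = S_{new} - S_{old}$. If $a_{eff}$ is larger than $a$, the addition rounded up; if $a_{eff}$ is smaller than $a$, it rounded down. The round-off error is therefore $E_{roundoff} = a_{eff} - a$, and the value that corrects it is $a - a_{eff}$, i.e. the negative of $E_{roundoff}$. Letting $c$ be the variable that compensates for the lost low-order bits gives $c_{new} = c_{old} + (a - a_{eff})$.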
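
A single step of these definitions can be worked through numerically. The sketch below emulates fp16 with the `struct` module's `'e'` format; the values 0.1 and 4e-5 are chosen purely for illustration, and the variable names mirror the symbols in the paragraph above.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest fp16 value (returned as a float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


s_old = to_fp16(0.1)         # running sum, stored as 1638 * 2**-14
a = to_fp16(4e-5)            # next addend, stored as 671 * 2**-24

s_new = to_fp16(s_old + a)   # fp16 addition: the exact sum is rounded
a_eff = s_new - s_old        # what was actually added (exact in double)
roundoff = a_eff - a         # positive here, so the addition rounded up
c = a - a_eff                # correction to carry into the next step

print("a        =", a)
print("a_eff    =", a_eff)
print("roundoff =", roundoff)
print("c        =", c)
```

Here the sum moves by a full fp16 spacing (2**-14) even though the addend was smaller, so the correction `c` is negative and is itself exactly representable in fp16.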

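Procedure
Kahan summation uses a separate variable to accumulate the error. In the reference code below, sum is the running total that is finally returned, and c is the variable that compensates for the lost low-order bits (the part that was rounded away); it is the essential extra variable of the algorithm. In the code, c holds the round-off (t - sum) - y, i.e. $a_{eff} - a$, and is subtracted from the next input, which is equivalent to adding the correction $a - a_{eff}$.

Because sum is large and y is small, the low-order digits of y are lost in the addition. (t - sum) recovers the high-order part of y, and subtracting y then leaves the negative of the lost low-order part of y. Algebraically c would always be zero; in floating-point arithmetic it captures the rounding error. In the next iteration, the lost low-order part is folded back into y.

Reference code: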

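#include <vector>

// Kahan (compensated) summation.
// Note: aggressive floating-point flags such as -ffast-math may optimize
// the compensation away.
float kahanSum(const std::vector<float>& nums) {
  float sum = 0.0f;  // running total
  float c = 0.0f;    // compensation: round-off of the previous addition
  for (float num : nums) {
    float y = num - c;  // fold the previously lost bits back in
    float t = sum + y;  // low-order bits of y may be lost here
    c = (t - sum) - y;  // (t - sum) is what was actually added; minus y is the round-off
    sum = t;
  }
  return sum;
}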
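
To get a feel for how much compensation helps at fp16 precision, the same loop can be run with every intermediate value rounded to fp16. This is only an emulation sketch: `to_fp16` round-trips through the `struct` module's `'e'` format, the data is seeded random input in [0, 0.01] like the experiment above, and the function names are hypothetical. It says nothing about how a real fp16 kernel should be scheduled.

```python
import math
import random
import struct


def to_fp16(x):
    """Round a Python float to the nearest fp16 value (returned as a float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def naive_sum_fp16(values):
    """Plain accumulation, rounding the running sum to fp16 each step."""
    total = 0.0
    for v in values:
        total = to_fp16(total + v)
    return total


def kahan_sum_fp16(values):
    """Kahan summation with every intermediate rounded to fp16."""
    total = 0.0
    c = 0.0
    for v in values:
        y = to_fp16(v - c)
        t = to_fp16(total + y)
        c = to_fp16(to_fp16(t - total) - y)
        total = t
    return total


rng = random.Random(0)  # fixed seed, illustrative data only
lhs = [to_fp16(rng.uniform(0.0, 0.01)) for _ in range(4096)]
rhs = [to_fp16(rng.uniform(0.0, 0.01)) for _ in range(4096)]
products = [to_fp16(a * b) for a, b in zip(lhs, rhs)]

reference = math.fsum(products)  # double-precision sum of the same addends
naive_err = naive_sum_fp16(products) - reference
kahan_err = kahan_sum_fp16(products) - reference

print("naive fp16 error:", naive_err)
print("Kahan fp16 error:", kahan_err)
```

The compensated loop costs three extra fp16 operations per element but keeps the accumulator in fp16, which matches the constraint of not introducing a higher-precision type.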

Posted 2024-09-22 14:59 by 青铜时代的猪
