Relative Error and Absolute Error in Numerical Computation
Relative error:
Relative tolerance (RTOL) controls local error relative to the size of the solution. For example, RTOL = 1e-4 means that errors are controlled to 0.01%.
Absolute error:
Absolute tolerance (ATOL) controls error when a solution component may be small.
Example: for a solution that starts at a nonzero value but decays to noise level, ATOL should be set to the noise level.
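To make the difference concrete, here is a minimal sketch in plain Python. The function names and the sample values (a reference that has decayed to about 1e-12, and an ATOL of 1e-10) are illustrative assumptions, not taken from any particular solver.

```python
def absolute_error(approx: float, ref: float) -> float:
    """|approx - ref|, in the units of the solution."""
    return abs(approx - ref)


def relative_error(approx: float, ref: float) -> float:
    """|approx - ref| / |ref|; undefined when ref == 0."""
    return abs(approx - ref) / abs(ref)


# A component of ordinary size: about 0.01% relative error (RTOL = 1e-4 scale)
print(relative_error(1.0001, 1.0))

# A component that has decayed to noise level (~1e-12)
ref, approx = 1e-12, 3e-12
print(relative_error(approx, ref))  # about 2.0, i.e. 200%: fails any sensible RTOL
print(absolute_error(approx, ref))  # about 2e-12: well within an ATOL of 1e-10
```

Near zero the relative error explodes even though both values are just noise, which is why a relative bound for large components is paired with an absolute floor for small ones.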
Operator Numerical Check
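1. Basic formula
$$|a - b| \le \mathrm{atol} + \mathrm{rtol} \cdot |b|$$
where:
a: the value being checked (e.g. the actual output)
b: the reference value
atol: absolute tolerance
rtol: relative tolerance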
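A direct transcription of the mixed test |a - b| <= atol + rtol * |b| into Python makes two properties visible. This is a sketch: `within_tolerance` is a hypothetical name, and NaN/Inf handling is left out. First, only the reference b scales the relative term, so swapping the arguments can change the answer. Second, when b is 0 the relative term vanishes and atol alone decides.

```python
def within_tolerance(a: float, b: float, rtol: float, atol: float) -> bool:
    """Mixed absolute/relative check |a - b| <= atol + rtol * |b|.

    b is treated as the reference value; NaN and Inf are not handled here.
    """
    return abs(a - b) <= atol + rtol * abs(b)


# Asymmetry: the relative term is scaled by |b| only
print(within_tolerance(1.0, 1.1, rtol=0.095, atol=0.0))  # True:  0.1 <= 0.095 * 1.1
print(within_tolerance(1.1, 1.0, rtol=0.095, atol=0.0))  # False: 0.1 >  0.095 * 1.0

# Near zero the relative term vanishes, so atol has to carry the check
print(within_tolerance(1e-9, 0.0, rtol=1e-5, atol=0.0))   # False
print(within_tolerance(1e-9, 0.0, rtol=1e-5, atol=1e-8))  # True
```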
NaN and Inf behavior:
NaNs are treated as equal if they are in the same place and if equal_nan=True. Infs are treated as equal if they are in the same place and of the same sign in both arrays.
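The NaN/Inf rules quoted above can be sketched for scalars as follows. `isclose_scalar` is a hypothetical helper, not NumPy's implementation; its defaults (rtol=1e-5, atol=1e-8, equal_nan=False) simply mirror `numpy.allclose`. The explicit Inf branch matters: `inf - inf` is NaN, so the plain formula would reject two equal infinities.

```python
import math


def isclose_scalar(a: float, b: float, rtol: float = 1e-5, atol: float = 1e-8,
                   equal_nan: bool = False) -> bool:
    """Scalar sketch of |a - b| <= atol + rtol * |b| with the NaN/Inf rules."""
    if math.isnan(a) or math.isnan(b):
        # NaN matches only NaN, and only when equal_nan=True
        return equal_nan and math.isnan(a) and math.isnan(b)
    if math.isinf(a) or math.isinf(b):
        # Inf matches only an Inf of the same sign
        return a == b
    return abs(a - b) <= atol + rtol * abs(b)


nan, inf = float("nan"), float("inf")
print(isclose_scalar(nan, nan))                  # False
print(isclose_scalar(nan, nan, equal_nan=True))  # True
print(isclose_scalar(inf, inf))                  # True
print(isclose_scalar(inf, -inf))                 # False
print(isclose_scalar(inf, 1e308))                # False
```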
2. Library behavior
NumPy
NumPy provides two corresponding APIs:
[numpy.allclose(a, b, rtol=1.e-5, atol=1.e-8, equal_nan=False)](https://github.com/numpy/numpy/blob/19989c21b6b7a45da80b21d3abde485542ae4eb1/numpy/core/numeric.py#L2189)
[testing.assert_allclose(actual, desired, rtol=1e-07, atol=0, equal_nan=True)](https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_allclose.html)
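The two NumPy APIs differ in their defaults, so the same data can pass one and fail the other. A small sketch with arbitrary sample values: `assert_allclose` uses rtol=1e-7 and atol=0, so a 5e-6 deviation at 1.0 fails, and any nonzero value compared against an exact 0 fails too; conversely, it treats NaNs as equal by default while `allclose` does not.

```python
import numpy as np

desired = np.array([1.0, 0.0])
actual = np.array([1.0 + 5e-6, 5e-9])

# numpy.allclose defaults: rtol=1e-5, atol=1e-8, equal_nan=False
loose = np.allclose(actual, desired)

# numpy.testing.assert_allclose defaults: rtol=1e-7, atol=0, equal_nan=True
try:
    np.testing.assert_allclose(actual, desired)
    strict = True
except AssertionError:
    strict = False

print(loose, strict)  # True False

# The NaN defaults differ as well
nan = np.array([np.nan])
print(np.allclose(nan, nan))          # False: equal_nan=False by default
np.testing.assert_allclose(nan, nan)  # passes: equal_nan=True by default
```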
PyTorch
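PyTorch uses the same default tolerances as numpy.allclose. The API is:
torch.allclose(input, other, rtol=1e-05, atol=1e-08, equal_nan=False)
Each op can either use the defaults or set its own tolerances; the tolerance for each op can be found in the source code. The tolerances for Caffe2 operators can likewise be found in the source code.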
Examples
op | Data type | absolute tolerance | relative tolerance |
---|---|---|---|
MatMul(FWD) | FP32 | 1e-4 | 1e-4 |
MatMul(FWD) | FP16 | 150 * 1e-4 | 150 * 1e-4 |
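Why would FP16 need a tolerance 150 times looser? A rough experiment can illustrate it. NumPy stands in for PyTorch here, and the 16x16 shapes, the seed and the uniform inputs are arbitrary assumptions. It compares a float16 matmul against a float64 reference. The exact errors depend on the data and on how NumPy accumulates float16 products (an implementation detail), so treat the result as an illustration: the FP32-style tolerance is typically violated, while the FP16 tolerance leaves ample room.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((16, 16))
y = rng.random((16, 16))

# Reference product computed in float64
desired = x @ y

# Same product with float16 inputs (and a float16 result), widened for comparison
actual = (x.astype(np.float16) @ y.astype(np.float16)).astype(np.float64)

print("max abs error:", np.max(np.abs(actual - desired)))
# FP32 row of the table: atol = rtol = 1e-4
print(np.allclose(actual, desired, rtol=1e-4, atol=1e-4))
# FP16 row of the table: atol = rtol = 150 * 1e-4
print(np.allclose(actual, desired, rtol=150 * 1e-4, atol=150 * 1e-4))
```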
TensorFlow
TF checks results with the following test util:
or with its simplified version, assertAllClose.
The default tolerances for each data type are:
Data type | absolute tolerance | relative tolerance |
---|---|---|
FP64 | 2.22e-15 | 2.22e-15 |
FP32 | 1e-6 | 1e-6 |
FP16 | 1e-3 | 1e-3 |
BF16 | 1e-2 | 1e-2 |
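To put these defaults in scale, the snippet below compares each one with the machine epsilon of its type (the gap between 1.0 and the next representable number). The FP64 entry works out to about 10 times float64 epsilon, and every entry falls between roughly 1 and 10 times its type's epsilon. NumPy has no native bfloat16, so the BF16 epsilon is written out by hand (2^-7, from its 8-bit significand).

```python
import numpy as np

# Default tolerances from the table above (atol == rtol for each type)
tf_default_tol = {"FP64": 2.22e-15, "FP32": 1e-6, "FP16": 1e-3, "BF16": 1e-2}

# Machine epsilon per type
eps = {
    "FP64": float(np.finfo(np.float64).eps),  # 2**-52
    "FP32": float(np.finfo(np.float32).eps),  # 2**-23
    "FP16": float(np.finfo(np.float16).eps),  # 2**-10
    "BF16": 2.0 ** -7,  # no native bfloat16 in NumPy; 8-bit significand
}

for dtype, tol in tf_default_tol.items():
    ratio = tol / eps[dtype]
    print(f"{dtype}: tol={tol:.3g}  eps={eps[dtype]:.3g}  tol/eps={ratio:.1f}")
```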
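TF describes its criterion for choosing the default tolerances as follows:
The tolerance for a specific op can be found in that op's test code or in the test code of the corresponding kernel.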
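Test code that covers several data types usually picks its tolerance from the dtype of the operands. The sketch below shows that idea using the default table above. `assert_close_for_dtype` and `_DEFAULT_TOL` are hypothetical names, not TF APIs, and bfloat16 is omitted because NumPy has no native type for it. Taking the looser tolerance of the two operands mirrors the idea that one low-precision operand limits the achievable accuracy.

```python
import numpy as np

# Hypothetical per-dtype tolerance table (atol == rtol), from the TF defaults above
_DEFAULT_TOL = {
    np.dtype(np.float64): 2.22e-15,
    np.dtype(np.float32): 1e-6,
    np.dtype(np.float16): 1e-3,
}


def assert_close_for_dtype(actual, desired):
    """Hypothetical helper: assert_allclose with a dtype-dependent tolerance.

    Raises KeyError for dtypes missing from the table (e.g. integers).
    """
    actual = np.asarray(actual)
    desired = np.asarray(desired)
    # The less precise operand determines the tolerance
    tol = max(_DEFAULT_TOL[actual.dtype], _DEFAULT_TOL[desired.dtype])
    np.testing.assert_allclose(actual, desired, rtol=tol, atol=tol)


assert_close_for_dtype(np.float16(1 / 3), 1 / 3)  # passes with the FP16 tolerance
assert_close_for_dtype(np.float32(1 / 3), 1 / 3)  # passes with the FP32 tolerance
```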
Examples
op | Data type | absolute tolerance | relative tolerance |
---|---|---|---|
MatMul(FWD) | FP32 | 3e-5 | 3e-5 |
MatMul(FWD) | FP16 | 0.2 | 0.2 |
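Because the check is atol + rtol * |b|, the error each table entry actually allows grows with the magnitude of the reference output. The short calculation below copies the (atol, rtol) pairs from the PyTorch and TF MatMul example tables and evaluates that bound at a few arbitrary reference magnitudes. `allowed_error` is a hypothetical helper name.

```python
# (atol, rtol) pairs copied from the two MatMul example tables
matmul_tol = {
    ("PyTorch", "FP32"): (1e-4, 1e-4),
    ("PyTorch", "FP16"): (150 * 1e-4, 150 * 1e-4),
    ("TF", "FP32"): (3e-5, 3e-5),
    ("TF", "FP16"): (0.2, 0.2),
}


def allowed_error(atol: float, rtol: float, ref_magnitude: float) -> float:
    """Largest |actual - desired| that still passes atol + rtol * |b|."""
    return atol + rtol * ref_magnitude


for (framework, dtype), (atol, rtol) in matmul_tol.items():
    bounds = [allowed_error(atol, rtol, ref) for ref in (0.0, 1.0, 100.0)]
    print(framework, dtype, ["%.3g" % b for b in bounds])
```

At a reference magnitude of 100, for instance, TF's FP32 entry allows an error of about 3e-3, while its FP16 entry allows about 20.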