Operator Numerical Check

姚伟峰

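Basic Formula

$$|a - b| \le \mathrm{atol} + \mathrm{rtol} \cdot |b|$$

where a is the value under test, b is the reference value, and: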
atol: absolute tolerance
rtol: relative tolerance
NaN and Inf behavior:
NaNs are treated as equal if they are in the same place and if equal_nan=True. Infs are treated as equal if they are in the same place and of the same sign in both arrays.
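
The closeness test |a - b| <= atol + rtol * |b| and the NaN/Inf rules quoted above can be sketched in a few lines of NumPy. The helper name `isclose_sketch` is hypothetical and exists only for illustration; it is meant to mirror the documented behavior of `numpy.isclose`, not to reproduce its implementation.

```python
import numpy as np

def isclose_sketch(a, b, rtol=1e-5, atol=1e-8, equal_nan=False):
    """Element-wise |a - b| <= atol + rtol * |b| with NaN/Inf handling (illustrative only)."""
    a = np.atleast_1d(np.asarray(a, dtype=np.float64))
    b = np.atleast_1d(np.asarray(b, dtype=np.float64))
    a, b = np.broadcast_arrays(a, b)

    finite = np.isfinite(a) & np.isfinite(b)
    close = np.zeros(a.shape, dtype=bool)

    # Finite pairs: apply the tolerance formula; note that only |b| scales rtol.
    close[finite] = np.abs(a[finite] - b[finite]) <= atol + rtol * np.abs(b[finite])

    # Pairs involving Inf: equal only if same place and same sign.
    # (NaN == NaN is False, so NaNs stay unequal here.)
    close[~finite] = a[~finite] == b[~finite]

    # NaNs in the same place count as equal only when requested.
    if equal_nan:
        close |= np.isnan(a) & np.isnan(b)
    return close

a = np.array([1.0, np.nan, np.inf, np.inf, 1.0])
b = np.array([1.0 + 1e-9, np.nan, np.inf, -np.inf, 1.1])
print(isclose_sketch(a, b))
print(isclose_sketch(a, b, equal_nan=True))
```

Because only |b| appears on the right-hand side, the test is not symmetric in a and b, which is why the reference value is conventionally passed as the second argument.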

Framework Behavior

NumPy

NumPy provides two corresponding interfaces:

  1. numpy.allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)

  2. numpy.testing.assert_allclose(actual, desired, rtol=1e-07, atol=0, equal_nan=True)
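
The two interfaces differ in more than their return style: their defaults for rtol, atol and equal_nan are all different. The sketch below, using made-up sample arrays, shows how the same pair of inputs can pass one check and fail the other.

```python
import numpy as np

actual = np.array([1.0, 2.0, np.nan])
desired = np.array([1.0 + 1e-6, 2.0, np.nan])

# numpy.allclose returns a bool. With equal_nan=False (the default),
# the NaN pair makes the comparison fail.
loose_default = bool(np.allclose(actual, desired))

# With equal_nan=True the NaNs match, and 1e-6 <= 1e-8 + 1e-5 * |desired| holds.
loose_nan_ok = bool(np.allclose(actual, desired, equal_nan=True))

# numpy.testing.assert_allclose raises AssertionError instead of returning False.
# Its defaults (rtol=1e-7, atol=0) are stricter, so the 1e-6 difference is rejected
# even though NaNs are treated as equal by default.
try:
    np.testing.assert_allclose(actual, desired)
    strict_passed = True
except AssertionError:
    strict_passed = False

# Relaxing rtol to the numpy.allclose default makes the assertion pass.
np.testing.assert_allclose(actual, desired, rtol=1e-5)

print(loose_default, loose_nan_ok, strict_passed)
```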

PyTorch

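PyTorch uses the same default tolerances as NumPy. The API is:
torch.allclose(input, other, rtol=1e-05, atol=1e-08, equal_nan=False)

Each op may either use the defaults or set its own tolerances. The tolerance of each individual op can be found in the source code, and the tolerances of the Caffe2 operators can likewise be found in the source code.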

Examples

| Op | Data type | Absolute tolerance | Relative tolerance |
|---|---|---|---|
| MatMul (FWD) | FP32 | 1e-4 | 1e-4 |
| MatMul (FWD) | FP16 | 150 * 1e-4 | 150 * 1e-4 |

(Values taken from the source code.)
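
To illustrate how per-dtype tolerances like the ones in the table are typically applied, here is a minimal NumPy-only sketch. The dictionary `MATMUL_FWD_TOL` and the helper `check_matmul` are hypothetical names; only the tolerance values come from the table. It does not use PyTorch's own test harness, and the choice of a float64 reference computed from the already-rounded inputs is an assumption.

```python
import numpy as np

# Tolerance values copied from the table above; the structure is illustrative.
MATMUL_FWD_TOL = {
    np.dtype(np.float32): dict(atol=1e-4, rtol=1e-4),
    np.dtype(np.float16): dict(atol=150 * 1e-4, rtol=150 * 1e-4),
}

def check_matmul(a, b, dtype):
    """Compare a low-precision matmul against a float64 reference."""
    dtype = np.dtype(dtype)
    a_low, b_low = a.astype(dtype), b.astype(dtype)

    # Result under test: computed in the low-precision dtype.
    actual = (a_low @ b_low).astype(np.float64)

    # Reference: the same (already rounded) inputs multiplied in float64,
    # so only the error of the computation itself is measured.
    desired = a_low.astype(np.float64) @ b_low.astype(np.float64)

    return bool(np.allclose(actual, desired, **MATMUL_FWD_TOL[dtype]))

rng = np.random.default_rng(0)
a = rng.random((4, 4))  # values in [0, 1)
b = rng.random((4, 4))

fp32_ok = check_matmul(a, b, np.float32)
fp16_ok = check_matmul(a, b, np.float16)
print(fp32_ok, fp16_ok)
```

The much looser FP16 tolerance reflects the fact that FP16 carries only about three decimal digits, so rounding error accumulated over the reduction dimension is far larger than in FP32.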

TensorFlow

TF uses the following test utility for the check:
assertAllCloseAccordingToType(a, b, rtol=1e-06, atol, float_rtol, float_atol, half_rtol, half_atol, bfloat16_rtol, bfloat16_atol, msg=None), or its simplified version assertAllClose.
The default tolerances for the different data types are as follows:

| Data type | Absolute tolerance | Relative tolerance |
|---|---|---|
| FP64 | 2.22e-15 | 2.22e-15 |
| FP32 | 1e-6 | 1e-6 |
| FP16 | 1e-3 | 1e-3 |
| BF16 | 1e-2 | 1e-2 |
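
The idea of choosing the tolerance from the data type can be sketched without TensorFlow. The helper `allclose_by_type` and the table `DEFAULT_TOL` below are hypothetical; the values are copied from the table above, BF16 is left out because NumPy has no built-in bfloat16 dtype, and picking the looser tolerance when the two inputs differ in dtype is an assumption of this sketch.

```python
import numpy as np

# Default tolerances from the table above, keyed by NumPy dtype (illustrative).
DEFAULT_TOL = {
    np.dtype(np.float64): 2.22e-15,
    np.dtype(np.float32): 1e-6,
    np.dtype(np.float16): 1e-3,
}

def allclose_by_type(a, b):
    """Pick atol/rtol from the input dtypes, then apply the usual closeness test."""
    a, b = np.asarray(a), np.asarray(b)
    # Assumption: the lower-precision input governs, so take the looser tolerance.
    tol = max(DEFAULT_TOL[a.dtype], DEFAULT_TOL[b.dtype])
    return bool(np.allclose(a.astype(np.float64), b.astype(np.float64),
                            rtol=tol, atol=tol))

x64 = np.array([1.0, 2.0])
y64 = x64 + 5e-7

# The same perturbation is rejected in FP64 but accepted in FP32.
fp64_close = allclose_by_type(x64, y64)
fp32_close = allclose_by_type(x64.astype(np.float32), y64.astype(np.float32))

# Mixed precision: the FP16 tolerance applies.
mixed_close = allclose_by_type(np.array([1.0, 2.0], dtype=np.float16),
                               np.array([1.0005, 2.001]))
print(fp64_close, fp32_close, mixed_close)
```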

TF describes the criterion for choosing the default tolerances as follows:

The default atol and rtol is 10 * eps, where eps is the smallest representable positive number such that 1 + eps != 1. This is about 1.2e-6 in 32bit, 2.22e-15 in 64bit, and 0.00977 in 16bit. See numpy.finfo.
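
The figures in the quoted description can be reproduced with numpy.finfo, which the quote itself points to. This is only a quick check of the arithmetic; the variable name `ten_eps` is chosen here for illustration.

```python
import numpy as np

ten_eps = {}
for dtype in (np.float64, np.float32, np.float16):
    # Machine epsilon: the gap between 1.0 and the next representable number.
    eps = float(np.finfo(dtype).eps)
    ten_eps[np.dtype(dtype).name] = 10 * eps
    print(f"{np.dtype(dtype).name}: eps = {eps:.3e}, 10 * eps = {10 * eps:.3e}")

# Sanity check of the definition for float32: adding eps changes 1.0.
one = np.float32(1.0)
eps32 = np.finfo(np.float32).eps
changes_one = bool(one + eps32 != one)
```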

The tolerance of each individual op can be found in the test code of the corresponding op or in the test code of the corresponding kernel.

Examples

| Op | Data type | Absolute tolerance | Relative tolerance |
|---|---|---|---|
| MatMul (FWD) | FP32 | 3e-5 | 3e-5 |
| MatMul (FWD) | FP16 | 0.2 | 0.2 |

(Values taken from the source code.)

Posted on 2022-01-21 14:32 by 姚伟峰
