TVM Model Quantization
[RFC] Search-based Automated Quantization
- I proposed a new quantization framework that brings hardware and learning methods into the loop.
- Borrowing ideas from existing quantization frameworks, I chose to adopt the annotation-calibration-realization three-phase design:
- Annotation: The annotation pass rewrites the graph and inserts simulated quantize operations according to the rewrite function of each operator. A simulated quantize operation simulates the rounding error and saturation error of quantizing from float to integer.
- Calibration: The calibration pass adjusts the thresholds of the simulated quantize operations to reduce the accuracy drop.
- Realization: The realization pass transforms the simulation graph, which still actually computes in float32, into a real low-precision integer graph.
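The simulated quantize operation described in the annotation phase can be sketched as follows. This is a minimal numpy illustration of the idea (fake quantization: round and saturate as int8 would, but keep the result in float32), not TVM's actual implementation; the function name, the symmetric threshold scheme, and the 8-bit default are assumptions for the sake of the example:

```python
import numpy as np

def simulated_quantize(x, threshold, nbit=8):
    """Sketch of a simulated quantize op: applies the rounding and
    saturation errors of nbit integer quantization while keeping the
    data in float32 (illustrative only, not TVM's implementation)."""
    qmax = 2 ** (nbit - 1) - 1            # 127 for int8
    scale = threshold / qmax              # map [-threshold, threshold] onto the int range
    q = np.round(x / scale)               # rounding error introduced here
    q = np.clip(q, -qmax - 1, qmax)       # saturation error introduced here
    return (q * scale).astype(np.float32) # dequantize back to float32

x = np.array([0.1, -0.5, 2.0, -3.0], dtype=np.float32)
y = simulated_quantize(x, threshold=1.0)
# values beyond the threshold (2.0, -3.0) are saturated near +/-1.0
```

The calibration phase then searches over `threshold` per layer: a smaller threshold gives finer resolution inside the range but more saturation outside it, and the trade-off is what calibration tunes to minimize the accuracy drop.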
References:
https://www.twblogs.net/a/5eedc7fee3ae0757d21ab20e/?lang=zh-cn
https://discuss.tvm.apache.org/t/int8-quantization-quantizing-models/517/4
https://discuss.tvm.apache.org/t/int8-quantization-proposal/516
https://discuss.tvm.apache.org/t/rfc-improvements-to-automatic-quantization-for-bare-metal/7108
https://discuss.tvm.apache.org/t/quantization-pytorch-dynamic-quantization/10294
https://discuss.tvm.apache.org/t/rfc-quantization-quantization-in-tvm/9161