Reading the ONNX Runtime Source: How ml-Values Are Categorized for Memory Management
Source: the comment in include/onnxruntime/core/framework/alloc_kind.h
During inference, ONNX Runtime deals with the following categories of values (ml-Values) with respect to memory management:
- inference inputs: memory is allocated and freed by the caller; by default the runtime treats these values as read-only.
- inference outputs: memory is allocated by the runtime, and ownership is transferred to the caller.
- weights (constant tensors): allocated only once, and reused by every inference call within an InferenceSession.
- tensor values: the lifetimes of these tensor values are statically determined, which enables memory reuse/sharing optimizations; the runtime allocates and frees them at the right time.
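To make the ownership rules concrete, here is a toy Python sketch. It is not the ONNX Runtime implementation or API; the class and method names are hypothetical, and it only mimics who owns which buffer in each category:

```python
class ToySession:
    """Toy model of the four ml-Value categories (hypothetical, not the real API)."""

    def __init__(self, weights):
        # weights (constant tensors): allocated once at session creation,
        # then reused (read-only) by every run() call.
        self._weights = tuple(weights)

    def run(self, inputs):
        # inference inputs: owned by the caller; the "runtime" only reads them.
        # tensor values: intermediates whose lifetime is statically known;
        # here they exist only inside this call and are freed on return.
        intermediate = [x * w for x, w in zip(inputs, self._weights)]
        # inference outputs: allocated by the runtime, ownership handed to the caller.
        return [v + 1 for v in intermediate]

session = ToySession(weights=[2, 3])
first = session.run([10, 20])   # caller now owns `first`
second = session.run([1, 1])    # weights were reused, not reallocated
```

The point of the sketch is the asymmetry: the caller allocates inputs, the runtime allocates outputs, and weights outlive any single call.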
The original comment reads:
The ml-Values fall into the following categories with respect to their
memory management:
- inference inputs: owned (allocated and freed) by caller, and is by
default read-only by the runtime.
- inference outputs: allocated by runtime, ownership transferred to
caller. TODO: Make sure this semantics is clear in InferenceSession API.
- weights (constant tensors): can be allocated once (statically), and
reused by all inference calls within an InferenceSession.
- tensor values: The lifetimes of these tensor-values are statically
determined, which is used for memory reuse/sharing optimizations. The
runtime allocates/frees these values at the right time (as determined
by the static allocation plan). Note that this is simplified since we
do not try to optimize for "slice" like ops, where we may be able to
conditionally reuse memory/data in some cases but not others.
Generalizing this is future work.
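The "static allocation plan" mentioned above boils down to knowing, at planning time, the last step at which each tensor is read, so its buffer can be handed to a later tensor. Below is a hedged toy sketch of that idea in Python (not ONNX Runtime's actual planner; `plan_allocations` and its input format are invented for illustration):

```python
def plan_allocations(ops):
    """Toy static allocation plan.

    `ops` is a list of (output_name, input_names) pairs in execution order.
    Returns a mapping tensor_name -> buffer_id, reusing a buffer once the
    step that last reads its tensor has run.
    """
    # Last step at which each tensor is read.
    last_use = {}
    for step, (_, inputs) in enumerate(ops):
        for name in inputs:
            last_use[name] = step

    free_buffers = []   # buffer ids available for reuse
    next_buffer = 0
    assignment = {}
    for step, (out, inputs) in enumerate(ops):
        if free_buffers:
            # Reuse the buffer of a tensor that is already dead.
            assignment[out] = free_buffers.pop()
        else:
            assignment[out] = next_buffer
            next_buffer += 1
        # Release buffers whose tensors are read for the last time here.
        for name in inputs:
            if last_use.get(name) == step and name in assignment:
                free_buffers.append(assignment[name])
    return assignment

# A simple chain a -> b -> c -> d: once "b" has been consumed to produce
# "c", its buffer can be reused for "d". ("a" is an inference input and
# is caller-owned, so the planner never assigns it a buffer.)
ops = [("b", ["a"]), ("c", ["b"]), ("d", ["c"])]
plan = plan_allocations(ops)
```

This also shows why "slice"-like ops are hard, as the comment notes: whether a slice can alias its input's buffer depends on the case, so a purely static last-use analysis like this one cannot always reuse that memory.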