Reading the ONNX Runtime Source: How ml-Values Are Categorized for Memory Management

Source: the comment in include/onnxruntime/core/framework/alloc_kind.h

During inference, ONNX Runtime distinguishes the following categories of values (ml-Values) with respect to memory management (a usage sketch follows the list):

  • inference inputs: allocated and freed by the caller; by default the runtime treats them as read-only
  • inference outputs: allocated by the runtime, with ownership transferred to the caller
  • weights (constant tensors): allocated once and reused by every inference call within an InferenceSession
  • tensor values: tensors whose lifetimes are determined statically, which enables memory reuse/sharing optimizations; the runtime allocates and frees them at the right time according to the static allocation plan
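To make the first two categories concrete, here is a minimal sketch using the ONNX Runtime C++ API. The model path "model.onnx", the I/O names "input"/"output", and the tensor shape are placeholders for illustration: the input tensor merely wraps caller-owned memory, while the output returned by Run() is allocated by the runtime and handed to the caller.

```cpp
#include <onnxruntime_cxx_api.h>
#include <array>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "demo");
  Ort::SessionOptions opts;
  // Path type is ORTCHAR_T*: char* on Linux/macOS, wchar_t* on Windows.
  Ort::Session session(env, "model.onnx", opts);

  // Inference input: the buffer is owned by the caller. CreateTensor does
  // NOT copy input_data, it only wraps the pointer, so the caller must keep
  // the buffer alive (and unchanged) for the duration of Run().
  std::array<float, 4> input_data{1.f, 2.f, 3.f, 4.f};
  std::array<int64_t, 2> shape{1, 4};
  Ort::MemoryInfo mem_info =
      Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<float>(
      mem_info, input_data.data(), input_data.size(),
      shape.data(), shape.size());

  const char* input_names[] = {"input"};    // placeholder name
  const char* output_names[] = {"output"};  // placeholder name

  // Inference output: allocated by the runtime; the returned Ort::Value
  // owns the buffer, so ownership transfers to the caller here and the
  // memory is released when `outputs` goes out of scope.
  std::vector<Ort::Value> outputs = session.Run(
      Ort::RunOptions{nullptr}, input_names, &input, 1, output_names, 1);

  float* out = outputs[0].GetTensorMutableData<float>();
  (void)out;
  return 0;
}
```

Note that this is exactly the contract described above: the runtime never frees the input buffer, and the caller never has to ask the runtime to free the output.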

The original comment reads:

The ml-Values fall into the following categories with respect to their
memory management:

  • inference inputs: owned (allocated and freed) by caller, and is by
    default read-only by the runtime.
  • inference outputs: allocated by runtime, ownership transferred to
    caller. TODO: Make sure this semantics is clear in InferenceSession API.
  • weights (constant tensors): can be allocated once (statically), and
    reused by all inference calls within an InferenceSession.
  • tensor values: The lifetimes of these tensor-values are statically
    determined, which is used for memory reuse/sharing optimizations. The
    runtime allocates/frees these values at the right time (as determined
    by the static allocation plan). Note that this is simplified since we
    do not try to optimize for "slice" like ops, where we may be able to
    conditionally reuse memory/data in some cases but not others.
    Generalizing this is future work.
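For context, the same header defines the AllocKind enum that the allocation planner attaches to each ml-Value. The sketch below is reconstructed from the version of the header I read; enumerator names and values may differ across releases, and the trailing comments are my own reading of how they map onto the categories above:

```cpp
// Sketch of enum class AllocKind from
// include/onnxruntime/core/framework/alloc_kind.h (may vary by version).
enum class AllocKind {
  kNotSet = -1,
  kAllocate = 0,            // runtime allocates/frees per the static plan (tensor values)
  kReuse = 1,               // reuses the buffer of another ml-Value
  kPreExisting = 2,         // memory owned by the caller (e.g. inference inputs)
  kAllocateStatically = 3,  // allocated once, shared by all calls in a session (weights)
  kAllocateOutput = 4,      // allocated by the runtime, ownership transferred (outputs)
  kShare = 5,               // shares memory with another value
};
```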