Extending TVM with Dynamic Execution
Outline
● Motivation for Dynamism
● Representing Dynamism
● Executing Dynamism
● Evaluation
Dynamic Neural Networks
● Networks are exhibiting more and more dynamism
○ Dynamic inputs: batch size, image size, sequence length, etc.
○ Control-flow, recursion, conditionals and loops (in Relay today).
○ Dynamically sized tensors
■ The output shapes of some ops are data dependent: arange, nms, etc.
■ Control flow: concatenation within a while loop
● A central challenge is how to both represent and execute these networks.
fn network(input: Tensor<(n,3,1024,1024), float32>) -> … { … }
%t1 : Tensor<(1), f32>
%t2 : Tensor<(10), f32>
if (%cond) { … } else { … } : Tensor<(?), f32>
%start, %stop, %step : i32
arange(%start, %stop, %step) : Tensor<(?), f32>
● We will address these two challenges at various levels of the TVM stack and share initial promising results.
Outline
● Motivation for Dynamism
● Representing Dynamism
● Executing Dynamism
● Evaluation
Representing dynamics in TVM
● Add Relay support for dynamic dimensions (Any-dim)
● Use shape functions to compute runtime shapes.
● Support Any in the Tensor Expression (TE) IR (sketched below).
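As a minimal sketch (not from the slides; it assumes the standard TVM te Python API, which may differ across versions), a symbolic size variable can stand in for an unknown dimension in TE, and the built kernel then works for any runtime value of that dimension:

    import tvm
    from tvm import te

    # "n" is a symbolic dimension, unknown at compile time.
    n = te.var("n")
    A = te.placeholder((n, 3), name="A", dtype="float32")
    B = te.compute((n, 3), lambda i, j: A[i, j] * 2.0, name="B")

    # The generated kernel reads n from the runtime tensor shapes,
    # so one compiled function serves every batch size.
    s = te.create_schedule(B.op)
    f = tvm.build(s, [A, B], target="llvm")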
Any: typing dynamic dimension in Relay
Any: represent an unknown dimension at compilation time.
Define a tensor type: Tensor<(Any, 3, 32, 32), fp32>
Define type relation:
arange: fn(start:fp32, stop:fp32, step:fp32) -> Tensor<(Any), fp32>
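For illustration, a rough sketch of the same types in the Relay Python API (assuming a recent TVM; not code from the slides):

    import tvm
    from tvm import relay

    # Tensor<(Any, 3, 32, 32), fp32>: the batch dimension is unknown until runtime.
    ty = relay.TensorType((relay.Any(), 3, 32, 32), "float32")

    # arange: the output length depends on runtime values, so type inference
    # assigns the result an Any-sized shape.
    start = relay.var("start", shape=(), dtype="float32")
    stop = relay.var("stop", shape=(), dtype="float32")
    step = relay.var("step", shape=(), dtype="float32")
    fn = relay.Function([start, stop, step], relay.arange(start, stop, step))
    mod = relay.transform.InferType()(tvm.IRModule.from_expr(fn))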
How to compute and check shape dynamically?
Challenges
● Static type checking cannot eliminate all errors
● The type checking system is too heavyweight for runtime
Approach
● Instrument shape computing functions into the program
Shape function
● Register a shape function for each operator to check the type and compute the output shape
● A shape function has two modes: (op_attrs, input_tensors, out_ndims) -> out_shape_tensors (see the sketch below)
○ Data independent: (op_attrs, input_shapes, out_ndims) -> out_shape_tensors
○ Data dependent: (op_attrs, input_data, out_ndims) -> out_shape_tensors
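As a purely illustrative sketch (plain Python, not TVM's actual registration API), the two modes differ only in what the shape function reads: shapes versus values.

    import math

    # Data independent: output shape follows from the input *shapes* alone
    # (illustrated here for same-rank broadcasting).
    def broadcast_shape_func(op_attrs, input_shapes, out_ndims):
        a, b = input_shapes
        return [[max(x, y) for x, y in zip(a, b)]]

    # Data dependent: output shape follows from the input *values*,
    # e.g. arange needs start/stop/step to know how many elements it emits.
    def arange_shape_func(op_attrs, input_data, out_ndims):
        start, stop, step = (float(v) for v in input_data)
        return [[max(0, math.ceil((stop - start) / step))]]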
Outline
● Motivation for Dynamism
● Representing Dynamism
● Executing Dynamism
● Evaluation
Executing dynamics in TVM
● By extending the IR we can now represent dynamic programs, but how do we execute them?
● To flexibly execute dynamic programs, we introduce the Relay virtual machine (VM).
● We must also generate code which handles dynamic shapes in kernels (work-in-progress):
○ Kernel dispatch for a single op
○ Dispatch for a (sub-)expression
Previous approach: Graph Runtime
● Existing executors are based on a graph traversal style execution.
● Set up a graph of operators and push data along every edge, compute the operation, and flow forward until finished.
● The simple design enables simple memory allocation and a simple executor.
● The design is complicated by control flow and dynamic shapes.
Enter the virtual machine
● Instead, we take inspiration from full programming languages and design a VM.
● The VM has special considerations (a usage sketch follows this list):
○ Primitives are tensors, and instructions operate on tensors (CISC-style, no scalar instructions)
○ Instructions that are normally built in (+, -, etc.) are realized by code generated via TVM.
○ Control flow is handled in the standard way in the VM.
○ In contrast to AoT compilation, the VM is flexible
■ graph dispatch and bucketing can be easily implemented.
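A rough usage sketch (assuming a recent TVM Python API; names and signatures may differ across versions): compile a function with an Any batch dimension to a VM executable and invoke it on inputs of different sizes.

    import numpy as np
    import tvm
    from tvm import relay

    # A function whose batch dimension is dynamic.
    x = relay.var("x", shape=(relay.Any(), 3), dtype="float32")
    mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))

    # The graph runtime cannot compile this; the VM can.
    exe = relay.vm.compile(mod, target="llvm")
    vm = tvm.runtime.vm.VirtualMachine(exe, tvm.cpu())

    for batch in (1, 7, 32):  # one executable, many runtime shapes
        out = vm.invoke("main", np.zeros((batch, 3), dtype="float32"))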
Generating code for dynamic shapes
● We now must solve the final problem of generating kernels that provide compelling performance for non-static shapes.
● The VM provides a framework for experimenting with different strategies; we will discuss the in-progress approaches:
○ Dynamic operator dispatch (WIP)
○ Graph Dispatch (https://github.com/apache/incubator-tvm/pull/4241)
● We believe there is a lot of future work in this area.
Outline
● Motivation for Dynamism
● Representing Dynamism
● Executing Dynamism
● Evaluation
Dynamic model performance
BERT model performance
Conclusions
● We have extended Relay/TVM with support for dynamic shapes.
● To support the increased expressivity of Relay, we have built a new execution mechanism: the VM.
● We have begun exploring strategies for generating efficient kernels that support dynamic shapes with promising results.
● We believe the VM infrastructure can serve as a foundation for exploring future research into dynamic execution and code generation.
Outline
● Dynamic motivations
○ NLP, NMS, control, data structures
○ Integration with external code and runtimes
● Existing solution: graph runtime
○ Challenges with graph runtime
● Enter VM
○ Designed to be a scaffold for building new dynamic functionality, consisting of compiler and runtime improvements
● VM design
● Extensions
● Results
● Future Work
○ Dispatch, strategies?
Existing solution: graph runtime
Challenges:
● Control flow (if, loop, etc)
● Dynamic shapes
○ Dynamic inputs: batch size, image size, sequence length, etc.
○ The output shapes of some ops are data dependent: arange, nms, etc.
○ Control flow: concatenation within a while loop
Limitation of TVM/graph runtime
● Cannot compile and run dynamic models
Dynamic codegen: op dispatch (proposal)
● Goal: support codegen for dynamic shape
● Challenges
○ A single kernel performs poorly across different shapes (see the bucketing sketch below)
○ Different templates for the same op
○ TVM compute and schedule are coupled together
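One possible direction, shown here as a hypothetical Python sketch (bucket boundaries and names are illustrative, not part of TVM): compile one kernel per shape bucket ahead of time and pick a bucket from the runtime shape.

    import bisect

    BUCKETS = [1, 8, 64, 256]  # candidate batch sizes to specialize kernels for

    def select_bucket(batch):
        # Smallest bucket that covers the runtime batch; fall back to the largest.
        idx = bisect.bisect_left(BUCKETS, batch)
        return BUCKETS[min(idx, len(BUCKETS) - 1)]

    def dispatch(kernels, x):
        # kernels: dict mapping bucket size -> kernel compiled/tuned for that size
        return kernels[select_bucket(x.shape[0])](x)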
Why do we need a graph dispatcher?
1. Minimal overhead: only one dispatching operation is required for each inference.
2. Fits operators such as conv2d_NCHWc: graph tuning is well defined for each subgraph.
3. Avoids a runtime layout-tracking system for operators that require layout transformations to be optimized.
Reference:
https://sampl.cs.washington.edu/tvmconf/slides/2019/Jared-Roesch-Haichen-Shen-RelayVM.pdf