Extending TVM with Dynamic Execution

Outline

● Motivation for Dynamism

● Representing Dynamism

● Executing Dynamism

● Evaluation

Dynamic Neural Networks

● Networks are exhibiting more and more dynamism

○ Dynamic inputs: batch size, image size, sequence length, etc.

○ Control flow: recursion, conditionals, and loops (in Relay today).

○ Dynamically sized tensors

■ The output shape of some ops is data dependent: arange, nms, etc.

■ Control flow: concatenation within a while loop

● A central challenge is how to both represent and execute these networks.

fn network(input: Tensor<(n, 3, 1024, 1024), float32>) -> … { … }

%t1 : Tensor<(1), f32>
%t2 : Tensor<(10), f32>
if (%cond) { … } else { … } : Tensor<(?), f32>

%start, %stop, %step : i32
arange(%start, %stop, %step) : Tensor<(?), f32>

Dynamic Neural Networks

● A central challenge is how to both represent and execute these networks.

● We will address these two challenges at various levels of the TVM stack and share initial promising results.

Outline

● Motivation for Dynamism

● Representing Dynamism

● Executing Dynamism

● Evaluation

Representing dynamics in TVM

● Add Relay support for dynamic dimension (Any-dim)

● Use shape functions to compute runtime shapes.

● Supporting Any in the Tensor Expression (TE) IR (a minimal TE sketch follows this list).
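To illustrate the TE side, here is a minimal sketch (using TVM's Python API; `te.size_var` and the scheduling calls are assumed to behave as in recent TVM releases) of a kernel built over a symbolic batch dimension, so one compiled kernel serves any batch size at runtime:

```python
import tvm
from tvm import te

# A symbolic (unknown at compile time) batch dimension, analogous to Any.
n = te.size_var("n")
A = te.placeholder((n, 3, 32, 32), name="A", dtype="float32")

# A simple elementwise ReLU whose loop extent over the batch is symbolic.
B = te.compute(A.shape,
               lambda *i: te.max(A(*i), tvm.tir.const(0.0, "float32")),
               name="relu")

s = te.create_schedule(B.op)
f = tvm.build(s, [A, B], target="llvm")  # one kernel, any n at runtime
```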

Any: typing dynamic dimension in Relay

Any represents an unknown dimension at compile time.

Define a tensor type: Tensor<(Any, 3, 32, 32), fp32>

Define a type relation:

arange: fn(start: fp32, stop: fp32, step: fp32) -> Tensor<(Any), fp32>
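As a minimal sketch of the Relay side (using TVM's Python frontend; `relay.Any()` is the Python spelling of the Any dimension in recent TVM releases, and the exact printed form may vary by version):

```python
import tvm
from tvm import relay

# A tensor type with a dynamic batch dimension: Tensor<(Any, 3, 32, 32), fp32>.
dyn_ty = relay.TensorType((relay.Any(), 3, 32, 32), "float32")

x = relay.var("x", dyn_ty)
func = relay.Function([x], relay.nn.relu(x))
mod = tvm.IRModule.from_expr(func)
print(mod)  # the input type prints with a `?` in the batch position
```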


How to compute and check shape dynamically?

Challenges

● Static type checking cannot eliminate all errors

● The type checking system is too heavyweight for runtime

Approach

● Instrument shape-computing functions into the program


Shape function

● Register a shape function for each operator to check types and compute the output shape at runtime

● A shape function has two modes, both with signature (op_attrs, input_tensors, out_ndims) -> out_shape_tensors (a registration sketch follows this list):

○ Data independent: (op_attrs, input_shapes, out_ndims) -> out_shape_tensors

○ Data dependent: (op_attrs, input_data, out_ndims) -> out_shape_tensors
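As a rough illustration, the sketch below shows how a data-dependent shape function for an op like arange can be registered through TVM's Python API. It is loosely modeled on the shape functions shipped with TVM itself (which already registers one for arange), simplified to a positive step; the exact hybrid-script intrinsics, indexing conventions, and registration signatures may differ across TVM versions.

```python
from tvm.te.hybrid import script
from tvm.relay.op import op as _reg


# Hybrid-script function that computes the output shape of arange from the
# runtime *values* of start/stop/step (assumption: they arrive as rank-1,
# single-element tensors, and step is positive).
@script
def _arange_shape_func(start, stop, step):
    out = output_tensor((1,), "int64")
    out[0] = int64(ceil_div(int64(stop[0]) - int64(start[0]), int64(step[0])))
    return out


def arange_shape_func(attrs, inputs, _):
    return [_arange_shape_func(*inputs)]


# The second argument (True) marks the shape function as data dependent, so it
# receives the input tensors' data rather than only their shapes.
_reg.register_shape_func("arange", True, arange_shape_func)
```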

Outline

● Motivation for Dynamism

● Representing Dynamism

● Executing Dynamism

● Evaluation

Executing dynamics in TVM

● By extending the IR, we can now represent dynamic programs, but how do we execute them?

● To flexibly execute dynamic programs, we introduce the Relay virtual machine (VM).

● We must also generate code which handles dynamic shapes in kernels (work-in-progress):

○ Kernel dispatch for a single op

○ Dispatch for a (sub-)expression

Previous approach: Graph Runtime

● Existing executors are based on graph-traversal-style execution.

● Set up a graph of operators, push data along every edge, compute each operation, and flow forward until finished.

● The simple design enables simple memory allocation and a simple executor.

● The design is complicated by control flow and dynamic shapes.

Enter the virtual machine

● Instead, we take inspiration from full programming languages and design a virtual machine (VM).

● The VM has special considerations (a usage sketch follows this list):

○ Primitives are tensors, and instructions operate on tensors (CISC-style; no scalar instructions)

○ Operations that would normally be built-in instructions (+, -, etc.) are realized by code generated via TVM.

○ Control flow is handled in the standard way inside the VM.

○ In contrast to AoT compilation, the VM is flexible:

■ Graph dispatch and bucketing can be easily implemented.
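The sketch below (TVM Python API; `relay.vm.compile` and `tvm.runtime.vm.VirtualMachine` are assumed to be available as in recent TVM releases) compiles a function with an Any batch dimension and runs the same executable on several batch sizes:

```python
import numpy as np
import tvm
from tvm import relay
from tvm.runtime import vm as vm_rt

# A function whose input has a dynamic batch dimension.
x = relay.var("x", relay.TensorType((relay.Any(), 3, 32, 32), "float32"))
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))

# Compile to a VM executable instead of a graph-runtime module.
exe = relay.vm.compile(mod, target="llvm")
vm = vm_rt.VirtualMachine(exe, tvm.cpu())

# The same executable handles different batch sizes at runtime.
for batch in (1, 4, 7):
    data = np.random.rand(batch, 3, 32, 32).astype("float32")
    print(vm.invoke("main", tvm.nd.array(data)).shape)
```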

Generating code for dynamic shapes

● We now must solve the final problem: generating kernels that provide compelling performance for non-static shapes.

● The VM provides a framework for experimenting with different strategies; we discuss two in-progress approaches (a conceptual dispatch sketch follows this list):

○ Dynamic operator dispatch (WIP)

○ Graph Dispatch (https://github.com/apache/incubator-tvm/pull/4241)

● We believe there is a lot of future work in this area.
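As a plain-Python illustration of the dispatch-and-bucketing idea mentioned above (this is not TVM API; the bucket sizes and the `pad_batch` helper are hypothetical), dispatch can round a dynamic batch size up to the nearest pre-compiled bucket:

```python
# Hypothetical sketch of shape bucketing: pre-compile kernels for a few
# representative batch sizes and dispatch to the smallest bucket that fits.
BUCKETS = [1, 8, 32, 128]


def pick_bucket(batch):
    """Return the smallest bucket size that can hold `batch` rows."""
    for b in BUCKETS:
        if batch <= b:
            return b
    return BUCKETS[-1]


def run_bucketed(kernels, pad_batch, data):
    # kernels: dict mapping bucket size -> function compiled for that size.
    # pad_batch: hypothetical helper that pads the batch dimension up to `b`.
    b = pick_bucket(data.shape[0])
    out = kernels[b](pad_batch(data, b))
    return out[: data.shape[0]]  # drop the padded rows from the result
```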

Outline

● Motivation for Dynamism

● Representing Dynamism

● Executing Dynamism

● Evaluation

Dynamic model performance

[Figure: dynamic model performance results]

BERT model performance

[Figure: BERT model performance results]

Conclusions

● We have extended Relay/TVM with support for dynamic shapes.

● To support the increased expressivity of Relay, we have built a new execution mechanism: the Relay VM.

● We have begun exploring strategies for generating efficient kernels that support dynamic shapes with promising results.

● We believe the VM infrastructure can serve as a foundation for exploring future research into dynamic execution and code generation.

Outline

● Dynamic motivations

○ NLP, NMS, control, data structures

○ Integration with external code and runtimes

● Existing solution: graph runtime

○ Challenges with graph runtime

● Enter VM

○ Designed to be a scaffold for building new dynamic functionality, consisting of compiler and runtime improvements

● VM design

● Extensions

● Results

● Future Work

○ Dispatch, strategies?

Existing solution: graph runtime

Challenges:

● Control flow (if, loop, etc)

● Dynamic shapes

○ Dynamic inputs: batch size, image size, sequence length, etc.

○ Output shape of some ops are data dependent: arange, nms, etc.

○ Control flow: concatenation within a while loop

Limitation of TVM/graph runtime

● Cannot compile and run dynamic models

Dynamic codegen: op dispatch (proposal)

● Goal: support codegen for dynamic shape

● Challenges

○ A single kernel performs poorly across different shapes

○ Different templates for the same op

○ TVM compute and schedule are coupled together


Why do we need a graph dispatcher?

1. Minimal overhead: only one dispatching operation is required for each inference.

2. Fits operators such as conv2d_NCHWc: graph tuning is well defined for each subgraph.

3. Avoids a runtime layout-tracking system for operators that require layout transformation to optimize.


Reference:

https://sampl.cs.washington.edu/tvmconf/slides/2019/Jared-Roesch-Haichen-Shen-RelayVM.pdf
