TensorRT Model Conversion

The deep learning frameworks currently supported are Caffe, TensorFlow, and PyTorch.

  1. TensorFlow: the preferred delivery format is a pb file, i.e. a frozen GraphDef. Freezing turns all of the exported model's weights into constants and stores the parameters and the network structure in a single file, which can be loaded from both Python and Java (a minimal freezing sketch follows this list).
  2. PyTorch: the preferred delivery format is ONNX, an open file format designed for machine learning that stores trained models. It lets different AI frameworks save and exchange model data in a common format; the ONNX specification and code are developed jointly by Microsoft, Amazon, Facebook, IBM and other companies. https://pytorch.org/docs/stable/onnx.html
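
For item 1 (TensorFlow), a minimal sketch of how a checkpoint can be frozen into a pb GraphDef; the checkpoint paths and the output node name "softmax/Softmax" are placeholders for illustration, not taken from any particular model:

# tensorflow: freeze a checkpoint into a frozen GraphDef (.pb) -- sketch
import tensorflow as tf

with tf.compat.v1.Session() as sess:
    # restore graph structure and weights from a hypothetical checkpoint
    saver = tf.compat.v1.train.import_meta_graph("model.ckpt.meta")
    saver.restore(sess, "model.ckpt")
    # convert all variables to constants so the weights live inside the graph itself
    frozen = tf.compat.v1.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ["softmax/Softmax"])
    with open("model.pb", "wb") as f:
        f.write(frozen.SerializeToString())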

# pytorch model to onnx
import torch

torch_model = torch.load("save.pt")  # load the PyTorch model (saved as a full model, not just a state_dict)
batch_size = 1                       # batch size
input_shape = (3, 224, 224)          # input shape (C, H, W)

# set the model to inference mode
torch_model.eval()

x = torch.randn(batch_size, *input_shape)       # dummy input tensor
export_onnx_file = "test.onnx"                  # target ONNX file name
torch.onnx.export(torch_model,
                  x,
                  export_onnx_file,
                  opset_version=10,
                  do_constant_folding=True,   # whether to run constant-folding optimization
                  input_names=["input"],      # input name
                  output_names=["output"],    # output name
                  dynamic_axes={"input": {0: "batch_size"},     # variable batch dimension
                                "output": {0: "batch_size"}})
  1. Caffe: no conversion is needed; the caffemodel and deploy prototxt can be used directly.
  2. TensorFlow

    • For TensorFlow models there are two routes to a TensorRT engine: pb → uff → engine, or pb → onnx → engine.

          First approach
    # tensorflow pb to uff

    First install convert-to-uff: apt install uff-converter-tf
    Run: python3 /usr/local/bin/convert-to-uff --help
    Output:

    Converts TensorFlow models to Unified Framework Format (UFF).
     
    positional arguments:
      input_file            path to input model (protobuf file of frozen GraphDef)
     
    optional arguments:
      -h, --help            show this help message and exit
      -l, --list-nodes      show list of nodes contained in input file
      -t, --text            write a text version of the output in addition to the
                            binary
      --write_preprocessed  write the preprocessed protobuf in addition to the
                            binary
      -q, --quiet           disable log messages
      -d, --debug           Enables debug mode to provide helpful debugging output
      -o OUTPUT, --output OUTPUT
                            name of output uff file
      -O OUTPUT_NODE, --output-node OUTPUT_NODE
                            name of output nodes of the model
      -I INPUT_NODE, --input-node INPUT_NODE
                            name of a node to replace with an input to the model.
                            Must be specified as:
                            "name,new_name,dtype,dim1,dim2,..."
      -p PREPROCESSOR, --preprocessor PREPROCESSOR
                            the preprocessing file to run before handling the
                            graph. This file must define a `preprocess` function
                            that accepts a GraphSurgeon DynamicGraph as it's
                            input. All transformations should happen in place on
                            the graph, as return values are discarded
    Conversion command:
    python3 /usr/local/bin/convert-to-uff model.pb -o model.uff -O softmax/Softmax -I input_1,input_1,float32,1,3,224,224
    # tensorflow uff to engine

    Run: /usr/src/tensorrt/bin/trtexec --help
    Output:

    === Model Options ===
      --uff=<file>                UFF model
      --onnx=<file>               ONNX model
      --model=<file>              Caffe model (default = no model, random weights used)
      --deploy=<file>             Caffe prototxt file
      --output=<name>[,<name>]*   Output names (it can be specified multiple times); at least one output
                                  is required for UFF and Caffe
      --uffInput=<name>,X,Y,Z     Input blob name and its dimensions (X,Y,Z=C,H,W), it can be specified
                                  multiple times; at least one is required for UFF models
      --uffNHWC                   Set if inputs are in the NHWC layout instead of NCHW (use X,Y,Z=H,W,C
                                  order in --uffInput)

    === Build Options ===
      --maxBatch                  Set max batch size and build an implicit batch engine (default = 1)
      --explicitBatch             Use explicit batch sizes when building the engine (default = implicit)
      --minShapes=spec            Build with dynamic shapes using a profile with the min shapes provided
      --optShapes=spec            Build with dynamic shapes using a profile with the opt shapes provided
      --maxShapes=spec            Build with dynamic shapes using a profile with the max shapes provided
                                  Note: if any of min/max/opt is missing, the profile will be completed
                                  using the shapes provided and assuming that opt will be equal to max
                                  unless they are both specified; partially specified shapes are applied
                                  starting from the batch size; dynamic shapes imply explicit batch;
                                  input names can be wrapped with single quotes (ex: 'Input:0')
                                  Input shapes spec ::= Ishp[","spec]
                                              Ishp ::= name":"shape
                                             shape ::= N[["x"N]*"*"]
      --inputIOFormats=spec       Type and formats of the input tensors (default = all inputs in fp32:chw)
      --outputIOFormats=spec      Type and formats of the output tensors (default = all outputs in fp32:chw)
                                  IO Formats: spec ::= IOfmt[","spec]
                                             IOfmt ::= type:fmt
                                              type ::= "fp32"|"fp16"|"int32"|"int8"
                                               fmt ::= ("chw"|"chw2"|"chw4"|"hwc8"|"chw16"|"chw32")["+"fmt]
      --workspace=N               Set workspace size in megabytes (default = 16)
      --minTiming=M               Set the minimum number of iterations used in kernel selection (default = 1)
      --avgTiming=M               Set the number of times averaged in each iteration for kernel selection (default = 8)
      --fp16                      Enable fp16 algorithms, in addition to fp32 (default = disabled)
      --int8                      Enable int8 algorithms, in addition to fp32 (default = disabled)
      --calib=<file>              Read INT8 calibration cache file
      --safe                      Only test the functionality available in safety restricted flows
      --saveEngine=<file>         Save the serialized engine
      --loadEngine=<file>         Load a serialized engine

    === Inference Options ===
      --batch=N                   Set batch size for implicit batch engines (default = 1)
      --shapes=spec               Set input shapes for dynamic shapes inputs. Input names can be wrapped
                                  with single quotes (ex: 'Input:0')
                                  Input shapes spec ::= Ishp[","spec]
                                              Ishp ::= name":"shape
                                             shape ::= N[["x"N]*"*"]
      --loadInputs=spec           Load input values from files (default = generate random inputs). Input
                                  names can be wrapped with single quotes (ex: 'Input:0')
                                  Input values spec ::= Ival[","spec]
                                              Ival ::= name":"file
      --iterations=N              Run at least N inference iterations (default = 10)
      --warmUp=N                  Run for N milliseconds to warmup before measuring performance (default = 200)
      --duration=N                Run performance measurements for at least N seconds wallclock time (default = 3)
      --sleepTime=N               Delay inference start with a gap of N milliseconds between launch and compute (default = 0)
      --streams=N                 Instantiate N engines to use concurrently (default = 1)
      --exposeDMA                 Serialize DMA transfers to and from device. (default = disabled)
      --useSpinWait               Actively synchronize on GPU events. This option may decrease synchronization
                                  time but increase CPU usage and power (default = disabled)
      --threads                   Enable multithreading to drive engines with independent threads (default = disabled)
      --useCudaGraph              Use cuda graph to capture engine execution and then launch inference (default = disabled)
      --buildOnly                 Skip inference perf measurement (default = disabled)

    === Build and Inference Batch Options ===
      When using implicit batch, the max batch size of the engine, if not given, is set to the inference
      batch size; when using explicit batch, if shapes are specified only for inference, they will be used
      also as min/opt/max in the build profile; if shapes are specified only for the build, the opt shapes
      will be used also for inference; if both are specified, they must be compatible; and if explicit batch
      is enabled but neither is specified, the model must provide complete static dimensions, including
      batch size, for all inputs

    === Reporting Options ===
      --verbose                   Use verbose logging (default = false)
      --avgRuns=N                 Report performance measurements averaged over N consecutive iterations (default = 10)
      --percentile=P              Report performance for the P percentage (0<=P<=100, 0 representing max perf,
                                  and 100 representing min perf; (default = 99%)
      --dumpOutput                Print the output tensor(s) of the last inference iteration (default = disabled)
      --dumpProfile               Print profile information per layer (default = disabled)
      --exportTimes=<file>        Write the timing results in a json file (default = disabled)
      --exportOutput=<file>       Write the output tensors to a json file (default = disabled)
      --exportProfile=<file>      Write the profile information per layer in a json file (default = disabled)

    === System Options ===
      --device=N                  Select cuda device N (default = 0)
      --useDLACore=N              Select DLA core N for layers that support DLA (default = none)
      --allowGPUFallback          When DLA is enabled, allow GPU fallback for unsupported layers (default = disabled)
      --plugins                   Plugin library (.so) to load (can be specified multiple times)

    === Help ===
      --help                      Print this message

    Conversion command:
    /usr/src/tensorrt/bin/trtexec --uff=/home/model/model.uff --uffInput=input_1,1,3,224,224 --output=softmax/Softmax --saveEngine=/home/model/model.engine --outputIOFormats=fp32:chw --buildOnly --useCudaGraph
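
    As an alternative to trtexec, the same uff → engine step can be scripted with the TensorRT Python API. A minimal sketch, assuming TensorRT 7.x (where the UFF parser is still shipped) and the same input/output node names and paths as above:

    # build an engine from a UFF file with the TensorRT Python API -- sketch
    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()              # implicit-batch network for UFF
    parser = trt.UffParser()
    parser.register_input("input_1", (3, 224, 224))
    parser.register_output("softmax/Softmax")
    parser.parse("/home/model/model.uff", network)

    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 28            # 256 MB workspace
    engine = builder.build_cuda_engine(network)
    with open("/home/model/model.engine", "wb") as f:
        f.write(engine.serialize())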
          Second approach
    # tensorflow pb to onnx, then build the engine

    First install tf2onnx: pip install -U tf2onnx
    Run: python3 -m tf2onnx.convert --help to see the usage
    Output:

    usage: convert.py [-h] [--input INPUT] [--graphdef GRAPHDEF]
                      [--saved-model SAVED_MODEL] [--tag TAG]
                      [--signature_def SIGNATURE_DEF]
                      [--concrete_function CONCRETE_FUNCTION]
                      [--checkpoint CHECKPOINT] [--keras KERAS] [--large_model]
                      [--output OUTPUT] [--inputs INPUTS] [--outputs OUTPUTS]
                      [--opset OPSET] [--custom-ops CUSTOM_OPS]
                      [--extra_opset EXTRA_OPSET] [--target {rs4,rs5,rs6,caffe2}]
                      [--continue_on_error] [--verbose] [--debug]
                      [--output_frozen_graph OUTPUT_FROZEN_GRAPH] [--fold_const]
                      [--inputs-as-nchw INPUTS_AS_NCHW]

    Convert tensorflow graphs to ONNX.

    optional arguments:
      -h, --help            show this help message and exit
      --input INPUT         input from graphdef
      --graphdef GRAPHDEF   input from graphdef
      --saved-model SAVED_MODEL
                            input from saved model
      --tag TAG             tag to use for saved_model
      --signature_def SIGNATURE_DEF
                            signature_def from saved_model to use
      --concrete_function CONCRETE_FUNCTION
                            For TF2.x saved_model, index of func signature in
                            __call__ (--signature_def is ignored)
      --checkpoint CHECKPOINT
                            input from checkpoint
      --keras KERAS         input from keras model
      --large_model         use the large model format (for models > 2GB)
      --output OUTPUT       output model file
      --inputs INPUTS       model input_names
      --outputs OUTPUTS     model output_names
      --opset OPSET         opset version to use for onnx domain
      --custom-ops CUSTOM_OPS
                            list of custom ops
      --extra_opset EXTRA_OPSET
                            extra opset with format like domain:version, e.g.
                            com.microsoft:1
      --target {rs4,rs5,rs6,caffe2}
                            target platform
      --continue_on_error   continue_on_error
      --verbose, -v         verbose output, option is additive
      --debug               debug mode
      --output_frozen_graph OUTPUT_FROZEN_GRAPH
                            output frozen tf graph to file
      --fold_const          Deprecated. Constant folding is always enabled.
      --inputs-as-nchw INPUTS_AS_NCHW
                            transpose inputs as from nhwc to nchw

    Usage Examples:
    python -m tf2onnx.convert --saved-model saved_model_dir --output model.onnx
    python -m tf2onnx.convert --input frozen_graph.pb --inputs X:0 --outputs output:0 --output model.onnx
    python -m tf2onnx.convert --checkpoint checkpoint.meta --inputs X:0 --outputs output:0 --output model.onnx

    For help and additional information see: https://github.com/onnx/tensorflow-onnx
    If you run into issues, open an issue here: https://github.com/onnx/tensorflow-onnx/issues

    Convert to ONNX (run inside the tensorflow conda environment):
    conda activate tensorflow
    python3 -m tf2onnx.convert --input model.pb --inputs input_1:0 --outputs softmax/Softmax:0 --inputs-as-nchw input_1:0 --output model.onnx --opset 13

    ONNX to engine (this is the dynamic-input conversion; exit the tensorflow conda environment first):
    /usr/src/tensorrt/bin/trtexec --onnx=/home/model/model.onnx --explicitBatch --minShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --optShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --maxShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --shapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw --saveEngine=/home/model/model.engine --buildOnly --useCudaGraph

    For fixed-size inputs, drop --explicitBatch and the --minShapes/--optShapes/--maxShapes/--shapes arguments above, and add --batch batch_size, where batch_size is the TensorFlow model's batch size.
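
    The onnx → engine step can also be scripted with the TensorRT Python API instead of trtexec. A minimal sketch, assuming the TensorRT 7.x/8.x Python bindings and the tensor name 'input_1:0' used in the commands above:

    # build an engine from an ONNX file with the TensorRT Python API -- sketch
    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(EXPLICIT_BATCH)    # ONNX requires an explicit-batch network
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open("/home/model/model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28                 # 256 MB workspace
    profile = builder.create_optimization_profile()     # min/opt/max shapes for the dynamic batch axis
    profile.set_shape("input_1:0", (1, 3, 224, 224), (1, 3, 224, 224), (1, 3, 224, 224))
    config.add_optimization_profile(profile)

    engine = builder.build_engine(network, config)
    with open("/home/model/model.engine", "wb") as f:
        f.write(engine.serialize())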

    PyTorch model to engine

    Step 1: convert the PyTorch model to ONNX as described in the "pytorch model to onnx" section above.
    Step 2: convert the ONNX model to an engine following the TensorFlow onnx-to-engine steps above.

    Caffe to engine

    For Caffe, follow step 2 of the first TensorFlow route (uff to engine with trtexec), except that --uff is replaced by the --model / --deploy pair (see the example command below).
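
    A sketch of such a command; the output blob name "prob" and the file paths are placeholders for illustration:
    /usr/src/tensorrt/bin/trtexec --deploy=/home/model/deploy.prototxt --model=/home/model/model.caffemodel --output=prob --saveEngine=/home/model/model.engine --outputIOFormats=fp32:chw --buildOnly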

    Running the converted model is tied to the hardware environment used during conversion. For example, if the model was converted on a Tesla V100S with Driver Version 440.118.02 and CUDA Version 10.2, the machine serving the model needs to match that setup (the driver and CUDA versions may be higher).
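
    At serving time the engine is deserialized with the TensorRT runtime. A minimal loading sketch; buffer allocation and execution details are omitted:

    # load a serialized engine -- sketch
    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    with open("/home/model/model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    # device buffers for inputs/outputs are then allocated (e.g. with pycuda) and passed
    # to context.execute_v2(bindings) or context.execute_async_v2(bindings, stream_handle)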

posted @ 2021-10-30 21:32  小小马进阶笔记