TensorRT Model Conversion

The deep learning frameworks currently supported are Caffe, TensorFlow, and PyTorch.

  1. TensorFlow: the preferred delivery format is a pb file, i.e. a frozen GraphDef. Freezing turns all of the exported model's weights into constants and stores the parameters and the network structure in a single file, which can be loaded from both Python and Java (a minimal freezing sketch follows this list).
  2. PyTorch: the preferred delivery format is ONNX, an open file format designed for machine learning that stores trained models. It lets different AI frameworks save and exchange model data in a common format; the ONNX specification and code are developed jointly by Microsoft, Amazon, Facebook, IBM and other companies. https://pytorch.org/docs/stable/onnx.html
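
For item 1 (TensorFlow), a minimal sketch of how a checkpoint can be frozen into a pb GraphDef; the checkpoint paths and the output node name "softmax/Softmax" are placeholders for illustration, not taken from any particular model:

# tensorflow: freeze a checkpoint into a frozen GraphDef (.pb) -- sketch
import tensorflow as tf

with tf.compat.v1.Session() as sess:
    # restore graph structure and weights from a hypothetical checkpoint
    saver = tf.compat.v1.train.import_meta_graph("model.ckpt.meta")
    saver.restore(sess, "model.ckpt")
    # convert all variables to constants so the weights live inside the graph itself
    frozen = tf.compat.v1.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ["softmax/Softmax"])
    with open("model.pb", "wb") as f:
        f.write(frozen.SerializeToString())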

# pytorch model to onnx
import torch

torch_model = torch.load("save.pt")  # load the PyTorch model (saved as a full model, not just a state_dict)
batch_size = 1                       # batch size
input_shape = (3, 224, 224)          # input shape (C, H, W)

# set the model to inference mode
torch_model.eval()

x = torch.randn(batch_size, *input_shape)       # dummy input tensor
export_onnx_file = "test.onnx"                  # target ONNX file name
torch.onnx.export(torch_model,
                  x,
                  export_onnx_file,
                  opset_version=10,
                  do_constant_folding=True,   # whether to run constant-folding optimization
                  input_names=["input"],      # input name
                  output_names=["output"],    # output name
                  dynamic_axes={"input": {0: "batch_size"},     # variable batch dimension
                                "output": {0: "batch_size"}})
  1. Caffe: no conversion is needed; the caffemodel and deploy prototxt can be used directly.
  2. TensorFlow

    • For TensorFlow models there are two routes to a TensorRT engine: pb → uff → engine, or pb → onnx → engine.

          First approach
    # tensorflow pb to uff

    First install convert-to-uff: apt install uff-converter-tf
    Run: python3 /usr/local/bin/convert-to-uff --help
    Output:

    Converts TensorFlow models to Unified Framework Format (UFF).
     
    positional arguments:
      input_file            path to input model (protobuf file of frozen GraphDef)
     
    optional arguments:
      -h, --help            show this help message and exit
      -l, --list-nodes      show list of nodes contained in input file
      -t, --text            write a text version of the output in addition to the
                            binary
      --write_preprocessed  write the preprocessed protobuf in addition to the
                            binary
      -q, --quiet           disable log messages
      -d, --debug           Enables debug mode to provide helpful debugging output
      -o OUTPUT, --output OUTPUT
                            name of output uff file
      -O OUTPUT_NODE, --output-node OUTPUT_NODE
                            name of output nodes of the model
      -I INPUT_NODE, --input-node INPUT_NODE
                            name of a node to replace with an input to the model.
                            Must be specified as:
                            "name,new_name,dtype,dim1,dim2,..."
      -p PREPROCESSOR, --preprocessor PREPROCESSOR
                            the preprocessing file to run before handling the
                            graph. This file must define a `preprocess` function
                            that accepts a GraphSurgeon DynamicGraph as it's
                            input. All transformations should happen in place on
                            the graph, as return values are discarded
    Conversion command:
    python3 /usr/local/bin/convert-to-uff model.pb -o model.uff -O softmax/Softmax -I input_1,input_1,float32,1,3,224,224
    # tensorflow uff to engine

    Run: /usr/src/tensorrt/bin/trtexec --help
    Output:

    === Model Options ===
      --uff=<file>                UFF model
      --onnx=<file>               ONNX model
      --model=<file>              Caffe model (default = no model, random weights used)
      --deploy=<file>             Caffe prototxt file
      --output=<name>[,<name>]*   Output names (it can be specified multiple times); at least one output
                                  is required for UFF and Caffe
      --uffInput=<name>,X,Y,Z     Input blob name and its dimensions (X,Y,Z=C,H,W), it can be specified
                                  multiple times; at least one is required for UFF models
      --uffNHWC                   Set if inputs are in the NHWC layout instead of NCHW (use X,Y,Z=H,W,C
                                  order in --uffInput)

    === Build Options ===
      --maxBatch                  Set max batch size and build an implicit batch engine (default = 1)
      --explicitBatch             Use explicit batch sizes when building the engine (default = implicit)
      --minShapes=spec            Build with dynamic shapes using a profile with the min shapes provided
      --optShapes=spec            Build with dynamic shapes using a profile with the opt shapes provided
      --maxShapes=spec            Build with dynamic shapes using a profile with the max shapes provided
                                  Note: if any of min/max/opt is missing, the profile will be completed
                                  using the shapes provided and assuming that opt will be equal to max
                                  unless they are both specified; partially specified shapes are applied
                                  starting from the batch size; dynamic shapes imply explicit batch;
                                  input names can be wrapped with single quotes (ex: 'Input:0')
                                  Input shapes spec ::= Ishp[","spec]
                                              Ishp ::= name":"shape
                                             shape ::= N[["x"N]*"*"]
      --inputIOFormats=spec       Type and formats of the input tensors (default = all inputs in fp32:chw)
      --outputIOFormats=spec      Type and formats of the output tensors (default = all outputs in fp32:chw)
                                  IO Formats: spec ::= IOfmt[","spec]
                                             IOfmt ::= type:fmt
                                              type ::= "fp32"|"fp16"|"int32"|"int8"
                                               fmt ::= ("chw"|"chw2"|"chw4"|"hwc8"|"chw16"|"chw32")["+"fmt]
      --workspace=N               Set workspace size in megabytes (default = 16)
      --minTiming=M               Set the minimum number of iterations used in kernel selection (default = 1)
      --avgTiming=M               Set the number of times averaged in each iteration for kernel selection (default = 8)
      --fp16                      Enable fp16 algorithms, in addition to fp32 (default = disabled)
      --int8                      Enable int8 algorithms, in addition to fp32 (default = disabled)
      --calib=<file>              Read INT8 calibration cache file
      --safe                      Only test the functionality available in safety restricted flows
      --saveEngine=<file>         Save the serialized engine
      --loadEngine=<file>         Load a serialized engine

    === Inference Options ===
      --batch=N                   Set batch size for implicit batch engines (default = 1)
      --shapes=spec               Set input shapes for dynamic shapes inputs. Input names can be wrapped
                                  with single quotes (ex: 'Input:0')
                                  Input shapes spec ::= Ishp[","spec]
                                              Ishp ::= name":"shape
                                             shape ::= N[["x"N]*"*"]
      --loadInputs=spec           Load input values from files (default = generate random inputs). Input
                                  names can be wrapped with single quotes (ex: 'Input:0')
                                  Input values spec ::= Ival[","spec]
                                              Ival ::= name":"file
      --iterations=N              Run at least N inference iterations (default = 10)
      --warmUp=N                  Run for N milliseconds to warmup before measuring performance (default = 200)
      --duration=N                Run performance measurements for at least N seconds wallclock time (default = 3)
      --sleepTime=N               Delay inference start with a gap of N milliseconds between launch and compute (default = 0)
      --streams=N                 Instantiate N engines to use concurrently (default = 1)
      --exposeDMA                 Serialize DMA transfers to and from device. (default = disabled)
      --useSpinWait               Actively synchronize on GPU events. This option may decrease synchronization
                                  time but increase CPU usage and power (default = disabled)
      --threads                   Enable multithreading to drive engines with independent threads (default = disabled)
      --useCudaGraph              Use cuda graph to capture engine execution and then launch inference (default = disabled)
      --buildOnly                 Skip inference perf measurement (default = disabled)

    === Build and Inference Batch Options ===
      When using implicit batch, the max batch size of the engine, if not given, is set to the inference
      batch size; when using explicit batch, if shapes are specified only for inference, they will be used
      also as min/opt/max in the build profile; if shapes are specified only for the build, the opt shapes
      will be used also for inference; if both are specified, they must be compatible; and if explicit batch
      is enabled but neither is specified, the model must provide complete static dimensions, including
      batch size, for all inputs

    === Reporting Options ===
      --verbose                   Use verbose logging (default = false)
      --avgRuns=N                 Report performance measurements averaged over N consecutive iterations (default = 10)
      --percentile=P              Report performance for the P percentage (0<=P<=100, 0 representing max perf,
                                  and 100 representing min perf; (default = 99%)
      --dumpOutput                Print the output tensor(s) of the last inference iteration (default = disabled)
      --dumpProfile               Print profile information per layer (default = disabled)
      --exportTimes=<file>        Write the timing results in a json file (default = disabled)
      --exportOutput=<file>       Write the output tensors to a json file (default = disabled)
      --exportProfile=<file>      Write the profile information per layer in a json file (default = disabled)

    === System Options ===
      --device=N                  Select cuda device N (default = 0)
      --useDLACore=N              Select DLA core N for layers that support DLA (default = none)
      --allowGPUFallback          When DLA is enabled, allow GPU fallback for unsupported layers (default = disabled)
      --plugins                   Plugin library (.so) to load (can be specified multiple times)

    === Help ===
      --help                      Print this message

    Conversion command:
    /usr/src/tensorrt/bin/trtexec --uff=/home/model/model.uff --uffInput=input_1,1,3,224,224 --output=softmax/Softmax --saveEngine=/home/model/model.engine --outputIOFormats=fp32:chw --buildOnly --useCudaGraph
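
    As an alternative to trtexec, the same uff → engine step can be scripted with the TensorRT Python API. A minimal sketch, assuming TensorRT 7.x (where the UFF parser is still shipped) and the same input/output node names and paths as above:

    # build an engine from a UFF file with the TensorRT Python API -- sketch
    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()              # implicit-batch network for UFF
    parser = trt.UffParser()
    parser.register_input("input_1", (3, 224, 224))
    parser.register_output("softmax/Softmax")
    parser.parse("/home/model/model.uff", network)

    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 28            # 256 MB workspace
    engine = builder.build_cuda_engine(network)
    with open("/home/model/model.engine", "wb") as f:
        f.write(engine.serialize())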
          Second approach
    # tensorflow pb to onnx, then build the engine

    First install tf2onnx: pip install -U tf2onnx
    Run: python3 -m tf2onnx.convert --help to see the usage
    Output:

    usage: convert.py [-h] [--input INPUT] [--graphdef GRAPHDEF]
                      [--saved-model SAVED_MODEL] [--tag TAG]
                      [--signature_def SIGNATURE_DEF]
                      [--concrete_function CONCRETE_FUNCTION]
                      [--checkpoint CHECKPOINT] [--keras KERAS] [--large_model]
                      [--output OUTPUT] [--inputs INPUTS] [--outputs OUTPUTS]
                      [--opset OPSET] [--custom-ops CUSTOM_OPS]
                      [--extra_opset EXTRA_OPSET] [--target {rs4,rs5,rs6,caffe2}]
                      [--continue_on_error] [--verbose] [--debug]
                      [--output_frozen_graph OUTPUT_FROZEN_GRAPH] [--fold_const]
                      [--inputs-as-nchw INPUTS_AS_NCHW]

    Convert tensorflow graphs to ONNX.

    optional arguments:
      -h, --help            show this help message and exit
      --input INPUT         input from graphdef
      --graphdef GRAPHDEF   input from graphdef
      --saved-model SAVED_MODEL
                            input from saved model
      --tag TAG             tag to use for saved_model
      --signature_def SIGNATURE_DEF
                            signature_def from saved_model to use
      --concrete_function CONCRETE_FUNCTION
                            For TF2.x saved_model, index of func signature in
                            __call__ (--signature_def is ignored)
      --checkpoint CHECKPOINT
                            input from checkpoint
      --keras KERAS         input from keras model
      --large_model         use the large model format (for models > 2GB)
      --output OUTPUT       output model file
      --inputs INPUTS       model input_names
      --outputs OUTPUTS     model output_names
      --opset OPSET         opset version to use for onnx domain
      --custom-ops CUSTOM_OPS
                            list of custom ops
      --extra_opset EXTRA_OPSET
                            extra opset with format like domain:version, e.g.
                            com.microsoft:1
      --target {rs4,rs5,rs6,caffe2}
                            target platform
      --continue_on_error   continue_on_error
      --verbose, -v         verbose output, option is additive
      --debug               debug mode
      --output_frozen_graph OUTPUT_FROZEN_GRAPH
                            output frozen tf graph to file
      --fold_const          Deprecated. Constant folding is always enabled.
      --inputs-as-nchw INPUTS_AS_NCHW
                            transpose inputs as from nhwc to nchw

    Usage Examples:
    python -m tf2onnx.convert --saved-model saved_model_dir --output model.onnx
    python -m tf2onnx.convert --input frozen_graph.pb --inputs X:0 --outputs output:0 --output model.onnx
    python -m tf2onnx.convert --checkpoint checkpoint.meta --inputs X:0 --outputs output:0 --output model.onnx

    For help and additional information see: https://github.com/onnx/tensorflow-onnx
    If you run into issues, open an issue here: https://github.com/onnx/tensorflow-onnx/issues

    Convert to ONNX (run inside the tensorflow conda environment):
    conda activate tensorflow
    python3 -m tf2onnx.convert --input model.pb --inputs input_1:0 --outputs softmax/Softmax:0 --inputs-as-nchw input_1:0 --output model.onnx --opset 13

    ONNX to engine (this is the dynamic-input conversion; exit the tensorflow conda environment first):
    /usr/src/tensorrt/bin/trtexec --onnx=/home/model/model.onnx --explicitBatch --minShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --optShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --maxShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --shapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw --saveEngine=/home/model/model.engine --buildOnly --useCudaGraph

    For fixed-size inputs, drop --explicitBatch and the --minShapes/--optShapes/--maxShapes/--shapes arguments above, and add --batch batch_size, where batch_size is the TensorFlow model's batch size.
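
    The onnx → engine step can also be scripted with the TensorRT Python API instead of trtexec. A minimal sketch, assuming the TensorRT 7.x/8.x Python bindings and the tensor name 'input_1:0' used in the commands above:

    # build an engine from an ONNX file with the TensorRT Python API -- sketch
    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(EXPLICIT_BATCH)    # ONNX requires an explicit-batch network
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open("/home/model/model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28                 # 256 MB workspace
    profile = builder.create_optimization_profile()     # min/opt/max shapes for the dynamic batch axis
    profile.set_shape("input_1:0", (1, 3, 224, 224), (1, 3, 224, 224), (1, 3, 224, 224))
    config.add_optimization_profile(profile)

    engine = builder.build_engine(network, config)
    with open("/home/model/model.engine", "wb") as f:
        f.write(engine.serialize())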

    PyTorch model to engine

    Step 1: convert the PyTorch model to ONNX as described in the "pytorch model to onnx" section above.
    Step 2: convert the ONNX model to an engine following the TensorFlow onnx-to-engine steps above.

    Caffe to engine

    For Caffe, follow step 2 of the first TensorFlow route (uff to engine with trtexec), except that --uff is replaced by the --model / --deploy pair (see the example command below).
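
    A sketch of such a command; the output blob name "prob" and the file paths are placeholders for illustration:
    /usr/src/tensorrt/bin/trtexec --deploy=/home/model/deploy.prototxt --model=/home/model/model.caffemodel --output=prob --saveEngine=/home/model/model.engine --outputIOFormats=fp32:chw --buildOnly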

    Running the converted model is tied to the hardware environment used during conversion. For example, if the model was converted on a Tesla V100S with Driver Version 440.118.02 and CUDA Version 10.2, the machine serving the model needs to match that setup (the driver and CUDA versions may be higher).
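
    At serving time the engine is deserialized with the TensorRT runtime. A minimal loading sketch; buffer allocation and execution details are omitted:

    # load a serialized engine -- sketch
    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    with open("/home/model/model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    # device buffers for inputs/outputs are then allocated (e.g. with pycuda) and passed
    # to context.execute_v2(bindings) or context.execute_async_v2(bindings, stream_handle)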

posted @ 2021-10-30 21:32  小小马进阶笔记