
Triton deployment of a TensorRT model exported by mmdeploy: a failed attempt

A record of the process. The deployment ultimately did not succeed, most likely because of the Ubuntu version. I don't have time to dig further right now, so I'm writing it down and will fill in the gaps later if I need it again.

Triton demo

git clone -b r22.06 https://github.com/triton-inference-server/server.git

cd server/docs/examples

./fetch_models.sh

# Start the Triton server in container 1
docker run --gpus=1 --rm --net=host -v /home/xbsj/gaoying/triton/triton_demo/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/models

# Start the client SDK in container 2 to send requests
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:22.06-py3-sdk

# Send a request from inside container 2
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg

Installing Triton and starting the service (Docker)

Triton container versions vs. CUDA/TensorRT: Release Notes :: NVIDIA Deep Learning Triton Inference Server Documentation

More detail here: Frameworks Support Matrix :: NVIDIA Deep Learning Frameworks Documentation

1️⃣ Installing Triton

Pull the Docker image. 20.11 is the version tag; you can pick a version here: Triton Inference Server (Formerly TensorRT Inference Server) | NVIDIA NGC

Create a file named Dockerfile.triton with the following content:

FROM nvcr.io/nvidia/tritonserver:20.11-py3

# RUN <add any extra setup steps here if needed>

Save and exit, then run the command below to build the Triton Docker image. The benefit of going through Dockerfile.triton instead of pulling the image directly is that the image gets a convenient tag (here triton:2011), which makes it easier to find, and any extra setup you want baked into the Triton image can simply be appended to Dockerfile.triton later.

nvidia-docker build -f Dockerfile.triton -t triton:2011 . 

2️⃣ Writing the model configuration file

Create a local directory to be mounted into the Docker container.

Layout of the mounted directory

.
└── model_rep                 # root directory on the host, mounted into the container
    ├── demo1                 # model 1
    │   ├── 1                 # model version
    │   │   └── model.pt      # model file
    │   ├── 2                 # model version
    │   │   └── model.pt      # model file
    │   └── config.pbtxt
    └── demo2                 # model 2
        ├── 1
        │   └── model.pt
        └── config.pbtxt
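For the faster_rcnn_r50_trt model used throughout this article, the layout could be prepared roughly as follows; the paths and the end2end.engine filename are only a sketch matching the config shown later, so adjust them to your own export:

mkdir -p /home/xbsj/gaoying/triton/model_rep/faster_rcnn_r50_trt/1
cp end2end.engine /home/xbsj/gaoying/triton/model_rep/faster_rcnn_r50_trt/1/   # TensorRT engine exported by mmdeploy
vim /home/xbsj/gaoying/triton/model_rep/faster_rcnn_r50_trt/config.pbtxt       # configuration file written in the next step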

Writing config.pbtxt

Open the exported ONNX model with Netron to read the input and output names and types, and fill in the input and output sections of the configuration accordingly. For the faster_rcnn_r50_trt model exported by mmdeploy, that gives the configuration below.
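If you prefer not to use Netron, the same information can be read programmatically with the onnx package; a minimal sketch (the end2end.onnx filename is an assumption, use whatever file mmdeploy exported for you):

import onnx

model = onnx.load('end2end.onnx')
# print name, element type and shape for every graph input and output
for t in list(model.graph.input) + list(model.graph.output):
    shape = [d.dim_value if d.dim_value > 0 else -1 for d in t.type.tensor_type.shape.dim]
    print(t.name, onnx.TensorProto.DataType.Name(t.type.tensor_type.elem_type), shape)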

Here is the config.pbtxt for that model:

name: "faster_rcnn_r50_trt"               # 模型名,也是目录名
platform: "tensorrt_plan"    # 模型对应的平台,参考文章下面给出的表格
max_batch_size : 8              # 一次送入模型的最大batch_size。
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3,-1,-1 ]            # 第一个维度默认是batch size,不用咱们配置。因此我们从第二个维度开始配置。
                                # 如果是可变维度,我们就用 -1
  }
]
output [
  {
    name: "dets"
    data_type: TYPE_FP32
    dims: [-1,-1]
  },
  {
    name: "labels"
    data_type: TYPE_INT32
    dims: [ -1 ]
  }
]

default_model_filename: "end2end.engine"
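Once the server is up (next step), the configuration Triton actually loaded can be double-checked against this file through its HTTP API:

curl localhost:8000/v2/models/faster_rcnn_r50_trt          # model metadata: input/output names, datatypes, shapes
curl localhost:8000/v2/models/faster_rcnn_r50_trt/config   # the parsed config.pbtxt returned as JSON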

Framework-to-platform mapping:

Framework               | platform
TensorRT                | tensorrt_plan
TensorFlow SavedModel   | tensorflow_savedmodel
TensorFlow GraphDef     | tensorflow_graphdef
ONNX                    | onnxruntime_onnx
PyTorch (TorchScript)   | pytorch_libtorch

Input/output data_type mapping:

Model Config | TensorRT | TensorFlow | ONNX Runtime | PyTorch | API    | NumPy
TYPE_BOOL    | kBOOL    | DT_BOOL    | BOOL         | kBool   | BOOL   | bool
TYPE_UINT8   |          | DT_UINT8   | UINT8        | kByte   | UINT8  | uint8
TYPE_UINT16  |          | DT_UINT16  | UINT16       |         | UINT16 | uint16
TYPE_UINT32  |          | DT_UINT32  | UINT32       |         | UINT32 | uint32
TYPE_UINT64  |          | DT_UINT64  | UINT64       |         | UINT64 | uint64
TYPE_INT8    | kINT8    | DT_INT8    | INT8         | kChar   | INT8   | int8
TYPE_INT16   |          | DT_INT16   | INT16        | kShort  | INT16  | int16
TYPE_INT32   | kINT32   | DT_INT32   | INT32        | kInt    | INT32  | int32
TYPE_INT64   |          | DT_INT64   | INT64        | kLong   | INT64  | int64
TYPE_FP16    | kHALF    | DT_HALF    | FLOAT16      |         | FP16   | float16
TYPE_FP32    | kFLOAT   | DT_FLOAT   | FLOAT        | kFloat  | FP32   | float32
TYPE_FP64    |          | DT_DOUBLE  | DOUBLE       | kDouble | FP64   | float64
TYPE_STRING  |          | DT_STRING  | STRING       |         | BYTES  | dtype(object)

3️⃣ Starting the service

🔸 Start the service directly:

--gpus all enables GPU access.

-v /home/xbsj/gaoying/triton/model_rep:/model_rep mounts the local model repository into the container; the plugin directory is mounted the same way, and the LD_PRELOAD of libmmdeploy_tensorrt_ops.so is explained in the error section at the end of this article.

8000 is the HTTP port, 8001 the gRPC port, 8002 the metrics port.

triton:2201 is the image tag; remember to change it to your own.

docker run --gpus all -p8000:8000 -p8001:8001 -p8002:8002 -v /home/xbsj/gaoying/triton/model_rep:/model_rep -v /home/xbsj/gaoying/triton/plugin_rep:/plugin_rep --env LD_PRELOAD=/plugin_rep/libmmdeploy_tensorrt_ops.so triton:2201 tritonserver --model-repository=/model_rep
🔸 Or enter the container first and start the service from inside:
docker run --gpus=all --network=host --shm-size=2g -v /home/xbsj/gaoying/triton/model_rep/:/models  -it nvcr.io/nvidia/tritonserver:21.04-py3  # enter the container
./bin/tritonserver --model-repository=/models  # start triton
docker run --gpus=all --network=host -v /home/xbsj/gaoying/triton/model_rep:/opt/ml/model  -it triton:2104  # enter the container (repository mounted at /opt/ml/model)
./bin/tritonserver --model-repository=/opt/ml/model  # start triton, pointing at the mounted path

Testing the client APIs

1️⃣ Command-line test

Check that the server is ready; run on the host:

curl -v localhost:8000/v2/health/ready

Expected output on success:

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
<
* Connection #0 to host localhost left intact
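The same checks can be done programmatically with tritonclient; a minimal sketch, assuming the faster_rcnn_r50_trt model from above is loaded:

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
print(client.is_server_live())                            # server process is up
print(client.is_server_ready())                           # server is ready to serve requests
print(client.is_model_ready("faster_rcnn_r50_trt"))       # this particular model is loaded
print(client.get_model_metadata("faster_rcnn_r50_trt"))   # input/output names, datatypes, shapes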

2️⃣ tritonclient API test

🔸 grpc

gRPC client, faster rcnn r50, 10 iterations: 1.0688064098358154 s

import os
import time
import numpy as np
import tritonclient.grpc as grpcclient
from PIL import Image


def client_init(url="localhost:8001",
                ssl=False, private_key=None, root_certificates=None, certificate_chain=None,
                verbose=False):
    triton_client = grpcclient.InferenceServerClient(
        url=url,
        verbose=verbose,
        ssl=ssl,
        root_certificates=root_certificates,
        private_key=private_key,
        certificate_chain=certificate_chain)
    return triton_client


def infer_faster_rcnn_r50_trt_grpc(triton_client, model_name, input='input', dets='dets', labels='labels',
                                   compression_algorithm=None):
    inputs = []
    outputs = []

    # declare the input tensor
    inputs.append(grpcclient.InferInput(input, [1, 3, 427, 640], "FP32"))

    # fill the input with image data
    root_dir = os.getcwd()
    img_path = os.path.join(root_dir, 'demo.jpg')  # put an image named demo.jpg in the working directory
    img = np.array(Image.open(img_path))
    img = img.astype(np.float32)
    img = img.transpose((2, 0, 1))
    img = np.expand_dims(img, axis=0)  # (1, 3, 427, 640)
    inputs[0].set_data_from_numpy(img)

    # declare the requested outputs
    outputs.append(grpcclient.InferRequestedOutput(dets))
    outputs.append(grpcclient.InferRequestedOutput(labels))

    results = triton_client.infer(
        model_name=model_name,
        inputs=inputs,
        outputs=outputs,
        compression_algorithm=compression_algorithm
        # client_timeout=0.1
    )
    # print('=' * 50)
    print(results)
    # print('=' * 50)
    # # convert the outputs to numpy arrays
    # print(results.as_numpy(output0))
    # print('=' * 50)
    # print(results.as_numpy(output1))
    # print('=' * 50)


if __name__ == '__main__':
    client = client_init()

    st = time.time()
    for i in range(10):
        infer_faster_rcnn_r50_trt_grpc(triton_client=client, model_name='faster_rcnn_r50_trt')
    print("grpc faster rcnn r50 十个迭代用时: {}".format(time.time() - st))
🔸 http

HTTP client, faster rcnn r50, 10 iterations: 1.1643376350402832 s

import os
import time

import gevent.ssl
import numpy as np
import tritonclient.http as httpclient
from PIL import Image


def client_init(url="localhost:8000",
                ssl=False, key_file=None, cert_file=None, ca_certs=None, insecure=False,
                verbose=False):
    if ssl:
        ssl_options = {}
        if key_file is not None:
            ssl_options['keyfile'] = key_file
        if cert_file is not None:
            ssl_options['certfile'] = cert_file
        if ca_certs is not None:
            ssl_options['ca_certs'] = ca_certs
        ssl_context_factory = None
        if insecure:
            ssl_context_factory = gevent.ssl._create_unverified_context
        triton_client = httpclient.InferenceServerClient(
            url=url,
            verbose=verbose,
            ssl=True,
            ssl_options=ssl_options,
            insecure=insecure,
            ssl_context_factory=ssl_context_factory)
    else:
        triton_client = httpclient.InferenceServerClient(
            url=url, verbose=verbose)
    return triton_client


def infer_faster_rcnn_r50_trt_http(triton_client, model_name='faster_rcnn_r50_trt',
                              input='input', output0='dets', output1='labels',
                              request_compression_algorithm=None,
                              response_compression_algorithm=None):
    inputs = []
    outputs = []

    # declare the input tensor
    inputs.append(httpclient.InferInput(input, [1, 3, 427, 640], "FP32"))

    # fill the input with image data
    root_dir = os.getcwd()
    img_path = os.path.join(root_dir, 'demo.jpg')  # put an image named demo.jpg in the working directory
    img = np.array(Image.open(img_path))
    img = img.astype(np.float32)
    img = img.transpose((2, 0, 1))
    img = np.expand_dims(img, axis=0)  # (1, 3, 427, 640)
    inputs[0].set_data_from_numpy(img)

    # output0 / output1 are the output tensor names from config.pbtxt
    outputs.append(httpclient.InferRequestedOutput(output0, binary_data=False))
    outputs.append(httpclient.InferRequestedOutput(output1, binary_data=False))

    results = triton_client.infer(
        model_name=model_name,
        inputs=inputs,
        outputs=outputs,
        request_compression_algorithm=request_compression_algorithm,
        response_compression_algorithm=response_compression_algorithm)
    # print('=' * 50)
    print(results)
    # print('=' * 50)
    # # convert the outputs to numpy arrays
    # print(results.as_numpy(output0))
    # print('=' * 50)
    # print(results.as_numpy(output1))
    # print('=' * 50)


if __name__ == '__main__':
    triton_client = client_init()
    st=time.time()
    for i in range(10):
        infer_faster_rcnn_r50_trt_http(triton_client)
    print("http faster rcnn r50 十个迭代用时:{}".format(time.time()-st))

3️⃣ requests API test

requests client, faster rcnn r50, 10 iterations: 3.843385934829712 s

import os
import time

import numpy as np
from PIL import Image
import requests


def infer_demo_torch_http():
    url = 'http://localhost:8000/v2/models/demo_torch/versions/1/infer'
    data = {
        "inputs": [{
            "name": "input__0",
            "shape": [2, 3],
            "datatype": "INT64",
            "data": [[1, 2, 3], [4, 5, 6]]
        }],
        "outputs": [{"name": "output__0"}, {"name": "output__1"}]
    }
    headers = {'Content-Type': 'application/json'}
    res = requests.post(url, json=data, headers=headers).json()
    print(res)


def infer_demo_onnx_http():
    url = 'http://localhost:8000/v2/models/demo_onnx/versions/1/infer'
    data = {
        "inputs": [{
            "name": "INPUT0",
            "shape": [8, 2],
            "datatype": "FP32",
            "data": [[0.1] * 2 for _ in range(8)]
        }, {
            "name": "INPUT1",
            "shape": [8, 2],
            "datatype": "INT32",
            "data": [[1] * 2 for _ in range(8)]
        }],
        "outputs": [{"name": "OUTPUT0"}, {"name": "OUTPUT1"}]
    }
    headers = {'Content-Type': 'application/json'}
    res = requests.post(url, json=data, headers=headers).json()
    print(res)


def infer_faster_rcnn_r50_onnx_http():
    root_dir = os.getcwd()
    img_path = os.path.join(root_dir, 'demo.jpg')
    img = np.array(Image.open(img_path))
    img = img.astype(np.float32)
    img = img.transpose((2, 0, 1))
    img = np.expand_dims(img, axis=0)  # (1, 3, 427, 640)
    # img = np.repeat(img, repeats=2, axis=0)  # (2, 3, 427, 640)
    img = img.tolist()
    url = 'http://localhost:8000/v2/models/faster_rcnn_r50_onnx/versions/1/infer'

    data = {
        "inputs": [{
            "name": "input",
            "shape": [1, 3, 427, 640],
            "datatype": "FP32",
            "data": img
        }, ],
        "outputs": [{"name": "dets"}, {"name": "labels"}]
    }
    headers = {'Content-Type': 'application/json'}
    res = requests.post(url, json=data, headers=headers).json()
    print(res)


def infer_faster_rcnn_r50_trt_http():
    root_dir = os.getcwd()
    img_path = os.path.join(root_dir, 'demo.jpg')
    img = np.array(Image.open(img_path))
    img = img.astype(np.float32)
    img = img.transpose((2, 0, 1))
    img = np.expand_dims(img, axis=0)  # (1, 3, 427, 640)
    img = img.tolist()
    url = 'http://localhost:8000/v2/models/faster_rcnn_r50_trt/versions/1/infer'

    data = {
        "inputs": [{
            "name": "input",
            "shape": [1, 3, 427, 640],
            "datatype": "FP32",
            "data": img
        }, ],
        "outputs": [{"name": "dets"}, {"name": "labels"}]
    }
    headers = {'Content-Type': 'application/json'}
    res = requests.post(url, json=data, headers=headers).json()
    print(res)


if __name__ == "__main__":
    print('=' * 50)
    print('| Infer demo_torch')
    print('_' * 20)
    infer_demo_torch_http()
    print('=' * 50)
    print('| Infer demo_onnx')
    print('_' * 20)
    infer_demo_onnx_http()
    print('=' * 50)
    print('| Infer faster_rcnn_r50_onnx')
    print('_' * 20)
    infer_faster_rcnn_r50_onnx_http()

    print('=' * 50)
    print('| Infer faster_rcnn_r50_trt')
    print('_' * 20)
    st = time.time()
    for _ in range(10):
        infer_faster_rcnn_r50_trt_http()
    print("requests faster rcnn r50 十个迭代用时: {}".format(time.time() - st))
    print('=' * 50)

Triton load testing

First prepare the input payload, input.json:

{
        "inputs": [{
            "name": "input__0",
            "shape": [2, 3],
            "datatype": "INT64",
            "data": [[1, 2, 3], [4, 5, 6]]
        }],
        "outputs": [{"name": "output__0"}, {"name": "output__1"}]
}
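The payload above targets the demo_torch model. To load-test one of the image models instead, an input.json can be generated with a short script like this sketch (it assumes demo.jpg and the input/output names used earlier in this article):

import json
import numpy as np
from PIL import Image

img = np.array(Image.open('demo.jpg')).astype(np.float32)
img = np.expand_dims(img.transpose((2, 0, 1)), axis=0)  # (1, 3, H, W)

payload = {
    "inputs": [{
        "name": "input",
        "shape": list(img.shape),
        "datatype": "FP32",
        "data": img.tolist()
    }],
    "outputs": [{"name": "dets"}, {"name": "labels"}]
}
with open('input.json', 'w') as f:
    json.dump(payload, f)

The ab command then only needs its URL pointed at the corresponding model, e.g. .../v2/models/faster_rcnn_r50_trt/versions/1/infer.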

Install the load-testing tool:

sudo apt install apache2-utils

Load-test command:

ab -k -c 5 -n 500 -p input.json http://localhost:8000/v2/models/demo/versions/1/infer 

This sends 500 requests in total at a concurrency of 5, posting input.json to version 1 of the demo model.

Triton errors encountered:

⚠️ INVALID_ARGUMENT: getPluginCreator could not find plugin TRTBatchedNMS version 1

A TensorRT model converted inside the mmdeploy Docker image cannot be loaded by the Triton Docker image and fails with the errors below. (Triton's log output is long and it is easy to get lost in it at first; the lines starting with E are the actual errors.)

E0630 01:31:22.566631 1 logging.cc:43] INVALID_ARGUMENT: getPluginCreator could not find plugin TRTBatchedNMS version 1
E0630 01:31:22.566657 1 logging.cc:43] safeDeserializationUtils.cpp (322) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
E0630 01:31:22.566739 1 logging.cc:43] INVALID_STATE: std::exception
E0630 01:31:22.572629 1 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
E0630 01:31:22.587565 1 model_repository_manager.cc:1215] failed to load 'faster_rcnn_r50_tensorrt' version 1: Internal: unable to create TensorRT engine

🔸 Method 1 (recommended)

Reference: “yolo模型部署——tensorRT模型加速+triton服务器模型部署” (a Chinese article on YOLO deployment with TensorRT + Triton)

Run the command below directly (adjust the paths and image tag to your setup):

docker run --gpus all -p8000:8000 -p8001:8001 -p8002:8002 -v /home/xbsj/gaoying/triton/model_rep:/model_rep -v /home/xbsj/gaoying/triton/plugin_rep:/plugin_rep --env LD_PRELOAD=/plugin_rep/libmmdeploy_tensorrt_ops.so triton:2104 tritonserver --model-repository=/model_rep
🔸 Method 2

Source of the fix: end2end.engine to Triton · Issue #465 · open-mmlab/mmdeploy (github.com)

Steps (I tried this and it did not work for me; most likely I did something wrong):

1️⃣ Copy /root/workspace/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so from the mmdeploy Docker image into /opt/tritonserver/lib/ in the Triton Docker image.

docker run --gpus=all --network=host -v /home/xbsj/gaoying/triton/model_rep:/opt/ml/model  -it triton:2104     # run on the host: enter the triton container without starting the server

docker ps    # run on the host: look up the triton container's id

docker cp /data/imagetd/xbsj/gaoying//mmdeploy_out/libmmdeploy_tensorrt_ops.so 7725e367f0f0:/opt/tritonserver/lib/libmmdeploy_tensorrt_ops.so      # copy the file from the host into the triton container

2️⃣ Add LD_PRELOAD=libmmdeploy_tensorrt_ops.so to /bin/serve, before tritonserver is launched.

vim /bin/serve

Add the line below (around line 105); export it, or prefix it directly to the tritonserver command, so the server process actually inherits it:

export LD_PRELOAD=libmmdeploy_tensorrt_ops.so

Start the service:

./bin/tritonserver --model-store=/models

⚠️ ImportError: cannot import name 'ORTWrapper' from 'mmdeploy.backend.onnxruntime' (/data/imagetd/xbsj/gaoying/mmdeploy/mmdeploy/backend/onnxruntime/__init__.py)

Source of the fix: Bug using ORTwrapper · Issue #37 · open-mmlab/mmdeploy (github.com)

🔸 Fix

In mmdeploy/codebase/mmdet/core/post_processing/bbox_nms.py::select_nms_index, changing return batched_dets, batched_labels to return batched_dets[:, 0:-1, :], batched_labels[:, 0:-1] may fix the bug.

Then reinstall mmdeploy:

python setup.py install

and re-run the model conversion afterwards.

⚠️ Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.

Fix reference: Bug using ORTwrapper · Issue #37 · open-mmlab/mmdeploy (github.com)

Author: gy77

Link: https://www.cnblogs.com/gy77/p/16524031.html

License: please credit the source when reposting; commercial use is prohibited.
