TorchServe (Part 1): Basic APIs
Official manual: TorchServe — PyTorch/Serve master documentation
Official repo: pytorch/serve: Serve, optimize and scale PyTorch models in production (github.com)
1 Getting started with TorchServe
Requirements: Python >= 3.8
A quick prediction request looks like this:
curl http://127.0.0.1:8080/predictions/bert -T input.txt
Install from pip:
# Install dependencies (CUDA is optional)
python ./ts_scripts/install_dependencies.py --cuda=cu111
# Latest release
pip install torchserve torch-model-archiver torch-workflow-archiver
# Nightly build (rebuilt daily; not recommended)
pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archiver-nightly
Install via Docker:
# Latest release (the default tag)
docker pull pytorch/torchserve
# Nightly build (rebuilt daily)
docker pull pytorch/torchserve-nightly
2 Model packaging (torch-model-archiver)
Purpose: package a model and its associated files into a deployable .mar archive.
Install directly:
pip install torch-model-archiver
Build from source:
git clone https://github.com/pytorch/serve.git
cd serve/model-archiver
pip install .
Docs: serve/model-archiver at master · pytorch/serve (github.com)
torch-model-archiver [-h] --model-name MODEL_NAME --version MODEL_VERSION_NUMBER
                     --model-file MODEL_FILE_PATH --serialized-file MODEL_SERIALIZED_PATH
                     --handler HANDLER [--runtime {python,python2,python3}]
                     [--export-path EXPORT_PATH] [-f] [--requirements-file]
Arguments:
- --model-name: name of the exported model; the resulting archive is named MODEL_NAME.mar;
- --serialized-file: in eager mode, a .pt/.pth file containing the model's state_dict; or a TorchScript executable module;
- --model-file: required in eager mode; a Python file describing the model architecture, containing a single class that extends torch.nn.Module;
- --handler: the handler, TorchServe's entry point for inference logic; either a built-in handler name or the path to a custom handler file;
- --extra-files: additional files the model depends on; separate multiple files with commas;
- --runtime: the runtime language, one of {python, python2, python3};
- --export-path: optional; the directory the .mar file is written to;
- --archive-format: the archive format ({tgz, no-archive, default}); the default produces a standard .mar file;
- -f, --force: overwrite an existing .mar file of the same name in the export path;
- -v, --version: the model version;
- -r, --requirements-file: path to a requirements.txt listing the model's package dependencies;
Examples:
TorchScript mode:
torch-model-archiver --model-name densenet_161 --version 1.0 --serialized-file model.pt --handler image_classifier
Eager mode (the second command additionally bundles a custom requirements file):
torch-model-archiver --model-name densenet_161 --version 1.0 --model-file model.py --serialized-file model.pt --handler image_classifier
torch-model-archiver --model-name densenet_161 --version 1.0 --model-file model.py --serialized-file model.pt --handler image_classifier --requirements-file <path_to_custom_requirements_file>
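For reference, a minimal custom handler to pass via --handler might look like the sketch below. It extends TorchServe's real ts.torch_handler.base_handler.BaseHandler; the file name, class name, and placeholder preprocessing are illustrative assumptions, not a prescribed implementation.

# minimal_handler.py: hypothetical custom handler sketch
import torch
from ts.torch_handler.base_handler import BaseHandler

class MinimalHandler(BaseHandler):
    # BaseHandler.initialize() already takes care of loading the serialized model

    def preprocess(self, data):
        # Each batch item carries its payload under "data" or "body"
        texts = [row.get("data") or row.get("body") for row in data]
        # Placeholder: map each input to a fixed-size tensor
        return torch.zeros(len(texts), 10)

    def postprocess(self, inference_output):
        # Return one JSON-serializable entry per batch item
        return inference_output.tolist()

It would be packaged the same way, e.g. with --handler minimal_handler.py.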
3 The torchserve service
$ torchserve --help
usage: torchserve [-h] [-v | --version] [--start] [--stop]
                  [--ts-config TS_CONFIG] [--model-store MODEL_STORE]
                  [--workflow-store WORKFLOW_STORE]
                  [--models MODEL_PATH1 MODEL_NAME=MODEL_PATH2... [MODEL_PATH1 MODEL_NAME=MODEL_PATH2... ...]]
                  [--log-config LOG_CONFIG]
Required arguments:
- --model-store: path to the directory models are loaded from;

Optional arguments:
- -v, --version: print the TorchServe version;
- --start: start TorchServe;
- --stop: stop TorchServe;
- --ts-config: TorchServe configuration file;
- --models: model file(s) to serve; multiple models are supported;
- --log-config: logging configuration file;
- --ncs, --no-config-snapshots: disable the config-snapshot feature;
- --workflow-store: workflow store location; defaults to the model store;
Examples:
torchserve --start --model-store /models --models not-hot-dog=super-fancy-net.mar
This starts a single model service whose prediction endpoint is:
predictions/not-hot-dog/
Pass --models all to serve every model in the store:
torchserve --start --model-store /models --models all
Serve multiple models:
torchserve --start --model-store /models --models name=model_location name2=model_location2
Example serving local resnet-18 and squeezenet models:
torchserve --start --model-store /models --models resnet-18=resnet-18.mar squeezenet=squeezenet_v1.1.mar
4 REST APIs
When TorchServe starts, it launches two web services by default: the inference API listening on port 8080 and the management API listening on port 8081.
By default, both APIs are accessible only from localhost. To enable access from remote hosts, see https://pytorch.org/serve/configuration.html.
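For instance, a config.properties along these lines binds both APIs to all network interfaces (inference_address and management_address are standard TorchServe options; binding to 0.0.0.0 is just one possible choice and exposes the server to the network):

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081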
TorchServe's APIs include:
- Inference API
- Management API
- Metrics API
- Workflow Inference API
- Workflow Management API
4.1 Inference API
This API listens on port 8080 by default and is accessible only from localhost. To change the defaults, see https://pytorch.org/serve/configuration.html.
The TorchServe server supports the following APIs:
- API Description - list the available APIs
- Health check API - check the health of the running server
- Predictions API - get prediction results from a served model
- Explanations API - get explanations from a served model
- KServe Inference API - get predictions through KServe
- KServe Explanations API - get explanations through KServe
4.1.1 API description
curl -X OPTIONS http://localhost:8080
4.1.2 Health check
curl http://localhost:8080/ping
A healthy server returns:
{ "status": "Healthy" }
4.1.3 Predictions
The predictions endpoint returns only the highest-probability result:
POST /predictions/{model_name}
With curl:
curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg
curl http://localhost:8080/predictions/resnet-18 -T kitten_small.jpg

or:

curl http://localhost:8080/predictions/resnet-18 -F "data=@kitten_small.jpg"
or:
curl -X POST http://localhost:8080/predictions/text_cls --data "data=some input string"
With Python requests:
import requests
res = requests.post("http://localhost:8080/predictions/unitcls", data={'data': "rights holder string"})
print(res.text)
import requests
res = requests.post("http://localhost:8080/predictions/my_tc", files={'data': open('examples/text_classification/sample_text.txt', 'rt')})
print(res.text)
Multi-input example:
curl http://localhost:8080/predictions/squeezenet1_1 -F 'data=@docs/images/dogs-before.jpg' -F 'data=@docs/images/kitten_small.jpg'

or:

import requests
# Pass a list of tuples so both files are sent under the same field name
res = requests.post(
    "http://localhost:8080/predictions/squeezenet1_1",
    files=[('data', open('docs/images/dogs-before.jpg', 'rb')),
           ('data', open('docs/images/kitten_small.jpg', 'rb'))],
)
To call a specific model version, use /predictions/{model_name}/{version}:
POST /predictions/{model_name}/{version}
Example:
curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg
curl http://localhost:8080/predictions/resnet-18/2.0 -T kitten_small.jpg

or:

curl http://localhost:8080/predictions/resnet-18/2.0 -F "data=@kitten_small.jpg"
The response is JSON:
{ "class": "n02123045 tabby, tabby cat", "probability": 0.42514491081237793 }
4.1.4 Explanations
This endpoint explains a prediction by returning detailed per-input attribution scores rather than just the predicted class. The REST endpoint is /explanations/{model_name}:
POST /explanations/{model_name}
curl example:
curl http://127.0.0.1:8080/explanations/mnist -T examples/image_classifier/mnist/test_data/0.png
The response is JSON:
[[[[0.004570948731989492, 0.006216969640322402, 0.008197565423679522, 0.009563574612830427, 0.008999274832810742, 0.009673474804303854, 0.007599905146155397, ...]]]]
4.1.5 KServe inference
KServe inference API:
POST /v1/models/{model_name}:predict
curl example:
curl -H "Content-Type: application/json" --data @kubernetes/kserve/kf_request_json/v1/mnist.json http://127.0.0.1:8080/v1/models/mnist:predict
The response is JSON:
{ "predictions": [ 2 ] }
4.1.6 KServe explanations
POST /v1/models/{model_name}:explain
curl example:
curl -H "Content-Type: application/json" --data @kubernetes/kserve/kf_request_json/v1/mnist.json http://127.0.0.1:8080/v1/models/mnist:explain
4.2 Management API
TorchServe provides the following management APIs:
- Register a model
- Increase/decrease the number of workers for a specific model
- Describe a model's status
- Unregister a model
- List registered models
- Set the default version of a model
4.2.1 Register a model
POST /models
- url: where the model's .mar file comes from. Supported locations: (a) a local .mar file, which must sit directly in the model-store folder, not in a subfolder; (b) an HTTP URL, so the .mar file can be downloaded from the internet.
- model_name: the model's name, used as the {model_name} part of the API paths. If absent, the modelName from MANIFEST.json is used.
- handler: the inference entry point. If present, it overrides the handler in MANIFEST.json. NOTE: make sure the given handler is on the PYTHONPATH. The format is module_name:method_name.
- runtime: if present, overrides the value in MANIFEST.json. The default is PYTHON.
- batch_size: the inference batch size. The default is 1.
- max_batch_delay: the maximum delay for batch aggregation. The default is 100 ms.
- initial_workers: the number of workers to create initially. The default is 0.
- synchronous: whether worker creation is synchronous. The default is false.
- response_timeout: if a model's backend worker does not respond within this time, it is deemed unresponsive and restarted. The unit is seconds; the default is 120.
Example:
curl -X POST "http://localhost:8081/models?url=https://torchserve.pytorch.org/mar_files/squeezenet1_1.mar" { "status": "Model \"squeezenet_v1.1\" Version: 1.0 registered with 0 initial workers. Use scale workers API to add workers for the model." }
4.2.2 Scale workers
Set the number of worker processes for a model:
PUT /models/{model_name}
- min_worker: (optional) minimum number of worker processes;
- max_worker: (optional) maximum number of worker processes;
- synchronous: whether the call is synchronous;
- timeout: how long a worker may take to finish all pending requests before it is stopped;
Examples:
An asynchronous call returns immediately with HTTP 202:
curl -v -X PUT "http://localhost:8081/models/noop?min_worker=3"

< HTTP/1.1 202 Accepted
< content-type: application/json
< x-request-id: 42adc58e-6956-4198-ad07-db6c620c4c1e
< content-length: 47
< connection: keep-alive
<
{
  "status": "Processing worker updates..."
}
A synchronous call returns HTTP 200 once all workers have been adjusted:
curl -v -X PUT "http://localhost:8081/models/noop?min_worker=3&synchronous=true"

< HTTP/1.1 200 OK
< content-type: application/json
< x-request-id: b72b1ea0-81c6-4cce-92c4-530d3cfe5d4a
< content-length: 63
< connection: keep-alive
<
{
  "status": "Workers scaled to 3 for model: noop"
}
To scale a specific version, use /models/{model_name}/{version}. The following synchronous call returns HTTP 200 once all workers for version "2.0" of model "noop" have been adjusted:
curl -v -X PUT "http://localhost:8081/models/noop/2.0?min_worker=3&synchronous=true"

< HTTP/1.1 200 OK
< content-type: application/json
< x-request-id: 3997ccd4-ae44-4570-b249-e361b08d3d47
< content-length: 77
< connection: keep-alive
<
{
  "status": "Workers scaled to 3 for model: noop, version: 2.0"
}
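The same synchronous scale-up from Python (reusing the "noop" model from the curl examples):

import requests

res = requests.put(
    "http://localhost:8081/models/noop",
    params={"min_worker": 3, "synchronous": "true"},
)
print(res.status_code, res.json())  # 200, {"status": "Workers scaled to 3 ..."}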
4.2.3 Describe a model
GET /models/{model_name}
curl http://localhost:8081/models/noop
[ { "modelName": "noop", "modelVersion": "1.0", "modelUrl": "noop.mar", "engine": "Torch", "runtime": "python", "minWorkers": 1, "maxWorkers": 1, "batchSize": 1, "maxBatchDelay": 100, "workers": [ { "id": "9000", "startTime": "2018-10-02T13:44:53.034Z", "status": "READY", "gpu": false, "memoryUsage": 89247744 } ] } ]
To describe a specific version:
GET /models/{model_name}/{version}
curl http://localhost:8081/models/noop/2.0
[ { "modelName": "noop", "modelVersion": "2.0", "modelUrl": "noop_2.mar", "engine": "Torch", "runtime": "python", "minWorkers": 1, "maxWorkers": 1, "batchSize": 1, "maxBatchDelay": 100, "workers": [ { "id": "9000", "startTime": "2018-10-02T13:44:53.034Z", "status": "READY", "gpu": false, "memoryUsage": 89247744 } ] } ]
All registered versions of a model:
GET /models/{model_name}/all
curl http://localhost:8081/models/noop/all
[ { "modelName": "noop", "modelVersion": "1.0", "modelUrl": "noop.mar", "engine": "Torch", "runtime": "python", "minWorkers": 1, "maxWorkers": 1, "batchSize": 1, "maxBatchDelay": 100, "workers": [ { "id": "9000", "startTime": "2018-10-02T13:44:53.034Z", "status": "READY", "gpu": false, "memoryUsage": 89247744 } ] }, { "modelName": "noop", "modelVersion": "2.0", "modelUrl": "noop_2.mar", "engine": "Torch", "runtime": "python", "minWorkers": 1, "maxWorkers": 1, "batchSize": 1, "maxBatchDelay": 100, "workers": [ { "id": "9000", "startTime": "2018-10-02T13:44:53.034Z", "status": "READY", "gpu": false, "memoryUsage": 89247744 } ] } ]
Customized model details:
GET /models/{model_name}/{model_version}?customized=true
or
GET /models/{model_name}?customized=true
4.2.4 Unregister a model
DELETE /models/{model_name}/{version}
curl -X DELETE http://localhost:8081/models/noop/1.0

{
  "status": "Model \"noop\" unregistered"
}
4.2.5 List registered models
GET /models
Parameters:
- limit: (optional) the maximum number of items to return;
- next_page_token: (optional) the token for querying the next page;
curl "http://localhost:8081/models"
curl "http://localhost:8081/models?limit=2&next_page_token=2" { "nextPageToken": "4", "models": [ { "modelName": "noop", "modelUrl": "noop-v1.0" }, { "modelName": "noop_v0.1", "modelUrl": "noop-v0.1" } ] }
To view all available APIs:
# To view all inference APIs:
curl -X OPTIONS http://localhost:8080

# To view all management APIs:
curl -X OPTIONS http://localhost:8081
4.2.6 Set the default model version
PUT /models/{model_name}/{version}/set-default
curl -v -X PUT http://localhost:8081/models/noop/2.0/set-default
5 Logging
5.1 Log formats
Two log types are currently supported: the access log and the TorchServe log.
5.1.1 Access log
<RollingFile name="access_log" fileName="${env:LOG_LOCATION:-logs}/access_log.log" filePattern="${env:LOG_LOCATION:-logs}/access_log.%d{dd-MMM}.log.gz"> <PatternLayout pattern="%d{ISO8601} - %m%n"/> <Policies> <SizeBasedTriggeringPolicy size="100 MB"/> <TimeBasedTriggeringPolicy/> </Policies> <DefaultRolloverStrategy max="5"/> </RollingFile>
5.1.2 TorchServe log
<RollingFile name="ts_log" fileName="${env:LOG_LOCATION:-logs}/ts_log.log" filePattern="${env:LOG_LOCATION:-logs}/ts_log.%d{dd-MMM}.log.gz"> <PatternLayout pattern="%d{ISO8601} [%-5p] %t %c - %m%n"/> <Policies> <SizeBasedTriggeringPolicy size="100 MB"/> <TimeBasedTriggeringPolicy/> </Policies> <DefaultRolloverStrategy max="5"/> </RollingFile>
5.2 Log configuration
5.2.1 Log configuration file
serve/frontend/server/src/main/resources/log4j2.xml
5.2.2 Custom log configuration file
Create a custom log4j2.xml and start TorchServe with it as follows:
First, set in config.properties:
vmargs=-Dlog4j.configurationFile=file:///path/to/custom/log4j2.xml
Then start TorchServe with that config file:
$ torchserve --start --ts-config /path/to/config.properties
Alternatively, pass the log4j2 file directly at startup:
$ torchserve --start --log-config /path/to/custom/log4j2.xml
5.3 Asynchronous logging
Set in config.properties:
async_logging=true