Inference server - Post category - MissSimple

Profiling API response times in Flask

Summary: https://github.com/muatik/flask-profiler

posted @ 2022-09-20 16:16 MissSimple Views(168) Comments(0) Recommended(0)

Looking up NVIDIA GPU compute capability

Summary: https://developer.nvidia.com/cuda-gpus

posted @ 2022-09-08 09:58 MissSimple Views(536) Comments(0) Recommended(0)

Deploying YOLOv7 with Triton

Summary: https://github.com/WongKinYiu/yolov7/tree/main/deploy/triton-inference-server

posted @ 2022-09-01 19:35 MissSimple Views(422) Comments(0) Recommended(0)

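Upgrading and installing CUDA

Summary: Download the driver from https://www.nvidia.com/Download/Find.aspx then download CUDA and install it by following the instructions at https://developer.nvidia.com/cuda-downloads Reference: https://blog.csdn.net/qq_30374237/article ...

posted @ 2022-08-31 19:18 MissSimple Views(58) Comments(0) Recommended(0)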

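How to check which packages NVIDIA's official TensorRT image contains, and which TensorRT and CUDA versions it ships; how to check which packages NVIDIA's official Triton image contains

Summary: See the contents of the TensorRT image at https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel_21-07.html#rel_21-07 and the corresponding TensorRT release version at https://catalog ...

posted @ 2022-08-31 18:46 MissSimple Views(514) Comments(0) Recommended(0)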

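Inference serving frameworks

Summary: 1. AI-Serving. AI-Serving is an open-source system for deploying machine learning and deep learning models for inference. It supports the standard PMML and ONNX formats and exposes both HTTP (REST API) and gRPC interfaces, so it can be used in different production environments. AI-Serving focuses on deploying models in standard interchange formats; currently PMM ...

posted @ 2022-04-12 20:18 MissSimple Views(154) Comments(0) Recommended(0)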

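LLVM architecture

Summary: A typical compiler architecture consists of three parts. Frontend: lexical analysis, syntax analysis, semantic analysis, and intermediate code generation. Optimizer: optimizes the intermediate code. Backend: generates machine code. LLVM architecture: different frontends and backends share one common intermediate code, the LLVM Intermediate Representation (LLVM ...

posted @ 2022-01-07 17:12 MissSimple Views(210) Comments(0) Recommended(0)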
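
The three-stage split in the summary above can be illustrated with a toy pipeline. This is only a minimal sketch, not LLVM itself: the `Instr` record, the `t0`/`t1` temporaries, and both "backends" are hypothetical names invented for the example. It shows a frontend lowering an arithmetic expression to a small shared IR, an optimizer doing constant folding on that IR, and two interchangeable backends consuming the same IR.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One toy IR instruction: dest = op(left, right).

    Operands are ints (constants) or strs (variable or temporary names).
    """
    dest: str
    op: str
    left: object
    right: object


_OPS = {ast.Add: "add", ast.Mult: "mul"}


def frontend(source):
    """Parse an expression and lower it to IR. Returns (instrs, result)."""
    tree = ast.parse(source, mode="eval")
    instrs = []

    def lower(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            left, right = lower(node.left), lower(node.right)
            dest = f"t{len(instrs)}"
            instrs.append(Instr(dest, _OPS[type(node.op)], left, right))
            return dest
        raise SyntaxError("unsupported expression")

    result = lower(tree.body)
    return instrs, result


def optimizer(instrs, result):
    """Constant folding: drop instructions whose operands are all constants."""
    known = {}  # temporary name -> folded constant value
    out = []
    for ins in instrs:
        left = known.get(ins.left, ins.left)
        right = known.get(ins.right, ins.right)
        if isinstance(left, int) and isinstance(right, int):
            known[ins.dest] = left + right if ins.op == "add" else left * right
        else:
            out.append(Instr(ins.dest, ins.op, left, right))
    return out, known.get(result, result)


def backend_register(instrs, result):
    """Hypothetical register-machine target."""
    lines = [f"{i.op.upper()} {i.dest}, {i.left}, {i.right}" for i in instrs]
    return lines + [f"RET {result}"]


def backend_text(instrs, result):
    """Hypothetical human-readable target."""
    sym = {"add": "+", "mul": "*"}
    lines = [f"{i.dest} = {i.left} {sym[i.op]} {i.right}" for i in instrs]
    return lines + [f"return {result}"]


def compile_expr(source, backend):
    """Run frontend -> optimizer -> backend over the shared IR."""
    return backend(*optimizer(*frontend(source)))


print(compile_expr("x * (2 + 3)", backend_register))
print(compile_expr("x * (2 + 3)", backend_text))
```

Because both backends read the same IR, the constant-folding pass is written once and benefits either target, which is the point of a shared intermediate representation.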

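Skills that model deployment engineers and model training engineers should each master

Summary: The algorithm industry currently has several kinds of practitioners. Training stage: people who improve algorithm accuracy. Their job is to follow the latest research, reproduce models from papers, apply them to their own company's data, and adjust the model structure and parameters to better fit the company's datasets. Job postings for this kind of work typically require familiarity with algorithms in fields such as CV and NLP, familiarity with frameworks such as TensorFlow and PyTorch, and the ability to fluently read ...

posted @ 2021-12-30 16:10 MissSimple Views(546) Comments(0) Recommended(0)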

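TensorRT and TVM

Summary: TensorRT and TVM are both used to deploy trained models. They can apply quantization, operator fusion, and similar optimizations to a model so that it runs faster. On GPU hardware, TensorRT is the usual deployment choice, because it makes full use of the GPU's performance and also applies many optimizations (operator fusion, quantization, and so on), which gives it a sizable performance advantage. However, Ten ...

posted @ 2021-12-30 10:46 MissSimple Views(575) Comments(0) Recommended(0)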
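
To make the quantization mentioned above concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is an illustration of the general idea only and is not how TensorRT or TVM implement it; the function names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale.

    The scale is chosen so that the largest magnitude maps to 127.
    Returns (quantized_values, scale).
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [x * scale for x in q]


weights = [-2.0, 0.0, 0.5, 1.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)         # integer codes that fit in one byte each
print(restored)  # close to the original weights, not identical
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy traded for smaller storage and faster integer arithmetic.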

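Seldon Core

Summary: KFServing simplifies Seldon Core's DAG inference graph. KFServing supports only Transformer and Predictor. Because of this simplification, the KFServing implementation no longer needs the Engine role from Seldon Core. Requests in Transformer and ...

posted @ 2021-12-10 14:32 MissSimple Views(270) Comments(0) Recommended(0)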

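KServe 1

Summary: KServe addresses the problem of how to bring a trained model online as a service. The major deep learning platforms have also noticed this gap between training and serving, so TensorFlow and PyTorch released TFServing and TorchServe respectively, and NVIDIA released Triton. These inference platforms meet the basic requirements for bringing an inference service online, while KServe builds on ...

posted @ 2021-12-09 16:24 MissSimple Views(794) Comments(0) Recommended(0)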

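NVIDIA Jetson mount plugin

Summary: libnvidia-container has a dedicated jetson branch, and its design document https://github.com/NVIDIA/libnvidia-container/blob/jetson/design/mount_plugins.md describes the mount plugin technique. Put simply, it amounts to installing, on the bare-metal host, ten ...

posted @ 2021-11-22 16:54 MissSimple Views(700) Comments(0) Recommended(0)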

Inference server, part 3: TorchServe example

Summary: When KServe uses TorchServe, TorchServe has to be rebuilt as described in this document: https://github.com/pytorch/serve/tree/master/kubernetes/kfserving docker run -it --network=host - ...

posted @ 2021-11-09 11:25 MissSimple Views(188) Comments(0) Recommended(0)

Inference server, part 2: TensorFlow Serving usage example

Summary: docker run --network=host -v /home/test/models/:/mnt/models tensorflow-serving:v1.14.0 --port=9045 --rest_api_port=8040 --model_name="flowers" --model ...

posted @ 2021-11-09 11:02 MissSimple Views(76) Comments(0) Recommended(0)
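
Once a container like the one above is running, the model can be queried over the REST port. The following is a minimal client sketch using only the Python standard library. It takes the port `8040` and the model name `flowers` from the command above and assumes the server is reachable on `localhost`; the helper names and the input payload are hypothetical, since the real input shape depends on the model.

```python
import json
import urllib.request


def predict_url(model_name, host="localhost", port=8040):
    """Build the TensorFlow Serving REST predict endpoint for a model."""
    return f"http://{host}:{port}/v1/models/{model_name}:predict"


def build_request(model_name, instances, **kwargs):
    """Create a POST request carrying the inputs in the 'instances' field."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        predict_url(model_name, **kwargs),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(model_name, instances, timeout=10.0, **kwargs):
    """Send the request and return the 'predictions' list from the reply."""
    req = build_request(model_name, instances, **kwargs)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["predictions"]


# Building the request needs no running server, so it can be inspected offline.
req = build_request("flowers", [[0.0, 1.0]])
print(req.full_url)
```

Calling `predict("flowers", inputs)` only succeeds while the server is up and `inputs` matches what the model expects; the placeholder `[[0.0, 1.0]]` above is not a valid image input for a real flowers classifier.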

Inference server, part 1

Summary: KServe support for PyTorch models. V1 uses KServe's pytorchserver: https://github.com/kserve/kserve/tree/master/python/pytorchserver V2 adapts TorchServe to KServe: https://gith ...

posted @ 2021-11-05 16:12 MissSimple Views(122) Comments(0) Recommended(0)
