LMDeploy

https://lmdeploy.readthedocs.io/en/latest/index.html

LMDeploy has the following core features:

Efficient Inference: LMDeploy delivers up to 1.8x higher request throughput than vLLM, by introducing key features like persistent batch(a.k.a. continuous batching), blocked KV cache, dynamic split&fuse, tensor parallelism, high-performance CUDA kernels and so on.

Effective Quantization: LMDeploy supports weight-only and k/v quantization, and the 4-bit inference performance is 2.4x higher than FP16. The quantization quality has been confirmed via OpenCompass evaluation.

Effortless Distribution Server: Leveraging the request distribution service, LMDeploy facilitates an easy and efficient deployment of multi-model services across multiple machines and cards.

Interactive Inference Mode: By caching the k/v of attention during multi-round dialogue processes, the engine remembers dialogue history, thus avoiding repetitive processing of historical sessions.

Excellent Compatibility: LMDeploy supports KV Cache Quant, AWQ and Automatic Prefix Caching to be used simultaneously.

Vs

https://bentoml.com/blog/benchmarking-llm-inference-backends

https://cloud.tencent.com/developer/article/2428575

部署参考

https://zhuanlan.zhihu.com/p/678685048

https://lmdeploy.readthedocs.io/en/latest/serving/api_server.html

https://www.bilibili.com/read/cv33455585/

LMDeploy 提供了一键式把 VL 模型封装为服务的工具。这里的服务可以是类似 OpenAI 的服务，也可以是 gradio 服务。相信总有一款适合你！

lmdeploy serve api_server Qwen/Qwen-VL-Chat --server-port 8000

lmdeploy serve gradio Qwen/Qwen-VL-Chat --server-port 8000

使用本地文件

CUDA_VISIBLE_DEVICES=1 lmdeploy serve api_server --model-name Qwen-VL-Chat --server-port 23334 /mnt/AI/models/Qwen-VL-Chat

CUDA_VISIBLE_DEVICES=1 lmdeploy serve gradio --model-name Qwen-VL-Chat --server-port 23334 /mnt/AI/models/Qwen-VL-Chat

https://xujinzh.github.io/2024/01/13/ai-internlm-lmdeploy/index.html#API-%E6%9C%8D%E5%8A%A1

https://lmdeploy.readthedocs.io/en/latest/serving/api_server_vl.html#integrate-with-openai

posted @ 2024-07-24 22:51 lightsong 阅读(10) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

Stay Hungry,Stay Foolish!

lightsong

{Web: [React, Vue, NodeJS, HTTP]，DevOps:[Jenkins,Docker,K8S], Languages:[Python, JS, C, Lua, Shell, Groovy]}

LMDeploy

LMDeploy

Vs

部署参考

公告