LMDeploy
https://lmdeploy.readthedocs.io/en/latest/index.html
LMDeploy has the following core features:
Efficient Inference: LMDeploy delivers up to 1.8x higher request throughput than vLLM by introducing key features such as persistent batch (a.k.a. continuous batching), blocked KV cache, dynamic split & fuse, tensor parallelism, and high-performance CUDA kernels.
Effective Quantization: LMDeploy supports weight-only and k/v quantization, and its 4-bit inference performance is 2.4x higher than FP16. The quantization quality has been confirmed via OpenCompass evaluation.
Effortless Distribution Server: Leveraging the request distribution service, LMDeploy makes it easy to deploy multi-model services across multiple machines and GPUs.
Interactive Inference Mode: By caching the attention k/v across multi-round dialogues, the engine remembers dialogue history and avoids reprocessing historical sessions.
Excellent Compatibility: LMDeploy supports using KV cache quantization, AWQ, and automatic prefix caching simultaneously.
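To make the k/v quantization claim concrete, here is a back-of-envelope sketch of KV-cache sizing, showing why storing keys/values in INT4 instead of FP16 cuts cache memory by 4x. The model dimensions below are illustrative (roughly 7B-class, Llama-like); they are assumptions, not values read from any real model config.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem):
    # Per sequence: one K and one V tensor per layer (hence the factor of 2),
    # each of shape [num_kv_heads, seq_len, head_dim].
    return int(2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem)

# Illustrative 7B-class dimensions (assumed, not from a real config)
layers, kv_heads, head_dim, seq_len = 32, 32, 128, 4096

fp16 = kv_cache_bytes(layers, kv_heads, head_dim, seq_len, 2)    # FP16: 2 bytes/elem
int4 = kv_cache_bytes(layers, kv_heads, head_dim, seq_len, 0.5)  # INT4: 0.5 byte/elem

print(f"FP16 KV cache per sequence: {fp16 / 2**30:.2f} GiB")  # 2.00 GiB
print(f"INT4 KV cache per sequence: {int4 / 2**30:.2f} GiB")  # 0.50 GiB
```

Freed cache memory translates directly into more concurrent sequences per GPU, which is where much of the throughput gain over FP16 comes from.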
Comparisons with other backends
https://bentoml.com/blog/benchmarking-llm-inference-backends
https://cloud.tencent.com/developer/article/2428575
Deployment references
https://zhuanlan.zhihu.com/p/678685048
https://lmdeploy.readthedocs.io/en/latest/serving/api_server.html
https://www.bilibili.com/read/cv33455585/
LMDeploy provides one-command tools for wrapping a VL model as a service. The service can be an OpenAI-compatible API server or a gradio app, so there should be an option that fits your needs.
lmdeploy serve api_server Qwen/Qwen-VL-Chat --server-port 8000
lmdeploy serve gradio Qwen/Qwen-VL-Chat --server-port 8000
Using a local model path
CUDA_VISIBLE_DEVICES=1 lmdeploy serve api_server --model-name Qwen-VL-Chat --server-port 23334 /mnt/AI/models/Qwen-VL-Chat
CUDA_VISIBLE_DEVICES=1 lmdeploy serve gradio --model-name Qwen-VL-Chat --server-port 23334 /mnt/AI/models/Qwen-VL-Chat
https://xujinzh.github.io/2024/01/13/ai-internlm-lmdeploy/index.html#API-%E6%9C%8D%E5%8A%A1
https://lmdeploy.readthedocs.io/en/latest/serving/api_server_vl.html#integrate-with-openai
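Once the api_server from the commands above is running, it can be queried like any OpenAI-compatible endpoint. Below is a minimal sketch of a vision-language chat payload; the model name `Qwen-VL-Chat`, the port `23334`, and the image URL are placeholders taken from this page's examples, not guaranteed values for your deployment.

```python
import json

# OpenAI-style chat payload with a mixed text + image_url message,
# as accepted by OpenAI-compatible VL endpoints.
payload = {
    "model": "Qwen-VL-Chat",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }
    ],
}

# To send it against the locally served model (assumed port from above):
#   import requests
#   requests.post("http://localhost:23334/v1/chat/completions",
#                 json=payload, timeout=60)
print(json.dumps(payload, indent=2))
```

The same payload shape works with the official `openai` Python client by pointing `base_url` at `http://localhost:23334/v1`.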