Private Deployment of Qwen-72B-Chat

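Qwen has released quite a few open-source variants: there are generations 1 and 1.5, parameter counts from 1.8B to 72B, and modalities covering language, audio, and vision. At 72B alone there are two models, Qwen-72B-Chat (chat) and Qwen-72B (base/pretrained). Below are some brief notes on the pitfalls of deploying Qwen-72B-Chat: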

1. Download the model from ModelScope (魔搭社区). The weight files total more than 140 GB.
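
Since the weights take up more than 140 GB, it can help to check free disk space before starting the download. The following is a minimal sketch, not the author's procedure: the model id `qwen/Qwen-72B-Chat`, the target directory `/app/model`, and the 150 GiB threshold are assumptions, and the download itself relies on `snapshot_download` from the ModelScope SDK (`pip install modelscope`).

```python
import shutil

MODEL_ID = "qwen/Qwen-72B-Chat"   # assumed ModelScope model id
TARGET_DIR = "/app/model"         # assumed download location
REQUIRED_GIB = 150                # assumed margin above the 140+ GB of weights


def free_gib(path):
    """Return the free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path, required_gib):
    """True if at least `required_gib` GiB are free at `path`."""
    return free_gib(path) >= required_gib


def download_model(model_id=MODEL_ID, target_dir=TARGET_DIR):
    """Download the weights with the ModelScope SDK after a disk check."""
    if not has_room(target_dir, REQUIRED_GIB):
        raise RuntimeError(f"less than {REQUIRED_GIB} GiB free at {target_dir}")
    # Imported here so the disk check works without modelscope installed.
    from modelscope import snapshot_download
    # Returns the local directory that now holds the model files.
    return snapshot_download(model_id, cache_dir=target_dir)
```

`download_model()` returns the directory the SDK actually wrote to. ModelScope decides the layout under `cache_dir`, so that returned path, rather than a guessed one, is what should be passed to the server later.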

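2. Create a new virtual environment. Baseline requirements: Python 3.8 or later, PyTorch 1.12 or later, CUDA 11.4 or later. Dependencies: "transformers>=4.32.0" accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed. Missing OS-level packages cause all sorts of errors later on, so make sure glibc-devel, gcc, and gcc-c++ are installed. Also check the PATH environment variable: if /usr/sbin/ldconfig cannot be found, things will break.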
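
The checks in this step can be roughly automated. The sketch below is only an illustration under stated assumptions: the function names are made up for this example, it looks for `ldconfig`, `gcc`, and `g++` on PATH (gcc-c++ is the package that provides `g++`), and it does not attempt to detect glibc-devel, which has no portable check.

```python
import importlib
import re
import shutil
import sys


def version_tuple(text):
    """Turn a version string such as '2.1.0+cu118' into (2, 1)."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:2])


def check_environment():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python is older than 3.8")
    # Tools whose absence the post blames for later errors.
    for tool in ("ldconfig", "gcc", "g++"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    try:
        torch = importlib.import_module("torch")
    except Exception as exc:  # missing or broken installation
        problems.append(f"torch could not be imported: {exc}")
    else:
        if version_tuple(torch.__version__) < (1, 12):
            problems.append(f"torch {torch.__version__} is older than 1.12")
        if not torch.cuda.is_available():
            problems.append("torch cannot see a CUDA device")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

Running this inside the new virtual environment before launching the server surfaces the PATH and compiler issues early, instead of as obscure errors during model loading.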

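3. Run it as a server:

Download the framework (the openai_api.py script used below is in this repository): https://github.com/QwenLM/Qwen#vllm
Serve the model through an OpenAI-compatible API (assuming the model is stored in /app/model/Qwen-72B-Chat): python3 openai_api.py -c /app/model/Qwen-72B-Chat --server-name 0.0.0.0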
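
Loading a 72B model takes a while, so a client script may want to wait until the server is reachable before sending requests. The following is a small sketch using only the standard library. It assumes the server starts listening only once the model has loaded, and it takes the port 8000 and host 192.168.1.2 from the endpoint URL used in step 4; `wait_for_port` is a name invented for this example.

```python
import socket
import time


def wait_for_port(host, port, timeout=600.0, interval=2.0):
    """Poll until host:port accepts TCP connections or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (not executed here):
# if not wait_for_port("192.168.1.2", 8000):
#     raise SystemExit("server did not come up in time")
```

An open port only shows that the process is listening; a real request, as in step 4, is still the definitive test.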

4. Remote calls:

Over plain HTTP:

import requests
import json

# API key placeholder (the original setup simply uses 'none')
OPEN_AI_API_KEY = 'none'

# Example: call the Chat Completion API
endpoint_url = "http://192.168.1.2:8000/v1/chat/completions"

# Request body
request_body = {
    "model": "Qwen-72B-Chat",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"}
    ]
}

# Request headers
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {OPEN_AI_API_KEY}"
}

# Send the POST request
response = requests.post(endpoint_url, headers=headers, json=request_body)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the returned JSON data
    result = response.json()
    print(result)
else:
    print(f"Request failed, status code: {response.status_code}")
    print(f"Error details: {response.text}")
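
The HTTP example prints the whole JSON response. To get only the assistant's text, the reply can be pulled out of the `choices` list, assuming the server follows the OpenAI chat-completion response layout. The helper name `extract_reply` and the sample dictionary below are hypothetical and exist only to illustrate the shape.

```python
def extract_reply(result):
    """Return the assistant text from a chat-completion response dict."""
    choices = result.get("choices") or []
    if not choices:
        raise ValueError(f"no choices in response: {result!r}")
    return choices[0]["message"]["content"]


# Hypothetical response, shaped like the OpenAI chat-completion format.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ]
}
print(extract_reply(sample))
```

In the script above, `extract_reply(response.json())` would take the place of `print(result)` when only the text is wanted.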

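Using the openai package (version 0.28.1; it must be below 1.0, because this code uses the pre-1.0 interface). Streaming is not working yet.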

import openai
openai.api_base = "http://192.168.1.2:8000/v1"
openai.api_key = "none"

response = openai.ChatCompletion.create(
    model="Qwen-72B-Chat",
    messages=[
        {"role": "user", "content": "你好"}
    ],
    stream=False,
    stop=[] # You can add custom stop words here, e.g., stop=["Observation:"] for ReAct prompting.
)
print(response.choices[0].message.content)
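
For streaming, which the note above reports as unsolved, one possible approach with openai<1.0 is to pass `stream=True` and print each chunk's `delta.content` as it arrives. The sketch below is modeled on the streaming example in the Qwen repository README and has not been tested against this deployment. That example omits the `stop` argument, so dropping it may be what is needed; this is a guess, not a verified fix. The function names are invented for this example.

```python
from types import SimpleNamespace


def collect_stream(chunks, echo=True):
    """Concatenate the `delta.content` pieces of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        piece = getattr(delta, "content", None)
        if piece:  # some chunks may carry only the role and no text
            parts.append(piece)
            if echo:
                print(piece, end="", flush=True)
    return "".join(parts)


def stream_chat(prompt):
    """Untested sketch: request a streamed reply with openai<1.0."""
    import openai  # imported here so collect_stream can be tried without it
    openai.api_base = "http://192.168.1.2:8000/v1"
    openai.api_key = "none"
    chunks = openai.ChatCompletion.create(
        model="Qwen-72B-Chat",
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # note: no `stop` argument in this request
    )
    return collect_stream(chunks)


# Offline check with hand-made chunks that mimic the streamed shape.
fake = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(role="assistant"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hel"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="lo"))]),
]
assert collect_stream(fake, echo=False) == "Hello"
```

Keeping the chunk handling in `collect_stream` separate from the network call means the parsing logic can be checked offline, as the last lines do, before pointing `stream_chat` at the real server.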

posted @ 2024-03-07 21:54 badwood
