Deploying Tongyi Qianwen (Qwen) in a Container
Prepare the Server
- Alibaba Cloud ECS instance
- Instance type: lightweight GPU instance ecs.vgn6i-m4-vws.xlarge (4 vCPUs, 23 GiB memory)
- Disk space: 50 GiB
- Operating system: Ubuntu 22.04
Install Docker
apt update && apt install -y docker.io
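Before moving on, Docker itself can be smoke-tested with the standard hello-world image (a generic Docker check, not specific to this setup):
docker run --rm hello-world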
Install the NVIDIA GRID Driver
On Alibaba Cloud vGPU instances the GRID driver can be installed via the Cloud Assistant plugin manager, which is preinstalled on ECS public images:
acs-plugin-manager --exec --plugin grid_driver_install
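After the plugin finishes, the driver can be verified on the host; if the installation succeeded, nvidia-smi should list the vGPU:
nvidia-smi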
Install the NVIDIA Container Toolkit
- Installation commands
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt-get update
apt-get install -y nvidia-container-toolkit
- Configuration commands (register the NVIDIA runtime with Docker and restart the daemon)
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker
- Verify the installation; the command below should print the GPU table from inside a container
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Download the Model Checkpoint
- Create a download script, download-model-checkpoint.py
from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the model checkpoint into a local directory (model_dir)
model_dir = snapshot_download('qwen/Qwen-7B-Chat')

# Load the local checkpoint.
# trust_remote_code must be True because the model code is loaded
# from the local checkpoint directory rather than shipped with transformers.
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",
    trust_remote_code=True
).eval()
- Install the script's dependencies
pip install modelscope
pip install transformers
pip install torch
pip install tiktoken
pip install transformers_stream_generator
pip install accelerate
- Run the script to download the model checkpoint
python3 download-model-checkpoint.py
Note: the checkpoint files are downloaded into ~/.cache/modelscope/hub/qwen/Qwen-7B-Chat
(this path is exactly the value of the model_dir variable).
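As an optional sanity check (assuming the default ModelScope cache location noted above), list the downloaded files and their sizes:
ls -lh ~/.cache/modelscope/hub/qwen/Qwen-7B-Chat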
Start a Container Serving the Model (OpenAI-API-Compatible Mode)
- Clone the Qwen open-source repository
git clone https://github.com/QwenLM/Qwen.git
- Start the container with the script below; run it from inside the cloned repository, since docker/docker_openai_api.sh is a relative path
cd Qwen
IMAGE_NAME=qwenllm/qwen:cu114
PORT=8901
CHECKPOINT_PATH=~/.cache/modelscope/hub/qwen/Qwen-7B-Chat
bash docker/docker_openai_api.sh -i ${IMAGE_NAME} -c ${CHECKPOINT_PATH} --port ${PORT}
Note: the qwenllm/qwen:cu114 image is 9.87 GB, so the initial pull can take a while.
- Confirm the container started successfully
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b2bd3f3417af qwenllm/qwen:cu114 "/opt/nvidia/nvidia_…" 3 minutes ago Up 3 minutes 0.0.0.0:8901->80/tcp, :::8901->80/tcp qwen
The container is up.
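If the container does not show up here or exits right away, its logs usually reveal why; qwen is the container name from the NAMES column above:
docker logs qwen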
- Confirm the API answers requests
# curl localhost:8901/v1/models | jq
Output:
{
"object": "list",
"data": [
{
"id": "gpt-3.5-turbo",
"object": "model",
"created": 1707471911,
"owned_by": "owner",
"root": null,
"parent": null,
"permission": null
}
]
}
The request succeeded: the service exposes an OpenAI-compatible API. (It reports its model id as gpt-3.5-turbo, as seen above, so off-the-shelf OpenAI clients work without changes.)
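As a final end-to-end check, here is a minimal chat-completion request sketched with curl. It assumes the service implements the standard OpenAI /v1/chat/completions route, and uses the gpt-3.5-turbo model id reported by /v1/models above:
curl -s http://localhost:8901/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]}' | jq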