以容器方式部署通义千问 Qwen

准备服务器

  • 阿里云云服务器
  • 实例规格:轻量级 GPU 实例 ecs.vgn6i-m4-vws.xlarge(4vCPU 23GiB)
  • 磁盘空间 :50G
  • 操作系统:Ubuntu 22.04

安装 docker

apt install docker.io

安装 NVIDIA GRID 驱动

acs-plugin-manager --exec --plugin grid_driver_install

安装 NVIDIA Container Toolkit

  • 安装命令
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt-get update
apt-get install -y nvidia-container-toolkit
  • 配置命令
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker
  • 验证是否安装成功
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

下载 model checkpoint

  • 创建下载脚本 download-model-checkpoint.py
from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Downloading model checkpoint to a local dir model_dir
model_dir = snapshot_download('qwen/Qwen-7B-Chat')

# Loading local checkpoints
# trust_remote_code is still set as True since we still load codes from local dir instead of transformers
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",
    trust_remote_code=True
).eval()
  • 安装脚本依赖包
pip install modelscope
pip install transformers
pip install torch
pip install tiktoken
pip install transformers_stream_generator
pip install accelerate
  • 执行脚本下载 model checkpoints
python3 download-model-checkpoint.py 

注:model checkpoints 文件会被下载到 ~/.cache/modelscope/hub/qwen/Qwen-7B-Chat 文件夹中(这个路径就是 model_dir 变量的值)。

启动容器运行模型服务(OpenAI API 兼容方式)

  • 签出通义千问的开源代码
git clone https://github.com/QwenLM/Qwen.git
  • 使用下面的脚本启动容器
IMAGE_NAME=qwenllm/qwen:cu114
PORT=8901
CHECKPOINT_PATH=~/.cache/modelscope/hub/qwen/Qwen-7B-Chat
bash docker/docker_openai_api.sh -i ${IMAGE_NAME} -c ${CHECKPOINT_PATH} --port ${PORT}

注:qwenllm/qwen:cu114 镜像文件大小为 9.87G

  • 确认容器是否启动成功
# docker ps
CONTAINER ID   IMAGE                COMMAND                  CREATED         STATUS         PORTS                                   NAMES
b2bd3f3417af   qwenllm/qwen:cu114   "/opt/nvidia/nvidia_…"   3 minutes ago   Up 3 minutes   0.0.0.0:8901->80/tcp, :::8901->80/tcp   qwen

启动成功!

  • 确认 api 是否可以正常请求
# curl localhost:8901/v1/models | jq

输出内容

{
  "object": "list",
  "data": [
    {
      "id": "gpt-3.5-turbo",
      "object": "model",
      "created": 1707471911,
      "owned_by": "owner",
      "root": null,
      "parent": null,
      "permission": null
    }
  ]
}

请求成功!可以正常兼容 openai 的 api。

posted @ 2024-02-09 17:57  dudu  阅读(1829)  评论(1编辑  收藏  举报