Deploying Embedding and Reranking Models with Infinity

Notes:

First published: 2024-08-06
Infinity GitHub repository: https://github.com/michaelfeil/infinity
Infinity official documentation: https://michaelfeil.github.io/infinity/

Downloading the weights

pip install -U "huggingface_hub[cli]"

export HF_ENDPOINT=https://hf-mirror.com

huggingface-cli download BAAI/bge-m3
huggingface-cli download BAAI/bge-reranker-v2-m3

These commands download the model weights through the hf-mirror.com mirror, which HF_ENDPOINT points huggingface-cli at.

By default the weights are cached under $HOME/.cache (in huggingface/hub). Check with the tree command:

~/.cache$ tree -L 3 .
.
├── huggingface
│   └── hub
│       ├── huggingface
│       ├── models--BAAI--bge-m3
│       ├── models--BAAI--bge-reranker-v2-m3
│       └── version.txt
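To confirm that both models actually landed in the cache before starting the container, a small check like the one below can help. It is a sketch that relies on the models--{org}--{name} folder layout visible in the tree output above; `repo_folder_name` and `missing_models` are hypothetical helpers, not part of huggingface_hub. A non-empty snapshots/ folder only shows that a download started; it does not prove every file finished.

```python
from pathlib import Path


def repo_folder_name(repo_id: str) -> str:
    # The hub cache stores model repo "org/name" in a folder named "models--org--name"
    return "models--" + repo_id.replace("/", "--")


def missing_models(hub_dir, repo_ids):
    """Return the repo ids that have no snapshot folder under hub_dir."""
    hub = Path(hub_dir).expanduser()
    missing = []
    for repo_id in repo_ids:
        snapshots = hub / repo_folder_name(repo_id) / "snapshots"
        # A download creates one folder per revision inside snapshots/;
        # an absent or empty snapshots/ means the model was never fetched.
        if not snapshots.is_dir() or not any(snapshots.iterdir()):
            missing.append(repo_id)
    return missing
```

For example, missing_models("~/.cache/huggingface/hub", ["BAAI/bge-m3", "BAAI/bge-reranker-v2-m3"]) should return an empty list once both downloads have run.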

Running the Docker container

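I set the following environment variables in ~/.bashrc. They decide where the weights are cached on the host, and therefore which host path has to be mounted into the container: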

export HF_HOME=/mnt/d/16-LLM-Cache/huggingface
export HF_ENDPOINT=https://hf-mirror.com
export HF_DATASETS_CACHE=/mnt/d/16-LLM-Cache/huggingface/dataset
export HUGGINGFACE_HUB_CACHE=/mnt/d/16-LLM-Cache/huggingface/hub
export TRANSFORMERS_CACHE=/mnt/d/16-LLM-Cache/huggingface/hub
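Since several of these variables point at the same directory, it helps to know which one wins. The sketch below mirrors, as far as I know, how recent huggingface_hub versions choose the hub cache directory (HF_HUB_CACHE, then the older HUGGINGFACE_HUB_CACHE, then $HF_HOME/hub, with HF_HOME defaulting to $XDG_CACHE_HOME/huggingface or ~/.cache/huggingface). `resolve_hub_cache` is a hypothetical helper, and the precedence should be checked against the huggingface_hub version you have installed. With the settings above it resolves to /mnt/d/16-LLM-Cache/huggingface/hub, the host path used with -v in the docker run command.

```python
import os
from pathlib import Path


def resolve_hub_cache(env=None) -> Path:
    """Approximate the directory huggingface_hub downloads models into.

    env defaults to os.environ; passing a plain dict makes the lookup testable.
    """
    env = os.environ if env is None else env
    # HF_HOME falls back to $XDG_CACHE_HOME/huggingface, then ~/.cache/huggingface
    hf_home = env.get(
        "HF_HOME",
        os.path.join(env.get("XDG_CACHE_HOME", "~/.cache"), "huggingface"),
    )
    # HF_HUB_CACHE overrides the legacy HUGGINGFACE_HUB_CACHE, which overrides $HF_HOME/hub
    hub = env.get(
        "HF_HUB_CACHE",
        env.get("HUGGINGFACE_HUB_CACHE", os.path.join(hf_home, "hub")),
    )
    return Path(hub).expanduser()
```

Calling print(resolve_hub_cache()) in a shell that has sourced ~/.bashrc shows the directory to mount.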

Start the Infinity container with docker run:

docker run -it --gpus all \
 -e HF_ENDPOINT=https://hf-mirror.com \
 -v /mnt/d/16-LLM-Cache/huggingface/hub:/app/.cache/huggingface/hub \
 -p 7997:7997 \
 michaelf34/infinity:latest \
 v2 \
 --model-id BAAI/bge-m3 \
 --model-id BAAI/bge-reranker-v2-m3 \
 --port 7997

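Where:

-it: run the container interactively with a pseudo-terminal attached. (To run it in the background instead, replace -it with -d and add --restart=always.)
--gpus all: make all of the host's GPUs available to the container.
-e HF_ENDPOINT=https://hf-mirror.com: use https://hf-mirror.com as the Hugging Face mirror inside the container.
-v /mnt/d/16-LLM-Cache/huggingface/hub:/app/.cache/huggingface/hub: mount the locally downloaded model weights into the container, so Infinity loads them from disk instead of downloading them again.
-p 7997:7997: publish the container's port 7997 on port 7997 of the host.
v2 --model-id ... --port 7997: arguments passed to Infinity itself: the v2 command, one --model-id per model to serve (both models run in the same server), and the port to listen on.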
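When this command is generated from a script, building it as an argument list avoids shell-quoting mistakes. The sketch below reproduces both docker run commands in this post (foreground and background); `infinity_docker_argv` and its parameters are hypothetical, and the defaults (image tag, mirror URL, port) are simply the values used here.

```python
def infinity_docker_argv(hub_dir, model_ids, port=7997, detach=False,
                         hf_endpoint="https://hf-mirror.com",
                         image="michaelf34/infinity:latest"):
    """Build the docker run argument list used in this post (hypothetical helper)."""
    argv = ["docker", "run"]
    # Foreground: interactive TTY; background: detached and restarted automatically
    argv += ["-d", "--restart=always"] if detach else ["-it"]
    argv += [
        "--gpus", "all",
        "-e", f"HF_ENDPOINT={hf_endpoint}",
        # Mount the host hub cache where the container expects it
        "-v", f"{hub_dir}:/app/.cache/huggingface/hub",
        # Host port and container port are kept identical
        "-p", f"{port}:{port}",
        image,
        "v2",
    ]
    # One --model-id flag per model served by the same Infinity process
    for model_id in model_ids:
        argv += ["--model-id", model_id]
    argv += ["--port", str(port)]
    return argv
```

print(shlex.join(argv)) shows the equivalent shell command, and subprocess.run(argv, check=True) would start the container.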

To run it in the background:

docker run -d --restart=always --gpus all \
 -e HF_ENDPOINT=https://hf-mirror.com \
 -v /mnt/d/16-LLM-Cache/huggingface/hub:/app/.cache/huggingface/hub \
 -p 7997:7997 \
 michaelf34/infinity:latest \
 v2 \
 --model-id BAAI/bge-m3 \
 --model-id BAAI/bge-reranker-v2-m3 \
 --port 7997

Once startup completes, the log shows:

INFO     2024-08-06 03:52:57,665 infinity_emb INFO:

         ♾️  Infinity - Embedding Inference Server                           
         MIT License; Copyright (c) 2023 Michael Feil                        
         Version 0.0.53
                                                      
         Open the Docs via Swagger UI:                                       
         http://0.0.0.0:7997/docs
                                                 
         Access model via 'GET':                                             
         curl http://0.0.0.0:7997/models
                                 
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:7997 (Press CTRL+C to quit)

Open http://xx.xx.xx.xx:7997/docs to view the API documentation (Swagger UI), where xx.xx.xx.xx is the IP address of the machine running the container.
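Loading the models can take a while after the container starts. The sketch below polls the GET /models endpoint mentioned in the startup log until the server answers, then returns the ids of the served models. It assumes an OpenAI-style response body, a "data" list whose entries carry an "id"; `wait_for_models` and `BASE_URL` are hypothetical names, and BASE_URL should be replaced with your own xx.xx.xx.xx address. The opener parameter exists only so the function can be tested without a running server.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:7997"  # assumption: replace with your server's address


def wait_for_models(base_url=BASE_URL, timeout=300.0, interval=5.0,
                    opener=urllib.request.urlopen):
    """Poll GET /models until it responds; return the listed model ids."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(f"{base_url}/models", timeout=10) as resp:
                payload = json.load(resp)
            # Assumed shape: {"object": "list", "data": [{"id": "..."}, ...]}
            return [model["id"] for model in payload.get("data", [])]
        except OSError as exc:  # URLError, refused connections and timeouts are all OSError
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{base_url} not ready after {timeout}s") from exc
            time.sleep(interval)
```

With both models loaded, wait_for_models() should list BAAI/bge-m3 and BAAI/bge-reranker-v2-m3.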

Testing the API

Embedding model:

curl --location 'http://xx.xx.xx.xx:7997/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "input": "喝水吃饼干",
    "model": "BAAI/bge-m3"
  }'
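The same embedding request can be sent from Python, here with a cosine similarity added to compare two inputs. This is a standard-library sketch that assumes the response follows the OpenAI embeddings format ("data" entries with an "index" and a float "embedding") and that "input" may also be a list of strings; `embed`, `parse_embeddings`, `cosine`, and `BASE_URL` are hypothetical names.

```python
import json
import math
import urllib.request

BASE_URL = "http://localhost:7997"  # assumption: replace with your server's address


def parse_embeddings(payload):
    # Assumed shape: {"data": [{"index": 0, "embedding": [...]}, ...]};
    # sort by index so vectors line up with the input order
    return [item["embedding"] for item in sorted(payload["data"], key=lambda d: d["index"])]


def embed(texts, model="BAAI/bge-m3", base_url=BASE_URL):
    """POST a list of texts to /embeddings and return one vector per text."""
    body = json.dumps({"input": texts, "model": model}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/embeddings", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return parse_embeddings(json.load(resp))


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
```

For example, vecs = embed(["喝水吃饼干", "drink water and eat biscuits"]) followed by print(cosine(vecs[0], vecs[1])) compares a Chinese sentence with its English translation, which bge-m3 is intended to handle as a multilingual model.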

Reranking model:

curl -X 'POST' \
  'http://xx.xx.xx.xx:7997/rerank' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "query": "like it",
  "documents": [
    "like", "hate", "sky"
  ],
  "return_documents": false,
  "model": "BAAI/bge-reranker-v2-m3"
}'
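A Python version of the rerank call, returning the documents ordered from most to least relevant. It is a sketch that assumes each entry in the response's "results" list carries an "index" into the submitted documents and a "relevance_score"; `rerank`, `rank_documents`, and `BASE_URL` are hypothetical names. Sorting on the client keeps the output ordered whether or not the server already sorts its results.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7997"  # assumption: replace with your server's address


def rank_documents(payload, documents):
    # Assumed result shape: {"index": i, "relevance_score": s}, i indexing into documents
    results = sorted(payload["results"], key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in results]


def rerank(query, documents, model="BAAI/bge-reranker-v2-m3", base_url=BASE_URL):
    """POST to /rerank and return (document, score) pairs, best first."""
    body = json.dumps({
        "query": query,
        "documents": documents,
        "return_documents": False,  # documents are looked up locally by index
        "model": model,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/rerank", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return rank_documents(json.load(resp), documents)
```

Usage, mirroring the curl request: for doc, score in rerank("like it", ["like", "hate", "sky"]): print(f"{score:.4f} {doc}").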