Adding hot-add LoRA support to vLLM
Preface
Vanilla vLLM does not support hot-adding LoRA adapters. However, once the fine-tuning machine finishes training, the new adapter needs to be handed over to the inference server without taking it offline, so we have to add this logic ourselves.
Modify vllm/entrypoints/openai/api_server.py in the installed vLLM package and add the following:

```python
from pydantic import BaseModel


class AddLoraRequest(BaseModel):
    lora_name: str
    lora_path: str


@app.post("/v1/load_lora_adapter")
async def add_lora(request: AddLoraRequest):
    openai_serving_chat.add_lora(request.lora_name, request.lora_path)
    return Response(status_code=200)
```
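With that route in place, a freshly fine-tuned adapter can be registered at runtime with a plain HTTP call. A minimal sketch, assuming the server listens on localhost:8000; the adapter name and path below are placeholders:

```python
import requests

# Register a new adapter without restarting the server.
# Host, port, adapter name, and path are placeholders for illustration.
resp = requests.post(
    "http://localhost:8000/v1/load_lora_adapter",
    json={"lora_name": "my-adapter", "lora_path": "/models/loras/my-adapter"},
)
print(resp.status_code)  # 200 if the route above handled the request
```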
Next, modify serving_engine.py in the same directory and add the following method to the OpenAIServing class:
```python
def add_lora(self, lora_name, lora_local_path):
    # Register the new adapter by appending a LoRARequest to the list the
    # serving object already maintains; the int id simply continues the sequence.
    self.lora_requests.append(LoRARequest(
        lora_name=lora_name,
        lora_int_id=len(self.lora_requests) + 1,
        lora_local_path=lora_local_path,
    ))
    return None
```
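Once registered this way, the adapter should be addressable like any LoRA passed at startup, by putting its lora_name in the model field of a request. A sketch using the OpenAI Python client; the base URL, API key, and adapter name are assumptions for illustration:

```python
from openai import OpenAI

# Point the client at the local vLLM OpenAI-compatible server (placeholder URL/key).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# "my-adapter" must match the lora_name registered via /v1/load_lora_adapter.
completion = client.chat.completions.create(
    model="my-adapter",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```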
Patch for v0.5.4: since add_lora above registers the adapter on openai_serving_chat, /v1/models has to read from that same object to pick up the newly added adapters, so change openai_serving_completion to openai_serving_chat in the handler:
1 @router.get("/v1/models") 2 async def show_available_models(): 3 models = await openai_serving_chat.show_available_models() 4 return JSONResponse(content=models.model_dump())