Adding hot LoRA loading to vLLM
Preface
Stock vLLM does not support hot-adding LoRA adapters. But once a fine-tuning machine finishes a run, it needs to hand the new LoRA over to the serving machine without downtime, so we have to add that logic ourselves.
Modify vllm/entrypoints/openai/api_server.py inside the vLLM package and add the following code:
```python
from fastapi import Response
from pydantic import BaseModel

class AddLoraRequest(BaseModel):
    lora_name: str
    lora_local_path: str

# POST rather than GET: the endpoint receives a JSON body
@app.post("/add_lora")
async def add_lora(request: AddLoraRequest):
    openai_serving_chat.add_lora(request.lora_name, request.lora_local_path)
    return Response(status_code=200)
```
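With the server patched, the fine-tuning machine can notify it over plain HTTP as soon as a new adapter lands on shared storage. A minimal stdlib-only client sketch; the server URL, adapter name, and path below are made-up examples, and the helper names are my own:

```python
import json
import urllib.request

def build_add_lora_request(server_url, lora_name, lora_local_path):
    # Build the POST request that the /add_lora endpoint expects:
    # a JSON body matching the AddLoraRequest model above.
    body = json.dumps({
        "lora_name": lora_name,
        "lora_local_path": lora_local_path,
    }).encode()
    return urllib.request.Request(
        f"{server_url}/add_lora",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def add_lora(server_url, lora_name, lora_local_path):
    # Fire the request against a running server and return the HTTP status.
    req = build_add_lora_request(server_url, lora_name, lora_local_path)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A successful call returns 200, after which the adapter name can be used as the `model` field in subsequent completion requests.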
Then, in serving_engine.py in the same directory, add the following method to the OpenAIServing class:
```python
# LoRARequest comes from vllm.lora.request (already imported in serving_engine.py)
def add_lora(self, lora_name, lora_local_path):
    # Register the adapter; lora_int_id must be unique and start at 1
    self.lora_requests.append(LoRARequest(
        lora_name=lora_name,
        lora_int_id=len(self.lora_requests) + 1,
        lora_local_path=lora_local_path,
    ))
```
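For context on what this list feeds into: OpenAIServing resolves the `model` field of an incoming request against `self.lora_requests`, so an adapter becomes addressable by its `lora_name` as soon as it is appended. A self-contained sketch of that flow, using a simplified stand-in for the real `vllm.lora.request.LoRARequest` class (the class and method names below are illustrative, not vLLM's actual internals):

```python
from dataclasses import dataclass

@dataclass
class LoRARequest:
    # Simplified stand-in for vllm.lora.request.LoRARequest
    lora_name: str
    lora_int_id: int
    lora_local_path: str

class ServingEngineSketch:
    def __init__(self):
        self.lora_requests = []

    def add_lora(self, lora_name, lora_local_path):
        # Mirrors the method added to OpenAIServing above
        self.lora_requests.append(LoRARequest(
            lora_name=lora_name,
            lora_int_id=len(self.lora_requests) + 1,
            lora_local_path=lora_local_path,
        ))

    def maybe_get_lora(self, model_name):
        # How a request's "model" field is matched to a registered adapter;
        # returns None when the name refers to the base model instead.
        for lora in self.lora_requests:
            if model_name == lora.lora_name:
                return lora
        return None

engine = ServingEngineSketch()
engine.add_lora("sql-adapter", "/loras/sql-adapter")
```

Because the lookup happens per request, no restart is needed: the next chat completion that names "sql-adapter" as its model is already served with the new weights.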