A brief note on multi-worker server startup in LitServe
LitServe is a fast inference API framework built on top of FastAPI. The notes below only cover how the server startup part is handled.
Example usage
By configuring devices and the number of workers per device we control how many inference workers are launched; whether the API servers themselves run as threads or processes is decided by api_server_worker_type (threads are forced on Windows, as the code below shows).
import litserve as ls

if __name__ == "__main__":
    # Enable the OpenAISpec in LitServer
    # SimpleLitAPI is a user-defined ls.LitAPI subclass (definition omitted here)
    api = SimpleLitAPI()
    server = ls.LitServer(api, workers_per_device=2, spec=ls.OpenAISpec())
    server.run(port=8000)
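With ls.OpenAISpec() the server exposes an OpenAI-compatible chat completions endpoint, so it can be called with any OpenAI-style client. A minimal sketch using requests, assuming the server above is running on localhost:8000 (the model name and message are arbitrary placeholders):

import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={
        "model": "placeholder-model",  # placeholder; adjust to whatever your LitAPI expects
        "messages": [{"role": "user", "content": "hello"}],
    },
)
print(resp.json())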
Code walkthrough
- Server startup: LitServe essentially reimplements uvicorn-style multi-worker handling. Because the FastAPI app is passed to uvicorn as an object rather than as an import string, uvicorn's own multi-worker option cannot take effect, so LitServe wraps the run logic itself with a uvicorn-like mechanism, as sketched just below.
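For context, this is the uvicorn constraint that LitServe works around: multiple workers only work when the app is given as an import string, because each worker process has to re-import it. A minimal sketch (myapp:app is a hypothetical module path):

import uvicorn
from fastapi import FastAPI

app = FastAPI()

if __name__ == "__main__":
    # Passing the app object with workers > 1: uvicorn logs a warning that the
    # application must be passed as an import string and exits.
    uvicorn.run(app, workers=4)
    # Passing an import string works, because each worker re-imports the app:
    # uvicorn.run("myapp:app", workers=4)

With that constraint in mind, here is LitServer.run: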
def run(
    self,
    port: Union[str, int] = 8000,
    num_api_servers: Optional[int] = None,
    log_level: str = "info",
    generate_client_file: bool = True,
    api_server_worker_type: Optional[str] = None,
    **kwargs,
):
    if generate_client_file:
        self.generate_client_file()

    port_msg = f"port must be a value from 1024 to 65535 but got {port}"
    try:
        port = int(port)
    except ValueError:
        raise ValueError(port_msg)

    if not (1024 <= port <= 65535):
        raise ValueError(port_msg)

    # Bind the listening socket once here; the actual servers reuse it later
    config = uvicorn.Config(app=self.app, host="0.0.0.0", port=port, log_level=log_level, **kwargs)
    sockets = [config.bind_socket()]

    if num_api_servers is None:
        num_api_servers = len(self.workers)

    if num_api_servers < 1:
        raise ValueError("num_api_servers must be greater than 0")

    if sys.platform == "win32":
        print("Windows does not support forking. Using threads api_server_worker_type will be set to 'thread'")
        api_server_worker_type = "thread"
    elif api_server_worker_type is None:
        api_server_worker_type = "process"

    # Launch the inference worker processes based on the configured devices and workers_per_device
    manager, litserve_workers = self.launch_inference_worker(num_api_servers)

    try:
        # Start multiple uvicorn-based API server workers
        servers = self._start_server(port, num_api_servers, log_level, sockets, api_server_worker_type, **kwargs)
        print(f"Swagger UI is available at http://0.0.0.0:{port}/docs")
        for s in servers:
            s.join()
    finally:
        print("Shutting down LitServe")
        for w in litserve_workers:
            w.terminate()
            w.join()
        manager.shutdown()
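The num_api_servers default above is tied to the inference worker count. Roughly, as a simplified sketch of the assumed semantics rather than LitServe's actual code, devices and workers_per_device expand into one inference-worker slot per combination, and that length becomes the default number of API servers:

# Simplified illustration (assumed semantics, not the real implementation)
devices = ["cuda:0", "cuda:1"]
workers_per_device = 2

worker_slots = [d for d in devices for _ in range(workers_per_device)]
# -> ["cuda:0", "cuda:0", "cuda:1", "cuda:1"]

# One inference worker process is launched per slot, and when num_api_servers
# is not given it defaults to len(self.workers), i.e. len(worker_slots) here.
num_api_servers = len(worker_slots)  # 4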
The _start_server method
def _start_server(self, port, num_uvicorn_servers, log_level, sockets, uvicorn_worker_type, **kwargs):
    servers = []
    for response_queue_id in range(num_uvicorn_servers):
        self.app.response_queue_id = response_queue_id
        if self.lit_spec:
            self.lit_spec.response_queue_id = response_queue_id
        app = copy.copy(self.app)

        config = uvicorn.Config(app=app, host="0.0.0.0", port=port, log_level=log_level, **kwargs)
        server = uvicorn.Server(config=config)

        # Create and start each server as either a thread or a process,
        # reusing the socket that was bound in run()
        if uvicorn_worker_type == "process":
            ctx = mp.get_context("fork")
            w = ctx.Process(target=server.run, args=(sockets,))
        elif uvicorn_worker_type == "thread":
            w = threading.Thread(target=server.run, args=(sockets,))
        else:
            raise ValueError("Invalid value for api_server_worker_type. Must be 'process' or 'thread'")
        w.start()
        servers.append(w)
    return servers
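The key trick is that the socket is bound exactly once in run() and then passed to every uvicorn.Server.run(sockets) call, so all API server workers accept connections on the same port. A stripped-down sketch of the same pattern outside LitServe (the FastAPI app and worker count are arbitrary; this is an illustration, not LitServe code):

import multiprocessing as mp

import uvicorn
from fastapi import FastAPI

app = FastAPI()


@app.get("/ping")
def ping():
    return {"ok": True}


if __name__ == "__main__":
    # Bind the listening socket once
    config = uvicorn.Config(app=app, host="0.0.0.0", port=8000, log_level="info")
    sockets = [config.bind_socket()]

    # Start several uvicorn servers that all accept on the shared socket;
    # fork keeps the in-memory app object, similar to what LitServe does
    ctx = mp.get_context("fork")
    workers = [ctx.Process(target=uvicorn.Server(config).run, args=(sockets,)) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()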
Notes
The above only covers how LitServe starts its FastAPI service on top of the uvicorn server; the other parts will be covered in later posts. None of this is particularly difficult, the key is understanding the internal mechanism.
References
https://github.com/Lightning-AI/LitServe
https://github.com/encode/uvicorn/blob/master/uvicorn/supervisors/multiprocess.py