1.0 Blocking I/O
Because Flask handles each request as blocking I/O, even calling torch.cuda.empty_cache() inside the request handler could not release the GPU memory once the Stable Diffusion pipeline had run there.
Eventually, I found that the solution is to create a new thread for the SD pipeline. The failing pattern is sketched below; the fix follows in 2.0.
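For reference, this is roughly the pattern that did not work for me (a simplified, untested sketch; the model ID, prompt, and port are placeholders): the pipeline runs directly in the Flask request thread, and the memory stays allocated even after cleanup.

import gc
import torch
from diffusers import StableDiffusionPipeline
from flask import Flask

app = Flask(__name__)

@app.route("/", methods=["POST"])
def index():
    # the pipeline runs directly in the request thread
    pipe = StableDiffusionPipeline.from_pretrained(
        "YOUR_MODEL_ID", torch_dtype=torch.float16).to('cuda')
    image = pipe("YOUR_PROMPT").images[0]
    del pipe
    gc.collect()
    torch.cuda.empty_cache()  # in my case, GPU memory was still not released here
    return "done"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=82)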
2.0 Create a new thread
import threading
import torch
import json
import gc
from PIL import Image
from diffusers import StableDiffusionPipeline
from flask import Flask, request, render_template

app = Flask(__name__, static_url_path='', static_folder='', template_folder='')

@app.route("/", methods=["POST"])
def index():
    # 1.0 thread class for the pipeline
    class MyThread(threading.Thread):
        def __init__(self, thread_id):
            threading.Thread.__init__(self)
            self.thread_id = thread_id

        # 1.1 run the pipeline
        def run(self):
            # 1.2 clean the CUDA space before running the pipeline
            if torch.cuda.is_available():
                gc.collect()
                torch.cuda.empty_cache()
                torch.cuda.ipc_collect()
            model_id, prompt = "YOUR_MODEL_ID", "YOUR_PROMPT"
            pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to('cuda')
            image = pipe(prompt).images[0]
            image.save(f"output_{self.thread_id}.png")  # persist the result (path is a placeholder)

    # 2.0 create and start 5 pipeline threads
    threads = []
    for i in range(5):
        threads.append(MyThread(i))
        threads[i].start()

    return app.response_class(response=json.dumps({'status': 'success'}), status=200, mimetype='application/json')

if __name__ == '__main__':
    app.debug = True
    app.run(host='0.0.0.0', port=82)
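Once the server is running, a plain POST triggers the pipeline threads. A hypothetical client sketch (host and port match the app.run() call above):

import requests

# hypothetical client: POST to the Flask endpoint started above
resp = requests.post("http://localhost:82/")
print(resp.json())  # expected: {'status': 'success'}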
PS: This code is simplified from my project and has not been tested.
First, 1.0 defines the pipeline thread class.
Second, 1.2 cleans the CUDA space before running the pipeline.
Finally, 2.0 creates and starts the pipeline threads.
If the pipeline runs in a new thread, the CUDA space can be released by torch.cuda.empty_cache() after the thread exits.
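To check this, compare torch.cuda.memory_allocated() after the pipeline thread has finished. A standalone sketch to illustrate, not taken from the project (YOUR_MODEL_ID and YOUR_PROMPT are placeholders):

import gc
import threading
import torch
from diffusers import StableDiffusionPipeline

def run_pipeline():
    # all pipeline references live only inside this thread
    pipe = StableDiffusionPipeline.from_pretrained(
        "YOUR_MODEL_ID", torch_dtype=torch.float16).to('cuda')
    pipe("YOUR_PROMPT").images[0]

t = threading.Thread(target=run_pipeline)
t.start()
t.join()  # wait for the pipeline thread to exit

gc.collect()
torch.cuda.empty_cache()
torch.cuda.ipc_collect()
print(torch.cuda.memory_allocated())  # should drop back close to zero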
3.0 Project Code
https://github.com/kenny-chen/ai.diffusers