【Multiprocessing】Choosing the process count for CPU-bound tasks in Python
Experiment idea
Sum the integers from 1 to 100000000, once with a single process and once with multiple processes, and compare the wall-clock times.
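As a quick sanity check on the output (my addition): the code below sums the half-open range [1, 100000000), i.e. it excludes the upper bound, so the expected total has a closed form via the Gauss formula:

    # Gauss formula for sum(range(1, n)): n * (n - 1) // 2
    n = 100000000
    print(n * (n - 1) // 2)  # 4999999950000000

Both the single-process and multi-process runs should print this same value.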
Experiment code
    from multiprocessing import Process, Queue
    import time


    def test_func(left, right):
        # sum the half-open range [left, right)
        res = 0
        for i in range(left, right):
            res += i
        return res


    def join(q):
        """Merge process: accumulate partial sums from the queue until a None
        sentinel arrives, then put the final total back on the queue."""
        r = 0
        while True:
            res = q.get()
            print('merge process received a partial result')
            if res is None:
                break
            r += res
        q.put(r)


    def long_time_task(left, right, queue):
        """Worker process: sum its assigned chunk and push the partial result."""
        res = test_func(left, right)
        print('worker process finished its chunk')
        queue.put(res)


    def get_split_step(target, thread_num):
        """Split [0, target) into thread_num half-open chunks; the last chunk
        absorbs the remainder so the chunks cover the whole range."""
        if thread_num == 1:
            return [[1, target]]
        step = target // thread_num
        remains = target % thread_num
        res = []
        for i in range(thread_num):
            res.append([i * step, (i + 1) * step])
        res[-1][-1] += remains
        return res


    def get_res(target):
        # single-process baseline: sum 1..target-1
        r = 0
        for i in range(1, target):
            r += i
        return r


    if __name__ == '__main__':
        multi_start = time.time()
        queue = Queue()
        p_res_merge = Process(target=join, args=(queue,))
        p_list = []
        thread_num = 8  # number of worker processes
        target = 100000000
        task_list = get_split_step(target, thread_num)
        for i in range(thread_num):
            p_list.append(Process(target=long_time_task,
                                  args=(task_list[i][0], task_list[i][1], queue)))
        for pp in p_list:
            pp.start()
        p_res_merge.start()
        for pp in p_list:
            pp.join()
        queue.put(None)  # sentinel: all workers are done
        p_res_merge.join()
        multi_end = time.time()
        print(f"multi-process res: {queue.get()}, cost: {multi_end - multi_start}")

        single_start = time.time()
        single_res = get_res(target)
        single_end = time.time()
        print(f"single-process res: {single_res}, cost: {single_end - single_start}")
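For reference, the same split-and-sum can be written more compactly with multiprocessing.Pool. This is a sketch of an alternative, not the benchmark above; range_sum and the chunking logic are my own, with chunk sizing mirroring get_split_step:

    from multiprocessing import Pool
    import os, time

    def range_sum(bounds):
        left, right = bounds
        return sum(range(left, right))

    if __name__ == '__main__':
        target = 100000000
        num_procs = os.cpu_count() or 1
        step = target // num_procs
        # half-open chunks covering [0, target); the last chunk absorbs the remainder
        chunks = [(i * step, target if i == num_procs - 1 else (i + 1) * step)
                  for i in range(num_procs)]
        start = time.time()
        with Pool(num_procs) as pool:
            total = sum(pool.map(range_sum, chunks))
        print(f"Pool res: {total}, cost: {time.time() - start}")

Pool.map collects the partial results for you, which removes the hand-rolled merge process and the sentinel/queue bookkeeping of the version above.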
Test results on a Mac M1 (8 cores): [timing screenshot]
Test results on a 48-core Linux machine: [timing screenshot]
From these numbers, roughly 70% of the multi-process run's overhead appears to go to process creation and context switching.
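That figure can be probed more directly by timing processes that do no work at all, so that only spawn and teardown cost is measured. A minimal sketch (my addition; the numbers vary widely by platform and start method):

    from multiprocessing import Process
    import time

    def noop():
        pass  # no work, so only spawn/teardown cost is measured

    if __name__ == '__main__':
        n = 100
        start = time.time()
        procs = [Process(target=noop) for _ in range(n)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        elapsed = time.time() - start
        print(f"avg create+start+join per process: {elapsed / n * 1000:.2f} ms")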
Conclusion
Setting the maximum process count equal to the number of cores is reasonable, but you also have to account for how many other processes are running on the system: the more processes competing for the CPU, the less often each one gets scheduled. So for a concrete application, run your own experiments. If you just need a rule-of-thumb default, though, the classic advice is max processes = core count.
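The core count is available programmatically, so the rule of thumb can be coded up. The helper below is a hypothetical sketch, not from the original post, and it also backs off when the machine is already busy (os.getloadavg is Unix-only):

    import os

    def suggest_max_procs():
        """Rule of thumb: start from the core count, then back off if the
        machine is already loaded (hypothetical helper)."""
        cores = os.cpu_count() or 1
        try:
            load_1min = os.getloadavg()[0]  # Unix-only; unavailable on Windows
        except (AttributeError, OSError):
            return cores
        # leave headroom roughly equal to the existing 1-minute load
        return max(1, min(cores, int(cores - load_1min)))

    print(suggest_max_procs())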