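Here I mainly want to record a small point I ran into today: how fast is parallel programming in Python?
I want to parallelize AutoTool, mainly to speed up running multiple tasks. My first thought was to run each module's task in its own thread. But while collecting material on multithreaded programming in Python, I came across a surprising fact: Python threads are not truly parallel. Because of the GIL (global interpreter lock; see the article "Python's Hardest Problem", 《Python最难的问题》), the default interpreter, CPython, lets only one thread execute Python bytecode at a time.
Here is an example I wrote to show this: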
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # @File    : batch_swig_runner.py
    # @Time    : 2019/7/8 18:09
    # @Author  : KuLiuheng
    # @Email   : liuheng.klh@alibaba-inc.com

    from swig_runner import SwigRunner  # project-local module, not used by this test

    import time
    import logging
    from threading import Thread
    from multiprocessing import Pool


    class TestRunner(Thread):
        def __init__(self, name, path):
            super(TestRunner, self).__init__()
            self.name = name
            self.path = path

        def run(self):
            logging.warning("Message from the thread-%s START" % self.name)
            for i in range(10000000):  # simulate a time-consuming operation
                j = int(i) * 10.1
            # time.sleep(1)
            logging.warning("Message from the thread-%s END" % self.name)
            return self.path


    def multi_process(mname, mpath):
        logging.warning("Message from the thread-%s START" % mname)
        for i in range(10000000):  # simulate a time-consuming operation
            j = int(i) * 10.1
        # time.sleep(1)
        logging.warning("Message from the thread-%s END" % mname)


    class BatchSwigRunner(object):
        def __init__(self, modules=None):
            """
            Initialize with a dict of module info (project name: project path).
            :param modules: {project name: project path}
            """
            if modules is not None:
                self._modules = modules
            else:
                self._modules = dict()

        def add_module_info(self, name, path):
            self._modules[name] = path

        def start(self):
            """
            Start the batch run and return the errors raised during execution.
            :return: list(project index, project name), the projects that failed
            """
            runners = list()
            for (project_name, project_path) in self._modules.items():
                # logging.warning('BatchSwigRunner.start() [%s][%s]' % (project_name, project_path))
                sub_runner = TestRunner(project_name, project_path)
                sub_runner.daemon = True
                sub_runner.start()
                runners.append(sub_runner)

            for runner in runners:
                runner.join()


    if __name__ == '__main__':
        batch_runner = BatchSwigRunner()
        batch_runner.add_module_info('name1', 'path1')
        batch_runner.add_module_info('name2', 'path2')
        batch_runner.add_module_info('name3', 'path3')
        batch_runner.add_module_info('name4', 'path4')

        # Case 1: four threads
        start_time = time.time()
        batch_runner.start()

        print 'Total time comsumed = %.2fs' % (time.time() - start_time)

        print('========================================')
        start_time = time.time()

        # Case 2: plain sequential loop ("normal")
        for index in range(4):
            logging.warning("Message from the times-%d START" % index)
            for i in range(10000000):  # simulate a time-consuming operation
                j = int(i) * 10.1
            # time.sleep(1)
            logging.warning("Message from the times-%d END" % index)

        print '>>Total time comsumed = %.2fs' % (time.time() - start_time)

        print('----------------------------------------------')
        start_time = time.time()

        # Case 3: pool of four worker processes
        pool = Pool(processes=4)
        for i in range(4):
            pool.apply_async(multi_process, ('name++%d' % i, 'path++%d' % i))
        pool.close()
        pool.join()
        print '>>>> Total time comsumed = %.2fs' % (time.time() - start_time)
The results lead to a rather surprising conclusion:
    C:\Python27\python.exe E:/VirtualShare/gitLab/GBL-310/GBL/AutoJNI/autoTool/common/batch_swig_runner.py
    WARNING:root:Message from the thread-name4 START
    WARNING:root:Message from the thread-name2 START
    WARNING:root:Message from the thread-name3 START
    WARNING:root:Message from the thread-name1 START
    WARNING:root:Message from the thread-name2 END
    WARNING:root:Message from the thread-name4 END
    WARNING:root:Message from the thread-name3 END
    Total time comsumed = 15.92s
    ========================================
    WARNING:root:Message from the thread-name1 END
    WARNING:root:Message from the times-0 START
    WARNING:root:Message from the times-0 END
    WARNING:root:Message from the times-1 START
    WARNING:root:Message from the times-1 END
    WARNING:root:Message from the times-2 START
    WARNING:root:Message from the times-2 END
    WARNING:root:Message from the times-3 START
    WARNING:root:Message from the times-3 END
    >>Total time comsumed = 11.59s
    ----------------------------------------------
    WARNING:root:Message from the thread-name++0 START
    WARNING:root:Message from the thread-name++1 START
    WARNING:root:Message from the thread-name++2 START
    WARNING:root:Message from the thread-name++3 START
    WARNING:root:Message from the thread-name++1 END
    WARNING:root:Message from the thread-name++0 END
    WARNING:root:Message from the thread-name++2 END
    WARNING:root:Message from the thread-name++3 END
    >>>> Total time comsumed = 5.69s

    Process finished with exit code 0
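For this CPU-bound workload, the ranking from fastest to slowest is: multiprocessing (5.69 s) > normal sequential execution (11.59 s) > threading.Thread (15.92 s).
Note that the time-consuming operation here is simulated with continuous computation: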
    for i in range(10000000):  # simulate a time-consuming operation
        j = int(i) * 10.1
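If the time-consuming operation is instead simulated with an idle wait (time.sleep(1), similar to waiting on I/O), the ranking for this I/O-wait workload becomes: threading.Thread (1.01 s) > multiprocessing (1.73 s) > normal (4.00 s).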
    C:\Python27\python.exe E:/VirtualShare/gitLab/GBL-310/GBL/AutoJNI/autoTool/common/batch_swig_runner.py
    WARNING:root:Message from the thread-name4 START
    WARNING:root:Message from the thread-name2 START
    WARNING:root:Message from the thread-name3 START
    WARNING:root:Message from the thread-name1 START
    WARNING:root:Message from the thread-name3 END
    WARNING:root:Message from the thread-name4 END
    WARNING:root:Message from the thread-name2 END
    WARNING:root:Message from the thread-name1 END
    WARNING:root:Message from the times-0 START
    Total time comsumed = 1.01s
    ========================================
    WARNING:root:Message from the times-0 END
    WARNING:root:Message from the times-1 START
    WARNING:root:Message from the times-1 END
    WARNING:root:Message from the times-2 START
    WARNING:root:Message from the times-2 END
    WARNING:root:Message from the times-3 START
    WARNING:root:Message from the times-3 END
    >>Total time comsumed = 4.00s
    ----------------------------------------------
    WARNING:root:Message from the thread-name++0 START
    WARNING:root:Message from the thread-name++1 START
    WARNING:root:Message from the thread-name++2 START
    WARNING:root:Message from the thread-name++3 START
    WARNING:root:Message from the thread-name++0 END
    WARNING:root:Message from the thread-name++1 END
    WARNING:root:Message from the thread-name++2 END
    WARNING:root:Message from the thread-name++3 END
    >>>> Total time comsumed = 1.73s

    Process finished with exit code 0
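The I/O-wait case can also be sketched in a few lines of Python 3 (the listing above targets Python 2.7). This is only a minimal sketch, not the AutoTool code: `fake_io_task`, `run_in_threads`, and the 0.2 s delay are hypothetical names and values chosen for illustration, and `concurrent.futures.ThreadPoolExecutor` stands in for the hand-written `Thread` subclass.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(name, delay=0.2):
    """Hypothetical task: sleep to imitate waiting on I/O, then return the name."""
    time.sleep(delay)  # sleeping releases the GIL, so other threads can run
    return name


def run_in_threads(names, delay=0.2):
    """Run one fake_io_task per name in its own thread; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # map() returns the results in the same order as the inputs
        results = list(pool.map(lambda n: fake_io_task(n, delay), names))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_in_threads(["name1", "name2", "name3", "name4"])
    print(results, "took %.2fs" % elapsed)
```

Because the four sleeps overlap, the elapsed time should stay close to a single delay rather than four of them, which is the same pattern as the 1.01 s versus 4.00 s measured above.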
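Why do we get these results?
(1) With threading, the GIL is a single global lock that forces the threads' Python code to run one at a time, so the computation can use only one CPU core at any moment. For CPU-bound work the threads also have to compete for that lock, which is probably why they were even slower than the plain loop. When an operation such as sleep gives up the CPU (and with it the GIL), another thread can take over right away. That switch is cheap: the threaded run beat multiprocessing, which most likely also pays for starting its worker processes, and it was of course much faster than normal, which blocks through each wait in turn;
(2) The multi-process Pool really does use multiple CPUs, so for CPU-bound operations (the for-loop computation above) multiple cores are bound to be faster than a single core. That explains the results of the first test scenario.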
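The CPU-bound case from point (2) can be sketched in Python 3 as well. Again this is an illustrative sketch under assumptions, not the original test: `cpu_task`, `run_serial`, and `run_in_processes` are hypothetical helpers, and the loop sizes are arbitrary. Actual speedups depend on the number of cores and on process start-up cost, so no timings are claimed here.

```python
import time
from multiprocessing import Pool


def cpu_task(n):
    """Hypothetical CPU-bound task: the same kind of busy loop as in the test above."""
    total = 0.0
    for i in range(n):
        total += int(i) * 10.1
    return total


def run_serial(sizes):
    """Run the tasks one after another in the current process."""
    return [cpu_task(n) for n in sizes]


def run_in_processes(sizes, processes=4):
    """Run the tasks in a pool of worker processes, each with its own interpreter and GIL."""
    with Pool(processes=processes) as pool:
        return pool.map(cpu_task, sizes)


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing this module.
    sizes = [1000000] * 4
    for label, runner in (("serial", run_serial), ("processes", run_in_processes)):
        start = time.perf_counter()
        runner(sizes)
        print("%s: %.2fs" % (label, time.perf_counter() - start))
```

`cpu_task` is defined at module level on purpose: a process pool has to pickle the function it sends to its workers, and that only works for functions that can be imported by name.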