Comparing the Efficiency of Python's Multitasking Approaches

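1. Fetching 1000 URLs with multiple processes

2. Fetching 1000 URLs with multiple threads

3. Fetching 1000 URLs with coroutines

4. Fetching 1000 URLs with multiple processes + coroutines

5. Fetching 1000 URLs with multiple threads + coroutines

6. Summary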

Code download: https://github.com/juno3550/MultitaskCompare

1. Fetching 1000 URLs with multiple processes

Example 1: fetching 1000 URLs with 5 subprocesses

from multiprocessing import Process, Queue
from queue import Empty
import requests
import time


# Task function: each worker pulls URLs from the shared queue until none are left
def visit_url(q, i):
    while True:
        try:
            # A short timeout avoids a spurious "empty" while another worker holds the queue lock
            url = q.get(timeout=1)
        except Empty:
            break
        try:
            r = requests.get(url, timeout=5)
            print("[Subprocess %s] Status code [%s]: %s" % (i, r.status_code, url))
        except Exception as e:
            print("[Subprocess %s] Request failed [%s], reason: %s" % (i, url, e))


if __name__ == "__main__":
    q = Queue()
    with open("e:\\url.txt") as f:  # local file containing 1000 URLs
        for url in f:
            q.put(url.strip())  # strip the trailing newline and other whitespace
    print("*"*20+"Timing started"+"*"*20)
    start = time.time()
    p_list = []
    # Create 5 subprocesses
    for i in range(5):
        p = Process(target=visit_url, args=(q, i+1))
        p_list.append(p)
        p.start()
        print(p)
    # Wait for all subprocesses to finish
    for p in p_list:
        p.join()
        print(p)
    end = time.time()
    print("*"*20+"Timing ended"+"*"*20)
    print("Total time: %s s" % (end-start))

Output:

……
……
[Subprocess 4] Status code [200]: https://www.sohu.com/a/438336257_212351?scm=1004.753292626492588032.0.0.0
[Subprocess 1] Status code [200]: https://www.sohu.com/a/438221545_116237?scm=1004.753292626492588032.0.0.0
[Subprocess 3] Status code [200]: https://www.sohu.com/a/438336257_212351?scm=1004.753292626492588032.0.0.0
[Subprocess 2] Status code [200]: https://www.sohu.com/a/438221545_116237?scm=1004.753292626492588032.0.0.0
[Subprocess 5] Status code [200]: https://www.sohu.com/a/438336257_212351?scm=1004.753292626492588032.0.0.0
[Subprocess 4] Status code [200]: https://www.sohu.com/a/438221545_116237?scm=1004.753292626492588032.0.0.0
[Subprocess 1] Status code [200]: https://www.sohu.com/a/438221545_116237?scm=1004.753292626492588032.0.0.0
<Process(Process-1, stopped)>
<Process(Process-2, stopped)>
<Process(Process-3, stopped)>
<Process(Process-4, stopped)>
<Process(Process-5, stopped)>
********************Timing ended********************
Total time: 69.90861248970032 s
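
Every example here times itself with a pair of time.time() calls around the work. As a minimal sketch (not the author's code), the same measurement can be wrapped in a context manager. The names timed and results are hypothetical, time.perf_counter() is swapped in for time.time() because it is a monotonic clock meant for measuring intervals, and time.sleep() stands in for the URL-fetching workload.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label].

    `timed` and `results` are illustrative names, not part of the original code.
    """
    start = time.perf_counter()  # monotonic clock, suited to measuring intervals
    try:
        yield
    finally:
        # Recorded even if the enclosed block raises
        results[label] = time.perf_counter() - start


results = {}
with timed("sleep", results):
    time.sleep(0.05)  # stand-in for the URL-fetching workload
print("Total time: %.3f s" % results["sleep"])
```

Collecting the durations in one dictionary keeps the numbers for several configurations side by side, which is handy when comparing them afterwards.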

Example 2: fetching 1000 URLs with 1000 subprocesses

from multiprocessing import Process, Queue
from queue import Empty
import requests
import time


# Task function: each worker pulls URLs from the shared queue until none are left
def visit_url(q, i):
    while True:
        try:
            # A short timeout avoids a spurious "empty" while another worker holds the queue lock
            url = q.get(timeout=1)
        except Empty:
            break
        try:
            r = requests.get(url, timeout=5)
            print("[Subprocess %s] Status code [%s]: %s" % (i, r.status_code, url))
        except Exception as e:
            print("[Subprocess %s] Request failed [%s], reason: %s" % (i, url, e))


if __name__ == "__main__":
    q = Queue()
    with open("e:\\url.txt") as f:  # local file containing 1000 URLs
        for url in f:
            q.put(url.strip())  # strip the trailing newline and other whitespace
    print("*"*20+"Timing started"+"*"*20)
    start = time.time()
    p_list = []
    # Create 1000 subprocesses
    for i in range(1000):
        p = Process(target=visit_url, args=(q, i+1))
        p_list.append(p)
        p.start()
        print(p)
    # Wait for all subprocesses to finish
    for p in p_list:
        p.join()
        print(p)
    end = time.time()
    print("*"*20+"Timing ended"+"*"*20)
    print("Total time: %s s" % (end-start))

Output:

……
……
<Process(Process-988, stopped)>
<Process(Process-989, stopped)>
<Process(Process-990, stopped)>
<Process(Process-991, stopped)>
<Process(Process-992, stopped)>
<Process(Process-993, stopped)>
<Process(Process-994, stopped)>
<Process(Process-995, stopped)>
<Process(Process-996, stopped)>
<Process(Process-997, stopped)>
<Process(Process-998, stopped)>
<Process(Process-999, stopped)>
<Process(Process-1000, stopped)>
********************Timing ended********************
Total time: 104.71422028541565 s

2. Fetching 1000 URLs with multiple threads

Example 1: fetching 1000 URLs with 5 threads

from threading import Thread
import queue
import requests
import time


# Task function: each worker pulls URLs from the shared queue until it is empty
def visit_url(q, i):
    while True:
        try:
            # Non-blocking get: a blocking q.get() could hang forever
            # if another thread takes the last URL first
            url = q.get_nowait()
        except queue.Empty:
            break
        try:
            r = requests.get(url, timeout=5)
            print("[Thread %s] Status code [%s]: %s" % (i, r.status_code, url))
        except Exception as e:
            print("[Thread %s] Request failed [%s], reason: %s" % (i, url, e))


if __name__ == "__main__":
    q = queue.Queue()
    with open("e:\\url.txt") as f:  # local file containing 1000 URLs
        for url in f:
            q.put(url.strip())  # strip the trailing newline and other whitespace
    print("*"*20+"Timing started"+"*"*20)
    start = time.time()
    t_list = []
    # Create 5 threads
    for i in range(5):
        t = Thread(target=visit_url, args=(q, i+1))
        t_list.append(t)
        t.start()
        print(t)
    # Wait for all threads to finish
    for t in t_list:
        t.join()
        print(t)
    end = time.time()
    print("*"*20+"Timing ended"+"*"*20)
    print("Total time: %s s" % (end-start))

Output:

……
……
[Thread 3] Status code [200]: http://business.sohu.com
[Thread 4] Status code [200]: https://www.sohu.com/a/438238938_120774106?scm=1004.753292626492588032.0.0.0
[Thread 1] Status code [200]: https://www.sohu.com/a/438238938_120774106?scm=1004.753292626492588032.0.0.0
[Thread 5] Status code [200]: https://www.sohu.com/a/438336257_212351?scm=1004.753292626492588032.0.0.0
[Thread 2] Status code [200]: https://www.sohu.com/a/438238938_120774106?scm=1004.753292626492588032.0.0.0
[Thread 3] Status code [200]: https://www.sohu.com/a/438336257_212351?scm=1004.753292626492588032.0.0.0
[Thread 1] Status code [200]: https://www.sohu.com/a/438336257_212351?scm=1004.753292626492588032.0.0.0
[Thread 4] Status code [200]: https://www.sohu.com/a/438336257_212351?scm=1004.753292626492588032.0.0.0
[Thread 5] Status code [200]: https://www.sohu.com/a/438221545_116237?scm=1004.753292626492588032.0.0.0
[Thread 2] Status code [200]: https://www.sohu.com/a/438221545_116237?scm=1004.753292626492588032.0.0.0
[Thread 3] Status code [200]: https://www.sohu.com/a/438221545_116237?scm=1004.753292626492588032.0.0.0
[Thread 1] Status code [200]: https://www.sohu.com/a/438221545_116237?scm=1004.753292626492588032.0.0.0
<Thread(Thread-1, stopped 13348)>
<Thread(Thread-2, stopped 2100)>
<Thread(Thread-3, stopped 12300)>
<Thread(Thread-4, stopped 14432)>
<Thread(Thread-5, stopped 10780)>
********************Timing ended********************
Total time: 68.118173122406 s

Example 2: fetching 1000 URLs with 1000 threads

from threading import Thread
import queue
import requests
import time


# Task function: each worker pulls URLs from the shared queue until it is empty
def visit_url(q, i):
    while True:
        try:
            # Non-blocking get: a blocking q.get() could hang forever
            # if another thread takes the last URL first
            url = q.get_nowait()
        except queue.Empty:
            break
        try:
            r = requests.get(url, timeout=5)
            print("[Thread %s] Status code [%s]: %s" % (i, r.status_code, url))
        except Exception as e:
            print("[Thread %s] Request failed [%s], reason: %s" % (i, url, e))


if __name__ == "__main__":
    q = queue.Queue()
    with open("e:\\url.txt") as f:  # local file containing 1000 URLs
        for url in f:
            q.put(url.strip())  # strip the trailing newline and other whitespace
    print("*"*20+"Timing started"+"*"*20)
    start = time.time()
    t_list = []
    # Create 1000 threads
    for i in range(1000):
        t = Thread(target=visit_url, args=(q, i+1))
        t_list.append(t)
        t.start()
        print(t)
    # Wait for all threads to finish
    for t in t_list:
        t.join()
        print(t)
    end = time.time()
    print("*"*20+"Timing ended"+"*"*20)
    print("Total time: %s s" % (end-start))

Output:

……
……
<Thread(Thread-987, stopped 3804)>
<Thread(Thread-988, stopped 24804)>
<Thread(Thread-989, stopped 34064)>
<Thread(Thread-990, stopped 28316)>
<Thread(Thread-991, stopped 24660)>
<Thread(Thread-992, stopped 26520)>
<Thread(Thread-993, stopped 19980)>
<Thread(Thread-994, stopped 20036)>
<Thread(Thread-995, stopped 26488)>
<Thread(Thread-996, stopped 34764)>
<Thread(Thread-997, stopped 5136)>
<Thread(Thread-998, stopped 6416)>
<Thread(Thread-999, stopped 13992)>
<Thread(Thread-1000, stopped 20072)>
********************Timing ended********************
Total time: 34.05519509315491 s
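
In this run, going from 5 threads to 1000 threads roughly halved the total time (about 68.1 s down to 34.1 s). A thread pool turns the worker count into a single parameter that is easy to vary. What follows is a rough sketch, not the author's code: visit_all and fake_fetch are hypothetical names, the URLs are placeholders, and fake_fetch sleeps instead of calling requests.get so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_fetch(url):
    """Stand-in for requests.get(): sleeps briefly and returns a fake status code."""
    time.sleep(0.01)
    return url, 200


def visit_all(urls, max_workers, fetch=fake_fetch):
    """Fetch every URL on a pool of `max_workers` threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


urls = ["http://example.com/%d" % n for n in range(100)]  # placeholder URLs
for workers in (5, 50):
    start = time.perf_counter()
    results = visit_all(urls, workers)
    elapsed = time.perf_counter() - start
    print("%d workers: %d responses in %.2f s" % (workers, len(results), elapsed))
```

To try it against real pages, a function that wraps requests.get(url, timeout=5) could be passed as the fetch argument.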

3. Fetching 1000 URLs with coroutines

Example 1: fetching 1000 URLs with 5 coroutines

from gevent import monkey; monkey.patch_all()
import gevent
from tornado.queues import Queue  # using multiprocessing's Queue together with coroutines causes problems
import requests
import time


# Task function: each coroutine works through its own list of URLs
def visit_url(url_list, i):
    while url_list:
        url = url_list.pop()
        try:
            r = requests.get(url, timeout=5)
            print("[Coroutine %s] Status code [%s]: %s" % (i, r.status_code, url))
        except Exception as e:
            print("[Coroutine %s] Request failed [%s], reason: %s" % (i, url, e))

# Create the coroutines
def gevent_maker(q):
    url_list = []
    tasks = []
    i = 1
    while not q.empty():
        # get_nowait() returns the item itself, so there is no need to read a Future's private _result
        url_list.append(q.get_nowait())
        # Hand every 200 URLs to one coroutine, which gives 5 coroutines in total
        if len(url_list) == 200:
            tasks.append(gevent.spawn(visit_url, url_list, i))
            url_list = []
            i += 1
    gevent.joinall(tasks)


if __name__ == "__main__":
    q = Queue()
    with open("e:\\url.txt") as f:  # local file containing 1000 URLs
        for url in f:
            q.put_nowait(url.strip())  # strip the trailing newline
    print("*"*20+"Timing started"+"*"*20)
    start = time.time()
    gevent_maker(q)
    end = time.time()
    print("*"*20+"Timing ended"+"*"*20)
    print("Total time: %s s" % (end-start))

Output:

……
……
[Coroutine 2] Status code [200]: https://www.sohu.com/a/438252523_118035
[Coroutine 3] Status code [200]: http://www.sohu.com/upload/uiue20171218/chban.html
[Coroutine 3] Status code [200]: http://py.qianlong.com/
[Coroutine 3] Status code [200]: http://www.sohu.com/upload/uiue20171218/chubanwu.html
[Coroutine 2] Status code [200]: https://www.sohu.com/a/438262224_347781
[Coroutine 2] Status code [200]: https://www.sohu.com/a/438288471_122187
[Coroutine 2] Status code [200]: http://db.auto.sohu.com/autoshow/page/pic/bj2020.html?picGroupId=151458
[Coroutine 2] Status code [200]: http://db.auto.sohu.com/guangqitoyota/1679/pic_m__29657065.html
[Coroutine 2] Status code [200]: http://db.auto.sohu.com/autoshow/page/pic/gz2020.html?picGroupId=151671
[Coroutine 2] Status code [200]: https://www.sohu.com/a/438134717_430526
********************Timing ended********************
Total time: 132.53881287574768 s

Example 2: fetching 1000 URLs with 1000 coroutines

from gevent import monkey; monkey.patch_all()
import gevent
from tornado.queues import Queue  # using multiprocessing's Queue together with coroutines causes problems
import requests
import time


# Task function
def visit_url(url, i):
    try:
        r = requests.get(url, timeout=5)
        print("[Coroutine %s] Status code [%s]: %s" % (i, r.status_code, url))
    except Exception as e:
        print("[Coroutine %s] Request failed [%s], reason: %s" % (i, url, e))

# Create the coroutines
def gevent_maker(q):
    i = 1
    tasks = []
    while not q.empty():
        # One coroutine per URL.
        # get_nowait() returns the item itself, so there is no need to read a Future's private _result
        tasks.append(gevent.spawn(visit_url, q.get_nowait(), i))
        i += 1
    gevent.joinall(tasks)


if __name__ == "__main__":
    q = Queue()
    with open("e:\\url.txt") as f:  # local file containing 1000 URLs
        for url in f:
            q.put_nowait(url.strip())  # strip the trailing newline
    print("*"*20+"Timing started"+"*"*20)
    start = time.time()
    gevent_maker(q)
    end = time.time()
    print("*"*20+"Timing ended"+"*"*20)
    print("Total time: %s s" % (end-start))

Output:

……
……
[Coroutine 253] Status code [200]: http://astro.women.sohu.com/
[Coroutine 818] Status code [200]: http://astro.women.sohu.com/
[Coroutine 849] Status code [200]: http://www.sohu.com/a/257276765_742667
[Coroutine 284] Status code [200]: http://www.sohu.com/a/257276765_742667
[Coroutine 845] Status code [200]: http://www.sohu.com/a/257276765_742667
[Coroutine 310] Status code [200]: http://www.sohu.com/a/255685071_479794
[Coroutine 975] Status code [200]: http://investors.sohu.com/
[Coroutine 410] Request failed [http://investors.sohu.com/], reason: HTTPSConnectionPool(host='investors.sohu.com', port=443): Read timed out.
[Coroutine 905] Request failed [http://travel.sohu.com/1447], reason: HTTPSConnectionPool(host='travel.sohu.com', port=443): Read timed out. (read timeout=5)
********************Timing ended********************
Total time: 55.665443420410156 s
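
gevent is one way to get coroutines in Python; the standard library's asyncio is another. As a sketch of the same idea with asyncio (a swapped-in technique, not what the examples in this post use), the code below runs one coroutine per URL and caps the number of requests in flight with a semaphore. fake_fetch, visit_all and limit are hypothetical names, the URLs are placeholders, and fake_fetch only sleeps, so a real version would need an async HTTP client in its place.

```python
import asyncio


async def fake_fetch(url):
    """Stand-in for an async HTTP call; yields to the event loop as network I/O would."""
    await asyncio.sleep(0.01)
    return url, 200


async def visit_all(urls, limit, fetch=fake_fetch):
    """Run one coroutine per URL with at most `limit` requests in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:  # waits here while `limit` fetches are already running
            return await fetch(url)

    # gather() returns the results in the same order as the input URLs
    return await asyncio.gather(*(bounded(u) for u in urls))


urls = ["http://example.com/%d" % n for n in range(100)]  # placeholder URLs
results = asyncio.run(visit_all(urls, limit=20))
print(len(results), "responses")
```

Setting limit to the number of URLs removes the cap and corresponds to the "one coroutine per URL" case; a small limit is closer to the 5-coroutine case.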

4. Fetching 1000 URLs with multiple processes + coroutines

Example: fetching 1000 URLs with 5 subprocesses, each running 200 coroutines

from gevent import monkey; monkey.patch_all()
import gevent
import requests
import time
from multiprocessing import Process


# Task function
def visit_url(url, p_i, c_i):
    try:
        r = requests.get(url, timeout=5)
        print("[Subprocess %s, coroutine %s] Status code [%s]: %s" % (p_i, c_i, r.status_code, url))
    except Exception as e:
        print("[Subprocess %s, coroutine %s] Request failed [%s], reason: %s" % (p_i, c_i, url, e))

# Create the coroutines: one per URL in this process's slice
def gevent_maker(url_list, p_i):
    # gevent.spawn() forwards positional arguments unchanged,
    # so the values do not need to be packed into a string
    tasks = [gevent.spawn(visit_url, url, p_i, c_i) for c_i, url in enumerate(url_list, 1)]
    gevent.joinall(tasks)


if __name__ == "__main__":
    with open("e:\\url.txt") as f:  # local file containing 1000 URLs
        urls = [line.strip() for line in f]  # strip the trailing newline
    print("*"*20+"Timing started"+"*"*20)
    start = time.time()
    p_list = []
    # Create 5 subprocesses and give each its own slice of 200 URLs.
    # A tornado Queue is not shared between processes: each child would
    # receive a full copy and fetch the same URLs as the others.
    for i in range(5):
        p = Process(target=gevent_maker, args=(urls[i*200:(i+1)*200], i+1))
        p_list.append(p)
        p.start()
        print(p)
    for p in p_list:
        p.join()
        print(p)
    end = time.time()
    print("*"*20+"Timing ended"+"*"*20)
    print("Total time: %s s" % (end-start))

Output:

……
……
[Subprocess 5, coroutine 98] Status code [200]: http://yule.sohu.com/
[Subprocess 4, coroutine 175] Status code [200]: http://business.sohu.com
[Subprocess 3, coroutine 12] Status code [200]: http://cul.sohu.com
[Subprocess 5, coroutine 173] Status code [200]: http://business.sohu.com/996
[Subprocess 5, coroutine 4] Status code [200]: http://home.focus.cn/
[Subprocess 4, coroutine 182] Status code [200]: https://www.sohu.com/a/438307245_120919952?scm=1004.759738464039272448.0.0.0
<Process(Process-3, stopped)>
<Process(Process-4, stopped)>
[Subprocess 5, coroutine 2] Status code [200]: http://www.focus.cn/
<Process(Process-5, stopped)>
********************Timing ended********************
Total time: 24.992220401763916 s
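
With 5 workers and 1000 URLs, each worker is meant to handle 200 URLs. One way to make that split explicit is to slice the URL list in the parent before any worker starts, so no queue has to be shared across processes. The helper below is a minimal sketch under that assumption; split_evenly is a hypothetical name and the URLs are placeholders.

```python
def split_evenly(items, parts):
    """Split `items` into `parts` contiguous slices whose lengths differ by at most one."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for k in range(parts):
        # The first `extra` chunks take one additional item each
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


urls = ["http://example.com/%d" % n for n in range(1000)]  # placeholder URLs
chunks = split_evenly(urls, 5)
print([len(c) for c in chunks])  # [200, 200, 200, 200, 200]
```

Each chunk can then be passed to one worker as an ordinary argument. Unlike a fixed batch size of 200, this also covers URL counts that do not divide evenly by the number of workers.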

5. Fetching 1000 URLs with multiple threads + coroutines

Example: fetching 1000 URLs with 5 threads, each running 200 coroutines

from gevent import monkey; monkey.patch_all()
import gevent
import queue
import requests
import time
from threading import Thread


# Task function
def visit_url(url, t_i, c_i):
    try:
        r = requests.get(url, timeout=5)
        print("[Thread %s, coroutine %s] Status code [%s]: %s" % (t_i, c_i, r.status_code, url))
    except Exception as e:
        print("[Thread %s, coroutine %s] Request failed [%s], reason: %s" % (t_i, c_i, url, e))

# Create the coroutines
def gevent_maker(q, t_i):
    url_list = []
    # Each thread takes up to 200 URLs from the shared queue...
    while len(url_list) < 200:
        try:
            url_list.append(q.get_nowait())
        except queue.Empty:
            break
    # ...and then creates one coroutine per URL.
    # gevent.spawn() forwards positional arguments unchanged,
    # so the values do not need to be packed into a string
    tasks = [gevent.spawn(visit_url, url, t_i, c_i) for c_i, url in enumerate(url_list, 1)]
    gevent.joinall(tasks)


if __name__ == "__main__":
    q = queue.Queue()
    with open("e:\\url.txt") as f:  # local file containing 1000 URLs
        for url in f:
            q.put(url.strip())  # strip the trailing newline
    print("*"*20+"Timing started"+"*"*20)
    start = time.time()
    t_list = []
    # Create 5 threads
    for i in range(5):
        t = Thread(target=gevent_maker, args=(q, i+1))
        t_list.append(t)
        t.start()
        print(t)
    for t in t_list:
        t.join()
        print(t)
    end = time.time()
    print("*"*20+"Timing ended"+"*"*20)
    print("Total time: %s s" % (end-start))

Output:

……
……
[Thread 5, coroutine 18] Status code [200]: http://astro.women.sohu.com/
[Thread 2, coroutine 84] Status code [200]: http://www.sohu.com/a/257276765_742667
[Thread 5, coroutine 177] Status code [200]: http://business.sohu.com/
[Thread 2, coroutine 110] Status code [200]: http://www.sohu.com/a/255685071_479794
<Thread(Thread-2, stopped 1734213543240)>
<Thread(Thread-3, stopped 1734213543496)>
<Thread(Thread-4, stopped 1734213543752)>
[Thread 5, coroutine 49] Status code [200]: http://www.sohu.com/a/257276765_742667
[Thread 5, coroutine 105] Request failed [http://travel.sohu.com/1447], reason: HTTPSConnectionPool(host='travel.sohu.com', port=443): Read timed out. (read timeout=5)
[Thread 5, coroutine 175] Status code [200]: http://investors.sohu.com/
<Thread(Thread-5, stopped 1734213544008)>
********************Timing ended********************
Total time: 56.801339864730835 s

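6. Summary

Time taken to fetch 1000 URLs concurrently

Multiple processes:
- 5 subprocesses, total time: 69.90861248970032 s
- 1000 subprocesses, total time: 104.71422028541565 s
Multiple threads:
- 5 threads, total time: 68.118173122406 s
- 1000 threads, total time: 34.05519509315491 s
Coroutines:
- 5 coroutines, total time: 132.53881287574768 s
- 1000 coroutines, total time: 55.665443420410156 s
Multiple processes/threads + coroutines:
- 5 subprocesses with 200 coroutines each, total time: 24.992220401763916 s
- 5 threads with 200 coroutines each, total time: 56.801339864730835 s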

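Takeaways for I/O-bound tasks, based on the runs in this test:

- Single approach, low concurrency (e.g. 5 workers), best to worst: multiple processes ≈ multiple threads > coroutines
- Single approach, high concurrency (e.g. 1000 workers), best to worst: multiple threads > coroutines > multiple processes
- Mixed approaches, best to worst: multiple processes + coroutines > multiple threads + coroutines
- Single and mixed approaches together, best to worst: multiple processes + coroutines > multiple threads (high concurrency) > coroutines (high concurrency) > multiple threads + coroutines > multiple threads (low concurrency) > multiple processes (low concurrency) > multiple processes (high concurrency) > coroutines (low concurrency)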
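
The overall ranking can be reproduced directly from the measured totals. The snippet below is a small sketch that sorts the eight timings reported in this post and prints each configuration's speedup relative to the slowest one; the dictionary keys are labels chosen for this illustration.

```python
# Total times in seconds, copied from the runs in this post
timings = {
    "5 processes": 69.90861248970032,
    "1000 processes": 104.71422028541565,
    "5 threads": 68.118173122406,
    "1000 threads": 34.05519509315491,
    "5 coroutines": 132.53881287574768,
    "1000 coroutines": 55.665443420410156,
    "5 processes + 200 coroutines each": 24.992220401763916,
    "5 threads + 200 coroutines each": 56.801339864730835,
}

ranking = sorted(timings, key=timings.get)  # fastest first
slowest = timings[ranking[-1]]
for name in ranking:
    # Speedup is measured against the slowest configuration
    print("%-36s %7.2f s  (%.1fx)" % (name, timings[name], slowest / timings[name]))
```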


Posted 2020-12-18 00:42 by Juno3550
