Python Concurrent Programming: Coroutines
I. Introduction
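Coroutine: concurrency within a single thread, also called a micro-thread or fiber. In one sentence: a coroutine is a lightweight user-space thread, i.e. its scheduling is controlled by the user program itself.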
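A CPU that is running a task switches away to another task in two situations, and the switch is forced by the operating system. The first is that the task blocks. The second is that the task has been computing for too long, or a higher-priority program preempts it.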
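#1. Python threads are kernel-level threads, scheduled by the operating system. For example, a thread that hits I/O or runs for too long is forced to give up the CPU, and another thread is switched in.
#2. When coroutines run inside a single thread, an I/O wait triggers a switch controlled at the application level (not by the OS), which improves efficiency. (!!! Switching on non-I/O operations does nothing for efficiency.)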
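To make "switching controlled by the application" concrete, here is a minimal sketch that uses plain generators instead of gevent. The hypothetical `run_round_robin` scheduler decides when each task runs, and a task can only be switched out at a `yield`. The names are illustrative and do not come from any library.

```python
from collections import deque


def task(name, steps, log):
    """A toy task: each yield hands control back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # switch point chosen by the program, not by the OS


def run_round_robin(tasks):
    """Hypothetical user-level scheduler: resume each generator in turn."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)        # run the task until its next yield
        except StopIteration:
            continue             # task finished; drop it
        ready.append(current)    # not finished; put it back in the queue


log = []
run_round_robin([task("A", 3, log), task("B", 2, log)])
print(log)  # ['A0', 'B0', 'A1', 'B1', 'A2']
```

The OS sees a single thread doing ordinary work. All interleaving happens inside `run_round_robin`.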
Characteristics of coroutines
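1. Concurrency must be achieved within a single thread.
2. Shared data can be modified without locks.
3. The user program itself keeps the context stacks of its multiple control flows.
4. Bonus: a coroutine that hits an I/O operation switches to another coroutine automatically. Neither yield nor greenlet can detect I/O; that is what the gevent module (built on a select-style mechanism) is for.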
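Point 2 (no locks) holds because a task can only lose control at an explicit switch point. The sketch below is a stdlib generator demo, not gevent. Three hypothetical `incrementer` tasks do an unlocked read-modify-write on a shared counter, and the result is always exact.

```python
from collections import deque

counter = 0  # shared state, touched by every task


def incrementer(times):
    """Read-modify-write with no lock: safe because no switch can happen mid-update."""
    global counter
    for _ in range(times):
        value = counter      # read
        counter = value + 1  # write; no other task runs between the read and the write
        yield                # the only place where another task may run


def run_all(tasks):
    """Round-robin over generators until all are exhausted."""
    ready = deque(tasks)
    while ready:
        t = ready.popleft()
        try:
            next(t)
        except StopIteration:
            continue
        ready.append(t)


run_all([incrementer(1000) for _ in range(3)])
print(counter)  # 3000
```

The guarantee only covers code between two switch points. If the `yield` were moved between the read and the write, the tasks could overwrite each other's updates, and the total would come up short.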
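############## Pros ##############
#1. Switching between coroutines is cheaper: it happens at the program level and the OS is completely unaware of it, so it is more lightweight.
#2. Concurrency is achieved within a single thread, making the most of the CPU.
############## Cons ##############
#1. Coroutines essentially run in a single thread and cannot use multiple cores. The workaround is for a program to start several processes, several threads per process, and coroutines within each thread.
#2. Coroutines run in a single thread, so if one coroutine blocks, the whole thread blocks.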
II. yield, greenlet and gevent
1. yield
# Serial execution
import time

def consumer(res):
    '''Task 1: receive and process data'''
    pass

def producer():
    '''Task 2: produce data'''
    res=[]
    for i in range(10000000):
        res.append(i)
    return res

start=time.time()
# serial execution
res=producer()
consumer(res)  # writing consumer(producer()) would reduce efficiency
stop=time.time()
print(stop-start)  # 1.5536692142486572


# Concurrent execution based on yield
import time

def consumer():
    '''Task 1: receive and process data'''
    while True:
        x=yield

def producer():
    '''Task 2: produce data'''
    g=consumer()
    next(g)
    for i in range(10000000):
        g.send(i)

start=time.time()
# yield saves each task's state, so the two tasks switch back and forth directly, i.e. they run concurrently
# PS: add a print to each task and you will clearly see them print alternately, one after the other, i.e. they execute concurrently
producer()
stop=time.time()
print(stop-start)  # 2.0272178649902344

'''
1.2388806343078613
1.4825549125671387
'''
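The PS above suggests adding prints to see the alternation. Here is a minimal variant of the yield-based producer/consumer that records events in a list instead of printing, so the interleaving can be checked. The `log` parameter is an addition for illustration.

```python
def consumer(log):
    """Task 1: receive and handle data."""
    while True:
        x = yield            # pause here until the producer sends a value
        log.append(f"consume {x}")


def producer(n, log):
    """Task 2: produce data and hand each item to the consumer."""
    g = consumer(log)
    next(g)                  # prime the generator so it stops at the first yield
    for i in range(n):
        log.append(f"produce {i}")
        g.send(i)            # switch to the consumer; returns when it yields again
    g.close()


log = []
producer(3, log)
print(log)
# ['produce 0', 'consume 0', 'produce 1', 'consume 1', 'produce 2', 'consume 2']
```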
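import time

def consumer():
    '''Task 1: receive and process data'''
    while True:
        x=yield

def producer():
    '''Task 2: produce data'''
    g=consumer()
    next(g)
    for i in range(3):  # kept small: with 10000000 iterations the 2-second sleeps would run for months
        g.send(i)
        time.sleep(2)

start=time.time()
producer()  # runs concurrently, but when producer hits I/O it simply blocks; it does not switch to other tasks in this thread
stop=time.time()
print(stop-start)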
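To see that a blocking call inside a generator stalls every task, the sketch below runs two hypothetical `worker` tasks under a round-robin loop. Each worker calls `time.sleep` without yielding, so the sleeps never overlap, and the total time is the sum of all of them.

```python
import time
from collections import deque


def worker(name, log, delay):
    for i in range(2):
        log.append(f"{name} start {i}")
        time.sleep(delay)   # blocking call: no yield here, so no other task can run
        log.append(f"{name} end {i}")
        yield


def run_all(tasks):
    ready = deque(tasks)
    while ready:
        t = ready.popleft()
        try:
            next(t)
        except StopIteration:
            continue
        ready.append(t)


log = []
start = time.perf_counter()
run_all([worker("A", log, 0.05), worker("B", log, 0.05)])
elapsed = time.perf_counter() - start
print(log)
print(f"elapsed: {elapsed:.2f}s")  # at least 4 x 0.05s: the sleeps run one after another
```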
2. greenlet
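Suppose a single thread holds 20 tasks and we want to switch among them. The yield/generator approach is too cumbersome: you first have to create each generator and prime it once, then call send, and so on. The greenlet module makes switching directly among these 20 tasks very simple.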
Note: greenlet does not solve I/O blocking.
pip install greenlet
from greenlet import greenlet

def eat(name):
    print('%s eat 1' %name)
    g2.switch('b')
    print('%s eat 2' %name)
    g2.switch()

def play(name):
    print('%s play 1' %name)
    g1.switch()
    print('%s play 2' %name)

g1=greenlet(eat)
g2=greenlet(play)

g1.switch('a')  # arguments can be passed on the first switch; later switches don't need them
'''
a eat 1
b play 1
a eat 2
b play 2
'''
# Sequential execution
import time
def f1():
    res=1
    for i in range(100000000):
        res+=i

def f2():
    res=1
    for i in range(100000000):
        res*=i

start=time.time()
f1()
f2()
stop=time.time()
print('run time is %s' %(stop-start))  # 10.985628366470337

# Switching
from greenlet import greenlet
import time
def f1():
    res=1
    for i in range(100000000):
        res+=i
        g2.switch()

def f2():
    res=1
    for i in range(100000000):
        res*=i
        g1.switch()

start=time.time()
g1=greenlet(f1)
g2=greenlet(f2)
g1.switch()
stop=time.time()
print('run time is %s' %(stop-start))  # 52.763017892837524
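greenlet only offers a more convenient way of switching than generators do. If the task it switches to hits I/O, that task still blocks in place, so switching automatically on I/O to improve efficiency remains unsolved.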
3. gevent
(1) Overview
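Gevent is a third-party library that makes it easy to write concurrent synchronous or asynchronous code. The main pattern in gevent is the Greenlet, a lightweight coroutine that plugs into Python as a C extension module. All greenlets run inside the operating-system process of the main program, but they are scheduled cooperatively.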
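"Scheduled cooperatively" means a task announces that it is about to wait, and a central loop runs whichever task is due next. gevent's real hub is far more elaborate. The sketch below is only a toy model built on generators: the hypothetical `run` loop plays the hub, and `yield seconds` plays the role of `gevent.sleep(seconds)`. With the same eat/play timings as the gevent example further down (scaled to 0.2s and 0.1s), it produces the same ordering.

```python
import heapq
import itertools
import time


def eat(name, log):
    log.append(f"{name} eat 1")
    yield 0.2                      # "I am about to wait 0.2s": hand control back to the hub
    log.append(f"{name} eat 2")


def play(name, log):
    log.append(f"{name} play 1")
    yield 0.1
    log.append(f"{name} play 2")


def run(tasks):
    """Hypothetical hub: tasks yield how long they want to wait, and the loop
    resumes whichever task is due first instead of blocking on any one of them."""
    tie = itertools.count()        # tie-breaker so the heap never compares generators
    timers = [(time.monotonic(), next(tie), t) for t in tasks]
    heapq.heapify(timers)
    while timers:
        due, _, task = heapq.heappop(timers)
        delay = due - time.monotonic()
        if delay > 0:
            time.sleep(delay)      # only the hub sleeps, and only until the next task is due
        try:
            wait = next(task)
        except StopIteration:
            continue
        heapq.heappush(timers, (time.monotonic() + wait, next(tie), task))


log = []
start = time.monotonic()
run([eat("a", log), play("b", log)])
elapsed = time.monotonic() - start
print(log)                 # ['a eat 1', 'b play 1', 'b play 2', 'a eat 2']
print(f"{elapsed:.2f}s")   # about 0.2s rather than 0.3s: the two waits overlap
```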
(2) Usage
pip install gevent
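# Usage
g1=gevent.spawn(func,1,2,3,x=4,y=5)  # create a greenlet g1; the first argument to spawn is the function (e.g. eat), followed by any number of positional or keyword arguments, all passed on to that function
g2=gevent.spawn(func2)
g1.join()  # wait for g1 to finish
g2.join()  # wait for g2 to finish
# or combine the two steps above into one: gevent.joinall([g1,g2])
g1.value  # the return value of func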
import gevent
def eat(name):
    print('%s eat 1' %name)
    gevent.sleep(2)
    print('%s eat 2' %name)

def play(name):
    print('%s play 1' %name)
    gevent.sleep(1)
    print('%s play 2' %name)


g1=gevent.spawn(eat,'a')
g2=gevent.spawn(play,name='b')
g1.join()
g2.join()
# or gevent.joinall([g1,g2])
print('main')
"""
a eat 1
b play 1
b play 2
a eat 2
main
"""
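In the example above, gevent.sleep(2) simulates an I/O block that gevent recognizes. time.sleep(2) and other blocking calls are not recognized by gevent directly. They become recognizable once you apply a patch with the following line: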
from gevent import monkey;monkey.patch_all() must come before the modules being patched, e.g. before the time and socket modules are imported.
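Why does the order matter? gevent's patching replaces attributes on modules such as `time` and `socket`. A name that was already copied out with `from module import name` before the patch still points at the original object. The sketch below imitates this with a hand-written stand-in called `fake_sleep`. It is not how gevent implements its patch; it only shows plain Python name binding.

```python
import time
from time import sleep as sleep_bound_early    # bound BEFORE the patch

calls = []                                      # records calls to the replacement


def fake_sleep(seconds):
    """Hypothetical stand-in for a cooperative sleep; it only records the call."""
    calls.append(seconds)


original_sleep = time.sleep
time.sleep = fake_sleep                         # "patch" the module attribute
try:
    from time import sleep as sleep_bound_late  # bound AFTER the patch
    time.sleep(1)          # attribute looked up at call time -> fake_sleep
    sleep_bound_late(2)    # name copied after patching       -> fake_sleep
    sleep_bound_early(0)   # name copied before patching      -> still the real sleep
finally:
    time.sleep = original_sleep                 # undo the patch

print(calls)  # [1, 2]
```

This is also why `from socket import *` must come after `monkey.patch_all()`: the star import copies whichever objects the module holds at that moment.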
from gevent import monkey;monkey.patch_all()

import gevent
import time
def eat():
    print('eat food 1')
    time.sleep(2)  # what used to be gevent.sleep(2) is now an ordinary I/O call
    print('eat food 2')

def play():
    print('play 1')
    time.sleep(1)
    print('play 2')

g1=gevent.spawn(eat)
g2=gevent.spawn(play)
gevent.joinall([g1,g2])
print('main')
"""
eat food 1
play 1
play 2
eat food 2
main
"""
(3) Synchronous and asynchronous execution in gevent
from gevent import spawn,joinall,monkey;monkey.patch_all()

import time
def task(pid):
    """
    Some non-deterministic task
    """
    time.sleep(0.5)
    print('Task %s done' % pid)


def synchronous():
    for i in range(10):
        task(i)

def asynchronous():
    g_l=[spawn(task,i) for i in range(10)]
    joinall(g_l)

if __name__ == '__main__':
    print('Synchronous:')
    synchronous()   # serial

    print('Asynchronous:')
    asynchronous()  # concurrent (still one thread, not parallel)
# The important part of the program above is gevent.spawn, which wraps the task function in a Greenlet.
# The list of initialized greenlets is stored in g_l and passed to gevent.joinall,
# which blocks the current flow and runs all the given greenlets.
# Execution continues only after all the greenlets have finished.
'''
Synchronous:
Task 0 done
Task 1 done
Task 2 done
Task 3 done
Task 4 done
Task 5 done
Task 6 done
Task 7 done
Task 8 done
Task 9 done
Asynchronous:
Task 0 done
Task 9 done
Task 8 done
Task 7 done
Task 6 done
Task 5 done
Task 4 done
Task 3 done
Task 2 done
Task 1 done
'''
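For comparison, the same synchronous/asynchronous contrast can be reproduced with the standard library's asyncio instead of gevent. This is a swapped-in technique: switching points are explicit `await`s rather than monkey-patched calls, and `asyncio.gather` takes the place of spawn plus joinall. Sleeps are shortened to 0.05s so the sketch runs quickly.

```python
import asyncio
import time


async def task(pid, done):
    await asyncio.sleep(0.05)   # cooperative wait, comparable to a patched time.sleep under gevent
    done.append(pid)


async def synchronous(done):
    for i in range(10):
        await task(i, done)     # each task finishes before the next one starts


async def asynchronous(done):
    # gather() schedules all ten coroutines at once, similar to spawn + joinall
    await asyncio.gather(*(task(i, done) for i in range(10)))


def timed(coro_fn):
    done = []
    start = time.perf_counter()
    asyncio.run(coro_fn(done))
    return time.perf_counter() - start, done


sync_time, sync_order = timed(synchronous)
async_time, async_order = timed(asynchronous)
print(f"sync:  {sync_time:.2f}s {sync_order}")           # about 0.5s
print(f"async: {async_time:.2f}s {sorted(async_order)}")  # about 0.05s
```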
(4) Getting a function's return value
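import gevent

def func():
    return "xxx"

html_l = [
    gevent.spawn(func),
    gevent.spawn(func),
    # ... more gevent.spawn(...) calls
]
res = gevent.joinall(html_l)  # joinall returns the greenlets that finished
for item in res:
    print(item.get())  # get() returns the function's return value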
References:
https://blog.csdn.net/caimouse/article/details/77823428
https://blog.csdn.net/ououming123/article/details/78983516
(5)timeout
# -*- coding:utf-8 -*-
from gevent import spawn,joinall,Timeout,monkey;monkey.patch_all()
import time

def func(x):
    time.sleep(1)
    return 2**x

# Note: in Timeout(seconds, exception) the second argument is the exception raised when the timer fires.
# False means a silent timeout (swallowed when Timeout is used as a context manager);
# True is not an exception, which is why cases 1.1 and 2.1 end with a TypeError.

# Case 1
# Case 1.1
# timeout = Timeout(0.5,True)
# timeout.start()
#
# l = [spawn(func,x=x) for x in range(5)]
# res = joinall(l)
# for i in res:
#     print(i.get())
"""
Result:
TypeError: exceptions must be classes, or instances, not bool
Sun Oct 14 01:13:58 2018 <timer at 0x9d08f60 callback=<built-in method throw of greenlet.greenlet object at 0x09C1BDA0> args=(True,)> failed with TypeError
16
8
4
2
1
"""

# Case 1.2
# timeout = Timeout(0.5,False)
# timeout.start()
#
# l = [spawn(func,x=x) for x in range(5)]
# res = joinall(l)
# for i in res:
#     print(i.get())
"""
# Error
Traceback (most recent call last):
  File "D:/Python相关/项目/爬虫_奇葩买家/tmp1.py", line 36, in <module>
    res = joinall(l)
  File "C:\Python\Python36\lib\site-packages\gevent\greenlet.py", line 649, in joinall
    return wait(greenlets, timeout=timeout, count=count)
  File "C:\Python\Python36\lib\site-packages\gevent\hub.py", line 1037, in wait
    return list(iwait(objects, timeout, count))
  File "C:\Python\Python36\lib\site-packages\gevent\hub.py", line 984, in iwait
    item = waiter.get()
  File "C:\Python\Python36\lib\site-packages\gevent\hub.py", line 938, in get
    Waiter.get(self)
  File "C:\Python\Python36\lib\site-packages\gevent\hub.py", line 898, in get
    return self.hub.switch()
  File "C:\Python\Python36\lib\site-packages\gevent\hub.py", line 630, in switch
    return RawGreenlet.switch(self)
gevent.timeout.Timeout: 0.5 seconds (silent)
"""

# Case 2
# 2.1
# with Timeout(0.5,True) as timeout:
#     l = [spawn(func, x=x) for x in range(5)]
#     res = joinall(l)
#     for i in res:
#         print(i.get())
"""
Result:
TypeError: exceptions must be classes, or instances, not bool
Sun Oct 14 01:19:17 2018 <timer at 0x9e98f60 callback=<built-in method throw of greenlet.greenlet object at 0x09DABDA0> args=(True,)> failed with TypeError
16
8
4
2
1
"""

# 2.2
# with Timeout(0.5,False) as timeout:
#     l = [spawn(func, x=x) for x in range(5)]
#     res = joinall(l)
#     for i in res:
#         print(i.get())
"""
Result:
(nothing printed)
"""

# Case 3
# 3.1
l = [spawn(func,x=x) for x in range(5)]
res = joinall(l,timeout=0.5,raise_error=True)  # returns only the greenlets that finished within 0.5s: none here
for i in res:
    print(i.get())
"""
Result:
(nothing printed)
"""

# 3.2
# l = [spawn(func,x=x) for x in range(5)]
# res = joinall(l,timeout=0.5,raise_error=False)
# for i in res:
#     print(i.get())
"""
Result:
(nothing printed)
"""
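Case 3 corresponds closely to `asyncio.wait` in the standard library. This is a swapped-in technique, not gevent. `asyncio.wait` returns `(done, pending)` when its timeout expires instead of raising, and it does not cancel the unfinished tasks for you. The 1-second sleep is shortened to 0.2s here.

```python
import asyncio


async def func(x):
    await asyncio.sleep(0.2)   # shortened stand-in for time.sleep(1) in the original
    return 2 ** x


async def main():
    tasks = [asyncio.create_task(func(x)) for x in range(5)]
    # Like joinall(l, timeout=...): returns when the timeout expires, without raising
    done, pending = await asyncio.wait(tasks, timeout=0.05)
    print(len(done), len(pending))      # 0 5: nothing finished in time
    for t in pending:
        t.cancel()                      # asyncio.wait leaves leftovers running unless cancelled

    # With a long enough timeout, every task completes
    tasks = [asyncio.create_task(func(x)) for x in range(5)]
    done, pending = await asyncio.wait(tasks, timeout=1)
    return len(done), len(pending), sorted(t.result() for t in done)


result = asyncio.run(main())
print(result)  # (5, 0, [1, 2, 4, 8, 16])
```

If you want the timeout to raise instead, as in case 1.2, `asyncio.wait_for` raises a timeout error when a single awaitable does not finish in time.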
References:
https://blog.csdn.net/ououming123/article/details/78983516
(6) Examples
from gevent import monkey;monkey.patch_all()
import gevent
import requests
import time

def get_page(url):
    print('GET: %s' %url)
    response=requests.get(url)
    if response.status_code == 200:
        print('%d bytes received from %s' %(len(response.text),url))


start_time=time.time()
gevent.joinall([
    gevent.spawn(get_page,'https://www.python.org/'),
    gevent.spawn(get_page,'https://www.yahoo.com/'),
    gevent.spawn(get_page,'https://github.com/'),
])
stop_time=time.time()
print('run time is %s' %(stop_time-start_time))
'''
GET: https://www.python.org/
GET: https://www.yahoo.com/
GET: https://github.com/
516957 bytes received from https://www.yahoo.com/
49056 bytes received from https://www.python.org/
54497 bytes received from https://github.com/
run time is 2.434030294418335
'''
Single-threaded socket concurrency with gevent. from gevent import monkey;monkey.patch_all() must come before the socket module is imported; otherwise gevent cannot detect socket blocking.
from gevent import monkey;monkey.patch_all()
from socket import *
import gevent

# If you'd rather not patch with monkey.patch_all(), you can use gevent's own socket
# from gevent import socket
# s=socket.socket()

def server(server_ip,port):
    s=socket(AF_INET,SOCK_STREAM)
    s.setsockopt(SOL_SOCKET,SO_REUSEADDR,1)
    s.bind((server_ip,port))
    s.listen(5)
    while True:
        conn,addr=s.accept()
        gevent.spawn(talk,conn,addr)

def talk(conn,addr):
    try:
        while True:
            res=conn.recv(1024)
            if not res:  # empty bytes: the client closed the connection
                break
            print('client %s:%s msg: %s' %(addr[0],addr[1],res))
            conn.send(res.upper())
    except Exception as e:
        print(e)
    finally:
        conn.close()

if __name__ == '__main__':
    server('127.0.0.1',8080)
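The same single-threaded upper-casing echo server can be written with the standard library's asyncio streams instead of gevent. This is a swapped-in technique for comparison. `talk` mirrors the handler above. Port 0 (let the OS pick a free port) and the in-process clients are additions so the sketch is self-contained.

```python
import asyncio


async def talk(reader, writer):
    """Per-connection handler: echo back upper-cased data until the client closes."""
    try:
        while True:
            data = await reader.read(1024)   # awaits instead of blocking the thread
            if not data:                     # empty bytes: client closed the connection
                break
            writer.write(data.upper())
            await writer.drain()
    finally:
        writer.close()
        await writer.wait_closed()


async def client(port, msg):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(msg)
    await writer.drain()
    reply = await reader.read(1024)
    writer.close()
    await writer.wait_closed()
    return reply


async def main():
    # Port 0 lets the OS choose a free port; a real server would use a fixed one such as 8080
    server = await asyncio.start_server(talk, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # Several clients served concurrently by one thread
        return await asyncio.gather(*(client(port, f"hello {i}".encode()) for i in range(3)))


replies = asyncio.run(main())
print(replies)  # [b'HELLO 0', b'HELLO 1', b'HELLO 2']
```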
#_*_coding:utf-8_*_
__author__ = 'Linhaifeng'

from socket import *

client=socket(AF_INET,SOCK_STREAM)
client.connect(('127.0.0.1',8080))


while True:
    msg=input('>>: ').strip()
    if not msg:continue

    client.send(msg.encode('utf-8'))
    msg=client.recv(1024)
    print(msg.decode('utf-8'))
from threading import Thread
from socket import *
import threading

def client(server_ip,port):
    # The socket object must be created inside the function, i.e. in the local namespace.
    # Created outside, it would be shared by all threads: everyone would use one socket object,
    # and the client port would always be the same.
    c=socket(AF_INET,SOCK_STREAM)
    c.connect((server_ip,port))

    count=0
    while True:
        c.send(('%s say hello %s' %(threading.current_thread().name,count)).encode('utf-8'))
        msg=c.recv(1024)
        print(msg.decode('utf-8'))
        count+=1

if __name__ == '__main__':
    for i in range(500):
        t=Thread(target=client,args=('127.0.0.1',8080))
        t.start()
from gevent import monkey
monkey.patch_all()  # patch before importing requests, so that its sockets become cooperative

import gevent
import requests


def fetch_async(method, url, req_kwargs):
    print(method, url, req_kwargs)
    response = requests.request(method=method, url=url, **req_kwargs)
    print(response.url, response.content)

# ##### Send requests #####
gevent.joinall([
    gevent.spawn(fetch_async, method='get', url='https://www.python.org/', req_kwargs={}),
    gevent.spawn(fetch_async, method='get', url='https://www.yahoo.com/', req_kwargs={}),
    gevent.spawn(fetch_async, method='get', url='https://github.com/', req_kwargs={}),
])

# ##### Send requests (a pool caps the maximum number of greenlets) #####
# from gevent.pool import Pool
# pool = Pool(None)  # None means no limit; pass an integer, e.g. Pool(2), to actually cap concurrency
# gevent.joinall([
#     pool.spawn(fetch_async, method='get', url='https://www.python.org/', req_kwargs={}),
#     pool.spawn(fetch_async, method='get', url='https://www.yahoo.com/', req_kwargs={}),
#     pool.spawn(fetch_async, method='get', url='https://www.github.com/', req_kwargs={}),
# ])
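A gevent `Pool(size)` caps how many greenlets run at once. In the standard library's asyncio, an `asyncio.Semaphore` can play the same role. This is a swapped-in technique, not gevent's API. In the sketch below, the hypothetical `fetch` replaces the HTTP request with a short sleep and records the peak number of tasks active at the same time.

```python
import asyncio


async def fetch(i, limit, stats):
    async with limit:                    # at most max_concurrency tasks get past this line at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)        # stand-in for a network request
        stats["active"] -= 1
        return i


async def main(max_concurrency):
    limit = asyncio.Semaphore(max_concurrency)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(fetch(i, limit, stats) for i in range(10)))
    return results, stats["peak"]


results, peak = asyncio.run(main(2))
print(results)  # results come back in submission order: 0 through 9
print(peak)     # 2
```

`gather` still returns results in submission order. The semaphore only limits how many requests are in flight at once.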