Python Full-Stack Road Series: Coroutines (Single-Threaded Concurrency) / Greenlet
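Coroutines
Concept
Coroutine: concurrency within a single thread, also called a micro-thread or fiber.
A coroutine is a lightweight user-mode thread, meaning its scheduling is controlled by the user program itself rather than by the operating system.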
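IO operations are unavoidable in a single-threaded program. But suppose our own program (at the user-program level, not the operating-system level) can manage several tasks within one thread, so that when one task blocks on IO it switches to another task that has computation to do.
The thread then stays in the ready state as much as possible, that is, a state in which the CPU can run it at any time. In effect, we hide our IO operations at the user-program level as far as we can.
To the operating system, the thread then appears to be computing almost all the time with very little IO, so it is given more CPU time.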
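Single-threaded concurrency means achieving concurrency inside one main thread. In essence it is switching plus saving state, and this kind of concurrency only pays off around IO blocking. It requires two things:
1. Control over switching between tasks, saving a task's state before switching away so that it can later resume from where it paused.
2. Detection of IO operations, so that a switch happens only when a task hits IO.
When task one hits IO, we switch to task two. The time task one spends blocked is then used for task two's computation, and that is where the efficiency gain comes from.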
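The "switching plus saving state" idea can be sketched with plain generators. The snippet below is only an illustration under stated assumptions: the names `task` and `run_round_robin` are hypothetical helpers, not part of any library, and the switch points are explicit `yield` statements rather than detected IO.

```python
from collections import deque

def task(name, steps, log):
    """A task that pauses after every step so that another task can run."""
    for i in range(steps):
        log.append("%s step %d" % (name, i))
        yield  # switch point: the local state (i) is kept until we are resumed

def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)        # resume the task from where it last paused
        except StopIteration:
            continue             # the task has finished, so drop it
        queue.append(current)    # not finished yet, schedule it again

log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)
# ['A step 0', 'B step 0', 'A step 1', 'B step 1', 'B step 2']
```

Each generator remembers its own loop counter between resumptions, which is the "saved state"; the scheduler loop is the "switching".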
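# 1. yield can save state. This is much like the way the operating system saves thread state, but yield is controlled at the code level and is more lightweight.
# 2. send can pass a value from one function to another, which lets us switch between tasks within a single thread.

Switching alone, with no IO involved, actually lowers efficiency:

# Serial execution
import time

def consumer(res):
    '''Task 1: receive data and process it'''
    pass

def producer():
    '''Task 2: produce data'''
    res = []
    for i in range(10000000):
        res.append(i)
    return res

start = time.time()
# Serial execution
res = producer()
consumer(res)  # equivalent to consumer(producer())
stop = time.time()
print(stop - start)  # 1.5536692142486572

# Concurrent execution based on yield
import time

def consumer():
    '''Task 1: receive data and process it'''
    while True:
        x = yield

def producer():
    '''Task 2: produce data'''
    g = consumer()
    next(g)  # prime the generator so it runs up to its first yield
    for i in range(10000000):
        g.send(i)

start = time.time()
# yield saves state, so the two tasks switch back and forth directly, which gives the effect of concurrency.
# PS: if you add a print to each task, you will clearly see the two tasks printing in turn, i.e. running concurrently.
producer()
stop = time.time()
print(stop - start)  # 2.0272178649902344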
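import time

def consumer():
    '''Task 1: receive data and process it'''
    while True:
        x = yield

def producer():
    '''Task 2: produce data'''
    g = consumer()
    next(g)
    for i in range(10000000):
        g.send(i)
        time.sleep(2)  # simulated IO: blocks the entire thread on every iteration; use a small range when trying this

start = time.time()
producer()  # the tasks run concurrently, but when producer hits IO it simply blocks; it does not switch to another task in the thread
stop = time.time()
print(stop - start)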
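To make the point measurable on a small scale, here is a minimal sketch, assuming a hypothetical helper `sleepy_task` that uses `time.sleep` as a stand-in for IO. Because `yield` cannot detect the blocking call, the two waits add up instead of overlapping.

```python
import time

def sleepy_task(seconds):
    """Pretend to do IO with time.sleep, then hand control back via yield."""
    time.sleep(seconds)  # blocks the whole thread; no other task can run meanwhile
    yield

start = time.time()
tasks = [sleepy_task(0.2), sleepy_task(0.2)]
for t in tasks:
    next(t)  # run each task up to its yield
elapsed = time.time() - start
print("elapsed: %.2f" % elapsed)  # roughly the sum of both sleeps, not the longest one
```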
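Characteristics of coroutines
Pros and cons of coroutines:
Pros:
- No thread context-switch overhead
- No overhead for locking and synchronizing atomic operations (such as modifying a variable)
- Easy switching of control flow, which simplifies the programming model
- High concurrency, high scalability, and low cost: a single CPU can comfortably support tens of thousands of coroutines, so they suit high-concurrency workloads well.
Cons:
- Cannot use multiple cores: a coroutine is essentially single-threaded and cannot run on multiple cores by itself; it has to be combined with processes to run on multiple CPUs. Most everyday applications do not need this, unless they are CPU-bound.
- A blocking operation (such as IO) blocks the entire program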
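The "no locking" advantage can be illustrated with generators. This is a sketch under assumptions: `worker` is a hypothetical task, and the only switch point is its explicit `yield`. Since a task can never be interrupted between reading and writing the shared counter, no lock is needed.

```python
def worker(counter, rounds):
    """Increment a shared counter, giving up control only at the yield."""
    for _ in range(rounds):
        value = counter["n"]      # read
        counter["n"] = value + 1  # write; no switch can happen between read and write
        yield                     # the only point where another task may run

counter = {"n": 0}
tasks = [worker(counter, 1000) for _ in range(5)]
while tasks:
    for t in list(tasks):
        try:
            next(t)
        except StopIteration:
            tasks.remove(t)  # this task is done

print(counter["n"])  # 5000
```

With preemptive threads the same read-then-write sequence could interleave between threads and would normally be protected with a lock.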
Greenlet
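The greenlet module makes switching between multiple tasks very simple, but it cannot detect IO blocking; the switches have to be added by hand.
Example: implementing coroutines with yield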
def consumer(name):
    print("--->starting eating baozi...")
    while True:
        new_baozi = yield  # pause here until a value is sent in
        print("[%s] is eating baozi %s" % (name, new_baozi))

def producer():
    # prime both generators so that each runs up to its first yield
    next(con)
    next(con2)
    n = 0
    while n < 5:
        n += 1
        con.send(n)  # resume the generator and pass it a value
        con2.send(n)
        print("\033[32;1m[producer]\033[0m is making baozi %s" % n)

if __name__ == '__main__':
    con = consumer("c1")
    con2 = consumer("c2")
    p = producer()
Greenlet
Install greenlet: pip3 install greenlet
from greenlet import greenlet

def func1():
    print(12)
    # manual switch: control moves to gr2 when switch() is called
    gr2.switch()
    print(34)
    gr2.switch()

def func2():
    print(56)
    gr1.switch()
    print(78)

# create two coroutines
gr1 = greenlet(func1)
gr2 = greenlet(func2)
gr1.switch()
Switching alone (when there is no IO and no repeated memory allocation) actually slows the program down.
# Sequential execution
import time

def f1():
    res = 1
    for i in range(100000000):
        res += i

def f2():
    res = 1
    for i in range(100000000):
        res *= i

start = time.time()
f1()
f2()
stop = time.time()
print('run time is %s' % (stop - start))  # 10.985628366470337

# Switching
from greenlet import greenlet
import time

def f1():
    res = 1
    for i in range(100000000):
        res += i
        g2.switch()

def f2():
    res = 1
    for i in range(100000000):
        res *= i
        g1.switch()

start = time.time()
g1 = greenlet(f1)
g2 = greenlet(f2)
g1.switch()
stop = time.time()
print('run time is %s' % (stop - start))  # 52.763017892837524
Gevent
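The tasks in a single thread usually mix computation with blocking operations. When task 1 blocks, we can use the blocked time to run task 2, and so on; only then does efficiency improve. This is what the Gevent module is for.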
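# Usage
g1 = gevent.spawn(func, 1, 2, 3, x=4, y=5)  # create a coroutine object g1; the first argument to spawn is the function (e.g. eat), followed by any number of positional or keyword arguments, all of which are passed on to that function
g2 = gevent.spawn(func2)
g1.join()  # wait for g1 to finish
g2.join()  # wait for g2 to finish
# or combine the two steps above into one: gevent.joinall([g1, g2])
g1.value  # the return value of func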
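The usage outline above can be turned into a small runnable sketch. It assumes gevent is installed; the functions `add` and `greet` are hypothetical stand-ins for `func` and `func2`, and the short `gevent.sleep` calls stand in for IO.

```python
import gevent

def add(a, b, c, x=0, y=0):
    """Hypothetical task: wait briefly, then return the sum of its arguments."""
    gevent.sleep(0.1)  # cooperative sleep, so other greenlets can run meanwhile
    return a + b + c + x + y

def greet(name):
    """Hypothetical task with a single positional argument."""
    gevent.sleep(0.05)
    return "hello " + name

g1 = gevent.spawn(add, 1, 2, 3, x=4, y=5)  # positional and keyword args go to add
g2 = gevent.spawn(greet, "egon")
gevent.joinall([g1, g2])                   # wait for both greenlets to finish

print(g1.value)  # 15
print(g2.value)  # hello egon
```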
Tasks switch automatically when they hit blocking IO
import gevent

def eat(name):
    print('%s eat 1' % name)
    gevent.sleep(2)
    print('%s eat 2' % name)

def play(name):
    print('%s play 1' % name)
    gevent.sleep(1)
    print('%s play 2' % name)

g1 = gevent.spawn(eat, 'egon')
g2 = gevent.spawn(play, name='egon')
g1.join()
g2.join()
# or: gevent.joinall([g1, g2])
print('main')
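gevent.sleep(2) simulates an IO block that gevent can recognize.
time.sleep(2) and other blocking calls are not recognized by gevent directly. To make them recognizable, apply the monkey patch with the line from gevent import monkey;monkey.patch_all(). The patch must be placed at the very top of the file, before the modules it patches (such as time and socket) are imported.
We can call threading.current_thread().getName() inside g1 and g2 to inspect them; the result is DummyThread-n, i.e. a dummy thread.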
from gevent import monkey;monkey.patch_all()

import gevent
import time

def eat():
    print('eat food 1')
    time.sleep(2)
    print('eat food 2')

def play():
    print('play 1')
    time.sleep(1)
    print('play 2')

g1 = gevent.spawn(eat)
g2 = gevent.spawn(play)
gevent.joinall([g1, g2])
print('main')
# Coroutines: concurrency within a single thread. The user controls task switching at the application level; note that switches happen only on IO.

# import gevent
# # 1. detect IO
# # 2. switch automatically
# import time
# def eat(name):
#     print('%s eat 1' %name)
#     gevent.sleep(2)
#     print('%s eat 2' %name)
# def play(name):
#     print('%s play 1' %name)
#     gevent.sleep(1)
#     print('%s play 2' %name)
#
# start=time.time()
# g1=gevent.spawn(eat,'alex')
# g2=gevent.spawn(play,'egon')
#
# # g1.join()
# # g2.join()
# gevent.joinall([g1,g2])
# stop=time.time()
# print(stop-start)

# import gevent
# import os
# # 1. detect IO
# # 2. switch automatically
# import time
# def eat():
#     print('%s eat 1' %os.getpid())
#     gevent.sleep(2)
#     print('%s eat 2' %os.getpid())
# def play():
#     print('%s play 1' %os.getpid())
#     gevent.sleep(1)
#     print('%s play 2' %os.getpid())
#
# start=time.time()
# g1=gevent.spawn(eat,)
# g2=gevent.spawn(play,)
#
# # g1.join()
# # g2.join()
# gevent.joinall([g1,g2])
# stop=time.time()
# print(stop-start)

# import gevent
# import os
# from threading import current_thread
# # 1. detect IO
# # 2. switch automatically
# import time
# def eat():
#     print('%s eat 1' %current_thread().getName())
#     gevent.sleep(2)
#     print('%s eat 2' %current_thread().getName())
# def play():
#     print('%s play 1' %current_thread().getName())
#     gevent.sleep(1)
#     print('%s play 2' %current_thread().getName())
#
# start=time.time()
# g1=gevent.spawn(eat,)
# g2=gevent.spawn(play,)
#
# # g1.join()
# # g2.join()
# gevent.joinall([g1,g2])
# stop=time.time()
# print(stop-start)

from gevent import monkey;monkey.patch_all()
import gevent
import os
from threading import current_thread
# 1. detect IO
# 2. switch automatically
import time

def eat():
    print('%s eat 1' % current_thread().getName())
    time.sleep(2)
    print('%s eat 2' % current_thread().getName())

def play():
    print('%s play 1' % current_thread().getName())
    time.sleep(1)
    print('%s play 2' % current_thread().getName())

start = time.time()
g1 = gevent.spawn(eat,)
g2 = gevent.spawn(play,)

# g1.join()
# g2.join()
gevent.joinall([g1, g2])
stop = time.time()
print(stop - start)
Examples
from gevent import monkey
monkey.patch_all()  # patch every blocking IO operation in this program; done before the other imports

from urllib import request
import gevent
import time

def wget(url):
    print('GET: %s' % url)
    resp = request.urlopen(url)
    data = resp.read()
    print('%d bytes received from %s.' % (len(data), url))

urls = [
    'https://www.python.org/',
    'https://www.python.org/',
    'https://github.com/',
    'https://blog.ansheng.me/',
]

# Serial fetch
start_time = time.time()
for n in urls:
    wget(n)
print("Serial fetch time:", time.time() - start_time)

# Concurrent fetch
ctrip_time = time.time()
gevent.joinall([gevent.spawn(wget, url) for url in urls])
print("Concurrent fetch time:", time.time() - ctrip_time)

Output

GET: https://www.python.org/
47424 bytes received from https://www.python.org/.
GET: https://www.python.org/
47424 bytes received from https://www.python.org/.
GET: https://github.com/
25735 bytes received from https://github.com/.
GET: https://blog.ansheng.me/
82693 bytes received from https://blog.ansheng.me/.
Serial fetch time: 15.143015384674072
GET: https://www.python.org/
GET: https://www.python.org/
GET: https://github.com/
GET: https://blog.ansheng.me/
25736 bytes received from https://github.com/.
47424 bytes received from https://www.python.org/.
82693 bytes received from https://blog.ansheng.me/.
47424 bytes received from https://www.python.org/.
Concurrent fetch time: 3.781306266784668
# Server: handles each connection in its own greenlet
from gevent import monkey;monkey.patch_all()
import gevent
from socket import *

def server(ip, port):
    s = socket(AF_INET, SOCK_STREAM)
    s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    s.bind((ip, port))
    s.listen(5)
    while True:
        conn, addr = s.accept()
        print('%s:%s' % (addr[0], addr[1]))
        g1 = gevent.spawn(talk, conn, addr)

def talk(conn, addr):
    while True:
        try:
            data = conn.recv(1024)
            print('%s:%s [%s]' % (addr[0], addr[1], data))
            if not data: break  # the client closed the connection
            conn.send(data.upper())
        except ConnectionResetError:
            break
    conn.close()

if __name__ == '__main__':
    server('127.0.0.1', 8091)
# Single interactive client (commented out)
# from socket import *
# c=socket(AF_INET,SOCK_STREAM)
# c.connect(('127.0.0.1',8091))
#
# while True:
#     msg=input('>>: ').strip()
#     if not msg:continue
#     c.send(msg.encode('utf-8'))
#     data=c.recv(1024)
#     print(data.decode('utf-8'))

# Client: 500 threads, each connecting to the server and sending in a loop
from threading import Thread
from socket import *

def client():
    c = socket(AF_INET, SOCK_STREAM)
    c.connect(('127.0.0.1', 8091))
    while True:
        c.send('hello'.encode('utf-8'))
        data = c.recv(1024)
        print(data.decode('utf-8'))

if __name__ == '__main__':
    for i in range(500):
        t = Thread(target=client)
        t.start()