Python Automation Development Class Notes [Day10] - Advanced Python (Threads)
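Threads
Definition: the execution of one assembly line is a thread. Every assembly line belongs to a workshop, and the running of a workshop is a process (a process contains at least one thread).
A process is the unit of resource allocation, while a thread is the unit of execution on the CPU. Creating a thread costs far less than creating a process.
Multithreading: one workshop holds several assembly lines that all share the workshop's resources (multiple threads share the resources of one process).
Why create multiple threads:
1. Resource sharing
2. Low creation overhead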
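To make the resource-sharing point concrete, here is a minimal sketch. The names `shared`, `append_item` and `run_threads` are illustrative and not part of the original notes. It shows that every thread started inside one process operates on the same list object.

```python
from threading import Thread

shared = []  # lives in the process; every thread in it sees this same list


def append_item(item):
    # Each thread appends to the one shared list
    shared.append(item)


def run_threads(count):
    # Start `count` threads, wait for all of them, then return the sorted contents
    threads = [Thread(target=append_item, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared)


if __name__ == '__main__':
    print(run_threads(5))
```

Replacing `Thread` with `multiprocessing.Process` would leave the parent's list empty, because each child process works on its own copy of the memory.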
Two ways to start a thread:
Method 1: pass a target function to Thread
    from threading import Thread

    def work(name):
        print('%s say hello' % name)

    if __name__ == '__main__':
        t = Thread(target=work, args=('Albert',))
        t.start()
        print('main thread')

Method 2: subclass Thread and override run()
    from threading import Thread

    class MyThread(Thread):
        def __init__(self, name):
            super().__init__()
            self.name = name

        def run(self):
            print('%s say hello' % self.name)

    if __name__ == '__main__':
        t = MyThread('Albert')
        t.start()
        print('main thread')
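P.S. The main process and the main thread share the same PID, as do all other threads of that process. To verify:
    from threading import Thread
    import os

    def work():
        # Runs in the child thread, yet prints the PID of the owning process
        print('%s say hello' % os.getpid())

    if __name__ == '__main__':
        t = Thread(target=work)
        t.start()
        print('main thread:%s' % os.getpid())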
Multithreading exercises:
Exercise 1:
Server:
    import socket
    import threading

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(('127.0.0.1', 8080))
    server.listen(5)

    def action(conn):
        # Echo each message back in upper case until the client disconnects
        while True:
            try:
                data = conn.recv(1024)
                if not data:
                    break
                print(data)
                conn.send(data.upper())
            except Exception:
                break
        conn.close()

    if __name__ == '__main__':
        while True:
            conn, addr = server.accept()
            # One thread per client connection
            p = threading.Thread(target=action, args=(conn,))
            p.start()

Client:
    import socket

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(('127.0.0.1', 8080))

    while True:
        msg = input('>>>:').strip()
        if not msg:
            continue
        client.send(msg.encode())
        back_msg = client.recv(1024)
        print(back_msg.decode())

Exercise 2: three threads share two lists (read input, convert to upper case, write to a file)
    from threading import Thread

    data_l = []
    format_data_l = []

    def inp():
        # Read user input and hand it to the formatting thread
        while True:
            data = input('>>>:').strip()
            if not data:
                continue
            data_l.append(data)

    def format():
        # Poll for raw input and convert it to upper case
        while True:
            if data_l:
                data = data_l.pop()
                format_data = data.upper()
                format_data_l.append(format_data)

    def write():
        # Poll for formatted data and append it to c.txt
        while True:
            if format_data_l:
                data = format_data_l.pop()
                with open('c.txt', 'a', encoding='utf-8') as f:
                    f.write(data + '\n')

    if __name__ == '__main__':
        t1 = Thread(target=inp)
        t2 = Thread(target=format)
        t3 = Thread(target=write)
        t1.start()
        t2.start()
        t3.start()
Other thread attributes and helpers
    import threading
    from threading import Thread

    def work():
        print('%s say hello' % threading.current_thread().name)

    if __name__ == '__main__':
        t = Thread(target=work)
        t.daemon = True  # mark as a daemon thread (replaces the deprecated t.setDaemon(True))
        t.start()
        t.join()
        print(threading.enumerate())     # list of the threads currently alive
        print(threading.active_count())  # number of threads currently alive
        print('main thread:%s' % threading.current_thread().name)  # name of the current thread
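The GIL
Because of the GIL, in the CPython interpreter only one of the threads started within a process can execute at any given moment, so multithreading cannot take advantage of multiple cores.
The GIL is not a feature of the Python language; it is a concept introduced by the CPython implementation of the interpreter.
With the GIL in place, only one thread in the same process is executing at any moment.
Conclusions:
For computation, more CPUs are better; for I/O, extra CPUs do not help.
Of course no program is purely computation or purely I/O. We can only judge in relative terms whether a program is CPU-bound or I/O-bound, and from that decide whether Python multithreading is of any use.
Modern computers are almost all multi-core. For CPU-bound tasks, multithreading in Python brings little performance gain and can even be slower than serial execution (which involves no heavy switching). For I/O-bound tasks, however, it improves efficiency noticeably.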
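Applications:
Use multithreading for I/O-bound work, such as sockets, crawlers and web services
Use multiprocessing for CPU-bound work, such as financial analysis
Note:
The GIL and Lock are two different locks that protect different data. The GIL works at the interpreter level and protects interpreter-level data, such as the data used by garbage collection.
Lock protects the data of the application the user writes. The GIL clearly does not take care of that, so the user has to add locking explicitly, i.e. with Lock.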
Examples:
I/O-bound
    from threading import Thread
    from multiprocessing import Process
    import time
    import os

    def work():
        time.sleep(1)  # simulate waiting on I/O
        print(os.getpid())

    if __name__ == '__main__':
        tp_l = []
        start_time = time.time()
        for i in range(100):
            tp = Thread(target=work)     # run_time is 1.0190582275390625
            # tp = Process(target=work)  # run_time is 10.807618141174316
            tp_l.append(tp)
            tp.start()
        for tp in tp_l:
            tp.join()
        stop_time = time.time()
        print('run_time is %s' % (stop_time - start_time))

CPU-bound
    from threading import Thread
    from multiprocessing import Process
    import os
    import time

    def work():
        res = 0
        for i in range(100000):
            res += i

    if __name__ == '__main__':
        tp_l = []
        start_time = time.time()
        for i in range(300):
            # tp = Thread(target=work)  # run_time is 4.402251720428467
            tp = Process(target=work)   # run_time is 28.153610229492188
            tp_l.append(tp)
            tp.start()
        for tp in tp_l:
            tp.join()
        stop_time = time.time()
        print('run_time is %s' % (stop_time - start_time))
Mutex (Lock)
    from threading import Thread, Lock
    import time

    n = 100

    def work():
        global n
        with mutex:
            # Read, sleep, write back: without the lock every thread would read 100
            temp = n
            time.sleep(0.1)
            n = temp - 1

    if __name__ == '__main__':
        mutex = Lock()
        t_l = []
        for i in range(100):
            t = Thread(target=work)
            t_l.append(t)
            t.start()
        for i in t_l:
            i.join()
        print(n)  # 0, because the lock serializes the 100 decrements
Deadlock and recursive locks
Deadlock: a situation in which two or more processes or threads wait on each other while competing for resources during execution, so that none of them can make progress without outside intervention.
The system is then said to be deadlocked, and the processes that wait on each other forever are called deadlocked processes. The following is a deadlock:
Deadlock example:
    from threading import Thread, Lock
    import time

    class MyThread(Thread):
        def run(self):
            self.f1()
            self.f2()

        def f1(self):
            mutexA.acquire()
            print('\033[40m%s get LockA\033[0m' % self.name)
            mutexB.acquire()
            print('\033[41m%s get LockB\033[0m' % self.name)
            mutexB.release()
            mutexA.release()

        def f2(self):
            # Takes the locks in the opposite order to f1: B first, then A
            mutexB.acquire()
            time.sleep(1)
            print('\033[41m%s get LockB\033[0m' % self.name)
            mutexA.acquire()
            print('\033[40m%s get LockA\033[0m' % self.name)
            mutexA.release()
            mutexB.release()

    if __name__ == '__main__':
        mutexA = Lock()
        mutexB = Lock()
        for i in range(20):
            t = MyThread()
            t.start()
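How to resolve the deadlock:
Use a recursive lock. To allow the same thread to request the same resource several times, Python provides the reentrant lock RLock.
An RLock internally maintains a Lock and a counter variable. The counter records the number of acquire calls, which lets the same thread acquire the resource repeatedly.
Other threads can obtain the resource only after the owning thread has released every acquire. If the example above uses a single RLock in place of the two Locks, the deadlock no longer occurs:
    from threading import Thread, RLock
    import time

    class MyThread(Thread):
        def run(self):
            self.f1()
            self.f2()

        def f1(self):
            mutexA.acquire()
            print('\033[40m%s get LockA\033[0m' % self.name)
            mutexB.acquire()
            print('\033[41m%s get LockB\033[0m' % self.name)
            mutexB.release()
            mutexA.release()

        def f2(self):
            mutexB.acquire()
            time.sleep(1)
            print('\033[41m%s get LockB\033[0m' % self.name)
            mutexA.acquire()
            print('\033[40m%s get LockA\033[0m' % self.name)
            mutexA.release()
            mutexB.release()

    if __name__ == '__main__':
        # mutexA = Lock()
        # mutexB = Lock()
        # Both names refer to one and the same lock; do not mistake them for two locks.
        # When a thread takes the lock the counter becomes 1; each further acquire in that
        # thread increments it. All other threads wait until the owner has released every
        # acquire, i.e. until the counter is back to 0.
        mutexA = mutexB = RLock()
        for i in range(20):
            t = MyThread()
            t.start()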
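The reentrancy described above can be checked in isolation. The sketch below uses a hypothetical helper, `nested_acquire`, that is not from the original notes. It tries to take the same lock twice from one thread, using a non-blocking second attempt so that a plain Lock reports failure instead of hanging.

```python
from threading import Lock, RLock


def nested_acquire(lock):
    """Acquire `lock`, then try to acquire it again from the same thread.

    Returns True if the second (non-blocking) acquire succeeded.
    """
    lock.acquire()
    try:
        # blocking=False makes a plain Lock return False instead of waiting forever
        second = lock.acquire(blocking=False)
        if second:
            lock.release()  # undo the nested acquire
        return second
    finally:
        lock.release()  # undo the outer acquire


if __name__ == '__main__':
    print('RLock re-acquired by owner:', nested_acquire(RLock()))
    print('Lock re-acquired by owner:', nested_acquire(Lock()))
```

With an RLock the nested acquire succeeds because the owning thread only increments the counter; with a Lock it fails, which is exactly the situation that blocks forever when the call is a blocking one.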
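Semaphore
It works the same way as the Semaphore for processes.
A Semaphore manages an internal counter:
each call to acquire() decrements the counter by 1;
each call to release() increments the counter by 1;
the counter can never drop below 0. When it is 0, acquire() blocks the calling thread until another thread calls release().
    from threading import Thread, Semaphore
    import time

    def work(id):
        with sem:  # at most 5 threads are inside this block at once
            time.sleep(2)
            print('%s say hello' % id)

    if __name__ == '__main__':
        sem = Semaphore(5)
        for i in range(20):
            t = Thread(target=work, args=(i,))
            t.start()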
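One way to observe the counter at work is to record how many threads are inside the guarded block at the same time. This is a rough sketch; `max_concurrency` and the `state` dictionary are names made up for illustration, and the short sleep stands in for real work.

```python
from threading import Thread, Semaphore, Lock
import time


def max_concurrency(limit, workers):
    """Run `workers` threads behind Semaphore(limit); return the peak number running at once."""
    sem = Semaphore(limit)
    guard = Lock()  # protects the two counters in `state`
    state = {'running': 0, 'peak': 0}

    def work():
        with sem:  # blocks while `limit` threads already hold a slot
            with guard:
                state['running'] += 1
                state['peak'] = max(state['peak'], state['running'])
            time.sleep(0.05)  # simulate work while holding the slot
            with guard:
                state['running'] -= 1

    threads = [Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state['peak']


if __name__ == '__main__':
    print('peak concurrency:', max_concurrency(5, 20))
```

The reported peak never exceeds the limit passed to Semaphore, no matter how many threads are started.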
Event
event.is_set(): returns the state of the event's flag (isSet() is the older, deprecated spelling);
event.wait(): blocks the calling thread while the flag is False;
event.set(): sets the flag to True; all threads blocked on the event become ready and wait for the operating system to schedule them;
event.clear(): resets the flag to False.
    from threading import Event, Thread
    import threading
    import time

    def conn_mysql():
        print('%s waiting...' % threading.current_thread().name)
        print(e.is_set())  # False
        e.wait()           # block until check_mysql() calls e.set()
        print('%s start to connect mysql...' % threading.current_thread().name)
        print(e.is_set())  # True
        time.sleep(2)

    def check_mysql():
        print('%s is checking...' % threading.current_thread().name)
        time.sleep(3)
        print(e.is_set())  # False
        e.set()
        print(e.is_set())  # True

    if __name__ == '__main__':
        e = Event()
        t1 = Thread(target=conn_mysql)
        t2 = Thread(target=conn_mysql)
        t3 = Thread(target=conn_mysql)
        t4 = Thread(target=check_mysql)
        t1.start()
        t2.start()
        t3.start()
        t4.start()
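Event.wait() also accepts an optional timeout and returns whether the flag was set, which allows a waiting thread to give up after a few tries instead of blocking forever. The following is only a sketch: `wait_until_ready` and its parameters are hypothetical names, not part of the original notes.

```python
from threading import Event, Thread
import time


def wait_until_ready(ready, attempts=5, timeout=0.1):
    """Wait on `ready` up to `attempts` times; return the attempt that succeeded, or None."""
    for attempt in range(1, attempts + 1):
        # wait() returns True as soon as the flag is set, False if the timeout expires
        if ready.wait(timeout):
            return attempt
    return None


def set_later(ready, delay):
    # Stand-in for a checker thread that signals readiness after some delay
    time.sleep(delay)
    ready.set()


if __name__ == '__main__':
    ready = Event()
    Thread(target=set_later, args=(ready, 0.25)).start()
    print('ready on attempt:', wait_until_ready(ready))
```

Returning None after the last attempt lets the caller decide how to report the failure, for example by raising a timeout error.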
Timer
Run an operation after n seconds
    from threading import Timer

    def hello():
        print('hello, world')

    t = Timer(3, hello)  # call hello() 3 seconds after start()
    t.start()
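A Timer that has not fired yet can be stopped with cancel(). The sketch below wraps this in a hypothetical helper, `run_timer`, and uses a list's append method as the callback so the effect can be inspected afterwards.

```python
from threading import Timer


def run_timer(delay, cancel):
    """Start a Timer; optionally cancel it right away. Return what the callback recorded."""
    calls = []
    t = Timer(delay, calls.append, args=('fired',))
    t.start()
    if cancel:
        t.cancel()  # only has an effect while the timer is still waiting
    t.join()        # Timer is a Thread subclass; returns once it fired or was cancelled
    return calls


if __name__ == '__main__':
    print(run_timer(0.05, cancel=False))
    print(run_timer(0.2, cancel=True))
```

Because Timer is a Thread subclass, join() can be used to wait for it in the same way as for any other thread.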
Thread queue
    import queue

    q = queue.Queue()  # first in, first out: a queue
    q.put('first')
    q.put('second')
    q.put((1, 2, 3, 4))
    print(q.get())  # first
    print(q.get())  # second
    print(q.get())  # (1, 2, 3, 4)

    q = queue.LifoQueue()  # last in, first out: a stack
    q.put('first')
    q.put('second')
    q.put((1, 2, 3, 4))
    print(q.get())  # (1, 2, 3, 4)
    print(q.get())  # second
    print(q.get())  # first

    q = queue.PriorityQueue()  # priority queue: the smaller the number, the higher the priority
    q.put((1, 'a'))
    q.put((4, 'b'))
    q.put((3, 'c'))
    print(q.get())  # (1, 'a')
    print(q.get())  # (3, 'c')
    print(q.get())  # (4, 'b')
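These queues are thread-safe, and get() blocks until an item is available, so worker threads do not need to poll a shared list. Here is a minimal sketch of that pattern; `process_all`, the `None` sentinel and the upper-casing step are assumptions chosen for illustration.

```python
import queue
from threading import Thread


def process_all(items, workers=3):
    """Upper-case every item using a pool of worker threads fed through a Queue."""
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = tasks.get()  # blocks until something is available
            if item is None:    # sentinel: no more work for this worker
                tasks.task_done()
                break
            results.put(item.upper())
            tasks.task_done()

    threads = [Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()         # returns once every put() has been matched by task_done()
    for t in threads:
        t.join()

    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out)


if __name__ == '__main__':
    print(process_all(['first', 'second', 'third']))
```

The results are sorted before returning because the order in which workers finish is not deterministic.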
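Coroutines
Definition: concurrency within a single thread, also called a micro-thread. A coroutine is a lightweight user-space thread, meaning that its scheduling is controlled by the user program itself.
The key to implementing coroutines is that the user program controls the switching. Before switching, the program must save the state of the coroutine's last invocation, so that each time it is resumed it continues from where it left off.
(In detail: a coroutine has its own register context and stack. On a switch, the register context and stack are saved elsewhere; on switching back, the saved register context and stack are restored.)
Defining properties of a coroutine (meeting 1, 2 and 3 is enough to call it a coroutine):
1. Concurrency must be achieved within a single thread
2. Modifying shared data requires no locking
3. The user program itself keeps the context stacks of the multiple control flows
4. Extra: a coroutine automatically switches to another coroutine when it hits an I/O operation (neither yield nor greenlet can detect I/O, which is where the gevent module and its select mechanism come in)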
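As a rough illustration of properties 1 to 3, the sketch below runs two generator-based tasks in a single thread and switches between them by hand. The names `task`, `run_round_robin` and `demo` are made up for this example, and the round-robin policy is just one simple way for a program to schedule its own control flows.

```python
from collections import deque


def task(name, steps, log):
    # Each yield hands control back to the scheduler; local state (i) is kept in between
    for i in range(steps):
        log.append('%s step %s' % (name, i))
        yield


def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task up to its next yield
        except StopIteration:
            continue               # task finished: do not reschedule it
        ready.append(current)      # task paused: put it at the back of the line


def demo():
    log = []  # shared data, modified without any lock because only one task runs at a time
    run_round_robin([task('A', 2, log), task('B', 3, log)])
    return log


if __name__ == '__main__':
    for line in demo():
        print(line)
```

The two tasks interleave their steps even though only one thread exists, and the shared log needs no lock because a switch can only happen at a yield.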
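Points to stress:
1. Python threads are kernel-level threads, i.e. they are scheduled by the operating system (for example, a thread that hits I/O is forced to give up the CPU so that another thread can run)
2. With coroutines inside a single thread, switching on I/O is controlled at the application level rather than by the operating system
Compared with the operating system switching threads, having the user switch coroutines inside one thread has these advantages:
1. Switching coroutines is cheaper: it happens at the program level and the operating system is entirely unaware of it, so it is more lightweight
2. Concurrency is achieved within a single thread, making the most of the CPU
yield:
1. yield can save state. This is much like the operating system saving a thread's state, but yield is controlled at the code level and is more lightweight
2. send can pass a result from one function to another, which is how control switches between routines within a single thread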
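Both points about yield can be seen in a few lines. In this sketch, `running_total` and `feed` are hypothetical names: the generator keeps its `total` between calls, and each send() both delivers a value and switches control into the generator until its next yield.

```python
def running_total():
    total = 0
    while True:
        # Pause here and hand `total` to the caller; resume when send() delivers a value
        value = yield total
        total += value


def feed(values):
    """Send each value into the generator and collect the totals it yields back."""
    gen = running_total()
    next(gen)  # advance to the first yield so the generator can receive values
    return [gen.send(v) for v in values]


if __name__ == '__main__':
    print(feed([5, 10, 1]))
```

The first next() call is required because a freshly created generator has not yet reached a yield expression that could receive a value.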
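Disadvantages:
A coroutine is by nature single-threaded and cannot use multiple cores. A program can, however, start several processes, each with several threads, with coroutines running inside each thread
Coroutines run on a single thread, so once one coroutine blocks, the whole thread is blocked
Without yield:
    import time

    def consumer(item):
        # Note: printing every item accounts for much of this version's run time;
        # the yield version below does not print
        print(item)
        x = 1
        y = 2
        z = 3

    def producer(target, seq):
        for item in seq:
            target(item)

    s_time = time.time()
    producer(consumer, range(500000))
    e_time = time.time()
    print('run time %s' % (e_time - s_time))  # 4.764272451400757

With yield:
    import time

    def consumer():
        x = 1
        y = 2
        z = 3
        while True:
            item = yield  # pause until the producer sends the next item

    def producer(target, seq):
        for item in seq:
            target.send(item)

    g = consumer()
    next(g)  # advance the generator to its first yield
    s_time = time.time()
    producer(g, range(500000))
    e_time = time.time()
    print('run time %s' % (e_time - s_time))  # run time 0.12200713157653809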
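The greenlet module
greenlet is a coroutine module implemented in C. Compared with Python's built-in yield, it lets you switch freely between arbitrary functions without first declaring them as generators.
    from greenlet import greenlet

    def test1():
        print('test1,1')
        gr2.switch()  # jump into test2
        print('test1,2')
        gr2.switch()  # resume test2 where it left off

    def test2():
        print('test2,1')
        gr1.switch()  # jump back into test1
        print('test2,2')

    gr1 = greenlet(test1)
    gr2 = greenlet(test2)
    gr1.switch()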
The gevent module
Automatic switching on I/O within a single thread
    from gevent import monkey
    monkey.patch_all()  # patch blocking calls such as time.sleep so that gevent can switch on them
    import gevent
    import time

    def eat(name):
        print('%s eat food first' % name)
        # gevent.sleep(2)
        time.sleep(2)
        print('%s eat food second' % name)

    def play(name):
        print('%s play phone 1' % name)
        # gevent.sleep(1)
        time.sleep(1)
        print('%s play phone 2' % name)

    def drink(name):
        print('%s is drinking' % name)
        # gevent.sleep(4)
        time.sleep(4)
        print('%s is drinking' % name)

    g1 = gevent.spawn(eat, 'Albert')
    g2 = gevent.spawn(play, 'Albert')
    g3 = gevent.spawn(drink, 'Albert')
    g1.join()
    g2.join()
    g3.join()
    print('main thread')

Fetching web pages concurrently with coroutines
    from gevent import monkey
    monkey.patch_all()
    import gevent
    import requests
    import time

    def get_page(url):
        print('GET Page: %s' % url)
        res = requests.get(url)
        if res.status_code == 200:
            print(res.text)

    s_time = time.time()
    gevent.joinall([
        gevent.spawn(get_page, 'https://www.python.org/'),
        gevent.spawn(get_page, 'https://github.com/'),
    ])
    e_time = time.time()
    print('run time %s' % (e_time - s_time))
A concurrent socket server in a single thread
Server:
    from gevent import monkey
    monkey.patch_all()
    from socket import *
    import gevent

    def server(ip, port):
        s = socket(AF_INET, SOCK_STREAM)
        s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
        s.bind((ip, port))
        s.listen(5)
        while True:
            conn, addr = s.accept()
            gevent.spawn(talk, conn, addr)  # one coroutine per client

    def talk(conn, addr):
        try:
            while True:
                res = conn.recv(1024)
                if not res:  # empty bytes: the client closed the connection
                    break
                print('client %s : %s msg: %s' % (addr[0], addr[1], res))
                conn.send(res.upper())
        except Exception as e:
            print(e)
        finally:
            conn.close()

    if __name__ == '__main__':
        server('127.0.0.1', 8080)

Client (100 threads, each with its own connection):
    from threading import Thread
    from socket import *
    import threading

    def client(ip, port):
        c = socket(AF_INET, SOCK_STREAM)
        c.connect((ip, port))
        count = 0
        while True:
            c.send(('%s say hello %s' % (threading.current_thread().name, count)).encode())
            msg = c.recv(1024)
            print(msg.decode())
            count += 1

    if __name__ == '__main__':
        for i in range(100):
            t = Thread(target=client, args=('127.0.0.1', 8080))
            t.start()
socketserver
Server:
    import socketserver

    class MyHandler(socketserver.BaseRequestHandler):
        def handle(self):
            # self.request is the connected socket for this client
            while True:
                res = self.request.recv(1024)
                if not res:  # empty bytes: the client closed the connection
                    break
                print('client %s msg: %s' % (self.client_address, res))
                self.request.send(res.upper())

    if __name__ == '__main__':
        s = socketserver.ThreadingTCPServer(('127.0.0.1', 8080), MyHandler)
        s.serve_forever()

Client:
    import socket

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(('127.0.0.1', 8080))

    while True:
        msg = input('>>>:').strip()
        if not msg:
            continue
        client.send(msg.encode())
        back_msg = client.recv(1024)
        print(back_msg.decode())
UDP sockets
Server:
    # Non-concurrent version
    # from socket import *
    #
    # s = socket(AF_INET, SOCK_DGRAM)
    # s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    # s.bind(('127.0.0.1', 8080))
    #
    # while True:
    #     msg, addr = s.recvfrom(1024)
    #     print(msg)
    #     s.sendto(msg.upper(), addr)

    # Concurrent version based on socketserver
    import socketserver

    class MyUDPhandler(socketserver.BaseRequestHandler):
        def handle(self):
            # For UDP servers, self.request is a (data, socket) pair
            client_msg, s = self.request
            s.sendto(client_msg.upper(), self.client_address)

    if __name__ == '__main__':
        s = socketserver.ThreadingUDPServer(('127.0.0.1', 8080), MyUDPhandler)
        s.serve_forever()

Client:
    from socket import *

    c = socket(AF_INET, SOCK_DGRAM)

    while True:
        msg = input('>>>:').strip()
        c.sendto(msg.encode(), ('127.0.0.1', 8080))
        back_msg, addr = c.recvfrom(1024)
        print('from server %s:%s' % (addr, back_msg.decode()))