一、The Python standard module concurrent.futures
Official documentation: https://docs.python.org/dev/library/concurrent.futures.html
二、Introduction
The concurrent.futures module provides a highly encapsulated interface for asynchronous calls:
ThreadPoolExecutor: a thread pool, for asynchronous calls
ProcessPoolExecutor: a process pool, for asynchronous calls
Both implement the same interface, which is defined by the abstract Executor class.
三、Basic methods
submit(fn, *args, **kwargs)
: submit a task asynchronously
map(func, *iterables, timeout=None, chunksize=1)
: replaces a for loop over submit
shutdown(wait=True)
: the equivalent of the process pool's pool.close() + pool.join()
- wait=True: block until every task in the pool has finished and its resources have been reclaimed
- wait=False: return immediately, without waiting for the pool's tasks to finish
- whatever the value of wait, the program as a whole still waits for all tasks to complete before exiting
- submit and map must be called before shutdown
result(timeout=None)
: get the result
add_done_callback(fn)
: attach a callback function
done()
: check whether a given task has finished
cancel()
: cancel a task
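A short sketch exercising these methods (the `work` function and its timings are illustrative, not from the original). Note that `result`, `add_done_callback`, `done`, and `cancel` live on the Future object returned by `submit`, not on the executor itself:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def work(n):
    time.sleep(0.2)
    return n * 10

with ThreadPoolExecutor(max_workers=1) as ex:
    f1 = ex.submit(work, 1)   # the single worker starts f1 right away
    f2 = ex.submit(work, 2)   # f2 is queued behind f1
    print(f2.cancel())        # True: f2 had not started yet, so it can be cancelled
    print(f1.done())          # False: f1 is still sleeping
    f1.add_done_callback(lambda fut: print('callback saw', fut.result()))
    print(f1.result())        # blocks until f1 finishes, then prints 10
```

With only one worker, f2 is guaranteed to still be pending when `cancel()` is called; a task that has already started cannot be cancelled.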
四、ProcessPoolExecutor
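The code for this section did not survive extraction; below is a minimal sketch of the usual submit / shutdown / result pattern with ProcessPoolExecutor (the `task` function and its sleep times are illustrative):

```python
from concurrent.futures import ProcessPoolExecutor
import os, random, time

def task(n):
    print('%s is running' % os.getpid())
    time.sleep(random.random())  # simulate some work
    return n ** 2

if __name__ == '__main__':
    executor = ProcessPoolExecutor(max_workers=3)
    futures = [executor.submit(task, i) for i in range(11)]
    executor.shutdown(wait=True)  # like pool.close() + pool.join()
    for future in futures:
        print(future.result())    # results come back in submission order
```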
五、ThreadPoolExecutor
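The code for this section is likewise missing; the usage is identical to ProcessPoolExecutor, only the executor class changes. A minimal sketch (the `task` function is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor
import random, threading, time

def task(n):
    print('%s is running' % threading.current_thread().name)
    time.sleep(random.random())  # simulate some work
    return n ** 2

if __name__ == '__main__':
    executor = ThreadPoolExecutor(max_workers=3)
    futures = [executor.submit(task, i) for i in range(11)]
    executor.shutdown(wait=True)  # wait for all 11 tasks to finish
    for future in futures:
        print(future.result())
```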
六、Usage of map

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import os, random, time

def task(n):
    print('%s is running' % os.getpid())
    time.sleep(random.randint(1, 3))
    return n ** 2

if __name__ == '__main__':
    executor = ThreadPoolExecutor(max_workers=3)
    # for i in range(1, 12):
    #     future = executor.submit(task, i)
    executor.map(task, range(1, 12))  # map replaces the for loop over submit
```
七、Callback functions

```python
from concurrent.futures import ProcessPoolExecutor
import os
import requests

def get_page(url):
    print('<process %s> get %s' % (os.getpid(), url))
    response = requests.get(url)
    if response.status_code == 200:
        return {'url': url, 'text': response.text}

def parse_page(res):
    res = res.result()  # the callback receives a Future, so fetch its result first
    print('<process %s> parse %s' % (os.getpid(), res['url']))
    parse_res = 'url:<%s> size:[%s]\n' % (res['url'], len(res['text']))
    with open('db.txt', 'a') as f:
        f.write(parse_res)

if __name__ == '__main__':
    urls = [
        'https://www.baidu.com',
        'https://www.python.org',
        'https://www.openstack.org',
        'https://help.github.com/',
        'http://www.sina.com.cn/',
    ]
    # the tail of the original snippet was lost in extraction; a pool plus
    # submit(...).add_done_callback(parse_page) completes the pattern it describes
    p = ProcessPoolExecutor(3)
    for url in urls:
        p.submit(get_page, url).add_done_callback(parse_page)
```

Note that parse_page runs in the main process, not in the pool's worker processes: add_done_callback invokes its callback in the process that owns the Future.
八、Crawling web pages with a thread pool

```python
import requests
from concurrent.futures import ThreadPoolExecutor  # this import was missing in the original

def get_page(url):
    res = requests.get(url)
    name = url.rsplit('/')[-1] + '.html'
    return {'name': name, 'text': res.content}

def call_back(fut):
    res = fut.result()  # fetch the result once instead of three times
    print(res['name'])
    with open(res['name'], 'wb') as f:
        f.write(res['text'])

if __name__ == '__main__':
    pool = ThreadPoolExecutor(2)
    urls = ['http://www.baidu.com', 'http://www.cnblogs.com', 'http://www.taobao.com']
    for url in urls:
        pool.submit(get_page, url).add_done_callback(call_back)
```
九、Timers
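The body of this section did not survive extraction; as a minimal sketch, the standard library's threading.Timer runs a function once after a delay (the `hello` function and the 1-second delay are illustrative):

```python
from threading import Timer

def hello(name):
    print('hello', name)

# schedule hello('world') to run after a 1-second delay
t = Timer(1, hello, args=('world',))
t.start()
t.join()  # Timer is a Thread subclass, so it can be joined like any thread
```

A started timer can be stopped before it fires with t.cancel().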