Python threading: ThreadPoolExecutor
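Thread pools: why use a thread pool?
1. The main thread can query the status of any thread or task, and retrieve its return value.
2. The main thread finds out immediately when a thread finishes.
3. futures gives multithreaded and multiprocess code the same programming interface.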
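To illustrate point 3, here is a minimal sketch in which the same helper runs on either a thread pool or a process pool. The helper name run_all is hypothetical, and math.sqrt is used only because a standard-library function can be sent to worker processes without extra setup.

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_all(executor_cls, values):
    """Submit math.sqrt for each value and collect results in input order.

    run_all is a hypothetical helper; it only relies on the Executor
    interface (submit, result, context manager) shared by both pool types.
    """
    with executor_cls(max_workers=2) as executor:
        futures = [executor.submit(math.sqrt, v) for v in values]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # The guard matters for process pools on platforms that start workers
    # by re-importing the main module.
    print(run_all(ThreadPoolExecutor, [1, 4, 9]))
    print(run_all(ProcessPoolExecutor, [1, 4, 9]))
```

Only the class passed in changes; the submit and result calls stay identical.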
Getting a task's status or cancelling it
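import time
from concurrent.futures import ThreadPoolExecutor


def get_html(times):
    time.sleep(times)
    print("get page {} success".format(times))
    return times


executor = ThreadPoolExecutor(max_workers=2)  # maximum number of worker threads
# submit() hands a function to the pool and returns a Future immediately,
# so the main thread is not blocked.
task1 = executor.submit(get_html, 3)
task2 = executor.submit(get_html, 2)
print(task1.done())    # whether the task has finished
print(task2.cancel())  # cancel the task; this only succeeds if it has not started yet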
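A small sketch of the cancel rule, assuming a pool with a single worker so that the second task is guaranteed to still be queued. The names gate, blocked_task and quick_task are made up for this illustration; a threading.Event replaces time.sleep so the outcome does not depend on timing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

gate = threading.Event()  # hypothetical switch that keeps the first task busy


def blocked_task():
    gate.wait()  # hold the only worker until the main thread releases it
    return "first"


def quick_task():
    return "second"


with ThreadPoolExecutor(max_workers=1) as executor:
    running = executor.submit(blocked_task)
    queued = executor.submit(quick_task)

    cancelled = queued.cancel()    # the task is still pending, so this succeeds
    done_before = running.done()   # the first task has not finished yet
    gate.set()                     # let the first task finish
    result = running.result()      # blocks until the return value is available
    done_after = running.done()

print(cancelled, done_before, result, done_after, queued.cancelled())
```

With max_workers=2, as in the example above, both tasks are usually picked up right away, which is why cancel() tends to fail there.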
Getting the results of completed tasks
Method 1:
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def get_html(times):
    time.sleep(times)
    print("get page {} success".format(times))
    return times


executor = ThreadPoolExecutor(max_workers=2)
urls = [3, 2, 4]
all_task = [executor.submit(get_html, url) for url in urls]
# as_completed() yields each future as soon as it finishes,
# so results arrive in completion order, not submission order.
for future in as_completed(all_task):
    data = future.result()
    print("get {} page".format(data))
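Because as_completed does not preserve submission order, it is often useful to remember which input each future belongs to. The sketch below keeps a dict from future to a label; the function fetch and the labels in jobs are hypothetical stand-ins for real page downloads.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(name, delay):
    # Stand-in for a real download: sleep, then return something derived
    # from the input.
    time.sleep(delay)
    return name.upper()


# Hypothetical jobs: label -> simulated download time in seconds.
jobs = {"slow": 0.3, "fast": 0.1, "medium": 0.2}
results = {}

with ThreadPoolExecutor(max_workers=3) as executor:
    # Map each future back to the label it was created for.
    future_to_name = {
        executor.submit(fetch, name, delay): name for name, delay in jobs.items()
    }
    for future in as_completed(future_to_name):
        name = future_to_name[future]
        results[name] = future.result()

print(results)
```

The dict lookup is what ties a finished future back to its input, whatever order the tasks complete in.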
Method 2:
import time
from concurrent.futures import ThreadPoolExecutor


def get_html(times):
    time.sleep(times)
    print("get page {} success".format(times))
    return times


executor = ThreadPoolExecutor(max_workers=2)
urls = [3, 2, 4]
# executor.map() yields return values directly (not futures), in the same
# order as urls, regardless of which task finishes first.
for data in executor.map(get_html, urls):
    print("get {} page".format(data))
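The ordering guarantee of map can be checked with a sketch in which later inputs finish sooner. The function slow_square is hypothetical; its delays are chosen so that completion order is the reverse of input order.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    # Larger n sleeps for less time, so the tasks finish in reverse order.
    time.sleep((3 - n) / 10)
    return n * n


with ThreadPoolExecutor(max_workers=3) as executor:
    # map() still hands results back in the order of the inputs.
    squares = list(executor.map(slow_square, [0, 1, 2]))

print(squares)  # [0, 1, 4]
```

The trade-off against as_completed is that a slow first task delays the results of faster ones behind it.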
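wait: wait() blocks the main thread until the tasks selected by return_when (the first one to finish, or all of them) have completed; only then does the main thread continue.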
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def get_html(times):
    time.sleep(times)
    print("get page {} success".format(times))
    return times


executor = ThreadPoolExecutor(max_workers=2)
urls = [3, 2, 4]
all_task = [executor.submit(get_html, url) for url in urls]
# Block until the first task finishes; the default, ALL_COMPLETED,
# would block until every task is done.
wait(all_task, return_when=FIRST_COMPLETED)
print("main")
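wait() also returns two sets, the finished futures and the unfinished ones, which the example above discards. A minimal sketch of using them follows; the names release, fast and slow are hypothetical, and an Event is used in place of sleep so the slow task cannot finish early.

```python
import threading
from concurrent.futures import (
    ALL_COMPLETED,
    FIRST_COMPLETED,
    ThreadPoolExecutor,
    wait,
)

release = threading.Event()  # hypothetical switch that holds back the slow task


def fast():
    return "fast"


def slow():
    release.wait()  # cannot finish until the main thread sets the event
    return "slow"


with ThreadPoolExecutor(max_workers=2) as executor:
    tasks = [executor.submit(fast), executor.submit(slow)]

    # Returns as soon as at least one future has finished.
    done, not_done = wait(tasks, return_when=FIRST_COMPLETED)
    first_done = {f.result() for f in done}
    still_pending = len(not_done)

    release.set()
    # Now block until every future has finished.
    done, not_done = wait(tasks, return_when=ALL_COMPLETED)
    all_results = sorted(f.result() for f in done)

print(first_done, still_pending, all_results)
```

wait() also accepts a timeout argument in seconds; futures that are not finished when it expires end up in the second set.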