Python Thread Pools and Process Pools
A thread pool implemented from scratch
import queue
import threading
import requests
from datetime import datetime


class DownloadThread(threading.Thread):
    def __init__(self, queue):
        super(DownloadThread, self).__init__()
        self.queue = queue

    def run(self):
        while True:
            url = self.queue.get()            # block until a URL is available
            print(self.name + ': begin download ' + url + ' ...')
            self.download(url)
            self.queue.task_done()            # tell the queue this item is finished

    def download(self, url):
        response = requests.get(url)
        print(len(response.content))


if __name__ == '__main__':
    start_time = datetime.today()
    urls = [
        "http://www.baidu.com",
        "http://www.sina.com",
        "http://www.163.com",
        "http://www.qq.com",
        "http://www.sohu.com"
    ]
    Q = queue.Queue()
    for i in range(2):                        # two worker threads share one queue
        t = DownloadThread(Q)
        t.daemon = True                       # daemon workers let the program exit once Q.join() returns
        t.start()
    for url in urls:
        Q.put(url)
    Q.join()                                  # wait until every queued URL has been processed
The concurrent.futures module
concurrent.futures was added in Python 3.2. According to the official documentation, it gives developers a high-level interface for executing calls asynchronously. It is essentially an abstraction layer built on top of Python's threading and multiprocessing modules that is easier to use. That simplification costs a good deal of flexibility, however, so if you need fine-grained, customized control over your tasks, concurrent.futures may not be the right fit.
concurrent.futures includes the abstract class Executor. It cannot be used directly; instead you work with one of its two subclasses, ThreadPoolExecutor or ProcessPoolExecutor. As you might guess, these correspond to Python's threading and multiprocessing interfaces respectively. Both subclasses provide a pool that manages a set of worker threads or processes for you.
Executor is an abstract class that provides methods to execute calls asynchronously. It should not be used directly, but through its two subclasses: ThreadPoolExecutor and ProcessPoolExecutor.
Executor: the abstract base class, used through its two subclasses ThreadPoolExecutor and ProcessPoolExecutor
Future: returned by Executor.submit(); it represents the pending result of an asynchronously executed task
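For example (a minimal sketch; the square function is just an illustration, not part of the original code), submit() returns a Future whose state and result you can query:

from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(square, 3)   # schedule the call, get a Future back immediately
    print(future.done())                  # may print False while the task is still running
    print(future.result())                # block until the result is ready -> 9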
ThreadPoolExecutor
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    print(list(executor.map(sleeper, x)))  # sleeper: a worker function, x: an iterable of its arguments
The official description is straightforward: use submit() to register your function together with the arguments you want to pass to it.
Executor.submit(fn, *args, **kwargs)
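As a hedged sketch of how submit() is typically combined with as_completed() to collect results (the download_one helper and its timeout argument are illustrative, not from the original code):

from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

def download_one(url, timeout=5):
    # illustrative helper: return the URL together with its HTTP status code
    return url, requests.get(url, timeout=timeout).status_code

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(download_one, u, timeout=3)
                   for u in ["http://www.baidu.com", "http://www.qq.com"]]
        for f in as_completed(futures):   # yields each Future as it finishes
            print(f.result())             # result() re-raises any exception from the worker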
map() is even more concise and works much like the built-in map() function.
Executor.map(fn, *iterables)
Implementing the crawler with a thread pool
import requests
from concurrent.futures import ThreadPoolExecutor
import time


def downloadurl(url):
    time.sleep(1)
    response = requests.get(url)
    print(response.status_code)


urls = [
    "http://www.baidu.com",
    "http://www.sina.com",
    "http://www.163.com",
    "http://www.qq.com",
    "http://www.sohu.com"
]

if __name__ == '__main__':
    t = ThreadPoolExecutor(max_workers=2)
    for url in urls:
        t.submit(downloadurl, url)
We can use the map() method from concurrent.futures to make the code even more concise:
import requests
from concurrent.futures import ThreadPoolExecutor
import time


def downloadurl(url):
    time.sleep(1)
    response = requests.get(url)
    print(response.status_code)


urls = [
    "http://www.baidu.com",
    "http://www.sina.com",
    "http://www.163.com",
    "http://www.qq.com",
    "http://www.sohu.com"
]

if __name__ == '__main__':
    t = ThreadPoolExecutor(max_workers=2)
    t.map(downloadurl, urls)
Process pools work the same way: simply replace ThreadPoolExecutor with ProcessPoolExecutor.
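A minimal sketch of the process-pool variant, assuming a CPU-bound task (the count function is illustrative); note that the entry point must be guarded with if __name__ == '__main__' so that child processes can be spawned safely:

from concurrent.futures import ProcessPoolExecutor

def count(n):
    # CPU-bound work benefits from processes, which sidestep the GIL
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as executor:
        print(list(executor.map(count, [10**6, 10**7])))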