Python Thread Pools and Process Pools

Implementing a thread pool by hand

import os, queue, threading, requests
from datetime import datetime
from time import sleep

class DownloadThread(threading.Thread):
    def __init__(self, queue):
        super(DownloadThread, self).__init__()
        self.queue = queue

    def run(self):
        while True:
            url = self.queue.get()
            print(self.name+':'+"begin download"+url+"...")
            self.download(url)
            self.queue.task_done()

    def download(self, url):
        response = requests.get(url)
        print(len(response.content))


if __name__ == '__main__':
    start_time = datetime.today()
    urls = [
        "http://www.baidu.com",
        "http://www.sina.com",
        "http://www.163.com",
        "http://www.qq.com",
        "http://www.sohu.com"
    ]
    Q = queue.Queue()
    for i in range(2):
        t = DownloadThread(Q)
        t.daemon = True  # daemon workers let the program exit once Q.join() returns
        t.start()
    for url in urls:
        Q.put(url)
    Q.join()

 

The concurrent.futures module

concurrent.futures was added in Python 3.2. According to the official Python documentation, it gives developers a high-level interface for executing calls asynchronously. concurrent.futures is essentially an abstraction layer built on top of Python's threading and multiprocessing modules that makes them easier to use. Although this abstraction simplifies those modules, it also gives up a lot of flexibility, so if you need to handle highly customized tasks, concurrent.futures may not be the right fit for you.

concurrent.futures includes the abstract class Executor, which is not meant to be used directly; instead you use one of its two subclasses, ThreadPoolExecutor or ProcessPoolExecutor. As you might guess, these subclasses correspond to Python's threading and multiprocessing interfaces respectively. Each of them maintains a pool of threads or processes that carry out your tasks.

Executor is an abstract class that provides methods to execute calls asynchronously. It should not be used directly, but through its two subclasses: ThreadPoolExecutor and ProcessPoolExecutor.

Executor: the abstract base class, extended by its two subclasses ThreadPoolExecutor and ProcessPoolExecutor
Future: produced by Executor.submit; each submitted task is represented by a Future object

 

ThreadPoolExecutor

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    print(list(executor.map(sleeper, x)))
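The snippet above assumes a worker function sleeper and an iterable x that are not shown; a minimal self-contained version might look like this (sleeper and x are placeholder names, not part of concurrent.futures):

import concurrent.futures
import time

def sleeper(seconds):
    # Simulated work: block for the given number of seconds, then return it
    time.sleep(seconds)
    return seconds

x = [1, 2, 3, 1, 2]

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # map() submits one call per item in x and yields results in input order
    print(list(executor.map(sleeper, x)))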

 

The official description is straightforward: use submit to register your function together with the arguments you want to pass to it.
Executor.submit(fn, *args, **kwargs)

The map method is more concise; it works much like the built-in map function:

Executor.map(fn, *iterables)
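As a rough illustration of the difference (download_one is a made-up helper, not part of the library): submit returns a Future for each call, the return value is read back with Future.result(), and as_completed yields each Future as it finishes.

from concurrent.futures import ThreadPoolExecutor, as_completed

def download_one(url):
    # Made-up worker used only for illustration; returns the URL's length
    return len(url)

urls = ["http://www.baidu.com", "http://www.sina.com"]

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(download_one, url) for url in urls]
    for future in as_completed(futures):
        # result() returns the worker's return value,
        # or re-raises any exception the call raised
        print(future.result())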

 

Implementing a crawler with a thread pool

import requests
from concurrent.futures import ThreadPoolExecutor
import time


def downloadurl(url):
    time.sleep(1)
    response = requests.get(url)
    print(response.status_code)


urls=[
    "http://www.baidu.com",
    "http://www.sina.com",
    "http://www.163.com",
    "http://www.qq.com",
    "http://www.sohu.com"
]

if __name__ == '__main__':
    t = ThreadPoolExecutor(max_workers=2)
    for url in urls:
        t.submit(downloadurl, url)
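A possible variation, reusing downloadurl and urls from the example above (a sketch, not the original post's code): wrapping the pool in a with block and using as_completed lets you handle each request's result or exception as it finishes.

from concurrent.futures import ThreadPoolExecutor, as_completed

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Map each Future back to the url it was submitted with
        future_to_url = {executor.submit(downloadurl, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                future.result()  # re-raises any exception raised inside downloadurl
            except Exception as exc:
                print(url, "failed:", exc)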

 

We can use the map method from concurrent.futures to make the code even more concise:

import requests
from concurrent.futures import ThreadPoolExecutor
import time


def downloadurl(url):
    time.sleep(1)
    response = requests.get(url)
    print(response.status_code)


urls=[
    "http://www.baidu.com",
    "http://www.sina.com",
    "http://www.163.com",
    "http://www.qq.com",
    "http://www.sohu.com"
]

if __name__ == '__main__':
    t = ThreadPoolExecutor(max_workers=2)
    t.map(downloadurl, urls)
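One caveat worth adding here (my note, not from the original post): Executor.map submits all the calls immediately, but it returns a lazy iterator of results, so any exception raised inside downloadurl only surfaces when you iterate over that iterator. Consuming the results, for example inside a with block, makes failures visible:

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Iterating the result of map() waits for each call in input order
        # and re-raises any exception the call raised
        for _ in executor.map(downloadurl, urls):
            pass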

 

The process pool works the same way: just replace ThreadPoolExecutor with ProcessPoolExecutor.
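A minimal sketch of the process-pool variant (cpu_bound is a made-up function; process pools mainly pay off for CPU-bound work rather than I/O-bound downloads, and the pool must be created under the if __name__ == '__main__' guard so that worker processes can be started safely):

from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n):
    # Made-up CPU-heavy task: sum of squares below n
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as executor:
        print(list(executor.map(cpu_bound, [10**6, 2 * 10**6, 3 * 10**6])))

Functions handed to a ProcessPoolExecutor must be picklable, which in practice means defining them at module top level.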
