Thread Pools in Python

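Starting a new thread is relatively expensive because it involves interaction with the operating system. A thread pool can improve performance in this situation, especially when a program needs to create a large number of short-lived threads.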

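A thread pool maintains a set of reusable worker threads. The program only needs to submit a function to the pool, and the pool runs it on an idle worker thread (ThreadPoolExecutor creates its workers on demand, up to max_workers, rather than all at once at startup). When the function finishes, the thread does not die; it returns to the pool in an idle state and waits for the next function.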

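In addition, a thread pool makes it possible to control the number of concurrent threads in the system. A very large number of concurrent threads can cause performance to degrade sharply and, in extreme cases, can even crash the Python interpreter. The pool's maximum-thread-count parameter (max_workers) keeps the number of concurrent threads from exceeding that limit.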
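The reuse and the upper bound described above can be observed with a small sketch. The task function record_thread_name, the helper run_tasks, and the "worker" name prefix are hypothetical names chosen for this illustration; the sketch assumes six short tasks and a pool capped at two threads.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def record_thread_name(task_id):
    """Hypothetical task: sleep briefly, then report which worker ran it."""
    time.sleep(0.05)
    return task_id, threading.current_thread().name


def run_tasks(task_count=6, max_workers=2):
    # The pool never runs more than max_workers threads at once;
    # the remaining tasks wait in the pool's internal queue.
    with ThreadPoolExecutor(max_workers=max_workers,
                            thread_name_prefix="worker") as pool:
        futures = [pool.submit(record_thread_name, i) for i in range(task_count)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    results = run_tasks()
    distinct_threads = {name for _, name in results}
    print(f"{len(results)} tasks ran on {len(distinct_threads)} thread(s)")
```

All six tasks complete, yet the set of thread names never holds more than two entries, because finished workers pick up the next queued task instead of exiting.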

I. Introduction to Thread Pools

The base class for thread pools is Executor in the concurrent.futures module. Executor has two subclasses, ThreadPoolExecutor and ProcessPoolExecutor: ThreadPoolExecutor creates a thread pool, and ProcessPoolExecutor creates a process pool.

When a thread pool or process pool manages the concurrency, you only need to submit the task function to the pool; the pool takes care of the rest.

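Executor provides the following commonly used methods:

submit(fn, *args, **kwargs): submits the function fn to the pool. *args are the positional arguments passed to fn, and **kwargs are the keyword arguments passed to fn.
map(func, *iterables, timeout=None, chunksize=1): similar to the built-in map(func, *iterables), except that the calls are submitted to the pool immediately and run concurrently on multiple threads. (chunksize only matters for ProcessPoolExecutor; ThreadPoolExecutor ignores it.)
shutdown(wait=True): shuts down the pool.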
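The three methods listed above can be tried together in a minimal sketch. The task function scale and its factor parameter are hypothetical and exist only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(value, factor=1):
    """Hypothetical task function used only for this illustration."""
    return value * factor


pool = ThreadPoolExecutor(max_workers=2)

# submit(): positional arguments follow fn, keyword arguments pass through.
future = pool.submit(scale, 3, factor=10)
submit_result = future.result()

# map(): one call per element; results come back in input order.
map_results = list(pool.map(scale, [1, 2, 3], [10, 10, 10]))

# shutdown(): stop accepting tasks and wait for the submitted ones.
pool.shutdown(wait=True)

print(submit_result)  # 30
print(map_results)    # [10, 20, 30]
```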

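After a task function is submitted to the pool, submit() returns a Future object, which is mainly used to obtain the task function's return value. Because the task runs asynchronously in another thread, it is effectively a task that will complete in the future, which is why Python represents it with a Future.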

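Future provides the following methods:

cancel(): attempts to cancel the task represented by this Future. If the task is currently running or has already finished, it cannot be cancelled and the method returns False; otherwise the task is cancelled and the method returns True.
cancelled(): returns True if the task was successfully cancelled.
running(): returns True if the task is currently running and can no longer be cancelled.
done(): returns True if the task was successfully cancelled or has finished running.
result(timeout=None): returns the task's return value. If the task has not finished yet, this method blocks the current thread; timeout sets the maximum number of seconds to block.
exception(timeout=None): returns the exception raised by the task. If the task completed without raising, the method returns None.
add_done_callback(fn): registers a callback for the task. When the task is cancelled or finishes running (whether it returned normally or raised), fn is called automatically with the Future as its only argument.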
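Several of the Future methods above are easier to follow in a short sketch. It assumes a pool with a single worker so that the second task is guaranteed to still be queued when cancel() is called; the names release, blocked_task, and failing_task are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

release = threading.Event()  # hypothetical gate that keeps the first task busy


def blocked_task():
    release.wait(timeout=5)
    return "first done"


def failing_task():
    raise ValueError("boom")


with ThreadPoolExecutor(max_workers=1) as pool:
    first = pool.submit(blocked_task)    # occupies the only worker
    second = pool.submit(blocked_task)   # still queued behind the first task
    cancelled_ok = second.cancel()       # a queued task can still be cancelled
    release.set()                        # let the first task finish
    first_result = first.result(timeout=5)
    failed = pool.submit(failing_task)
    error = failed.exception(timeout=5)  # returns the exception, does not raise

print(cancelled_ok, second.cancelled(), second.done())  # True True True
print(first_result, first.done())                       # first done True
print(type(error).__name__, error)                      # ValueError boom
```

Note that a cancelled Future also counts as done, and that exception() hands back the exception object instead of raising it, unlike result().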

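When you are done with a thread pool, call its shutdown() method, which starts the pool's shutdown sequence. After shutdown() is called, the pool accepts no new tasks but still runs every task that was already submitted. Once all of those tasks have finished, all threads in the pool exit.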
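A minimal sketch of this shutdown behavior, assuming a hypothetical task slow_square: tasks submitted before shutdown() still finish, while a submit() call made afterwards is rejected with RuntimeError.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    """Hypothetical task used only for this illustration."""
    time.sleep(0.1)
    return n * n


pool = ThreadPoolExecutor(max_workers=2)
pending = [pool.submit(slow_square, n) for n in range(4)]

# wait=True blocks until every task submitted so far has finished.
pool.shutdown(wait=True)
all_done = all(f.done() for f in pending)
results = [f.result() for f in pending]

# A pool that has been shut down rejects new work.
try:
    pool.submit(slow_square, 5)
    rejected = False
except RuntimeError:
    rejected = True

print(all_done, results, rejected)  # True [0, 1, 4, 9] True
```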

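The steps for running tasks in a thread pool are:

Call the ThreadPoolExecutor constructor to create a thread pool.
Define an ordinary function to serve as the task.
Call the ThreadPoolExecutor object's submit() method to submit the task.
When there are no more tasks to submit, call the ThreadPoolExecutor object's shutdown() method to close the pool.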

1.1 Submitting Tasks with submit

Since Python 3.2, the standard library has included the concurrent.futures module, which provides the ThreadPoolExecutor (thread pool) and ProcessPoolExecutor (process pool) classes.

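Compared with modules such as threading, submit() in this module returns a Future object, a handle to a result that will become available later. Through it you can query the state of a task and retrieve its return value:

The main thread can get the state and the return value of any task.
The main thread can find out as soon as a task completes.
submit(fn, *args, **kwargs): submits the function fn to the pool. *args are the positional arguments passed to fn, and **kwargs are the keyword arguments passed to fn.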

import threading
import time
from concurrent.futures import ThreadPoolExecutor


def test(value1, value2=None):
    print("%s threading is printed %s, %s" % (threading.current_thread().name, value1, value2))
    time.sleep(2)
    return 'finished'


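if __name__ == "__main__":
    threadPool = ThreadPoolExecutor(max_workers=3, thread_name_prefix="test_")
    for i in range(0, 3):
        future = threadPool.submit(test, i, i + 1)
        # result() blocks until this task finishes, so the next task is not
        # submitted until then and the tasks run one after another.
        print(future.result())

    # Stop accepting new tasks and wait for the submitted tasks to finish.
    threadPool.shutdown(wait=True)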

Output:

test__0 threading is printed 0, 1
finished
test__0 threading is printed 1, 2
finished
test__0 threading is printed 2, 3
finished

1.2 Submitting Tasks with map

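The map method applies the given function to every element of a sequence. It has two main characteristics:

There is no need to call submit for each task.
Results are returned in the same order as the input elements, even if a later task finishes first.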

map(fn, *iterables, timeout=None)

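fn: the function that the worker threads execute.
iterables: one or more iterables; fn is called with one element taken from each.
timeout: the maximum number of seconds to wait for results, like the timeout of wait(). Because map returns the tasks' results, a TimeoutError is raised while iterating over them if a result is not available within timeout seconds of the call to map.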
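The timeout behavior can be sketched as follows. The names slow_task and collect_with_timeout are hypothetical, and the durations are chosen so that the first task finishes well inside the deadline and the second well outside it.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def slow_task(seconds):
    """Hypothetical task that just sleeps for the given time."""
    time.sleep(seconds)
    return seconds


def collect_with_timeout(durations, timeout):
    collected = []
    timed_out = False
    with ThreadPoolExecutor(max_workers=2) as pool:
        try:
            # The timeout is measured from the call to map(), and the
            # error surfaces while iterating over the results.
            for value in pool.map(slow_task, durations, timeout=timeout):
                collected.append(value)
        except TimeoutError:
            timed_out = True
    return collected, timed_out


print(collect_with_timeout([0.05, 0.5], timeout=0.25))  # ([0.05], True)
```

The task that missed the deadline is not killed; it keeps running in its worker thread, and leaving the with block still waits for it.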

import threading
import time
from concurrent.futures import ThreadPoolExecutor


def test(value1, value2=None):
    print("%s threading is printed %s, %s" % (threading.current_thread().name, value1, value2))
    time.sleep(2)
    return threading.current_thread().name + '  finished'


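if __name__ == "__main__":
    threadPool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="test_")
    for i in range(0, 4):
        # Each map() call here receives one-element lists and its result is
        # consumed immediately, so the four tasks run one after another.
        for result in threadPool.map(test, [i], [i + 1]):
            print(result)
    threadPool.shutdown(wait=True)

# Output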
test__0 threading is printed 0, 1
test__0  finished
test__0 threading is printed 1, 2
test__0  finished
test__0 threading is printed 2, 3
test__0  finished
test__0 threading is printed 3, 4
test__0  finished

1.3 Using a Context Manager

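A thread pool can be managed with the with statement, which shuts the pool down automatically when the block exits, after the submitted tasks have finished.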

import time
from concurrent.futures import ThreadPoolExecutor


def action(second):
    print(second)
    time.sleep(second)
    return second


lists = [4, 5, 2, 3]
all_task = []
with ThreadPoolExecutor(max_workers=2) as pool:
    for second in lists:
        all_task.append(pool.submit(action, second))

    result = [i.result() for i in all_task]
    print(f"result:{result}")

Output:

4
5
2
3
result:[4, 5, 2, 3]

1.4 Waiting

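When return values are needed, the main thread often has to block until all worker tasks have returned before moving on, for example when downloading images and then saving them together. The wait function does this.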

wait(fs, timeout=None, return_when=ALL_COMPLETED)

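wait accepts three parameters:
fs: the sequence of Future objects to wait for
timeout: the maximum time to wait, in seconds; when it elapses, wait returns even if some tasks have not finished
return_when: the condition under which wait returns. The default, ALL_COMPLETED, returns once every task has finished; FIRST_COMPLETED and FIRST_EXCEPTION are also available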

import time
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED


def action(second):
    print(second)
    time.sleep(second)
    return second


lists = [4, 5, 2, 3]
all_task = []
with ThreadPoolExecutor(max_workers=2) as pool:
    for second in lists:
        all_task.append(pool.submit(action, second))

    # The main thread waits for all worker tasks to complete
    wait(all_task, return_when=ALL_COMPLETED)
    print("----complete-----")
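For comparison, here is a sketch of the FIRST_COMPLETED option. It relies on wait returning two sets, the finished futures and the unfinished ones; the sleep times are assumptions chosen so that exactly one task is finished when wait returns.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def action(second):
    time.sleep(second)
    return second


with ThreadPoolExecutor(max_workers=2) as pool:
    all_task = [pool.submit(action, s) for s in (0.4, 0.1)]
    # wait() returns a named tuple of two sets: (done, not_done).
    done, not_done = wait(all_task, return_when=FIRST_COMPLETED)
    first_results = sorted(f.result() for f in done)
    still_pending = len(not_done)

print(first_results, still_pending)  # [0.1] 1
```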

II. Getting Results (Future)

2.1 Blocking to Get Each Result

import threading
import time
from concurrent.futures import ThreadPoolExecutor


def test(value1, value2=None):
    print("%s threading is printed %s, %s" % (threading.current_thread().name, value1, value2))
    time.sleep(2)
    return 'finished'


if __name__ == "__main__":
    threadPool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="test_")

    for i in range(0, 2):
        future = threadPool.submit(test, i, i + 1)
        # result() blocks the main thread until this task has finished.
        print(future.result())
    threadPool.shutdown(wait=True)
    print('main finished')

# Output:
test__0 threading is printed 0, 1
finished
test__0 threading is printed 1, 2
finished
main finished

2.2 Getting Return Values with an add_done_callback() Callback

The previous program called the Future's result() method to get the task's return value, but that method blocks the main thread; the block is released only once the task has finished.

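If the program should not block on result(), it can register a callback through the Future's add_done_callback() method. The callback has the form fn(future). When the task finishes, the callback is invoked automatically with the corresponding Future object as its argument.
Inside the callback, calling result() returns immediately because the task has already finished: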

import threading
import time
from concurrent.futures import ThreadPoolExecutor


def test(value1, value2=None):
    print("%s threading is printed %s, %s" % (threading.current_thread().name, value1, value2))
    time.sleep(2)
    return threading.current_thread().name + ' finished'


future_list = []


def test_result(future):
    # Called once the task is done, so result() returns immediately here.
    future_list.append(future.result())


if __name__ == "__main__":
    threadPool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="test_")
    for i in range(0, 2):
        future = threadPool.submit(test, i, i + 1)
        future.add_done_callback(test_result)
    threadPool.shutdown(wait=True)
    print('main finished')
    for result in future_list:
        print(result)

Output:

test__0 threading is printed 0, 1
test__1 threading is printed 1, 2
main finished
test__0 finished
test__1 finished
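A callback is also invoked when the task raised an exception, in which case calling result() inside it would re-raise that exception. The sketch below shows one way to handle both outcomes; the names divide, record_outcome, and outcomes are hypothetical, and a single worker is assumed so the order of the recorded outcomes is predictable.

```python
from concurrent.futures import ThreadPoolExecutor

outcomes = []  # filled in by the callback


def divide(a, b):
    return a / b


def record_outcome(future):
    # The callback also runs when the task raised, so check exception() first;
    # calling result() on a failed Future would re-raise the error.
    error = future.exception()
    if error is None:
        outcomes.append(("ok", future.result()))
    else:
        outcomes.append(("error", type(error).__name__))


with ThreadPoolExecutor(max_workers=1) as pool:
    for a, b in [(6, 3), (1, 0)]:
        pool.submit(divide, a, b).add_done_callback(record_outcome)

print(outcomes)  # [('ok', 2.0), ('error', 'ZeroDivisionError')]
```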

2.3 Waiting Globally with futures.wait

import threading
import time
from concurrent import futures
from concurrent.futures import ThreadPoolExecutor


def test(value1, value2=None):
    print("%s threading is printed %s, %s" % (threading.current_thread().name, value1, value2))
    time.sleep(2)
    return threading.current_thread().name + '  finished'


if __name__ == "__main__":
    future_list = []
    threadPool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="test_")
    for i in range(0, 2):
        future = threadPool.submit(test, i, i + 1)
        future_list.append(future)
    # Wait for all tasks to finish
    futures.wait(future_list)
    threadPool.shutdown(wait=True)
    # Print the results
    for future in future_list:
        result = future.result()
        print(result)

# Output:
test__0 threading is printed 0, 1
test__1 threading is printed 1, 2
test__0  finished
test__1  finished

III. Note

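The number of worker threads does not have to be greater than or equal to the number of tasks. When more tasks are submitted than max_workers allows to run at once, the extra tasks wait in the pool's internal queue and run as workers become free; they are not dropped (the example in section 1.3 runs four tasks on two workers). Choose max_workers according to how much concurrency the workload and the system can sustain.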
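As a quick check of how ThreadPoolExecutor handles more tasks than workers, the sketch below submits ten tasks to a pool of two threads and collects every result. The task function double is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def double(n):
    return n * 2


# Ten tasks, two workers: the extra tasks wait in the pool's queue.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(double, n) for n in range(10)]
    results = [f.result() for f in futures]

print(len(results), results)
```

All ten results come back, in submission order, which indicates that queued tasks are executed rather than lost.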

Reference: https://www.cnblogs.com/hoojjack/p/10846010.html
Reference: https://www.cnblogs.com/goldsunshine/p/16878089.html

Posted @ 2024-09-04 10:10 by 快乐小王子帅气哥哥
