Using Python's multiprocessing module
1. A simple multi-process example: the target function takes no arguments
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing


def worker():
    print('working')


if __name__ == '__main__':
    # Start five independent worker processes
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        p.start()
Output
[root@ mnt]# python3 multiprocessing_simple.py
working
working
working
working
working
2. A simple multi-process example: the target function takes an argument
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing


def worker(num):
    print('worker id: %s' % num)


if __name__ == '__main__':
    for i in range(5):
        # args must be a tuple, hence the trailing comma
        p = multiprocessing.Process(target=worker, args=(i,))
        p.start()
Output (the order varies from run to run)
[root@ mnt]# python3 multiprocessing_simple_args.py
worker id: 1
worker id: 2
worker id: 3
worker id: 4
worker id: 0
3. Running a task imported from another module in multiple processes
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# multiprocessing_import_worker.py


def worker():
    print('working')
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# multiprocessing_import_main.py
import multiprocessing
import multiprocessing_import_worker

if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(
            target=multiprocessing_import_worker.worker,
        )
        p.start()
Output
[root@ mnt]# python3 multiprocessing_import_main.py
working
working
working
working
working
4. Giving processes custom names
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import logging
import time

logging.basicConfig(
    level=logging.DEBUG,
    format="(%(threadName)-10s) %(message)s",
)


def worker():
    name = multiprocessing.current_process().name
    logging.debug('%s started' % name)
    time.sleep(3)
    logging.debug('%s finished' % name)


def my_service():
    name = multiprocessing.current_process().name
    logging.debug('%s started' % name)
    time.sleep(3)
    logging.debug('%s finished' % name)


if __name__ == '__main__':
    service = multiprocessing.Process(
        name='my_service',
        target=my_service,
    )
    worker_1 = multiprocessing.Process(
        name='worker_1',
        target=worker,
    )
    # No name given, so the default name Process-N is used
    worker_2 = multiprocessing.Process(
        target=worker,
    )

    service.start()
    worker_1.start()
    worker_2.start()
Output
[root@ mnt]# python3 multiprocessing_names.py
(MainThread) worker_1 started
(MainThread) Process-3 started
(MainThread) my_service started
(MainThread) worker_1 finished
(MainThread) Process-3 finished
(MainThread) my_service finished
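Every line in the output above starts with (MainThread) because the format string uses %(threadName)s, and each process logs from its own main thread. The sketch below shows the standard logging attribute %(processName)s, which puts the process name in the prefix instead. The helper make_logger and the logger name process_name_demo are hypothetical names used only for this illustration.

```python
import io
import logging


def make_logger(stream):
    """Build a logger whose records are prefixed with the process name."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter('(%(processName)-10s) %(message)s')
    )
    logger = logging.getLogger('process_name_demo')  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers = [handler]   # replace any handlers left from earlier calls
    logger.propagate = False      # keep the record out of the root logger
    return logger


if __name__ == '__main__':
    buffer = io.StringIO()
    make_logger(buffer).debug('started')
    # In the main process the prefix is the main process's name
    print(buffer.getvalue(), end='')
```

Using the same format string in logging.basicConfig() in the script above should make the prefix show names such as worker_1 or my_service rather than MainThread.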
5. Daemon processes: the main process does not wait
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import time
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='(%(threadName)-10s) %(message)s',
)


def daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s started' % (p.name, p.pid))
    time.sleep(2)
    logging.debug('%s %s finished' % (p.name, p.pid))


def no_daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s started' % (p.name, p.pid))
    logging.debug('%s %s finished' % (p.name, p.pid))


if __name__ == '__main__':
    daemon_obj = multiprocessing.Process(
        target=daemon,
        name='daemon'
    )
    daemon_obj.daemon = True

    no_daemon_obj = multiprocessing.Process(
        target=no_daemon,
        name='no_daemon'
    )
    no_daemon_obj.daemon = False

    daemon_obj.start()
    time.sleep(1)
    no_daemon_obj.start()
    # The main process exits here without waiting, so the daemon process
    # is terminated before it can log its "finished" message
Output
[root@ mnt]# python3 multiprocessing_daemon.py
(MainThread) daemon 21931 started
(MainThread) no_daemon 21932 started
(MainThread) no_daemon 21932 finished
6. Daemon processes: waiting for every process to finish with join()
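The listing for this step does not appear in the text. The following is a minimal sketch, assuming the script is the same as the one in section 5 with a join() call on each process added at the end. The helper build_processes is a hypothetical name introduced here so the setup can be inspected separately from the run.

```python
import logging
import multiprocessing
import time

logging.basicConfig(
    level=logging.DEBUG,
    format='(%(threadName)-10s) %(message)s',
)


def daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s started' % (p.name, p.pid))
    time.sleep(2)
    logging.debug('%s %s finished' % (p.name, p.pid))


def no_daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s started' % (p.name, p.pid))
    logging.debug('%s %s finished' % (p.name, p.pid))


def build_processes():
    """Create (but do not start) one daemon and one non-daemon process."""
    daemon_obj = multiprocessing.Process(target=daemon, name='daemon')
    daemon_obj.daemon = True
    no_daemon_obj = multiprocessing.Process(target=no_daemon, name='no_daemon')
    no_daemon_obj.daemon = False
    return daemon_obj, no_daemon_obj


if __name__ == '__main__':
    daemon_obj, no_daemon_obj = build_processes()
    daemon_obj.start()
    time.sleep(1)
    no_daemon_obj.start()

    # Assumed difference from section 5: block until both processes exit,
    # which gives the daemon process time to log its last message
    daemon_obj.join()
    no_daemon_obj.join()
```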
Output
[root@ mnt]# python3 multiprocessing_daemon_join.py
(MainThread) daemon 21948 started
(MainThread) no_daemon 21949 started
(MainThread) no_daemon 21949 finished
(MainThread) daemon 21948 finished
7. Daemon processes: join() with a timeout
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import time
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='(%(threadName)-10s) %(message)s',
)


def daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s started' % (p.name, p.pid))
    time.sleep(2)
    logging.debug('%s %s finished' % (p.name, p.pid))


def no_daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s started' % (p.name, p.pid))
    logging.debug('%s %s finished' % (p.name, p.pid))


if __name__ == '__main__':
    daemon_obj = multiprocessing.Process(
        target=daemon,
        name='daemon'
    )
    daemon_obj.daemon = True

    no_daemon_obj = multiprocessing.Process(
        target=no_daemon,
        name='no_daemon'
    )
    no_daemon_obj.daemon = False

    daemon_obj.start()
    time.sleep(1)
    no_daemon_obj.start()

    # Wait at most 1 second; join() returns even if the process is still running
    daemon_obj.join(1)
    logging.debug('daemon_obj.is_alive():%s' % daemon_obj.is_alive())
    no_daemon_obj.join()
Output
[root@ mnt]# python3 multiprocessing_daemon_join_timeout.py
(MainThread) daemon 21997 started
(MainThread) no_daemon 21998 started
(MainThread) no_daemon 21998 finished
(MainThread) daemon_obj.is_alive():True
8. Terminating a process. Note: after calling terminate(), call join() on the process to make sure it has actually terminated
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import time
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='(%(threadName)-10s) %(message)s',
)


def slow_worker():
    print('starting work')
    time.sleep(0.1)
    print('finished work')


if __name__ == '__main__':
    p = multiprocessing.Process(
        target=slow_worker
    )
    logging.debug('is_alive() before start(): %s' % p.is_alive())

    p.start()
    logging.debug('is_alive() while running: %s' % p.is_alive())

    p.terminate()
    logging.debug('is_alive() right after terminate(): %s' % p.is_alive())

    # join() gives the process time to exit and updates its status
    p.join()
    logging.debug('is_alive() after join(): %s' % p.is_alive())
Output
[root@ mnt]# python3 multiprocessing_terminate.py
(MainThread) is_alive() before start(): False
(MainThread) is_alive() while running: True
(MainThread) is_alive() right after terminate(): True
(MainThread) is_alive() after join(): False
9. Process exit codes
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import sys
import time


def exit_error():
    sys.exit(1)


def exit_ok():
    return


def return_value():
    return 1


def raises():
    raise RuntimeError('a runtime error')


def terminated():
    time.sleep(3)


if __name__ == '__main__':
    jobs = []
    funcs = [
        exit_error,
        exit_ok,
        return_value,
        raises,
        terminated,
    ]
    for func in funcs:
        print('starting process for function %s' % func.__name__)
        j = multiprocessing.Process(
            target=func,
            name=func.__name__
        )
        jobs.append(j)
        j.start()

    # Terminate the last process (terminated) while it is still sleeping
    jobs[-1].terminate()

    for j in jobs:
        j.join()
        print('{:>15}.exitcode={}'.format(j.name, j.exitcode))
Output
[root@ mnt]# python3 multiprocessing_exitcode.py
starting process for function exit_error
starting process for function exit_ok
starting process for function return_value
starting process for function raises
starting process for function terminated
Process raises:
     exit_error.exitcode=1
        exit_ok.exitcode=0
   return_value.exitcode=0
Traceback (most recent call last):
  File "/usr/local/Python-3.6.6/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/Python-3.6.6/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "multiprocessing_exitcode.py", line 25, in raises
    raise RuntimeError('a runtime error')
RuntimeError: a runtime error
# Note: a process that dies from an uncaught exception gets exit code 1
         raises.exitcode=1
     terminated.exitcode=-15
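The value -15 for the terminated process follows the rule that a negative exitcode -N means the process was ended by signal N, and signal 15 is SIGTERM, which is what terminate() sends on POSIX systems. The sketch below turns an exit code into a readable description; describe_exitcode is a hypothetical helper name, not part of multiprocessing.

```python
import signal


def describe_exitcode(exitcode):
    """Translate a Process.exitcode value into a short description."""
    if exitcode is None:
        return 'still running'
    if exitcode < 0:
        # A negative code -N means the process was killed by signal N
        return 'killed by {}'.format(signal.Signals(-exitcode).name)
    if exitcode == 0:
        return 'exited normally'
    return 'exited with error code {}'.format(exitcode)


if __name__ == '__main__':
    # The exit codes reported in the output above
    for code in (1, 0, 0, 1, -15):
        print('{:>4} -> {}'.format(code, describe_exitcode(code)))
```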
10. Enabling multiprocessing's global logging
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import logging
import sys


def worker():
    print('working...')
    sys.stdout.flush()


if __name__ == '__main__':
    # Send multiprocessing's internal log messages to stderr at DEBUG level
    multiprocessing.log_to_stderr(logging.DEBUG)
    p = multiprocessing.Process(target=worker, )
    p.start()
    p.join()
Output
[root@ mnt]# python3 multiprocessing_log_to_stderr.py
[INFO/Process-1] child process calling self.run()
working...
[INFO/Process-1] process shutting down
[DEBUG/Process-1] running all "atexit" finalizers with priority >= 0
[DEBUG/Process-1] running the remaining "atexit" finalizers
[INFO/Process-1] process exiting with exitcode 0
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] running the remaining "atexit" finalizers
11. Multiprocessing logging: setting the log level
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import logging
import sys


def worker():
    print('working...')
    sys.stdout.flush()


if __name__ == '__main__':
    multiprocessing.log_to_stderr()
    # Fetch multiprocessing's logger and show INFO and above only
    logger = multiprocessing.get_logger()
    logger.setLevel(logging.INFO)
    p = multiprocessing.Process(target=worker, )
    p.start()
    p.join()
Output
[root@ mnt]# python3 multiprocessing_get_logger.py
[INFO/Process-1] child process calling self.run()
working...
[INFO/Process-1] process shutting down
[INFO/Process-1] process exiting with exitcode 0
[INFO/MainProcess] process shutting down
12. Subclassing multiprocessing.Process to run a task that takes no arguments
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing


class Worker(multiprocessing.Process):

    def run(self):
        # run() is executed in the child process after start() is called
        print('running process name: %s' % self.name)


if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = Worker()
        jobs.append(p)
        p.start()

    for j in jobs:
        j.join()
Output
[root@ mnt]# python3 multiprocessing_subclass.py
running process name: Worker-2
running process name: Worker-3
running process name: Worker-4
running process name: Worker-5
running process name: Worker-1
13. Using multiprocessing.Queue()
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing


class MyFancyClass(object):

    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print('process name: %s, instance name: %s' % (proc_name, self.name))


def worker(q):
    obj = q.get()
    obj.do_something()


if __name__ == '__main__':
    queue = multiprocessing.Queue()

    # Start the process and pass it the queue. The queue is still empty,
    # so q.get() in the worker blocks until data arrives
    p = multiprocessing.Process(
        target=worker,
        args=(queue,)
    )
    p.start()

    # Put data on the queue
    queue.put(MyFancyClass('Mrs Suk'))

    queue.close()
    # Wait for the queue's background feeder thread to flush the data
    queue.join_thread()
    p.join()
Output
[root@ mnt]# python3 multiprocessing_queue.py
process name: Process-1, instance name: Mrs Suk
14. Using multiprocessing.JoinableQueue(). Example: multiply numbers, store the results in a queue, then read them back from the queue and print them
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import time


class Consumer(multiprocessing.Process):
    """Consumer process."""

    def __init__(self, task_queue, result_queue, *args, **kwargs):
        super(Consumer, self).__init__(*args, **kwargs)
        self.task_queue = task_queue
        self.result_queue = result_queue

    def run(self):
        proc_name = self.name  # name of this process
        while True:
            next_task = self.task_queue.get()
            if next_task is None:
                # None is the sentinel that tells this consumer to exit
                print('%s exiting' % proc_name)
                self.task_queue.task_done()
                break
            print('{}:{}'.format(proc_name, next_task))
            answer = next_task()  # calls Task.__call__
            # Mark the task as done. tasks.join() keeps blocking until
            # task_done() has been called for every item put on the queue
            self.task_queue.task_done()
            self.result_queue.put(answer)  # store the result in the results queue


class Task(object):

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __call__(self, *args, **kwargs):
        time.sleep(0.1)
        return '{self.a} * {self.b} = {product}'.format(self=self, product=self.a * self.b)

    def __str__(self):
        return '{self.a} * {self.b}'.format(self=self)


if __name__ == '__main__':
    # JoinableQueue adds two methods to Queue: task_done() and join()
    tasks = multiprocessing.JoinableQueue()

    # Queue that holds the results
    results = multiprocessing.Queue()

    # Twice the number of CPU cores
    num_consumers = multiprocessing.cpu_count() * 2
    print('Creating {} consumers'.format(num_consumers))
    consumers = [
        Consumer(tasks, results)
        for i in range(num_consumers)
    ]

    # Start the consumer processes
    for w in consumers:
        w.start()

    # Put the jobs on the task queue
    num_jobs = 10
    for i in range(num_jobs):
        tasks.put(Task(i, i))

    # Put one None sentinel on the queue for each consumer
    for i in range(num_consumers):
        tasks.put(None)

    # Wait for all tasks to be processed
    tasks.join()

    # Print the results
    while num_jobs:
        result = results.get()
        print('Result:', result)
        num_jobs -= 1
Output
[root@ mnt]# python3 multiprocessing_producer_consumer.py
Creating 2 consumers
# the number of consumers depends on the CPU count of the test machine
Consumer-1:0 * 0
Consumer-2:1 * 1
Consumer-1:2 * 2
Consumer-2:3 * 3
Consumer-1:4 * 4
Consumer-2:5 * 5
Consumer-1:6 * 6
Consumer-2:7 * 7
Consumer-1:8 * 8
Consumer-2:9 * 9
Consumer-1 exiting
Consumer-2 exiting
Result: 1 * 1 = 1
Result: 0 * 0 = 0
Result: 2 * 2 = 4
Result: 3 * 3 = 9
Result: 5 * 5 = 25
Result: 4 * 4 = 16
Result: 7 * 7 = 49
Result: 6 * 6 = 36
Result: 8 * 8 = 64
Result: 9 * 9 = 81
15. Signalling between processes with Event
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import time


def wait_for_event(event_obj):
    print('waiting for the event without a timeout')
    event_obj.wait()
    print('event state after the blocking wait:', event_obj.is_set())


def wait_for_event_timeout(event_obj, timeout):
    print('waiting for the event with a timeout')
    event_obj.wait(timeout)
    print('event state after the timed wait:', event_obj.is_set())


if __name__ == '__main__':
    event_obj = multiprocessing.Event()
    block_task = multiprocessing.Process(
        name='block_task',
        target=wait_for_event,
        args=(event_obj,)
    )
    block_task.start()

    non_block_task = multiprocessing.Process(
        name='non_block_task',
        target=wait_for_event_timeout,
        args=(event_obj, 2)
    )
    non_block_task.start()

    print('sleeping 3 seconds so that both processes are up and waiting')
    time.sleep(3)
    event_obj.set()
    print('event has been set, is_set() is now True')
Output
[root@ mnt]# python3 multiprocessing_event.py
sleeping 3 seconds so that both processes are up and waiting
waiting for the event with a timeout
waiting for the event without a timeout
event state after the timed wait: False
event has been set, is_set() is now True
event state after the blocking wait: True
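The timed wait in the example checks is_set() after wait() returns. The return value of wait() carries the same information: it is True when the event was set and False when the timeout expired first. A minimal sketch of that pattern follows; wait_and_report is a hypothetical helper name.

```python
import multiprocessing


def wait_and_report(event_obj, timeout):
    """Wait up to `timeout` seconds and say whether the event was set."""
    fired = event_obj.wait(timeout)  # True if set, False on timeout
    return 'set' if fired else 'timed out'


if __name__ == '__main__':
    event_obj = multiprocessing.Event()
    print(wait_and_report(event_obj, 0.1))  # nothing has set the event yet
    event_obj.set()
    print(wait_and_report(event_obj, 0.1))  # returns immediately once set
```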
16. Controlling access to a shared resource with a lock
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import sys


def worker_with(lock, stream):
    # The with statement acquires the lock and always releases it
    with lock:
        stream.write('lock acquired via with\n')


def worker_no_with(lock, stream):
    lock.acquire()
    try:
        stream.write('lock acquired via lock.acquire()\n')
    finally:
        lock.release()


if __name__ == '__main__':
    lock = multiprocessing.Lock()
    w = multiprocessing.Process(
        target=worker_with,
        args=(lock, sys.stdout,)
    )
    nw = multiprocessing.Process(
        target=worker_no_with,
        args=(lock, sys.stdout,)
    )

    w.start()
    nw.start()

    w.join()
    nw.join()
Output
[root@ mnt]# python3 multiprocessing_lock.py
lock acquired via lock.acquire()
lock acquired via with
17. Synchronizing processes with multiprocessing.Condition()
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import time


def task_1(condition_obj):
    proc_name = multiprocessing.current_process().name
    print('starting %s' % proc_name)
    with condition_obj:
        print('%s finished, task_2 may now run' % proc_name)
        # Wake up every process waiting on the condition
        condition_obj.notify_all()


def task_2(condition_obj):
    proc_name = multiprocessing.current_process().name
    print('starting %s' % proc_name)
    with condition_obj:
        # Block until task_1 calls notify_all()
        condition_obj.wait()
        print('task_2 %s finished' % proc_name)


if __name__ == '__main__':
    condition_obj = multiprocessing.Condition()
    s1 = multiprocessing.Process(name='s1', target=task_1, args=(condition_obj,))
    s2_clients = [
        multiprocessing.Process(
            name='task_2[{}]'.format(i),
            target=task_2,
            args=(condition_obj,),
        )
        for i in range(1, 3)
    ]

    for c in s2_clients:
        c.start()
        time.sleep(1)
    s1.start()

    s1.join()
    for c in s2_clients:
        c.join()
Output
[root@ mnt]# python3 multiprocessing_condition.py
starting task_2[1]
starting task_2[2]
starting s1
s1 finished, task_2 may now run
task_2 task_2[1] finished
task_2 task_2[2] finished
18. Limiting concurrent access to a resource with multiprocessing.Semaphore()
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import random
import multiprocessing
import time


class ActivePool:
    """Tracks the names of the processes currently holding the semaphore."""

    def __init__(self, *args, **kwargs):
        super(ActivePool, self).__init__(*args, **kwargs)
        self.mgr = multiprocessing.Manager()
        self.active = self.mgr.list()
        self.lock = multiprocessing.Lock()

    def makeActive(self, name):
        with self.lock:
            self.active.append(name)

    def makeInactive(self, name):
        with self.lock:
            self.active.remove(name)

    def __str__(self):
        with self.lock:
            return str(self.active)


def worker(s, pool):
    name = multiprocessing.current_process().name
    # At most 3 processes can be inside this block at the same time
    with s:
        pool.makeActive(name)
        print('Activating {} now running {}'.format(
            name, pool))
        time.sleep(random.random())
        pool.makeInactive(name)


if __name__ == '__main__':
    pool = ActivePool()
    s = multiprocessing.Semaphore(3)
    jobs = [
        multiprocessing.Process(
            target=worker,
            name=str(i),
            args=(s, pool),
        )
        for i in range(10)
    ]

    for j in jobs:
        j.start()

    while True:
        alive = 0
        for j in jobs:
            if j.is_alive():
                alive += 1
                j.join(timeout=0.1)
                print('Now running {}'.format(pool))
        if alive == 0:
            # all done
            break
Output
[root@ mnt]# python3 multiprocessing_semaphore.py
Activating 9 now running ['9']
Activating 5 now running ['9', '5']
Activating 4 now running ['9', '5', '4']
Activating 1 now running ['9', '5', '1']
Now running ['9', '5', '1']
Now running ['9', '5', '1']
Now running ['9', '5', '1']
Now running ['9', '5', '1']
Activating 2 now running ['9', '1', '2']
Now running ['9', '1', '2']
Now running ['9', '1', '2']
Now running ['9', '1', '2']
Now running ['9', '1', '2']
Activating 6 now running ['9', '2', '6']
Now running ['9', '2', '6']
Now running ['9', '2', '6']
Activating 7 now running ['2', '6', '7']
Activating 8 now running ['2', '7', '8']
Now running ['2', '7', '8']
Now running ['2', '7', '8']
Now running ['2', '7', '8']
Now running ['2', '7', '8']
Activating 3 now running ['7', '8', '3']
Now running ['7', '8', '3']
Now running ['7', '8', '3']
Activating 0 now running ['7', '3', '0']
Now running ['7', '0']
Now running ['7']
Now running ['7']
Now running ['7']
Now running []
19. Sharing a dict or list between processes with multiprocessing.Manager()
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing


def worker(dict_obj, key, value):
    dict_obj[key] = value


if __name__ == '__main__':
    # Create a dict shared across processes; every process sees its contents
    mgr = multiprocessing.Manager()
    mgr_dict = mgr.dict()
    jobs = [
        multiprocessing.Process(
            target=worker,
            args=(mgr_dict, i, i * 2),
        )
        for i in range(10)
    ]

    # Start the worker processes
    for j in jobs:
        j.start()

    # Wait for the worker processes to finish
    for j in jobs:
        j.join()
    print('Result:', mgr_dict)
Output
[root@ mnt]# python3 multiprocessing_manager_dict.py
Result: {5: 10, 6: 12, 1: 2, 2: 4, 3: 6, 7: 14, 8: 16, 9: 18, 4: 8, 0: 0}
20. Sharing a namespace with multiprocessing.Manager(), string value: the value is visible to every process
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import time


def producer(namespace_obj, event):
    """Producer."""
    namespace_obj.value = 'value set through the namespace: 1234'
    event.set()


def consumer(namespace_obj, event):
    """Consumer.

    When the consumer starts, namespace_obj.value does not exist yet,
    so reading it raises an exception. Once the producer calls
    event.set(), event.wait() stops blocking and the code after it runs.
    """
    try:
        print('value before the event: {}'.format(namespace_obj.value))
    except Exception as err:
        print('error before the event:', str(err))
    event.wait()
    print('value after the event:', namespace_obj.value)


if __name__ == '__main__':
    # Create a shared manager
    mgr = multiprocessing.Manager()

    # Create a shared namespace
    namespace = mgr.Namespace()

    # Create the event used to signal between the processes
    event = multiprocessing.Event()

    p = multiprocessing.Process(
        target=producer,
        args=(namespace, event),
    )
    c = multiprocessing.Process(
        target=consumer,
        args=(namespace, event),
    )
    c.start()
    time.sleep(1)
    p.start()

    c.join()
    p.join()
Output
[root@ mnt]# python3 multiprocessing_namespace.py
error before the event: 'Namespace' object has no attribute 'value'
value after the event: value set through the namespace: 1234
21. Sharing a namespace with multiprocessing.Manager(), list value: in-place changes are not visible to other processes
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import time


def producer(namespace_obj, event):
    """Producer."""
    # append() modifies a local copy of the list, not the shared value
    namespace_obj.my_list.append('value set through the namespace: 1234')
    event.set()


def consumer(namespace_obj, event):
    """Consumer.

    namespace_obj.my_list already exists when the consumer starts, so
    reading it does not raise. Once the producer calls event.set(),
    event.wait() stops blocking and the code after it runs.
    """
    try:
        print('value before the event: {}'.format(namespace_obj.my_list))
    except Exception as err:
        print('error before the event:', str(err))
    event.wait()
    print('value after the event:', namespace_obj.my_list)


if __name__ == '__main__':
    # Create a shared manager
    mgr = multiprocessing.Manager()

    # Create a shared namespace
    namespace = mgr.Namespace()

    # A plain list stored on the namespace is not updated globally
    # when it is modified in place
    namespace.my_list = []

    # Create the event used to signal between the processes
    event = multiprocessing.Event()

    p = multiprocessing.Process(
        target=producer,
        args=(namespace, event),
    )
    c = multiprocessing.Process(
        target=consumer,
        args=(namespace, event),
    )
    c.start()
    p.start()

    c.join()
    p.join()
Output
[root@ mnt]# python3 multiprocessing_namespace_mutable.py
value before the event: []
value after the event: []
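The list stays empty because reading my_list through the namespace proxy hands back a copy of the list, so append() changes only that copy. A minimal sketch of one common workaround follows, assuming it is acceptable to assign the modified list back to the namespace attribute. append_and_reassign is a hypothetical helper name, and the rest mirrors the script above.

```python
import multiprocessing


def append_and_reassign(namespace_obj, item):
    """Append to my_list and write the list back so the change is shared."""
    items = namespace_obj.my_list   # local copy of the shared list
    items.append(item)
    namespace_obj.my_list = items   # reassignment goes through the proxy


def producer(namespace_obj, event):
    append_and_reassign(namespace_obj, 'value set through the namespace: 1234')
    event.set()


def consumer(namespace_obj, event):
    event.wait()
    print('value after the event:', namespace_obj.my_list)


if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    namespace = mgr.Namespace()
    namespace.my_list = []
    event = multiprocessing.Event()

    p = multiprocessing.Process(target=producer, args=(namespace, event))
    c = multiprocessing.Process(target=consumer, args=(namespace, event))
    c.start()
    p.start()
    c.join()
    p.join()
```

With the reassignment in place, the consumer should see the appended item after the event instead of an empty list.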
22. Process pools: computing over a list of numbers
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing


def do_calculation(data):
    return data * 2


def start_process():
    print('Starting', multiprocessing.current_process().name)


if __name__ == '__main__':
    inputs = list(range(10))
    print('inputs :', inputs)

    # Compute with the built-in map()
    builtin_outputs = map(do_calculation, inputs)
    print('Built-in:', list(builtin_outputs))

    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(
        processes=pool_size,
        initializer=start_process,
    )

    # Compute with the process pool
    pool_outputs = pool.map(do_calculation, inputs)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the worker processes to exit
    print('Pool :', pool_outputs)
Output
[root@ mnt]# python3 multiprocessing_pool.py
inputs : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Built-in: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Starting ForkPoolWorker-2
Starting ForkPoolWorker-1
Pool : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
23. Process pools: maxtasksperchild sets how many tasks a worker process runs before it is replaced by a fresh one. This keeps long-running workers from tying up a lot of system resources
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing


def do_calculation(data):
    return data * 2


def start_process():
    print('Starting', multiprocessing.current_process().name)


if __name__ == '__main__':
    inputs = list(range(100))
    print('inputs :', inputs)

    # Compute with the built-in map()
    builtin_outputs = map(do_calculation, inputs)
    print('Built-in:', list(builtin_outputs))

    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(
        processes=pool_size,
        initializer=start_process,
        maxtasksperchild=2  # replace each worker after it has run 2 tasks
    )

    # Compute with the process pool
    pool_outputs = pool.map(do_calculation, inputs)
    pool.close()
    pool.join()
    print('Pool :', pool_outputs)
Output
[root@ mnt]# python3 multiprocessing_pool_maxtasksperchild.py
inputs : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Built-in: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Starting ForkPoolWorker-2
Starting ForkPoolWorker-1
Starting ForkPoolWorker-4
Starting ForkPoolWorker-3
Pool : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
24. A MapReduce example built on a process pool. The simple example below reads the contents of files, splits them into words and counts the words
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# multiprocessing_mapreduce.py
import collections
import itertools
import multiprocessing


class SimpleMapReduce:

    def __init__(self, map_func, reduce_func, num_workers=None):
        """
        :param map_func: map function, here file_to_words(filename)
        :param reduce_func: reduce function, here count_words(item)
        :param num_workers: number of worker processes in the pool
        """
        self.map_func = map_func
        self.reduce_func = reduce_func
        self.pool = multiprocessing.Pool(num_workers)

    def partition(self, mapped_values):
        """Group the mapped values by key."""
        partitioned_data = collections.defaultdict(list)
        for key, value in mapped_values:
            partitioned_data[key].append(value)
        return partitioned_data.items()

    def __call__(self, inputs, chunksize=1):
        """
        :param inputs: file names
        :param chunksize: chunk size passed to pool.map()
        :return: list of reduced values
        """
        # Each map call returns a list of the form [(word, 1), ...]
        map_responses = self.pool.map(
            self.map_func,
            inputs,
            chunksize=chunksize,
        )

        # Group the pairs by key into (word, [1, 1, ...]) items
        partitioned_data = self.partition(
            itertools.chain(*map_responses)
        )

        # Pass each grouped item to reduce_func (count_words here),
        # which sums the list of occurrences
        reduced_values = self.pool.map(
            self.reduce_func,
            partitioned_data,
        )
        return reduced_values
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# multiprocessing_wordcount.py
import multiprocessing
import string
from multiprocessing_mapreduce import SimpleMapReduce


def file_to_words(filename):
    """Read a file and return a list of (word, 1) pairs."""
    # Words that are left out of the count
    STOP_WORDS = set([
        'a', 'an', 'and', 'are', 'as', 'be', 'by', 'for', 'if',
        'in', 'is', 'it', 'of', 'or', 'py', 'rst', 'that', 'the',
        'to', 'with',
    ])
    # Translation table that turns every punctuation character into a space
    TR = str.maketrans({
        p: ' '
        for p in string.punctuation
    })
    print('process: {} reading file: {}'.format(multiprocessing.current_process().name, filename))
    output = []
    with open(filename, 'rt', encoding='utf-8') as f:
        for line in f:
            # Skip lines that start with '..' (reST comments)
            if line.lstrip().startswith('..'):
                continue
            line = line.translate(TR)  # strip the punctuation listed in TR
            for word in line.split():  # split on whitespace
                word = word.lower()
                if word.isalpha() and word not in STOP_WORDS:
                    output.append((word, 1))
    return output


def count_words(item):
    """Reduce function: sum the occurrences of one word."""
    word, occurrences = item
    return (word, sum(occurrences))


if __name__ == '__main__':
    import operator
    import glob

    # Find the files ending in .rst in the current directory
    input_files = glob.glob('*.rst')

    # Create the MapReduce object
    mapper = SimpleMapReduce(file_to_words, count_words)
    word_counts = mapper(input_files)  # invokes SimpleMapReduce.__call__
    word_counts.sort(key=operator.itemgetter(1))  # sort by the count at index 1
    word_counts.reverse()  # descending order

    print('\nTOP 20 WORDS BY FREQUENCY\n')
    top20 = word_counts[:20]
    longest = max(len(word) for word, count in top20)
    for word, count in top20:
        print('{word:<{len}}: {count:5}'.format(
            len=longest + 1,
            word=word,
            count=count)
        )
If there is a relationship() from Parent to Child, but there is not a reverse-relationship that links a particular Child to each Parent, SQLAlchemy will not have any awareness that when deleting this particular Child object, it needs to maintain the “secondary” table that links it to the Parent. No delete of the “secondary” table will occur. If there is a relationship that links a particular Child to each Parent, suppose it’s called Child.parents, SQLAlchemy by default will load in the Child.parents collection to locate all Parent objects, and remove each row from the “secondary” table which establishes this link. Note that this relationship does not need to be bidirectional; SQLAlchemy is strictly looking at every relationship() associated with the Child object being deleted. A higher performing option here is to use ON DELETE CASCADE directives with the foreign keys used by the database. Assuming the database supports this feature, the database itself can be made to automatically delete rows in the “secondary” table as referencing rows in “child” are deleted. SQLAlchemy can be instructed to forego actively loading in the Child.parents collection in this case using the passive_deletes directive on relationship(); see Using Passive Deletes for more details on this. Note again, these behaviors are only relevant to the secondary option used with relationship(). If dealing with association tables that are mapped explicitly and are not present in the secondary option of a relevant relationship(), cascade rules can be used instead to automatically delete entities in reaction to a related entity being deleted - see Cascades for information on this feature.
Output
[root@python-mysql mnt]# python3 multiprocessing_wordcount.py
process: SpawnPoolWorker-1 reading file: test.rst

TOP 20 WORDS BY FREQUENCY

child        :     8
relationship :     8
this         :     7
parent       :     5
on           :     4
delete       :     4
table        :     4
sqlalchemy   :     4
not          :     4
can          :     3
database     :     3
used         :     3
option       :     3
deleted      :     3
parents      :     3
will         :     3
each         :     3
particular   :     3
links        :     3
there        :     3
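To make the data flow of the map, partition and reduce steps easier to follow, the sketch below runs the same three stages in a single process on in-memory strings instead of files. It is a simplified illustration, not the pooled implementation above. The names map_words, reduce_counts and word_count are hypothetical, and the stop-word and punctuation handling is left out.

```python
import collections
import itertools


def map_words(text):
    """Map step: emit a (word, 1) pair for every alphabetic word."""
    return [(word.lower(), 1) for word in text.split() if word.isalpha()]


def partition(mapped_values):
    """Group the emitted values by key: (word, [1, 1, ...]) items."""
    grouped = collections.defaultdict(list)
    for key, value in mapped_values:
        grouped[key].append(value)
    return grouped.items()


def reduce_counts(item):
    """Reduce step: sum the occurrences collected for one word."""
    word, occurrences = item
    return (word, sum(occurrences))


def word_count(texts):
    """Run map, partition and reduce sequentially over a list of strings."""
    map_responses = [map_words(text) for text in texts]
    partitioned = partition(itertools.chain(*map_responses))
    return dict(reduce_counts(item) for item in partitioned)


if __name__ == '__main__':
    print(word_count(['the cat sat', 'the cat ran']))
    # prints {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

In the pooled version, the list comprehension in word_count corresponds to the first pool.map() call and the generator passed to dict() corresponds to the second one.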