# The code in this article is based on Python 3
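What is a process?
A program cannot run on its own; it runs only after it has been loaded into memory and the system has allocated resources to it. A program in execution is called a process. The difference between the two: a program is a collection of instructions, the static text that describes what a process will run; a process is one execution of a program, which makes it a dynamic concept.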
An executing instance of a program is called a process.
Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution.
Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.
- A process can have one or many threads. A thread is a context of execution, while a process is a bunch of resources associated with a computation.
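- Multiprogramming allows several programs to be loaded into memory at once so that multiple processes execute concurrently, while each user feels they have the CPU to themselves.
- Within a single-threaded process, only one thing can be done at a time.
- If such a process blocks during execution, for example while waiting for input, the whole process is suspended, and even work that does not depend on the input cannot proceed.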
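What is a thread?
A thread is the smallest unit of execution that the operating system can schedule. It is contained in a process and is the unit that actually does the process's work. A thread is a single sequential flow of control within a process; one process can run multiple threads concurrently, each carrying out a different task.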
A thread is an execution context, which is all the information a CPU needs to execute a stream of instructions.
Intuition: time-sharing the CPU
Suppose you're reading a book, and you want to take a break right now, but you want to be able to come back and resume reading from the exact point where you stopped. One way to achieve that is by jotting down the page number, line number, and word number. So your execution context for reading a book is these 3 numbers.
If you have a roommate, and she's using the same technique, she can take the book while you're not using it, and resume reading from where she stopped. Then you can take it back, and resume it from where you were.
Threads work in the same way. A CPU is giving you the illusion that it's doing multiple computations at the same time. It does that by spending a bit of time on each computation. It can do that because it has an execution context for each computation. Just like you can share a book with your friend, many tasks can share a CPU.
On a more technical level, an execution context (therefore a thread) consists of the values of the CPU's registers.
Differences between threads and processes:
1. Threads share the address space of the process that created them; processes have their own address space.
2. Threads have direct access to the data segment of their process; processes have their own copy of the data segment of the parent process.
3. Threads can communicate directly with other threads of their process; processes must use interprocess communication (IPC) to communicate with sibling processes.
4. New threads are easily created; new processes require duplication of the parent process.
5. Threads can exercise considerable control over threads of the same process; processes can only exercise control over child processes.
6. Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process; changes to the parent process do not affect child processes.
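The first two points can be illustrated with a minimal sketch: several threads append to one list that lives in the process's address space, and the main thread sees every change. The helper name `run_workers` is made up for this illustration.

```python
import threading

def run_workers(n=4):
    shared = []  # one list object, visible to every thread of this process

    def worker(tag):
        # Each thread appends to the very same list object
        shared.append(tag)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The main thread sees everything the child threads wrote
    return sorted(shared)

print(run_workers(4))
```

No copying or message passing is needed here, which is exactly what separate processes cannot do with ordinary objects.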
A thread example:
For the join method of thread objects, see: https://docs.python.org/3/library/threading.html#threading.Thread.join

    import threading
    import time

    def run(name):
        print('this is my application', name)

    Apps = ('App1', 'App2', 'App3')
    start_time = time.time()
    t_objs = []
    for name in Apps:
        # Create a thread object: target is the function, args is a tuple of its arguments
        t = threading.Thread(target=run, args=(name,))
        # The main thread starts a child thread
        t.start()
        t_objs.append(t)
    for t in t_objs:
        # Block until this child thread has finished; only then does the main thread continue
        t.join()
    print('-------all threads are terminated!----')
    print('time cost:', time.time() - start_time)
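What is a daemon thread?
Once a child thread is marked as a daemon, it is terminated when the main thread (and every other non-daemon thread) has finished, even if the daemon's task has not completed.

    import threading

    def run():
        print('this is my application')

    # Create a thread
    t = threading.Thread(target=run)
    # Mark the thread as a daemon; this must be done before start()
    t.daemon = True
    # Start the daemon thread
    t.start()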
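GIL (Global Interpreter Lock)
The GIL is not a feature of the Python language itself but a concept in one of its interpreters, CPython. CPython is the default Python interpreter in most environments, and no matter how many threads are started or how many CPUs are available, it allows only one thread to execute Python bytecode at any given moment. Note: the same Python code can be run by different implementations such as CPython, PyPy, Jython, and IronPython. CPython and PyPy have a GIL; Jython and IronPython do not.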
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)
Recommended reading:
https://docs.python.org/3.5/glossary.html#term-global-interpreter-lock
http://www.dabeaz.com/python/UnderstandingGIL.pdf
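The GIL only serializes the execution of Python bytecode; a thread that is blocked in a wait such as `time.sleep` or a blocking I/O call releases it. A rough sketch of that effect, using `time.sleep` as a stand-in for real I/O (the function names and durations are arbitrary choices for this illustration):

```python
import threading
import time

def fake_io(seconds):
    # Sleeping releases the GIL, much like a blocking I/O call would
    time.sleep(seconds)

def timed_threads(num_tasks=4, seconds=0.2):
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(seconds,))
               for _ in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Elapsed wall-clock time for all tasks together
    return time.perf_counter() - start

elapsed = timed_threads(4, 0.2)
print('4 waits of 0.2s took %.2fs in total' % elapsed)
```

Because the waits overlap, the total should be close to one wait rather than the sum of four. Replacing the sleep with a pure-Python computation would not overlap in the same way under the GIL.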
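Thread Lock
Also called a mutual-exclusion lock (mutex); it allows only one thread at a time to modify the data. https://docs.python.org/3/library/threading.html#threading.Lock

    import threading

    # Create the mutex object
    lock = threading.Lock()
    # Acquire the lock
    lock.acquire()
    try:
        # Do the work
        pass
    finally:
        # Release the lock, even if the work raised an exception
        lock.release()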
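As a minimal sketch of why the lock matters, the following has several threads increment one shared counter. `counter += 1` is a read-modify-write sequence, so it is guarded with the lock; the function name and the thread and iteration counts are arbitrary choices for this illustration.

```python
import threading

def count_with_lock(num_threads=4, increments=10_000):
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # The with-statement acquires the lock and always releases it
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(count_with_lock())
```

With the lock in place the result is always `num_threads * increments`. Using `with lock:` is equivalent to the acquire/release pair shown earlier, with the release guaranteed.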
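RLock (reentrant lock)
A reentrant lock can be acquired again by the thread that already holds it, so one outer lock can contain one or more nested acquisitions; it must be released once for every acquire. https://docs.python.org/3/library/threading.html#rlock-objects

    import threading

    # Create the reentrant lock object
    lock = threading.RLock()
    # Acquire the lock
    lock.acquire()
    try:
        # Do the work
        pass
    finally:
        # Release the lock, even if the work raised an exception
        lock.release()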
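The difference from a plain `Lock` shows up when the same thread tries to acquire twice. A small sketch, where `try_nested` is a helper made up for this illustration and `blocking=False` is used so the plain lock reports failure instead of deadlocking:

```python
import threading

def try_nested(lock):
    """Acquire the lock, then try to acquire it again from the same thread."""
    with lock:
        # blocking=False returns immediately instead of waiting forever
        got_it = lock.acquire(blocking=False)
        if got_it:
            lock.release()  # one release per successful acquire
        return got_it

print('Lock :', try_nested(threading.Lock()))
print('RLock:', try_nested(threading.RLock()))
```

The plain lock refuses the second acquire, while the reentrant lock grants it. This is why `RLock` is the usual choice when a function that takes the lock calls another function that takes the same lock.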
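Semaphore
A semaphore allows a fixed number of threads to modify data at the same time. It is essentially still a lock.

    import threading

    # Create a bounded semaphore; at most 5 threads can hold it at once
    mySemaphore = threading.BoundedSemaphore(5)
    # Acquire (decrements the internal counter)
    mySemaphore.acquire()
    # Release (increments the internal counter)
    mySemaphore.release()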
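To see the limit in action, this sketch starts more threads than the semaphore admits and records the highest number that were inside the guarded section at once. The function name, the thread count, and the short sleep standing in for real work are assumptions of this illustration.

```python
import threading
import time

def run_limited(num_threads=6, limit=2):
    sem = threading.BoundedSemaphore(limit)
    state_lock = threading.Lock()  # protects the two counters below
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        with sem:  # at most `limit` threads get past this line at once
            with state_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.05)  # stand-in for real work
            with state_lock:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak

print('peak concurrency:', run_limited())
```

The reported peak never exceeds `limit`, however many threads are started.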
Timer
An action that should be run only after a certain amount of time has passed. https://docs.python.org/3/library/threading.html#timer-objects

    import threading

    def hello():
        print("hello, world")

    t = threading.Timer(10, hello)
    t.start()  # after 10 seconds, "hello, world" will be printed
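A timer can also be cancelled before it fires. The sketch below starts two timers and cancels one; `timer_demo` and the short delay are choices made for this illustration, and `Event` objects are used only to record whether each callback ran.

```python
import threading

def timer_demo(delay=0.1):
    fired = threading.Event()
    cancelled_fired = threading.Event()

    t1 = threading.Timer(delay, fired.set)
    t2 = threading.Timer(delay, cancelled_fired.set)
    t1.start()
    t2.start()
    t2.cancel()  # stops the timer as long as it has not fired yet
    # Timer is a Thread subclass, so join() waits for it to finish
    t1.join()
    t2.join()
    return fired.is_set(), cancelled_fired.is_set()

print(timer_demo())
```

The first timer runs its callback and the cancelled one does not.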
Events
An Event lets two or more threads coordinate: one thread signals an event and other threads wait for it. An event object manages an internal flag that can be set to true with the set() method and reset to false with the clear() method. The wait() method blocks until the flag is true.
https://docs.python.org/3/library/threading.html#event-objects

    import threading

    # Create an event object; its flag starts out False
    myEvent = threading.Event()
    # Set the flag to True
    myEvent.set()
    # Check whether the flag is True
    myEvent.is_set()
    # Block until the flag is True (returns immediately here, because it is already set)
    myEvent.wait()
    # Reset the flag to False
    myEvent.clear()
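The calls above become more meaningful with two threads. In this sketch a worker blocks in `wait()` until the main thread calls `set()`; the names `event_demo` and `waiter` are made up for the illustration.

```python
import threading

def event_demo():
    ready = threading.Event()
    log = []

    def waiter():
        ready.wait()  # blocks until ready.set() is called
        log.append('waiter saw the signal')

    t = threading.Thread(target=waiter)
    t.start()
    log.append('main is about to signal')
    ready.set()  # wakes up every thread blocked in wait()
    t.join()
    return log

print(event_demo())
```

The waiter cannot log anything before the signal, so the main thread's entry always comes first.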
queue
queue is especially useful in threaded programming when information must be exchanged safely between multiple threads. https://docs.python.org/3/library/queue.html#queue-objects
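Its purpose: loosely coupled, modular code and more efficient program execution.

    import queue

    # A first-in, first-out (FIFO) queue
    q = queue.Queue()
    # A last-in, first-out (LIFO) queue
    lifo = queue.LifoQueue()
    # A priority queue: the entry with the lowest priority value comes out first
    pq = queue.PriorityQueue()
    pq.put((3, 'kaye'))
    pq.put((-1, 'leo'))

    # Enqueue elements
    q.put(1)
    q.put('day1')
    # Same as q.put(item, block=False); raises queue.Full if the queue is full
    q.put_nowait('day2')
    # Dequeue an element
    q.get()
    # Same as q.get(block=False); raises queue.Empty if the queue is empty
    q.get_nowait()
    # Number of elements in the queue (approximate when other threads are active)
    q.qsize()
    # True if the queue is empty; this does not clear the queue
    q.empty()
    # Signal that a dequeued item has been processed, once per get()
    q.task_done()
    q.task_done()
    # Take the last item and mark it done as well
    q.get()
    q.task_done()
    # Block until every item put on the queue has been marked done
    q.join()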
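To compare the three queue types directly, this sketch puts the same `(priority, label)` tuples into each one and drains them in a single thread. `drain` and `ordering_demo` are helper names invented for the illustration.

```python
import queue

def drain(q):
    """Remove and return all items; safe here because only one thread uses q."""
    items = []
    while not q.empty():
        items.append(q.get_nowait())
    return items

def ordering_demo():
    results = {}
    kinds = [('fifo', queue.Queue),
             ('lifo', queue.LifoQueue),
             ('priority', queue.PriorityQueue)]
    for name, cls in kinds:
        q = cls()
        for item in [(2, 'b'), (3, 'c'), (1, 'a')]:
            q.put(item)
        # Keep only the labels, in the order the queue hands them back
        results[name] = [label for _, label in drain(q)]
    return results

print(ordering_demo())
```

The FIFO queue returns items in insertion order, the LIFO queue in reverse, and the priority queue sorted by the first tuple element.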
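Producer-consumer model
The producer-consumer pattern uses a container to remove the tight coupling between producers and consumers. The two sides do not talk to each other directly but communicate through a blocking queue: a producer hands finished data to the queue without waiting for a consumer to process it, and a consumer takes data from the queue rather than asking a producer for it. The blocking queue acts as a buffer that balances the processing capacity of the two sides.

    import threading
    import queue
    import time

    def producer():
        count = 1
        while True:
            q.put('bone[%s]' % count)
            print('produced bone[%s]' % count)
            count += 1
            time.sleep(1)

    def consumer(n):
        while True:
            print('[%s] got %s' % (n, q.get()))
            q.task_done()  # tell the queue this item has been processed

    q = queue.Queue(maxsize=10)
    p = threading.Thread(target=producer)
    p.start()
    c1 = threading.Thread(target=consumer, args=('李闯',))
    c2 = threading.Thread(target=consumer, args=('王森',))
    c1.start()
    c2.start()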
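The example above runs forever. One common way to let such a pipeline finish, sketched here under the assumption that `None` is never a real item, is for the producer to enqueue one stop marker per consumer. `SENTINEL` and `run_pipeline` are names introduced for this illustration.

```python
import queue
import threading

SENTINEL = None  # hypothetical stop marker; assumes None is never real data

def run_pipeline(num_items=5, num_consumers=2):
    q = queue.Queue(maxsize=2)  # small buffer so the producer sometimes blocks
    consumed = []
    consumed_lock = threading.Lock()

    def producer():
        for i in range(1, num_items + 1):
            q.put('bone[%s]' % i)  # blocks while the queue is full
        for _ in range(num_consumers):
            q.put(SENTINEL)  # one stop marker per consumer

    def consumer():
        while True:
            item = q.get()  # blocks while the queue is empty
            if item is SENTINEL:
                q.task_done()
                break
            with consumed_lock:
                consumed.append(item)
            q.task_done()

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(num_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(consumed)

print(run_pipeline())
```

Every item is consumed exactly once, and all threads exit on their own once the markers have been read.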
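What is multithreading?
Multiple threads running concurrently within one process. Multithreading is meant to reduce wasted, idle CPU time: while one thread is blocked, another can run, and switching between threads of the same process is cheaper than switching between processes.
Where multithreading fits: in CPython it suits I/O-bound tasks, because a thread waiting on I/O uses almost no CPU and releases the GIL so that other threads can run in the meantime. It does not suit CPU-bound tasks, because the GIL lets only one thread execute Python bytecode at a time.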
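What is multiprocessing?
Running multiple processes at the same time, each of which can be scheduled on its own CPU core. Each child process has its own interpreter and its own GIL, so using child processes sidesteps the GIL and makes fuller use of multi-core CPUs.
https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing

    import multiprocessing

    def run(name):
        print(name)

    if __name__ == '__main__':
        # Create a process object
        mp = multiprocessing.Process(target=run, args=('bob',))
        mp.start()
        mp.join()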
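How do processes communicate?
Processes do not share memory. To exchange data between two processes, create a multiprocessing.Queue in the parent and pass it to the child as an argument when the child is created. Each process then holds its own queue object, which is not the same block of memory as the other's; the objects put on the queue are pickled and carried between the processes over an underlying pipe.
https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes

    import multiprocessing

    def run(q):
        q.put([42, None, 'hello'])

    if __name__ == '__main__':
        q = multiprocessing.Queue()
        # Pass the queue created in the main process to the child as an argument
        # The Queue class is a near clone of queue.Queue.
        p = multiprocessing.Process(target=run, args=(q,))
        p.start()
        # prints "[42, None, 'hello']"
        print(q.get())
        p.join()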