Day 11: Multiprocess and Multithreaded Programming
I. Processes and Threads
1. What is a process?
An executing instance of a program is called a process.
Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.
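A program cannot run on its own. It runs only after it has been loaded into memory and the system has allocated resources to it, and a program in execution like this is called a process. The difference between a program and a process is that a program is a collection of instructions, the static text from which a process runs, whereas a process is one execution of a program and is therefore a dynamic concept.
In multiprogramming, several programs are loaded into memory at the same time and execute concurrently under the operating system's scheduling. This design greatly improves CPU utilization. Processes also give each user the impression of having the CPU to themselves. The process was introduced to make multiprogramming on the CPU possible; in short, processes exist to improve CPU utilization.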
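To make the program/process distinction concrete, here is a minimal sketch. It assumes a Python interpreter is reachable through sys.executable; the PROGRAM string and the variable names are made up for illustration. The same static program text is launched twice, and each launch becomes a separate process with its own PID.

```python
import os
import subprocess
import sys

# A tiny "program": static text that does nothing until it is loaded and run.
PROGRAM = "import os; print(os.getpid())"

# Launch the same program twice. Each launch is a separate process.
children = [
    subprocess.Popen([sys.executable, "-c", PROGRAM],
                     stdout=subprocess.PIPE, text=True)
    for _ in range(2)
]

# Each child prints its own process ID; collect both.
child_pids = [int(child.communicate()[0]) for child in children]

print("this process :", os.getpid())
print("child process:", child_pids[0])
print("child process:", child_pids[1])
```

The three PIDs differ even though both children ran identical program text: one program, several processes.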
2. If we already have processes, why do we need threads?
1. What is a thread?
A thread is an execution context, which is all the information a CPU needs to execute a stream of instructions.
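In other words, a thread is the smallest unit that the operating system can schedule for execution. It is contained in a process and is the unit that actually does the work in that process. A thread is a single sequential flow of control within a process; a process can run multiple threads concurrently, each carrying out a different task.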
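A minimal sketch of several threads living inside one process, using the standard threading module. The worker function, the labels, and the results dictionary are hypothetical names chosen for this example. Every thread reports the same PID, because all of them belong to the same process, while each has its own name and its own flow of control.

```python
import os
import threading

# Filled in by the worker threads: label -> (pid, thread name).
results = {}

def worker(label):
    """Record which process and which thread handled this task."""
    results[label] = (os.getpid(), threading.current_thread().name)

threads = [
    threading.Thread(target=worker, args=(f"task-{i}",), name=f"worker-{i}")
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

for label in sorted(results):
    pid, thread_name = results[label]
    print(label, "ran in process", pid, "on thread", thread_name)
```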
2. Why do we need threads?
- Threads share the address space of the process that created it; processes have their own address space.
- Threads have direct access to the data segment of its process; processes have their own copy of the data segment of the parent process.
- Threads can directly communicate with other threads of its process; processes must use interprocess communication to communicate with sibling processes.
- New threads are easily created; new processes require duplication of the parent process.
- Threads can exercise considerable control over threads of the same process; processes can only exercise control over child processes.
- Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process; changes to the parent process do not affect child processes.
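The first two points in the list can be sketched as follows, using the standard threading and multiprocessing modules. The names items, add_item, child_task, and demo are invented for this example. A thread appends to the parent's list directly, while a child process only changes its own copy and has to send its view back through a multiprocessing.Queue.

```python
import multiprocessing
import threading

# Module-level data: lives in the address space of whichever process loads this file.
items = []

def add_item(value):
    """Append to the module-level list of the current process."""
    items.append(value)

def child_task(queue):
    """Runs in a child process: change the child's copy, then report it."""
    add_item("from-process")
    queue.put(list(items))

def demo():
    items.clear()
    # A thread runs inside this process, so it changes the very same list.
    t = threading.Thread(target=add_item, args=("from-thread",))
    t.start()
    t.join()

    # A child process works on its own copy of the list.
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=child_task, args=(queue,))
    p.start()
    child_view = queue.get()  # read before join so the child can flush its data
    p.join()
    return list(items), child_view

if __name__ == "__main__":
    parent_view, child_view = demo()
    print("parent sees:", parent_view)
    print("child saw  :", child_view)
```

The parent's list contains only "from-thread"; the child's append never reaches it. What the child's copy starts out with depends on the multiprocessing start method (with fork it inherits the parent's current contents, with spawn it starts from a fresh import of the file), so the child's exact output is not shown here.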
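Judging from the English descriptions above, put plainly:
1. A process is just a collection of resources and does no work by itself; threads are what actually do the work. (A process is like our classroom and the threads are like the people in it: without people, everything in the room just sits there.)
2. Processes can also achieve concurrency, but they are independent of one another and do not share resources. If we ran several QQ processes side by side to simulate concurrency, each process would have to load its own copy of the data. At 500 MB per copy, three QQ instances running together would load 1.5 GB. The wasted memory is the lesser problem; we would also have to keep data synchronized between the processes. Multithreading was designed so that a program can do several things at once while sharing the same resources.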
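3. Differences between processes and threads:
1. Each process has its own independent memory; threads share the memory of the process they belong to.
2. A process is a collection of resources; a thread is the unit of execution (the smallest unit of work).
3. Processes cannot access one another directly; threads in the same process can communicate with one another directly.
4. An application has at least one process, and a process has at least one thread.
5. Creating a new process is expensive (memory has to be set up, a process ID assigned, and so on), whereas a thread is lightweight (it keeps only the data it needs to run, such as its context and stack).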
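Point 5 can be checked with a rough measurement. This is only a sketch: the helper names do_nothing and time_workers and the worker count of 20 are arbitrary choices for illustration, and the numbers depend heavily on the operating system and on the multiprocessing start method. It starts and joins the same number of empty threads and empty processes and reports the elapsed time for each.

```python
import multiprocessing
import threading
import time

def do_nothing():
    """Empty task, so the timing is dominated by creation and teardown."""

def time_workers(worker_class, count):
    """Start and join `count` workers of the given class; return elapsed seconds."""
    start = time.perf_counter()
    workers = [worker_class(target=do_nothing) for _ in range(count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    count = 20
    thread_seconds = time_workers(threading.Thread, count)
    process_seconds = time_workers(multiprocessing.Process, count)
    print(f"{count} threads  : {thread_seconds:.4f} s")
    print(f"{count} processes: {process_seconds:.4f} s")
```

On most systems the process figure is expected to be noticeably larger, but treat the output as an indication, not a benchmark.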
3. Python GIL
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)
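The gist of the above is that no matter how many threads you start or how many CPUs you have, Python calmly lets only one thread execute at any given moment. Ugh... so how does that even count as multithreading? Don't jump to conclusions just yet; hear me out.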
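The first thing to be clear about is that the GIL is not a feature of the Python language; it is a concept introduced by one implementation of the Python interpreter, CPython. By analogy, C++ is a language (syntax) standard that different compilers can compile into executable code, well-known ones being GCC, Intel C++, and Visual C++. Python is the same: one piece of code can be run by different Python implementations such as CPython, PyPy, and Jython. Jython, for example, has no GIL. But because CPython is the default Python implementation in most environments, many people equate CPython with Python and take it for granted that the GIL is a flaw of the Python language. So let us be clear up front: the GIL is not a feature of Python, and Python does not have to depend on the GIL.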
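To see the GIL's effect for yourself, here is a small experiment in the spirit of the classic countdown test. The function names and the loop size N are assumptions picked for this sketch. It runs the same CPU-bound loop twice in a row, then once in each of two threads, and times both.

```python
import threading
import time

N = 2_000_000  # loop size; an arbitrary choice, adjust for your machine

def count_down(n):
    """Pure-Python CPU-bound loop with no I/O."""
    while n > 0:
        n -= 1

def timed(fn):
    """Return how many seconds fn() takes."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def run_sequential():
    count_down(N)
    count_down(N)

def run_threaded():
    threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

sequential_seconds = timed(run_sequential)
threaded_seconds = timed(run_threaded)
print(f"sequential : {sequential_seconds:.3f} s")
print(f"two threads: {threaded_seconds:.3f} s")
```

On a standard CPython build with the GIL, the two-thread run is usually no faster than the sequential one, even on a multi-core machine, because only one thread executes bytecode at a time. On an interpreter without a GIL the result can look quite different, so no expected timings are given here.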
This article thoroughly dissects the GIL's effect on Python multithreading and is strongly recommended reading: http://www.dabeaz.com/python/UnderstandingGIL.pdf