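Inter-Process Communication (Pipes) and Multithreading

Ⅰ Inter-Process Communication (Pipes)

【一】Introduction

With a message queue, one process puts messages into the queue and another process takes them out.
Sending is asynchronous: the sending process does not wait for the receiver to respond before it continues.
The multiprocessing module supports two forms of message passing between processes: queues and pipes.
This part covers the second inter-process communication (IPC) method: pipes (not recommended for everyday use; it is enough to understand how they work).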

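【二】Overview

【1】The pipe API at a glance

# Pipe
from multiprocessing import Pipe

（1）Creating a pipe object

left_pipe, right_pipe = Pipe()  # duplex defaults to True: a two-way pipe

（2）Main methods

Receiving data

# Close the end this process does not use, then receive on the other end
left_pipe.close()
data = right_pipe.recv()

Sending data

left_pipe.close()
right_pipe.send('hello')  # send() requires the object to send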

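【2】The function that creates a pipe

Pipe([duplex])
- Creates a pipe between processes and returns a tuple (conn1, conn2), where conn1 and conn2 are the connection objects for the two ends of the pipe.
- Important: create the pipe before creating the Process objects that use it.

【3】Parameters

duplex
- The pipe is full-duplex (two-way) by default. If duplex is set to False, conn1 can only receive and conn2 can only send.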
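The effect of duplex=False can be checked inside a single process. The following is a minimal sketch; the names recv_end, send_end and wrong_direction_error are chosen here for illustration only.

```python
from multiprocessing import Pipe

# duplex=False gives a one-way pipe: the first connection can only receive,
# the second connection can only send.
recv_end, send_end = Pipe(duplex=False)

send_end.send({'id': 1, 'msg': 'hello'})
received = recv_end.recv()
print(received)

# Trying to send on the receive-only end is rejected with OSError.
try:
    recv_end.send('not allowed')
    wrong_direction_error = None
except OSError as exc:
    wrong_direction_error = exc
print(type(wrong_direction_error).__name__)

recv_end.close()
send_end.close()
```

With the default duplex=True, both connections can send and receive.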

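【4】Main methods

- conn1.recv()
  - Receives an object sent with conn2.send(obj).
  - If there is nothing to receive, recv blocks until a message arrives.
  - If the other end of the connection has been closed and nothing is left to receive, recv raises EOFError.
- conn1.send(obj)
  - Sends an object through the connection. obj can be any picklable object.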
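A short sketch of send, recv and the EOFError case, run inside one process so that it stays self-contained. In a real program the two ends would live in different processes; the variable names message and saw_eof are illustrative.

```python
from multiprocessing import Pipe

conn1, conn2 = Pipe()  # two-way pipe by default

# Any picklable object can be sent.
conn2.send(['any', 'picklable', 'object', 42])
message = conn1.recv()
print(message)

# Once the sending end is closed and nothing is left to read,
# recv() raises EOFError instead of blocking forever.
conn2.close()
try:
    conn1.recv()
    saw_eof = False
except EOFError:
    saw_eof = True
print(saw_eof)

conn1.close()
```

EOFError only appears once every handle to the other end is closed, which is why each process closes the end it does not use.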
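
【5】Secondary methods

- conn1.close()
  - Closes the connection. Called automatically when conn1 is garbage collected.
- conn1.fileno()
  - Returns the integer file descriptor (or handle) used by the connection.
- conn1.poll([timeout])
  - Returns True if data is available to read on the connection.
  - timeout is the maximum time to wait, in seconds.
  - If timeout is omitted, the method returns immediately.
  - If timeout is None, the call waits indefinitely for data to arrive.
- conn1.recv_bytes([maxlength])
  - Receives one complete bytes message sent with conn2.send_bytes().
  - maxlength is the maximum number of bytes to receive.
  - If the incoming message is longer than maxlength, OSError is raised (IOError before Python 3.3) and the connection can no longer be read.
  - If the other end has been closed and no data is left, EOFError is raised.
- conn1.send_bytes(buffer [, offset [, size]])
  - Sends bytes data from buffer, which can be any object supporting the buffer interface; offset is the byte offset into the buffer and size is the number of bytes to send.
  - The data goes out as a single message, which the other end receives with conn2.recv_bytes().
- conn1.recv_bytes_into(buffer [, offset])
  - Receives one complete bytes message and stores it in buffer, which must support the writable buffer interface (a bytearray or similar object).
  - offset is the byte position in the buffer where the message is placed.
  - The return value is the number of bytes received.
  - If the message is longer than the available buffer space, BufferTooShort is raised.

Pipe-based inter-process communication works much like the queue approach: a queue is implemented as a pipe plus locks.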
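The byte-oriented methods can be tried in a single process as well. This is a rough sketch; the payloads and variable names are made up for illustration.

```python
from multiprocessing import BufferTooShort, Pipe

conn1, conn2 = Pipe()

# poll() without a timeout returns immediately; nothing has been sent yet.
empty_before = conn1.poll()

# Send 5 bytes starting at byte offset 6 of the buffer, i.e. b'world'.
conn2.send_bytes(b'hello world', 6, 5)
ready = conn1.poll(1)          # wait up to 1 second for data
chunk = conn1.recv_bytes()
print(empty_before, ready, chunk)

# recv_bytes_into() writes into an existing writable buffer at the given offset
# and returns the number of bytes received.
conn2.send_bytes(b'abc')
buffer = bytearray(8)
count = conn1.recv_bytes_into(buffer, 2)
print(count, bytes(buffer))

# A message that does not fit into the buffer raises BufferTooShort.
conn2.send_bytes(b'0123456789')
try:
    conn1.recv_bytes_into(bytearray(4))
    too_short = False
except BufferTooShort:
    too_short = True
print(too_short)

fd_is_int = isinstance(conn1.fileno(), int)
conn1.close()
conn2.close()
```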

【三】Code example

Inter-process communication over a pipe

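from multiprocessing import Pipe, Process


def producer(pipe_conn, name):
    # 【1】Unpack the two connection objects: the left end and the right end
    left_connection, right_connection = pipe_conn
    # 【2】Send data
    # First close the end this process does not use (the right end)
    right_connection.close()
    # Then send the data through the left end
    for i in range(5):
        data = f'{name} produced {i}'
        print(data)
        left_connection.send(data)
    # Always close the connection once all data has been sent
    left_connection.close()


def consumer(pipe_conn, name):
    left_connection, right_connection = pipe_conn
    # Close the end this process does not use (the left end)
    left_connection.close()
    # Receive data through the right end
    while True:
        try:
            data = right_connection.recv()
        except EOFError:
            # Raised once every handle to the sending end has been closed
            break
        print(f'{name} consumed: {data}')
    right_connection.close()


def main():
    # 【一】Create the pipe before creating any Process objects
    pipe = Pipe()
    left_connection, right_connection = pipe
    # 【二】Create the producer and the consumer
    producer_one = Process(target=producer, args=(pipe, 'producer1'))
    producer_one.start()

    consumer_one = Process(target=consumer, args=(pipe, 'consumer1'))
    consumer_one.start()

    # The parent closes its own copies of both ends;
    # otherwise the consumer would never see EOFError
    left_connection.close()
    right_connection.close()
    producer_one.join()
    consumer_one.join()

# A pipe is created as a single pipe object
# that holds two connection objects, the left end and the right end
# To send data, close one end and send through the other
# To receive data, likewise close one end and receive from the other

if __name__ == '__main__':
    main()

# Sample output (producer and consumer lines may interleave differently between runs):
# producer1 produced 0
# producer1 produced 1
# producer1 produced 2
# producer1 produced 3
# producer1 produced 4
# consumer1 consumed: producer1 produced 0
# consumer1 consumed: producer1 produced 1
# consumer1 consumed: producer1 produced 2
# consumer1 consumed: producer1 produced 3
# consumer1 consumed: producer1 produced 4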

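Ⅱ Multithreading

【一】What is a thread

In a traditional operating system, every process has its own address space and, by default, one thread of control.
A thread can be pictured as an assembly line at work.
Operating system --> a running program is called a process ---> a unit of execution started inside a process is called a thread
- An assembly line must belong to a workshop; the work of one workshop is one process.
- The workshop gathers resources together, so it is a unit of resources, and every workshop has at least one assembly line.
- An assembly line needs power to run; the power supply corresponds to the CPU.
So a process only groups resources together (it is a unit, or collection, of resources), while the thread is the unit of execution on the CPU.
Multithreading (multiple threads of control) means that one process contains several threads of control that all share the process's address space, like a workshop with several assembly lines that all use the workshop's resources.
In short, multithreading is how several tasks are handled on the CPU within a single process.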

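【1】An analogy

- A process is the unit of resources, the workshop ---> it holds the storage and other resources
- A thread is the unit of execution, the assembly line --> it does the actual processing of the data
- Think of the operating system as a large factory
  - A process is a workshop in the factory
  - A thread is an assembly line in the workshop

Both process and thread are abstract concepts.

【2】Summary

Every process comes with at least one thread.
Process: the unit of resources
- Starting a process merely sets aside an independent region of memory.
Thread: the unit of execution
- What the CPU actually executes is a thread inside the process.
- A thread is the execution of code; whatever resources the code needs, it gets from the process it belongs to.
Processes and threads are both abstractions that simply make it easier to describe what is going on.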

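【二】The cost of creating a thread

【1】Creating a process costs far more than creating a thread

Suppose our software is a factory.
The factory has several assembly lines.
An assembly line needs power to run.
There is only one power supply, the CPU (a single-core CPU).
- A workshop is a process
  - A workshop has at least one assembly line (a process has at least one thread)
- Creating a process
  - means building a workshop (allocating space and building at least one assembly line inside it)
- Creating a thread
  - only means building one more assembly line inside an existing workshop
  - No new space has to be allocated, so creation is cheap

【2】Processes compete with each other; threads cooperate

Workshops compete with each other for power.
- Different processes are in competition with each other.
- Programs written by different programmers compete at run time: Xunlei (a download manager) grabs network bandwidth from other processes.
- 360 (an antivirus product) kills other processes as if they were viruses.
The assembly lines within one workshop work together.
- Threads in the same process cooperate, because they are started inside a program written by the same programmer.
- The threads inside Xunlei cooperate; they do not attack one another.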

【三】Differences between threads and processes

- Threads share the address space of the process that created them; processes have their own address space.
- Threads have direct access to the data segment of their process; processes have their own copy of the data segment of the parent process.
- Threads can communicate directly with other threads of their process; processes must use inter-process communication to communicate with sibling processes.
- New threads are easily created; new processes require duplication of the parent process.
- Threads can exercise considerable control over threads of the same process; processes can only exercise control over child processes.
- Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process; changes to the parent process do not affect child processes.

When multiple processes are started, each process's data is isolated from the others.
- Each process has its own copy of a value such as 1.
For multiple threads, all threads share the data of one process.
- There is only one copy of that value.

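【四】Why use multiple threads

【1】Starting a process

Allocating memory -- costs resources
Copying the code -- costs resources

【2】Starting a thread

One process can start several threads.
Starting more threads inside a process requires neither allocating a new memory space nor copying the code.

【3】Advantages of threads

Lower resource consumption.
Threads in the same process share its resources.

【4】What is multithreading

Multithreading means
- starting several threads inside one process.
- Put simply: if several tasks need to share one address space, they have to run as several threads inside one process.
Multiple threads share the address space of one process.
- Threads are lighter than processes and are easier to create and destroy. On many operating systems, creating a thread is 10 to 100 times faster than creating a process, which is useful when the number of threads changes dynamically and rapidly.
If all the threads are CPU-bound, multithreading brings no performance gain.
- But when there is substantial computation and also substantial I/O, having several threads lets these activities overlap, which speeds up the program.
On a multi-CPU system, starting several threads makes full use of the cores at a much lower cost than starting processes. (This does not apply to Python: in standard CPython, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time.)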
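The overlap of I/O waits can be illustrated with a small timing sketch. Here time.sleep stands in for a blocking I/O call, and the function names and the 0.2 second delay are assumptions made for this illustration.

```python
import threading
import time


def fake_io_task(delay):
    # time.sleep stands in for a blocking I/O call such as a network request
    time.sleep(delay)


def run_sequential(n, delay):
    start = time.perf_counter()
    for _ in range(n):
        fake_io_task(delay)
    return time.perf_counter() - start


def run_threaded(n, delay):
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io_task, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


sequential = run_sequential(5, 0.2)   # the waits add up: roughly 5 * 0.2 s
threaded = run_threaded(5, 0.2)       # the waits overlap: roughly 0.2 s
print(f'sequential: {sequential:.2f}s, threaded: {threaded:.2f}s')
```

A thread that is blocked waiting does not hold the GIL, so the waits overlap even in CPython; purely CPU-bound work would not show this speed-up.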

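【5】Exercise

（1）Requirement: develop a word processor

Get user input
Show the text on screen in real time
Automatically save the data to disk

# Develop a word processor --- process or thread?             process
# Get user input --- process or thread?                       thread
# Show the text on screen in real time --- process or thread? thread
# Automatically save the data to disk --- process or thread?  thread

（2）Are processes or threads the better fit for these features?

Start one word processor process.
That process has to do more than one thing: listen for keyboard input, process the text, and periodically save the text to disk.
All three tasks operate on the same data, so separate processes are a poor fit.
The natural design is three concurrent threads inside one process.
With a single thread, the program could not process text or auto-save while the user is typing, and could not accept input or process text while auto-saving.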
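The three-thread design can be sketched roughly as follows. Everything here is illustrative: keystrokes are simulated from a string, "saving to disk" is replaced by appending to a list, and the Lock and Event used for coordination are additions that the exercise itself does not mention.

```python
import threading
import time


def run_word_processor(keystrokes):
    """Run three cooperating threads over one shared document (illustrative sketch)."""
    document = []                      # shared data: every thread works on this list
    lock = threading.Lock()            # protects the shared list
    typing_done = threading.Event()    # set once all keystrokes have been "typed"
    frames = []                        # what the display thread rendered
    saves = []                         # stands in for writes to disk

    def read_input():
        for ch in keystrokes:
            with lock:
                document.append(ch)
            time.sleep(0.01)           # pretend to wait for the next key press
        typing_done.set()

    def display():
        while not typing_done.is_set():
            with lock:
                frames.append(''.join(document))
            time.sleep(0.005)

    def autosave():
        while not typing_done.is_set():
            with lock:
                saves.append(''.join(document))
            time.sleep(0.02)
        with lock:
            saves.append(''.join(document))   # one final save after typing stops

    threads = [threading.Thread(target=f) for f in (read_input, display, autosave)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ''.join(document), frames, saves


final_text, frames, saves = run_word_processor('hello')
print(final_text)
print(saves[-1])
```

Because all three threads live in one process, they read and write the same document list directly, with no inter-process communication.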

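【五】Two ways to start a thread

【1】The threading module

The multiprocessing module deliberately mirrors the interface of the threading module.
The two are very similar to use, so the details are not repeated here.

【2】Two ways to start a thread

Starting a thread does not require the code to sit under if __name__ == '__main__'; it can be written directly at module level.
Even so, it is customary to put the start-up code under the main guard.

（1）Method one: instantiate Thread directly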

from multiprocessing import Process
from threading import Thread
import time


def task(name):
    print(f'Current task:>>>{name} is running')
    time.sleep(3)
    print(f'Current task:>>>{name} has finished')


def Thread_main():
    t = Thread(target=task, args=("silence",))
    # Creating a thread is so cheap that it is running almost as soon as start() is called
    t.start()
    '''
    Current task:>>>silence is running
    this is main process!
    Current task:>>>silence has finished
    '''


def Process_main():
    p = Process(target=task, args=("silence",))
    p.start()
    '''
    this is main process!
    Current task:>>>silence is running
    Current task:>>>silence has finished
    '''


if __name__ == '__main__':
    # Thread_main()
    Process_main()
    print('this is main process!')

（2）Method two: subclass Thread

from threading import Thread
import time


class MyThread(Thread):

    def __init__(self, name):
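        # We are overriding the parent's __init__ without knowing everything it sets up, so call it first
        super().__init__()
        self.name = name

    # Override run(): this is the code the thread executes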
    def run(self):
        print(f'{self.name} is running')
        time.sleep(3)
        print(f'{self.name} is ending')


def main():
    t = MyThread('silence')
    t.start()
    print(f'this is a main process')

    """
    silence is running
    this is a main process
    silence is ending

    """

if __name__ == '__main__':
    main()

【六】Threads in the same process share data

from threading import Thread
from multiprocessing import Process

number = 999


def work(name):
    global number
    print(f'{name} change before {number}')
    number += 1
    print(f'{name} change after {number}')


def main_process():
    task_list = []
    for i in range(5):
        task = Process(
            target=work,
            args=(f'process_{i}',)
        )
        task.start()
        task_list.append(task)
    for task in task_list:
        task.join()
    print(number)
'''
process_2 change before 999
process_2 change after 1000
process_0 change before 999
process_0 change after 1000
process_4 change before 999
process_4 change after 1000
process_3 change before 999
process_3 change after 1000
process_1 change before 999
process_1 change after 1000
999
'''


def main_thread():
    task_list = []
    for i in range(5):
        task = Thread(
            target=work,
            args=(f'thread_{i}',)
        )
        task.start()
        task_list.append(task)
    for task in task_list:
        task.join()
    print(number)
'''
thread_0 change before 999
thread_0 change after 1000
thread_1 change before 1000
thread_1 change after 1001
thread_2 change before 1001
thread_2 change after 1002
thread_3 change before 1002
thread_3 change after 1003
thread_4 change before 1003
thread_4 change after 1004
1004
'''

if __name__ == '__main__':
    # main_process()
    main_thread()

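【七】Thread object attributes and other methods

【1】Threads in the same process share one process ID

os.getpid()

【2】Getting the name of the current thread

current_thread().name

【3】Counting the currently active threads

active_count()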
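The three calls above can be combined into one small sketch. The worker names and the Event that holds the workers in place while they are counted are assumptions made for this illustration.

```python
import os
import threading
from threading import Thread, active_count, current_thread

gate = threading.Event()   # keeps the workers alive until we have counted them
reports = []


def report():
    gate.wait()
    # Record this thread's name and the process ID it runs in
    reports.append((current_thread().name, os.getpid()))


baseline = active_count()              # threads alive before we start any
workers = [Thread(target=report, name=f'worker-{i}') for i in range(3)]
for t in workers:
    t.start()
count_while_running = active_count()   # baseline + 3: the workers are waiting at the gate
gate.set()
for t in workers:
    t.join()

print(current_thread().name)           # name of the thread running this line
print(sorted(name for name, _ in reports))
print({pid for _, pid in reports} == {os.getpid()})
```

Every worker reports the same process ID as the thread that started it, while each keeps its own name.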

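【4】Daemon threads

t.daemon = True  # set before calling t.start()

One behavior of daemon threads deserves special attention, as the following example shows: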

from threading import Thread
import time


def foo():
    print(f' this is foo begin')
    time.sleep(1)
    print(f' this is foo end')


def func():
    print(f' this is func begin')
    time.sleep(3)
    print(f' this is func end')


def main():
    t1 = Thread(target=foo)
    t2 = Thread(target=func)
    t1.daemon = True
    t1.start()
    t2.start()

    print(f' this is main')


if __name__ == '__main__':
    main()
    
    #  this is foo begin
    #  this is func begin
    #  this is main
    #  this is foo end
    #  this is func end

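Analysis
- t1 is a daemon thread, so it is terminated when the main thread, and with it the process, finally exits.
- The main thread runs and starts the two child threads.
- The main thread then continues and prints its own line.
- When the main thread's code finishes, it still waits for the non-daemon child threads, so it has to wait for t2, that is, for func to finish.
- t1 only sleeps for 1 second while t2 keeps the process alive for 3 seconds, which is why foo still gets to print its last line.
- So the order is: thread 1 begins --- thread 2 begins --- main thread --- thread 1 ends --- thread 2 ends.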
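For contrast, here is a sketch of the case where no non-daemon thread keeps the process alive, so the daemon thread is cut off before it finishes. The demo runs a small script in a child interpreter through subprocess so that its output can be captured; the script text and the name run_daemon_demo are made up for this illustration.

```python
import subprocess
import sys
import textwrap

# A tiny script whose only extra thread is a daemon thread.
SCRIPT = textwrap.dedent('''
    import threading, time

    def slow():
        print("daemon begin", flush=True)
        time.sleep(1)
        print("daemon end", flush=True)

    t = threading.Thread(target=slow, daemon=True)
    t.start()
    time.sleep(0.2)  # give the daemon thread time to print its first line
    print("main done", flush=True)
''')


def run_daemon_demo():
    # Run the script in a separate interpreter and collect what it printed
    result = subprocess.run(
        [sys.executable, '-c', SCRIPT],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout.splitlines()


lines = run_daemon_demo()
print(lines)
```

The line "daemon end" never appears: once the main thread finishes and no non-daemon thread remains, the interpreter exits without waiting for the daemon thread.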

【八】Timing comparison: multithreading vs. multiprocessing

A web scraper that downloads images is used for the comparison.

import time

# 【一】Third-party modules needed
# 【1】Send HTTP requests the way a browser would
import requests  # pip install requests
# 【2】Parse the page data
from lxml import etree  # pip install lxml
# 【3】Generate browser-like User-Agent strings
from fake_useragent import UserAgent  # pip install fake-useragent

from multiprocessing import Process
from threading import Thread


# 【二】Request the pages and parse the data
class SpiderImg(object):
    def __init__(self):
        self.base_area = 'https://pic.netbian.com'
        self.base_url = 'https://pic.netbian.com/4kdongman/'
        self.headers = {
            'User-Agent': UserAgent().random
        }

    def spider_tag_url(self):
        img_data_dict = {}
        response = requests.get(self.base_url, headers=self.headers)
        # response.encoding = 'utf-8'
        response.encoding = 'gbk'
        page_text = response.text
        tree = etree.HTML(page_text)
        li_list = tree.xpath('//*[@id="main"]/div[4]/ul/li')
        for li in li_list:
            # //*[@id="main"]/div[4]/ul/li[1]/a
            # ./a
            detail_href = self.base_area + li.xpath('./a/@href')[0]
            response = requests.get(detail_href, headers=self.headers)
            response.encoding = 'gbk'
            page_text = response.text
            tree = etree.HTML(page_text)
            img_url = self.base_area + tree.xpath('//*[@id="img"]/img/@src')[0]
            # https://pic.netbian.com/uploads/allimg/240521/232729-17163052491e1c.jpg
            img_title = img_url.split('/')[-1]
            # 232729-17163052491e1c.jpg
            img_data_dict[img_title] = img_url
        return img_data_dict

    def download_img(self, img_url, img_title):
        # Fetch the image as binary data
        response = requests.get(img_url, headers=self.headers)
        img_data = response.content
        with open(img_title, 'wb') as fp:
            fp.write(img_data)
        print(f'Downloaded {img_title} successfully!')

    def main_process(self):
        start_time = time.time()
        img_data_dict = self.spider_tag_url()
        end_time = time.time()
        print(f'Collected {len(img_data_dict)} image links, time taken :>>>> {end_time - start_time}s')
        task_list = []
        for img_title, img_url in img_data_dict.items():
            task = Process(
                target=self.download_img,
                kwargs={'img_url': img_url, 'img_title': img_title}
            )
            task.start()
            task_list.append(task)
        for task in task_list:
            task.join()

    def main_thread(self):
        start_time = time.time()
        img_data_dict = self.spider_tag_url()
        end_time = time.time()
        print(f'Collected {len(img_data_dict)} image links, time taken :>>>> {end_time - start_time}s')
        task_list = []
        for img_title, img_url in img_data_dict.items():
            task = Thread(
                target=self.download_img,
                kwargs={'img_url': img_url, 'img_title': img_title}
            )
            task.start()
            task_list.append(task)
        for task in task_list:
            task.join()


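if __name__ == '__main__':
    spider = SpiderImg()
    start_time = time.time()
    # spider.main_process()  # Total time to download all images :>>>> 7.990673542022705s
    spider.main_thread()  # Total time to download all images :>>>> 5.58322811126709s
    end_time = time.time()
    print(f'Total time to download all images :>>>> {end_time - start_time}s')

    # For one page of 20 images, the multithreaded run was about 2.4s faster than the multiprocess run;
    # for this kind of I/O-bound download the gap should widen as the amount of data grows.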

posted on 2024-05-22 17:21 silence^
