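I/O System (to be expanded)

1. Overview
2. I/O Hardware
- Device types
- Three common device interface types
- Device access characteristics
- PCIe: Peripheral Component Interconnect Express (fast)
- I/O structure
- Ports
- Bus
- Controllers
3. Interaction between the CPU and peripheral devices
Connecting the CPU to devices
- Device controllers
I/O instructions and memory-mapped I/O
Life cycle of an I/O request
I/O data transfer
- DMA
- How I/O devices notify the operating system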

1. Overview

Because I/O devices vary so widely in their function and speed (consider a mouse, a hard disk, a flash drive, and a tape robot), varied methods are needed to control them.

These methods form the I/O subsystem of the kernel, which separates the rest of the kernel from the complexities of managing I/O devices.

I/O-device technology exhibits two conflicting trends.

On the one hand, we see increasing standardization of software and hardware interfaces. This trend helps us to incorporate improved device generations into existing computers and operating systems.

On the other hand, we see an increasingly broad variety of I/O devices. Some new devices are so unlike previous devices that it is a challenge to incorporate them into our computers and operating systems.

This challenge is met by a combination of hardware and software techniques.
The basic I/O hardware elements, such as ports, buses, and device controllers, accommodate a wide variety of I/O devices.

To encapsulate the details and oddities of different devices, the kernel of an operating system is structured to use device-driver modules.

The device drivers present a uniform device-access interface to the I/O subsystem, much as system calls provide a standard interface between the application and the operating system.
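A minimal C sketch of what a "uniform device-access interface" can look like: every driver fills in the same table of function pointers, and the I/O subsystem only ever calls through that table. This mirrors the idea behind the Linux structures that appear later in these notes (pci_driver, net_device_ops), but dev_ops, ramdev, subsys_read(), and subsys_write() are hypothetical names for illustration, not a real kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical uniform interface that every driver fills in. */
struct dev_ops {
    const char *name;
    ssize_t (*read)(void *state, char *buf, size_t len);
    ssize_t (*write)(void *state, const char *buf, size_t len);
};

/* Driver 1: a "null" device that discards writes and always reads as empty. */
static ssize_t null_read(void *state, char *buf, size_t len)
{
    (void)state; (void)buf; (void)len;
    return 0;                                  /* end of file */
}

static ssize_t null_write(void *state, const char *buf, size_t len)
{
    (void)state; (void)buf;
    return (ssize_t)len;                       /* pretend everything was written */
}

/* Driver 2: a tiny RAM-backed device that keeps the last write. */
struct ramdev { char data[64]; size_t used; };

static ssize_t ram_read(void *state, char *buf, size_t len)
{
    struct ramdev *r = state;
    size_t n = len < r->used ? len : r->used;
    memcpy(buf, r->data, n);
    return (ssize_t)n;
}

static ssize_t ram_write(void *state, const char *buf, size_t len)
{
    struct ramdev *r = state;
    size_t n = len < sizeof r->data ? len : sizeof r->data;
    memcpy(r->data, buf, n);
    r->used = n;
    return (ssize_t)n;
}

static const struct dev_ops null_ops = { "null", null_read, null_write };
static const struct dev_ops ram_ops  = { "ram",  ram_read,  ram_write  };

/* The I/O subsystem calls drivers only through their dev_ops table. */
static ssize_t subsys_read(const struct dev_ops *ops, void *state, char *buf, size_t len)
{
    assert(ops != NULL && ops->read != NULL);
    return ops->read(state, buf, len);
}

static ssize_t subsys_write(const struct dev_ops *ops, void *state, const char *buf, size_t len)
{
    assert(ops != NULL && ops->write != NULL);
    return ops->write(state, buf, len);
}
```

Adding a new kind of device means writing one more dev_ops table; the code that calls subsys_read() and subsys_write() does not change.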

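2. I/O Hardware

Device types:

Storage devices (disks, tapes)
Transmission devices (network connections, Bluetooth)
Human-interface devices (screen, keyboard, mouse, audio input and output)

Three common device interface types:

Character devices: keyboard, mouse, serial ports, etc.

Block devices: disk drives, tape drives, optical drives, etc.

Network devices: Ethernet, wireless, Bluetooth, etc. (The questions to study are how these devices are connected and how software controls the hardware.)

Device access characteristics:

Character devices: accessed sequentially, one byte at a time.

I/O commands: (1) get(), put(), etc.; (2) usually accessed through the file interface and its semantics.

Block devices: accessed in uniform, fixed-size blocks.

I/O commands: (1) raw I/O or the file-system interface; (2) memory-mapped file access, where part of the disk is mapped into memory and the disk is accessed through the memory-mapped file.

Network devices: exchange formatted packets.

I/O commands: (1) send/receive network packets; (2) multiple network protocols are supported through the network interface.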
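To make the difference between byte-at-a-time and block access concrete, here is a small simulation, assuming a hypothetical 16-byte block size (real disks use 512-byte or 4 KiB sectors). The block device only transfers whole blocks, so reading an arbitrary byte range must read every block the range touches; the character device is consumed one byte at a time with a get()-style call. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16   /* hypothetical; real disks use 512 B or 4 KiB sectors */
#define NUM_BLOCKS 4

/* Block device: data can only be transferred in whole blocks. */
struct blkdev { unsigned char blocks[NUM_BLOCKS][BLOCK_SIZE]; int reads; };

static int blk_read(struct blkdev *d, size_t lba, unsigned char *out)
{
    if (lba >= NUM_BLOCKS) return -1;
    memcpy(out, d->blocks[lba], BLOCK_SIZE);
    d->reads++;                                /* count whole-block transfers */
    return 0;
}

/* Reading an arbitrary byte range means reading every block it touches. */
static int read_bytes(struct blkdev *d, size_t off, size_t len, unsigned char *out)
{
    unsigned char tmp[BLOCK_SIZE];
    size_t done = 0;
    assert(d != NULL && out != NULL);
    while (done < len) {
        size_t lba = (off + done) / BLOCK_SIZE;
        size_t in_blk = (off + done) % BLOCK_SIZE;
        size_t n = BLOCK_SIZE - in_blk;        /* bytes usable from this block */
        if (n > len - done) n = len - done;
        if (blk_read(d, lba, tmp) != 0) return -1;
        memcpy(out + done, tmp + in_blk, n);
        done += n;
    }
    return 0;
}

/* Character device: a byte stream consumed sequentially with get(). */
struct chardev { const char *stream; size_t pos; };

static int char_get(struct chardev *c)
{
    if (c->stream[c->pos] == '\0') return -1;  /* no more input */
    return (unsigned char)c->stream[c->pos++];
}
```

For example, reading 4 bytes starting at offset 14 crosses the boundary between blocks 0 and 1 and therefore costs two whole-block reads.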

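PCIe: Peripheral Component Interconnect Express (fast)

PCI: Peripheral Component Interconnect

I/O structure:

The northbridge connects high-speed devices; the southbridge connects I/O devices.

The CPU is not connected to peripheral components directly but through a chipset.

The key part of this chipset is the northbridge: the CPU connects to the northbridge, and the northbridge connects to the most important peripherals. As the figure shows, memory and the graphics card need high throughput, so they sit close to the CPU.

The southbridge connects lower-speed devices: SATA disk interfaces, USB ports, and slower PCI devices such as sound cards and network cards.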

[Figure: how the CPU, northbridge, and southbridge connect memory, graphics, and I/O devices]

Today the northbridge functions are integrated into the CPU, and some CPUs even include an integrated GPU; the goal is to shorten the path that signals have to travel.

Ports:

A device communicates with a computer system by sending signals over a cable or even through the air.

The device communicates with the machine via a connection point, or port, for example a serial port.

Bus:

A common set of wires shared by devices.

If devices share a common set of wires, the connection is called a bus.

A bus is a set of wires and a rigidly defined protocol that specifies a set of messages that can be sent on the wires.

In terms of the electronics, the messages are conveyed by patterns of electrical voltages applied to the wires with defined timings.

Buses are used widely in computer architecture and vary in their signaling methods, speed, throughput, and connection methods.

A typical PC bus structure appears in Figure 12.1.

A PCIe bus (the common PC system bus) connects the processor–memory subsystem to fast devices.
An expansion bus connects relatively slow devices, such as the keyboard and serial and USB ports.

In the lower-left portion of the figure, four disks are connected together on a serial-attached SCSI (SAS) bus plugged into an SAS controller.

Put more intuitively: the SAS (SCSI) bus belongs to the disk controller, which contains it.

Controllers:

The CPU only issues commands; what actually drives the keyboard, the printer, and other peripherals is the controller.

An adapter is usually a larger board and a controller a smaller chip, but an adapter is really just another kind of controller.

A controller is a collection of electronics that can operate a port, a bus, or a device. A serial-port controller is a simple device controller.

It is a single chip (or portion of a chip) in the computer that controls the signals on the wires of a serial port.

By contrast, a fibre channel (FC) bus controller is not simple.
Because the FC protocol is complex and used in data centers rather than on PCs, the FC bus controller is often implemented as a separate circuit board (a host bus adapter, or HBA) that connects to a bus in the computer.

It typically contains a processor, microcode, and some private memory to enable it to process the FC protocol messages.

Some devices have their own built-in controllers. If you look at a disk drive, you will see a circuit board attached to one side. This board is the disk controller.

It implements the disk side of the protocol for some kinds of connection (SAS and SATA, for instance). It has microcode and a processor to do many tasks, such as bad-sector mapping, prefetching, buffering, and caching.

How does the processor give commands and data to a controller to accomplish an I/O transfer? The short answer is that the controller has one or more registers for data and control signals. The processor communicates with the controller by reading and writing bit patterns in these registers.

One way in which this communication can occur is through the use of special I/O instructions that specify the transfer of a byte or a word to an I/O port address. The I/O instruction triggers bus lines to select the proper device and to move bits into or out of a device register.

Alternatively, the device can support memory-mapped I/O. In this case, the device-control registers are mapped into the address space of the processor. The CPU executes I/O requests using the standard data transfer instructions to read and write the device-control registers at their mapped locations in physical memory.

In other words, the device-control registers then behave like an ordinary block of memory.
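A hedged sketch of memory-mapped I/O from the driver's side: the controller's registers are described by a struct of volatile fields, and plain loads and stores through that struct are the device accesses. The register layout, bit meanings, and acknowledge protocol are hypothetical, and ordinary memory plus fake_device_step() stand in for real hardware (in a Linux driver the mapping would come from something like ioremap()). With port-mapped I/O the same accesses would instead use special instructions, such as x86 in/out, which glibc exposes as inb()/outb() in <sys/io.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout of a simple controller. */
struct dev_regs {
    volatile uint32_t status;    /* bit 0: BUSY, bit 1: DATA_READY (set by the device) */
    volatile uint32_t control;   /* bit 0: START (set by the driver) */
    volatile uint32_t data;      /* data register */
};

#define STATUS_BUSY       (1u << 0)
#define STATUS_DATA_READY (1u << 1)
#define CTRL_START        (1u << 0)

/* Stand-in for the mapped register window: in a kernel driver this pointer
 * would come from mapping the device's registers (e.g. ioremap()). */
static struct dev_regs fake_window;
static struct dev_regs *const regs = &fake_window;

/* Simulated device: accepts a command, then finishes one step later. */
static void fake_device_step(void)
{
    if (regs->control & CTRL_START) {
        regs->control &= ~CTRL_START;
        regs->status |= STATUS_BUSY;
    } else if (regs->status & STATUS_BUSY) {
        regs->data = 0x42;                     /* value "read" from the device */
        regs->status = (regs->status & ~STATUS_BUSY) | STATUS_DATA_READY;
    }
}

/* Driver side: ordinary loads and stores to the mapped registers. */
static uint32_t mmio_read_word(void)
{
    assert(!(regs->status & STATUS_BUSY));     /* device must be idle */
    regs->control = CTRL_START;                /* store: issue the command */
    while (!(regs->status & STATUS_DATA_READY))
        fake_device_step();                    /* real hardware runs on its own */
    uint32_t value = regs->data;               /* load: fetch the result */
    regs->status &= ~STATUS_DATA_READY;        /* hypothetical acknowledge */
    return value;
}
```

The volatile qualifier matters with real hardware: it stops the compiler from caching regs->status in a CPU register, which could otherwise turn the wait loop into a spin on a stale value.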

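3. Interaction between the CPU and peripheral devices

Blocking I/O

The user issues an I/O request, which is passed to the device driver in the kernel. The driver translates it into hardware control operations and makes the hardware carry them out. When the hardware finishes, it raises an interrupt, which the interrupt-handling routine in the kernel responds to. The result is finally passed to the device driver, and control returns to user mode.

Non-blocking I/O

Asynchronous I/O

The data to be written is handed to the device driver through a system call; the driver starts the hardware operation and returns immediately, without waiting for the result.

When the device operation completes, the device raises an interrupt, which the interrupt-handling routine in the kernel responds to. The result is then passed to the device driver and delivered back to user mode.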
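The non-blocking case is left empty above. As a hedged illustration using standard POSIX calls (pipe(), fcntl() with O_NONBLOCK, read()), this shows how a non-blocking read differs from a blocking one: on an empty pipe it returns -1 with errno set to EAGAIN (or EWOULDBLOCK) instead of putting the caller to sleep. set_nonblocking() and demo_nonblocking_pipe() are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch an open file descriptor to non-blocking mode. */
static int set_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns 1 if a read on an empty non-blocking pipe fails with EAGAIN
 * instead of blocking, and a later read succeeds once data is written. */
static int demo_nonblocking_pipe(void)
{
    int fds[2];
    char buf[8];
    if (pipe(fds) == -1) return -1;
    if (set_nonblocking(fds[0]) == -1) return -1;

    /* No data yet: a blocking read would sleep here; this one returns at once. */
    ssize_t n = read(fds[0], buf, sizeof buf);
    int would_block = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));

    /* Once data is available, the same call succeeds. */
    if (write(fds[1], "hi", 2) != 2) return -1;
    n = read(fds[0], buf, sizeof buf);

    close(fds[0]);
    close(fds[1]);
    return would_block && n == 2;
}
```

This is non-blocking I/O, not asynchronous I/O: the caller has to try again later by itself. With asynchronous I/O (for example POSIX aio_read()), the call returns at once and the kernel completes the transfer and notifies the caller afterwards, as described above.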

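Connecting the CPU to devices

Registers: data, status, and control information is exchanged through them.

Registers can be mapped into memory, so that accessing that memory region amounts to accessing the I/O device.

Device controllers:

The interface between the CPU and the I/O device.
Provides special instructions and registers to the CPU.

The CPU sends I/O addresses to I/O devices over the I/O bus.

A bus adapter sits between the I/O bus and the device controller.

An address on the I/O bus may map to either a port number in the I/O address space or a memory address.

I/O ports are accessed with dedicated I/O instructions; with memory addresses, ordinary memory accesses correspond directly to accesses to the I/O device.

I/O address: used by the CPU to control the I/O hardware.

Memory address or port number:

Memory-mapped I/O

I/O instructions

Interrupt requests raised by devices are collected by the interrupt controller and then forwarded to the CPU, which can then respond to the external device's interrupt event.

Ways the CPU communicates with devices:

1. Polling

2. Device interrupts

3. DMA

Polling: no interrupt controller is involved; the CPU reads the I/O ports directly, i.e. it accesses the address space that corresponds to the device.

Interrupts: when an external device has an event to report, it notifies the CPU through the interrupt controller.

DMA: the device's data goes directly into memory; the DMA controller moves the I/O data into memory locations without the CPU copying it word by word.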

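I/O instructions and memory-mapped I/O

I/O instructions: device registers are accessed through I/O port numbers.

Life cycle of an I/O request:

I/O data transfer:

DMA

How I/O devices notify the operating system

Polling

Maskable vs. non-maskable interrupts

Depending on whether they can be masked, interrupts fall into two classes: non-maskable interrupts and maskable interrupts.

Once a non-maskable interrupt source raises a request, the CPU must respond unconditionally; a request from a maskable source may or may not be serviced. A CPU typically has two interrupt-request input lines: the maskable interrupt request INTR (Interrupt Request) and the non-maskable interrupt NMI (Non-maskable Interrupt). A maskable interrupt is controlled not only by its own mask bit but also by a global switch, the interrupt-enable flag IF (Interrupt Flag) in the CPU's flags register: if IF is 1, the CPU responds; otherwise it does not. IF is under software control: the STI instruction (or the enable() function in Turbo C) sets IF to 1 (enables interrupts), and the CLI instruction (or Turbo C's disable() function) clears IF to 0 (disables interrupts).

A typical non-maskable interrupt source is a power failure: it must be handled immediately and unconditionally, because any other work would be pointless otherwise. A typical maskable interrupt source is a printer interrupt: the CPU can respond to it a little sooner or later, since making the printer wait briefly is perfectly acceptable.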

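What is an interrupt?

First, what is an interrupt? In a computer, an interrupt is the mechanism the system uses to respond to requests from hardware devices. When the operating system receives an interrupt request from hardware, it interrupts the running process and calls an interrupt handler in the kernel to respond to the request.

An interrupt is, first of all, a mechanism the processor provides for responding to peripheral requests. A peripheral notifies the interrupt controller by raising an electrical signal, and the interrupt controller in turn signals the processor. When the processor detects the signal, it stops what it is currently doing and handles the interrupt instead (hence the name). Switching to the interrupt handler and returning from it both involve saving and restoring the execution context. I will not go into detail here; I explained some of it in my answer to the following question, which you can refer to:

Why are some registers saved to the kernel stack during a system call and later restored from it?

Different devices correspond to different interrupt numbers, and different interrupts have different handler functions. A handler is usually registered when its device driver is registered, so whichever device has an event raises the corresponding interrupt, and the kernel can find and run the matching handler.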
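The mapping from interrupt number to a handler registered by its driver can be sketched as a simple table. register_irq() plays the role that request_irq() plays in the e1000 code later in these notes, and dispatch_irq() the role of the generic interrupt entry; both, like NR_IRQS = 16 and kbd_handler(), are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>

#define NR_IRQS 16   /* hypothetical number of interrupt lines */

typedef int (*irq_handler_fn)(int irq, void *dev);

struct irq_desc { irq_handler_fn handler; void *dev; };

static struct irq_desc irq_table[NR_IRQS];

/* Called by a driver at initialization time. */
static int register_irq(int irq, irq_handler_fn fn, void *dev)
{
    if (irq < 0 || irq >= NR_IRQS || irq_table[irq].handler != NULL)
        return -1;               /* bad line or already taken */
    irq_table[irq].handler = fn;
    irq_table[irq].dev = dev;
    return 0;
}

/* Called by the interrupt entry code with the number the controller reported. */
static int dispatch_irq(int irq)
{
    assert(irq >= 0 && irq < NR_IRQS);
    if (irq_table[irq].handler == NULL)
        return -1;               /* spurious: no handler registered */
    return irq_table[irq].handler(irq, irq_table[irq].dev);
}

/* Example handler for a hypothetical keyboard: counts its interrupts. */
static int kbd_handler(int irq, void *dev)
{
    (void)irq;
    ++*(int *)dev;
    return 0;
}
```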

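A hardware interrupt roughly proceeds as follows (this is not exact and depends on the specific system; take it as a rough picture):

+-----------------+  raise interrupt  +----------------------+  notify  +-----+
| Hardware device | ----------------> | Interrupt controller | -------> | CPU |
+-----------------+                   +----------------------+          +-----+
                                                                           |
                                                                           V
                                                                    [enter kernel]
                                                                           |
                                                                           V
                              [handler registered?] <----------------- do_IRQ()
                                        |
                                +-------+-------+
                                |Y             N|
                                V               |
                        handle_IRQ_event        |
                                |               |
                                V               |
                      run interrupt handler     |
                                |               V
                                +----------> irq_exit ----> restore context ...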

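That explanation may be a bit academic and hard to follow, so here is an everyday example: picking up a food delivery.

At noon, after a morning of hard work, I (Xiaolin) got hungry and ordered poached chicken for delivery. The app shows the delivery progress, but I can't just stare at it the whole time; time is precious, so I go do other things. When the food arrives, the courier calls me, and when the phone rings I stop what I'm doing and go pick up the food.

The phone call corresponds to an interrupt. While no call comes in, I can do other things; only when the call arrives, that is, when the interrupt occurs, do I stop the current task and switch to another one: picking up the food.

This example shows that an interrupt is an asynchronous event-handling mechanism that improves the system's ability to handle work concurrently.

When the operating system receives an interrupt request, it interrupts other running processes, so the code that responds to the request, the interrupt handler, should finish as quickly as possible to minimize its impact on normal process scheduling.

Moreover, an interrupt handler may temporarily disable interrupts while it runs. This means that until the current handler finishes, no other interrupt request in the system can be serviced, so interrupts may be lost. Interrupt handlers must therefore be short and fast.

Back to the delivery example: in the evening I order again, and this time, as a treat, I order two things, crayfish and milk tea, delivered by two different couriers. Here is the problem: when the first order arrives, the first courier calls me and talks for a long time about all sorts of things, such as asking for a good review, and meanwhile the second courier also tries to call me.

Clearly the second courier cannot get through because I am already on a call (equivalent to interrupts being disabled); after a few attempts he may give up and leave (equivalent to a lost interrupt).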


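What is a softirq (software interrupt)?

As mentioned earlier, an interrupt handler should be short and fast to minimize its impact on normal process scheduling. Moreover, a handler may temporarily disable interrupts, so if it runs too long, interrupt requests from other devices may be lost before it finishes.

While handling an interrupt, the CPU cannot do anything else.

To solve the problems of long-running handlers and lost interrupts, Linux splits interrupt handling into two phases: the top half and the bottom half.

The top half handles the interrupt quickly, usually with interrupts temporarily disabled, and takes care of work that is closely tied to the hardware or is time-sensitive.
The bottom half later handles the work the top half left unfinished, in a deferred context (for softirqs, either at the end of the hard interrupt in irq_exit() or in a kernel thread).

In the delivery example, the first courier's long call kept the second one from getting through. Instead, when I answer the first courier's call, I could just say "I'm coming down, let's talk about the rest in person" (top half) and hang up, then go downstairs, take the food, and chat about everything else (bottom half).

That way the first courier does not tie up my phone for long, and when the second courier calls, he is much more likely to get through.

Now a computer example: a network card receiving packets.

For a network card, if the interrupt for each received packet takes too long, packets are likely to be dropped. (The handler temporarily disables interrupts while it processes an interrupt request, so if the card receives more packets during that time, the interrupt controller cannot deliver further interrupt signals to the CPU.)

Of course, packet loss can never be ruled out entirely: Ethernet frame delivery is not 100% guaranteed, which is why networks have a protocol stack in which higher-level protocols ensure the integrity of continuous data transfers (for example, by requesting retransmission when a loss is detected).

But even with protocol guarantees we cannot use interrupts recklessly. The shorter an interrupt, the better: release the processor as soon as possible so that it can respond to the next interrupt or even do scheduling work.

For packet reception on a NIC:

After the NIC receives a packet, it notifies the kernel via a hardware interrupt that new data has arrived, and the kernel calls the corresponding interrupt handler to respond to the NIC's interrupt signal. To keep interrupt handling as short as possible, the work is split into two parts:

Top half: the NIC has received a packet and notifies the kernel. The interrupt handling only needs to read the NIC's data into memory and then update the state of a hardware register, for example setting it to the value indicating that the data has been read into memory.

Bottom half: softirqs are one mechanism used for bottom halves. A softirq imitates the way hardware interrupts are processed, but entirely in software with no hardware involved; it is purely a software way to get asynchronous processing.

Following the top half, the kernel raises a softirq and hands the more time-consuming and complex work to the softirq handler, i.e. the bottom half. Its main job is to find the network data in memory, parse and process it layer by layer according to the network protocol stack, and finally deliver the data to the application. In other words, parsing and processing the packet can be deferred to the bottom half.

So the top and bottom halves of interrupt handling can be understood as follows:

The top half handles the hardware request directly; this is the hard interrupt (hardirq). It is responsible for short work and is characterized by fast execution.
The bottom half is triggered by the kernel; this is the soft interrupt (softirq). It is responsible for the work the top half left unfinished, which usually takes longer, and is characterized by deferred execution.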
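A single-threaded sketch of the top-half/bottom-half split for packet reception: the top half only records the packet in a ring and sets a "pending" flag, and the bottom half later does the slow work. The ring size, nic_state, and both functions are hypothetical; real code must also protect the ring against concurrent interrupts, which this simulation ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING 8   /* hypothetical ring size */

struct nic_state {
    int ring[RING];          /* packet lengths handed over by the top half */
    size_t head, tail;       /* producer / consumer counters */
    bool bh_pending;         /* "softirq raised" flag */
    long bytes_processed;    /* work done by the bottom half */
    int dropped;             /* packets lost because the ring was full */
};

/* Top half: runs in the hard interrupt, so it only records the packet
 * and requests deferred processing. Constant time, no protocol work. */
static void nic_top_half(struct nic_state *s, int pkt_len)
{
    assert(pkt_len >= 0);
    if (s->head - s->tail == RING) { s->dropped++; return; }
    s->ring[s->head % RING] = pkt_len;
    s->head++;
    s->bh_pending = true;    /* analogous to raising NET_RX_SOFTIRQ */
}

/* Bottom half: runs later with interrupts enabled and does the slow part
 * (summing lengths here stands in for protocol-stack processing). */
static void nic_bottom_half(struct nic_state *s)
{
    if (!s->bh_pending) return;
    s->bh_pending = false;
    while (s->tail != s->head) {
        s->bytes_processed += s->ring[s->tail % RING];
        s->tail++;
    }
}
```

If the bottom half falls behind, the ring fills up and new packets are dropped, which is the loss scenario described earlier.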

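Another difference: a hard interrupt (top half) preempts whatever task the CPU is running and executes its handler immediately, whereas softirqs (bottom half) run deferred and can be executed by kernel threads: each CPU has its own softirq kernel thread, usually named "ksoftirqd/<CPU number>"; for example, the softirq kernel thread of CPU 0 is ksoftirqd/0.

Other bottom-half mechanisms include tasklets and workqueues; which one to use depends on the situation, and packet reception usually uses softirqs, specifically NET_RX_SOFTIRQ. Softirqs are not only bottom halves of hardware interrupt handlers; some kernel-defined events are also softirqs, such as kernel scheduling and RCU (read-copy-update, a synchronization mechanism commonly used in the kernel).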

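The interrupt path for NIC packet reception

Note, as mentioned earlier:

A hard interrupt (top half) preempts whatever task the CPU is running and executes its handler immediately.

A softirq (bottom half) can run in a kernel thread; each CPU has its own softirq kernel thread, usually named "ksoftirqd/<CPU number>"; for example, CPU 0's is ksoftirqd/0.

1. Registering the NIC hardware interrupt

When a NIC receives a packet, the first thing it does is ask the processor to handle an interrupt, which really means having the CPU run the interrupt-handler code in the device driver.

Different peripherals have different interrupts and handler functions, so to study NIC interrupts we need a specific NIC driver as an example, such as e1000. Its module initialization function is: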

static int __init e1000_init_module(void)
{
        int ret;
        pr_info("%s - version %s\n", e1000_driver_string, e1000_driver_version);

        pr_info("%s\n", e1000_copyright);

        ret = pci_register_driver(&e1000_driver);
...
...
        return ret;
}


The e1000_driver structure is the key here; it is initialized as follows:

static struct pci_driver e1000_driver = {
        .name     = e1000_driver_name,
        .id_table = e1000_pci_tbl,
        .probe    = e1000_probe,
        .remove   = e1000_remove,
        .driver = {
                .pm = &e1000_pm_ops,
        },
        .shutdown = e1000_shutdown,
        .err_handler = &e1000_err_handler
};

One of its most important methods is .probe, i.e. e1000_probe():

/**
 * e1000_probe - Device Initialization Routine
 * @pdev: PCI device information struct
 * @ent: entry in e1000_pci_tbl
 *
 * Returns 0 on success, negative on failure
 *
 * e1000_probe initializes an adapter identified by a pci_dev structure.
 * The OS initialization, configuring of the adapter private structure,
 * and a hardware reset occur.
 **/
static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
...
...
        netdev->netdev_ops = &e1000_netdev_ops;
        e1000_set_ethtool_ops(netdev);
...
...
}

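This function is long, so we won't list all of it; as even its comment shows, it is e1000's main initialization function. Note that it registers the netdev's netdev_ops, using the e1000_netdev_ops structure: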

static const struct net_device_ops e1000_netdev_ops = {
        .ndo_open               = e1000_open,
        .ndo_stop               = e1000_close,
        .ndo_start_xmit         = e1000_xmit_frame,
        .ndo_set_rx_mode        = e1000_set_rx_mode,
        .ndo_set_mac_address    = e1000_set_mac,
        .ndo_tx_timeout         = e1000_tx_timeout,
...
...
};

Among e1000's methods, e1000_open is an important one; interrupt registration starts here:

/**
 * e1000_open - Called when a network interface is made active
 * @netdev: network interface device structure
 *
 * Returns 0 on success, negative value on failure
 *
 * The open entry point is called when a network interface is made
 * active by the system (IFF_UP).  At this point all resources needed
 * for transmit and receive operations are allocated, the interrupt
 * handler is registered with the OS, the watchdog task is started,
 * and the stack is notified that the interface is ready.
 **/
int e1000_open(struct net_device *netdev)
{
        struct e1000_adapter *adapter = netdev_priv(netdev);
        struct e1000_hw *hw = &adapter->hw;
...
...
        err = e1000_request_irq(adapter);
...
}

e1000 registers its interrupt here:

static int e1000_request_irq(struct e1000_adapter *adapter)
{
        struct net_device *netdev = adapter->netdev;
        irq_handler_t handler = e1000_intr;
        int irq_flags = IRQF_SHARED;
        int err;

        err = request_irq(adapter->pdev->irq, handler, irq_flags, netdev->name,
...
...
}

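The interrupt handler is part of the device driver.

As shown above, the hard-interrupt handler registered here, handler, is e1000_intr(). We won't dig into this handler; it is enough to know that it is registered here and is triggered when a packet arrives.

At this point, registration of the NIC hardware interrupt is complete.

2. Registering the softirq

Having seen how the NIC's hard interrupt is registered, let's look at how the softirq handler is registered. As mentioned at the beginning, NIC packet reception uses the NET_RX_SOFTIRQ softirq, so we search the kernel for this keyword to find where it is registered. It turns out to be easy to find: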

/*
 *      Initialize the DEV module. At boot time this walks the device list and
 *      unhooks any devices that fail to initialise (normally hardware not
 *      present) and leaves us with a valid list of present and active devices.
 *
 */
...
static int __init net_dev_init(void)
{
...
...
        /* open_softirq() registers a softirq handler: passing it the softirq
         * number NET_RX_SOFTIRQ and the handler net_rx_action() completes
         * the registration. */
        open_softirq(NET_TX_SOFTIRQ, net_tx_action);
        open_softirq(NET_RX_SOFTIRQ, net_rx_action);
...
...
}

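3. From hard interrupt to softirq

Now a packet arrives, an interrupt is raised, and do_IRQ runs. There are many implementations of do_IRQ, since different hardware handles interrupts differently, but the basic flow is: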

void __irq_entry do_IRQ(unsigned int irq)
{
        irq_enter();
        generic_handle_irq(irq);
        irq_exit();
}

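We don't need to expand everything; let's focus on our question. do_IRQ runs the e1000_intr handler shown above, which is top-half processing, and at the end do_IRQ calls irq_exit(), which is where softirqs connect to hard interrupts. Let's look at it closely.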

void irq_exit(void)
{
        __irq_exit_rcu();
        rcu_irq_exit();
         /* must be last! */
        lockdep_hardirq_exit();
}

static inline void __irq_exit_rcu(void)
{
#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
        local_irq_disable();
#else
        lockdep_assert_irqs_disabled();
#endif
        account_irq_exit_time(current);
        preempt_count_sub(HARDIRQ_OFFSET);
        if (!in_interrupt() && local_softirq_pending())
                invoke_softirq();

        tick_irq_exit();
}

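The first step of irq_exit() (inside __irq_exit_rcu()) is local_irq_disable(), i.e. hardware interrupts are disabled and no longer serviced. The reason is that the softirqs marked as pending are about to be processed, which involves clearing the set bits in the CPU's pending-softirq bitmap; interrupts are disabled so that nothing else can modify the bitmap at the same time. (I am not completely sure of this explanation.)

Next, preempt_count_sub(HARDIRQ_OFFSET) decrements the hard-interrupt count, meaning the current hard interrupt is over. If the current interrupt was nested inside another interrupt, the count does not reach zero after this decrement; if it was the only one, the count drops to zero. This is important.

The next step checks !in_interrupt() && local_softirq_pending(). The !in_interrupt() part uses the count to decide whether we are still in interrupt context; if an outer interrupt is still unfinished, softirq processing is skipped and the function just calls tick_irq_exit() and returns. The pending bottom-half work is then done at a suitable later time, for example when the ksoftirqd daemon is scheduled, or when a later interrupt exits outside interrupt context.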
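To see why only the outermost irq_exit() runs softirqs, here is a small single-CPU model of the counting described above. The field widths are modeled on include/linux/preempt.h but vary between kernel versions; model_irq_enter(), model_irq_exit(), softirq_pending, and softirq_runs are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the preempt_count bit fields (widths vary by version). */
#define SOFTIRQ_SHIFT   8
#define HARDIRQ_SHIFT   16
#define NMI_SHIFT       20
#define HARDIRQ_OFFSET  (1u << HARDIRQ_SHIFT)
#define SOFTIRQ_MASK    (0xffu << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK    (0xfu << HARDIRQ_SHIFT)
#define NMI_MASK        (0xfu << NMI_SHIFT)

static unsigned preempt_count;      /* per-CPU in the kernel; one CPU here */
static unsigned softirq_pending;    /* bitmap of raised softirqs */
static int softirq_runs;            /* how often invoke_softirq() would run */

static bool in_interrupt(void)
{
    return preempt_count & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK);
}

static void model_irq_enter(void) { preempt_count += HARDIRQ_OFFSET; }

/* Mirrors the check in __irq_exit_rcu(): only the outermost exit runs softirqs. */
static void model_irq_exit(void)
{
    assert(preempt_count & HARDIRQ_MASK);
    preempt_count -= HARDIRQ_OFFSET;
    if (!in_interrupt() && softirq_pending) {
        softirq_runs++;             /* stands in for invoke_softirq() */
        softirq_pending = 0;
    }
}
```

In the nested case, the inner exit leaves the hardirq count at 1, so in_interrupt() is still true and the pending softirq waits for the outer exit.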

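Now assume that no other interrupt remains after the current one, i.e. we are no longer in interrupt context, and that the current CPU has pending softirqs, so local_softirq_pending() is also true. Then invoke_softirq() is executed.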

static inline void invoke_softirq(void)
{
        if (ksoftirqd_running(local_softirq_pending()))
                return;

        if (!force_irqthreads) {
#ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
                /*
                 * We can safely execute softirq on the current stack if
                 * it is the irq stack, because it should be near empty
                 * at this stage.
                 */
                __do_softirq();
#else
                /*
                 * Otherwise, irq_exit() is called on the task stack that can
                 * be potentially deep already. So call softirq in its own stack
                 * to prevent from any overrun.
                 */
                do_softirq_own_stack();
#endif
        } else {
                wakeup_softirqd();
        }
}

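The logic of this function is simple. First, if ksoftirqd is already running, we don't want to process the pending softirqs here; we leave them to the ksoftirqd thread and return immediately.

If ksoftirqd is not running, it checks force_irqthreads, i.e. whether forced interrupt threading is in effect (available with CONFIG_IRQ_FORCED_THREADING and enabled by the threadirqs boot option), which requires all softirq processing to be handed to the ksoftirqd thread. Softirqs could obviously be processed right here, at the very end of interrupt exit, but they can also be left for ksoftirqd. If force_irqthreads is set, __do_softirq() is not run; instead wakeup_softirqd() wakes the ksoftirqd thread, putting it on the run queue, and the function returns.

If force_irqthreads is not set, __do_softirq() runs: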

asmlinkage __visible void __softirq_entry __do_softirq(void)
{
...
...
        pending = local_softirq_pending();
        account_irq_enter_time(current);

        __local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
        in_hardirq = lockdep_softirq_start();

restart:
        /* Reset the pending bitmask before enabling irqs */
        set_softirq_pending(0);

        local_irq_enable();

        h = softirq_vec;

        while ((softirq_bit = ffs(pending))) {
...
...
        }

        if (__this_cpu_read(ksoftirqd) == current)
                rcu_softirq_qs();
        local_irq_disable();

        pending = local_softirq_pending();
        if (pending) {
                if (time_before(jiffies, end) && !need_resched() &&
                    --max_restart)
                        goto restart;

                wakeup_softirqd();
        }

        lockdep_softirq_end(in_hardirq);
        account_irq_exit_time(current);
        __local_bh_enable(SOFTIRQ_OFFSET);
        WARN_ON_ONCE(in_interrupt());
        current_restore_flags(old_flags, PF_MEMALLOC);
}

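Note that the function begins with __local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET), which marks that softirqs are now being processed; in this state sleeping is not allowed, i.e. no scheduling can happen. This is important and easy to confuse. Together with the local_irq_disable() at the start of irq_exit() mentioned above, we are now in a state where hard interrupts and softirqs are both disabled and we can neither sleep nor be scheduled. Many people classify this state as "interrupt context", which I personally think is incorrect. From the definition of in_interrupt used above, whether we are in interrupt context depends on the interrupt counts in preempt_count: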

#define irq_count()     (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK \
                                 | NMI_MASK))

#define in_interrupt()          (irq_count())

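It has no direct relationship with whether interrupts are disabled. Interrupt context should not sleep or be scheduled, but being unable to sleep or be scheduled is not the same as in_interrupt(); for example, you cannot sleep while holding a spin_lock either (this is my personal view). Many programmers lump these together because what kernel programmers care about most is whether the code they are writing may sleep or be scheduled, so code running with interrupts disabled, which cannot sleep or be scheduled, is easily classified as interrupt context. I think this is really a misconception, or at least a loosely "extended" use of the term. It all depends on how we define interrupt context: if we define it the way in_interrupt() does, then whether interrupts are disabled has no direct bearing on whether we are in interrupt context.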

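Next, before __do_softirq starts processing softirqs (running the action of each pending softirq), there is another key step: local_irq_enable(), which re-enables hardware interrupts so that the softirq processing that follows runs with interrupts allowed. Note that __local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET) is still in effect, so sleeping is still not allowed.

So the kernel allows interrupts whenever it can and allows scheduling whenever it can, because disabling them indiscriminately wastes CPU resources and neglects peripheral interrupts. If interrupts stayed disabled for long periods, heavy packet loss on the NIC would be unavoidable, and that would become the bottleneck limiting the NIC's real throughput.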
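A user-space model of the softirq machinery walked through above: open_softirq() fills a handler table, raising a softirq sets a bit in a pending bitmap, and the processing loop clears the bitmap, runs handlers in bit order using ffs(), restarts if new softirqs were raised in the meantime, and after a bounded number of restarts hands the rest to ksoftirqd. The model_* names and counters are hypothetical; the real __do_softirq() also has a time limit and checks need_resched().

```c
#include <assert.h>
#include <strings.h>   /* ffs() */

/* Model softirq numbers; these four follow the kernel's ordering. */
enum { SIRQ_HI, SIRQ_TIMER, SIRQ_NET_TX, SIRQ_NET_RX, SIRQ_NR };
#define MAX_RESTART 10  /* restart limit, in the spirit of __do_softirq() */

typedef void (*softirq_action_fn)(void);

static softirq_action_fn softirq_vec[SIRQ_NR];  /* handler table */
static unsigned pending;                        /* per-CPU pending bitmap */
static int woke_ksoftirqd;                      /* set instead of looping forever */

static void model_open_softirq(int nr, softirq_action_fn fn)
{
    assert(nr >= 0 && nr < SIRQ_NR);
    softirq_vec[nr] = fn;
}

static void model_raise_softirq(int nr) { pending |= 1u << nr; }

static void model_do_softirq(void)
{
    int restarts = MAX_RESTART;
    unsigned todo;
    int bit;
restart:
    todo = pending;
    pending = 0;                      /* like set_softirq_pending(0) */
    /* the kernel calls local_irq_enable() here, so new raises can arrive */
    while ((bit = ffs((int)todo)) != 0) {
        int nr = bit - 1;             /* ffs() numbers bits from 1 */
        todo &= ~(1u << nr);
        if (softirq_vec[nr])
            softirq_vec[nr]();
    }
    if (pending) {                    /* raised again while handlers ran */
        if (--restarts)
            goto restart;
        woke_ksoftirqd = 1;           /* like wakeup_softirqd(): defer the rest */
    }
}

/* Example handlers: two counters, and one that keeps re-raising itself
 * as a NIC might under a packet flood. */
static int rx_runs, tx_runs;
static void net_rx_model(void) { rx_runs++; }
static void net_tx_model(void) { tx_runs++; }
static void flood_model(void) { model_raise_softirq(SIRQ_NET_RX); }
```

A handler that keeps re-raising itself exhausts the restart budget, which is the case where the kernel falls back to the ksoftirqd thread instead of starving everything else.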

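Which softirqs exist in the system?

On Linux, the contents of /proc/softirqs show how softirqs are running, and /proc/interrupts shows the same for hardirqs.

Let's briefly walk through the contents of /proc/softirqs. On my server the file looked like this (screenshot not available):

You can see that for each CPU there is a cumulative run count for each softirq type. Three points are worth noting.

First, the first column is the softirq type. On my system there are 10 types, corresponding to different kinds of work: for example NET_RX is network receive, NET_TX is network transmit, TIMER is the timer softirq, RCU is the RCU softirq, and SCHED is the kernel scheduling softirq.

Second, look at how the same softirq type is distributed across CPUs. Normally the counts for one type are similar on all CPUs; on my system, for example, the NET_RX counts on CPU0, CPU1, CPU2, and CPU3 are all of the same order of magnitude.

Third, these values are cumulative counts since the system booted, so their absolute size means little; what matters is how fast they change, which you can watch with the command watch -d cat /proc/softirqs.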
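watch -d cat /proc/softirqs is the quick way to eyeball the change rate. A small C helper can compute it instead, assuming the usual layout of the file (a header row of CPU names, then one row per softirq type with one counter per CPU). softirq_total() and softirq_rate() are hypothetical helpers; in practice you would read /proc/softirqs twice, some seconds apart, and pass both snapshots in.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sum one row of /proc/softirqs (e.g. "NET_RX") across all CPU columns.
 * `text` is the whole file contents; returns -1 if the row is missing. */
static long long softirq_total(const char *text, const char *name)
{
    size_t nlen = strlen(name);
    const char *line = text;
    while (line && *line) {
        const char *p = line;
        while (*p == ' ') p++;                  /* row names are right-aligned */
        if (strncmp(p, name, nlen) == 0 && p[nlen] == ':') {
            long long sum = 0;
            char *end;
            p += nlen + 1;
            for (;;) {
                while (*p == ' ' || *p == '\t') p++;
                if (*p < '0' || *p > '9') break;  /* end of this row */
                sum += strtoll(p, &end, 10);
                p = end;
            }
            return sum;
        }
        line = strchr(line, '\n');
        if (line) line++;
    }
    return -1;
}

/* Per-second rate between two samples taken `secs` apart. */
static double softirq_rate(long long before, long long after, double secs)
{
    return secs > 0 ? (double)(after - before) / secs : 0.0;
}
```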

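As mentioned earlier, softirqs can run in kernel threads, which you can see with ps. Here is the result of looking up the softirq kernel threads on my server (screenshot not available):

Note that the kernel thread names are enclosed in square brackets, meaning ps could not obtain their command-line arguments; in general, a name in square brackets can be assumed to be a kernel thread.

There are 4 ksoftirqd kernel threads because this server's CPU has 4 cores, and each core has its own kernel thread.

How to diagnose high CPU usage caused by softirqs?

To see the current softirq situation on a system, use top. Below is top data from a server (screenshot not available):

The si field (highlighted in yellow in the screenshot) is the CPU time spent in softirqs. Overall CPU usage is low: the two CPUs are only at about 3% and 4%, but all of that time is spent on softirqs.

You can also see that the process with the highest CPU usage is the softirq thread ksoftirqd, so the system's overhead can be attributed mainly to softirqs.

To find which softirq type is responsible, use watch -d cat /proc/softirqs to see how fast each type's count is changing.

On a web server with heavy network I/O, the NET_RX (network receive) count typically changes much faster than the other types.

If the NET_RX count is changing too fast, next use sar -n DEV to check each NIC's packet receive rate and find which NIC is receiving a large number of packets.

Then capture packets with tcpdump and analyze where they come from. If the source addresses are illegitimate, consider adding firewall rules; if the traffic is legitimate, consider upgrading the hardware.

Summary

To keep long-running interrupt handlers from disrupting normal process scheduling, Linux splits interrupt handling into a top half and a bottom half:

The top half corresponds to hardirqs: it is triggered by hardware and handles the interrupt quickly.
The bottom half corresponds to softirqs: it is triggered by the kernel and asynchronously handles the work the top half left unfinished.

Linux softirqs come in various types, including network receive/transmit, timers, scheduling, and RCU. /proc/softirqs shows the cumulative counts; to watch the rate of change in real time, use watch -d cat /proc/softirqs.

Each CPU has its own softirq kernel thread, which you can see with ps; names in square brackets are generally kernel threads.

If top shows a high share of CPU time in softirqs and the busiest process is the softirq thread ksoftirqd, the system's overhead can generally be considered to be consumed by softirqs.

Then determine which softirq type is responsible. Usually it is the network receive softirq; if so, use sar to find which NIC is receiving lots of packets, then capture packets with tcpdump to check whether the source addresses are illegitimate. If they are, add firewall rules; if not, consider upgrading the hardware.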

Device interrupts:

The basic interrupt mechanism works as follows.

1. The CPU hardware has a wire called the interrupt-request line that the CPU senses after executing every instruction.

2. When the CPU detects that a controller has asserted a signal on the interrupt-request line, the CPU performs a state save and jumps to the interrupt-handler routine at a fixed address in memory.

3. The interrupt handler determines the cause of the interrupt, performs the necessary processing, performs a state restore, and executes a return from interrupt instruction to return the CPU to the execution state prior to the interrupt.

(The state restore is also what is called "restoring the context".)

We say that the device controller raises an interrupt by asserting a signal on the interrupt request line, the CPU catches the interrupt and dispatches it to the interrupt handler, and the handler clears the interrupt by servicing the device.
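The three steps above can be simulated with a toy fetch-execute loop, assuming a CPU whose only instruction increments an accumulator. The loop senses the interrupt-request line after every instruction, saves state, runs the handler, and restores state, so the interrupted program ends with the same result as if no interrupt had happened. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy CPU: the "program" just increments an accumulator. */
struct toy_cpu {
    int pc, acc;
    bool irq_line;            /* interrupt-request line, asserted by a controller */
    int saved_pc, saved_acc;  /* state saved on interrupt entry */
    int handled;              /* interrupts serviced */
};

static void interrupt_handler(struct toy_cpu *c)
{
    c->acc = -1;              /* the handler may clobber registers... */
    c->handled++;
    c->irq_line = false;      /* servicing the device clears the interrupt */
}

/* Run `steps` instructions; a device raises an interrupt after step `raise_at`. */
static void run(struct toy_cpu *c, int steps, int raise_at)
{
    assert(steps >= 0);
    for (int i = 0; i < steps; i++) {
        c->acc += 1;                  /* execute one instruction */
        c->pc += 1;
        if (i == raise_at)
            c->irq_line = true;       /* a device controller raises an interrupt */
        if (c->irq_line) {            /* sensed after every instruction */
            c->saved_pc = c->pc;      /* state save */
            c->saved_acc = c->acc;
            interrupt_handler(c);     /* dispatch to the handler */
            c->pc = c->saved_pc;      /* ...so state is restored before returning */
            c->acc = c->saved_acc;
        }
    }
}
```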

Figure 12.3 summarizes the interrupt-driven I/O cycle.

In Operating System Concepts (10th edition), the interrupt example is about I/O devices:

Consider a typical computer operation:
a program performing I/O.
To start an I/O operation, the device driver loads the appropriate registers in the device controller.
The device controller, in turn, examines the contents of these registers to determine what action to take (such as “read a character from the keyboard”).
The controller starts the transfer of data from the device to its local buffer.
Once the transfer of data is complete, the device controller informs the device driver that it has finished its operation.
The device driver then gives control to other parts of the operating system, possibly returning the data or a pointer to the data if the operation was a read.
For other operations, the device driver returns status information such as “write completed successfully” or “device busy”.
But how does the controller inform the device driver that it has finished its operation? This is accomplished via an interrupt.


1.2.1.2 Implementation
The basic interrupt mechanism works as follows. The CPU hardware has a
wire called the interrupt-request line that the CPU senses after executing every
instruction. When the CPU detects that a controller has asserted a signal on
the interrupt-request line, it reads the interrupt number and jumps to the
interrupt-handler routine by using that interrupt number as an index into
the interrupt vector. It then starts execution at the address associated with
that index. The interrupt handler saves any state it will be changing during
its operation, determines the cause of the interrupt, performs the necessary
processing, performs a state restore, and executes a return from interrupt
instruction to return the CPU to the execution state prior to the interrupt. We
say that the device controller raises an interrupt by asserting a signal on the
interrupt request line, the CPU catches the interrupt and dispatches it to the
interrupt handler, and the handler clears the interrupt by servicing the device.
Figure 1.4 summarizes the interrupt-driven I/O cycle.
The basic interrupt mechanism just described enables the CPU to respond to
an asynchronous event, as when a device controller becomes ready for service.
In a modern operating system, however, we need more sophisticated interrupt-handling features.

- We need the ability to defer interrupt handling during critical processing.
- We need an efficient way to dispatch to the proper interrupt handler for a device.
- We need multilevel interrupts, so that the operating system can distinguish between high- and low-priority interrupts and can respond with the appropriate degree of urgency.
In modern computer hardware, these three features are provided by the CPU
and the interrupt-controller hardware.

Most CPUs have two interrupt request lines. One is the nonmaskable
interrupt, which is reserved for events such as unrecoverable memory errors.
The second interrupt line is maskable: it can be turned off by the CPU before
the execution of critical instruction sequences that must not be interrupted. The
maskable interrupt is used by device controllers to request service.
Recall that the purpose of a vectored interrupt mechanism is to reduce the
need for a single interrupt handler to search all possible sources of interrupts
to determine which one needs service. In practice, however, computers have
more devices (and, hence, interrupt handlers) than they have address elements
in the interrupt vector. A common way to solve this problem is to use interrupt
chaining, in which each element in the interrupt vector points to the head of
a list of interrupt handlers. When an interrupt is raised, the handlers on the
corresponding list are called one by one, until one is found that can service
the request. This structure is a compromise between the overhead of a huge
interrupt table and the inefficiency of dispatching to a single interrupt handler.
Figure 1.5 illustrates the design of the interrupt vector for Intel processors.
The events from 0 to 31, which are nonmaskable, are used to signal various
error conditions. The events from 32 to 255, which are maskable, are used for
purposes such as device-generated interrupts.
The interrupt mechanism also implements a system of interrupt priority
levels. These levels enable the CPU to defer the handling of low-priority interrupts without masking all interrupts and makes it possible for a high-priority
interrupt to preempt the execution of a low-priority interrupt.
In summary, interrupts are used throughout modern operating systems to
handle asynchronous events (and for other purposes we will discuss throughout
the text). Device controllers and hardware faults raise interrupts. To enable
the most urgent work to be done first, modern computers use a system of
interrupt priorities. Because interrupts are used so heavily for time-sensitive
processing, efficient interrupt handling is required for good system performance.
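Interrupt chaining as described above can be sketched as one linked list of handlers per vector entry, with each handler reporting whether its device actually raised the interrupt. All names are hypothetical. Linux's shared IRQ lines (the IRQF_SHARED flag in the e1000 request_irq() call earlier in these notes) rely on a similar per-line list, where a handler returns IRQ_NONE if its device did not raise the interrupt; unlike this sketch, Linux calls every handler on the line rather than stopping at the first one that claims it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Each vector entry heads a list of handlers (interrupt chaining). */
struct chained_handler {
    bool (*fn)(void *dev);            /* returns true if it serviced the interrupt */
    void *dev;
    struct chained_handler *next;
};

#define NR_VECTORS 256                 /* Intel: 0-31 nonmaskable/errors, 32-255 maskable */
static struct chained_handler *vector[NR_VECTORS];

static void chain_handler(int v, struct chained_handler *h)
{
    assert(v >= 32 && v < NR_VECTORS); /* device interrupts use 32-255 */
    h->next = vector[v];               /* push onto the front of the list */
    vector[v] = h;
}

/* Call the handlers one by one until one claims the interrupt.
 * Returns how many handlers were tried, or -1 if none claimed it. */
static int dispatch_chained(int v)
{
    int tried = 0;
    for (struct chained_handler *h = vector[v]; h; h = h->next) {
        tried++;
        if (h->fn(h->dev))
            return tried;
    }
    return -1;                         /* nobody claimed it: spurious */
}

/* Hypothetical device: claims the interrupt only if its status flag is set. */
struct fake_dev { bool raised; };
static bool fake_isr(void *dev)
{
    struct fake_dev *d = dev;
    if (!d->raised) return false;
    d->raised = false;                 /* servicing clears the interrupt */
    return true;
}
```

This is the compromise the text describes: a table of 256 list heads instead of one entry per device, at the cost of walking a short list on each interrupt.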

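Device drivers:

The driver code that handles an interrupt has to run on the CPU, so when an interrupt occurs, the CPU suspends the task it is currently running in order to handle the interrupt.

https://www.bilibili.com/video/BV1bf4y147PZ?p=30

The Bilibili video linked above also covers this, so I studied it.

I had already studied this earlier in CS:APP: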

Accessing Disks
While a detailed description of how I/O devices work and how they are programmed
is outside our scope here, we can give you a general idea. For example,
Figure 6.12 summarizes the steps that take place when a CPU reads data from a
disk.
The CPU issues commands to I/O devices using a technique called memory-mapped
I/O (Figure 6.12(a)). In a system with memory-mapped I/O, a block of