Linux driver porting: Linux network card driver basics

----------------------------------------------------------------------------------------------------------------------------

Kernel version: Linux 5.2.8

Root filesystem: busybox 1.25.0

u-boot: 2016.05

----------------------------------------------------------------------------------------------------------------------------

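I. The OSI seven-layer model

Before introducing OSI, let us first clarify some terminology.

  • Link: a physical line from one node to an adjacent node, with no other switching node in between;
  • Local area network (LAN): a network of multiple computers interconnected within a limited area (such as a school or a factory);
  • Wide area network (WAN): a data communication network that spans regions, typically covering a country or a region; a WAN in effect connects LANs together into a larger network;
  • Internet: the WANs of many countries joined together to form what is in practice the largest network;
  • Ethernet: can be regarded as a technical standard for LAN communication; it is currently the most widely used LAN technology;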

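1.1 Concepts

The OSI seven-layer model is a standard that defines a framework for interconnecting all kinds of computers into networks worldwide. It is a layered model: each part is called a layer, and each layer plays a fixed role without interfering with the others.

As long as a system follows the OSI standard, it can communicate with any other system anywhere in the world that follows the same standard.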

 

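From top to bottom, the seven OSI layers are: application layer, presentation layer, session layer, transport layer, network layer, data link layer, and physical layer.

The OSI seven-layer model is a theoretical model. The model actually used by today's Internet is the TCP/IP model, which omits OSI layers 5 and 6 (the session and presentation layers); the TCP/IP model is often called the de facto international standard.

TCP/IP refers to a protocol suite that can transfer information across multiple different networks. It does not mean only the two protocols TCP and IP, but a suite made up of FTP, SMTP, TCP, UDP, IP and other protocols; it is called TCP/IP only because TCP and IP are its most representative members.

The TCP/IP model is derived from the OSI seven-layer model and simplifies it to five layers: application layer, transport layer, network layer, data link layer, and physical layer.

Note:

  • The figure above lists only the more typical protocols at each layer of the TCP/IP model, not all of them.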

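1.2 Physical layer

The unit of data transmitted at the physical layer is the bit. The task of the physical layer is to transmit the bit stream: when the sender sends a 1 (or 0), the receiver should receive a 1 (or 0), not a 0 (or 1).

The physical layer therefore has to decide what voltage represents "1" or "0" and how the receiver recognizes the bits sent by the sender. It also determines how many pins the cable connector has and how each pin is connected.

Note:

  • The physical media used to carry the information, such as twisted pair, coaxial cable, optical fiber, and wireless channels, are not part of the physical-layer protocol; they sit below it.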

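1.3 Data link layer

To transfer data over a line, a physical line alone is not enough; communication protocols are also needed to control the transfer. Adding the hardware and software that implement these protocols to the link forms a data link.

The most common approach today is to implement these protocols, in hardware and software, in a network adapter (also called a network interface card, NIC): for example, a dial-up adapter for dial-up access, or a LAN adapter for access over Ethernet. A typical adapter covers both the data link layer and the physical layer.

The data link layer builds frames from the data handed down by the network layer and sends them onto the link, and it extracts the data from received frames and hands it up to the network layer. On the Internet, the network-layer protocol data unit is the IP datagram (also simply called a datagram or a packet).

There are many data link layer protocols, but they share three basic problems: framing, transparent transmission, and error control.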

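1.4 Network layer

The network layer provides communication services between different hosts on a packet-switched network. When sending data, the network layer encapsulates the segments or user datagrams produced by the transport layer into packets for transmission.

Its other task is to select suitable routes, so that the packets handed down by the source host's transport layer can find the destination host through the routers in the network.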

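1.5 Transport layer

The task of the transport layer is to provide communication services between processes on two hosts. Because one host can run several processes at the same time, the transport layer provides multiplexing and demultiplexing.

  • Multiplexing means that several application-layer processes can use the services of the transport layer below them at the same time.
  • Demultiplexing means that the transport layer delivers the received information to the corresponding process in the application layer above it.

The transport layer mainly uses two protocols: TCP and UDP.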

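1.5.1 Transmission Control Protocol (TCP)
  • TCP is a connection-oriented transport-layer protocol: an application must establish a TCP connection before using TCP and must release the connection once the data transfer is finished; the three-way handshake and four-way teardown often asked about in interviews refer to exactly this;
  • Each TCP connection has exactly two endpoints. An endpoint of a TCP connection is called a socket. According to RFC 793, a port number concatenated with an IP address forms a socket;
  • TCP provides reliable delivery: data sent over a TCP connection arrives without errors, without loss, without duplication, and in order;
  • TCP provides full-duplex communication;
  • TCP is byte-stream oriented; the unit of data transfer is the segment;
1.5.2 User Datagram Protocol (UDP)
  • UDP is connectionless: no connection has to be established before sending data, which reduces overhead and the delay before data is sent;
  • UDP uses best-effort delivery, that is, it does not guarantee reliable delivery;
  • UDP is message oriented; the unit of data transfer is the user datagram. The sending UDP adds a header to the message handed down by the application and passes it straight to the IP layer. UDP neither merges nor splits the messages handed down by the application layer; it preserves their boundaries;
  • UDP has no congestion control;
  • UDP supports one-to-one, one-to-many, many-to-one and many-to-many communication;
  • The UDP header is small, only 8 bytes;

If the difference between byte-stream oriented and message oriented is unclear, see the article "How to understand TCP as a byte-stream oriented protocol".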

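1.6 Application layer

The application layer is the highest layer of the architecture. It provides services directly to the user's application processes. The Internet has many application-layer protocols, such as HTTP for the World Wide Web, SMTP for e-mail, and FTP for file transfer.

1.7 TCP/IP protocols

When an application on host A sends data to an application on host B, the data is encapsulated layer by layer, travels over the network medium to host B, and is then decapsulated layer by layer before the application on host B receives it.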

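1.7.1 TCP segment format

A TCP segment consists of a header and data, and all of TCP's functionality is reflected in the fields of its header. The figure below shows the format of a TCP segment:

The first 20 bytes of a TCP segment are fixed; they are followed by 4N bytes (N is an integer) of options added as needed. The minimum length of the TCP header is therefore 20 bytes.

The fields of the fixed part of the header have the following meanings:

  • Source port and destination port: 2 bytes each, holding the source port number and the destination port number;
  • Sequence number (seq): 4 bytes, with range [0, $2^{32}-1$]. TCP is byte-stream oriented, and every byte of the stream sent over a TCP connection is numbered in order. The initial sequence number of the whole stream must be set when the connection is established. The sequence number field in the header is the number of the first data byte carried by this segment. For example, if a segment has sequence number 301 and carries 100 bytes of data, its first data byte is numbered 301 and its last is numbered 400; the data of the next segment (if there is one) should therefore start at 401, that is, the next segment's sequence number field is 401;
  • Acknowledgment number (ack): 4 bytes; the sequence number of the first data byte expected in the next segment from the peer;
  • Header length (data offset): 4 bits, in units of 4 bytes. It indicates how far the start of the data is from the start of the TCP segment; because the header contains an options field of variable length, this field is necessary;
  • URG: indicates whether this segment carries urgent data: URG=1 means it does. The urgent pointer field is valid only when URG=1;
  • ACK: indicates whether the acknowledgment number field is valid: the field is valid only when ACK=1. TCP requires ACK to be 1 once the connection has been established;
  • PSH: tells the peer whether to push the data to the upper layer as soon as the segment is received. A value of 1 means the data should be delivered to the upper layer immediately rather than buffered;
  • RST: indicates whether to reset the connection: RST=1 means a serious error has occurred on the TCP connection (such as a host crash); the connection must be released and then re-established;
  • SYN: used during connection establishment to synchronize sequence numbers. SYN=1, ACK=0 marks a connection request segment; SYN=1, ACK=1 means the peer agrees to establish the connection. SYN=1 therefore marks either a connection request or a connection accept, and SYN is 1 only in the first two steps of the handshake;
  • FIN: marks whether the sender has finished sending: FIN=1 means the sender of this segment has no more data to send and requests release of the connection;
  • Window: 2 bytes; an integer in [0, $2^{16}-1$]. The window is the receive window of the side that sends this segment;
  • Checksum: 2 bytes. The checksum covers both the header and the data;
  • Urgent pointer: 2 bytes. It is meaningful only when URG=1 and indicates the number of bytes of urgent data in this segment;
  • Options: variable length, up to 40 bytes. When no options are used, the TCP header is 20 bytes long;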
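
To make the field layout above concrete, here is a minimal user-space sketch that decodes the fixed 20-byte part of a TCP header from raw bytes. The names `tcp_fixed_hdr`, `tcp_parse` and `tcp_next_seq` are invented for this illustration and are not kernel APIs; the byte offsets assume the standard header layout described in the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits in byte 13 of the TCP header (illustrative names). */
enum {
    TCP_FLAG_FIN = 0x01,
    TCP_FLAG_SYN = 0x02,
    TCP_FLAG_RST = 0x04,
    TCP_FLAG_PSH = 0x08,
    TCP_FLAG_ACK = 0x10,
    TCP_FLAG_URG = 0x20
};

/* Decoded view of the fixed header (hypothetical helper type). */
struct tcp_fixed_hdr {
    uint16_t src_port;
    uint16_t dst_port;
    uint32_t seq;
    uint32_t ack;
    uint8_t  hdr_len;   /* header length in bytes: 4-bit field times 4 */
    uint8_t  flags;     /* URG/ACK/PSH/RST/SYN/FIN bits */
    uint16_t window;
    uint16_t checksum;
    uint16_t urgent;
};

/* Read big-endian (network byte order) integers. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short or the
 * header length field is inconsistent. */
int tcp_parse(const uint8_t *seg, size_t len, struct tcp_fixed_hdr *h)
{
    assert(seg != NULL && h != NULL);
    if (len < 20)
        return -1;
    h->src_port = rd16(seg);
    h->dst_port = rd16(seg + 2);
    h->seq      = rd32(seg + 4);
    h->ack      = rd32(seg + 8);
    h->hdr_len  = (uint8_t)((seg[12] >> 4) * 4);
    h->flags    = (uint8_t)(seg[13] & 0x3f);
    h->window   = rd16(seg + 14);
    h->checksum = rd16(seg + 16);
    h->urgent   = rd16(seg + 18);
    if (h->hdr_len < 20 || h->hdr_len > len)
        return -1;
    return 0;
}

/* Sequence number of the next data segment: current sequence number
 * plus the payload length, wrapping modulo 2^32. */
uint32_t tcp_next_seq(uint32_t seq, uint32_t payload_len)
{
    return seq + payload_len;
}
```

With the example from the sequence number field, `tcp_next_seq(301, 100)` gives 401. A connection-accept segment would have both `TCP_FLAG_SYN` and `TCP_FLAG_ACK` set in `flags`.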
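1.7.2 IP datagram format

The format of the IP datagram shows what functions the IP protocol provides. The format is shown in the figure below:

An IP datagram consists of a header and data. The first part of the header has a fixed length of 20 bytes and is present in every IP datagram. The fixed part is followed by optional fields of variable length.

The fields of the fixed part of the header have the following meanings:

  • Version: 4 bits; the version of the IP protocol. Both communicating parties must use the same version. The version in widespread use is 4 (IPv4); version number 6 denotes IPv6;
  • Header length: 4 bits, so the largest value it can hold is 15. The unit is 32 bits (4 bytes, the same unit as the header length field in the TCP header). Because the fixed part of the IP header is 20 bytes, the minimum value of this field is 5 (0101). A value of 15 (1111) means a 60-byte header. When the header length of an IP packet is not a multiple of 4 bytes, the trailing padding field must be used to pad it to a multiple of 4;
  • Differentiated services (tos): 1 byte, intended to obtain better service. In the old standard this field was called type of service, but it was never actually used. The field only takes effect when differentiated services are in use; in general it is not used;
  • Total length (totlen): 2 bytes; the length of header plus data, in bytes. The largest value it can express is 65535 bytes. The link-layer protocol below IP defines a maximum length for the data field of a frame, called the maximum transmission unit (MTU). When an IP datagram is encapsulated into a link-layer frame, its total length (header plus data) must not exceed the MTU of the link layer below; Ethernet defines an MTU of 1500 bytes. If the datagram is longer than the link-layer MTU, the oversized data must be fragmented;
  • Identification (identification): 2 bytes. The network-layer software keeps a counter in memory; each time a datagram is generated, the counter is incremented and its value is assigned to the identification field. This "identification" is not the same as the sequence number in the TCP header, because IP is a connectionless service and datagrams have no notion of in-order reception. When a datagram exceeds the network MTU and has to be fragmented, the value of this field is copied into the identification field of every fragment. The shared identification value lets the fragments be correctly reassembled into the original datagram;
  • Flags (flag): 3 bits, of which only two are currently meaningful. The middle bit is DF (don't fragment); fragmentation is allowed only when DF=0. The lowest bit is MF (more fragments): MF=1 means more fragments follow; MF=0 means this is the last of the fragments;
  • Fragment offset (offsetfrag): 13 bits. It gives the position of a fragment within the original packet after a long IP datagram has been fragmented, that is, where the fragment starts relative to the beginning of the user data field. The offset is in units of 8 bytes, so the length of every fragment except the last must be a multiple of 8 bytes;
  • Time to live (TTL): 8 bits; it indicates the lifetime of the datagram in the network. The field is set by the source of the datagram. Its purpose is to keep undeliverable datagrams from circulating in the Internet forever. A router decrements the TTL by 1 each time before it forwards a datagram. If the TTL reaches zero, the datagram is discarded and not forwarded any further;
  • Protocol: 8 bits; it indicates which protocol the data carried by the datagram belongs to, so that the IP layer of the destination host knows which protocol to hand the data part to. Some common protocols and their protocol field values are: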
Protocol       ICMP  IGMP  TCP  EGP  IGP  UDP  IPv6  OSPF
Field value    1     2     6    8    9    17   41    89
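  • Header checksum (checksum): 16 bits. This field covers only the header of the datagram, not the data part (unlike the checksums in UDP and TCP). Every router a datagram passes through has to recompute the header checksum, because header fields such as TTL, flags and fragment offset may change;

  • Source address: 32 bits (4 bytes);

  • Destination address: 32 bits (4 bytes);

For example: a datagram with a total length of 3820 bytes has a 3800-byte data part (the IP header is the fixed 20 bytes) and has to be fragmented for transmission. Suppose each IP fragment may be at most 1420 bytes long. After subtracting the 20-byte fixed header, the data part of each fragment is at most 1400 bytes. The datagram is therefore split into 3 fragments whose data parts are 1400, 1400 and 1000 bytes long. The header of the original datagram is copied into each fragment; only the values of the relevant fields need to change.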
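
The header checksum and the per-hop TTL decrement described above can be sketched in user space as follows. This is a minimal illustration, assuming the usual 16-bit one's-complement Internet checksum; the function names `ip_header_checksum`, `ip_header_len` and `ip_forward_ttl` are made up for this example and do not correspond to kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of the header taken as 16-bit big-endian words.
 * When generating a checksum, bytes 10-11 (the checksum field) must be
 * zero on input; when verifying a received header, the result is 0. */
uint16_t ip_header_checksum(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum = 0;
    size_t i;

    assert(hdr != NULL && hdr_len % 2 == 0);
    for (i = 0; i < hdr_len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)                 /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Header length in bytes from the 4-bit header length field. */
size_t ip_header_len(const uint8_t *hdr)
{
    return (size_t)(hdr[0] & 0x0f) * 4;
}

/* What a router does before forwarding: decrement TTL (byte 8) and
 * recompute the checksum. Returns 1 if the datagram may be forwarded,
 * 0 if the TTL would reach zero and the datagram must be dropped. */
int ip_forward_ttl(uint8_t *hdr)
{
    uint16_t c;

    if (hdr[8] <= 1)
        return 0;
    hdr[8]--;
    hdr[10] = 0;
    hdr[11] = 0;
    c = ip_header_checksum(hdr, ip_header_len(hdr));
    hdr[10] = (uint8_t)(c >> 8);
    hdr[11] = (uint8_t)(c & 0xff);
    return 1;
}
```

Recomputing the whole checksum keeps the sketch short; real routers commonly update it incrementally instead, since only the TTL byte changed.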
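
The arithmetic of this example can be checked with a small sketch that computes, for each fragment, its data length, its fragment offset field (in 8-byte units) and its MF flag. The type `ip_frag` and the function `ip_fragment` are hypothetical names used only here; the sketch assumes a fixed 20-byte header and ignores the DF flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-fragment values that differ from the original header. */
struct ip_frag {
    uint16_t data_len;  /* bytes of data carried by this fragment */
    uint16_t offset;    /* fragment offset field, in 8-byte units */
    uint8_t  mf;        /* 1 if more fragments follow, 0 for the last */
};

/* Split data_len bytes of data into fragments carrying at most
 * max_data bytes each. max_data is rounded down to a multiple of 8 so
 * that every offset is expressible. Returns the number of fragments
 * written to out, or -1 on invalid input or if out is too small. */
int ip_fragment(uint32_t data_len, uint32_t max_data,
                struct ip_frag *out, size_t out_cap)
{
    uint32_t pos = 0;
    size_t n = 0;

    assert(out != NULL);
    max_data &= ~7u;
    if (max_data == 0 || data_len == 0 || data_len > 65515)
        return -1;
    while (pos < data_len) {
        uint32_t chunk = data_len - pos;

        if (chunk > max_data)
            chunk = max_data;
        if (n == out_cap)
            return -1;
        out[n].data_len = (uint16_t)chunk;
        out[n].offset   = (uint16_t)(pos / 8);
        out[n].mf       = (uint8_t)(pos + chunk < data_len);
        pos += chunk;
        n++;
    }
    return (int)n;
}
```

For the datagram above, `ip_fragment(3800, 1400, ...)` produces three fragments with data lengths 1400, 1400 and 1000, offset fields 0, 175 and 350, and MF set to 1, 1 and 0.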

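1.7.3 Ethernet frame

A packet transmitted over an Ethernet link is called an Ethernet frame. The frame begins with a preamble and a start frame delimiter, followed by the Ethernet header, which gives the destination and source as MAC addresses. The middle of the frame is the payload: a packet that carries the headers of other protocols, such as IP.

The frame ends with a 32-bit cyclic redundancy check, used to detect whether the data was corrupted in transit. The Ethernet frame structure is shown in the figure:

The fields in the figure above have the following meanings:

  • Preamble: lets the receiving adapter quickly adjust its clock frequency to match the sender's when receiving a MAC frame. The preamble is 7 bytes of alternating 1s and 0s;
  • Start frame delimiter: marks the start of the frame; 1 byte. The first 6 bits alternate between 1 and 0, and the final two consecutive 1s tell the receiving adapter: "frame data is about to arrive, get ready to receive";
  • Destination MAC address: the physical (MAC) address of the network adapter that is to receive the frame; 6 bytes (48 bits). When a network card receives a frame, it first checks whether the destination address matches its own physical address; if it does, the frame is processed further, otherwise it is discarded;
  • Source MAC address: the physical (MAC) address of the network adapter that sent the frame; 6 bytes (48 bits);
  • Type: the type of the upper-layer protocol. Because there are many upper-layer protocols, this field must be set so that the data can be handed to the right protocol. For example, a value of 0x0800 means the data is handed to IP;
  • Data: also called the payload; the data delivered to the upper layer. The data field of an Ethernet frame is at least 46 bytes and at most 1500 bytes long. Shorter data is padded to the minimum length. The maximum is also called the maximum transmission unit (MTU);
  • Frame check sequence (FCS): detects whether the frame contains errors; 4 bytes (32 bits). The sender computes the cyclic redundancy check (CRC) of the frame and writes the value into the frame. The receiver recomputes the CRC and compares it with the FCS field. If the two values differ, data was lost or altered in transit; the receiver then discards the frame, and any retransmission is left to upper-layer protocols;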
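
The type field, the padding rule and the FCS can be illustrated with a short user-space sketch. It assumes the usual CRC-32 used by Ethernet (reflected polynomial 0xEDB88320, initial value and final XOR of all ones) computed bit by bit for clarity; real hardware and drivers normally use table-driven or hardware CRC. The names `crc32_ieee`, `eth_type`, `eth_is_ipv4` and `eth_padded_len` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN   14    /* destination MAC + source MAC + type */
#define ETH_MIN_DATA  46    /* minimum length of the data field */
#define ETH_MAX_DATA  1500  /* maximum length of the data field (MTU) */

/* Bitwise CRC-32 over len bytes, as used for the frame check sequence. */
uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xffffffffu;
    size_t i;
    int bit;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (bit = 0; bit < 8; bit++)
            /* If the low bit is set, shift and XOR in the polynomial. */
            crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Type field: bytes 12-13 of the header, in network byte order. */
uint16_t eth_type(const uint8_t *frame)
{
    return (uint16_t)((frame[12] << 8) | frame[13]);
}

/* 1 if the frame is long enough to hold a header and carries IP. */
int eth_is_ipv4(const uint8_t *frame, size_t len)
{
    return len >= ETH_HDR_LEN && eth_type(frame) == 0x0800;
}

/* Length of the data field on the wire after padding. Returns 0 if
 * the data does not fit into one frame. */
size_t eth_padded_len(size_t data_len)
{
    if (data_len > ETH_MAX_DATA)
        return 0;
    return data_len < ETH_MIN_DATA ? ETH_MIN_DATA : data_len;
}
```

A receiver following the description above would recompute `crc32_ieee` over the received header and data and compare it with the received FCS value.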
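1.7.4 ICMP message format

ICMP stands for Internet Control Message Protocol. It lets hosts and routers report errors and other abnormal conditions. ICMP is a standard Internet protocol, but it is not a higher-layer protocol; it belongs to the IP layer.

An ICMP message is carried as the data of an IP datagram: with the datagram header added, it is sent out as an IP datagram.

The ping command we use so often is implemented with ICMP; it probes the connectivity between two hosts.

The ICMP message format is as follows:

The fields in the figure above have the following meanings:

  • Type: 1 byte. Common ICMP message types are:
Type  Code  Meaning                                 Mnemonic
0     0     Echo reply (ping reply)                 Echo-reply
8     0     Echo request (ping request)             Echo
3     0     Network unreachable                     Net-unreachable
3     1     Host unreachable                        Host-unreachable
3     2     Protocol unreachable                    Protocol-unreachable
5     0     Network redirect                        Net-redirect
5     1     Host redirect                           Host-redirect
5     2     Type-of-service and network redirect    Net-tos-redirect
5     3     Type-of-service and host redirect       Host-tos-redirect
11    0     Time exceeded
12    0     Parameter problem                       Parameter-problem
13    0     Timestamp request                       Timestamp-request
14    0     Timestamp reply                         Timestamp-reply
15    0     Information request                     Information-request
16    0     Information reply                       Information-reply
  • Code: 1 byte;
  • Checksum: 2 bytes;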
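
As a rough illustration of how a ping request is laid out, the sketch below builds an ICMP echo request in a byte buffer. Besides the type, code and checksum fields listed above, an echo message also carries a 16-bit identifier and a 16-bit sequence number (as specified in RFC 792; these two fields are not described in this article). The function names `icmp_checksum` and `icmp_build_echo` are made up for this example, and actually sending the message (for instance over a raw socket) is outside the scope of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ICMP_TYPE_ECHO_REPLY    0
#define ICMP_TYPE_ECHO_REQUEST  8

/* 16-bit one's-complement checksum over the whole ICMP message.
 * An odd trailing byte is treated as the high byte of a final word. */
uint16_t icmp_checksum(const uint8_t *msg, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((msg[i] << 8) | msg[i + 1]);
    if (len & 1)
        sum += (uint32_t)(msg[len - 1] << 8);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Build an echo request: type (1 byte), code (1 byte), checksum
 * (2 bytes), identifier (2 bytes), sequence number (2 bytes), payload.
 * Returns the message length, or 0 if buf is too small. */
size_t icmp_build_echo(uint8_t *buf, size_t cap, uint16_t id, uint16_t seq,
                       const void *payload, size_t payload_len)
{
    size_t total = 8 + payload_len;
    uint16_t c;

    assert(buf != NULL);
    if (cap < total)
        return 0;
    buf[0] = ICMP_TYPE_ECHO_REQUEST;
    buf[1] = 0;                       /* code */
    buf[2] = 0;                       /* checksum is zero while summing */
    buf[3] = 0;
    buf[4] = (uint8_t)(id >> 8);
    buf[5] = (uint8_t)(id & 0xff);
    buf[6] = (uint8_t)(seq >> 8);
    buf[7] = (uint8_t)(seq & 0xff);
    if (payload_len > 0)
        memcpy(buf + 8, payload, payload_len);
    c = icmp_checksum(buf, total);
    buf[2] = (uint8_t)(c >> 8);
    buf[3] = (uint8_t)(c & 0xff);
    return total;
}
```

A receiver can validate a message by running `icmp_checksum` over it with the checksum field in place; an intact message yields 0.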

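II. Network card driver architecture

2.1 Overview

A network card works at the physical layer and the data link layer. Its main job is to send and receive network packets: it sends the packets handed down by the upper-layer protocols using a particular media access control method, and it passes received packets up to the upper-layer protocols.

Knowing what a network card does, it is clear what a network card driver has to implement: data transfer by controlling the hardware. On one hand it makes the hardware send the packets handed down from the upper layer; on the other it receives external data and passes it up.

To understand network card drivers in the Linux kernel more clearly, we divide them into layers by function. The resulting framework of a Linux network card driver is shown in the figure below:

As the figure shows, the network card driver in the kernel is divided into 4 layers:

  • Network protocol interface layer: provides a uniform packet send/receive interface to the network-layer protocols. When an upper-layer protocol such as ARP or IP needs to send data, it calls dev_queue_xmit() in this layer to pass the packet down; received packets are passed up through netif_rx(). Both directions use sk_buff as the data carrier;
  • Network device interface layer: describes a network device through the net_device structure, which is the container for the functions of the device driver function layer; it gives the layers above a uniform interface across different hardware types;
  • Device driver function layer: drives the network device hardware to perform each function. Its functions are the concrete members reached through the net_device structure of the network device interface layer; the core ones send and receive packets (transmission is started through the hard_start_xmit() function, which recent kernels such as 5.2 name ndo_start_xmit() in struct net_device_ops, and reception is triggered by the interrupt of the network device);
  • Network device and media layer: the physical entities that actually send and receive packets, including the network adapter and the transmission medium; the network adapter is physically driven by the functions in the device driver function layer. On Linux, both the network device and the medium may be virtual.

The net_device structure is the bridge between the protocol layer and the hardware. It hides the differences between hardware, so the protocol layer does not need to care about hardware operations: to send or receive data it only has to call the operation functions in the net_device structure.

The operation functions in net_device are registered from functions implemented by the device driver function layer; the implementation differs from one hardware device to another.

In short, writing a network card driver revolves around the network device interface layer and the device driver function layer: implement the send and receive functions of the device driver function layer according to the hardware, then fill in a net_device structure and register it upward.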
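
The layering described in this section can be mimicked in plain user-space C with a table of function pointers. The sketch below is only an analogy, not kernel code: `mock_netdev`, `mock_netdev_ops`, `mock_register_netdev` and `mock_queue_xmit` are hypothetical names loosely modeled on net_device, its operation functions, device registration and dev_queue_xmit(), and the "hardware" is a fake driver that merely counts packets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mock_netdev;

/* Device driver function layer: hardware-specific callbacks. */
struct mock_netdev_ops {
    int (*open)(struct mock_netdev *dev);
    int (*start_xmit)(struct mock_netdev *dev, const void *pkt, size_t len);
};

/* Network device interface layer: uniform description of one device. */
struct mock_netdev {
    char name[16];
    const struct mock_netdev_ops *ops;
    unsigned long tx_packets;
    unsigned long tx_bytes;
    int up;
};

#define MOCK_MAX_DEVS 4
static struct mock_netdev *mock_devs[MOCK_MAX_DEVS];

/* Fill in the device description, as a driver would before registering. */
void mock_netdev_init(struct mock_netdev *dev, const char *name,
                      const struct mock_netdev_ops *ops)
{
    memset(dev, 0, sizeof(*dev));
    strncpy(dev->name, name, sizeof(dev->name) - 1);
    dev->ops = ops;
}

/* "Register upward": make the device known to the protocol side. */
int mock_register_netdev(struct mock_netdev *dev)
{
    size_t i;

    assert(dev != NULL);
    if (dev->ops == NULL || dev->ops->start_xmit == NULL)
        return -1;
    for (i = 0; i < MOCK_MAX_DEVS; i++) {
        if (mock_devs[i] == NULL) {
            mock_devs[i] = dev;
            return 0;
        }
    }
    return -1;
}

/* Protocol interface layer: the caller does not know which hardware
 * is behind dev; it only goes through the ops table. */
int mock_queue_xmit(struct mock_netdev *dev, const void *pkt, size_t len)
{
    if (!dev->up)
        return -1;
    return dev->ops->start_xmit(dev, pkt, len);
}

/* A fake driver standing in for real hardware access. */
static int loop_open(struct mock_netdev *dev)
{
    dev->up = 1;
    return 0;
}

static int loop_xmit(struct mock_netdev *dev, const void *pkt, size_t len)
{
    (void)pkt;              /* a real driver would hand pkt to the NIC */
    dev->tx_packets++;
    dev->tx_bytes += len;
    return 0;
}

const struct mock_netdev_ops loop_ops = { loop_open, loop_xmit };
```

Supporting different hardware then means supplying a different ops table, while `mock_queue_xmit` and its callers stay unchanged, which is the point of the net_device abstraction.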

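2.2 Core data structures

2.2.1 struct sk_buff

The sk_buff structure is very important. It is defined in include/linux/skbuff.h; the name means "socket buffer", and it is used to pass data between the layers of the TCP/IP model.

When sending a packet, the kernel's networking code must build an sk_buff containing the packet to be transmitted and pass it down to the next layer; each layer adds its own protocol header to the sk_buff until it is handed to the network device for transmission.

When a network device receives a packet from the network medium, the kernel's networking code converts the received data into an sk_buff and passes it up; each layer removes its protocol header until the data reaches the application.

The sk_buff structure is defined as follows: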

/**
 *      struct sk_buff - socket buffer
 *      @next: Next buffer in list
 *      @prev: Previous buffer in list
 *      @tstamp: Time we arrived/left
 *      @rbnode: RB tree node, alternative to next/prev for netem/tcp
 *      @sk: Socket we are owned by
 *      @dev: Device we arrived on/are leaving by
 *      @cb: Control buffer. Free for use by every layer. Put private vars here
 *      @_skb_refdst: destination entry (with norefcount bit)
 *      @sp: the security path, used for xfrm
 *      @len: Length of actual data
 *      @data_len: Data length
 *      @mac_len: Length of link layer header
 *      @hdr_len: writable header length of cloned skb
 *      @csum: Checksum (must include start/offset pair)
 *      @csum_start: Offset from skb->head where checksumming should start
 *      @csum_offset: Offset from csum_start where checksum should be stored
 *      @priority: Packet queueing priority
 *      @ignore_df: allow local fragmentation
 *      @cloned: Head may be cloned (check refcnt to be sure)
 *      @ip_summed: Driver fed us an IP checksum
 *      @nohdr: Payload reference only, must not modify header
 *      @pkt_type: Packet class
 *      @fclone: skbuff clone status
 *      @ipvs_property: skbuff is owned by ipvs
 *      @offload_fwd_mark: Packet was L2-forwarded in hardware
 *      @offload_l3_fwd_mark: Packet was L3-forwarded in hardware
 *      @tc_skip_classify: do not classify packet. set by IFB device
 *      @tc_at_ingress: used within tc_classify to distinguish in/egress
 *      @tc_redirected: packet was redirected by a tc action
 *      @tc_from_ingress: if tc_redirected, tc_at_ingress at time of redirect
 *      @peeked: this packet has been seen already, so stats have been
 *              done for it, don't do them again
 *      @nf_trace: netfilter packet trace flag
 *      @protocol: Packet protocol from driver
 *      @destructor: Destruct function
 *      @tcp_tsorted_anchor: list structure for TCP (tp->tsorted_sent_queue)
 *      @_nfct: Associated connection, if any (with nfctinfo bits)
 *      @nf_bridge: Saved data about a bridged frame - see br_netfilter.c
 *      @skb_iif: ifindex of device we arrived on
 *      @tc_index: Traffic control index
 *      @hash: the packet hash
 *      @queue_mapping: Queue mapping for multiqueue devices
 *      @pfmemalloc: skbuff was allocated from PFMEMALLOC reserves
 *      @active_extensions: active extensions (skb_ext_id types)
 *      @ndisc_nodetype: router type (from link layer)
 *      @ooo_okay: allow the mapping of a socket to a queue to be changed
 *      @l4_hash: indicate hash is a canonical 4-tuple hash over transport
 *              ports.
 *      @sw_hash: indicates hash was computed in software stack
 *      @wifi_acked_valid: wifi_acked was set
 *      @wifi_acked: whether frame was acked on wifi or not
 *      @no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
 *      @csum_not_inet: use CRC32c to resolve CHECKSUM_PARTIAL
 *      @dst_pending_confirm: need to confirm neighbour
 *      @decrypted: Decrypted SKB
 *      @napi_id: id of the NAPI struct this skb came from
 *      @secmark: security marking
 *      @mark: Generic packet mark
 *      @vlan_proto: vlan encapsulation protocol
 *      @vlan_tci: vlan tag control information
 *      @inner_protocol: Protocol (encapsulation)
 *      @inner_transport_header: Inner transport layer header (encapsulation)
 *      @inner_network_header: Network layer header (encapsulation)
 *      @inner_mac_header: Link layer header (encapsulation)
 *      @transport_header: Transport layer header
 *      @network_header: Network layer header
 *      @mac_header: Link layer header
 *      @tail: Tail pointer
 *      @end: End pointer
 *      @head: Head of buffer
 *      @data: Data head pointer
 *      @truesize: Buffer size
 *      @users: User count - see {datagram,tcp}.c
 *      @extensions: allocated extensions, valid if active_extensions is nonzero
 */
struct sk_buff {
        union {
                struct {
                        /* These two members must be first. */
                        struct sk_buff          *next;
                        struct sk_buff          *prev;

                        union {
                                struct net_device       *dev;
                                /* Some protocols might use this space to store information,
                                 * while device pointer would be NULL.
                                 * UDP receive path is one user.
                                 */
                                unsigned long           dev_scratch;
                        };
                };
                struct rb_node          rbnode; /* used in netem, ip4 defrag, and tcp stack */
                struct list_head        list;
        };

        union {
                struct sock             *sk;
                int                     ip_defrag_offset;
        };

        union {
                ktime_t         tstamp;
                u64             skb_mstamp_ns; /* earliest departure time */
        };
        /*
         * This is the control buffer. It is free to use for every
         * layer. Please put your private variables there. If you
         * want to keep them across layers you have to do a skb_clone()
         * first. This is owned by whoever has the skb queued ATM.
         */
        char                    cb[48] __aligned(8);

        union {
                struct {
                        unsigned long   _skb_refdst;
                        void            (*destructor)(struct sk_buff *skb);
                };
                struct list_head        tcp_tsorted_anchor;
        };
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
        unsigned long            _nfct;
#endif
        unsigned int            len,
                                data_len;
        __u16                   mac_len,
                                hdr_len;

        /* Following fields are _not_ copied in __copy_skb_header()
         * Note that queue_mapping is here mostly to fill a hole.
         */
        __u16                   queue_mapping;

/* if you move cloned around you also must adapt those constants */
#ifdef __BIG_ENDIAN_BITFIELD
#define CLONED_MASK     (1 << 7)
#else
#define CLONED_MASK     1
#endif
#define CLONED_OFFSET()         offsetof(struct sk_buff, __cloned_offset)

        __u8                    __cloned_offset[0];
        __u8                    cloned:1,
                                nohdr:1,
                                fclone:2,
                                peeked:1,
                                head_frag:1,
                                pfmemalloc:1;
#ifdef CONFIG_SKB_EXTENSIONS
        __u8                    active_extensions;
#endif
        /* fields enclosed in headers_start/headers_end are copied
         * using a single memcpy() in __copy_skb_header()
         */
        /* private: */
        __u32                   headers_start[0];
        /* public: */
/* if you move pkt_type around you also must adapt those constants */
#ifdef __BIG_ENDIAN_BITFIELD
#define PKT_TYPE_MAX    (7 << 5)
#else
#define PKT_TYPE_MAX    7
#endif
#define PKT_TYPE_OFFSET()       offsetof(struct sk_buff, __pkt_type_offset)

        __u8                    __pkt_type_offset[0];
        __u8                    pkt_type:3;
        __u8                    ignore_df:1;
        __u8                    nf_trace:1;
        __u8                    ip_summed:2;
        __u8                    ooo_okay:1;

        __u8                    l4_hash:1;
        __u8                    sw_hash:1;
        __u8                    wifi_acked_valid:1;
        __u8                    wifi_acked:1;
        __u8                    no_fcs:1;
        /* Indicates the inner headers are valid in the skbuff. */
        __u8                    encapsulation:1;
        __u8                    encap_hdr_csum:1;
        __u8                    csum_valid:1;

#ifdef __BIG_ENDIAN_BITFIELD
#define PKT_VLAN_PRESENT_BIT    7
#else
#define PKT_VLAN_PRESENT_BIT    0
#endif
#define PKT_VLAN_PRESENT_OFFSET()       offsetof(struct sk_buff, __pkt_vlan_present_offset)
        __u8                    __pkt_vlan_present_offset[0];
        __u8                    vlan_present:1;
        __u8                    csum_complete_sw:1;
        __u8                    csum_level:2;
        __u8                    csum_not_inet:1;
        __u8                    dst_pending_confirm:1;
#ifdef CONFIG_IPV6_NDISC_NODETYPE
        __u8                    ndisc_nodetype:2;
#endif

        __u8                    ipvs_property:1;
        __u8                    inner_protocol_type:1;
        __u8                    remcsum_offload:1;
#ifdef CONFIG_NET_SWITCHDEV
        __u8                    offload_fwd_mark:1;
        __u8                    offload_l3_fwd_mark:1;
#endif
#ifdef CONFIG_NET_CLS_ACT
        __u8                    tc_skip_classify:1;
        __u8                    tc_at_ingress:1;
        __u8                    tc_redirected:1;
        __u8                    tc_from_ingress:1;
#endif
#ifdef CONFIG_TLS_DEVICE
        __u8                    decrypted:1;
#endif

#ifdef CONFIG_NET_SCHED
        __u16                   tc_index;       /* traffic control index */
#endif

        union {
                __wsum          csum;
                struct {
                        __u16   csum_start;
                        __u16   csum_offset;
                };
        };
        __u32                   priority;
        int                     skb_iif;
        __u32                   hash;
        __be16                  vlan_proto;
        __u16                   vlan_tci;
#if defined(CONFIG_NET_RX_BUSY_POLL) || defined(CONFIG_XPS)
        union {
                unsigned int    napi_id;
                unsigned int    sender_cpu;
        };
#endif
#ifdef CONFIG_NETWORK_SECMARK
        __u32           secmark;
#endif
        union {
                __u32           mark;
                __u32           reserved_tailroom;
        };

        union {
                __be16          inner_protocol;
                __u8            inner_ipproto;
        };

        __u16                   inner_transport_header;
        __u16                   inner_network_header;
        __u16                   inner_mac_header;

        __be16                  protocol;
        __u16                   transport_header;
        __u16                   network_header;
        __u16                   mac_header;

        /* private: */
        __u32                   headers_end[0];
        /* public: */

        /* These elements must be at the end, see alloc_skb() for details.  */
        sk_buff_data_t          tail;
        sk_buff_data_t          end;
        unsigned char           *head,
                                *data;
        unsigned int            truesize;
        refcount_t              users;

#ifdef CONFIG_SKB_EXTENSIONS
        /* only useable after checking ->active_extensions != 0 */
        struct skb_ext          *extensions;
#endif
};

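The structure has too many members to cover them all; a selection is described here:

  • next: points to the next sk_buff in the doubly linked list;
  • prev: points to the previous sk_buff in the doubly linked list;
  • len: size of the data in the buffer. It includes the data in the main buffer (pointed to by head) as well as the data in any fragments. It changes as the packet moves up or down the protocol stack, because headers are removed or added;
  • data_len: size of the data held in fragments;
  • mac_len: length of the link-layer header;
  • hdr_len: writable header length of a cloned skb;
  • priority: queueing priority of this sk_buff;
  • skb_iif: ifindex of the device the packet arrived on;
  • hash: the packet hash;
  • vlan_proto: VLAN encapsulation protocol;
  • vlan_tci: VLAN tag control information;
  • protocol: holds the upper-layer protocol type, which can be obtained with eth_type_trans();
  • transport_header: offset of the transport-layer header;
  • network_header: offset of the network-layer header;
  • mac_header: offset of the link-layer header;
  • head: points to the start of the allocated data buffer;
  • end: points to the end of the allocated data buffer;
  • data: points to the start of the actual data;
  • tail: points to the end of the actual data;

As the members show, sk_buff can be used to build a doubly linked list. head and end point to the start and end of the allocated data buffer, while data and tail point to the start and end of the actual data.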
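
How head, data, tail and end move as headers are added and removed can be shown with a small user-space model. This is a simplified sketch, not the kernel implementation: `mini_skb` and its functions are invented names whose behavior imitates the kernel helpers skb_reserve(), skb_put(), skb_push() and skb_pull(), using plain pointers for all four fields and ignoring fragments, cloning and reference counting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal model of the four buffer pointers of an sk_buff. */
struct mini_skb {
    unsigned char *head;  /* start of the allocated buffer */
    unsigned char *data;  /* start of the actual data */
    unsigned char *tail;  /* end of the actual data */
    unsigned char *end;   /* end of the allocated buffer */
    unsigned int   len;   /* bytes between data and tail */
};

/* Allocate an empty buffer: data and tail both start at head. */
int mini_skb_alloc(struct mini_skb *skb, size_t size)
{
    skb->head = malloc(size);
    if (skb->head == NULL)
        return -1;
    skb->data = skb->head;
    skb->tail = skb->head;
    skb->end  = skb->head + size;
    skb->len  = 0;
    return 0;
}

void mini_skb_free(struct mini_skb *skb)
{
    free(skb->head);
    skb->head = skb->data = skb->tail = skb->end = NULL;
    skb->len = 0;
}

/* Like skb_reserve(): leave headroom for headers in an empty buffer. */
void mini_skb_reserve(struct mini_skb *skb, size_t n)
{
    assert(skb->len == 0 && (size_t)(skb->end - skb->tail) >= n);
    skb->data += n;
    skb->tail += n;
}

/* Like skb_put(): append n bytes at the tail; returns the old tail,
 * where the caller writes the new data. */
unsigned char *mini_skb_put(struct mini_skb *skb, size_t n)
{
    unsigned char *old_tail = skb->tail;

    assert((size_t)(skb->end - skb->tail) >= n);
    skb->tail += n;
    skb->len  += (unsigned int)n;
    return old_tail;
}

/* Like skb_push(): prepend n bytes (add a header on the way down). */
unsigned char *mini_skb_push(struct mini_skb *skb, size_t n)
{
    assert((size_t)(skb->data - skb->head) >= n);
    skb->data -= n;
    skb->len  += (unsigned int)n;
    return skb->data;
}

/* Like skb_pull(): strip n bytes from the front (remove a header on
 * the way up). */
unsigned char *mini_skb_pull(struct mini_skb *skb, size_t n)
{
    assert(skb->len >= n);
    skb->data += n;
    skb->len  -= (unsigned int)n;
    return skb->data;
}
```

On the send path, a caller would reserve headroom for all headers, put the payload, then push the transport, network and link-layer headers one by one; on the receive path, each layer pulls its own header before handing the buffer up.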

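2.2.2 struct net_device

The Linux kernel uses the net_device structure to describe a network device; it is the most important structure in the network device interface layer. It describes not only interface information but also hardware information.

For the network device interface layer, we only need to fill in the net_device structure, register it with the kernel, set up the hardware-related operations, enable interrupt handling and so on, in order to hook the hardware operation functions into the kernel.

net_device is defined in include/linux/netdevice.h as follows: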

/**
 *      struct net_device - The DEVICE structure.
 *
 *      Actually, this whole structure is a big mistake.  It mixes I/O
 *      data with strictly "high-level" data, and it has to know about
 *      almost every data structure used in the INET module.
 *
 *      @name:  This is the first field of the "visible" part of this structure
 *              (i.e. as seen by users in the "Space.c" file).  It is the name
 *              of the interface.
 *
 *      @name_hlist:    Device name hash chain, please keep it close to name[]
 *      @ifalias:       SNMP alias
 *      @mem_end:       Shared memory end
 *      @mem_start:     Shared memory start
 *      @base_addr:     Device I/O address
 *      @irq:           Device IRQ number
 *
 *      @state:         Generic network queuing layer state, see netdev_state_t
 *      @dev_list:      The global list of network devices
 *      @napi_list:     List entry used for polling NAPI devices
 *      @unreg_list:    List entry  when we are unregistering the
 *                      device; see the function unregister_netdev
 *      @close_list:    List entry used when we are closing the device
 *      @ptype_all:     Device-specific packet handlers for all protocols
 *      @ptype_specific: Device-specific, protocol-specific packet handlers
 *
 *      @adj_list:      Directly linked devices, like slaves for bonding
 *      @features:      Currently active device features
 *      @hw_features:   User-changeable features
 *
 *      @wanted_features:       User-requested features
 *      @vlan_features:         Mask of features inheritable by VLAN devices
 *
 *      @hw_enc_features:       Mask of features inherited by encapsulating devices
 *                              This field indicates what encapsulation
 *                              offloads the hardware is capable of doing,
 *                              and drivers will need to set them appropriately.
 *
 *      @mpls_features: Mask of features inheritable by MPLS
 *
 *      @ifindex:       interface index
 *      @group:         The group the device belongs to
 *
 *      @stats:         Statistics struct, which was left as a legacy, use
 *                      rtnl_link_stats64 instead
 *
 *      @rx_dropped:    Dropped packets by core network,
 *                      do not use this in drivers
 *      @tx_dropped:    Dropped packets by core network,
 *                      do not use this in drivers
 *      @rx_nohandler:  nohandler dropped packets by core network on
 *                      inactive devices, do not use this in drivers
 *      @carrier_up_count:      Number of times the carrier has been up
 *      @carrier_down_count:    Number of times the carrier has been down
 *
 *      @wireless_handlers:     List of functions to handle Wireless Extensions,
 *                              instead of ioctl,
 *                              see <net/iw_handler.h> for details.
 *      @wireless_data: Instance data managed by the core of wireless extensions
 *
 *      @netdev_ops:    Includes several pointers to callbacks,
 *                      if one wants to override the ndo_*() functions
 *      @ethtool_ops:   Management operations
 *      @ndisc_ops:     Includes callbacks for different IPv6 neighbour
 *                      discovery handling. Necessary for e.g. 6LoWPAN.
 *      @header_ops:    Includes callbacks for creating,parsing,caching,etc
 *                      of Layer 2 headers.
 *
 *      @flags:         Interface flags (a la BSD)
 *      @priv_flags:    Like 'flags' but invisible to userspace,
 *                      see if.h for the definitions
 *      @gflags:        Global flags ( kept as legacy )
 *      @padded:        How much padding added by alloc_netdev()
 *      @operstate:     RFC2863 operstate
 *      @link_mode:     Mapping policy to operstate
 *      @if_port:       Selectable AUI, TP, ...
 *      @dma:           DMA channel
 *      @mtu:           Interface MTU value
 *      @min_mtu:       Interface Minimum MTU value
 *      @max_mtu:       Interface Maximum MTU value
 *      @type:          Interface hardware type
 *      @hard_header_len: Maximum hardware header length.
 *      @min_header_len:  Minimum hardware header length
 *
 *      @needed_headroom: Extra headroom the hardware may need, but not in all
 *                        cases can this be guaranteed
 *      @needed_tailroom: Extra tailroom the hardware may need, but not in all
 *                        cases can this be guaranteed. Some cases also use
 *                        LL_MAX_HEADER instead to allocate the skb
 *
 *      interface address info:
 *
 *      @perm_addr:             Permanent hw address
 *      @addr_assign_type:      Hw address assignment type
 *      @addr_len:              Hardware address length
 *      @neigh_priv_len:        Used in neigh_alloc()
 *      @dev_id:                Used to differentiate devices that share
 *                              the same link layer address
 *      @dev_port:              Used to differentiate devices that share
 *                              the same function
 *      @addr_list_lock:        XXX: need comments on this one
 *      @uc_promisc:            Counter that indicates promiscuous mode
 *                              has been enabled due to the need to listen to
 *                              additional unicast addresses in a device that
 *                              does not implement ndo_set_rx_mode()
 *      @uc:                    unicast mac addresses
 *      @mc:                    multicast mac addresses
 *      @dev_addrs:             list of device hw addresses
 *      @queues_kset:           Group of all Kobjects in the Tx and RX queues
 *      @promiscuity:           Number of times the NIC is told to work in
 *                              promiscuous mode; if it becomes 0 the NIC will
 *                              exit promiscuous mode
 *      @allmulti:              Counter, enables or disables allmulticast mode
 *
 *      @vlan_info:     VLAN info
 *      @dsa_ptr:       dsa specific data
 *      @tipc_ptr:      TIPC specific data
 *      @atalk_ptr:     AppleTalk link
 *      @ip_ptr:        IPv4 specific data
 *      @dn_ptr:        DECnet specific data
 *      @ip6_ptr:       IPv6 specific data
 *      @ax25_ptr:      AX.25 specific data
 *      @ieee80211_ptr: IEEE 802.11 specific data, assign before registering
 *
 *      @dev_addr:      Hw address (before bcast,
 *                      because most packets are unicast)
 *
 *      @_rx:                   Array of RX queues
 *      @num_rx_queues:         Number of RX queues
 *                              allocated at register_netdev() time
 *      @real_num_rx_queues:    Number of RX queues currently active in device
 *
 *      @rx_handler:            handler for received packets
 *      @rx_handler_data:       XXX: need comments on this one
 *      @miniq_ingress:         ingress/clsact qdisc specific data for
 *                              ingress processing
 *      @ingress_queue:         XXX: need comments on this one
 *      @broadcast:             hw bcast address
 *
 *      @rx_cpu_rmap:   CPU reverse-mapping for RX completion interrupts,
 *                      indexed by RX queue number. Assigned by driver.
 *                      This must only be set if the ndo_rx_flow_steer
 *                      operation is defined
 *      @index_hlist:           Device index hash chain
 *
 *      @_tx:                   Array of TX queues
 *      @num_tx_queues:         Number of TX queues allocated at alloc_netdev_mq() time
 *      @real_num_tx_queues:    Number of TX queues currently active in device
 *      @qdisc:                 Root qdisc from userspace point of view
 *      @tx_queue_len:          Max frames per queue allowed
 *      @tx_global_lock:        XXX: need comments on this one
 *
 *      @xps_maps:      XXX: need comments on this one
 *      @miniq_egress:          clsact qdisc specific data for
 *                              egress processing
 *      @watchdog_timeo:        Represents the timeout that is used by
 *                              the watchdog (see dev_watchdog())
 *      @watchdog_timer:        List of timers
 *
 *      @pcpu_refcnt:           Number of references to this device
 *      @todo_list:             Delayed register/unregister
 *      @link_watch_list:       XXX: need comments on this one
 *
 *      @reg_state:             Register/unregister state machine
 *      @dismantle:             Device is going to be freed
 *      @rtnl_link_state:       This enum represents the phases of creating
 *                              a new link
 *
 *      @needs_free_netdev:     Should unregister perform free_netdev?
 *      @priv_destructor:       Called from unregister
 *      @npinfo:                XXX: need comments on this one
 *      @nd_net:                Network namespace this network device is inside
 *
 *      @ml_priv:       Mid-layer private
 *      @lstats:        Loopback statistics
 *      @tstats:        Tunnel statistics
 *      @dstats:        Dummy statistics
 *      @vstats:        Virtual ethernet statistics
 *
 *      @garp_port:     GARP
 *      @mrp_port:      MRP
 *
 *      @dev:           Class/net/name entry
 *      @sysfs_groups:  Space for optional device, statistics and wireless
 *                      sysfs groups
 *
 *      @sysfs_rx_queue_group:  Space for optional per-rx queue attributes
 *      @rtnl_link_ops: Rtnl_link_ops
 *
 *      @gso_max_size:  Maximum size of generic segmentation offload
 *      @gso_max_segs:  Maximum number of segments that can be passed to the
 *                      NIC for GSO
 *
 *      @dcbnl_ops:     Data Center Bridging netlink ops
 *      @num_tc:        Number of traffic classes in the net device
 *      @tc_to_txq:     XXX: need comments on this one
 *      @prio_tc_map:   XXX: need comments on this one
 *
 *      @fcoe_ddp_xid:  Max exchange id for FCoE LRO by ddp
 *
 *      @priomap:       XXX: need comments on this one
 *      @phydev:        Physical device may attach itself
 *                      for hardware timestamping
 *      @sfp_bus:       attached &struct sfp_bus structure.
 *
 *      @qdisc_tx_busylock: lockdep class annotating Qdisc->busylock spinlock
 *      @qdisc_running_key: lockdep class annotating Qdisc->running seqcount
 *
 *      @proto_down:    protocol port state information can be sent to the
 *                      switch driver and used to set the phys state of the
 *                      switch port.
 *
 *      @wol_enabled:   Wake-on-LAN is enabled
 *
 *      FIXME: cleanup struct net_device such that network protocol info
 *      moves out.
 */
struct net_device {
        char                    name[IFNAMSIZ];
        struct hlist_node       name_hlist;
        struct dev_ifalias      __rcu *ifalias;
        /*
         *      I/O specific fields
         *      FIXME: Merge these and struct ifmap into one
         */
        unsigned long           mem_end;
        unsigned long           mem_start;
        unsigned long           base_addr;
        int                     irq;

        /*
         *      Some hardware also needs these fields (state,dev_list,
         *      napi_list,unreg_list,close_list) but they are not
         *      part of the usual set specified in Space.c.
         */

        unsigned long           state;

        struct list_head        dev_list;
        struct list_head        napi_list;
        struct list_head        unreg_list;
        struct list_head        close_list;
        struct list_head        ptype_all;
        struct list_head        ptype_specific;

        struct {
                struct list_head upper;
                struct list_head lower;
        } adj_list;

        netdev_features_t       features;
        netdev_features_t       hw_features;
        netdev_features_t       wanted_features;
        netdev_features_t       vlan_features;
        netdev_features_t       hw_enc_features;
        netdev_features_t       mpls_features;
        netdev_features_t       gso_partial_features;

        int                     ifindex;
        int                     group;

        struct net_device_stats stats;

        atomic_long_t           rx_dropped;
        atomic_long_t           tx_dropped;
        atomic_long_t           rx_nohandler;

        /* Stats to monitor link on/off, flapping */
        atomic_t                carrier_up_count;
        atomic_t                carrier_down_count;

#ifdef CONFIG_WIRELESS_EXT
        const struct iw_handler_def *wireless_handlers;
        struct iw_public_data   *wireless_data;
#endif
        const struct net_device_ops *netdev_ops;
        const struct ethtool_ops *ethtool_ops;
#ifdef CONFIG_NET_L3_MASTER_DEV
        const struct l3mdev_ops *l3mdev_ops;
#endif
#if IS_ENABLED(CONFIG_IPV6)
        const struct ndisc_ops *ndisc_ops;
#endif

#ifdef CONFIG_XFRM_OFFLOAD
        const struct xfrmdev_ops *xfrmdev_ops;
#endif

#if IS_ENABLED(CONFIG_TLS_DEVICE)
        const struct tlsdev_ops *tlsdev_ops;
#endif

        const struct header_ops *header_ops;

        unsigned int            flags;
        unsigned int            priv_flags;

        unsigned short          gflags;
        unsigned short          padded;

        unsigned char           operstate;
        unsigned char           link_mode;

        unsigned char           if_port;
        unsigned char           dma;

        unsigned int            mtu;
        unsigned int            min_mtu;
        unsigned int            max_mtu;
        unsigned short          type;
        unsigned short          hard_header_len;
        unsigned char           min_header_len;

        unsigned short          needed_headroom;
        unsigned short          needed_tailroom;

        /* Interface address info. */
        unsigned char           perm_addr[MAX_ADDR_LEN];
        unsigned char           addr_assign_type;
        unsigned char           addr_len;
        unsigned short          neigh_priv_len;
        unsigned short          dev_id;
        unsigned short          dev_port;
        spinlock_t              addr_list_lock;
        unsigned char           name_assign_type;
        bool                    uc_promisc;
        struct netdev_hw_addr_list      uc;
        struct netdev_hw_addr_list      mc;
        struct netdev_hw_addr_list      dev_addrs;
#ifdef CONFIG_SYSFS
        struct kset             *queues_kset;
#endif
        unsigned int            promiscuity;
        unsigned int            allmulti;


        /* Protocol-specific pointers */

#if IS_ENABLED(CONFIG_VLAN_8021Q)
        struct vlan_info __rcu  *vlan_info;
#endif
#if IS_ENABLED(CONFIG_NET_DSA)
        struct dsa_port         *dsa_ptr;
#endif
#if IS_ENABLED(CONFIG_TIPC)
        struct tipc_bearer __rcu *tipc_ptr;
#endif
#if IS_ENABLED(CONFIG_IRDA) || IS_ENABLED(CONFIG_ATALK)
        void                    *atalk_ptr;
#endif
        struct in_device __rcu  *ip_ptr;
#if IS_ENABLED(CONFIG_DECNET)
        struct dn_dev __rcu     *dn_ptr;
#endif
        struct inet6_dev __rcu  *ip6_ptr;
#if IS_ENABLED(CONFIG_AX25)
        void                    *ax25_ptr;
#endif
        struct wireless_dev     *ieee80211_ptr;
        struct wpan_dev         *ieee802154_ptr;
#if IS_ENABLED(CONFIG_MPLS_ROUTING)
        struct mpls_dev __rcu   *mpls_ptr;
#endif

/*
 * Cache lines mostly used on receive path (including eth_type_trans())
 */
        /* Interface address info used in eth_type_trans() */
        unsigned char           *dev_addr;

        struct netdev_rx_queue  *_rx;
        unsigned int            num_rx_queues;
        unsigned int            real_num_rx_queues;

        struct bpf_prog __rcu   *xdp_prog;
        unsigned long           gro_flush_timeout;
        rx_handler_func_t __rcu *rx_handler;
        void __rcu              *rx_handler_data;

#ifdef CONFIG_NET_CLS_ACT
        struct mini_Qdisc __rcu *miniq_ingress;
#endif
        struct netdev_queue __rcu *ingress_queue;
#ifdef CONFIG_NETFILTER_INGRESS
        struct nf_hook_entries __rcu *nf_hooks_ingress;
#endif

        unsigned char           broadcast[MAX_ADDR_LEN];
#ifdef CONFIG_RFS_ACCEL
        struct cpu_rmap         *rx_cpu_rmap;
#endif
        struct hlist_node       index_hlist;

/*
 * Cache lines mostly used on transmit path
 */
        struct netdev_queue     *_tx ____cacheline_aligned_in_smp;
        unsigned int            num_tx_queues;
        unsigned int            real_num_tx_queues;
        struct Qdisc            *qdisc;
#ifdef CONFIG_NET_SCHED
        DECLARE_HASHTABLE       (qdisc_hash, 4);
#endif
        unsigned int            tx_queue_len;
        spinlock_t              tx_global_lock;
        int                     watchdog_timeo;

#ifdef CONFIG_XPS
        struct xps_dev_maps __rcu *xps_cpus_map;
        struct xps_dev_maps __rcu *xps_rxqs_map;
#endif
#ifdef CONFIG_NET_CLS_ACT
        struct mini_Qdisc __rcu *miniq_egress;
#endif

        /* These may be needed for future network-power-down code. */
        struct timer_list       watchdog_timer;

        int __percpu            *pcpu_refcnt;
        struct list_head        todo_list;

        struct list_head        link_watch_list;

        enum { NETREG_UNINITIALIZED=0,
               NETREG_REGISTERED,       /* completed register_netdevice */
               NETREG_UNREGISTERING,    /* called unregister_netdevice */
               NETREG_UNREGISTERED,     /* completed unregister todo */
               NETREG_RELEASED,         /* called free_netdev */
               NETREG_DUMMY,            /* dummy device for NAPI poll */
        } reg_state:8;

        bool dismantle;

        enum {
                RTNL_LINK_INITIALIZED,
                RTNL_LINK_INITIALIZING,
        } rtnl_link_state:16;

        bool needs_free_netdev;
        void (*priv_destructor)(struct net_device *dev);

#ifdef CONFIG_NETPOLL
        struct netpoll_info __rcu       *npinfo;
#endif

        possible_net_t                  nd_net;

        /* mid-layer private */
        union {
                void                                    *ml_priv;
                struct pcpu_lstats __percpu             *lstats;
                struct pcpu_sw_netstats __percpu        *tstats;
                struct pcpu_dstats __percpu             *dstats;
        };

#if IS_ENABLED(CONFIG_GARP)
        struct garp_port __rcu  *garp_port;
#endif
#if IS_ENABLED(CONFIG_MRP)
        struct mrp_port __rcu   *mrp_port;
#endif

        struct device           dev;
        const struct attribute_group *sysfs_groups[4];
        const struct attribute_group *sysfs_rx_queue_group;

        const struct rtnl_link_ops *rtnl_link_ops;

        /* for setting kernel sock attribute on TCP connection setup */
#define GSO_MAX_SIZE            65536
        unsigned int            gso_max_size;
#define GSO_MAX_SEGS            65535
        u16                     gso_max_segs;

#ifdef CONFIG_DCB
        const struct dcbnl_rtnl_ops *dcbnl_ops;
#endif
        s16                     num_tc;
        struct netdev_tc_txq    tc_to_txq[TC_MAX_QUEUE];
        u8                      prio_tc_map[TC_BITMASK + 1];

#if IS_ENABLED(CONFIG_FCOE)
        unsigned int            fcoe_ddp_xid;
#endif
#if IS_ENABLED(CONFIG_CGROUP_NET_PRIO)
        struct netprio_map __rcu *priomap;
#endif
        struct phy_device       *phydev;
        struct sfp_bus          *sfp_bus;
        struct lock_class_key   *qdisc_tx_busylock;
        struct lock_class_key   *qdisc_running_key;
        bool                    proto_down;
        unsigned                wol_enabled:1;
};

This structure has far too many members to cover in full, so only a selection is described here:

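  • name: name of the network device;
  • mem_end: end address of the device's memory region;
  • mem_start: start address of the device's memory region;
  • base_addr: I/O base address of the device;
  • irq: interrupt number of the device;
  • if_port: port type in use on a multi-port device;
  • dma: DMA channel of the device;
  • ndo_get_stats: returns the traffic statistics as a net_device_stats structure. In this kernel version the hook is a member of net_device_ops (reached through netdev_ops), not of net_device itself; the counters shown by ifconfig are ultimately obtained through the statistics hooks;
  • stats: the net_device_stats structure that holds the statistics;
  • features: feature flags of the interface;
  • flags: network interface flags, named with the IFF_ (Interface Flags) prefix. Common values include IFF_UP (set by the kernel when the device is activated and can start transmitting packets), IFF_AUTOMEDIA (the device can switch between several media types), IFF_BROADCAST (broadcast is allowed), IFF_DEBUG (debug mode, which can be used to control the verbosity of printk calls), IFF_LOOPBACK (loopback interface), IFF_MULTICAST (multicast is allowed), IFF_NOARP (the interface cannot perform ARP; a point-to-point interface, for example, does not need to run ARP) and IFF_POINTOPOINT (the interface is connected to a point-to-point link);
  • mtu: maximum transmission unit, i.e. the largest packet the interface can carry;
  • type: hardware type of the interface;
  • hard_header_len: length of the hardware (link-layer) frame header, normally set to ETH_HLEN, i.e. 14, for Ethernet;
  • perm_addr: permanent hardware (MAC) address of the device;
  • dev_addr: hardware (MAC) address currently used by the device;
  • netdev_ops: set of operation functions of the network device;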
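
Several of the fields listed above can be observed from user space. The following is a minimal sketch, assuming a Linux system and an Ethernet-style 6-byte address: SIOCGIFFLAGS returns a value largely derived from flags, SIOCGIFMTU returns mtu, and SIOCGIFHWADDR returns the current address (dev_addr, not perm_addr). The names iface_info, query_iface and iff_flags_to_str are hypothetical helpers introduced only for this illustration.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

/* Hypothetical container for the values read back from the kernel. */
struct iface_info {
    unsigned int  flags;     /* IFF_* bits, derived from net_device.flags */
    int           mtu;       /* net_device.mtu */
    unsigned char hwaddr[6]; /* net_device.dev_addr (assumes 6-byte MAC) */
};

/* Query one interface by name; returns 0 on success, -1 on any failure. */
int query_iface(const char *name, struct iface_info *info)
{
    struct ifreq ifr;
    int ret = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

    if (ioctl(fd, SIOCGIFFLAGS, &ifr) < 0)
        goto done;
    info->flags = (unsigned short)ifr.ifr_flags;

    if (ioctl(fd, SIOCGIFMTU, &ifr) < 0)
        goto done;
    info->mtu = ifr.ifr_mtu;

    if (ioctl(fd, SIOCGIFHWADDR, &ifr) < 0)
        goto done;
    memcpy(info->hwaddr, ifr.ifr_hwaddr.sa_data, sizeof(info->hwaddr));
    ret = 0;
done:
    close(fd);
    return ret;
}

/* Render the IFF_* bits discussed above as a space-separated string. */
const char *iff_flags_to_str(unsigned int flags, char *buf, size_t len)
{
    static const struct { unsigned int bit; const char *name; } names[] = {
        { IFF_UP, "UP" },                   { IFF_BROADCAST, "BROADCAST" },
        { IFF_DEBUG, "DEBUG" },             { IFF_LOOPBACK, "LOOPBACK" },
        { IFF_POINTOPOINT, "POINTOPOINT" }, { IFF_NOARP, "NOARP" },
        { IFF_MULTICAST, "MULTICAST" },     { IFF_AUTOMEDIA, "AUTOMEDIA" },
    };
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (!(flags & names[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* buffer full: keep what fits and stop */
        used += (size_t)n;
    }
    return buf;
}
```

A caller would typically declare a struct iface_info and a character buffer, call query_iface("lo", &info), and, if it returns 0, print info.mtu together with iff_flags_to_str(info.flags, buf, sizeof(buf)). The values returned depend on the system the program runs on.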

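The net_device structure contains the netdev_ops pointer, which points to a net_device_ops structure holding the methods that operate the hardware.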
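
The indirection through netdev_ops can be modelled in plain user-space C. The sketch below is a simplified model, not kernel code: every mock_* and demo_* name is hypothetical, the structures keep only three hooks, and the transmit hook takes a raw buffer instead of a struct sk_buff. It only shows how generic code calls into a driver through a table of function pointers.

```c
#include <stdio.h>
#include <stddef.h>

struct mock_net_device;

/* Simplified stand-in for struct net_device_ops (three hooks only). */
struct mock_net_device_ops {
    int (*ndo_open)(struct mock_net_device *dev);
    int (*ndo_stop)(struct mock_net_device *dev);
    int (*ndo_start_xmit)(const void *frame, size_t len,
                          struct mock_net_device *dev);
};

/* Simplified stand-in for struct net_device. */
struct mock_net_device {
    char name[16];
    int up;                     /* 1 after a successful open */
    unsigned long tx_packets;
    unsigned long tx_bytes;
    const struct mock_net_device_ops *netdev_ops;
};

/* "Driver" side: hardware-specific methods. */
int demo_open(struct mock_net_device *dev)
{
    dev->up = 1;
    return 0;
}

int demo_stop(struct mock_net_device *dev)
{
    dev->up = 0;
    return 0;
}

int demo_start_xmit(const void *frame, size_t len,
                    struct mock_net_device *dev)
{
    (void)frame;                /* a real driver would hand this to hardware */
    dev->tx_packets++;
    dev->tx_bytes += len;
    return 0;
}

const struct mock_net_device_ops demo_netdev_ops = {
    .ndo_open       = demo_open,
    .ndo_stop       = demo_stop,
    .ndo_start_xmit = demo_start_xmit,
};

/* "Core" side: generic code that only knows the ops table. */
int mock_dev_open(struct mock_net_device *dev)
{
    if (dev->netdev_ops && dev->netdev_ops->ndo_open)
        return dev->netdev_ops->ndo_open(dev);  /* optional hook */
    return 0;
}

int mock_dev_xmit(struct mock_net_device *dev, const void *frame, size_t len)
{
    if (!dev->up)
        return -1;              /* device is down: refuse to transmit */
    /* The transmit hook is treated as mandatory in this model. */
    return dev->netdev_ops->ndo_start_xmit(frame, len, dev);
}
```

The point of the design is that the generic layer never names a concrete driver function; swapping in a different ops table changes the hardware behaviour without touching the callers.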

2.2.3 struct net_device_ops

The net_device_ops structure describes the set of operation functions of a network device. It is defined in include/linux/netdevice.h as follows:

/*
 * This structure defines the management hooks for network devices.
 * The following hooks can be defined; unless noted otherwise, they are
 * optional and can be filled with a null pointer.
 *
 * int (*ndo_init)(struct net_device *dev);
 *     This function is called once when a network device is registered.
 *     The network device can use this for any late stage initialization
 *     or semantic validation. It can fail with an error code which will
 *     be propagated back to register_netdev.
 *
 * void (*ndo_uninit)(struct net_device *dev);
 *     This function is called when device is unregistered or when registration
 *     fails. It is not called if init fails.
 *
 * int (*ndo_open)(struct net_device *dev);
 *     This function is called when a network device transitions to the up
 *     state.
 *
 * int (*ndo_stop)(struct net_device *dev);
 *     This function is called when a network device transitions to the down
 *     state.
 *
 * netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb,
 *                               struct net_device *dev);
 *    Called when a packet needs to be transmitted.
 *    Returns NETDEV_TX_OK.  Can return NETDEV_TX_BUSY, but you should stop
 *    the queue before that can happen; it's for obsolete devices and weird
 *    corner cases, but the stack really does a non-trivial amount
 *    of useless work if you return NETDEV_TX_BUSY.
 *    Required; cannot be NULL.
 *
 * netdev_features_t (*ndo_features_check)(struct sk_buff *skb,
 *                       struct net_device *dev
 *                       netdev_features_t features);
 *    Called by core transmit path to determine if device is capable of
 *    performing offload operations on a given packet. This is to give
 *    the device an opportunity to implement any restrictions that cannot
 *    be otherwise expressed by feature flags. The check is called with
 *    the set of features that the stack has calculated and it returns
 *    those the driver believes to be appropriate.
 *
 * u16 (*ndo_select_queue)(struct net_device *dev, struct sk_buff *skb,
 *                         struct net_device *sb_dev);
 *    Called to decide which queue to use when device supports multiple
 *    transmit queues.
 *
 * void (*ndo_change_rx_flags)(struct net_device *dev, int flags);
 *    This function is called to allow device receiver to make
 *    changes to configuration when multicast or promiscuous is enabled.
 *
 * void (*ndo_set_rx_mode)(struct net_device *dev);
 *    This function is called device changes address list filtering.
 *    If driver handles unicast address filtering, it should set
 *    IFF_UNICAST_FLT in its priv_flags.
 *
 * int (*ndo_set_mac_address)(struct net_device *dev, void *addr);
 *    This function  is called when the Media Access Control address
 *    needs to be changed. If this interface is not defined, the
 *    MAC address can not be changed.
 *
 * int (*ndo_validate_addr)(struct net_device *dev);
 *    Test if Media Access Control address is valid for the device.
 *
 * int (*ndo_do_ioctl)(struct net_device *dev, struct ifreq *ifr, int cmd);
 *    Called when a user requests an ioctl which can't be handled by
 *    the generic interface code. If not defined ioctls return
 *    not supported error code.
 *
 * int (*ndo_set_config)(struct net_device *dev, struct ifmap *map);
 *    Used to set network devices bus interface parameters. This interface
 *    is retained for legacy reasons; new devices should use the bus
 *    interface (PCI) for low level management.
 *
 * int (*ndo_change_mtu)(struct net_device *dev, int new_mtu);
 *    Called when a user wants to change the Maximum Transfer Unit
 *    of a device.
 *
 * void (*ndo_tx_timeout)(struct net_device *dev);
 *    Callback used when the transmitter has not made any progress
 *    for dev->watchdog ticks.
 *
 * void (*ndo_get_stats64)(struct net_device *dev,
 *                         struct rtnl_link_stats64 *storage);
 * struct net_device_stats* (*ndo_get_stats)(struct net_device *dev);
 *    Called when a user wants to get the network device usage
 *    statistics. Drivers must do one of the following:
 *    1. Define @ndo_get_stats64 to fill in a zero-initialised
 *       rtnl_link_stats64 structure passed by the caller.
 *    2. Define @ndo_get_stats to update a net_device_stats structure
 *       (which should normally be dev->stats) and return a pointer to
 *       it. The structure may be changed asynchronously only if each
 *       field is written atomically.
 *    3. Update dev->stats asynchronously and atomically, and define
 *       neither operation.
 *
 * bool (*ndo_has_offload_stats)(const struct net_device *dev, int attr_id)
 *    Return true if this device supports offload stats of this attr_id.
 *
 * int (*ndo_get_offload_stats)(int attr_id, const struct net_device *dev,
 *    void *attr_data)
 *    Get statistics for offload operations by attr_id. Write it into the
 *    attr_data pointer.
 *
 * int (*ndo_vlan_rx_add_vid)(struct net_device *dev, __be16 proto, u16 vid);
 *    If device supports VLAN filtering this function is called when a
 *    VLAN id is registered.
 *
 * int (*ndo_vlan_rx_kill_vid)(struct net_device *dev, __be16 proto, u16 vid);
 *    If device supports VLAN filtering this function is called when a
 *    VLAN id is unregistered.
 *
 * void (*ndo_poll_controller)(struct net_device *dev);
 *
 *    SR-IOV management functions.
 * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
 * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan,
 *              u8 qos, __be16 proto);
 * int (*ndo_set_vf_rate)(struct net_device *dev, int vf, int min_tx_rate,
 *              int max_tx_rate);
 * int (*ndo_set_vf_spoofchk)(struct net_device *dev, int vf, bool setting);
 * int (*ndo_set_vf_trust)(struct net_device *dev, int vf, bool setting);
 * int (*ndo_get_vf_config)(struct net_device *dev,
 *                int vf, struct ifla_vf_info *ivf);
 * int (*ndo_set_vf_link_state)(struct net_device *dev, int vf, int link_state);
 * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
 *              struct nlattr *port[]);
 *
 *      Enable or disable the VF ability to query its RSS Redirection Table and
 *      Hash Key. This is needed since on some devices VF share this information
 *      with PF and querying it may introduce a theoretical security risk.
 * int (*ndo_set_vf_rss_query_en)(struct net_device *dev, int vf, bool setting);
 * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
 * int (*ndo_setup_tc)(struct net_device *dev, enum tc_setup_type type,
 *               void *type_data);
 *    Called to setup any 'tc' scheduler, classifier or action on @dev.
 *    This is always called from the stack with the rtnl lock held and netif
 *    tx queues stopped. This allows the netdevice to perform queue
 *    management safely.
 *
 *    Fiber Channel over Ethernet (FCoE) offload functions.
 * int (*ndo_fcoe_enable)(struct net_device *dev);
 *    Called when the FCoE protocol stack wants to start using LLD for FCoE
 *    so the underlying device can perform whatever needed configuration or
 *    initialization to support acceleration of FCoE traffic.
 *
 * int (*ndo_fcoe_disable)(struct net_device *dev);
 *    Called when the FCoE protocol stack wants to stop using LLD for FCoE
 *    so the underlying device can perform whatever needed clean-ups to
 *    stop supporting acceleration of FCoE traffic.
 *
 * int (*ndo_fcoe_ddp_setup)(struct net_device *dev, u16 xid,
 *                 struct scatterlist *sgl, unsigned int sgc);
 *    Called when the FCoE Initiator wants to initialize an I/O that
 *    is a possible candidate for Direct Data Placement (DDP). The LLD can
 *    perform necessary setup and returns 1 to indicate the device is set up
 *    successfully to perform DDP on this I/O, otherwise this returns 0.
 *
 * int (*ndo_fcoe_ddp_done)(struct net_device *dev,  u16 xid);
 *    Called when the FCoE Initiator/Target is done with the DDPed I/O as
 *    indicated by the FC exchange id 'xid', so the underlying device can
 *    clean up and reuse resources for later DDP requests.
 *
 * int (*ndo_fcoe_ddp_target)(struct net_device *dev, u16 xid,
 *                  struct scatterlist *sgl, unsigned int sgc);
 *    Called when the FCoE Target wants to initialize an I/O that
 *    is a possible candidate for Direct Data Placement (DDP). The LLD can
 *    perform necessary setup and returns 1 to indicate the device is set up
 *    successfully to perform DDP on this I/O, otherwise this returns 0.
 *
 * int (*ndo_fcoe_get_hbainfo)(struct net_device *dev,
 *                   struct netdev_fcoe_hbainfo *hbainfo);
 *    Called when the FCoE Protocol stack wants information on the underlying
 *    device. This information is utilized by the FCoE protocol stack to
 *    register attributes with Fiber Channel management service as per the
 *    FC-GS Fabric Device Management Information(FDMI) specification.
 *
 * int (*ndo_fcoe_get_wwn)(struct net_device *dev, u64 *wwn, int type);
 *    Called when the underlying device wants to override default World Wide
 *    Name (WWN) generation mechanism in FCoE protocol stack to pass its own
 *    World Wide Port Name (WWPN) or World Wide Node Name (WWNN) to the FCoE
 *    protocol stack to use.
 *
 *    RFS acceleration.
 * int (*ndo_rx_flow_steer)(struct net_device *dev, const struct sk_buff *skb,
 *                u16 rxq_index, u32 flow_id);
 *    Set hardware filter for RFS.  rxq_index is the target queue index;
 *    flow_id is a flow ID to be passed to rps_may_expire_flow() later.
 *    Return the filter ID on success, or a negative error code.
 *
 *    Slave management functions (for bridge, bonding, etc).
 * int (*ndo_add_slave)(struct net_device *dev, struct net_device *slave_dev);
 *    Called to make another netdev an underling.
 *
 * int (*ndo_del_slave)(struct net_device *dev, struct net_device *slave_dev);
 *    Called to release previously enslaved netdev.
 *
 *      Feature/offload setting functions.
 * netdev_features_t (*ndo_fix_features)(struct net_device *dev,
 *        netdev_features_t features);
 *    Adjusts the requested feature flags according to device-specific
 *    constraints, and returns the resulting flags. Must not modify
 *    the device state.
 *
 * int (*ndo_set_features)(struct net_device *dev, netdev_features_t features);
 *    Called to update device configuration to new features. Passed
 *    feature set might be less than what was returned by ndo_fix_features()).
 *    Must return >0 or -errno if it changed dev->features itself.
 *
 * int (*ndo_fdb_add)(struct ndmsg *ndm, struct nlattr *tb[],
 *              struct net_device *dev,
 *              const unsigned char *addr, u16 vid, u16 flags,
 *              struct netlink_ext_ack *extack);
 *    Adds an FDB entry to dev for addr.
 * int (*ndo_fdb_del)(struct ndmsg *ndm, struct nlattr *tb[],
 *              struct net_device *dev,
 *              const unsigned char *addr, u16 vid)
 *    Deletes the FDB entry from dev coresponding to addr.
 * int (*ndo_fdb_dump)(struct sk_buff *skb, struct netlink_callback *cb,
 *               struct net_device *dev, struct net_device *filter_dev,
 *               int *idx)
 *    Used to add FDB entries to dump requests. Implementers should add
 *    entries to skb and update idx with the number of entries.
 *
 * int (*ndo_bridge_setlink)(struct net_device *dev, struct nlmsghdr *nlh,
 *                 u16 flags, struct netlink_ext_ack *extack)
 * int (*ndo_bridge_getlink)(struct sk_buff *skb, u32 pid, u32 seq,
 *                 struct net_device *dev, u32 filter_mask,
 *                 int nlflags)
 * int (*ndo_bridge_dellink)(struct net_device *dev, struct nlmsghdr *nlh,
 *                 u16 flags);
 *
 * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier);
 *    Called to change device carrier. Soft-devices (like dummy, team, etc)
 *    which do not represent real hardware may define this to allow their
 *    userspace components to manage their virtual carrier state. Devices
 *    that determine carrier state from physical hardware properties (eg
 *    network cables) or protocol-dependent mechanisms (eg
 *    USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.
 *
 * int (*ndo_get_phys_port_id)(struct net_device *dev,
 *                   struct netdev_phys_item_id *ppid);
 *    Called to get ID of physical port of this device. If driver does
 *    not implement this, it is assumed that the hw is not able to have
 *    multiple net devices on single physical port.
 *
 * int (*ndo_get_port_parent_id)(struct net_device *dev,
 *                 struct netdev_phys_item_id *ppid)
 *    Called to get the parent ID of the physical port of this device.
 *
 * void (*ndo_udp_tunnel_add)(struct net_device *dev,
 *                  struct udp_tunnel_info *ti);
 *    Called by UDP tunnel to notify a driver about the UDP port and socket
 *    address family that a UDP tunnel is listnening to. It is called only
 *    when a new port starts listening. The operation is protected by the
 *    RTNL.
 *
 * void (*ndo_udp_tunnel_del)(struct net_device *dev,
 *                  struct udp_tunnel_info *ti);
 *    Called by UDP tunnel to notify the driver about a UDP port and socket
 *    address family that the UDP tunnel is not listening to anymore. The
 *    operation is protected by the RTNL.
 *
 * void* (*ndo_dfwd_add_station)(struct net_device *pdev,
 *                 struct net_device *dev)
 *    Called by upper layer devices to accelerate switching or other
 *    station functionality into hardware. 'pdev is the lowerdev
 *    to use for the offload and 'dev' is the net device that will
 *    back the offload. Returns a pointer to the private structure
 *    the upper layer will maintain.
 * void (*ndo_dfwd_del_station)(struct net_device *pdev, void *priv)
 *    Called by upper layer device to delete the station created
 *    by 'ndo_dfwd_add_station'. 'pdev' is the net device backing
 *    the station and priv is the structure returned by the add
 *    operation.
 * int (*ndo_set_tx_maxrate)(struct net_device *dev,
 *                 int queue_index, u32 maxrate);
 *    Called when a user wants to set a max-rate limitation of specific
 *    TX queue.
 * int (*ndo_get_iflink)(const struct net_device *dev);
 *    Called to get the iflink value of this device.
 * void (*ndo_change_proto_down)(struct net_device *dev,
 *                 bool proto_down);
 *    This function is used to pass protocol port error state information
 *    to the switch driver. The switch driver can react to the proto_down
 *      by doing a phys down on the associated switch port.
 * int (*ndo_fill_metadata_dst)(struct net_device *dev, struct sk_buff *skb);
 *    This function is used to get egress tunnel information for given skb.
 *    This is useful for retrieving outer tunnel header parameters while
 *    sampling packet.
 * void (*ndo_set_rx_headroom)(struct net_device *dev, int needed_headroom);
 *    This function is used to specify the headroom that the skb must
 *    consider when allocation skb during packet reception. Setting
 *    appropriate rx headroom value allows avoiding skb head copy on
 *    forward. Setting a negative value resets the rx headroom to the
 *    default value.
 * int (*ndo_bpf)(struct net_device *dev, struct netdev_bpf *bpf);
 *    This function is used to set or query state related to XDP on the
 *    netdevice and manage BPF offload. See definition of
 *    enum bpf_netdev_command for details.
 * int (*ndo_xdp_xmit)(struct net_device *dev, int n, struct xdp_frame **xdp,
 *            u32 flags);
 *    This function is used to submit @n XDP packets for transmit on a
 *    netdevice. Returns number of frames successfully transmitted, frames
 *    that got dropped are freed/returned via xdp_return_frame().
 *    Returns negative number, means general error invoking ndo, meaning
 *    no frames were xmit'ed and core-caller will free all frames.
 * struct devlink_port *(*ndo_get_devlink_port)(struct net_device *dev);
 *    Get devlink port instance associated with a given netdev.
 *    Called with a reference on the netdevice and devlink locks only,
 *    rtnl_lock is not held.
 */
struct net_device_ops {
    int            (*ndo_init)(struct net_device *dev);
    void            (*ndo_uninit)(struct net_device *dev);
    int            (*ndo_open)(struct net_device *dev);
    int            (*ndo_stop)(struct net_device *dev);
    netdev_tx_t        (*ndo_start_xmit)(struct sk_buff *skb,
                          struct net_device *dev);
    netdev_features_t    (*ndo_features_check)(struct sk_buff *skb,
                              struct net_device *dev,
                              netdev_features_t features);
    u16            (*ndo_select_queue)(struct net_device *dev,
                            struct sk_buff *skb,
                            struct net_device *sb_dev);
    void            (*ndo_change_rx_flags)(struct net_device *dev,
                               int flags);
    void            (*ndo_set_rx_mode)(struct net_device *dev);
    int            (*ndo_set_mac_address)(struct net_device *dev,
                               void *addr);
    int            (*ndo_validate_addr)(struct net_device *dev);
    int            (*ndo_do_ioctl)(struct net_device *dev,
                            struct ifreq *ifr, int cmd);
    int            (*ndo_set_config)(struct net_device *dev,
                              struct ifmap *map);
    int            (*ndo_change_mtu)(struct net_device *dev,
                          int new_mtu);
    int            (*ndo_neigh_setup)(struct net_device *dev,
                           struct neigh_parms *);
    void            (*ndo_tx_timeout) (struct net_device *dev);

    void            (*ndo_get_stats64)(struct net_device *dev,
                           struct rtnl_link_stats64 *storage);
    bool            (*ndo_has_offload_stats)(const struct net_device *dev, int attr_id);
    int            (*ndo_get_offload_stats)(int attr_id,
                             const struct net_device *dev,
                             void *attr_data);
    struct net_device_stats* (*ndo_get_stats)(struct net_device *dev);

    int            (*ndo_vlan_rx_add_vid)(struct net_device *dev,
                               __be16 proto, u16 vid);
    int            (*ndo_vlan_rx_kill_vid)(struct net_device *dev,
                                __be16 proto, u16 vid);
#ifdef CONFIG_NET_POLL_CONTROLLER
    void                    (*ndo_poll_controller)(struct net_device *dev);
    int            (*ndo_netpoll_setup)(struct net_device *dev,
                             struct netpoll_info *info);
    void            (*ndo_netpoll_cleanup)(struct net_device *dev);
#endif
    int            (*ndo_set_vf_mac)(struct net_device *dev,
                          int queue, u8 *mac);
    int            (*ndo_set_vf_vlan)(struct net_device *dev,
                           int queue, u16 vlan,
                           u8 qos, __be16 proto);
    int            (*ndo_set_vf_rate)(struct net_device *dev,
                           int vf, int min_tx_rate,
                           int max_tx_rate);
    int            (*ndo_set_vf_spoofchk)(struct net_device *dev,
                               int vf, bool setting);
    int            (*ndo_set_vf_trust)(struct net_device *dev,
                            int vf, bool setting);
    int            (*ndo_get_vf_config)(struct net_device *dev,
                             int vf,
                             struct ifla_vf_info *ivf);
    int            (*ndo_set_vf_link_state)(struct net_device *dev,
                             int vf, int link_state);
    int            (*ndo_get_vf_stats)(struct net_device *dev,
                            int vf,
                            struct ifla_vf_stats
                            *vf_stats);
    int            (*ndo_set_vf_port)(struct net_device *dev,
                           int vf,
                           struct nlattr *port[]);
    int            (*ndo_get_vf_port)(struct net_device *dev,
                           int vf, struct sk_buff *skb);
    int            (*ndo_set_vf_guid)(struct net_device *dev,
                           int vf, u64 guid,
                           int guid_type);
    int            (*ndo_set_vf_rss_query_en)(
                           struct net_device *dev,
                           int vf, bool setting);
    int            (*ndo_setup_tc)(struct net_device *dev,
                        enum tc_setup_type type,
                        void *type_data);
#if IS_ENABLED(CONFIG_FCOE)
    int            (*ndo_fcoe_enable)(struct net_device *dev);
    int            (*ndo_fcoe_disable)(struct net_device *dev);
    int            (*ndo_fcoe_ddp_setup)(struct net_device *dev,
                              u16 xid,
                              struct scatterlist *sgl,
                              unsigned int sgc);
    int            (*ndo_fcoe_ddp_done)(struct net_device *dev,
                             u16 xid);
    int            (*ndo_fcoe_ddp_target)(struct net_device *dev,
                               u16 xid,
                               struct scatterlist *sgl,
                               unsigned int sgc);
    int            (*ndo_fcoe_get_hbainfo)(struct net_device *dev,
                            struct netdev_fcoe_hbainfo *hbainfo);
#endif

#if IS_ENABLED(CONFIG_LIBFCOE)
#define NETDEV_FCOE_WWNN 0
#define NETDEV_FCOE_WWPN 1
    int            (*ndo_fcoe_get_wwn)(struct net_device *dev,
                            u64 *wwn, int type);
#endif

#ifdef CONFIG_RFS_ACCEL
    int            (*ndo_rx_flow_steer)(struct net_device *dev,
                             const struct sk_buff *skb,
                             u16 rxq_index,
                             u32 flow_id);
#endif
    int            (*ndo_add_slave)(struct net_device *dev,
                         struct net_device *slave_dev,
                         struct netlink_ext_ack *extack);
    int            (*ndo_del_slave)(struct net_device *dev,
                         struct net_device *slave_dev);
    netdev_features_t    (*ndo_fix_features)(struct net_device *dev,
                            netdev_features_t features);
    int            (*ndo_set_features)(struct net_device *dev,
                            netdev_features_t features);
    int            (*ndo_neigh_construct)(struct net_device *dev,
                               struct neighbour *n);
    void            (*ndo_neigh_destroy)(struct net_device *dev,
                             struct neighbour *n);

    int            (*ndo_fdb_add)(struct ndmsg *ndm,
                           struct nlattr *tb[],
                           struct net_device *dev,
                           const unsigned char *addr,
                           u16 vid,
                           u16 flags,
                           struct netlink_ext_ack *extack);
    int            (*ndo_fdb_del)(struct ndmsg *ndm,
                           struct nlattr *tb[],
                           struct net_device *dev,
                           const unsigned char *addr,
                           u16 vid);
    int            (*ndo_fdb_dump)(struct sk_buff *skb,
                        struct netlink_callback *cb,
                        struct net_device *dev,
                        struct net_device *filter_dev,
                        int *idx);
    int            (*ndo_fdb_get)(struct sk_buff *skb,
                           struct nlattr *tb[],
                           struct net_device *dev,
                           const unsigned char *addr,
                           u16 vid, u32 portid, u32 seq,
                           struct netlink_ext_ack *extack);
    int            (*ndo_bridge_setlink)(struct net_device *dev,
                              struct nlmsghdr *nlh,
                              u16 flags,
                              struct netlink_ext_ack *extack);
    int            (*ndo_bridge_getlink)(struct sk_buff *skb,
                              u32 pid, u32 seq,
                              struct net_device *dev,
                              u32 filter_mask,
                              int nlflags);
    int            (*ndo_bridge_dellink)(struct net_device *dev,
                              struct nlmsghdr *nlh,
                              u16 flags);
    int            (*ndo_change_carrier)(struct net_device *dev,
                              bool new_carrier);
    int            (*ndo_get_phys_port_id)(struct net_device *dev,
                            struct netdev_phys_item_id *ppid);
    int            (*ndo_get_port_parent_id)(struct net_device *dev,
                              struct netdev_phys_item_id *ppid);
    int            (*ndo_get_phys_port_name)(struct net_device *dev,
                              char *name, size_t len);
    void            (*ndo_udp_tunnel_add)(struct net_device *dev,
                              struct udp_tunnel_info *ti);
    void            (*ndo_udp_tunnel_del)(struct net_device *dev,
                              struct udp_tunnel_info *ti);
    void*            (*ndo_dfwd_add_station)(struct net_device *pdev,
                            struct net_device *dev);
    void            (*ndo_dfwd_del_station)(struct net_device *pdev,
                            void *priv);

    int            (*ndo_get_lock_subclass)(struct net_device *dev);
    int            (*ndo_set_tx_maxrate)(struct net_device *dev,
                              int queue_index,
                              u32 maxrate);
    int            (*ndo_get_iflink)(const struct net_device *dev);
    int            (*ndo_change_proto_down)(struct net_device *dev,
                             bool proto_down);
    int            (*ndo_fill_metadata_dst)(struct net_device *dev,
                               struct sk_buff *skb);
    void            (*ndo_set_rx_headroom)(struct net_device *dev,
                               int needed_headroom);
    int            (*ndo_bpf)(struct net_device *dev,
                       struct netdev_bpf *bpf);
    int            (*ndo_xdp_xmit)(struct net_device *dev, int n,
                        struct xdp_frame **xdp,
                        u32 flags);
    int            (*ndo_xsk_async_xmit)(struct net_device *dev,
                              u32 queue_id);
    struct devlink_port *    (*ndo_get_devlink_port)(struct net_device *dev);
};

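Where:

  • ndo_start_xmit: the packet transmit function; sk_buff is the structure that carries packets being sent and received;
  • ndo_tx_timeout: the transmit timeout handler;

2.3 Core functions

2.3.1 Allocating, freeing, and modifying an sk_buff

The Linux kernel provides the following functions for allocating an sk_buff: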

struct sk_buff *alloc_skb(unsigned int len, gfp_t priority);
struct sk_buff *dev_alloc_skb(unsigned int len);

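alloc_skb() allocates an sk_buff together with a data buffer. The len parameter is the size of the data buffer, which is usually rounded up to a multiple of L1_CACHE_BYTES (32 bytes on many 32-bit ARM cores); the priority parameter is the memory allocation priority (the GFP flags).

Layout of the structure returned by alloc_skb():

In a freshly allocated sk_buff, the head, data, and tail members all point to the start of the data buffer, and end points to the end of the buffer. len is 0; it is the length of the data between data and tail.

dev_alloc_skb() allocates the skb with GFP_ATOMIC priority, because it is often called from a device driver's receive interrupt handler.

The Linux kernel provides the following functions for freeing an sk_buff: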

void kfree_skb(struct sk_buff *skb);
void dev_kfree_skb(struct sk_buff *skb);
void dev_kfree_skb_irq(struct sk_buff *skb);
void dev_kfree_skb_any(struct sk_buff *skb);

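Inside the kernel itself kfree_skb() is used, whereas network device drivers should preferably release socket buffers with dev_kfree_skb(), dev_kfree_skb_irq(), or dev_kfree_skb_any().

Where:

  • dev_kfree_skb() is used in non-interrupt context;
  • dev_kfree_skb_irq() is used in interrupt context;
  • dev_kfree_skb_any() can be used in both interrupt and non-interrupt context; it simply checks the current context and then calls dev_kfree_skb_irq() or dev_kfree_skb().

The following function appends data at the tail of the buffer: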

unsigned char *skb_put(struct sk_buff *skb, unsigned int len);

It advances skb->tail by len (skb->tail += len) and increases skb->len by len (skb->len += len). Drivers usually call it while handling received data.

The following function prepends data at the start of the buffer:

unsigned char *skb_push(struct sk_buff *skb, unsigned int len);

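It moves skb->data toward the head of the buffer by len (skb->data -= len) and increases skb->len by len (skb->len += len).

The exact opposite is skb_pull(), which removes data from the start of the buffer: skb->len -= len and skb->data += len.

skb_reserve() reserves space at the head of the data buffer, usually to leave room for protocol headers or to force the data onto a particular alignment boundary.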

 static inline void skb_reserve(struct sk_buff *skb, int len)
 {
      /* advance the data pointer of the data area by len bytes */
      skb->data += len;
      /* advance the tail pointer of the data area by len bytes */
      skb->tail += len;
 }

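It does this by moving both data and tail, the pointers that mark the start and end of the data area. In a NIC driver's receive function, the following call is made before anything is stored in a freshly allocated data buffer:

skb_reserve(skb, 2);     // align the IP header on a 16-byte boundary

The 14-byte Ethernet header is then copied into the buffer, so the IP header, which follows it immediately, starts 2 + 14 = 16 bytes from the start of the buffer and is aligned on a 16-byte boundary.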
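The pointer arithmetic described above can be tried in user space. The following is only a minimal sketch: struct mini_skb and the mini_* helpers are hypothetical stand-ins, not kernel APIs, and the real sk_buff has many more fields (on some configurations tail and end are stored as offsets rather than pointers).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the sk_buff buffer pointers. */
struct mini_skb {
    unsigned char buf[64];   /* backing storage: head is buf, end is buf + 64 */
    unsigned char *data;     /* start of the valid data */
    unsigned char *tail;     /* end of the valid data */
    unsigned int len;        /* number of bytes between data and tail */
};

/* Freshly allocated state: data and tail point at head, len is 0. */
static void mini_init(struct mini_skb *skb)
{
    skb->data = skb->buf;
    skb->tail = skb->buf;
    skb->len = 0;
}

/* Like skb_reserve(): open up headroom in a still-empty buffer. */
static void mini_reserve(struct mini_skb *skb, unsigned int n)
{
    assert(skb->len == 0);
    skb->data += n;
    skb->tail += n;
}

/* Like skb_put(): append n bytes and return where to write them. */
static unsigned char *mini_put(struct mini_skb *skb, unsigned int n)
{
    unsigned char *old_tail = skb->tail;

    assert((size_t)(skb->buf + sizeof(skb->buf) - skb->tail) >= n);
    skb->tail += n;
    skb->len += n;
    return old_tail;
}

/* Like skb_push(): prepend n bytes into the headroom. */
static unsigned char *mini_push(struct mini_skb *skb, unsigned int n)
{
    assert((size_t)(skb->data - skb->buf) >= n);
    skb->data -= n;
    skb->len += n;
    return skb->data;
}

/* Like skb_pull(): drop n bytes from the front. */
static unsigned char *mini_pull(struct mini_skb *skb, unsigned int n)
{
    assert(n <= skb->len);
    skb->data += n;
    skb->len -= n;
    return skb->data;
}

/* Distance of data from the start of the buffer, in bytes. */
static size_t mini_data_offset(const struct mini_skb *skb)
{
    return (size_t)(skb->data - skb->buf);
}
```

Reserving 2 bytes, putting a 14-byte Ethernet header plus a 20-byte IP header, and then pulling the 14-byte Ethernet header leaves data 16 bytes into the buffer, which is the alignment the skb_reserve(skb, 2) call is after.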

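2.3.2 dev_queue_xmit

The network protocol interface layer provides dev_queue_xmit() to upper-layer protocols for sending packets. The function is defined in net/core/dev.c:

int dev_queue_xmit(struct sk_buff *skb);

2.3.3 netif_rx

Packets are likewise delivered to the upper layers by passing a pointer to a struct sk_buff to netif_rx(). The prototype of netif_rx() is: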

int netif_rx(struct sk_buff *skb);

2.3.4 Allocating, registering, and unregistering a net_device

The Linux kernel provides the following functions for allocating a net_device:

/* Support for loadable net-drivers */
struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
                                    unsigned char name_assign_type,
                                    void (*setup)(struct net_device *),
                                    unsigned int txqs, unsigned int rxqs);
int dev_get_valid_name(struct net *net, struct net_device *dev,
                       const char *name);

#define alloc_netdev(sizeof_priv, name, name_assign_type, setup) \
        alloc_netdev_mqs(sizeof_priv, name, name_assign_type, setup, 1, 1)

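sizeof_priv: the size of the extra memory allocated for the driver's private data; 0 means no extra memory is allocated;

name: the interface name;

name_assign_type: NET_NAME_ENUM is usually sufficient;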

/* interface name assignment types (sysfs name_assign_type attribute) */
#define NET_NAME_UNKNOWN    0    /* unknown origin (not exposed to userspace) */
#define NET_NAME_ENUM        1    /* enumerated by kernel */
#define NET_NAME_PREDICTABLE    2    /* predictably named by the kernel */
#define NET_NAME_USER        3    /* provided by user-space */
#define NET_NAME_RENAMED    4    /* renamed by user-space */

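setup: pointer to the net_device setup() function, usually ether_setup. ether_setup is a callback that fills in some members of the newly allocated net_device with the generic defaults for Ethernet devices;

txqs and rxqs: the number of transmit and receive subqueues to allocate.

The Linux kernel provides the following functions for registering a net_device: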

int register_netdev(struct net_device *dev);
int register_netdevice(struct net_device *dev);

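register_netdev() is a wrapper around register_netdevice(). When a device is registered with register_netdev() and the given name contains a %d format specifier (only %d is supported), the kernel substitutes a suitable number for it; the actual registration is done by register_netdevice().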
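To illustrate the %d substitution, here is a small userspace sketch that picks the lowest unit number not already in use. It only models the idea; pick_ifname() and MAX_UNITS are hypothetical names, and the kernel's own name allocation is implemented differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_UNITS 32   /* hypothetical upper bound for this sketch */

/*
 * Expand fmt (which must contain exactly one %d) with increasing unit
 * numbers until the result is not found in taken[]. The chosen name is
 * left in out; the return value is the unit number, or -1 if all
 * MAX_UNITS candidates are in use.
 */
static int pick_ifname(const char *fmt, const char *const taken[], int ntaken,
                       char *out, size_t outlen)
{
    assert(fmt != NULL && out != NULL);

    for (int unit = 0; unit < MAX_UNITS; unit++) {
        int used = 0;

        snprintf(out, outlen, fmt, unit);
        for (int i = 0; i < ntaken; i++) {
            if (strcmp(out, taken[i]) == 0) {
                used = 1;
                break;
            }
        }
        if (!used)
            return unit;
    }
    return -1;
}
```

With existing interfaces eth0, lo, and sit0, the template "vnet%d" expands to vnet0; if vnet0 and vnet1 already existed, it would expand to vnet2.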

The Linux kernel provides the following functions for unregistering a net_device:

void unregister_netdev(struct net_device *dev);
void unregister_netdevice(struct net_device *dev);
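
2.3.5 netif_stop_queue/netif_wake_queue

netif_stop_queue() is defined in include/linux/netdevice.h. It stops the upper layers from calling the driver's transmit routine (called hard_start_xmit in the kernel comment below; in current kernels this is the ndo_start_xmit callback):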

/**
 *      netif_stop_queue - stop transmitted packets
 *      @dev: network device
 *
 *      Stop upper layers calling the device hard_start_xmit routine.
 *      Used for flow control when transmit resources are unavailable.
 */
static inline void netif_stop_queue(struct net_device *dev)
{
        netif_tx_stop_queue(netdev_get_tx_queue(dev, 0));
}

Conversely, netif_wake_queue() wakes the upper layers so that transmission resumes; it allows them to call hard_start_xmit again.

/**
 *      netif_wake_queue - restart transmit
 *      @dev: network device
 *
 *      Allow upper layers to call the device hard_start_xmit routine.
 *      Used for flow control when transmit resources are available.
 */
static inline void netif_wake_queue(struct net_device *dev)
{
        netif_tx_wake_queue(netdev_get_tx_queue(dev, 0));
}
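How a driver typically uses this pair for flow control can be modeled in user space. The sketch below assumes a hypothetical NIC with a fixed number of transmit descriptors; struct mini_nic, mini_xmit(), and mini_tx_done() are illustration-only names, and the boolean flag merely stands in for the queue state that netif_stop_queue() and netif_wake_queue() change.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_SIZE 4   /* hypothetical number of hardware TX descriptors */

struct mini_nic {
    int in_flight;        /* descriptors currently owned by the "hardware" */
    bool queue_stopped;   /* models the state set by netif_stop_queue() */
};

/*
 * Models the transmit path: returns false when the queue is stopped,
 * i.e. when the upper layer would not be allowed to call the driver.
 */
static bool mini_xmit(struct mini_nic *nic)
{
    if (nic->queue_stopped)
        return false;

    nic->in_flight++;
    if (nic->in_flight == TX_RING_SIZE)
        nic->queue_stopped = true;   /* ring full: like netif_stop_queue() */
    return true;
}

/* Models the transmit-complete interrupt: one descriptor is free again. */
static void mini_tx_done(struct mini_nic *nic)
{
    assert(nic->in_flight > 0);
    nic->in_flight--;
    nic->queue_stopped = false;      /* room again: like netif_wake_queue() */
}
```

The queue is stopped only while the ring has no free slot, and the completion handler reopens it as soon as one slot is released.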

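3. Writing a NIC driver

Here we write a virtual NIC driver. Because there is no real hardware, no data ever arrives and no receive interrupt can fire, so the receive function is called from the transmit function: the sk_buff that was about to be sent is turned around and handed back to the upper layers. We can then send packets with the Linux ping command and also receive the ping replies.

In a real driver, received packets are mostly handled in the interrupt handler. In the cs89x0 driver, for example, an interrupt status equal to ISQ_RECEIVER_EVENT indicates a receive interrupt; the handler then enters the receive function, which passes the data to the upper layers with netif_rx().

The virtual NIC driver has roughly three parts:

  • driver initialization;
  • sending packets;
  • receiving packets;

3.1 Project layout

Under /work/sambashare/drivers we create the project 19.vnet_dev, containing the source file vnet_dev.c.

3.2 Driver initialization

The initialization steps are:

  • allocate a net_device structure for the network device with alloc_netdev();
  • program the registers of the NIC hardware (skipped for a virtual NIC);
  • set the members of the net_device structure:
    • set the device operations table; data sent by the upper layers eventually reaches ndo_start_xmit();
    • set the MAC address of the device. Any value will do here; a real NIC driver must read the MAC address from the hardware;
    • set the flags of the virtual NIC. Because the NIC is virtual, it never talks to a real network device and the data it reports upward is fabricated by us, so there is no need to use ARP (the Address Resolution Protocol) to look up the peer's MAC address before communicating. Leaving ARP enabled to resolve the MAC address for the target IP would lead to errors;
  • register the network device with the kernel using register_netdev();

The code is as follows: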

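/*
 * Module init entry point
 */
static int vnet_init(void)
{
    int ret;

    /* 1. Allocate a net_device structure */
    vnet_dev = alloc_netdev(0, "vnet%d", NET_NAME_ENUM, ether_setup);
    if (!vnet_dev)
        return -ENOMEM;

    /* 2. Set the device operations table; data sent by the upper layers eventually reaches ndo_start_xmit() */
    vnet_dev->netdev_ops = &vnet_ops;

    /*
     * 3. Set the MAC address. Any value works for this demo; a real NIC
     *    driver must read the address from the hardware. Note that 0x89
     *    has the multicast bit (bit 0 of the first byte) set, so a real
     *    driver should use a unicast address instead.
     */
    vnet_dev->dev_addr[0] = 0x89;
    vnet_dev->dev_addr[1] = 0x89;
    vnet_dev->dev_addr[2] = 0x89;
    vnet_dev->dev_addr[3] = 0x89;
    vnet_dev->dev_addr[4] = 0x89;
    vnet_dev->dev_addr[5] = 0x89;

    /* 4. Set the flags of the virtual NIC: do not use ARP */
    vnet_dev->flags |= IFF_NOARP;

    /* 5. Register the network device with the kernel */
    ret = register_netdev(vnet_dev);
    if (ret) {
        free_netdev(vnet_dev);
        return ret;
    }

    return 0;
}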

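3.3 Sending packets

The steps for sending a packet are:

  • when transmission starts, call netif_stop_queue() to stop the upper layers from passing in new data;
  • call the receive function with the sk_buff being sent; it fabricates a received ping reply and hands it to the upper layers. Because a reply of the matching type comes back whenever the upper layers send something, the protocol stack concludes that this network device can communicate normally with the device at the given IP address;
  • free the transmitted sk_buff with dev_kfree_skb();
  • update the transmit statistics: the total number of packets sent and the total number of bytes sent;
  • call netif_wake_queue() to wake the blocked upper layers so that the protocol stack can continue to hand data to the driver;

The code is as follows: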

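/**
 * Send a packet
 */
static netdev_tx_t vnet_send_packet(struct sk_buff *skb, struct net_device *dev)
{
    static int cnt = 0;
    unsigned int len = skb->len;   // saved now: skb is freed before the statistics are updated
    unsigned int i;

    printk("vnet_send_packet: cnt = %d\n", ++cnt);

    // print the length and contents of the outgoing Ethernet frame
    printk("vnet_send_packet: length = %u\n", len);
    for (i = 0; i < len; i++) {
        printk(KERN_CONT "0x%02x ", *(skb->data + i));
        if ((i + 1) % 16 == 0)
            printk(KERN_CONT "\n");
    }

    /* 1. While sending, stop the upper layers from passing in new data */
    netif_stop_queue(dev);

    /* 2. Call the receive function with the sk_buff being sent; it fabricates a received ping reply and hands it to the upper layers */
    emulator_rx_packet(skb, dev);

    /* 3. Free the transmitted sk_buff; skb must not be touched after this */
    dev_kfree_skb(skb);

    /* 4. Update the transmit statistics: total packets and total bytes sent */
    dev->stats.tx_packets++;
    dev->stats.tx_bytes += len;

    /* 5. Wake the blocked upper layers so the protocol stack can continue to hand data to the driver */
    netif_wake_queue(dev);

    return NETDEV_TX_OK;
}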

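3.4 Receiving packets

Take the command ping www.baidu.com as an example. The ping packets can be captured with Wireshark; consider the following 74-byte frame:

Since we are building a reply, the source and destination MAC addresses and the source and destination IP addresses of the request must be swapped and the packet type set; the reply sk_buff is then built from the swapped data.

  • swap the source and destination MAC addresses in the transmitted sk_buff;
  • swap the source and destination IP addresses in the transmitted sk_buff;
  • set the ICMP type: the outgoing ping (echo request) has type 0x08, which must be changed to 0x00 (echo reply);
  • recompute the checksum in the iphdr structure with ip_fast_csum();
  • build a new sk_buff with dev_alloc_skb();
  • call skb_reserve(rx_skb, 2) to shift the data area of the new sk_buff back by 2 bytes, leaving headroom at the start of the buffer;
  • use memcpy() to copy the modified sk_buff->data to the address that the data member of the new sk_buff points to;
  • set the net_device of the new sk_buff;
  • call eth_type_trans() to determine the upper-layer protocol; it makes sk_buff->data point at the upper-layer protocol data, and its return value is stored in the protocol member of the sk_buff;
  • update the receive statistics, and finally pass the sk_buff to the upper layers with netif_rx();

The code is as follows: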

/**
 * Simulate receiving a packet
 */
static void emulator_rx_packet(struct sk_buff *skb, struct net_device *dev)
{
    // Ethernet header
    struct ethhdr *ethhdr;

    // IP header
    struct iphdr *ih;

    // temporary MAC buffer
    unsigned char mac_addr[ETH_ALEN];

    // temporary IP address storage
    __be32 *saddr, *daddr, tmp;

    // ICMP type
    unsigned char *type;

    // receive socket buffer
    struct sk_buff *rx_skb;

    unsigned int i;

    /* 1. Swap the source and destination MAC addresses in the transmitted sk_buff */
    ethhdr = (struct ethhdr *)skb->data;
    memcpy(mac_addr, ethhdr->h_dest, ETH_ALEN);
    memcpy(ethhdr->h_dest, ethhdr->h_source, ETH_ALEN);
    memcpy(ethhdr->h_source, mac_addr, ETH_ALEN);

    /* 2. Swap the source and destination IP addresses in the transmitted sk_buff */
    ih = (struct iphdr *)(skb->data + sizeof(struct ethhdr));
    saddr = &ih->saddr;
    daddr = &ih->daddr;
    tmp = *saddr;
    *saddr = *daddr;
    *daddr = tmp;

    /* 3. Set the ICMP type: the outgoing ping (echo request) is 0x08, change it to 0x00 (echo reply) */
    type = skb->data + sizeof(struct ethhdr) + sizeof(struct iphdr);
    *type = 0x00;

    /* 4. Recompute the checksum in iphdr; it covers only the IP header */
    ih->check = 0x00;
    ih->check = ip_fast_csum((unsigned char *)ih, ih->ihl);

    /* 5. Build a new sk_buff */
    rx_skb = dev_alloc_skb(skb->len + 2);  // rx_skb->head, rx_skb->data and rx_skb->tail point to the start of the buffer, rx_skb->end to its end
    if (!rx_skb) {
        dev->stats.rx_dropped++;
        return;
    }

    /* 6. Shift the data area back by 2 bytes to leave headroom at the start of the buffer */
    skb_reserve(rx_skb, 2);   /* align IP on 16B boundary */

    /* 7. Copy the modified sk_buff->data to the address that data points to in the new sk_buff */
    memcpy(skb_put(rx_skb, skb->len), skb->data, skb->len);

    /* 8. Set the net_device of the new sk_buff */
    rx_skb->dev = dev;

    /* 9. Set the protocol member of the sk_buff */
    rx_skb->protocol = eth_type_trans(rx_skb, dev);  // afterwards rx_skb->data skips the Ethernet header: rx_skb->len -= ETH_HLEN, rx_skb->data += ETH_HLEN
    rx_skb->ip_summed = CHECKSUM_UNNECESSARY;

    /* 10. Update the receive statistics */
    dev->stats.rx_packets++;
    dev->stats.rx_bytes += skb->len;

    // print the length and contents of the received data (Ethernet header already stripped)
    printk("emulator_rx_packet: length = %u\n", rx_skb->len);
    for (i = 0; i < rx_skb->len; i++) {
        printk(KERN_CONT "0x%02x ", *(rx_skb->data + i));
        if ((i + 1) % 16 == 0)
            printk(KERN_CONT "\n");
    }

    /* 11. Pass the sk_buff to the upper layers */
    netif_rx(rx_skb);
}
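Steps 1 to 4 of the function above only rewrite bytes in place, so they can be checked outside the kernel on a plain byte array. The sketch below is a userspace approximation under stated assumptions: make_echo_reply(), swap_bytes(), and inet_checksum() are hypothetical helper names, the checksum is a straightforward RFC 1071 implementation rather than the kernel's ip_fast_csum(), and the ICMP header is located through the IHL field instead of sizeof(struct iphdr).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    MAC_LEN     = 6,
    ETH_HDR_LEN = 14,   /* destination MAC + source MAC + EtherType */
    IP_CSUM_OFF = 10,   /* byte offsets inside the IPv4 header */
    IP_SRC_OFF  = 12,
    IP_DST_OFF  = 16,
};

/* RFC 1071 Internet checksum over len bytes, returned in host order. */
static uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Exchange n bytes (n <= MAC_LEN) between two non-overlapping regions. */
static void swap_bytes(uint8_t *a, uint8_t *b, size_t n)
{
    uint8_t tmp[MAC_LEN];

    assert(n <= sizeof(tmp));
    memcpy(tmp, a, n);
    memcpy(a, b, n);
    memcpy(b, tmp, n);
}

/* Turn an Ethernet/IPv4/ICMP echo request into an echo reply, in place. */
static void make_echo_reply(uint8_t *frame)
{
    uint8_t *ip = frame + ETH_HDR_LEN;
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;   /* IPv4 header length in bytes */
    uint16_t csum;

    swap_bytes(frame, frame + MAC_LEN, MAC_LEN);       /* step 1: MAC addresses */
    swap_bytes(ip + IP_SRC_OFF, ip + IP_DST_OFF, 4);   /* step 2: IP addresses */
    ip[ihl] = 0x00;                                    /* step 3: ICMP type */

    ip[IP_CSUM_OFF] = 0;                               /* step 4: IP header checksum */
    ip[IP_CSUM_OFF + 1] = 0;
    csum = inet_checksum(ip, ihl);
    ip[IP_CSUM_OFF] = (uint8_t)(csum >> 8);
    ip[IP_CSUM_OFF + 1] = (uint8_t)(csum & 0xff);
}
```

A header with a valid checksum sums to zero when inet_checksum() is run over it again, which gives a simple way to verify the result. Swapping the two addresses does not change the sum, so the recomputed value is the same as before the swap.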

3.5 Complete code

#include <linux/module.h>
#include <linux/types.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/bitops.h>
#include <linux/ip.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/device.h>
#include <linux/skbuff.h>
#include <linux/platform_device.h>

/* The virtual network device */
static struct net_device  *vnet_dev;

/**
 * Simulate receiving a packet
 */
static void emulator_rx_packet(struct sk_buff *skb, struct net_device *dev)
{
    // Ethernet header
    struct ethhdr *ethhdr;

    // IP header
    struct iphdr *ih;

    // temporary MAC buffer
    unsigned char mac_addr[ETH_ALEN];

    // temporary IP address storage
    __be32 *saddr, *daddr, tmp;

    // ICMP type
    unsigned char *type;

    // receive socket buffer
    struct sk_buff *rx_skb;

    unsigned int i;

    /* 1. Swap the source and destination MAC addresses in the transmitted sk_buff */
    ethhdr = (struct ethhdr *)skb->data;
    memcpy(mac_addr, ethhdr->h_dest, ETH_ALEN);
    memcpy(ethhdr->h_dest, ethhdr->h_source, ETH_ALEN);
    memcpy(ethhdr->h_source, mac_addr, ETH_ALEN);

    /* 2. Swap the source and destination IP addresses in the transmitted sk_buff */
    ih = (struct iphdr *)(skb->data + sizeof(struct ethhdr));
    saddr = &ih->saddr;
    daddr = &ih->daddr;
    tmp = *saddr;
    *saddr = *daddr;
    *daddr = tmp;

    /* 3. Set the ICMP type: the outgoing ping (echo request) is 0x08, change it to 0x00 (echo reply) */
    type = skb->data + sizeof(struct ethhdr) + sizeof(struct iphdr);
    *type = 0x00;

    /* 4. Recompute the checksum in iphdr; it covers only the IP header */
    ih->check = 0x00;
    ih->check = ip_fast_csum((unsigned char *)ih, ih->ihl);

    /* 5. Build a new sk_buff */
    rx_skb = dev_alloc_skb(skb->len + 2);  // rx_skb->head, rx_skb->data and rx_skb->tail point to the start of the buffer, rx_skb->end to its end
    if (!rx_skb) {
        dev->stats.rx_dropped++;
        return;
    }

    /* 6. Shift the data area back by 2 bytes to leave headroom at the start of the buffer */
    skb_reserve(rx_skb, 2);   /* align IP on 16B boundary */

    /* 7. Copy the modified sk_buff->data to the address that data points to in the new sk_buff */
    memcpy(skb_put(rx_skb, skb->len), skb->data, skb->len);

    /* 8. Set the net_device of the new sk_buff */
    rx_skb->dev = dev;

    /* 9. Set the protocol member of the sk_buff */
    rx_skb->protocol = eth_type_trans(rx_skb, dev);  // afterwards rx_skb->data skips the Ethernet header: rx_skb->len -= ETH_HLEN, rx_skb->data += ETH_HLEN
    rx_skb->ip_summed = CHECKSUM_UNNECESSARY;

    /* 10. Update the receive statistics */
    dev->stats.rx_packets++;
    dev->stats.rx_bytes += skb->len;

    // print the length and contents of the received data (Ethernet header already stripped)
    printk("emulator_rx_packet: length = %u\n", rx_skb->len);
    for (i = 0; i < rx_skb->len; i++) {
        printk(KERN_CONT "0x%02x ", *(rx_skb->data + i));
        if ((i + 1) % 16 == 0)
            printk(KERN_CONT "\n");
    }

    /* 11. Pass the sk_buff to the upper layers */
    netif_rx(rx_skb);
}

/**
 * Send a packet
 */
static netdev_tx_t vnet_send_packet(struct sk_buff *skb, struct net_device *dev)
{
    static int cnt = 0;
    unsigned int len = skb->len;   // saved now: skb is freed before the statistics are updated
    unsigned int i;

    printk("vnet_send_packet: cnt = %d\n", ++cnt);

    // print the length and contents of the outgoing Ethernet frame
    printk("vnet_send_packet: length = %u\n", len);
    for (i = 0; i < len; i++) {
        printk(KERN_CONT "0x%02x ", *(skb->data + i));
        if ((i + 1) % 16 == 0)
            printk(KERN_CONT "\n");
    }

    /* 1. While sending, stop the upper layers from passing in new data */
    netif_stop_queue(dev);

    /* 2. Call the receive function with the sk_buff being sent; it fabricates a received ping reply and hands it to the upper layers */
    emulator_rx_packet(skb, dev);

    /* 3. Free the transmitted sk_buff; skb must not be touched after this */
    dev_kfree_skb(skb);

    /* 4. Update the transmit statistics: total packets and total bytes sent */
    dev->stats.tx_packets++;
    dev->stats.tx_bytes += len;

    /* 5. Wake the blocked upper layers so the protocol stack can continue to hand data to the driver */
    netif_wake_queue(dev);

    return NETDEV_TX_OK;
}


/* Network device operations table */
static const struct net_device_ops vnet_ops = {
    .ndo_start_xmit   = vnet_send_packet,      // packet transmit function
};

/*
 * Module init entry point
 */
static int vnet_init(void)
{
    int ret;

    /* 1. Allocate a net_device structure */
    vnet_dev = alloc_netdev(0, "vnet%d", NET_NAME_ENUM, ether_setup);
    if (!vnet_dev)
        return -ENOMEM;

    /* 2. Set the device operations table; data sent by the upper layers eventually reaches ndo_start_xmit() */
    vnet_dev->netdev_ops = &vnet_ops;

    /*
     * 3. Set the MAC address. Any value works for this demo; a real NIC
     *    driver must read the address from the hardware. Note that 0x89
     *    has the multicast bit (bit 0 of the first byte) set, so a real
     *    driver should use a unicast address instead.
     */
    vnet_dev->dev_addr[0] = 0x89;
    vnet_dev->dev_addr[1] = 0x89;
    vnet_dev->dev_addr[2] = 0x89;
    vnet_dev->dev_addr[3] = 0x89;
    vnet_dev->dev_addr[4] = 0x89;
    vnet_dev->dev_addr[5] = 0x89;

    /* 4. Set the flags of the virtual NIC: do not use ARP */
    vnet_dev->flags |= IFF_NOARP;

    /* 5. Register the network device with the kernel */
    ret = register_netdev(vnet_dev);
    if (ret) {
        free_netdev(vnet_dev);
        return ret;
    }

    return 0;
}

/*
 * Module exit function
 */
static void vnet_exit(void)
{
    unregister_netdev(vnet_dev);
    free_netdev(vnet_dev);
}
module_init(vnet_init);
module_exit(vnet_exit);
MODULE_LICENSE("GPL");

3.6 Testing

3.6.1 Building the virtual NIC driver

Build the virtual NIC driver and copy vnet_dev.ko to the NFS root filesystem.

root@zhengyang:/work/sambashare/drivers/19.vnet_dev# cp /work/sambashare/drivers/19.vnet_dev/vnet_dev.ko /work/nfs_root/rootfs

Boot the development board and load the driver:

[root@zy:/]# insmod vnet_dev.ko
vnet_dev: loading out-of-tree module taints kernel.

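Unlike character devices and block devices, network devices do not appear under /dev; they are listed under /sys/class/net. As shown below, the new NIC has appeared in the net class: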

[root@zy:/]# ls /sys/class/net
eth0   lo     sit0   vnet0
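
3.6.2 Configuring the virtual NIC

At this point ifconfig shows only one NIC, eth0 (plus the loopback interface), because vnet0 is not up yet: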

[root@zy:/]# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:45:18:A0:AC:CA
          inet addr:192.168.0.105  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::5045:18ff:fea0:acca/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1981 errors:0 dropped:0 overruns:0 frame:0
          TX packets:767 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2653725 (2.5 MiB)  TX bytes:128314 (125.3 KiB)
          Interrupt:55 Base address:0x8300

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ifconfig -a shows the virtual NIC we added; it is not enabled yet:

[root@zy:/]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 52:45:18:A0:AC:CA
          inet addr:192.168.0.105  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::5045:18ff:fea0:acca/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2088 errors:0 dropped:0 overruns:0 frame:0
          TX packets:817 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2724795 (2.5 MiB)  TX bytes:137254 (134.0 KiB)
          Interrupt:55 Base address:0x8300

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

vnet0     Link encap:Ethernet  HWaddr 89:89:89:89:89:89  // the newly added virtual NIC has no IP address yet
          BROADCAST NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

First assign an IP address to the virtual NIC vnet0:

[root@zy:/]# ifconfig vnet0 3.3.3.3   // configure the virtual NIC
vnet_send_packet: cnt = 1vnet_send_packet: cnt = 2
[root@zy:/]# ifconfig

eth0      Link encap:Ethernet  HWaddr 52:45:18:A0:AC:CA
          inet addr:192.168.0.105  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::5045:18ff:fea0:acca/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2105 errors:0 dropped:0 overruns:0 frame:0
          TX packets:859 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2728356 (2.6 MiB)  TX bytes:155358 (151.7 KiB)
          Interrupt:55 Base address:0x8300

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

vnet0     Link encap:Ethernet  HWaddr 89:89:89:89:89:89                 // the virtual NIC
          inet addr:3.3.3.3  Bcast:3.255.255.255  Mask:255.0.0.0
          inet6 addr: fe80::8b89:89ff:fe89:8989/64 Scope:Link
          UP BROADCAST RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:140 (140.0 B)  TX bytes:140 (140.0 B)

3.6.3 ping test

Ping the interface's own address with ping 3.3.3.3. Pinging yourself goes through the loopback interface, so the driver's transmit function is not called.

[root@zy:/]# ping 3.3.3.3
PING 3.3.3.3 (3.3.3.3): 56 data bytes
64 bytes from 3.3.3.3: seq=0 ttl=64 time=1.585 ms
64 bytes from 3.3.3.3: seq=1 ttl=64 time=0.907 ms
64 bytes from 3.3.3.3: seq=2 ttl=64 time=0.897 ms
64 bytes from 3.3.3.3: seq=3 ttl=64 time=0.900 ms

Ping another address on the network with ping 3.3.3.4. This time our virtual NIC driver is used and its transmit function is called.

[root@zy:/]# ping 3.3.3.4
PING 3.3.3.4 (3.3.3.4): 56 data bytes
vnet_send_packet: cnt = 9
64 bytes from 3.3.3.4: seq=0 ttl=64 time=1.087 ms
vnet_send_packet: cnt = 10
64 bytes from 3.3.3.4: seq=1 ttl=64 time=0.768 ms
vnet_send_packet: cnt = 11
64 bytes from 3.3.3.4: seq=2 ttl=64 time=0.749 ms

Running ifconfig again shows that the statistics have changed:

[root@zy:/]# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:45:18:A0:AC:CA
          inet addr:192.168.0.105  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::5045:18ff:fea0:acca/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2273 errors:0 dropped:0 overruns:0 frame:0
          TX packets:960 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2807766 (2.6 MiB)  TX bytes:179088 (174.8 KiB)
          Interrupt:55 Base address:0x8300

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:34 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2856 (2.7 KiB)  TX bytes:2856 (2.7 KiB)

vnet0     Link encap:Ethernet  HWaddr 89:89:89:89:89:89
          inet addr:3.3.3.3  Bcast:3.255.255.255  Mask:255.0.0.0
          inet6 addr: fe80::8b89:89ff:fe89:8989/64 Scope:Link
          UP BROADCAST RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0         // 12 packets received
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0       // 12 packets sent
          collisions:0 txqueuelen:1000
          RX bytes:968 (968.0 B)  TX bytes:968 (968.0 B)              // bytes received/sent

3.6.4 Printing the Ethernet frames

[root@zy:/]# ping 3.3.3.4
PING 3.3.3.4 (3.3.3.4): 56 data bytes
0x01 0x01 0x89 0x89 0x89 0x89 0x89 0x89
vnet_send_packet: cnt = 2                                              # transmitted frame
vnet_send_packet: length = 98
0x89 0x89 0x89 0x89 0x89 0x89 0x89 0x89 0x89 0x89 0x89 0x89 0x08 0x00 0x45 0x00    # destination and source MAC are both 0x89 0x89 0x89 0x89 0x89 0x89
0x00 0x54 0xf8 0xda 0x40 0x00 0x40 0x01 0x35 0xc2 0x03 0x03 0x03 0x03 0x03 0x03    # IP datagram length 0x54 = 84
0x03 0x04 0x08 0x00 0x62 0x67 0x3c 0x00 0x00 0x00 0xe6 0x94 0x73 0x03 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00
emulator_rx_packet: length = 84                                      # simulated reception
0x45 0x00 0x00 0x54 0xf8 0xda 0x40 0x00 0x40 0x01 0x35 0xc2 0x03 0x03 0x03 0x04    # only the IP datagram, no Ethernet header: the data went through eth_type_trans(), which moved skb->data
0x03 0x03 0x03 0x03 0x00 0x00 0x62 0x67 0x3c 0x00 0x00 0x00 0xe6 0x94 0x73 0x03
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00
64 bytes from 3.3.3.4: seq=0 ttl=64 time=88.367 ms
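The checksums in this dump can be verified by hand with a small userspace program. The sketch below copies the IP header and the ICMP message from the output above; inet_checksum() and icmp_expected_checksum() are hypothetical helper names implementing the RFC 1071 checksum. It suggests two things: the IP header checksum 0x35c2 is valid, and the ICMP checksum 0x6267 was computed for type 0x08 and no longer matches once the type is rewritten to 0x00. The reply is presumably still accepted only because the driver marks the buffer CHECKSUM_UNNECESSARY; that is an inference from the code, not something the output itself shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes, returned in host order. */
static uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* IPv4 header of the transmitted frame, copied from the dump. */
static const uint8_t tx_ip_hdr[20] = {
    0x45, 0x00, 0x00, 0x54, 0xf8, 0xda, 0x40, 0x00,
    0x40, 0x01, 0x35, 0xc2, 0x03, 0x03, 0x03, 0x03,
    0x03, 0x03, 0x03, 0x04,
};

/*
 * ICMP message as delivered by emulator_rx_packet: type already 0x00,
 * checksum field still 0x62 0x67; the remaining 52 bytes are zero.
 */
static const uint8_t rx_icmp[64] = {
    0x00, 0x00, 0x62, 0x67, 0x3c, 0x00, 0x00, 0x00,
    0xe6, 0x94, 0x73, 0x03,
};

/* Checksum the message would need: same bytes with the checksum field zeroed. */
static uint16_t icmp_expected_checksum(const uint8_t *icmp, size_t len)
{
    uint8_t copy[64];

    assert(len >= 4 && len <= sizeof(copy));
    for (size_t i = 0; i < len; i++)
        copy[i] = icmp[i];
    copy[2] = 0;
    copy[3] = 0;
    return inet_checksum(copy, len);
}
```

Running inet_checksum() over tx_ip_hdr yields 0, which confirms the header checksum, while icmp_expected_checksum() for the rewritten message yields 0x6a67 rather than the 0x6267 seen in the dump. A driver that wanted a fully valid reply would also have to update the ICMP checksum after changing the type.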

4. Code download

Young / s3c2440_project[drivers]


posted @ 2023-01-29 21:25  大奥特曼打小怪兽