Linux Driver Porting: NAND Flash ONFI Standard and the MTD Subsystem [Repost]

Reposted from: https://www.cnblogs.com/zyly/p/16756273.html#_label0

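1. The ONFI Standard

NAND Flash is a common storage device in the embedded world. For embedded development it falls into two broad classes, serial NAND and raw NAND, and the two differ considerably.

Raw NAND is defined in contrast to serial NAND: serial NAND is NAND Flash with a serial interface, for example a part that speaks SPI, whereas raw NAND is NAND Flash with a parallel interface.

We introduce the ONFI protocol first because it comes up when analyzing the NAND Flash driver source code. The K9F2G08U0C chip we use does not actually support ONFI, which becomes clear once the commands the chip supports are compared with those defined by ONFI 1.0.

1.1 The ONFI standard

Early raw NAND had no unified standard. Toshiba published the NAND Flash architecture as early as 1989, but each vendor designed its own raw NAND chips freely, so package sizes were inconsistent, memory organization differed widely, and interface commands were not interchangeable, all of which made life hard for customers.

To change this, in 2006 several major raw NAND vendors (Hynix, Intel, Micron, Phison, Sony, ST) joined forces to define a raw NAND standard called the Open NAND Flash Interface, or ONFI. ONFI 1.0 was released in December 2006. Many raw NAND vendors have since designed their parts to ONFI, so ONFI-compliant raw NAND from different vendors looks almost the same to an embedded designer, at least at the driver-code level. Not every part complies, however; the K9F2G08U0C mentioned above is one that does not.

ONFI website: http://www.onfi.org/, where the ONFI specifications can be downloaded.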

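1.2 Raw NAND classification

1.2.1 Bits per cell

By the number of bits stored per cell, NAND Flash memory cells are classified as:

  • Single Level Cell (SLC): stores 1 bit per cell. It offers the most reliable reads and writes and the longest endurance, roughly 90,000 to 100,000 program/erase cycles. Its endurance, accuracy, and overall performance make it popular in the enterprise market, but its high cost per bit and relatively small capacity make it less attractive for home use.
  • Multi Level Cell (MLC): stores 2 bits per cell instead of SLC's 1 bit, which greatly reduces the cost of large-capacity flash. Endurance is roughly 3,000 to 10,000 program/erase cycles.
  • Triple Level Cell (TLC): stores 3 bits per cell and is the cheapest of these three types to produce. The higher density makes large capacities affordable, but endurance drops sharply to only about 500 to 1,000 program/erase cycles and read/write speed is lower, so it suits ordinary consumers rather than industrial use.
  • Quad Level Cell (QLC): stores 4 bits per cell, a 33% increase in storage density over TLC. QLC is claimed to withstand about 1,000 program/erase cycles (comparable to TLC or better) while offering more capacity at lower cost.

Conclusion: in endurance and speed, SLC > MLC > TLC.

Most USB flash drives today use TLC chips, which are cheap but offer only average speed and a relatively short life.

Among SSDs, MLC drives are currently the mainstream, with a moderate price and comparatively good speed and endurance, while low-cost SSDs generally use TLC. The flash type is listed in the product specifications.

SLC SSDs currently appear mainly in high-end products, most of them priced above 1,000 yuan.

Most smartphones also use TLC storage today; some iPhone 6 units used TLC while others used MLC. Overall, MLC is the current mainstream, balanced in speed, endurance, and price, and is a reasonable recommendation.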

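1.2.2 Data bus width

The data bus is either x8 or x16.

1.2.3 Data sampling mode

The data sampling mode is either SDR or DDR.

1.2.4 Interface command standard

The interface command set is either non-standard (vendor-specific) or ONFI.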

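1.3 Raw NAND memory model

ONFI organizes raw NAND memory, from largest to smallest, into at most the following levels: Device, LUN (Die, Target), Plane, Block, Page, Cell.

  • Device: a single NAND Flash chip as delivered in its package. One Device contains one or more LUNs.
  • LUN (Die, Target): the basic unit that receives and executes Flash commands. One LUN contains one or more Planes. (Strictly speaking, ONFI defines a Target as the set of LUNs that share one CE# signal; this article treats the terms loosely as the same level.)
  • Plane: one Plane contains multiple Blocks.
  • Block: the smallest unit that can be erased, usually made up of multiple Pages.
  • Page: the smallest unit that can be programmed or read, typically 2 KB in size.
  • Cell: the smallest storage element within a Page. It corresponds to one floating-gate transistor and stores one or more bits.

Page and Block are always present, because the Page is the smallest read/program unit and the Block is the smallest erase unit. LUN and Plane are optional (when absent, treat them as LUN = 1 and Plane = 1) and generally appear only on large-capacity raw NAND (at least 8 Gb).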
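
To make the hierarchy concrete, here is a minimal user-space sketch that derives the erase unit and the total capacity from the levels listed above. The struct and function names are hypothetical, invented for this illustration; they are not kernel or ONFI identifiers.

```c
#include <stdint.h>

/* Hypothetical geometry record mirroring the hierarchy described above. */
struct nand_geometry {
    uint32_t luns_per_device;   /* 1 when the part has no separate LUNs */
    uint32_t planes_per_lun;    /* 1 when the part has no separate planes */
    uint32_t blocks_per_plane;
    uint32_t pages_per_block;
    uint32_t page_data_bytes;   /* main area only, spare/OOB excluded */
};

/* Bytes affected by one erase: a Block is pages_per_block Pages. */
uint64_t nand_block_bytes(const struct nand_geometry *g)
{
    return (uint64_t)g->pages_per_block * g->page_data_bytes;
}

/* Total data capacity: multiply the counts down the hierarchy. */
uint64_t nand_device_bytes(const struct nand_geometry *g)
{
    return (uint64_t)g->luns_per_device * g->planes_per_lun *
           g->blocks_per_plane * nand_block_bytes(g);
}
```

As a usage example, assume a 2 Gb part laid out as 1 LUN, 1 plane, 2048 blocks of 64 pages with 2 KiB of data per page (a layout typical for this capacity class, assumed here rather than taken from the article). The erase unit is then 128 KiB and the capacity 256 MiB, which is 2 Gb.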

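A typical NAND Flash has a single chip (LUN) with a single plane. More complex, larger-capacity parts contain several chips, each with several planes. Such a part is essentially several pieces of Flash stacked together behind one extra controller (illustrated by a figure in the original post).

Note: as I understand it, "chip" here means the LUN described above. Any particular NAND Flash model can be called a chip, but here the term refers to the inside of the part, that is, how many chips a given NAND Flash model contains. For example:

  • Samsung's 2 GB K9WAG08U1A (think of it as the external chip or model) contains two 1 GB K9K8G08U0A dies, so the K9WAG08U1A is said to have 2 chips inside;
  • a single chip may in turn contain several planes; the K9K8G08U0A above, for instance, contains 4 planes of 2 Gb each.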

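1.4 Raw NAND signals and packages

ONFI specifies the signal lines and packages of raw NAND. The original post shows the internal block diagram of a typical x8 raw NAND here.

Besides the memory array, there are two other major parts: the I/O control unit and the control logic unit. The signal lines attach mainly to these two units. An x8 raw NAND has 15 main signal lines, of which 13 are required; CE# and R/B# may be left unused.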

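Pin           Description
CLE           Command latch enable. While CLE is high, the rising edge of WE# latches the I/O inputs into the command register.
ALE           Address latch enable. While ALE is high, the rising edge of WE# latches the I/O inputs into the address register.
CE#           Chip enable, active low.
RE#           Read enable, active low.
WE#           Write enable. The rising edge of WE# latches the I/O inputs into the command, address, or data register.
WP#           Write protect.
R/B#          Ready/busy output (low: operation still in progress; high: operation complete).
VCC           Power supply.
VSS           Ground.
NC            Not connected.
I/O0 ~ I/O7   Data input/output (commands, addresses, and data share this bus).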

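ONFI defines many package types, such as TSOP48, LGA52, and BGA63/100/132/152/272/316. For embedded development the most common is the flat TSOP-48 package (pictured in the original post). It is typically used for smaller raw NAND parts (1/2/4/8/16/32 Gb); 1 to 32 Gb is usually enough for an embedded design, and TSOP-48 is easy to lay out on a PCB, which is why it became popular.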

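1.5 Raw NAND interface commands

ONFI 1.0 defines the raw NAND interface commands in a command table (reproduced in the original post); some are mandatory (M) and some optional (O). Among the mandatory commands, the three most commonly used are Read (Read Page), Page Program, and Block Erase, which cover the three basic operations of reading, writing, and erasing.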
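
As an illustration of how a command travels over the shared I/O bus, the sketch below builds the bus cycles of a Read Page: a command cycle (CLE high), the address cycles (ALE high), and a confirming command cycle. The opcodes are the ONFI 1.0 values; the types and the function name are hypothetical, and the split into two column cycles plus three row cycles is an assumption that fits a 2 KiB-page device. A real driver would take the cycle counts from the chip.

```c
#include <stddef.h>
#include <stdint.h>

/* Which latch-enable line is high while WE# rises. */
enum cycle_kind {
    CYCLE_CMD,   /* CLE high: byte goes to the command register */
    CYCLE_ADDR   /* ALE high: byte goes to the address register */
};

struct bus_cycle {
    enum cycle_kind kind;
    uint8_t value;              /* byte driven on I/O0..I/O7 */
};

/* Opcodes of the basic ONFI 1.0 commands named in the text. */
enum {
    ONFI_READ_1ST        = 0x00,
    ONFI_READ_2ND        = 0x30,
    ONFI_PROGRAM_1ST     = 0x80,
    ONFI_PROGRAM_2ND     = 0x10,
    ONFI_ERASE_1ST       = 0x60,
    ONFI_ERASE_2ND       = 0xD0,
    ONFI_READ_STATUS     = 0x70,
    ONFI_READ_PARAM_PAGE = 0xEC
};

/*
 * Fill out[] with the cycles of a Read Page and return their count.
 * column: byte offset inside the page; row: page number.
 * Assumes 2 column cycles and 3 row cycles, low byte first.
 */
size_t build_read_page(uint16_t column, uint32_t row, struct bus_cycle out[7])
{
    size_t n = 0;

    out[n++] = (struct bus_cycle){ CYCLE_CMD,  ONFI_READ_1ST };
    out[n++] = (struct bus_cycle){ CYCLE_ADDR, (uint8_t)(column & 0xFF) };
    out[n++] = (struct bus_cycle){ CYCLE_ADDR, (uint8_t)(column >> 8) };
    out[n++] = (struct bus_cycle){ CYCLE_ADDR, (uint8_t)(row & 0xFF) };
    out[n++] = (struct bus_cycle){ CYCLE_ADDR, (uint8_t)((row >> 8) & 0xFF) };
    out[n++] = (struct bus_cycle){ CYCLE_ADDR, (uint8_t)((row >> 16) & 0xFF) };
    out[n++] = (struct bus_cycle){ CYCLE_CMD,  ONFI_READ_2ND };
    return n;
}
```

After the last cycle the chip pulls R/B# low while it loads the page, and the data is then clocked out with RE#. Page Program and Block Erase follow the same command/address/command pattern with their own opcode pairs.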

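Other important commands include:

  • Read Status: returns the execution status and result of a command.
  • Read Parameter Page: returns the factory information stored in the chip (memory organization, features, timing, other behavioral parameters, and so on). ONFI defines its layout (tabulated in the original post), so a NAND driver can read the Parameter Page to stay generic across chips.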
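
Before trusting a Parameter Page, a driver has to check that it really is one. The sketch below shows that check as I understand the ONFI layout: the page is 256 bytes, starts with the ASCII signature "ONFI", and ends with a little-endian CRC-16 over the first 254 bytes (polynomial 0x8005, initial value 0x4F4E). The function names are hypothetical; treat the block as a sketch under those assumptions rather than a reference implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONFI_PARAM_PAGE_SIZE 256
#define ONFI_CRC_INIT        0x4F4E   /* initial CRC value assumed from ONFI */

/* Bitwise CRC-16, polynomial 0x8005, most significant bit first. */
uint16_t onfi_crc16(uint16_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int i = 0; i < 8; i++)
            crc = (uint16_t)((crc << 1) ^ ((crc & 0x8000) ? 0x8005 : 0));
    }
    return crc;
}

/* True when the buffer carries the signature and a matching CRC. */
bool onfi_param_page_valid(const uint8_t page[ONFI_PARAM_PAGE_SIZE])
{
    uint16_t stored;

    if (memcmp(page, "ONFI", 4) != 0)
        return false;   /* not ONFI: fall back to ID-based detection */

    stored = (uint16_t)(page[254] | (page[255] << 8));
    return onfi_crc16(ONFI_CRC_INIT, page, 254) == stored;
}
```

A chip that does not support ONFI, such as the K9F2G08U0C used in this series, would fail the signature test, and the driver would then identify it from its ID bytes instead.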

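2. MTD Device Drivers

MTD (Memory Technology Device) is the Linux subsystem for accessing memory devices such as ROM and Flash. Its main purpose is to make drivers for new memory devices simpler to write, and to that end it provides an abstract interface between the hardware and the upper layers.

2.1 MTD subsystem overview

Before introducing MTD, consider a question: why does the Linux kernel abstract out an MTD subsystem at all?

Recall the steps for writing a block device driver from the previous article in this series:

  • call register_blkdev to register a block device major number;
  • call alloc_disk to allocate a generic disk object, gendisk;
  • call blk_mq_init_sq_queue to initialize a request queue;
  • set the members of the gendisk structure:
    • set major, first_minor, disk_name, and fops;
    • set the request queue member, queue, to the queue initialized earlier;
  • call add_disk to register the gendisk.

If we wrote a block device driver for every Flash model, we would repeat these steps each time. So what actually differs between Flash models? Taking NAND Flash as an example, mainly the memory model (page size, block size, pages per block, OOB, and so on) and the timing parameters. Could the parts tightly coupled to the NAND Flash be factored out and supplied by a NAND Flash driver layer, with everything common factored out separately? That is exactly what the MTD subsystem does.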

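2.2 MTD subsystem framework

As the framework diagram in the original post shows, the MTD framework can be divided into four layers, from top to bottom: device nodes, the MTD device layer, the MTD raw device layer, and the Flash driver layer.

  • Device nodes: MTD block device nodes (major number 31) and MTD character device nodes (major number 90) are created under /dev with mknod. Accessing these nodes accesses the MTD block and character devices.
  • MTD device layer: on top of MTD raw devices, Linux defines MTD block devices (major number 31) and character devices (major number 90), where:
    • mtdchar.c implements the MTD character device interface;
    • mtdblock.c implements the MTD block device interface. It handles device creation, data reads and writes, optimizations, and so on. It works like a traditional block device driver, except that requesting the block major number, allocating and setting up the gendisk, initializing the queue, and so on are all done by the kernel automatically.
  • MTD raw device layer: an MTD raw device is described by the mtd_info data structure, which defines a large amount of MTD data and operation functions, where:
    • mtdcore.c implements the MTD raw device interface;
    • mtdpart.c implements the MTD partition interface.
  • Flash driver layer: responsible for reading, writing, and erasing the Flash hardware. NAND Flash and NOR Flash have different protocols and hardware details; this layer knows what to send (for example which commands identify, read, write, or erase the device) and how the hardware sends it. Each protocol has its own functions, and the operating environment is built from the corresponding structures and functions. The driver writer only needs to allocate, set up, and register the Flash driver layer structures and establish the mapping from the concrete device to the MTD raw device.
    • NAND Flash chip drivers live under drivers/mtd/nand/ and use the nand_chip structure;
    • NOR Flash chip drivers live under drivers/mtd/chips/ and use the map_info structure.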
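
From user space, a quick way to see which MTD raw devices the layers above have registered is the /proc/mtd file, which the article does not cover but which the kernel provides alongside the /dev nodes. Each data line has the form `mtd0: 00040000 00020000 "bootloader"` (index, size and erase size in hex, then the quoted name). The sketch below parses one such line; the struct and function names are hypothetical.

```c
#include <stdio.h>

/* One row of /proc/mtd (hypothetical record type for this sketch). */
struct proc_mtd_entry {
    int index;                  /* N in mtdN */
    unsigned long long size;    /* total size in bytes */
    unsigned int erasesize;     /* erase block size in bytes */
    char name[64];              /* partition or device name */
};

/*
 * Parse one data line of /proc/mtd.
 * Returns 0 on success, -1 if the line does not match (for example the
 * "dev:    size   erasesize  name" header line).
 */
int parse_proc_mtd_line(const char *line, struct proc_mtd_entry *e)
{
    int n = sscanf(line, "mtd%d: %llx %x \"%63[^\"]\"",
                   &e->index, &e->size, &e->erasesize, e->name);

    return n == 4 ? 0 : -1;
}
```

Feeding the function every line read from /proc/mtd with fgets() and skipping the lines that return -1 yields the same list of devices that appear as /dev/mtdN and /dev/mtdblockN.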

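2.2.1 Flash driver layer

(1) NOR Flash driver

The Linux kernel implements generic NOR Flash drivers for interface standards such as CFI and JEDEC. On top of these, a chip-level driver is fairly simple: define a concrete memory-mapping structure, map_info, then call do_map_probe with the interface type.

Take scb2_flash.c (in drivers/mtd/maps/) as an example:

  • define a map_info structure and initialize its name, size, phys, and bankwidth members;
  • map the virt member (virtual address) with ioremap;
  • call simple_map_init to initialize the map_info member functions read, write, copy_from, and copy_to;
  • call do_map_probe to probe for a CFI interface; it returns an mtd_info structure;
  • register the MTD raw device with parse_mtd_partitions and add_mtd_partitions.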

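(2) NAND Flash driver

The Linux kernel implements a generic NAND Flash driver (drivers/mtd/nand/raw/nand_base.c); a chip-level driver needs to fill in the nand_chip structure.

MTD uses nand_chip to represent a NAND Flash chip. The structure holds the chip's memory model, read/write methods, ECC mode, hardware control, and other low-level mechanisms.

Take s3c2410.c (in drivers/mtd/nand/raw) as an example:

  • allocate memory for the nand_chip;
  • initialize the nand_chip members for the SoC NAND controller, for example chip->legacy (members write_buf, read_buf, select_chip, cmd_ctrl, dev_ready, IO_ADDR_R, IO_ADDR_W) and chip->controller;
  • set chip->priv to the mtd_info;
  • call nand_scan() to probe the NAND Flash (older kernels pass the mtd_info; kernels that provide chip->legacy pass the nand_chip). nand_scan() reads the NAND chip ID and then:
    • initializes chip->base.mtd (members writesize, oobsize, erasesize, and so on);
    • initializes chip->base.memorg (members bits_per_cell, pagesize, oobsize, pages_per_eraseblock, planes_per_lun, luns_per_target, ntargets, and so on);
    • initializes chip->options and chip->base.eccreq;
    • initializes the members of chip->ecc (ECC mode and handler functions);
    • points every still-unset function pointer in the chip at the default implementation in nand_base.c;
  • call mtd_device_register() with the mtd_info and mtd_partition as arguments to register the MTD device.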
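
The "fill in whatever the driver left unset" step can be pictured with a small user-space model. Everything below is hypothetical (the toy_ names are not kernel identifiers); it only mimics the pattern in which a board driver sets the hooks it cares about and the generic layer supplies defaults for the rest.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_chip;

/* Hooks a board driver may override, in the spirit of chip->legacy. */
struct toy_hooks {
    uint8_t (*read_byte)(struct toy_chip *chip);
    void (*read_buf)(struct toy_chip *chip, uint8_t *buf, int len);
};

struct toy_chip {
    struct toy_hooks hooks;
    const uint8_t *data_reg;  /* stands in for IO_ADDR_R; modeled as a byte stream */
    int pos;                  /* next byte the fake data register returns */
};

/* Default: fetch one byte from the (fake) data register. */
uint8_t toy_default_read_byte(struct toy_chip *chip)
{
    return chip->data_reg[chip->pos++];
}

/* Default: read len bytes through whatever read_byte hook is installed. */
void toy_default_read_buf(struct toy_chip *chip, uint8_t *buf, int len)
{
    for (int i = 0; i < len; i++)
        buf[i] = chip->hooks.read_byte(chip);
}

/* Install a default for every hook the driver left as NULL. */
void toy_set_defaults(struct toy_chip *chip)
{
    if (!chip->hooks.read_byte)
        chip->hooks.read_byte = toy_default_read_byte;
    if (!chip->hooks.read_buf)
        chip->hooks.read_buf = toy_default_read_buf;
}
```

Because the default read_buf calls through hooks.read_byte, a driver that overrides only read_byte still gets a working read_buf, which is the main benefit of this layering.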

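2.3 Core structures

2.3.1 struct mtd_info

The Linux kernel uses the mtd_info structure to represent an MTD raw device. It describes either a whole device or one partition of a multi-partition device, and defines a large amount of MTD data and operation functions. All registered mtd_info structures are tracked in a global table (the mtd_table array in older kernels; recent kernels use an IDR, mtd_idr, instead).

mtd_info is defined in include/linux/mtd/mtd.h: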

struct mtd_info {
        u_char type;     // MTD device type, e.g. MTD_NORFLASH, MTD_NANDFLASH
        uint32_t flags;  // flags, e.g. MTD_WRITEABLE, MTD_NO_ERASE
        uint32_t orig_flags; /* Flags as before running mtd checks */
        uint64_t size;   // Total size of the MTD

        /* "Major" erase size for the device. Naïve users may take this
         * to be the only erase size available, or may use the more detailed
         * information below if they desire
         */
        uint32_t erasesize;   // erase unit size of the MTD device; for NAND Flash this is the Block size
        /* Minimal writable flash unit size. In case of NOR flash it is 1 (even
         * though individual bits can be cleared), in case of NAND flash it is
         * one NAND page (or half, or one-fourths of it), in case of ECC-ed NOR
         * it is of ECC block size, etc. It is illegal to have writesize = 0.
         * Any driver registering a struct mtd_info must ensure a writesize of
         * 1 or larger.
         */
        uint32_t writesize;  // minimum writable size in bytes: 1 byte for NOR Flash, one page for NAND Flash

        /*
         * Size of the write buffer used by the MTD. MTD devices having a write
         * buffer can write multiple writesize chunks at a time. E.g. while
         * writing 4 * writesize bytes to a device with 2 * writesize bytes
         * buffer the MTD driver can (but doesn't have to) do 2 writesize
         * operations, but not 4. Currently, all NANDs have writebufsize
         * equivalent to writesize (NAND page size). Some NOR flashes do have
         * writebufsize greater than writesize.
         */
        uint32_t writebufsize;

        uint32_t oobsize;   // Amount of OOB data per block (e.g. 16)
        uint32_t oobavail;  // Available OOB bytes per block

        /*
         * If erasesize is a power of 2 then the shift is stored in
         * erasesize_shift otherwise erasesize_shift is zero. Ditto writesize.
         */
        unsigned int erasesize_shift;   // log2(erasesize) when erasesize is a power of 2, else 0
        unsigned int writesize_shift;    // log2(writesize) when writesize is a power of 2, else 0
        /* Masks based on erasesize_shift and writesize_shift */
        unsigned int erasesize_mask;     // (1 << erasesize_shift) - 1
        unsigned int writesize_mask;     // (1 << writesize_shift) - 1

        /*
         * read ops return -EUCLEAN if max number of bitflips corrected on any
         * one region comprising an ecc step equals or exceeds this value.
         * Settable by driver, else defaults to ecc_strength.  User can override
         * in sysfs.  N.B. The meaning of the -EUCLEAN return code has changed;
         * see Documentation/ABI/testing/sysfs-class-mtd for more detail.
         */
        unsigned int bitflip_threshold;

        /* Kernel-only stuff starts here. */
        const char *name;  // MTD device name
        int index;         // device index

        /* OOB layout description */
        const struct mtd_ooblayout_ops *ooblayout;

        /* NAND pairing scheme, only provided for MLC/TLC NANDs */
        const struct mtd_pairing_scheme *pairing;

        /* the ecc step size. */
        unsigned int ecc_step_size;

        /* max number of correctible bit errors per ecc step */
        unsigned int ecc_strength;

        /* Data for variable erase regions. If numeraseregions is zero,
         * it means that the whole device has erasesize as given above.
         */
        int numeraseregions;  // number of variable erase regions; 0 means a uniform erasesize
        struct mtd_erase_region_info *eraseregions;  // variable erase regions
        /*
         * Do not call via these pointers, use corresponding mtd_*()
         * wrappers instead.
         */
        int (*_erase) (struct mtd_info *mtd, struct erase_info *instr);  // erase
        int (*_point) (struct mtd_info *mtd, loff_t from, size_t len,
                       size_t *retlen, void **virt, resource_size_t *phys);
        int (*_unpoint) (struct mtd_info *mtd, loff_t from, size_t len);
        int (*_read) (struct mtd_info *mtd, loff_t from, size_t len,  // read
                      size_t *retlen, u_char *buf);
        int (*_write) (struct mtd_info *mtd, loff_t to, size_t len,    // write
                       size_t *retlen, const u_char *buf);
        int (*_panic_write) (struct mtd_info *mtd, loff_t to, size_t len,
                             size_t *retlen, const u_char *buf);
        int (*_read_oob) (struct mtd_info *mtd, loff_t from,
                          struct mtd_oob_ops *ops);
        int (*_write_oob) (struct mtd_info *mtd, loff_t to,
                           struct mtd_oob_ops *ops);
        int (*_get_fact_prot_info) (struct mtd_info *mtd, size_t len,
                                    size_t *retlen, struct otp_info *buf);
        int (*_read_fact_prot_reg) (struct mtd_info *mtd, loff_t from,
                                    size_t len, size_t *retlen, u_char *buf);
        int (*_get_user_prot_info) (struct mtd_info *mtd, size_t len,
                                    size_t *retlen, struct otp_info *buf);
        int (*_read_user_prot_reg) (struct mtd_info *mtd, loff_t from,
                                    size_t len, size_t *retlen, u_char *buf);
        int (*_write_user_prot_reg) (struct mtd_info *mtd, loff_t to,
                                     size_t len, size_t *retlen, u_char *buf);
        int (*_lock_user_prot_reg) (struct mtd_info *mtd, loff_t from,
                                    size_t len);
        int (*_writev) (struct mtd_info *mtd, const struct kvec *vecs,
                        unsigned long count, loff_t to, size_t *retlen);
        void (*_sync) (struct mtd_info *mtd);
        int (*_lock) (struct mtd_info *mtd, loff_t ofs, uint64_t len);
        int (*_unlock) (struct mtd_info *mtd, loff_t ofs, uint64_t len);
        int (*_is_locked) (struct mtd_info *mtd, loff_t ofs, uint64_t len);
        int (*_block_isreserved) (struct mtd_info *mtd, loff_t ofs);
        int (*_block_isbad) (struct mtd_info *mtd, loff_t ofs);  
        int (*_block_markbad) (struct mtd_info *mtd, loff_t ofs);
        int (*_max_bad_blocks) (struct mtd_info *mtd, loff_t ofs, size_t len);
        int (*_suspend) (struct mtd_info *mtd);
        void (*_resume) (struct mtd_info *mtd);
        void (*_reboot) (struct mtd_info *mtd);
        /*
         * If the driver is something smart, like UBI, it may need to maintain
         * its own reference counting. The below functions are only for driver.
         */
        int (*_get_device) (struct mtd_info *mtd);
        void (*_put_device) (struct mtd_info *mtd);

        struct notifier_block reboot_notifier;  /* default mode before reboot */

        /* ECC status information */
        struct mtd_ecc_stats ecc_stats;
        /* Subpage shift (NAND) */
        int subpage_sft;

        void *priv;

        struct module *owner;
        struct device dev;
        int usecount;
        struct mtd_debug_info dbg;
        struct nvmem_device *nvmem;
};
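
The shift and mask members exist so that offset arithmetic can use shifts and bitwise AND instead of division. A minimal user-space sketch of the rule stated in the structure's comments (shift is log2 of the size when the size is a power of two, otherwise 0; mask is (1 << shift) - 1) follows. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Result pair for one size (hypothetical type for this sketch). */
struct size_shift {
    unsigned int shift;
    unsigned int mask;
};

/* Derive shift and mask from a size such as erasesize or writesize. */
struct size_shift size_to_shift(uint32_t size)
{
    struct size_shift r = { 0, 0 };

    /* Power of two: exactly one bit set. */
    if (size != 0 && (size & (size - 1)) == 0) {
        while ((1u << r.shift) < size)
            r.shift++;
    }
    r.mask = (1u << r.shift) - 1;
    return r;
}
```

With an erasesize of 128 KiB this gives shift 17 and mask 0x1FFFF, so for an offset ofs the block index is `ofs >> 17` and the offset inside the block is `ofs & 0x1FFFF`. A size that is not a power of two yields shift 0 and mask 0, and callers must fall back to division.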

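The _read(), _write(), _read_oob(), _write_oob(), and _erase() hooks of mtd_info are the main functions an MTD device driver has to implement; they form the interface between the MTD raw device layer and the Flash driver layer. Linux already provides a set of mtd_info member functions that suits most Flash devices.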

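2.3.2 struct mtd_part

MTD uses mtd_part to represent a partition. It embeds an mtd_info, so each partition is treated as an MTD raw device in its own right and registered like one. Most of the data in the partition's mtd_info (mtd_part.mtd) is taken from the device it was carved out of, mtd_part->parent (called the master in older kernels). By default the master itself is not registered as a separate MTD raw device.

mtd_part is defined in drivers/mtd/mtdpart.c: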

/**
 * struct mtd_part - our partition node structure
 *
 * @mtd: struct holding partition details
 * @parent: parent mtd - flash device or another partition
 * @offset: partition offset relative to the *flash device*
 */
struct mtd_part {
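        struct mtd_info mtd;     // partition info
        struct mtd_info *parent; // parent of the partition (flash device or another partition)
        uint64_t offset;         // offset of the partition
        struct list_head list;   // list node linking the mtd_part structures together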
};
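
How a partition forwards requests to its parent can be sketched in user space. The model below is deliberately simplified and every toy_ name is hypothetical: a wrapper bounds-checks the request and calls the device's read hook, and the partition's hook adds its offset before calling the wrapper again on the parent. The kernel recovers the mtd_part from the embedded mtd_info with container_of; this sketch uses a plain cast, which works because mtd is the first member.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cut-down stand-in for mtd_info. */
struct toy_mtd {
    uint64_t size;
    int (*read)(struct toy_mtd *mtd, uint64_t from, size_t len, uint8_t *buf);
    void *priv;
};

/* Cut-down stand-in for mtd_part; mtd must stay the first member. */
struct toy_part {
    struct toy_mtd mtd;
    struct toy_mtd *parent;
    uint64_t offset;
};

/* Wrapper: validate the range, then dispatch through the hook. */
int toy_mtd_read(struct toy_mtd *mtd, uint64_t from, size_t len, uint8_t *buf)
{
    if (from > mtd->size || len > mtd->size - from)
        return -1;
    return mtd->read(mtd, from, len, buf);
}

/* Hook of a RAM-backed "flash" used as the parent device. */
int toy_ram_read(struct toy_mtd *mtd, uint64_t from, size_t len, uint8_t *buf)
{
    memcpy(buf, (const uint8_t *)mtd->priv + from, len);
    return 0;
}

/* Hook of a partition: shift by the partition offset, ask the parent. */
int toy_part_read(struct toy_mtd *mtd, uint64_t from, size_t len, uint8_t *buf)
{
    struct toy_part *part = (struct toy_part *)mtd;

    return toy_mtd_read(part->parent, from + part->offset, len, buf);
}
```

Going through the wrapper rather than the raw pointer is what keeps a read from running past the end of the partition, which mirrors the "use the mtd_*() wrappers" advice in the mtd_info comments.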

2.3.3 struct mtd_partition

MTD uses mtd_partition to describe a partition; it is defined in include/linux/mtd/partitions.h:

/*
 * Partition definition structure:
 *
 * An array of struct partition is passed along with a MTD object to
 * mtd_device_register() to create them.
 *
 * For each partition, these fields are available:
 * name: string that will be used to label the partition's MTD device.
 * types: some partitions can be containers using specific format to describe
 *      embedded subpartitions / volumes. E.g. many home routers use "firmware"
 *      partition that contains at least kernel and rootfs. In such case an
 *      extra parser is needed that will detect these dynamic partitions and
 *      report them to the MTD subsystem. If set this property stores an array
 *      of parser names to use when looking for subpartitions.
 * size: the partition size; if defined as MTDPART_SIZ_FULL, the partition
 *      will extend to the end of the master MTD device.
 * offset: absolute starting position within the master MTD device; if
 *      defined as MTDPART_OFS_APPEND, the partition will start where the
 *      previous one ended; if MTDPART_OFS_NXTBLK, at the next erase block;
 *      if MTDPART_OFS_RETAIN, consume as much as possible, leaving size
 *      after the end of partition.
 * mask_flags: contains flags that have to be masked (removed) from the
 *      master MTD flag set for the corresponding MTD partition.
 *      For example, to force a read-only partition, simply adding
 *      MTD_WRITEABLE to the mask_flags will do the trick.
 *
 * Note: writeable partitions require their size and offset be
 * erasesize aligned (e.g. use MTDPART_OFS_NEXTBLK).
 */

struct mtd_partition {
        const char *name;               /* identifier string (partition name) */
        const char *const *types;       /* names of parsers to use if any */
        uint64_t size;                  /* partition size */
        uint64_t offset;                /* offset within the master MTD space */
        uint32_t mask_flags;            /* master MTD flags to mask out for this partition */
        struct device_node *of_node; 
};
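
The special size and offset values in the comment above are easier to follow with a worked example. The sketch below resolves a partition table the way the comment describes: "append" starts where the previous partition ended, "next block" rounds that position up to an erase block, "full" extends to the end of the device, and mask_flags clears flags inherited from the master. The TOY_ constants and toy_ types are stand-ins defined for this sketch; RETAIN is left out, and oversized partitions are rejected here for simplicity.

```c
#include <stdint.h>

#define TOY_OFS_NXTBLK  ((uint64_t)-2)   /* start at the next erase block */
#define TOY_OFS_APPEND  ((uint64_t)-1)   /* start where the previous ended */
#define TOY_SIZ_FULL    0                /* extend to the end of the device */
#define TOY_WRITEABLE   0x400            /* stand-in for MTD_WRITEABLE */

struct toy_partition {
    const char *name;
    uint64_t size;
    uint64_t offset;
    uint32_t mask_flags;
};

struct toy_resolved {
    uint64_t offset;
    uint64_t size;
    uint32_t flags;
};

/* Resolve n partitions against a master device; 0 on success, -1 on error. */
int toy_resolve(const struct toy_partition *p, int n, uint64_t master_size,
                uint32_t erasesize, uint32_t master_flags,
                struct toy_resolved *out)
{
    uint64_t cur = 0;   /* end of the previous partition */

    for (int i = 0; i < n; i++) {
        uint64_t ofs = p[i].offset;
        uint64_t size = p[i].size;

        if (ofs == TOY_OFS_APPEND)
            ofs = cur;
        else if (ofs == TOY_OFS_NXTBLK)
            ofs = (cur + erasesize - 1) / erasesize * erasesize;
        if (ofs > master_size)
            return -1;

        if (size == TOY_SIZ_FULL)
            size = master_size - ofs;
        if (size > master_size - ofs)
            return -1;

        out[i].offset = ofs;
        out[i].size = size;
        out[i].flags = master_flags & ~p[i].mask_flags;
        cur = ofs + size;
    }
    return 0;
}
```

A table such as { "bootloader", 256 KiB at offset 0, masked read-only }, { "kernel", 2 MiB, append }, { "rootfs", full, append } on a 256 MiB device resolves to offsets 0, 0x40000, and 0x240000, with the last partition taking all remaining space.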

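2.3.4 struct nand_chip

nand_chip is an important data structure. MTD uses it to represent a chip inside a NAND Flash device; it holds the chip's memory model, read/write methods, ECC mode, hardware control, and other low-level mechanisms. It is defined in include/linux/mtd/rawnand.h: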

/**
 * struct nand_chip - NAND Private Flash Chip Data
 * @base:               Inherit from the generic NAND device
 * @legacy:             All legacy fields/hooks. If you develop a new driver,
 *                      don't even try to use any of these fields/hooks, and if
 *                      you're modifying an existing driver that is using those
 *                      fields/hooks, you should consider reworking the driver
 *                      avoid using them.
 * @setup_read_retry:   [FLASHSPECIFIC] flash (vendor) specific function for
 *                      setting the read-retry mode. Mostly needed for MLC NAND.
 * @ecc:                [BOARDSPECIFIC] ECC control structure
 * @buf_align:          minimum buffer alignment required by a platform
 * @oob_poi:            "poison value buffer," used for laying out OOB data
 *                      before writing
 * @page_shift:         [INTERN] number of address bits in a page (column
 *                      address bits).
 * @phys_erase_shift:   [INTERN] number of address bits in a physical eraseblock
 * @bbt_erase_shift:    [INTERN] number of address bits in a bbt entry
 * @chip_shift:         [INTERN] number of address bits in one chip
 * @options:            [BOARDSPECIFIC] various chip options. They can partly
 *                      be set to inform nand_scan about special functionality.
 *                      See the defines for further explanation.
 * @bbt_options:        [INTERN] bad block specific options. All options used
 *                      here must come from bbm.h. By default, these options
 *                      will be copied to the appropriate nand_bbt_descr's.
 * @badblockpos:        [INTERN] position of the bad block marker in the oob
 *                      area.
 * @badblockbits:       [INTERN] minimum number of set bits in a good block's
 *                      bad block marker position; i.e., BBM == 11110111b is
 *                      not bad when badblockbits == 7
 * @onfi_timing_mode_default: [INTERN] default ONFI timing mode. This field is
 *                            set to the actually used ONFI mode if the chip is
 *                            ONFI compliant or deduced from the datasheet if
 *                            the NAND chip is not ONFI compliant.
 * @pagemask:           [INTERN] page number mask = number of (pages / chip) - 1
 * @data_buf:           [INTERN] buffer for data, size is (page size + oobsize).
 * @pagecache:          Structure containing page cache related fields
 * @pagecache.bitflips: Number of bitflips of the cached page
 * @pagecache.page:     Page number currently in the cache. -1 means no page is
 *                      currently cached
 * @subpagesize:        [INTERN] holds the subpagesize
 * @id:                 [INTERN] holds NAND ID
 * @parameters:         [INTERN] holds generic parameters under an easily
 *                      readable form.
 * @data_interface:     [INTERN] NAND interface timing information
 * @cur_cs:             currently selected target. -1 means no target selected,
 *                      otherwise we should always have cur_cs >= 0 &&
 *                      cur_cs < nanddev_ntargets(). NAND Controller drivers
 *                      should not modify this value, but they're allowed to
 *                      read it.
 * @read_retries:       [INTERN] the number of read retry modes supported
 * @lock:               lock protecting the suspended field. Also used to
 *                      serialize accesses to the NAND device.
 * @suspended:          set to 1 when the device is suspended, 0 when it's not.
 * @bbt:                [INTERN] bad block table pointer
 * @bbt_td:             [REPLACEABLE] bad block table descriptor for flash
 *                      lookup.
 * @bbt_md:             [REPLACEABLE] bad block table mirror descriptor
 * @badblock_pattern:   [REPLACEABLE] bad block scan pattern used for initial
 *                      bad block scan.
 * @controller:         [REPLACEABLE] a pointer to a hardware controller
 *                      structure which is shared among multiple independent
 *                      devices.
 * @priv:               [OPTIONAL] pointer to private chip data
 * @manufacturer:       [INTERN] Contains manufacturer information
 * @manufacturer.desc:  [INTERN] Contains manufacturer's description
 * @manufacturer.priv:  [INTERN] Contains manufacturer private information
 */
struct nand_chip {
        struct nand_device base;    // can be seen as a subclass of mtd_info (it embeds one)

        struct nand_legacy legacy;  // hardware operation hooks

        int (*setup_read_retry)(struct nand_chip *chip, int retry_mode);

        unsigned int options;  // chip-specific options, e.g. NAND_BUSWIDTH_16
        unsigned int bbt_options;

        int page_shift;       // log2 of the page size; 9 for a chip with 512-byte pages
        int phys_erase_shift; // log2 of the erase size (normally one block); 14 for a chip that erases 16 KiB at a time
        int bbt_erase_shift;  // number of address bits in a BBT entry; normally equal to phys_erase_shift
        int chip_shift;       // log2 of the chip capacity
        int pagemask;         // (chip capacity / page size) - 1, the page number mask
        u8 *data_buf;

        struct {
                unsigned int bitflips;
                int page;
        } pagecache;

        int subpagesize;
        int onfi_timing_mode_default;
        unsigned int badblockpos;
        int badblockbits;

        struct nand_id id;  // ID bytes read from the chip (manufacturer ID, device ID, and so on)
        struct nand_parameters parameters;

        struct nand_data_interface data_interface;

        int cur_cs;       // currently selected target

        int read_retries;

        struct mutex lock;
        unsigned int suspended : 1;

        uint8_t *oob_poi;
        struct nand_controller *controller; // NAND controller

        struct nand_ecc_ctrl ecc; // ECC control structure holding the ECC-related hooks
        unsigned long buf_align;

        uint8_t *bbt;
        struct nand_bbt_descr *bbt_td;
        struct nand_bbt_descr *bbt_md;

        struct nand_bbt_descr *badblock_pattern;

        void *priv;

        struct {
                const struct nand_manufacturer *desc;
                void *priv;
        } manufacturer;   // manufacturer information
};
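
The shift and mask fields of nand_chip are all derived from three numbers: page size, erase block size, and chip size. The user-space sketch below shows the arithmetic and how a byte offset is split into a page number and a column using them. The toy_ names are hypothetical, and all three sizes are assumed to be powers of two.

```c
#include <stdint.h>

/* The derived fields, named after their nand_chip counterparts. */
struct toy_nand_shifts {
    int page_shift;
    int phys_erase_shift;
    int chip_shift;
    int pagemask;
};

/* log2 of a power of two. */
static int log2_pow2(uint64_t v)
{
    int s = 0;

    while (v > 1) {
        v >>= 1;
        s++;
    }
    return s;
}

struct toy_nand_shifts toy_nand_compute_shifts(uint32_t page_size,
                                               uint32_t block_size,
                                               uint64_t chip_size)
{
    struct toy_nand_shifts s;

    s.page_shift = log2_pow2(page_size);
    s.phys_erase_shift = log2_pow2(block_size);
    s.chip_shift = log2_pow2(chip_size);
    /* pages per chip minus one */
    s.pagemask = (int)(chip_size >> s.page_shift) - 1;
    return s;
}

/* Split a byte offset into the page number and the column inside it. */
void toy_nand_split(const struct toy_nand_shifts *s, uint64_t ofs,
                    int *page, int *column)
{
    *page = (int)(ofs >> s->page_shift) & s->pagemask;
    *column = (int)(ofs & ((1u << s->page_shift) - 1));
}
```

For a chip with 2 KiB pages, 128 KiB blocks, and 256 MiB capacity (values assumed for illustration), this gives page_shift 11, phys_erase_shift 17, chip_shift 28, and pagemask 0x1FFFF; the byte offset 0x20805 lands in page 65 at column 5.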

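The ecc member of nand_chip handles ECC-related operations; it contains many hooks, such as read_page_raw and write_page_raw, used for ECC handling.

The read/write hooks in the legacy member of nand_chip, such as read_buf and cmdfunc, depend on the specific NAND controller. They talk to the hardware, so we usually have to implement them ourselves for the SoC's NAND controller.

2.3.5 struct nand_legacy

The nand_legacy structure holds the hooks tied to the SoC NAND controller hardware: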

/**
 * struct nand_legacy - NAND chip legacy fields/hooks
 * @IO_ADDR_R: address to read the 8 I/O lines of the flash device
 * @IO_ADDR_W: address to write the 8 I/O lines of the flash device
 * @select_chip: select/deselect a specific target/die
 * @read_byte: read one byte from the chip
 * @write_byte: write a single byte to the chip on the low 8 I/O lines
 * @write_buf: write data from the buffer to the chip
 * @read_buf: read data from the chip into the buffer
 * @cmd_ctrl: hardware specific function for controlling ALE/CLE/nCE. Also used
 *            to write command and address
 * @cmdfunc: hardware specific function for writing commands to the chip.
 * @dev_ready: hardware specific function for accessing device ready/busy line.
 *             If set to NULL no access to ready/busy is available and the
 *             ready/busy information is read from the chip status register.
 * @waitfunc: hardware specific function for wait on ready.
 * @block_bad: check if a block is bad, using OOB markers
 * @block_markbad: mark a block bad
 * @set_features: set the NAND chip features
 * @get_features: get the NAND chip features
 * @chip_delay: chip dependent delay for transferring data from array to read
 *              regs (tR).
 * @dummy_controller: dummy controller implementation for drivers that can
 *                    only control a single chip
 *
 * If you look at this structure you're already wrong. These fields/hooks are
 * all deprecated.
 */
struct nand_legacy {
        void __iomem *IO_ADDR_R;                              // address for reading the 8 I/O lines; on the S3C2440, the data register NFDATA
        void __iomem *IO_ADDR_W;                              // address for writing the 8 I/O lines; on the S3C2440, the data register NFDATA
        void (*select_chip)(struct nand_chip *chip, int cs);  // select/deselect the chip
        u8 (*read_byte)(struct nand_chip *chip);              // read one byte
        void (*write_byte)(struct nand_chip *chip, u8 byte);   // write one byte
        void (*write_buf)(struct nand_chip *chip, const u8 *buf, int len);  // write len bytes
        void (*read_buf)(struct nand_chip *chip, u8 *buf, int len);         // read len bytes
        void (*cmd_ctrl)(struct nand_chip *chip, int dat, unsigned int ctrl);  // hardware-specific control; writes a command or an address
        void (*cmdfunc)(struct nand_chip *chip, unsigned command, int column,  // send a command together with column and page address
                        int page_addr);
        int (*dev_ready)(struct nand_chip *chip); // read the ready/busy state
        int (*waitfunc)(struct nand_chip *chip);  // wait until the chip is ready
        int (*block_bad)(struct nand_chip *chip, loff_t ofs);      // check whether a block is bad
        int (*block_markbad)(struct nand_chip *chip, loff_t ofs);  // mark a block bad
        int (*set_features)(struct nand_chip *chip, int feature_addr,
                            u8 *subfeature_para);
        int (*get_features)(struct nand_chip *chip, int feature_addr,
                            u8 *subfeature_para);
        int chip_delay;           // chip-dependent delay (tR)
        struct nand_controller dummy_controller;
};
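
Of these hooks, cmd_ctrl is the one that maps most directly onto the CLE/ALE signals from section 1.4. The user-space model below sketches the usual shape of such a hook on a controller with separate command and address registers: the ctrl flags say which latch is meant, and the byte is written to the matching register. The fake controller only records what was written. All TOY_ and toy_ names are hypothetical, and the flag values are assumptions chosen for this sketch.

```c
#include <stdint.h>

#define TOY_NAND_CLE       0x02   /* byte is a command (CLE high) */
#define TOY_NAND_ALE       0x04   /* byte is an address (ALE high) */
#define TOY_NAND_CMD_NONE  (-1)   /* no byte to send, only a control change */

/* Fake controller that logs writes to its command and address registers. */
struct toy_nfc {
    uint8_t cmd_log[8];
    int ncmd;
    uint8_t addr_log[8];
    int naddr;
};

/* Model of a cmd_ctrl hook: route the byte by the latch flags. */
void toy_cmd_ctrl(struct toy_nfc *nfc, int dat, unsigned int ctrl)
{
    if (dat == TOY_NAND_CMD_NONE)
        return;

    if (ctrl & TOY_NAND_CLE) {
        if (nfc->ncmd < 8)
            nfc->cmd_log[nfc->ncmd++] = (uint8_t)dat;
    } else if (ctrl & TOY_NAND_ALE) {
        if (nfc->naddr < 8)
            nfc->addr_log[nfc->naddr++] = (uint8_t)dat;
    }
}

/* Read ID: command 90h followed by one address cycle 00h. */
void toy_send_read_id(struct toy_nfc *nfc)
{
    toy_cmd_ctrl(nfc, 0x90, TOY_NAND_CLE);
    toy_cmd_ctrl(nfc, 0x00, TOY_NAND_ALE);
    toy_cmd_ctrl(nfc, TOY_NAND_CMD_NONE, 0);
}
```

In a real driver the two log arrays would be register writes, and the ID bytes would then be fetched through read_byte or read_buf from the data register.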

2.3.6 struct nand_ecc_ctrl

The hooks in nand_ecc_ctrl, such as read_page_raw and write_page_raw, handle ECC-related operations:

/**
 * struct nand_ecc_ctrl - Control structure for ECC
 * @mode:       ECC mode
 * @algo:       ECC algorithm
 * @steps:      number of ECC steps per page
 * @size:       data bytes per ECC step
 * @bytes:      ECC bytes per step
 * @strength:   max number of correctible bits per ECC step
 * @total:      total number of ECC bytes per page
 * @prepad:     padding information for syndrome based ECC generators
 * @postpad:    padding information for syndrome based ECC generators
 * @options:    ECC specific options (see NAND_ECC_XXX flags defined above)
 * @priv:       pointer to private ECC control data
 * @calc_buf:   buffer for calculated ECC, size is oobsize.
 * @code_buf:   buffer for ECC read from flash, size is oobsize.
 * @hwctl:      function to control hardware ECC generator. Must only
 *              be provided if an hardware ECC is available
 * @calculate:  function for ECC calculation or readback from ECC hardware
 * @correct:    function for ECC correction, matching to ECC generator (sw/hw).
 *              Should return a positive number representing the number of
 *              corrected bitflips, -EBADMSG if the number of bitflips exceed
 *              ECC strength, or any other error code if the error is not
 *              directly related to correction.
 *              If -EBADMSG is returned the input buffers should be left
 *              untouched.
 * @read_page_raw:      function to read a raw page without ECC. This function
 *                      should hide the specific layout used by the ECC
 *                      controller and always return contiguous in-band and
 *                      out-of-band data even if they're not stored
 *                      contiguously on the NAND chip (e.g.
 *                      NAND_ECC_HW_SYNDROME interleaves in-band and
 *                      out-of-band data).
 * @write_page_raw:     function to write a raw page without ECC. This function
 *                      should hide the specific layout used by the ECC
 *                      controller and consider the passed data as contiguous
 *                      in-band and out-of-band data. ECC controller is
 *                      responsible for doing the appropriate transformations
 *                      to adapt to its specific layout (e.g.
 *                      NAND_ECC_HW_SYNDROME interleaves in-band and
 *                      out-of-band data).
 * @read_page:  function to read a page according to the ECC generator
 *              requirements; returns maximum number of bitflips corrected in
 *              any single ECC step, -EIO hw error
 * @read_subpage:       function to read parts of the page covered by ECC;
 *                      returns same as read_page()
 * @write_subpage:      function to write parts of the page covered by ECC.
 * @write_page: function to write a page according to the ECC generator
 *              requirements.
 * @write_oob_raw:      function to write chip OOB data without ECC
 * @read_oob_raw:       function to read chip OOB data without ECC
 * @read_oob:   function to read chip OOB data
 * @write_oob:  function to write chip OOB data
 */
struct nand_ecc_ctrl {
        nand_ecc_modes_t mode;
        enum nand_ecc_algo algo;
        int steps;
        int size;
        int bytes;
        int total;
        int strength;
        int prepad;
        int postpad;
        unsigned int options;
        void *priv;
        u8 *calc_buf;
        u8 *code_buf;
        void (*hwctl)(struct nand_chip *chip, int mode);
        int (*calculate)(struct nand_chip *chip, const uint8_t *dat,
                         uint8_t *ecc_code);
        int (*correct)(struct nand_chip *chip, uint8_t *dat, uint8_t *read_ecc,
                       uint8_t *calc_ecc);
        int (*read_page_raw)(struct nand_chip *chip, uint8_t *buf,
                             int oob_required, int page);
        int (*write_page_raw)(struct nand_chip *chip, const uint8_t *buf,
                              int oob_required, int page);
        int (*read_page)(struct nand_chip *chip, uint8_t *buf,
                         int oob_required, int page);
        int (*read_subpage)(struct nand_chip *chip, uint32_t offs,
                            uint32_t len, uint8_t *buf, int page);
        int (*write_subpage)(struct nand_chip *chip, uint32_t offset,
                             uint32_t data_len, const uint8_t *data_buf,
                             int oob_required, int page);
        int (*write_page)(struct nand_chip *chip, const uint8_t *buf,
                          int oob_required, int page);
        int (*write_oob_raw)(struct nand_chip *chip, int page);
        int (*read_oob_raw)(struct nand_chip *chip, int page);
        int (*read_oob)(struct nand_chip *chip, int page);
        int (*write_oob)(struct nand_chip *chip, int page);
};
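
The numeric fields at the top of nand_ecc_ctrl are tied together by simple arithmetic: a page is protected in steps of size data bytes, each step produces bytes ECC bytes, so steps is the page size divided by size and total is steps times bytes. The user-space sketch below spells this out and adds the bitflip-threshold comparison described in the mtd_info comments. The toy_ names are hypothetical.

```c
#include <stdint.h>

/* The numeric part of an ECC configuration (hypothetical type). */
struct toy_ecc {
    int size;       /* data bytes per ECC step */
    int bytes;      /* ECC bytes per step */
    int strength;   /* correctable bits per step */
    int steps;      /* derived: ECC steps per page */
    int total;      /* derived: ECC bytes per page */
};

/*
 * Derive steps and total for a page of writesize data bytes and oobsize
 * spare bytes. Returns 0 on success, -1 if the page does not divide into
 * whole steps or the ECC bytes do not fit into the OOB area.
 */
int toy_ecc_layout(struct toy_ecc *ecc, uint32_t writesize, uint32_t oobsize)
{
    if (ecc->size <= 0 || writesize % (uint32_t)ecc->size != 0)
        return -1;

    ecc->steps = (int)(writesize / (uint32_t)ecc->size);
    ecc->total = ecc->steps * ecc->bytes;

    if ((uint32_t)ecc->total > oobsize)
        return -1;
    return 0;
}

/* Nonzero when a read should be reported as "corrected, but degrading". */
int toy_needs_scrub(unsigned int max_bitflips, unsigned int bitflip_threshold)
{
    return max_bitflips >= bitflip_threshold;
}
```

For example, a 1-bit-per-256-bytes scheme with 3 ECC bytes per step on a page with 2048 data bytes and 64 OOB bytes gives 8 steps and 24 ECC bytes per page, leaving the rest of the OOB area for bad block markers and file system metadata.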

2.3.7 struct nand_manufacturer

nand_manufacturer holds manufacturer information and is defined in drivers/mtd/nand/raw/internals.h:

/*
 * NAND Flash Manufacturer ID Codes
 */
#define NAND_MFR_AMD            0x01
#define NAND_MFR_ATO            0x9b
#define NAND_MFR_EON            0x92
#define NAND_MFR_ESMT           0xc8
#define NAND_MFR_FUJITSU        0x04
#define NAND_MFR_HYNIX          0xad
#define NAND_MFR_INTEL          0x89
#define NAND_MFR_MACRONIX       0xc2
#define NAND_MFR_MICRON         0x2c
#define NAND_MFR_NATIONAL       0x8f
#define NAND_MFR_RENESAS        0x07
#define NAND_MFR_SAMSUNG        0xec   // Samsung
#define NAND_MFR_SANDISK        0x45
#define NAND_MFR_STMICRO        0x20
#define NAND_MFR_TOSHIBA        0x98
#define NAND_MFR_WINBOND        0xef

/**
 * struct nand_manufacturer_ops - NAND Manufacturer operations
 * @detect: detect the NAND memory organization and capabilities
 * @init: initialize all vendor specific fields (like the ->read_retry()
 *        implementation) if any.
 * @cleanup: the ->init() function may have allocated resources, ->cleanup()
 *           is here to let vendor specific code release those resources.
 * @fixup_onfi_param_page: apply vendor specific fixups to the ONFI parameter
 *                         page. This is called after the checksum is verified.
 */
struct nand_manufacturer_ops {
        void (*detect)(struct nand_chip *chip);
        int (*init)(struct nand_chip *chip);
        void (*cleanup)(struct nand_chip *chip);
        void (*fixup_onfi_param_page)(struct nand_chip *chip,
                                      struct nand_onfi_params *p);
};

/**
 * struct nand_manufacturer - NAND Flash Manufacturer structure
 * @name: Manufacturer name
 * @id: manufacturer ID code of device.
 * @ops: manufacturer operations
 */
struct nand_manufacturer {
        int id;   // manufacturer ID
        char *name;  // manufacturer name
        const struct nand_manufacturer_ops *ops; // manufacturer-specific operations
};
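
To make the role of the ID codes concrete, here is a minimal user-space sketch of looking up a manufacturer by the first ID byte read from the chip. The demo_ names, the reduced structure (no ops pointer) and the four-entry table are assumptions made for illustration; the kernel keeps its own table and lookup helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct nand_manufacturer (the ops pointer is omitted). */
struct demo_manufacturer {
	uint8_t id;
	const char *name;
};

/* A few of the ID codes listed above; the name strings are illustrative. */
static const struct demo_manufacturer demo_manufacturers[] = {
	{ 0x2c, "Micron" },
	{ 0x98, "Toshiba" },
	{ 0xad, "Hynix" },
	{ 0xec, "Samsung" },
};

/* Return the entry whose ID matches, or NULL for an unknown manufacturer. */
const struct demo_manufacturer *demo_find_manufacturer(uint8_t id)
{
	size_t n = sizeof(demo_manufacturers) / sizeof(demo_manufacturers[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (demo_manufacturers[i].id == id)
			return &demo_manufacturers[i];
	return NULL;
}

/* Usage: 0xec is NAND_MFR_SAMSUNG, the maker code of chips such as the K9F2G08U0C. */
void demo_manufacturer_example(void)
{
	const struct demo_manufacturer *m = demo_find_manufacturer(0xec);

	assert(m != NULL && strcmp(m->name, "Samsung") == 0);
	assert(demo_find_manufacturer(0x00) == NULL);
}
```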

2.3.8 struct nand_device

struct nand_device is defined in include/linux/mtd/nand.h:

/**
 * struct nand_device - NAND device
 * @mtd: MTD instance attached to the NAND device
 * @memorg: memory layout
 * @eccreq: ECC requirements
 * @rowconv: position to row address converter
 * @bbt: bad block table info
 * @ops: NAND operations attached to the NAND device
 *
 * Generic NAND object. Specialized NAND layers (raw NAND, SPI NAND, OneNAND)
 * should declare their own NAND object embedding a nand_device struct (that's
 * how inheritance is done).
 * struct_nand_device->memorg and struct_nand_device->eccreq should be filled
 * at device detection time to reflect the NAND device
 * capabilities/requirements. Once this is done nanddev_init() can be called.
 * It will take care of converting NAND information into MTD ones, which means
 * the specialized NAND layers should never manually tweak
 * struct_nand_device->mtd except for the ->_read/write() hooks.
 */
struct nand_device {
        struct mtd_info mtd;
        struct nand_memory_organization memorg;
        struct nand_ecc_req eccreq;
        struct nand_row_converter rowconv;
        struct nand_bbt bbt;
        const struct nand_ops *ops;
};

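2.3.9 Structure relationship diagram

2.4 Core functions

If the MTD device is used as a single partition (that is, it is not split into partitions), use the following two functions to register and unregister it: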

int add_mtd_device(struct mtd_info *mtd)  
int del_mtd_device (struct mtd_info *mtd)  

If the MTD device is divided into several partitions, use the following two functions to register and unregister them:

int add_mtd_partitions(struct mtd_info *master,const struct mtd_partition *parts,int nbparts)  
int del_mtd_partitions(struct mtd_info *master)  

3. MTD device registration

3.1 add_mtd_device

add_mtd_device is defined in drivers/mtd/mtdcore.c:

/**
 *      add_mtd_device - register an MTD device
 *      @mtd: pointer to new MTD device info structure
 *
 *      Add a device to the list of MTD devices present in the system, and
 *      notify each currently active MTD 'user' of its arrival. Returns
 *      zero on success or non-zero on failure.
 */

int add_mtd_device(struct mtd_info *mtd)
{
        struct mtd_notifier *not;
        int i, error;

        /*
         * May occur, for instance, on buggy drivers which call
         * mtd_device_parse_register() multiple times on the same master MTD,
         * especially with CONFIG_MTD_PARTITIONED_MASTER=y.
         */
        if (WARN_ONCE(mtd->dev.type, "MTD already registered\n"))
                return -EEXIST;

        BUG_ON(mtd->writesize == 0);

        /*
         * MTD drivers should implement ->_{write,read}() or
         * ->_{write,read}_oob(), but not both.
         */
        if (WARN_ON((mtd->_write && mtd->_write_oob) ||  // sanity-check the function pointers
                    (mtd->_read && mtd->_read_oob)))
                return -EINVAL;

        if (WARN_ON((!mtd->erasesize || !mtd->_erase) &&
                    !(mtd->flags & MTD_NO_ERASE)))
                return -EINVAL;

        mutex_lock(&mtd_table_mutex);  // take the MTD table mutex

        i = idr_alloc(&mtd_idr, mtd, 0, 0, GFP_KERNEL); // allocate an index (ID) for the MTD device
        if (i < 0) {
                error = i;
                goto fail_locked;
        }

        mtd->index = i;
        mtd->usecount = 0;

        /* default value if not set by driver */
        if (mtd->bitflip_threshold == 0)    // default the bit-flip threshold to the ECC strength
                mtd->bitflip_threshold = mtd->ecc_strength;
        if (is_power_of_2(mtd->erasesize))  // erase-size shift: log2(erasesize)
                mtd->erasesize_shift = ffs(mtd->erasesize) - 1;
        else
                mtd->erasesize_shift = 0;

        if (is_power_of_2(mtd->writesize))    // write-size shift: log2(writesize)
                mtd->writesize_shift = ffs(mtd->writesize) - 1;
        else
                mtd->writesize_shift = 0;

        mtd->erasesize_mask = (1 << mtd->erasesize_shift) - 1;  // erase-size mask
        mtd->writesize_mask = (1 << mtd->writesize_shift) - 1;  // write-size mask

        /* Some chips always power up locked. Unlock them now */
        if ((mtd->flags & MTD_WRITEABLE) && (mtd->flags & MTD_POWERUP_LOCK)) { // some chips always power up locked, so unlock them now (most flash chips support locking, but drivers rarely use it)
                error = mtd_unlock(mtd, 0, mtd->size);
                if (error && error != -EOPNOTSUPP)
                        printk(KERN_WARNING
                               "%s: unlock failed, writes may not work\n",
                               mtd->name);
                /* Ignore unlock failures? */
                error = 0;
        }

        /* Caller should have set dev.parent to match the
         * physical device, if appropriate.
         */
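        mtd->dev.type = &mtd_devtype;  // set the device type
        mtd->dev.class = &mtd_class;   // set the device class; the mtd class is visible under /sys/class
        mtd->dev.devt = MTD_DEVT(i);   // set the device number; the character device numbers themselves are requested in the module init function of mtdchar.c
        dev_set_name(&mtd->dev, "mtd%d", i);  // set the device name to mtd%d
        dev_set_drvdata(&mtd->dev, mtd);      // mtd->dev.driver_data = mtd;
        of_node_get(mtd_get_of_node(mtd));
        error = device_register(&mtd->dev);   // register the MTD character device: this creates mtd%d under /sys/class/mtd, from which mdev automatically creates the /dev/mtd%d character device node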
        if (error)
                goto fail_added;

        /* Add the nvmem provider */
        error = mtd_nvmem_add(mtd);
        if (error)
                goto fail_nvmem_add;

        if (!IS_ERR_OR_NULL(dfs_dir_mtd)) {
                mtd->dbg.dfs_dir = debugfs_create_dir(dev_name(&mtd->dev), dfs_dir_mtd);
                if (IS_ERR_OR_NULL(mtd->dbg.dfs_dir)) {
                        pr_debug("mtd device %s won't show data in debugfs\n",
                                 dev_name(&mtd->dev));
                }
        }

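        device_create(&mtd_class, mtd->dev.parent, MTD_DEVT(i) + 1, NULL,   // create the read-only MTD character device (device_create calls device_register internally): this creates mtd%dro under /sys/class/mtd, from which mdev automatically creates the /dev/mtd%dro character device node
                      "mtd%dro", i);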

        pr_debug("mtd: Giving out device %d to %s\n", i, mtd->name);
        /* No need to get a refcount on the module containing
           the notifier, since we hold the mtd_table_mutex */
        list_for_each_entry(not, &mtd_notifiers, list)  // MTD notifier mechanism: call the add() hook of every registered MTD user
                not->add(mtd);

        mutex_unlock(&mtd_table_mutex);                 // release the mutex
        /* We _know_ we aren't being removed, because
           our caller is still holding us here. So none
           of this try_ nonsense, and no bitching about it
           either. :) */
        __module_get(THIS_MODULE);
        return 0;

fail_nvmem_add:
        device_unregister(&mtd->dev);
fail_added:
        of_node_put(mtd_get_of_node(mtd));
        idr_remove(&mtd_idr, i);
fail_locked:
        mutex_unlock(&mtd_table_mutex);
        return error;
}

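This function mainly performs the following steps:

(1) Validate the required fields and function pointers of the raw MTD device;

(2) Allocate a node for the raw MTD device in the mtd_idr tree and return the allocated node ID:

 i = idr_alloc(&mtd_idr, mtd, 0, 0, GFP_KERNEL); // allocate an ID from mtd_idr (a radix tree) and associate mtd with the new ID

idr_alloc adds a node to the mtd_idr tree. The node has an ID that is unique within the tree and is associated with mtd, so mtd can later be found from its ID.

The third and fourth arguments are the start and the end of the allowed ID range. Passing 0 as the end means that no explicit upper bound is requested, so any ID up to the maximum that mtd_idr supports may be returned.

The global variable mtd_idr is defined in drivers/mtd/mtdcore.c: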

static DEFINE_IDR(mtd_idr);

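The definition of IDR is not covered here. IDR mainly binds an ID to a data structure; for details, see the article "linux内核IDR机制详解(一)".

The ID is needed later when the character and block devices are registered. For example, the device number of the struct device embedded in the MTD device is set to MTD_DEVT(i):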

#define  MTD_DEVT(index)  MKDEV(MTD_CHAR_MAJOR, (index)*2)

The major number is MTD_CHAR_MAJOR (90) and the minor number is index*2;
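
The numbering can be checked with a small user-space sketch. The DEMO_ macros are stand-ins written for this example; they assume the kernel's internal dev_t layout with a 20-bit minor field. The read-only node mtd%dro uses MTD_DEVT(i) + 1, as the device_create call in the listing shows.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for MKDEV/MAJOR/MINOR, assuming a 20-bit minor field. */
#define DEMO_MINORBITS      20
#define DEMO_MKDEV(ma, mi)  (((uint32_t)(ma) << DEMO_MINORBITS) | (uint32_t)(mi))
#define DEMO_MAJOR(dev)     ((dev) >> DEMO_MINORBITS)
#define DEMO_MINOR(dev)     ((dev) & ((1u << DEMO_MINORBITS) - 1))

#define DEMO_MTD_CHAR_MAJOR 90

/* Device number of the read-write node mtd<index>. */
uint32_t demo_mtd_devt(unsigned int index)
{
	return DEMO_MKDEV(DEMO_MTD_CHAR_MAJOR, index * 2);
}

/* Device number of the read-only node mtd<index>ro: the next minor. */
uint32_t demo_mtd_ro_devt(unsigned int index)
{
	return demo_mtd_devt(index) + 1;
}

/* Usage: the MTD device with index 3 gets minors 6 (mtd3) and 7 (mtd3ro). */
void demo_devt_example(void)
{
	assert(DEMO_MAJOR(demo_mtd_devt(3)) == 90);
	assert(DEMO_MINOR(demo_mtd_devt(3)) == 6);
	assert(DEMO_MINOR(demo_mtd_ro_devt(3)) == 7);
}
```

Each MTD index therefore consumes two consecutive minor numbers, which is why the index is multiplied by 2.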

(3) Set the erasesize_shift, writesize_shift, erasesize_mask and writesize_mask fields of the raw MTD device;
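
The shift and mask computation can be reproduced in user space. The sketch below is not the kernel code: the demo_ helpers are hypothetical re-implementations of is_power_of_2() and ffs(), and the sizes in the usage function (128 KiB erase block, 2 KiB page) are assumed example values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool demo_is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* 1-based position of the lowest set bit, 0 if n == 0 (same contract as ffs()). */
static int demo_ffs(uint32_t n)
{
	int pos = 1;

	if (n == 0)
		return 0;
	while ((n & 1u) == 0) {
		n >>= 1;
		pos++;
	}
	return pos;
}

/* log2(size) for power-of-two sizes, otherwise 0, as in add_mtd_device(). */
unsigned int demo_size_shift(uint32_t size)
{
	return demo_is_power_of_2(size) ? (unsigned int)demo_ffs(size) - 1 : 0;
}

uint32_t demo_size_mask(uint32_t size)
{
	return (1u << demo_size_shift(size)) - 1;
}

/* Usage with an assumed 128 KiB erase block and 2 KiB page. */
void demo_shift_example(void)
{
	assert(demo_size_shift(128 * 1024) == 17);
	assert(demo_size_mask(128 * 1024) == 0x1ffff);
	assert(demo_size_shift(2048) == 11);
	assert(demo_size_mask(2048) == 0x7ff);
}
```

With these values, offset >> shift gives the block (or page) number and offset & mask gives the position inside it, without a division.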

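(4) If the device is marked writable and the chip powers up locked, call the unlock interface to unlock it (most flash chips support locking, but drivers rarely use it);

(5) Set the class of the struct device embedded in the raw MTD device to mtd_class, and set its device number, type, name and driver_data;

mtd_class is defined as: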

static struct class mtd_class = {
        .name = "mtd",
        .owner = THIS_MODULE,
        .pm = MTD_CLS_PM_OPS,
};

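(6) Call device_register to register the MTD character device named mtd%d;

(7) Call device_create to create, initialize and register the MTD character device named mtd%dro;

(8) Run the MTD subsystem's notifier mechanism: call the add hook of every registered notifier, so that each MTD user learns that a device has been added;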

list_for_each_entry(not, &mtd_notifiers, list) 
      not->add(mtd);   

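The list_for_each_entry macro takes three arguments, in order pos, head and member. It is actually a for loop that uses pos as the loop cursor, starts at the list head and advances pos one entry at a time (in the next direction) until it arrives back at head.

The list mtd_notifiers is defined as: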

static LIST_HEAD(mtd_notifiers);

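This loop walks the list. Each element not is of type struct mtd_notifier, and not->add(mtd) is called on it. For the notifier registered by the MTD block layer, this callback is where the MTD block device named mtdblock%d is registered.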
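
The notifier pattern can be sketched in plain C. This is a simplified illustration, not the kernel implementation: it uses a singly linked list instead of struct list_head, takes no lock, and all demo_ names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct demo_mtd { int index; };

/* Simplified counterpart of struct mtd_notifier. */
struct demo_notifier {
	void (*add)(struct demo_mtd *mtd);
	struct demo_notifier *next;
};

static struct demo_notifier *demo_notifiers;   /* head of the user list */
static int demo_block_devices_created;

/* Stand-in for the block layer's add hook: pretend to create mtdblock<N>. */
static void demo_blk_add(struct demo_mtd *mtd)
{
	(void)mtd;
	demo_block_devices_created++;
}

/* Put a user on the list (the real register_mtd_user also replays existing devices). */
void demo_register_user(struct demo_notifier *n)
{
	n->next = demo_notifiers;
	demo_notifiers = n;
}

/* What the loop at the end of add_mtd_device() amounts to. */
void demo_announce(struct demo_mtd *mtd)
{
	struct demo_notifier *n;

	for (n = demo_notifiers; n != NULL; n = n->next)
		n->add(mtd);
}

/* Usage: one registered user, two devices announced, two callbacks run. */
void demo_notifier_example(void)
{
	static struct demo_notifier blk = { demo_blk_add, NULL };
	struct demo_mtd m0 = { 0 }, m1 = { 1 };

	demo_register_user(&blk);
	demo_announce(&m0);
	demo_announce(&m1);
	assert(demo_block_devices_created == 2);
}
```

The point of the design is that mtdcore.c does not need to know about block devices: any layer that wants to react to new MTD devices only has to put a notifier on the list.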

3.2 add_mtd_partitions

add_mtd_partitions is defined in drivers/mtd/mtdpart.c:

/*
 * This function, given a master MTD object and a partition table, creates
 * and registers slave MTD objects which are bound to the master according to
 * the partition definitions.
 *
 * For historical reasons, this function's caller only registers the master
 * if the MTD_PARTITIONED_MASTER config option is set.
 */

int add_mtd_partitions(struct mtd_info *master,  // master MTD device
                       const struct mtd_partition *parts,  // partition table
                       int nbparts) // number of partitions
{
        struct mtd_part *slave;
        uint64_t cur_offset = 0;
        int i, ret;

        printk(KERN_NOTICE "Creating %d MTD partitions on \"%s\":\n", nbparts, master->name);

        for (i = 0; i < nbparts; i++) {   // walk the partition table
                slave = allocate_partition(master, parts + i, i, cur_offset);   // allocate and fill in an mtd_part
                if (IS_ERR(slave)) {
                        ret = PTR_ERR(slave);
                        goto err_del_partitions;
                }

                mutex_lock(&mtd_partitions_mutex);
                list_add(&slave->list, &mtd_partitions);  // add slave to the mtd_partitions list
                mutex_unlock(&mtd_partitions_mutex);

                ret = add_mtd_device(&slave->mtd);  // register an MTD device for each partition; this also results in an mtdblock%d block device node under /dev
                if (ret) {
                        mutex_lock(&mtd_partitions_mutex);
                        list_del(&slave->list);
                        mutex_unlock(&mtd_partitions_mutex);

                        free_partition(slave);
                        goto err_del_partitions;
                }

                mtd_add_partition_attrs(slave);
                /* Look for subpartitions */
                parse_mtd_partitions(&slave->mtd, parts[i].types, NULL);

                cur_offset = slave->offset + slave->mtd.size;
        }

        return 0;

err_del_partitions:
        del_mtd_partitions(master);

        return ret;
}

3.2.1 allocate_partition

allocate_partition is defined in drivers/mtd/mtdpart.c:

static struct mtd_part *allocate_partition(struct mtd_info *parent,
                        const struct mtd_partition *part, int partno,
                        uint64_t cur_offset)
{
        int wr_alignment = (parent->flags & MTD_NO_ERASE) ? parent->writesize :
                                                            parent->erasesize;
        struct mtd_part *slave;
        u32 remainder;
        char *name;
        u64 tmp;

        /* allocate the partition structure */
        slave = kzalloc(sizeof(*slave), GFP_KERNEL);
        name = kstrdup(part->name, GFP_KERNEL);
        if (!name || !slave) {
                printk(KERN_ERR"memory allocation error while creating partitions for \"%s\"\n",
                       parent->name);
                kfree(name);
                kfree(slave);
                return ERR_PTR(-ENOMEM);
        }

        /* set up the MTD object for this partition */
        slave->mtd.type = parent->type;
        slave->mtd.flags = parent->orig_flags & ~part->mask_flags;
        slave->mtd.orig_flags = slave->mtd.flags;
        slave->mtd.size = part->size;
        slave->mtd.writesize = parent->writesize;
        slave->mtd.writebufsize = parent->writebufsize;
        slave->mtd.oobsize = parent->oobsize;
        slave->mtd.oobavail = parent->oobavail;
        slave->mtd.subpage_sft = parent->subpage_sft;
        slave->mtd.pairing = parent->pairing;

        slave->mtd.name = name;
        slave->mtd.owner = parent->owner;

        /* NOTE: Historically, we didn't arrange MTDs as a tree out of
         * concern for showing the same data in multiple partitions.
         * However, it is very useful to have the master node present,
         * so the MTD_PARTITIONED_MASTER option allows that. The master
         * will have device nodes etc only if this is set, so make the
         * parent conditional on that option. Note, this is a way to
         * distinguish between the master and the partition in sysfs.
         */
        slave->mtd.dev.parent = IS_ENABLED(CONFIG_MTD_PARTITIONED_MASTER) || mtd_is_partition(parent) ?
                                &parent->dev :
                                parent->dev.parent;
        slave->mtd.dev.of_node = part->of_node;

        if (parent->_read)
                slave->mtd._read = part_read;
        if (parent->_write)
                slave->mtd._write = part_write;
        if (parent->_panic_write)
                slave->mtd._panic_write = part_panic_write;

        if (parent->_point && parent->_unpoint) {
                slave->mtd._point = part_point;
                slave->mtd._unpoint = part_unpoint;
        }

        if (parent->_read_oob)
                slave->mtd._read_oob = part_read_oob;
        if (parent->_write_oob)
                slave->mtd._write_oob = part_write_oob;
        if (parent->_read_user_prot_reg)
                slave->mtd._read_user_prot_reg = part_read_user_prot_reg;
        if (parent->_read_fact_prot_reg)
                slave->mtd._read_fact_prot_reg = part_read_fact_prot_reg;
        if (parent->_write_user_prot_reg)
                slave->mtd._write_user_prot_reg = part_write_user_prot_reg;
        if (parent->_lock_user_prot_reg)
                slave->mtd._lock_user_prot_reg = part_lock_user_prot_reg;
        if (parent->_get_user_prot_info)
                slave->mtd._get_user_prot_info = part_get_user_prot_info;
        if (parent->_get_fact_prot_info)
                slave->mtd._get_fact_prot_info = part_get_fact_prot_info;
        if (parent->_sync)
                slave->mtd._sync = part_sync;
        if (!partno && !parent->dev.class && parent->_suspend &&
            parent->_resume) {
                slave->mtd._suspend = part_suspend;
                slave->mtd._resume = part_resume;
        }
        if (parent->_writev)
                slave->mtd._writev = part_writev;
        if (parent->_lock)
                slave->mtd._lock = part_lock;
        if (parent->_unlock)
                slave->mtd._unlock = part_unlock;
        if (parent->_is_locked)
                slave->mtd._is_locked = part_is_locked;
        if (parent->_block_isreserved)
                slave->mtd._block_isreserved = part_block_isreserved;
        if (parent->_block_isbad)
                slave->mtd._block_isbad = part_block_isbad;
        if (parent->_block_markbad)
                slave->mtd._block_markbad = part_block_markbad;
        if (parent->_max_bad_blocks)
                slave->mtd._max_bad_blocks = part_max_bad_blocks;

        if (parent->_get_device)
                slave->mtd._get_device = part_get_device;
        if (parent->_put_device)
                slave->mtd._put_device = part_put_device;

        slave->mtd._erase = part_erase;
        slave->parent = parent;
        slave->offset = part->offset;

        if (slave->offset == MTDPART_OFS_APPEND)
                slave->offset = cur_offset;
        if (slave->offset == MTDPART_OFS_NXTBLK) {
                tmp = cur_offset;
                slave->offset = cur_offset;
                remainder = do_div(tmp, wr_alignment);
                if (remainder) {
                        slave->offset += wr_alignment - remainder;
                        printk(KERN_NOTICE "Moving partition %d: "
                               "0x%012llx -> 0x%012llx\n", partno,
                               (unsigned long long)cur_offset, (unsigned long long)slave->offset);
                }
        }
        if (slave->offset == MTDPART_OFS_RETAIN) {
                slave->offset = cur_offset;
                if (parent->size - slave->offset >= slave->mtd.size) {
                        slave->mtd.size = parent->size - slave->offset
                                                        - slave->mtd.size;
                } else {
                        printk(KERN_ERR "mtd partition \"%s\" doesn't have enough space: %#llx < %#llx, disabled\n",
                                part->name, parent->size - slave->offset,
                                slave->mtd.size);
                        /* register to preserve ordering */
                        goto out_register;
                }
        }
        if (slave->mtd.size == MTDPART_SIZ_FULL)
                slave->mtd.size = parent->size - slave->offset;

        printk(KERN_NOTICE "0x%012llx-0x%012llx : \"%s\"\n", (unsigned long long)slave->offset,
                (unsigned long long)(slave->offset + slave->mtd.size), slave->mtd.name);

        /* let's do some sanity checks */
        if (slave->offset >= parent->size) {
                /* let's register it anyway to preserve ordering */
                slave->offset = 0;
                slave->mtd.size = 0;

                /* Initialize ->erasesize to make add_mtd_device() happy. */
                slave->mtd.erasesize = parent->erasesize;

                printk(KERN_ERR"mtd: partition \"%s\" is out of reach -- disabled\n",
                        part->name);
                goto out_register;
        }
        if (slave->offset + slave->mtd.size > parent->size) {
                slave->mtd.size = parent->size - slave->offset;
                printk(KERN_WARNING"mtd: partition \"%s\" extends beyond the end of device \"%s\" -- size truncated to %#llx\n",
                        part->name, parent->name, (unsigned long long)slave->mtd.size);
        }
        if (parent->numeraseregions > 1) {
                /* Deal with variable erase size stuff */
                int i, max = parent->numeraseregions;
                u64 end = slave->offset + slave->mtd.size;
                struct mtd_erase_region_info *regions = parent->eraseregions;

                /* Find the first erase regions which is part of this
                 * partition. */
                for (i = 0; i < max && regions[i].offset <= slave->offset; i++)
                        ;
                /* The loop searched for the region _behind_ the first one */
                if (i > 0)
                        i--;

                /* Pick biggest erasesize */
                for (; i < max && regions[i].offset < end; i++) {
                        if (slave->mtd.erasesize < regions[i].erasesize) {
                                slave->mtd.erasesize = regions[i].erasesize;
                        }
                }
                BUG_ON(slave->mtd.erasesize == 0);
        } else {
                /* Single erase size */
                slave->mtd.erasesize = parent->erasesize;
        }

        /*
         * Slave erasesize might differ from the master one if the master
         * exposes several regions with different erasesize. Adjust
         * wr_alignment accordingly.
         */
        if (!(slave->mtd.flags & MTD_NO_ERASE))
                wr_alignment = slave->mtd.erasesize;

        tmp = part_absolute_offset(parent) + slave->offset;
        remainder = do_div(tmp, wr_alignment);
        if ((slave->mtd.flags & MTD_WRITEABLE) && remainder) {
                /* Doesn't start on a boundary of major erase size */
                /* FIXME: Let it be writable if it is on a boundary of
                 * _minor_ erase size though */
                slave->mtd.flags &= ~MTD_WRITEABLE;
                printk(KERN_WARNING"mtd: partition \"%s\" doesn't start on an erase/write block boundary -- force read-only\n",
                        part->name);
        }

        tmp = part_absolute_offset(parent) + slave->mtd.size;
        remainder = do_div(tmp, wr_alignment);
        if ((slave->mtd.flags & MTD_WRITEABLE) && remainder) {
                slave->mtd.flags &= ~MTD_WRITEABLE;
                printk(KERN_WARNING"mtd: partition \"%s\" doesn't end on an erase/write block -- force read-only\n",
                        part->name);
        }

        mtd_set_ooblayout(&slave->mtd, &part_ooblayout_ops);
        slave->mtd.ecc_step_size = parent->ecc_step_size;
        slave->mtd.ecc_strength = parent->ecc_strength;
        slave->mtd.bitflip_threshold = parent->bitflip_threshold;
        if (parent->_block_isbad) {
                uint64_t offs = 0;

                while (offs < slave->mtd.size) {
                        if (mtd_block_isreserved(parent, offs + slave->offset))
                                slave->mtd.ecc_stats.bbtblocks++;
                        else if (mtd_block_isbad(parent, offs + slave->offset))
                                slave->mtd.ecc_stats.badblocks++;
                        offs += slave->mtd.erasesize;
                }
        }

out_register:
        return slave;
}
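
The offset and size handling in allocate_partition is easier to follow on concrete numbers. The sketch below is a reduced user-space model, not the kernel code: the DEMO_ constants stand in for MTDPART_OFS_APPEND, MTDPART_OFS_NXTBLK and MTDPART_SIZ_FULL, MTDPART_OFS_RETAIN is left out, and an out-of-range partition is simply zeroed before the size is computed.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for MTDPART_OFS_APPEND, MTDPART_OFS_NXTBLK and MTDPART_SIZ_FULL. */
#define DEMO_OFS_APPEND ((uint64_t)-1)
#define DEMO_OFS_NXTBLK ((uint64_t)-2)
#define DEMO_SIZ_FULL   ((uint64_t)0)

struct demo_part { uint64_t offset; uint64_t size; };

/* Resolve the placeholders of one partition against its parent device. */
struct demo_part demo_resolve(struct demo_part p, uint64_t cur_offset,
			      uint64_t parent_size, uint64_t erasesize)
{
	if (p.offset == DEMO_OFS_APPEND)
		p.offset = cur_offset;            /* start where the previous one ended */
	if (p.offset == DEMO_OFS_NXTBLK) {
		uint64_t rem = cur_offset % erasesize;

		/* round up to the next erase block boundary */
		p.offset = rem ? cur_offset + (erasesize - rem) : cur_offset;
	}
	if (p.offset >= parent_size) {            /* out of reach: disabled */
		p.offset = 0;
		p.size = 0;
		return p;
	}
	if (p.size == DEMO_SIZ_FULL || p.size > parent_size - p.offset)
		p.size = parent_size - p.offset;  /* fill or truncate to the end */
	return p;
}

/* Usage: three partitions on an assumed 256 MiB device with 128 KiB blocks. */
void demo_partition_example(void)
{
	const uint64_t dev = 0x10000000, blk = 0x20000;
	struct demo_part boot = { 0, 0x40000 };
	struct demo_part kernel = { DEMO_OFS_APPEND, 0x500000 };
	struct demo_part rootfs = { DEMO_OFS_APPEND, DEMO_SIZ_FULL };

	boot = demo_resolve(boot, 0, dev, blk);
	kernel = demo_resolve(kernel, boot.offset + boot.size, dev, blk);
	rootfs = demo_resolve(rootfs, kernel.offset + kernel.size, dev, blk);

	assert(kernel.offset == 0x40000);
	assert(rootfs.offset == 0x540000);
	assert(rootfs.size == 0xfac0000);
}
```

The cur_offset argument plays the same role as in add_mtd_partitions, where it is updated to slave->offset + slave->mtd.size after each partition.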

3.2.2 mtd_partitions

The list mtd_partitions is defined in drivers/mtd/mtdpart.c:

static LIST_HEAD(mtd_partitions);

3.3 mtd_device_register

The macro mtd_device_register is defined in include/linux/mtd/mtd.h:

#define mtd_device_register(master, parts, nr_parts)    \
        mtd_device_parse_register(master, NULL, NULL, parts, nr_parts)

The function mtd_device_parse_register is defined in drivers/mtd/mtdcore.c:

/**
 * mtd_device_parse_register - parse partitions and register an MTD device.
 *
 * @mtd: the MTD device to register
 * @types: the list of MTD partition probes to try, see
 *         'parse_mtd_partitions()' for more information
 * @parser_data: MTD partition parser-specific data
 * @parts: fallback partition information to register, if parsing fails;
 *         only valid if %nr_parts > %0
 * @nr_parts: the number of partitions in parts, if zero then the full
 *            MTD device is registered if no partition info is found
 *
 * This function aggregates MTD partitions parsing (done by
 * 'parse_mtd_partitions()') and MTD device and partitions registering. It
 * basically follows the most common pattern found in many MTD drivers:
 *
 * * If the MTD_PARTITIONED_MASTER option is set, then the device as a whole is
 *   registered first.
 * * Then It tries to probe partitions on MTD device @mtd using parsers
 *   specified in @types (if @types is %NULL, then the default list of parsers
 *   is used, see 'parse_mtd_partitions()' for more information). If none are
 *   found this functions tries to fallback to information specified in
 *   @parts/@nr_parts.
 * * If no partitions were found this function just registers the MTD device
 *   @mtd and exits.
 *
 * Returns zero in case of success and a negative error code in case of failure.
 */
int mtd_device_parse_register(struct mtd_info *mtd, const char * const *types,
                              struct mtd_part_parser_data *parser_data,
                              const struct mtd_partition *parts, // partition table
                              int nr_parts)  // number of partitions
{
        int ret;

        mtd_set_dev_defaults(mtd);

        if (IS_ENABLED(CONFIG_MTD_PARTITIONED_MASTER)) {  // also register the whole NAND flash as one device
                ret = add_mtd_device(mtd);   // register the MTD device
                if (ret)
                        return ret;
        }

        /* Prefer parsed partitions over driver-provided fallback */
        ret = parse_mtd_partitions(mtd, types, parser_data);
        if (ret > 0)
                ret = 0;
        else if (nr_parts)  // register the driver-supplied partitions
                ret = add_mtd_partitions(mtd, parts, nr_parts);
        else if (!device_is_registered(&mtd->dev))
                ret = add_mtd_device(mtd);
        else
                ret = 0;

        if (ret)
                goto out;
        /*
         * FIXME: some drivers unfortunately call this function more than once.
         * So we have to check if we've already assigned the reboot notifier.
         *
         * Generally, we can make multiple calls work for most cases, but it
         * does cause problems with parse_mtd_partitions() above (e.g.,
         * cmdlineparts will register partitions more than once).
         */
        WARN_ONCE(mtd->_reboot && mtd->reboot_notifier.notifier_call,
                  "MTD already registered\n");
        if (mtd->_reboot && !mtd->reboot_notifier.notifier_call) {
                mtd->reboot_notifier.notifier_call = mtd_reboot_notifier;
                register_reboot_notifier(&mtd->reboot_notifier);
        }

out:
        if (ret && device_is_registered(&mtd->dev))
                del_mtd_device(mtd);  // unregister the MTD device

        return ret;
}
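
The if/else chain in the middle of mtd_device_parse_register decides which registration path is taken. As a rough model of that decision only (the enum and the demo_choose function are invented for this sketch, and the error paths are ignored), it can be written as:

```c
#include <assert.h>
#include <stdbool.h>

enum demo_action {
	DEMO_USE_PARSED_PARTS,    /* a parser found partitions */
	DEMO_USE_FALLBACK_PARTS,  /* the driver-supplied table is registered */
	DEMO_REGISTER_WHOLE,      /* no partitions: register the whole device */
	DEMO_NOTHING_TO_DO,       /* the whole device was already registered */
};

/*
 * parsed: return value of the partition parsers (> 0 means partitions found)
 * nr_parts: number of fallback partitions passed in by the driver
 * master_registered: true if the master was registered beforehand, which is
 *                    the case with CONFIG_MTD_PARTITIONED_MASTER
 */
enum demo_action demo_choose(int parsed, int nr_parts, bool master_registered)
{
	if (parsed > 0)
		return DEMO_USE_PARSED_PARTS;
	if (nr_parts)
		return DEMO_USE_FALLBACK_PARTS;
	if (!master_registered)
		return DEMO_REGISTER_WHOLE;
	return DEMO_NOTHING_TO_DO;
}

/* Usage: parsed partitions win over the fallback table. */
void demo_choose_example(void)
{
	assert(demo_choose(3, 4, false) == DEMO_USE_PARSED_PARTS);
	assert(demo_choose(0, 4, false) == DEMO_USE_FALLBACK_PARTS);
	assert(demo_choose(0, 0, false) == DEMO_REGISTER_WHOLE);
	assert(demo_choose(0, 0, true) == DEMO_NOTHING_TO_DO);
}
```

A driver that calls mtd_device_register(mtd, parts, nr_parts) passes NULL for types and parser_data, so the default parsers are tried first and the driver's own table is only the fallback.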

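4. mtdblock.c

The mtdblock.c file was introduced earlier: it implements the MTD block device interface. Let us go straight to drivers/mtd/mtdblock.c and walk through the source.

4.1 Module entry function

We start with the entry function of the MTD block device module: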

static struct mtd_blktrans_ops mtdblock_tr = {  // describes the MTD block device and its operations
        .name           = "mtdblock",
        .major          = MTD_BLOCK_MAJOR,   // major number of MTD block devices: 31
        .part_bits      = 0,                 // number of partition bits for the disk: 0 means no partitions, 1 means 2, 2 means 4, ...
        .blksize        = 512,               // sector size in bytes
        .open           = mtdblock_open,
        .flush          = mtdblock_flush,
        .release        = mtdblock_release,
        .readsect       = mtdblock_readsect,
        .writesect      = mtdblock_writesect,
        .add_mtd        = mtdblock_add_mtd,
        .remove_dev     = mtdblock_remove_dev,
        .owner          = THIS_MODULE,
};

static int __init init_mtdblock(void)
{
        return register_mtd_blktrans(&mtdblock_tr);
}

4.2 register_mtd_blktrans

Next comes register_mtd_blktrans, which is located in drivers/mtd/mtd_blkdevs.c:

int register_mtd_blktrans(struct mtd_blktrans_ops *tr)
{
        struct mtd_info *mtd;
        int ret;

        /* Register the notifier if/when the first device type is
           registered, to prevent the link/init ordering from fucking
           us over. */
        if (!blktrans_notifier.list.next)  // next is still NULL, so the notifier has not been registered yet
                register_mtd_user(&blktrans_notifier);  // add blktrans_notifier to the mtd_notifiers list

        mutex_lock(&mtd_table_mutex);

        ret = register_blkdev(tr->major, tr->name);   // register the block device; the major number is MTD_BLOCK_MAJOR, defined as 31
        if (ret < 0) {
                printk(KERN_WARNING "Unable to register %s block device on major %d: %d\n",
                       tr->name, tr->major, ret);
                mutex_unlock(&mtd_table_mutex);
                return ret;
        }

        if (ret)
                tr->major = ret;

        tr->blkshift = ffs(tr->blksize) - 1;

        INIT_LIST_HEAD(&tr->devs);
        list_add(&tr->list, &blktrans_majors);  // add tr to the blktrans_majors list

        mtd_for_each_device(mtd)
                if (mtd->type != MTD_ABSENT)
                        tr->add_mtd(tr, mtd);

        mutex_unlock(&mtd_table_mutex);
        return 0;
}

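This function consists of four main steps:

  • Call register_mtd_user: add blktrans_notifier to the mtd_notifiers list, then iterate over the global mtd_idr and call blktrans_notify_add(mtd) for each mtd found;
  • Call register_blkdev to register the block device with major number 31 and the name mtdblock;
  • Add mtdblock_tr to the blktrans_majors list, which is defined as static LIST_HEAD(blktrans_majors);
  • Iterate over the global mtd_idr again and call mtdblock_add_mtd(&mtdblock_tr, mtd) for each mtd found.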

4.2.1 mtd_notifier

mtd_notifier is defined in include/linux/mtd/mtd.h:

struct mtd_notifier {
        void (*add)(struct mtd_info *mtd);
        void (*remove)(struct mtd_info *mtd);
        struct list_head list;
};

4.2.2 blktrans_notifier

Here we look at register_mtd_user(&blktrans_notifier). The variable blktrans_notifier is defined in drivers/mtd/mtd_blkdevs.c:

static struct mtd_notifier blktrans_notifier = {
        .add = blktrans_notify_add,
        .remove = blktrans_notify_remove,
};

4.2.3 register_mtd_user

The register_mtd_user function adds new->list to the mtd_notifiers list:

/**
 *      register_mtd_user - register a 'user' of MTD devices.
 *      @new: pointer to notifier info structure
 *
 *      Registers a pair of callbacks function to be called upon addition
 *      or removal of MTD devices. Causes the 'add' callback to be immediately
 *      invoked for each MTD device currently present in the system.
 */
void register_mtd_user (struct mtd_notifier *new)
{
        struct mtd_info *mtd;

        mutex_lock(&mtd_table_mutex);           // take the mutex

        list_add(&new->list, &mtd_notifiers);   // add to the list

        __module_get(THIS_MODULE);

        mtd_for_each_device(mtd)       // iterate over mtd_idr to get each mtd
                new->add(mtd);     // here this ends up calling blktrans_notify_add(mtd)

        mutex_unlock(&mtd_table_mutex);       // release the mutex
}

4.2.4 mtd_for_each_device

The mtd_for_each_device macro is defined in drivers/mtd/mtdcore.h:

#define mtd_for_each_device(mtd)                        \
        for ((mtd) = __mtd_next_device(0);              \
             (mtd) != NULL;                             \
             (mtd) = __mtd_next_device(mtd->index + 1))

__mtd_next_device is defined in drivers/mtd/mtdcore.c:

struct mtd_info *__mtd_next_device(int i)
{
        return idr_get_next(&mtd_idr, &i);
}

This simply walks all nodes of the mtd_idr radix tree and yields the mtd associated with each node.
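
To show how ID allocation and this iteration fit together, here is a small user-space model. It replaces the radix tree with a fixed array, so it is only an analogy: demo_alloc mimics idr_alloc (lowest free ID at or above start, but returning -1 instead of a negative errno) and demo_get_next mimics idr_get_next. All demo_ names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_IDS 8

struct demo_mtd { int index; };

struct demo_mtd *demo_table[DEMO_MAX_IDS];   /* NULL means the ID is free */

/* Bind ptr to the lowest free ID >= start; return the ID or -1 if none is left. */
int demo_alloc(struct demo_mtd *ptr, int start)
{
	int id;

	for (id = start; id < DEMO_MAX_IDS; id++) {
		if (demo_table[id] == NULL) {
			demo_table[id] = ptr;
			ptr->index = id;
			return id;
		}
	}
	return -1;
}

/* First populated entry with ID >= *id; *id is updated to the ID found. */
struct demo_mtd *demo_get_next(int *id)
{
	for (; *id >= 0 && *id < DEMO_MAX_IDS; (*id)++)
		if (demo_table[*id] != NULL)
			return demo_table[*id];
	return NULL;
}

/* Same shape as mtd_for_each_device(): visit every registered device. */
int demo_count_devices(void)
{
	struct demo_mtd *mtd;
	int i = 0, count = 0;

	for (mtd = demo_get_next(&i); mtd != NULL;
	     i = mtd->index + 1, mtd = demo_get_next(&i))
		count++;
	return count;
}

void demo_idr_example(void)
{
	static struct demo_mtd a, b, c, d;

	assert(demo_alloc(&a, 0) == 0);
	assert(demo_alloc(&b, 0) == 1);
	assert(demo_alloc(&c, 0) == 2);
	demo_table[1] = NULL;               /* "unregister" b: ID 1 becomes free */
	assert(demo_count_devices() == 2);  /* IDs 0 and 2 are visited, the hole is skipped */
	assert(demo_alloc(&d, 0) == 1);     /* the freed ID is handed out again */
}
```

Restarting the search at mtd->index + 1, as the macro does, is what lets the loop skip holes left by devices that were removed.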

4.2.5 blktrans_notify_add

Next comes blktrans_notify_add(), the add callback of blktrans_notifier.

static void blktrans_notify_add(struct mtd_info *mtd)
{
        struct mtd_blktrans_ops *tr;

        if (mtd->type == MTD_ABSENT)
                return;

        list_for_each_entry(tr, &blktrans_majors, list)   // walk the blktrans_majors list
                tr->add_mtd(tr, mtd);  // call add_mtd of the mtd_blktrans_ops structure
}

The MTD block driver's entry function adds mtdblock_tr to the blktrans_majors list, so the tr obtained while walking blktrans_majors here is mtdblock_tr, and the call becomes mtdblock_tr.add_mtd(&mtdblock_tr, mtd).

The add_mtd function of mtdblock_tr is mtdblock_add_mtd.

4.2.6 mtdblock_add_mtd

static void mtdblock_add_mtd(struct mtd_blktrans_ops *tr, struct mtd_info *mtd)
{
        struct mtdblk_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL);

        if (!dev)
                return;

        dev->mbd.mtd = mtd;             // the raw MTD device
        dev->mbd.devnum = mtd->index;   // device number, used to derive the first minor number

        dev->mbd.size = mtd->size >> 9;  // total number of 512-byte sectors
        dev->mbd.tr = tr;

        if (!(mtd->flags & MTD_WRITEABLE))
                dev->mbd.readonly = 1;

        if (add_mtd_blktrans_dev(&dev->mbd))
                kfree(dev);
}

mtdblock_add_mtd does the following:

  • Allocates an mtdblk_dev structure, held in the variable dev;
  • Initializes the members of dev;
  • Calls add_mtd_blktrans_dev(&dev->mbd).
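
The field initialization can be modelled in user space as follows. The two structures are cut-down stand-ins invented for this sketch, and DEMO_MTD_WRITEABLE assumes the value 0x400 that MTD_WRITEABLE has in the MTD ABI header; treat both as assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MTD_WRITEABLE 0x400u   /* assumed value of MTD_WRITEABLE */
#define DEMO_SECTOR_SHIFT  9        /* 512-byte sectors: 1 << 9 == 512 */

/* Cut-down stand-ins for struct mtd_info and struct mtd_blktrans_dev. */
struct demo_mtd    { int index; uint64_t size; uint32_t flags; };
struct demo_blkdev { int devnum; uint64_t size; int readonly; };

/* Mirror of the assignments made in mtdblock_add_mtd(). */
struct demo_blkdev demo_describe(const struct demo_mtd *mtd)
{
	struct demo_blkdev dev;

	dev.devnum = mtd->index;                     /* same number as the MTD index */
	dev.size = mtd->size >> DEMO_SECTOR_SHIFT;   /* bytes -> 512-byte sectors */
	dev.readonly = !(mtd->flags & DEMO_MTD_WRITEABLE);
	return dev;
}

/* Usage: an assumed 256 MiB writable partition with MTD index 2. */
void demo_blkdev_example(void)
{
	struct demo_mtd mtd = { 2, 256ULL * 1024 * 1024, DEMO_MTD_WRITEABLE };
	struct demo_blkdev dev = demo_describe(&mtd);

	assert(dev.devnum == 2);
	assert(dev.size == 524288);   /* 2^28 bytes / 2^9 bytes per sector */
	assert(dev.readonly == 0);
}
```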

The mtdblk_dev structure describes one MTD block device and contains the corresponding raw MTD device. It is defined in drivers/mtd/mtdblock.c:

struct mtdblk_dev {
        struct mtd_blktrans_dev mbd;
        int count;
        struct mutex cache_mutex;
        unsigned char *cache_data;
        unsigned long cache_offset;
        unsigned int cache_size;
        enum { STATE_EMPTY, STATE_CLEAN, STATE_DIRTY } cache_state;
};

The embedded mtd_blktrans_dev is defined in include/linux/mtd/blktrans.h:

struct mtd_blktrans_dev {
        struct mtd_blktrans_ops *tr;    // MTD block device information and operations
        struct list_head list;
        struct mtd_info *mtd;     // raw MTD device
        struct mutex lock;
        int devnum;                // used to compute the first minor number (devnum << tr->part_bits; a shift of 0 for mtdblock). One MTD block device may have several partitions: with 2 partitions, their minors are first minor + 1 and first minor + 2, and the first minor itself stands for the whole disk
        bool bg_stop;
        unsigned long size;         // number of sectors
        int readonly;
        int open;
        struct kref ref;
        struct gendisk *disk;          // disk (gendisk) object
        struct attribute_group *disk_attributes;
        struct request_queue *rq;       // request queue
        struct list_head rq_list;
        struct blk_mq_tag_set *tag_set;  // blk-mq tag set
        spinlock_t queue_lock;
        void *priv;
        fmode_t file_mode;
};

4.2.7 add_mtd_blktrans_dev

add_mtd_blktrans_dev is defined in drivers/mtd/mtd_blkdevs.c:

int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
{
        struct mtd_blktrans_ops *tr = new->tr;
        struct mtd_blktrans_dev *d;
        int last_devnum = -1;
        struct gendisk *gd;
        int ret;

        if (mutex_trylock(&mtd_table_mutex)) {
                mutex_unlock(&mtd_table_mutex);
                BUG();
        }

        mutex_lock(&blktrans_ref_mutex);
        list_for_each_entry(d, &tr->devs, list) {   // walk tr->devs, the devnum-sorted list of mtd_blktrans_dev
                if (new->devnum == -1) {            // no devnum requested: take the first free one, counting up from 0
                        /* Use first free number */
                        if (d->devnum != last_devnum+1) {
                                /* Found a free devnum. Plug it in here */
                                new->devnum = last_devnum+1;          // the new devnum
                                list_add_tail(&new->list, &d->list);  // insert new just before d, keeping the list sorted
                                goto added;
                        }
                } else if (d->devnum == new->devnum) {   // the requested devnum is already taken
                        /* Required number taken */
                        mutex_unlock(&blktrans_ref_mutex);
                        return -EBUSY;
                } else if (d->devnum > new->devnum) {
                        /* Required number was free */
                        list_add_tail(&new->list, &d->list);
                        goto added;
                }
                last_devnum = d->devnum;  // remember the devnum of the entry just visited
        }

        ret = -EBUSY;
        if (new->devnum == -1)
                new->devnum = last_devnum+1;

        /* Check that the device and any partitions will get valid
         * minor numbers and that the disk naming code below can cope
         * with this number. */
        if (new->devnum > (MINORMASK >> tr->part_bits) ||
            (tr->part_bits && new->devnum >= 27 * 26)) {
                mutex_unlock(&blktrans_ref_mutex);
                goto error1;
        }

        list_add_tail(&new->list, &tr->devs);
 added:
        mutex_unlock(&blktrans_ref_mutex);

        mutex_init(&new->lock);
        kref_init(&new->ref);
        if (!tr->writesect)
                new->readonly = 1;

        /* Create gendisk */
        ret = -ENOMEM;
        gd = alloc_disk(1 << tr->part_bits);  // allocate a gendisk with 1 << part_bits minors

        if (!gd)
                goto error2;

        new->disk = gd;
        gd->private_data = new;  // private data
        gd->major = tr->major;   // major number
        gd->first_minor = (new->devnum) << tr->part_bits;  // first minor number
        gd->fops = &mtd_block_ops;  // block device operations

        if (tr->part_bits)   // 0 for mtdblock, so the final else branch runs
                if (new->devnum < 26)
                        snprintf(gd->disk_name, sizeof(gd->disk_name),
                                 "%s%c", tr->name, 'a' + new->devnum);
                else
                        snprintf(gd->disk_name, sizeof(gd->disk_name),
                                 "%s%c%c", tr->name,
                                 'a' - 1 + new->devnum / 26,
                                 'a' + new->devnum % 26);
        else     // disk name, e.g. mtdblock0, which appears as /dev/mtdblock0
                snprintf(gd->disk_name, sizeof(gd->disk_name),
                         "%s%d", tr->name, new->devnum);

        set_capacity(gd, ((u64)new->size * tr->blksize) >> 9);  // capacity in 512-byte sectors

        /* Create the request queue */
        spin_lock_init(&new->queue_lock);
        INIT_LIST_HEAD(&new->rq_list);

        new->tag_set = kzalloc(sizeof(*new->tag_set), GFP_KERNEL);
        if (!new->tag_set)
                goto error3;

        new->rq = blk_mq_init_sq_queue(new->tag_set, &mtd_mq_ops, 2,
                                BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING);  // create the request queue, with mtd_mq_ops as its blk-mq callbacks
        if (IS_ERR(new->rq)) {
                ret = PTR_ERR(new->rq);
                new->rq = NULL;
                goto error4;
        }

        if (tr->flush)
                blk_queue_write_cache(new->rq, true, false);

        new->rq->queuedata = new;
        blk_queue_logical_block_size(new->rq, tr->blksize);

        blk_queue_flag_set(QUEUE_FLAG_NONROT, new->rq);
        blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, new->rq);

        if (tr->discard) {
                blk_queue_flag_set(QUEUE_FLAG_DISCARD, new->rq);
                blk_queue_max_discard_sectors(new->rq, UINT_MAX);
        }

        gd->queue = new->rq;  // attach the request queue

        if (new->readonly)
                set_disk_ro(gd, 1);

        device_add_disk(&new->mtd->dev, gd, NULL);  // register the gendisk with the kernel

        if (new->disk_attributes) {
                ret = sysfs_create_group(&disk_to_dev(gd)->kobj,
                                        new->disk_attributes);
                WARN_ON(ret);
        }
        return 0;
error4:
        kfree(new->tag_set);
error3:
        put_disk(new->disk);
error2:
        list_del(&new->list);
error1:
        return ret;
}

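As this function shows, gd->major always comes from tr->major, so every MTD block device registered through mtdblock_tr has major number 31 and only the minor numbers differ. The major number identifies the driver; the minor number identifies an individual device handled by that driver.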
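
The numbering and naming rules in add_mtd_blktrans_dev can be tried out in user space. The following is a minimal sketch, assuming the in-use device numbers are available as a sorted array instead of the kernel's linked list. All sketch_* names are hypothetical; only the arithmetic and the three snprintf branches follow the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Pick a device number the way the loop does when devnum == -1:
 * "used" is sorted ascending and the first gap wins.
 */
static int sketch_first_free(const int *used, size_t n)
{
        int last = -1;
        size_t i;

        assert(used != NULL || n == 0);
        for (i = 0; i < n; i++) {
                if (used[i] != last + 1)
                        break;          /* gap found just before used[i] */
                last = used[i];
        }
        return last + 1;                /* the gap, or one past the last entry */
}

/* first_minor = devnum << part_bits, as assigned to gd->first_minor. */
static int sketch_first_minor(int devnum, int part_bits)
{
        return devnum << part_bits;
}

/* Build the disk name following the three snprintf() branches. */
static void sketch_disk_name(char *buf, size_t len, const char *name,
                             int devnum, int part_bits)
{
        if (part_bits) {
                if (devnum < 26)
                        snprintf(buf, len, "%s%c", name, 'a' + devnum);
                else
                        snprintf(buf, len, "%s%c%c", name,
                                 'a' - 1 + devnum / 26, 'a' + devnum % 26);
        } else {
                snprintf(buf, len, "%s%d", name, devnum);
        }
}
```

With part_bits equal to 0, as for mtdblock, device number 3 maps to minor 3 and to the name mtdblock3, which matches the /dev/mtdblock3 entry with device number 31, 3 in the board listing later in this section.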

4.2.8 mtd_block_ops

Next, look at the MTD block device operations, mtd_block_ops, defined in drivers/mtd/mtd_blkdevs.c:

static const struct block_device_operations mtd_block_ops = {
        .owner          = THIS_MODULE,
        .open           = blktrans_open,
        .release        = blktrans_release,
        .ioctl          = blktrans_ioctl,
        .getgeo         = blktrans_getgeo,
};

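The function pointers mean:

  • open: called when an MTD block device is opened;
  • release: called when an MTD block device is closed;
  • getgeo: returns the drive geometry, which is filled into a struct hd_geometry;
  • ioctl: called for device-specific control operations on an MTD block device.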

4.2.9 blktrans_open
static int blktrans_open(struct block_device *bdev, fmode_t mode)
{
        struct mtd_blktrans_dev *dev = blktrans_dev_get(bdev->bd_disk);
        int ret = 0;

        if (!dev)
                return -ERESTARTSYS; /* FIXME: busy loop! -arnd*/

        mutex_lock(&mtd_table_mutex);
        mutex_lock(&dev->lock);

        if (dev->open)
                goto unlock;

        kref_get(&dev->ref);
        __module_get(dev->tr->owner);

        if (!dev->mtd)
                goto unlock;

        if (dev->tr->open) {
                ret = dev->tr->open(dev);  // calls the open callback of mtd_blktrans_ops
                if (ret)
                        goto error_put;
        }

        ret = __get_mtd_device(dev->mtd);
        if (ret)
                goto error_release;
        dev->file_mode = mode;

unlock:
        dev->open++;
        mutex_unlock(&dev->lock);
        mutex_unlock(&mtd_table_mutex);
        blktrans_dev_put(dev);
        return ret;

error_release:
        if (dev->tr->release)
                dev->tr->release(dev);
error_put:
        module_put(dev->tr->owner);
        kref_put(&dev->ref, blktrans_dev_release);
        mutex_unlock(&dev->lock);
        mutex_unlock(&mtd_table_mutex);
        blktrans_dev_put(dev);
        return ret;
}

4.2.10 blktrans_ioctl
static int blktrans_ioctl(struct block_device *bdev, fmode_t mode,
                              unsigned int cmd, unsigned long arg)
{
        struct mtd_blktrans_dev *dev = blktrans_dev_get(bdev->bd_disk);
        int ret = -ENXIO;

        if (!dev)
                return ret;

        mutex_lock(&dev->lock);

        if (!dev->mtd)
                goto unlock;

        switch (cmd) {
        case BLKFLSBUF:
                ret = dev->tr->flush ? dev->tr->flush(dev) : 0;
                break;
        default:
                ret = -ENOTTY;
        }
unlock:
        mutex_unlock(&dev->lock);
        blktrans_dev_put(dev);
        return ret;
}

4.2.11 mtd_mq_ops

Now look at the blk-mq operations of the MTD block driver, defined in drivers/mtd/mtd_blkdevs.c:

static const struct blk_mq_ops mtd_mq_ops = {
        .queue_rq       = mtd_queue_rq,
};

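As the previous section showed, queue_rq is called when a request is dispatched to the block driver. Its job is to move data between the disk and memory, for example writing memory contents to the disk or reading disk contents into memory.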

static blk_status_t mtd_queue_rq(struct blk_mq_hw_ctx *hctx,
                                 const struct blk_mq_queue_data *bd)
{
        struct mtd_blktrans_dev *dev;

        dev = hctx->queue->queuedata;
        if (!dev) {
                blk_mq_start_request(bd->rq);
                return BLK_STS_IOERR;
        }

        spin_lock_irq(&dev->queue_lock);
        list_add_tail(&bd->rq->queuelist, &dev->rq_list);
        mtd_blktrans_work(dev);   // not traced further here: reads end up in mtdblock_tr.readsect and writes in mtdblock_tr.writesect
        spin_unlock_irq(&dev->queue_lock);

        return BLK_STS_OK;
}
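
The dispatch mentioned in the comment above (reads go to readsect, writes go to writesect) can be modelled in user space. This is a simplified sketch under stated assumptions, not the kernel's implementation: struct sketch_ops is a hypothetical, reduced counterpart of mtd_blktrans_ops, the request is assumed to be served one 512-byte sector at a time, and a RAM array stands in for the flash device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLKSIZE 512
#define SKETCH_NBLOCKS 4

enum sketch_op { SKETCH_READ, SKETCH_WRITE };

/* Hypothetical, reduced counterpart of mtd_blktrans_ops. */
struct sketch_ops {
        int (*readsect)(void *priv, unsigned long block, char *buf);
        int (*writesect)(void *priv, unsigned long block, const char *buf);
};

/*
 * Serve one request sector by sector. Returns 0 on success and -1 on the
 * first failing sector, or when the needed callback is missing (as for a
 * read-only translation layer without writesect).
 */
static int sketch_handle(const struct sketch_ops *ops, void *priv,
                         enum sketch_op op, unsigned long block,
                         unsigned long nsect, char *buf)
{
        assert(ops != NULL && buf != NULL);
        for (; nsect > 0; nsect--, block++, buf += SKETCH_BLKSIZE) {
                if (op == SKETCH_READ) {
                        if (!ops->readsect || ops->readsect(priv, block, buf))
                                return -1;
                } else {
                        if (!ops->writesect || ops->writesect(priv, block, buf))
                                return -1;
                }
        }
        return 0;
}

/* Toy backing store: a RAM array standing in for the flash device. */
struct sketch_ram {
        char data[SKETCH_NBLOCKS * SKETCH_BLKSIZE];
};

static int sketch_ram_read(void *priv, unsigned long block, char *buf)
{
        struct sketch_ram *ram = priv;

        if (block >= SKETCH_NBLOCKS)
                return -1;
        memcpy(buf, ram->data + block * SKETCH_BLKSIZE, SKETCH_BLKSIZE);
        return 0;
}

static int sketch_ram_write(void *priv, unsigned long block, const char *buf)
{
        struct sketch_ram *ram = priv;

        if (block >= SKETCH_NBLOCKS)
                return -1;
        memcpy(ram->data + block * SKETCH_BLKSIZE, buf, SKETCH_BLKSIZE);
        return 0;
}
```

Leaving writesect as NULL makes every write fail in this model, which parallels the way add_mtd_blktrans_dev marks a device read-only when tr->writesect is not set.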

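4.3 MTD block device flow chart

The execution flow of register_mtd_blktrans is shown in the figure:

In the MTD block device entry function:

  • blktrans_notifier is added to the mtd_notifiers list;
  • in the first loop shown in the figure above, the mtd_idr tree has only its root node at this point, i.e. it holds no MTD devices yet, so the loop body is never executed;
  • the block device major number is then registered: the major number is 31 and the block device name is mtdblock;
  • the second loop is reached next and, for the same reason, is not entered either.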

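Later, in add_mtd_device(mtd):

  • a node in mtd_idr is allocated for the raw MTD device;
  • erasesize_shift, writesize_shift, erasesize_mask, writesize_mask and related fields of the raw MTD device are set;
  • the class of the struct device embedded in the raw MTD device is set to mtd_class, together with its device number, type, name and driver_data; device_register then registers the MTD character device named mtd%d;
  • device_create creates, initializes and registers the MTD character device named mtd%dro;
  • the mtd_notifiers list is walked and, for the blktrans_notifier entry found there, blktrans_notifier->add(mtd) is called, which:
    • allocates a gendisk structure and sets its members:
      • private_data;
      • the major number, major (MTD_BLOCK_MAJOR, which is 31);
      • the first minor number, first_minor (it increases with each MTD device registered);
      • the disk name, disk_name, set to mtdblock%d; a node with this name is created under /dev;
      • the block device operations, fops;
    • initializes the request queue;
    • finally registers the gendisk.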

For example, after the development board boots and the Nand Flash driver is loaded, the following can be seen:

[root@zy:/]# ls /sys/class/mtd/ -l
total 0
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd0 -> ../../devices/virtual/mtd/mtd0
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd0ro -> ../../devices/virtual/mtd/mtd0ro
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd1 -> ../../devices/virtual/mtd/mtd1
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd1ro -> ../../devices/virtual/mtd/mtd1ro
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd2 -> ../../devices/virtual/mtd/mtd2
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd2ro -> ../../devices/virtual/mtd/mtd2ro
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd3 -> ../../devices/virtual/mtd/mtd3
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd3ro -> ../../devices/virtual/mtd/mtd3ro
[root@zy:/]# ls -l /dev/mtd*
crw-rw----    1 0        0          90,   0 Jan  1 00:00 /dev/mtd0
crw-rw----    1 0        0          90,   1 Jan  1 00:00 /dev/mtd0ro
crw-rw----    1 0        0          90,   2 Jan  1 00:00 /dev/mtd1
crw-rw----    1 0        0          90,   3 Jan  1 00:00 /dev/mtd1ro
crw-rw----    1 0        0          90,   4 Jan  1 00:00 /dev/mtd2
crw-rw----    1 0        0          90,   5 Jan  1 00:00 /dev/mtd2ro
crw-rw----    1 0        0          90,   6 Jan  1 00:00 /dev/mtd3
crw-rw----    1 0        0          90,   7 Jan  1 00:00 /dev/mtd3ro
brw-rw----    1 0        0          31,   0 Jan  1 00:00 /dev/mtdblock0
brw-rw----    1 0        0          31,   1 Jan  1 00:00 /dev/mtdblock1
brw-rw----    1 0        0          31,   2 Jan  1 00:00 /dev/mtdblock2
brw-rw----    1 0        0          31,   3 Jan  1 00:00 /dev/mtdblock3

5. mtdchar.c

As introduced earlier, mtdchar.c implements the MTD character device interface. We go straight to drivers/mtd/mtdchar.c and walk through the source.

5.1 Module entry function

static const struct file_operations mtd_fops = {  // character device operations
        .owner          = THIS_MODULE,
        .llseek         = mtdchar_lseek,
        .read           = mtdchar_read,
        .write          = mtdchar_write,
        .unlocked_ioctl = mtdchar_unlocked_ioctl,
#ifdef CONFIG_COMPAT
        .compat_ioctl   = mtdchar_compat_ioctl,
#endif
        .open           = mtdchar_open,
        .release        = mtdchar_close,
        .mmap           = mtdchar_mmap,
#ifndef CONFIG_MMU
        .get_unmapped_area = mtdchar_get_unmapped_area,
        .mmap_capabilities = mtdchar_mmap_capabilities,
#endif
};

int __init init_mtdchar(void)
{
        int ret;

        ret = __register_chrdev(MTD_CHAR_MAJOR, 0, 1 << MINORBITS,    // MTD character major number is 90; MINORBITS = 20
                                   "mtd", &mtd_fops);  // "mtd" names the device-number range, not the nodes in /dev
        if (ret < 0) {
                pr_err("Can't allocate major number %d for MTD\n",
                       MTD_CHAR_MAJOR);
                return ret;
        }

        return ret;
}

5.2 __register_chrdev

__register_chrdev is located in fs/char_dev.c:

/**
 * __register_chrdev() - create and register a cdev occupying a range of minors
 * @major: major device number or 0 for dynamic allocation
 * @baseminor: first of the requested range of minor numbers
 * @count: the number of minor numbers required
 * @name: name of this range of devices
 * @fops: file operations associated with this devices
 *
 * If @major == 0 this functions will dynamically allocate a major and return
 * its number.
 *
 * If @major > 0 this function will attempt to reserve a device with the given
 * major number and will return zero on success.
 *
 * Returns a -ve errno on failure.
 *
 * The name of this device has nothing to do with the name of the device in
 * /dev. It only helps to keep track of the different owners of devices. If
 * your module name has only one type of devices it's ok to use e.g. the name
 * of the module here.
 */
int __register_chrdev(unsigned int major, unsigned int baseminor,
                      unsigned int count, const char *name,
                      const struct file_operations *fops)
{
        struct char_device_struct *cd;
        struct cdev *cdev;
        int err = -ENOMEM;

        cd = __register_chrdev_region(major, baseminor, count, name); // reserve the range of character device numbers
        if (IS_ERR(cd))
                return PTR_ERR(cd);

        cdev = cdev_alloc();  // allocate a struct cdev
        if (!cdev)
                goto out2;

        cdev->owner = fops->owner;  // initialize the cdev
        cdev->ops = fops;
        kobject_set_name(&cdev->kobj, "%s", name);

        err = cdev_add(cdev, MKDEV(cd->major, baseminor), count);  // register the cdev with the system
        if (err)
                goto out;

        cd->cdev = cdev;

        return major ? 0 : cd->major;
out:
        kobject_put(&cdev->kobj);
out2:
        kfree(__unregister_chrdev_region(cd->major, baseminor, count));
        return err;
}

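So the module entry function does three things:

  • reserves the character device numbers: major number 90, with 1<<20 minor numbers;
  • dynamically allocates the character device (cdev);
  • registers the character device.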

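It does not create the class or the device entries under it. The mtd class itself is registered when the MTD core initializes, and the per-device entries are created in add_mtd_device:

  • device_register and device_create add the mtd%d and mtd%dro entries under /sys/class/mtd, each with a dev file that the mdev program scans to create the nodes under /dev.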
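
The /dev listing shown earlier has a regular pattern: mtd0 is 90, 0, mtd0ro is 90, 1, mtd1 is 90, 2, and so on, so each MTD device takes two consecutive minors. The sketch below encodes that pattern. It is inferred from the listing rather than copied from the kernel source, and all sketch_* helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Minor of /dev/mtd<index> (read_only == 0) or /dev/mtd<index>ro (read_only != 0). */
static int sketch_mtd_minor(int index, int read_only)
{
        assert(index >= 0);
        return index * 2 + (read_only ? 1 : 0);
}

/* Reverse mapping: the MTD device index a minor refers to. */
static int sketch_minor_to_index(int minor)
{
        return minor >> 1;
}

/* Odd minors are the read-only "ro" nodes. */
static int sketch_minor_is_ro(int minor)
{
        return minor & 1;
}

/* Format the node name that matches a minor, e.g. minor 5 gives "mtd2ro". */
static void sketch_node_name(char *buf, size_t len, int minor)
{
        snprintf(buf, len, "mtd%d%s", sketch_minor_to_index(minor),
                 sketch_minor_is_ro(minor) ? "ro" : "");
}
```

For instance, sketch_mtd_minor(3, 1) yields 7, in line with the /dev/mtd3ro entry (90, 7) in the listing.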


posted @ 2022-12-01 14:16  Sky&Zhang