Linux Driver Porting: the NAND Flash ONFI Standard and the MTD Subsystem
----------------------------------------------------------------------------------------------------------------------------
Kernel version: Linux 5.2.8
Root filesystem: busybox 1.25.0
u-boot: 2016.05
----------------------------------------------------------------------------------------------------------------------------
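1. The ONFI Standard
NAND Flash is a common storage device in the embedded world. For embedded development it falls into two broad categories, Serial NAND and Raw NAND, and the two differ considerably.
Raw NAND is defined in contrast to Serial NAND: Serial NAND is NAND Flash with a serial interface, for example one that uses the SPI protocol, whereas Raw NAND is NAND Flash with a parallel interface.
ONFI is introduced first because the NAND Flash driver source analyzed later refers to it. The K9F2G08U0C chip used here does not support ONFI, which becomes clear when the commands it supports are compared with those defined by ONFI 1.0.
1.1 The ONFI standard
Early Raw NAND had no unified standard. Although Toshiba published the NAND Flash structure as early as 1989, each vendor designed its Raw NAND chips freely, so package sizes differed, storage organizations varied widely, and interface commands were not interchangeable, all of which made the parts painful to use.
To change this, in 2006 several mainstream Raw NAND vendors (Hynix, Intel, Micron, Phison, Sony, ST) jointly set out to define a Raw NAND standard called the Open NAND Flash Interface, or ONFI. ONFI 1.0 was released in December 2006. Since then most Raw NAND vendors have designed and produced their parts to the ONFI standard, so that ONFI-compliant Raw NAND looks nearly the same to an embedded designer regardless of vendor, at least at the driver level.
ONFI website: http://www.onfi.org/, where the ONFI specification can be downloaded.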
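1.2 Raw NAND classification
1.2.1 Bits per cell
NAND Flash memory cells can be classified by the number of bits each cell stores:
- Single Level Cell (SLC): stores 1 bit per cell. This type offers the most accurate reads and writes and the longest endurance, roughly 90,000 to 100,000 program/erase cycles. Thanks to its endurance, accuracy and overall performance it is popular in the enterprise market, but its high cost per bit and relatively small capacity make it less attractive in the consumer market.
- Multi Level Cell (MLC): the name comes from storing 2 bits per cell instead of SLC's 1 bit per cell. Its main advantage is a much lower cost for large capacities; endurance is about 3,000 to 10,000 program/erase cycles.
- Triple Level Cell (TLC): the cheapest of the first three types, storing 3 bits per cell. The higher density makes large capacities affordable, but endurance drops sharply to only about 500 to 1,000 program/erase cycles and read/write speed is poorer, so it suits ordinary consumer use rather than industrial requirements.
- Quad Level Cell (QLC): stores 4 bits per cell, a 33% density increase over TLC. QLC is reported to withstand about 1,000 program/erase cycles (comparable to TLC, or even better) while offering higher capacity at lower cost.
Conclusion: in endurance and speed, SLC > MLC > TLC.
Most USB flash drives currently use TLC chips: they are cheap, but speed is average and lifetime is relatively short.
Among SSDs, MLC drives are currently mainstream, with moderate prices and relatively good speed and endurance, while low-cost SSDs generally use TLC chips. The product specifications state which type a drive uses.
SLC SSDs currently appear mainly in high-end products, most of which cost over a thousand yuan or more.
Most smartphones also use TLC storage. Some iPhone 6 units use TLC chips and others use MLC chips. Overall, MLC is the current mainstream, with moderate speed, endurance and price, and is a reasonable recommendation.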
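1.2.2 Data bus width
The data bus width is either x8 or x16.
1.2.3 Data sampling mode
The data sampling mode is either SDR or DDR.
1.2.4 Interface command standard
The interface command standard is either non-standard (vendor-specific) or ONFI.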
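1.3 Raw NAND memory model
ONFI organizes Raw NAND memory, from largest to smallest, into at most the following levels: Device, LUN (Die, Target), Plane, Block, Page, Cell.
- Device: a single NAND Flash part, that is, the packaged chip seen from outside. One Device contains one or more LUNs.
- LUN (Die, Target): the basic unit that receives and executes Flash commands. One LUN contains one or more Planes. (Strictly speaking, ONFI uses Target for the set of LUNs that share one chip-enable signal; with one LUN per Target the two coincide.)
- Plane: one Plane contains multiple Blocks.
- Block: the smallest unit that can be erased, usually made up of multiple Pages.
- Page: the smallest unit that can be programmed or read, typically 2KB in size.
- Cell: the smallest storage element within a Page. It corresponds to one floating-gate transistor and stores one or more bits.
Page and Block are always present, because the Page is the smallest read/write unit and the Block is the smallest erase unit. LUN and Plane are optional (if absent, treat them as LUN=1, Plane=1) and generally appear only on large-capacity Raw NAND (8Gb and above).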
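The hierarchy above can be made concrete with a little arithmetic. The following is a minimal sketch, not kernel code: `struct demo_geometry`, `demo_capacity` and `demo_locate` are hypothetical names, and the sketch assumes blocks are numbered linearly plane after plane, which real multi-plane parts often do not do (many interleave blocks across planes).

```c
#include <stdint.h>

/* Hypothetical descriptor for the Device/LUN/Plane/Block/Page hierarchy. */
struct demo_geometry {
    uint32_t page_size;        /* data bytes per page, OOB excluded */
    uint32_t pages_per_block;
    uint32_t blocks_per_plane;
    uint32_t planes_per_lun;
    uint32_t luns_per_device;
};

struct demo_location {
    uint32_t lun, plane, block, page, column;
};

/* Total data capacity in bytes: the product of every level of the hierarchy. */
uint64_t demo_capacity(const struct demo_geometry *g)
{
    return (uint64_t)g->page_size * g->pages_per_block *
           g->blocks_per_plane * g->planes_per_lun * g->luns_per_device;
}

/* Split a linear byte offset into hierarchy coordinates (linear layout assumed). */
struct demo_location demo_locate(const struct demo_geometry *g, uint64_t offset)
{
    struct demo_location loc;
    uint64_t n = offset;

    loc.column = (uint32_t)(n % g->page_size);        n /= g->page_size;
    loc.page   = (uint32_t)(n % g->pages_per_block);  n /= g->pages_per_block;
    loc.block  = (uint32_t)(n % g->blocks_per_plane); n /= g->blocks_per_plane;
    loc.plane  = (uint32_t)(n % g->planes_per_lun);   n /= g->planes_per_lun;
    loc.lun    = (uint32_t)n;
    return loc;
}
```

With an assumed geometry of 2048-byte pages, 64 pages per block and 2048 blocks in a single plane and LUN (numbers chosen to resemble a 2Gb part; check the datasheet of the actual chip), `demo_capacity` gives 256MiB, and byte offset 0x20805 lands in block 1, page 1, column 5.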
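A typical NAND Flash contains a single chip (LUN) with a single plane. More complex, larger-capacity parts contain several chips, each with several planes. Such parts essentially add a controller that stacks several Flash dies together, as the figure below shows:
Note: as I understand it, the chip here is the LUN described above. Any NAND Flash part can be called a chip, but here the term refers to the inside of the part, that is, how many chips a given NAND Flash model contains. For example:
- Samsung's 2GB K9WAG08U1A (think of it as the external chip or model) contains two 1GB K9K8G08U0A dies, so the K9WAG08U1A is said to contain 2 chips;
- a single chip may in turn contain several planes; for example, the K9K8G08U0A above contains four 2Gb Planes;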
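1.4 Raw NAND signals and packages
ONFI specifies the Raw NAND signal lines and packages. The following is a typical internal block diagram of an x8 Raw NAND:
Besides the memory array there are two other main components, the I/O control unit and the logic control unit, and the signal lines attach mainly to these two. An x8 Raw NAND has 15 main signal lines, of which 13 are mandatory; $\overline{CE}$ and $R\overline{B}$ may be left unused.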
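Pin | Description |
CLE | Command latch enable. When CLE is high, the I/O inputs are latched into the command register on the rising edge of $\overline{WE}$ |
ALE | Address latch enable. When ALE is high, the I/O inputs are latched into the address register on the rising edge of $\overline{WE}$ |
$\overline{CE}$ | Chip enable, active low |
$\overline{RE}$ | Read enable, active low |
$\overline{WE}$ | Write enable. The rising edge of $\overline{WE}$ latches the I/O inputs into the command, address or data register |
$\overline{WP}$ | Write protect |
$R\overline{B}$ | Ready/busy output (low means an operation is still in progress, high means it has completed) |
VCC | Power supply |
VSS | Ground |
NC | Not connected |
I/O0 ~ I/O7 | Data input/output (commands, addresses and data share the data bus) |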
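ONFI specifies many packages, such as TSOP48, LGA52 and BGA63/100/132/152/272/316. For embedded development the most common is the flat TSOP-48 package shown below. It is typically used for smaller-capacity Raw NAND (1/2/4/8/16/32Gb); 1 to 32Gb is generally enough for embedded designs, and TSOP-48 is easy to lay out on a PCB, which explains its popularity.
1.5 Raw NAND interface commands
ONFI 1.0 specifies the Raw NAND interface commands listed in the table below. Some are mandatory (M) and some are optional (O). Among the mandatory commands, the three most frequently used are Read (Read Page), Page Program and Block Erase, which cover the three basic operations: read, write and erase.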
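To illustrate what a Read command looks like on the bus, the sketch below builds the cycle sequence for an ONFI page read: command 00h, the address cycles, then command 30h. It assumes a large-page device with 2 column cycles and 3 row cycles; the number of address cycles varies between parts, and `struct demo_cycle` and `demo_build_read_page` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Each bus cycle is either a command (CLE high) or an address byte (ALE high). */
enum demo_cycle_type { DEMO_CYCLE_CMD, DEMO_CYCLE_ADDR };

struct demo_cycle {
    enum demo_cycle_type type;
    uint8_t value;
};

#define DEMO_CMD_READ_1ST 0x00  /* ONFI Read, first command cycle  */
#define DEMO_CMD_READ_2ND 0x30  /* ONFI Read, second command cycle */

/*
 * Fill out[] with the 7 cycles of a page read and return the count.
 * column: byte offset inside the page; row: page number inside the LUN.
 * Assumes 2 column + 3 row address cycles, least significant byte first.
 */
size_t demo_build_read_page(uint16_t column, uint32_t row, struct demo_cycle out[7])
{
    size_t n = 0;

    out[n++] = (struct demo_cycle){ DEMO_CYCLE_CMD,  DEMO_CMD_READ_1ST };
    out[n++] = (struct demo_cycle){ DEMO_CYCLE_ADDR, (uint8_t)(column & 0xff) };
    out[n++] = (struct demo_cycle){ DEMO_CYCLE_ADDR, (uint8_t)(column >> 8) };
    out[n++] = (struct demo_cycle){ DEMO_CYCLE_ADDR, (uint8_t)(row & 0xff) };
    out[n++] = (struct demo_cycle){ DEMO_CYCLE_ADDR, (uint8_t)((row >> 8) & 0xff) };
    out[n++] = (struct demo_cycle){ DEMO_CYCLE_ADDR, (uint8_t)((row >> 16) & 0xff) };
    out[n++] = (struct demo_cycle){ DEMO_CYCLE_CMD,  DEMO_CMD_READ_2ND };
    return n;
}
```

After the final 30h cycle the device goes busy while it moves the page from the array into its register; the host then waits for ready and clocks the data out with $\overline{RE}$.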
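Two other commands are also important:
- Read Status: returns the status and result of command execution.
- Read Parameter Page: returns the factory information stored in the chip (memory organization, features, timing, other behavioral parameters, and so on). Its layout is defined by ONFI as shown in the table below. A NAND driver can read this Parameter Page to keep its code generic.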
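A driver that relies on the Parameter Page normally validates it before trusting its fields. The sketch below checks the "ONFI" signature in bytes 0 to 3 and the integrity CRC stored little-endian in bytes 254 and 255, computed over bytes 0 to 253 with polynomial 0x8005 and initial value 0x4F4E. The function names are hypothetical; the field offsets and CRC parameters are taken from the ONFI specification as I understand it and should be confirmed against the specification text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ONFI_CRC_INIT 0x4F4E  /* initial CRC value used by ONFI */
#define DEMO_ONFI_CRC_POLY 0x8005  /* x^16 + x^15 + x^2 + 1          */

/* Bitwise CRC-16, most significant bit first, no final XOR. */
uint16_t demo_onfi_crc16(uint16_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int i = 0; i < 8; i++)
            crc = (uint16_t)((crc << 1) ^ ((crc & 0x8000) ? DEMO_ONFI_CRC_POLY : 0));
    }
    return crc;
}

/* Return true if a 256-byte parameter page copy has a valid signature and CRC. */
bool demo_onfi_param_page_valid(const uint8_t page[256])
{
    uint16_t stored;

    if (memcmp(page, "ONFI", 4) != 0)
        return false;
    stored = (uint16_t)(page[254] | (page[255] << 8));
    return demo_onfi_crc16(DEMO_ONFI_CRC_INIT, page, 254) == stored;
}
```

The chip stores several redundant copies of the page, so a driver would typically try each copy in turn until one passes this check.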
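2. MTD Device Drivers
MTD (Memory Technology Device) is the Linux subsystem for accessing memory devices such as ROM and Flash. Its main purpose is to make drivers for new memory devices simpler, and to that end it provides an abstract interface between the hardware and the upper layers.
2.1 MTD subsystem overview
Before introducing MTD, consider a question: why did the Linux kernel abstract out an MTD subsystem at all?
Recall the steps for writing a block device driver from the previous article:
- call register_blkdev to register the block device major number;
- use alloc_disk to allocate a generic disk object, gendisk;
- use blk_mq_init_sq_queue to initialize a request queue;
- set the members of the gendisk structure:
  - set major, first_minor, disk_name and fops;
  - set queue to the request queue initialized earlier;
- use add_disk to register the gendisk;
For every model of Flash device, a block device driver would have to repeat all of these steps. So what actually differs between Flash models? Taking NAND Flash as an example, mainly the memory model (page size, block size, pages per block, OOB, and so on) and small differences in timing parameters. The parts tightly coupled to the NAND Flash can therefore be separated out and supplied by a NAND Flash driver layer, while everything that stays the same is factored out once. That is exactly what the MTD subsystem does.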
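2.2 MTD subsystem framework
As the figure above shows, the MTD framework can be divided into four layers, from top to bottom: device nodes, the MTD device layer, the MTD raw device layer, and the Flash driver layer.
- Device nodes: mknod creates the MTD block device nodes (major number 31) and MTD character device nodes (major number 90) under /dev. Accessing these nodes accesses the MTD character and block devices.
- MTD device layer: on top of the MTD raw devices, Linux defines MTD block devices (major number 31) and character devices (major number 90). Here:
  - mtdchar.c: implements the MTD character device interface;
  - mtdblock.c: implements the MTD block device interface. This part creates the device, handles reads and writes, and performs optimizations. As in a conventional block device driver, the block major number is requested, the gendisk structure is allocated and set up, and the queue is initialized, but here the kernel does all of this automatically.
- MTD raw device layer: the data structure describing an MTD raw device is mtd_info, which defines a large amount of MTD-related data and operations. Here:
  - mtdcore.c: implements the MTD raw device interface;
  - mtdpart.c: implements the MTD partition interface;
- Flash driver layer: responsible for reading, writing and erasing the Flash hardware. NAND Flash and NOR Flash have different protocols and hardware details. This layer knows what to send (which commands identify, read, write or erase the chip) and how the hardware sends it. Each protocol has its own functions, and the operating environment is built from the corresponding structures and functions. The driver author only needs to allocate, set up and register the Flash driver layer structures and establish the mapping from the specific device to an MTD raw device.
  - NAND Flash chip drivers live under drivers/mtd/nand/ and use the nand_chip structure;
  - NOR Flash chip drivers live under drivers/mtd/chips/ and use the map_info structure;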
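2.2.1 Flash driver layer
(1) NOR Flash drivers
The Linux kernel implements generic NOR Flash drivers for interface standards such as CFI and JEDEC. Built on these, a chip-level driver is simple: define the memory mapping structure map_info for the device, then call do_map_probe with the interface type.
Taking physmap.c (in drivers/mtd/maps/) as an example:
- define a map_info structure and initialize its name, size, phys and bankwidth members;
- map the virt member (the virtual address) with ioremap;
- call simple_map_init to initialize the map_info function members read, write, copy_from and copy_to;
- call do_map_probe to probe the CFI interface, which returns an mtd_info structure;
- call mtd_device_parse_register to register the MTD raw device;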
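(2) NAND Flash drivers
The Linux kernel implements a generic NAND Flash driver (drivers/mtd/nand/raw/nand_base.c); a chip-level driver has to fill in a nand_chip structure.
MTD uses nand_chip to represent a NAND Flash chip. The structure holds the NAND Flash memory model, the read/write methods, the ECC mode, hardware control and other low-level mechanisms.
Taking s3c2410.c (in drivers/mtd/nand/raw) as an example:
- allocate memory for nand_chip;
- initialize the nand_chip members according to the SoC NAND controller, for example chip->legacy (members write_buf, read_buf, select_chip, cmd_ctrl, dev_ready, IO_ADDR_R, IO_ADDR_W) and chip->controller;
- set chip->priv to the driver's private data (s3c2410.c does this through nand_set_controller_data());
- call nand_scan() to probe the NAND Flash (in 5.2 it takes the nand_chip, from which the mtd_info is reached). nand_scan() reads the NAND chip ID and then:
  - initializes chip->base.mtd (members writesize, oobsize, erasesize, and so on);
  - initializes chip->base.memorg (members bits_per_cell, pagesize, oobsize, pages_per_eraseblock, planes_per_lun, luns_per_target, ntargets, and so on);
  - initializes chip->options and chip->base.eccreq;
  - initializes the members of chip->ecc (the ECC mode and its handler functions);
  - fills every function pointer in chip that is still unset with the default from nand_base.c;
- call mtd_device_register() with the mtd_info and the mtd_partition array to register the MTD device;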
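The "fill unset function pointers with defaults" step in the list above is a common kernel pattern. The sketch below is a userspace mock of that idea, not the real nand_base.c code: `struct mock_chip` and every `mock_` function are hypothetical, and the simulated data register is just a byte array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a chip with overridable low-level hooks. */
struct mock_chip {
    uint8_t (*read_byte)(struct mock_chip *chip);
    void (*read_buf)(struct mock_chip *chip, uint8_t *buf, int len);
    const uint8_t *fifo;  /* simulated contents of the data register */
    size_t pos;           /* next byte to hand out                   */
};

/* Default hook: fetch one byte from the simulated data register. */
uint8_t mock_default_read_byte(struct mock_chip *chip)
{
    return chip->fifo[chip->pos++];
}

/* Default hook: read a buffer one byte at a time through read_byte. */
void mock_default_read_buf(struct mock_chip *chip, uint8_t *buf, int len)
{
    for (int i = 0; i < len; i++)
        buf[i] = chip->read_byte(chip);
}

/* Install defaults only where the driver left a hook NULL. */
void mock_set_defaults(struct mock_chip *chip)
{
    if (!chip->read_byte)
        chip->read_byte = mock_default_read_byte;
    if (!chip->read_buf)
        chip->read_buf = mock_default_read_buf;
}
```

A controller driver that can do better, for example with a wider bus access, sets its own read_buf before the scan and the default is then left alone; everything it does not care about falls back to the generic implementation.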
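2.3 Core MTD structures
2.3.1 struct mtd_info
The Linux kernel uses the mtd_info structure to represent an MTD raw device. It describes either a whole device or one partition of a partitioned device, and defines a large amount of MTD-related data and operations. In 5.2 every registered mtd_info is tracked in the mtd_idr IDR (older kernels kept them in an array called mtd_table).
mtd_info is defined in include/linux/mtd/mtd.h: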
struct mtd_info {
    u_char type;            // MTD device type, e.g. MTD_NORFLASH, MTD_NANDFLASH
    uint32_t flags;         // flags such as MTD_WRITEABLE, MTD_NO_ERASE
    uint32_t orig_flags;    /* Flags as before running mtd checks */
    uint64_t size;          // Total size of the MTD

    /* "Major" erase size for the device. Naïve users may take this
     * to be the only erase size available, or may use the more detailed
     * information below if they desire
     */
    uint32_t erasesize;     // erase unit size; for NAND Flash this is the Block size
    /* Minimal writable flash unit size. In case of NOR flash it is 1 (even
     * though individual bits can be cleared), in case of NAND flash it is
     * one NAND page (or half, or one-fourths of it), in case of ECC-ed NOR
     * it is of ECC block size, etc. It is illegal to have writesize = 0.
     * Any driver registering a struct mtd_info must ensure a writesize of
     * 1 or larger.
     */
    uint32_t writesize;     // minimal writable size: one byte for NOR Flash, one page for NAND Flash

    /*
     * Size of the write buffer used by the MTD. MTD devices having a write
     * buffer can write multiple writesize chunks at a time. E.g. while
     * writing 4 * writesize bytes to a device with 2 * writesize bytes
     * buffer the MTD driver can (but doesn't have to) do 2 writesize
     * operations, but not 4. Currently, all NANDs have writebufsize
     * equivalent to writesize (NAND page size). Some NOR flashes do have
     * writebufsize greater than writesize.
     */
    uint32_t writebufsize;

    uint32_t oobsize;       // Amount of OOB data per block (e.g. 16)
    uint32_t oobavail;      // Available OOB bytes per block

    /*
     * If erasesize is a power of 2 then the shift is stored in
     * erasesize_shift otherwise erasesize_shift is zero. Ditto writesize.
     */
    unsigned int erasesize_shift;   // log2 of erasesize, derived from erasesize
    unsigned int writesize_shift;   // log2 of writesize, derived from writesize
    /* Masks based on erasesize_shift and writesize_shift */
    unsigned int erasesize_mask;    // erase size mask, derived from erasesize_shift
    unsigned int writesize_mask;    // write size mask, derived from writesize_shift

    /*
     * read ops return -EUCLEAN if max number of bitflips corrected on any
     * one region comprising an ecc step equals or exceeds this value.
     * Settable by driver, else defaults to ecc_strength. User can override
     * in sysfs. N.B. The meaning of the -EUCLEAN return code has changed;
     * see Documentation/ABI/testing/sysfs-class-mtd for more detail.
     */
    unsigned int bitflip_threshold;

    /* Kernel-only stuff starts here. */
    const char *name;       // MTD device name
    int index;              // index

    /* OOB layout description */
    const struct mtd_ooblayout_ops *ooblayout;

    /* NAND pairing scheme, only provided for MLC/TLC NANDs */
    const struct mtd_pairing_scheme *pairing;

    /* the ecc step size. */
    unsigned int ecc_step_size;

    /* max number of correctible bit errors per ecc step */
    unsigned int ecc_strength;

    /* Data for variable erase regions. If numeraseregions is zero,
     * it means that the whole device has erasesize as given above.
     */
    int numeraseregions;    // number of variable erase regions (0 when the erase size is uniform)
    struct mtd_erase_region_info *eraseregions;     // variable erase regions

    /*
     * Do not call via these pointers, use corresponding mtd_*()
     * wrappers instead.
     */
    int (*_erase) (struct mtd_info *mtd, struct erase_info *instr);     // erase
    int (*_point) (struct mtd_info *mtd, loff_t from, size_t len,
                   size_t *retlen, void **virt, resource_size_t *phys);
    int (*_unpoint) (struct mtd_info *mtd, loff_t from, size_t len);
    int (*_read) (struct mtd_info *mtd, loff_t from, size_t len,        // read
                  size_t *retlen, u_char *buf);
    int (*_write) (struct mtd_info *mtd, loff_t to, size_t len,         // write
                   size_t *retlen, const u_char *buf);
    int (*_panic_write) (struct mtd_info *mtd, loff_t to, size_t len,
                         size_t *retlen, const u_char *buf);
    int (*_read_oob) (struct mtd_info *mtd, loff_t from,
                      struct mtd_oob_ops *ops);
    int (*_write_oob) (struct mtd_info *mtd, loff_t to,
                       struct mtd_oob_ops *ops);
    int (*_get_fact_prot_info) (struct mtd_info *mtd, size_t len,
                                size_t *retlen, struct otp_info *buf);
    int (*_read_fact_prot_reg) (struct mtd_info *mtd, loff_t from,
                                size_t len, size_t *retlen, u_char *buf);
    int (*_get_user_prot_info) (struct mtd_info *mtd, size_t len,
                                size_t *retlen, struct otp_info *buf);
    int (*_read_user_prot_reg) (struct mtd_info *mtd, loff_t from,
                                size_t len, size_t *retlen, u_char *buf);
    int (*_write_user_prot_reg) (struct mtd_info *mtd, loff_t to,
                                 size_t len, size_t *retlen, u_char *buf);
    int (*_lock_user_prot_reg) (struct mtd_info *mtd, loff_t from,
                                size_t len);
    int (*_writev) (struct mtd_info *mtd, const struct kvec *vecs,
                    unsigned long count, loff_t to, size_t *retlen);
    void (*_sync) (struct mtd_info *mtd);
    int (*_lock) (struct mtd_info *mtd, loff_t ofs, uint64_t len);
    int (*_unlock) (struct mtd_info *mtd, loff_t ofs, uint64_t len);
    int (*_is_locked) (struct mtd_info *mtd, loff_t ofs, uint64_t len);
    int (*_block_isreserved) (struct mtd_info *mtd, loff_t ofs);
    int (*_block_isbad) (struct mtd_info *mtd, loff_t ofs);
    int (*_block_markbad) (struct mtd_info *mtd, loff_t ofs);
    int (*_max_bad_blocks) (struct mtd_info *mtd, loff_t ofs, size_t len);
    int (*_suspend) (struct mtd_info *mtd);
    void (*_resume) (struct mtd_info *mtd);
    void (*_reboot) (struct mtd_info *mtd);
    /*
     * If the driver is something smart, like UBI, it may need to maintain
     * its own reference counting. The below functions are only for driver.
     */
    int (*_get_device) (struct mtd_info *mtd);
    void (*_put_device) (struct mtd_info *mtd);

    struct notifier_block reboot_notifier;  /* default mode before reboot */

    /* ECC status information */
    struct mtd_ecc_stats ecc_stats;
    /* Subpage shift (NAND) */
    int subpage_sft;

    void *priv;

    struct module *owner;
    struct device dev;
    int usecount;
    struct mtd_debug_info dbg;
    struct nvmem_device *nvmem;
};
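The _read(), _write(), _read_oob(), _write_oob() and _erase() hooks in mtd_info are the main functions an MTD device driver has to provide; they form the interface between the MTD raw device layer and the Flash driver layer. Linux already ships a set of mtd_info member functions that suits most Flash devices.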
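The erasesize_shift, writesize_shift and mask fields are derived values: when a size is a power of two, the shift is its base-2 logarithm and the mask is the size minus one, which lets the core replace divisions with shifts and ANDs. A minimal userspace sketch of that derivation follows; `struct demo_shift_mask`, `demo_is_pow2` and `demo_derive` are hypothetical names, and the zero fallback for non-power-of-two sizes follows the comment in the structure above.

```c
#include <stdint.h>

struct demo_shift_mask {
    uint32_t shift;  /* log2(size) if size is a power of 2, else 0 */
    uint32_t mask;   /* (1 << shift) - 1                            */
};

/* A power of 2 has exactly one bit set. */
int demo_is_pow2(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Derive the shift and mask for an erase size or write size. */
struct demo_shift_mask demo_derive(uint32_t size)
{
    struct demo_shift_mask r = { 0, 0 };

    if (demo_is_pow2(size)) {
        while ((1u << r.shift) != size)
            r.shift++;
    }
    r.mask = (1u << r.shift) - 1;
    return r;
}
```

For a 128KiB erase block this gives shift 17 and mask 0x1FFFF; for a 2048-byte page it gives shift 11 and mask 0x7FF, so `offset >> 11` is the page number and `offset & 0x7FF` is the column within the page.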
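2.3.2 struct mtd_part
MTD uses mtd_part to represent a partition. It embeds an mtd_info, and each partition is treated as an MTD raw device in its own right. Most of the data in mtd_part.mtd is taken from the partition's parent, mtd_part->parent (called master in older kernels). By default, that is without CONFIG_MTD_PARTITIONED_MASTER, the master itself is not registered as a separate MTD raw device.
mtd_part is defined in drivers/mtd/mtdpart.c: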
/**
 * struct mtd_part - our partition node structure
 *
 * @mtd: struct holding partition details
 * @parent: parent mtd - flash device or another partition
 * @offset: partition offset relative to the *flash device*
 */
struct mtd_part {
    struct mtd_info mtd;        // partition information
    struct mtd_info *parent;    // parent of this partition
    uint64_t offset;            // partition offset
    struct list_head list;      // list node linking all mtd_part structures
};
2.3.3 struct mtd_partition
MTD uses mtd_partition to describe a partition. It is defined in include/linux/mtd/partitions.h:
/*
 * Partition definition structure:
 *
 * An array of struct partition is passed along with a MTD object to
 * mtd_device_register() to create them.
 *
 * For each partition, these fields are available:
 * name: string that will be used to label the partition's MTD device.
 * types: some partitions can be containers using specific format to describe
 *	embedded subpartitions / volumes. E.g. many home routers use "firmware"
 *	partition that contains at least kernel and rootfs. In such case an
 *	extra parser is needed that will detect these dynamic partitions and
 *	report them to the MTD subsystem. If set this property stores an array
 *	of parser names to use when looking for subpartitions.
 * size: the partition size; if defined as MTDPART_SIZ_FULL, the partition
 *	will extend to the end of the master MTD device.
 * offset: absolute starting position within the master MTD device; if
 *	defined as MTDPART_OFS_APPEND, the partition will start where the
 *	previous one ended; if MTDPART_OFS_NXTBLK, at the next erase block;
 *	if MTDPART_OFS_RETAIN, consume as much as possible, leaving size
 *	after the end of partition.
 * mask_flags: contains flags that have to be masked (removed) from the
 *	master MTD flag set for the corresponding MTD partition.
 *	For example, to force a read-only partition, simply adding
 *	MTD_WRITEABLE to the mask_flags will do the trick.
 *
 * Note: writeable partitions require their size and offset be
 * erasesize aligned (e.g. use MTDPART_OFS_NEXTBLK).
 */
struct mtd_partition {
    const char *name;           /* identifier string (partition name) */
    const char *const *types;   /* names of parsers to use if any */
    uint64_t size;              /* partition size */
    uint64_t offset;            /* offset within the master MTD space */
    uint32_t mask_flags;        /* master MTD flags to mask out for this partition */
    struct device_node *of_node;
};
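The size and offset rules described in the comment above can be illustrated with a small userspace sketch. It is not the mtdpart.c implementation: `struct demo_partition`, `DEMO_OFS_APPEND`, `DEMO_SIZ_FULL` and `demo_resolve_partitions` are hypothetical stand-ins for the append and fill-to-end conventions only, and the sketch simply rejects misaligned or oversized partitions rather than reproducing the kernel's handling of them.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_OFS_APPEND UINT64_MAX  /* start where the previous partition ended */
#define DEMO_SIZ_FULL   0           /* extend to the end of the master device   */

struct demo_partition {
    const char *name;
    uint64_t size;
    uint64_t offset;
};

/*
 * Resolve symbolic offsets and sizes in place.
 * Returns 0 on success, -1 if a partition falls outside the master
 * device or is not aligned to the erase size.
 */
int demo_resolve_partitions(struct demo_partition *parts, size_t n,
                            uint64_t master_size, uint32_t erasesize)
{
    uint64_t cur = 0;  /* end of the previous partition */

    for (size_t i = 0; i < n; i++) {
        if (parts[i].offset == DEMO_OFS_APPEND)
            parts[i].offset = cur;
        if (parts[i].offset > master_size)
            return -1;
        if (parts[i].size == DEMO_SIZ_FULL)
            parts[i].size = master_size - parts[i].offset;
        if (parts[i].size > master_size - parts[i].offset)
            return -1;
        if (parts[i].offset % erasesize || parts[i].size % erasesize)
            return -1;
        cur = parts[i].offset + parts[i].size;
    }
    return 0;
}
```

For a 256MiB device with 128KiB erase blocks, a table of a 256KiB bootloader at offset 0 followed by appended params (128KiB), kernel (4MiB) and a fill-to-end rootfs resolves to offsets 0x0, 0x40000, 0x60000 and 0x460000, with the rootfs taking the remaining 0xFBA0000 bytes.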
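2.4 NAND-related structures
2.4.1 struct nand_chip
nand_chip is an important data structure. MTD uses it to represent a chip inside a NAND Flash part. The structure holds the NAND Flash memory model, the read/write methods, the ECC mode, hardware control and other low-level mechanisms. It is defined in include/linux/mtd/rawnand.h: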
/** * struct nand_chip - NAND Private Flash Chip Data * @base: Inherit from the generic NAND device * @legacy: All legacy fields/hooks. If you develop a new driver, * don't even try to use any of these fields/hooks, and if * you're modifying an existing driver that is using those * fields/hooks, you should consider reworking the driver * avoid using them. * @setup_read_retry: [FLASHSPECIFIC] flash (vendor) specific function for * setting the read-retry mode. Mostly needed for MLC NAND. * @ecc: [BOARDSPECIFIC] ECC control structure * @buf_align: minimum buffer alignment required by a platform * @oob_poi: "poison value buffer," used for laying out OOB data * before writing * @page_shift: [INTERN] number of address bits in a page (column * address bits). * @phys_erase_shift: [INTERN] number of address bits in a physical eraseblock * @bbt_erase_shift: [INTERN] number of address bits in a bbt entry * @chip_shift: [INTERN] number of address bits in one chip * @options: [BOARDSPECIFIC] various chip options. They can partly * be set to inform nand_scan about special functionality. * See the defines for further explanation. * @bbt_options: [INTERN] bad block specific options. All options used * here must come from bbm.h. By default, these options * will be copied to the appropriate nand_bbt_descr's. * @badblockpos: [INTERN] position of the bad block marker in the oob * area. * @badblockbits: [INTERN] minimum number of set bits in a good block's * bad block marker position; i.e., BBM == 11110111b is * not bad when badblockbits == 7 * @onfi_timing_mode_default: [INTERN] default ONFI timing mode. This field is * set to the actually used ONFI mode if the chip is * ONFI compliant or deduced from the datasheet if * the NAND chip is not ONFI compliant. * @pagemask: [INTERN] page number mask = number of (pages / chip) - 1 * @data_buf: [INTERN] buffer for data, size is (page size + oobsize). 
* @pagecache: Structure containing page cache related fields * @pagecache.bitflips: Number of bitflips of the cached page * @pagecache.page: Page number currently in the cache. -1 means no page is * currently cached * @subpagesize: [INTERN] holds the subpagesize * @id: [INTERN] holds NAND ID * @parameters: [INTERN] holds generic parameters under an easily * readable form. * @data_interface: [INTERN] NAND interface timing information * @cur_cs: currently selected target. -1 means no target selected, * otherwise we should always have cur_cs >= 0 && * cur_cs < nanddev_ntargets(). NAND Controller drivers * should not modify this value, but they're allowed to * read it. * @read_retries: [INTERN] the number of read retry modes supported * @lock: lock protecting the suspended field. Also used to * serialize accesses to the NAND device. * @suspended: set to 1 when the device is suspended, 0 when it's not. * @bbt: [INTERN] bad block table pointer * @bbt_td: [REPLACEABLE] bad block table descriptor for flash * lookup. * @bbt_md: [REPLACEABLE] bad block table mirror descriptor * @badblock_pattern: [REPLACEABLE] bad block scan pattern used for initial * bad block scan. * @controller: [REPLACEABLE] a pointer to a hardware controller * structure which is shared among multiple independent * devices. 
* @priv: [OPTIONAL] pointer to private chip data * @manufacturer: [INTERN] Contains manufacturer information * @manufacturer.desc: [INTERN] Contains manufacturer's description * @manufacturer.priv: [INTERN] Contains manufacturer private information */ struct nand_chip { struct nand_device base; // 可以看作mtd_info子类 struct nand_legacy legacy; // 硬件操作函数 int (*setup_read_retry)(struct nand_chip *chip, int retry_mode); unsigned int options; // 与具体的nand芯片相关的一些选项,如NAND_BUSWIDTH_16等 unsigned int bbt_options; int page_shift; // 用来表示nand芯片的page大小,如某nand芯片的一个page有512个字节,那么该值就是9 int phys_erase_shift; // 用来表示nand芯片每次可擦除的大小,如某nand芯片每次可擦除16kb(通常为一个block大小),那么该值就是14 int bbt_erase_shift; // 用来表示bad block table的大小,通常bbt占用一个block,所以该值通常和phys_erase_shift相同 int chip_shift; // 使用位表示nand芯片的容量 int pagemask; // nand总容量/每页字节数 - 1 得到页掩码 u8 *data_buf; struct { unsigned int bitflips; int page; } pagecache; int subpagesize; int onfi_timing_mode_default; unsigned int badblockpos; int badblockbits; struct nand_id id; // 保存从nand读取到的设备id信息,包含厂家ID、设备ID等 struct nand_parameters parameters; struct nand_data_interface data_interface; int cur_cs; // 当前选中的目标 int read_retries; struct mutex lock; unsigned int suspended : 1; uint8_t *oob_poi; struct nand_controller *controller; // nand controller struct nand_ecc_ctrl ecc; // ecc校验结构体,里面有大量函数进行ecc校验 unsigned long buf_align; uint8_t *bbt; struct nand_bbt_descr *bbt_td; struct nand_bbt_descr *bbt_md; struct nand_bbt_descr *badblock_pattern; void *priv; struct { const struct nand_manufacturer *desc; void *priv; } manufacturer; // 厂家ID信息 };
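The ecc member of nand_chip handles ECC-related operations. It contains a large number of functions, such as read_page_raw and write_page_raw, used for ECC checking.
The read/write functions in the legacy member of nand_chip, such as read_buf and cmdfunc, depend on the specific NAND controller. They talk to the hardware, so they usually have to be implemented for the SoC NAND controller at hand.
2.4.2 struct nand_legacy
The nand_legacy structure holds the functions tied to the SoC NAND controller hardware: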
/**
 * struct nand_legacy - NAND chip legacy fields/hooks
 * @IO_ADDR_R: address to read the 8 I/O lines of the flash device
 * @IO_ADDR_W: address to write the 8 I/O lines of the flash device
 * @select_chip: select/deselect a specific target/die
 * @read_byte: read one byte from the chip
 * @write_byte: write a single byte to the chip on the low 8 I/O lines
 * @write_buf: write data from the buffer to the chip
 * @read_buf: read data from the chip into the buffer
 * @cmd_ctrl: hardware specific function for controlling ALE/CLE/nCE. Also used
 *	      to write command and address
 * @cmdfunc: hardware specific function for writing commands to the chip.
 * @dev_ready: hardware specific function for accessing device ready/busy line.
 *	       If set to NULL no access to ready/busy is available and the
 *	       ready/busy information is read from the chip status register.
 * @waitfunc: hardware specific function for wait on ready.
 * @block_bad: check if a block is bad, using OOB markers
 * @block_markbad: mark a block bad
 * @set_features: set the NAND chip features
 * @get_features: get the NAND chip features
 * @chip_delay: chip dependent delay for transferring data from array to read
 *		regs (tR).
 * @dummy_controller: dummy controller implementation for drivers that can
 *		      only control a single chip
 *
 * If you look at this structure you're already wrong. These fields/hooks are
 * all deprecated.
 */
struct nand_legacy {
    void __iomem *IO_ADDR_R;    // address for reading the 8 I/O lines, e.g. the S3C2440 data register NFDATA
    void __iomem *IO_ADDR_W;    // address for writing the 8 I/O lines, e.g. the S3C2440 data register NFDATA
    void (*select_chip)(struct nand_chip *chip, int cs);                // select/deselect the chip
    u8 (*read_byte)(struct nand_chip *chip);                            // read one byte
    void (*write_byte)(struct nand_chip *chip, u8 byte);                // write one byte
    void (*write_buf)(struct nand_chip *chip, const u8 *buf, int len);  // write len bytes
    void (*read_buf)(struct nand_chip *chip, u8 *buf, int len);         // read len bytes
    void (*cmd_ctrl)(struct nand_chip *chip, int dat, unsigned int ctrl);   // hardware control: write a command or address byte
    void (*cmdfunc)(struct nand_chip *chip, unsigned command, int column,  // send a command with its column and page address
                    int page_addr);
    int (*dev_ready)(struct nand_chip *chip);                   // query NAND state: busy or ready
    int (*waitfunc)(struct nand_chip *chip);                    // wait until the NAND is ready
    int (*block_bad)(struct nand_chip *chip, loff_t ofs);       // check whether a block is bad
    int (*block_markbad)(struct nand_chip *chip, loff_t ofs);   // mark a block bad
    int (*set_features)(struct nand_chip *chip, int feature_addr,
                        u8 *subfeature_para);
    int (*get_features)(struct nand_chip *chip, int feature_addr,
                        u8 *subfeature_para);
    int chip_delay;     // delay time
    struct nand_controller dummy_controller;
};
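The role of cmd_ctrl is easiest to see in a mock: the generic layer passes a byte plus control flags, and the controller-specific hook decides whether the byte goes to the command register or the address register. The sketch below is a userspace illustration only; `struct demo_controller`, `demo_cmd_ctrl` and the `DEMO_` flag values are hypothetical and merely mimic the CLE/ALE idea, and the two "registers" are plain struct fields.

```c
#include <stdint.h>

#define DEMO_CLE      0x02  /* hypothetical flag: byte is a command  */
#define DEMO_ALE      0x04  /* hypothetical flag: byte is an address */
#define DEMO_CMD_NONE (-1)  /* no byte to write, control change only */

/* Simulated controller: remembers the last command and the address bytes. */
struct demo_controller {
    uint8_t last_cmd;
    uint8_t addr[8];
    int naddr;
};

/* Route one byte to the command or address "register" based on ctrl. */
void demo_cmd_ctrl(struct demo_controller *c, int dat, unsigned int ctrl)
{
    if (dat == DEMO_CMD_NONE)
        return;

    if (ctrl & DEMO_CLE)
        c->last_cmd = (uint8_t)dat;
    else if ((ctrl & DEMO_ALE) && c->naddr < 8)
        c->addr[c->naddr++] = (uint8_t)dat;
}
```

On a real SoC the two branches would be register writes (for the S3C2440, presumably to its command and address registers), while the data bytes themselves travel through read_buf and write_buf.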
2.4.3 struct nand_ecc_ctrl
The read/write functions in nand_ecc_ctrl, such as read_page_raw and write_page_raw, handle ECC-related operations:
/** * struct nand_ecc_ctrl - Control structure for ECC * @mode: ECC mode * @algo: ECC algorithm * @steps: number of ECC steps per page * @size: data bytes per ECC step * @bytes: ECC bytes per step * @strength: max number of correctible bits per ECC step * @total: total number of ECC bytes per page * @prepad: padding information for syndrome based ECC generators * @postpad: padding information for syndrome based ECC generators * @options: ECC specific options (see NAND_ECC_XXX flags defined above) * @priv: pointer to private ECC control data * @calc_buf: buffer for calculated ECC, size is oobsize. * @code_buf: buffer for ECC read from flash, size is oobsize. * @hwctl: function to control hardware ECC generator. Must only * be provided if an hardware ECC is available * @calculate: function for ECC calculation or readback from ECC hardware * @correct: function for ECC correction, matching to ECC generator (sw/hw). * Should return a positive number representing the number of * corrected bitflips, -EBADMSG if the number of bitflips exceed * ECC strength, or any other error code if the error is not * directly related to correction. * If -EBADMSG is returned the input buffers should be left * untouched. * @read_page_raw: function to read a raw page without ECC. This function * should hide the specific layout used by the ECC * controller and always return contiguous in-band and * out-of-band data even if they're not stored * contiguously on the NAND chip (e.g. * NAND_ECC_HW_SYNDROME interleaves in-band and * out-of-band data). * @write_page_raw: function to write a raw page without ECC. This function * should hide the specific layout used by the ECC * controller and consider the passed data as contiguous * in-band and out-of-band data. ECC controller is * responsible for doing the appropriate transformations * to adapt to its specific layout (e.g. * NAND_ECC_HW_SYNDROME interleaves in-band and * out-of-band data). 
* @read_page: function to read a page according to the ECC generator * requirements; returns maximum number of bitflips corrected in * any single ECC step, -EIO hw error * @read_subpage: function to read parts of the page covered by ECC; * returns same as read_page() * @write_subpage: function to write parts of the page covered by ECC. * @write_page: function to write a page according to the ECC generator * requirements. * @write_oob_raw: function to write chip OOB data without ECC * @read_oob_raw: function to read chip OOB data without ECC * @read_oob: function to read chip OOB data * @write_oob: function to write chip OOB data */ struct nand_ecc_ctrl { nand_ecc_modes_t mode; enum nand_ecc_algo algo; int steps; int size; int bytes; int total; int strength; int prepad; int postpad; unsigned int options; void *priv; u8 *calc_buf; u8 *code_buf; void (*hwctl)(struct nand_chip *chip, int mode); int (*calculate)(struct nand_chip *chip, const uint8_t *dat, uint8_t *ecc_code); int (*correct)(struct nand_chip *chip, uint8_t *dat, uint8_t *read_ecc, uint8_t *calc_ecc); int (*read_page_raw)(struct nand_chip *chip, uint8_t *buf, int oob_required, int page); int (*write_page_raw)(struct nand_chip *chip, const uint8_t *buf, int oob_required, int page); int (*read_page)(struct nand_chip *chip, uint8_t *buf, int oob_required, int page); int (*read_subpage)(struct nand_chip *chip, uint32_t offs, uint32_t len, uint8_t *buf, int page); int (*write_subpage)(struct nand_chip *chip, uint32_t offset, uint32_t data_len, const uint8_t *data_buf, int oob_required, int page); int (*write_page)(struct nand_chip *chip, const uint8_t *buf, int oob_required, int page); int (*write_oob_raw)(struct nand_chip *chip, int page); int (*read_oob_raw)(struct nand_chip *chip, int page); int (*read_oob)(struct nand_chip *chip, int page); int (*write_oob)(struct nand_chip *chip, int page); };
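The steps, size, bytes and total fields are related by simple arithmetic: a page is split into ECC steps of `size` data bytes, each step produces `bytes` ECC bytes, and the total has to fit in the OOB area. The following sketch shows that relationship under stated assumptions; `struct demo_ecc_layout` and `demo_ecc_layout_init` are hypothetical names, and the OOB-fit check is a simplification (real layouts also reserve room for the bad block marker).

```c
#include <stdint.h>

struct demo_ecc_layout {
    int steps;  /* number of ECC steps per page       */
    int total;  /* total ECC bytes per page           */
};

/*
 * Compute the ECC layout for one page.
 * Returns 0 on success, -1 if the page is not a whole number of steps
 * or the ECC bytes do not fit in the OOB area.
 */
int demo_ecc_layout_init(uint32_t writesize, uint32_t oobsize,
                         int step_size, int bytes_per_step,
                         struct demo_ecc_layout *out)
{
    if (step_size <= 0 || bytes_per_step <= 0)
        return -1;
    if (writesize % (uint32_t)step_size)
        return -1;

    out->steps = (int)(writesize / (uint32_t)step_size);
    out->total = out->steps * bytes_per_step;
    if ((uint32_t)out->total > oobsize)
        return -1;
    return 0;
}
```

For a 2048-byte page with 64 bytes of OOB, 512-byte steps with 4 ECC bytes each give 4 steps and 16 ECC bytes per page, while 256-byte steps with 3 ECC bytes each give 8 steps and 24 ECC bytes.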
2.4.4 struct nand_manufacturer
nand_manufacturer holds the manufacturer information and is defined in drivers/mtd/nand/raw/internals.h:
/* * NAND Flash Manufacturer ID Codes */ #define NAND_MFR_AMD 0x01 #define NAND_MFR_ATO 0x9b #define NAND_MFR_EON 0x92 #define NAND_MFR_ESMT 0xc8 #define NAND_MFR_FUJITSU 0x04 #define NAND_MFR_HYNIX 0xad #define NAND_MFR_INTEL 0x89 #define NAND_MFR_MACRONIX 0xc2 #define NAND_MFR_MICRON 0x2c #define NAND_MFR_NATIONAL 0x8f #define NAND_MFR_RENESAS 0x07 #define NAND_MFR_SAMSUNG 0xec // 三星厂家 #define NAND_MFR_SANDISK 0x45 #define NAND_MFR_STMICRO 0x20 #define NAND_MFR_TOSHIBA 0x98 #define NAND_MFR_WINBOND 0xef /** * struct nand_manufacturer_ops - NAND Manufacturer operations * @detect: detect the NAND memory organization and capabilities * @init: initialize all vendor specific fields (like the ->read_retry() * implementation) if any. * @cleanup: the ->init() function may have allocated resources, ->cleanup() * is here to let vendor specific code release those resources. * @fixup_onfi_param_page: apply vendor specific fixups to the ONFI parameter * page. This is called after the checksum is verified. */ struct nand_manufacturer_ops { void (*detect)(struct nand_chip *chip); int (*init)(struct nand_chip *chip); void (*cleanup)(struct nand_chip *chip); void (*fixup_onfi_param_page)(struct nand_chip *chip, struct nand_onfi_params *p); }; /** * struct nand_manufacturer - NAND Flash Manufacturer structure * @name: Manufacturer name * @id: manufacturer ID code of device. * @ops: manufacturer operations */ struct nand_manufacturer { int id; // 厂家ID char *name; // 厂家名字 const struct nand_manufacturer_ops *ops; // 操作函数 };
2.4.5 struct nand_device
struct nand_device is defined in include/linux/mtd/nand.h:
/**
 * struct nand_device - NAND device
 * @mtd: MTD instance attached to the NAND device
 * @memorg: memory layout
 * @eccreq: ECC requirements
 * @rowconv: position to row address converter
 * @bbt: bad block table info
 * @ops: NAND operations attached to the NAND device
 *
 * Generic NAND object. Specialized NAND layers (raw NAND, SPI NAND, OneNAND)
 * should declare their own NAND object embedding a nand_device struct (that's
 * how inheritance is done).
 * struct_nand_device->memorg and struct_nand_device->eccreq should be filled
 * at device detection time to reflect the NAND device
 * capabilities/requirements. Once this is done nanddev_init() can be called.
 * It will take care of converting NAND information into MTD ones, which means
 * the specialized NAND layers should never manually tweak
 * struct_nand_device->mtd except for the ->_read/write() hooks.
 */
struct nand_device {
    struct mtd_info mtd;
    struct nand_memory_organization memorg;
    struct nand_ecc_req eccreq;
    struct nand_row_converter rowconv;
    struct nand_bbt bbt;
    const struct nand_ops *ops;
};
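2.5 NOR-related structures
2.5.1 struct map_info
struct map_info is defined in include/linux/mtd/map.h. This memory mapping structure describes the basic properties of a specific NOR Flash chip, mainly its name, size, bus width and starting physical address in the system. It is defined in the driver file for the specific chip.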
/* The map stuff is very simple. You fill in your struct map_info with a handful of routines for accessing the device, making sure they handle paging etc. correctly if your device needs it. Then you pass it off to a chip probe routine -- either JEDEC or CFI probe or both -- via do_map_probe(). If a chip is recognised, the probe code will invoke the appropriate chip driver (if present) and return a struct mtd_info. At which point, you fill in the mtd->module with your own module address, and register it with the MTD core code. Or you could partition it and register the partitions instead, or keep it for your own private use; whatever. The mtd->priv field will point to the struct map_info, and any further private data required by the chip driver is linked from the mtd->priv->fldrv_priv field. This allows the map driver to get at the destructor function map->fldrv_destroy() when it's tired of living. */ struct map_info { const char *name; // Nor Flash名称 unsigned long size; // 容量大小 resource_size_t phys; // 物理基地址 #define NO_XIP (-1UL) void __iomem *virt; // 虚拟地址 void *cached; int swap; /* this mapping's byte-swapping requirement */ int bankwidth; /* in octets. This isn't necessarily the width of actual bus cycles -- it's the repeat interval in bytes, before you are talking to the first chip again. */ #ifdef CONFIG_MTD_COMPLEX_MAPPINGS map_word (*read)(struct map_info *, unsigned long); void (*copy_from)(struct map_info *, void *, unsigned long, ssize_t); void (*write)(struct map_info *, const map_word, unsigned long); void (*copy_to)(struct map_info *, unsigned long, const void *, ssize_t); /* We can perhaps put in 'point' and 'unpoint' methods, if we really want to enable XIP for non-linear mappings. Not yet though. */ #endif /* It's possible for the map driver to use cached memory in its copy_from implementation (and _only_ with copy_from). 
However, when the chip driver knows some flash area has changed contents, it will signal it to the map driver through this routine to let the map driver invalidate the corresponding cache as needed. If there is no cache to care about this can be set to NULL. */ void (*inval_cache)(struct map_info *, unsigned long, ssize_t); /* This will be called with 1 as parameter when the first map user * needs VPP, and called with 0 when the last user exits. The map * core maintains a reference counter, and assumes that VPP is a * global resource applying to all mapped flash chips on the system. */ void (*set_vpp)(struct map_info *, int); unsigned long pfow_base; unsigned long map_priv_1; // 私有数据 unsigned long map_priv_2; // 私有数据 struct device_node *device_node; void *fldrv_priv; struct mtd_chip_driver *fldrv; };
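2.6 Structure relationship diagram
Note: focus on the member variables marked in gray in the structures in the diagram above.
- nand_chip describes a chip inside a Nand Flash;
- nand_chip.base.mtd describes the whole MTD device (nand_scan is normally used to probe the Nand chip and to initialize the nand_chip.base.mtd member);
- mtd_part describes each partition of the MTD device (if the device is registered as a whole, with no partition table, no mtd_part exists);
2.7 Core functions
If the MTD device has only one partition, use the following two functions to register and unregister it: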
int add_mtd_device(struct mtd_info *mtd);
int del_mtd_device(struct mtd_info *mtd);
If the MTD device has several partitions, use the following two functions to register and unregister it:
int add_mtd_partitions(struct mtd_info *master, const struct mtd_partition *parts, int nbparts);
int del_mtd_partitions(struct mtd_info *master);
3. MTD device registration
3.1 add_mtd_device
add_mtd_device is defined in drivers/mtd/mtdcore.c; its job is to register an MTD device:
/**
 * add_mtd_device - register an MTD device
 * @mtd: pointer to new MTD device info structure
 *
 * Add a device to the list of MTD devices present in the system, and
 * notify each currently active MTD 'user' of its arrival. Returns
 * zero on success or non-zero on failure.
 */
int add_mtd_device(struct mtd_info *mtd)
{
    struct mtd_notifier *not;
    int i, error;

    /*
     * May occur, for instance, on buggy drivers which call
     * mtd_device_parse_register() multiple times on the same master MTD,
     * especially with CONFIG_MTD_PARTITIONED_MASTER=y.
     */
    if (WARN_ONCE(mtd->dev.type, "MTD already registered\n"))
        return -EEXIST;

    BUG_ON(mtd->writesize == 0);

    /*
     * MTD drivers should implement ->_{write,read}() or
     * ->_{write,read}_oob(), but not both.
     */
    if (WARN_ON((mtd->_write && mtd->_write_oob) ||     // validate the function pointers
                (mtd->_read && mtd->_read_oob)))
        return -EINVAL;

    if (WARN_ON((!mtd->erasesize || !mtd->_erase) &&
                !(mtd->flags & MTD_NO_ERASE)))
        return -EINVAL;

    mutex_lock(&mtd_table_mutex);                       // take the mutex

    i = idr_alloc(&mtd_idr, mtd, 0, 0, GFP_KERNEL);     // allocate an index for the MTD device
    if (i < 0) {
        error = i;
        goto fail_locked;
    }

    mtd->index = i;
    mtd->usecount = 0;

    /* default value if not set by driver */
    if (mtd->bitflip_threshold == 0)
        mtd->bitflip_threshold = mtd->ecc_strength;

    if (is_power_of_2(mtd->erasesize))                  // compute the erase-size shift
        mtd->erasesize_shift = ffs(mtd->erasesize) - 1;
    else
        mtd->erasesize_shift = 0;

    if (is_power_of_2(mtd->writesize))                  // compute the write-size shift
        mtd->writesize_shift = ffs(mtd->writesize) - 1;
    else
        mtd->writesize_shift = 0;

    mtd->erasesize_mask = (1 << mtd->erasesize_shift) - 1;  // erase-size mask
    mtd->writesize_mask = (1 << mtd->writesize_shift) - 1;  // write-size mask

    /* Some chips always power up locked. Unlock them now */
    // (Flash chips generally support a lock mechanism, but drivers rarely use it)
    if ((mtd->flags & MTD_WRITEABLE) && (mtd->flags & MTD_POWERUP_LOCK)) {
        error = mtd_unlock(mtd, 0, mtd->size);
        if (error && error != -EOPNOTSUPP)
            printk(KERN_WARNING
                   "%s: unlock failed, writes may not work\n",
                   mtd->name);
        /* Ignore unlock failures? */
        error = 0;
    }

    /* Caller should have set dev.parent to match the
     * physical device, if appropriate.
     */
    mtd->dev.type = &mtd_devtype;       // device type
    mtd->dev.class = &mtd_class;        // device class; the mtd class appears under /sys/class
    mtd->dev.devt = MTD_DEVT(i);        // device number; the number range itself is requested in the mtdchar.c module entry function
    dev_set_name(&mtd->dev, "mtd%d", i);    // device name mtd%d
    dev_set_drvdata(&mtd->dev, mtd);        // mtd->dev.driver_data = mtd;
    of_node_get(mtd_get_of_node(mtd));
    // Register the MTD character device: creates mtd%d under /sys/class/mtd,
    // from which mdev automatically creates the /dev/mtd%d character device node
    error = device_register(&mtd->dev);
    if (error)
        goto fail_added;

    /* Add the nvmem provider */
    error = mtd_nvmem_add(mtd);
    if (error)
        goto fail_nvmem_add;

    if (!IS_ERR_OR_NULL(dfs_dir_mtd)) {
        mtd->dbg.dfs_dir = debugfs_create_dir(dev_name(&mtd->dev), dfs_dir_mtd);
        if (IS_ERR_OR_NULL(mtd->dbg.dfs_dir)) {
            pr_debug("mtd device %s won't show data in debugfs\n",
                     dev_name(&mtd->dev));
        }
    }

    // Create the read-only MTD character device (device_create calls device_register internally):
    // creates mtd%dro under /sys/class/mtd, from which mdev creates the /dev/mtd%dro node
    device_create(&mtd_class, mtd->dev.parent, MTD_DEVT(i) + 1, NULL,
                  "mtd%dro", i);

    pr_debug("mtd: Giving out device %d to %s\n", i, mtd->name);
    /* No need to get a refcount on the module containing
       the notifier, since we hold the mtd_table_mutex */
    list_for_each_entry(not, &mtd_notifiers, list)      // call the add hook of every registered notifier
        not->add(mtd);

    mutex_unlock(&mtd_table_mutex);                     // release the mutex
    /* We _know_ we aren't being removed, because
       our caller is still holding us here. So none
       of this try_ nonsense, and no bitching about it
       either. :) */
    __module_get(THIS_MODULE);
    return 0;

fail_nvmem_add:
    device_unregister(&mtd->dev);
fail_added:
    of_node_put(mtd_get_of_node(mtd));
    idr_remove(&mtd_idr, i);
fail_locked:
    mutex_unlock(&mtd_table_mutex);
    return error;
}
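The function mainly performs the following steps:
(1) Validates the required fields and function pointers of the raw MTD device;
(2) Allocates a node for the raw MTD device in the mtd_idr tree and returns the allocated node ID:
i = idr_alloc(&mtd_idr, mtd, 0, 0, GFP_KERNEL); // allocate an ID; mtd_idr is a radix tree, and mtd is associated with the newly allocated ID
idr_alloc adds a new node to the mtd_idr tree. The node has a unique ID within mtd_idr and is associated with mtd, so mtd can later be looked up by its ID.
The third and fourth arguments are the start and end of the allowed ID range; an end of 0 means the largest ID the mtd_idr tree allows.
The global variable mtd_idr is defined in drivers/mtd/mtdcore.c: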
static DEFINE_IDR(mtd_idr);
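The IDR itself is not covered here; it mainly binds an ID to a data structure. For details, see the article "linux内核IDR机制详解(一)".
The ID is needed later when registering the character and block devices. For example, the device number of the struct device embedded in the MTD device is set to MTD_DEVT(i):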
#define MTD_DEVT(index) MKDEV(MTD_CHAR_MAJOR, (index)*2)
The major number is MTD_CHAR_MAJOR (90) and the minor number is index*2;
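The mapping from MTD index to character device numbers can be checked with a small user-space sketch. The MKDEV/MAJOR/MINOR macros below are local stand-ins that mirror the kernel's 20-bit minor layout, and mtd_rw_devt/mtd_ro_devt are hypothetical helper names introduced only for this illustration; the "+ 1" for the read-only node follows the device_create(..., MTD_DEVT(i) + 1, ...) call in add_mtd_device.

```c
#include <stdio.h>

/* Local stand-ins mirroring the kernel's dev_t helpers (MINORBITS = 20). */
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)
#define MKDEV(ma, mi) (((unsigned int)(ma) << MINORBITS) | (unsigned int)(mi))
#define MAJOR(dev) ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev) ((unsigned int)((dev) & MINORMASK))

#define MTD_CHAR_MAJOR 90
#define MTD_DEVT(index) MKDEV(MTD_CHAR_MAJOR, (index) * 2)

/* Device number of the read-write node /dev/mtd<index>. */
unsigned int mtd_rw_devt(int index)
{
    return MTD_DEVT(index);
}

/* Device number of the read-only node /dev/mtd<index>ro. */
unsigned int mtd_ro_devt(int index)
{
    return MTD_DEVT(index) + 1;
}

/* Print the major,minor pairs for the first n MTD devices. */
void print_mtd_devts(int n)
{
    for (int i = 0; i < n; i++)
        printf("mtd%d: %u,%u  mtd%dro: %u,%u\n",
               i, MAJOR(mtd_rw_devt(i)), MINOR(mtd_rw_devt(i)),
               i, MAJOR(mtd_ro_devt(i)), MINOR(mtd_ro_devt(i)));
}
```

Each MTD device therefore consumes two consecutive minors: an even one for the read-write node and the following odd one for the read-only node.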
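(3) Sets erasesize_shift, writesize_shift, erasesize_mask, writesize_mask and related fields of the raw MTD device;
(4) For chips that are writable and power up locked, calls the unlock interface (Flash chips generally support a lock mechanism, but drivers rarely use it);
(5) Sets the class of the struct device embedded in the raw MTD device to mtd_class, and sets its device number, type, name and driver_data;
mtd_class is defined as: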
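As a rough illustration of step (3), the shift and mask can be reproduced in user space. This is a sketch only: struct size_geom, size_to_geom, block_of and offset_in_block are hypothetical names, and a simple loop replaces the kernel's ffs() call.

```c
#include <stdint.h>

/* Shift and mask derived from a size, as add_mtd_device does for
 * erasesize and writesize. */
struct size_geom {
    unsigned int shift;
    uint32_t mask;
};

static int is_pow2(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* For a power-of-two size, shift = log2(size) and mask = size - 1.
 * Otherwise both are 0, matching the else branches in the kernel code. */
struct size_geom size_to_geom(uint32_t size)
{
    struct size_geom g = { 0, 0 };

    if (is_pow2(size)) {
        while ((1U << g.shift) != size)
            g.shift++;
    }
    g.mask = (1U << g.shift) - 1;
    return g;
}

/* Index of the block containing offset (only meaningful when shift != 0). */
uint32_t block_of(uint32_t offset, struct size_geom g)
{
    return offset >> g.shift;
}

/* Offset inside that block (only meaningful when shift != 0). */
uint32_t offset_in_block(uint32_t offset, struct size_geom g)
{
    return offset & g.mask;
}
```

With a 128 KiB erase block the shift is 17 and the mask 0x1FFFF, so a division or modulo by the erase size becomes a shift or a bitwise AND.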
static struct class mtd_class = {
    .name = "mtd",
    .owner = THIS_MODULE,
    .pm = MTD_CLS_PM_OPS,
};
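(6) Calls device_register to register the MTD character device named mtd%d;
(7) Calls device_create to create, initialize and register the MTD character device named mtd%dro;
(8) Invokes the MTD subsystem's notifier mechanism: the add hook of every registered notifier is called for the newly added MTD device;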
list_for_each_entry(not, &mtd_notifiers, list)
not->add(mtd);
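The list_for_each_entry macro takes three arguments, in order pos, head and member. It expands to a for loop that uses pos as the loop variable, starts at the list head, and advances pos entry by entry (in the next direction) until it arrives back at head.
The list mtd_notifiers is defined as: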
static LIST_HEAD(mtd_notifiers);
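The loop walks this list; each element not is of type struct mtd_notifier, and not->add(mtd) is called on it. For the block-layer notifier, this add method is where the MTD block device named mtdblock%d is registered.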
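The notifier pattern described above can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel implementation: a singly linked list stands in for mtd_notifiers, a fixed array stands in for mtd_idr, locking is omitted, and all names (struct dev, register_user, add_device, blk_notifier) are hypothetical.

```c
#include <stddef.h>

#define MAX_DEVS 8

struct dev {
    int index;
};

struct notifier {
    void (*add)(struct dev *d);
    struct notifier *next;
};

static struct notifier *notifiers;     /* plays the role of mtd_notifiers */
static struct dev *devices[MAX_DEVS];  /* plays the role of mtd_idr */

/* Like register_mtd_user: link the notifier in, then replay 'add'
 * for every device that is already present. */
void register_user(struct notifier *n)
{
    n->next = notifiers;
    notifiers = n;
    for (int i = 0; i < MAX_DEVS; i++)
        if (devices[i])
            n->add(devices[i]);
}

/* Like the tail of add_mtd_device: take the lowest free index,
 * then call 'add' on every registered notifier. */
int add_device(struct dev *d)
{
    for (int i = 0; i < MAX_DEVS; i++) {
        if (!devices[i]) {
            d->index = i;
            devices[i] = d;
            for (struct notifier *n = notifiers; n; n = n->next)
                n->add(d);
            return 0;
        }
    }
    return -1; /* table full */
}

/* Example user: counts how many "block devices" it was asked to create. */
static int blocks_created;

static void blk_add(struct dev *d)
{
    (void)d;
    blocks_created++;
}

struct notifier blk_notifier = { .add = blk_add, .next = NULL };

int blocks_created_count(void)
{
    return blocks_created;
}
```

The point of the two-way replay is that registration order does not matter: a user registered after a device still sees it, and a device added after a user is still announced to it.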
3.2 add_mtd_partitions
add_mtd_partitions is defined in drivers/mtd/mtdpart.c; it walks the partition table and registers one MTD device for each partition:
/*
 * This function, given a master MTD object and a partition table, creates
 * and registers slave MTD objects which are bound to the master according to
 * the partition definitions.
 *
 * For historical reasons, this function's caller only registers the master
 * if the MTD_PARTITIONED_MASTER config option is set.
 */
int add_mtd_partitions(struct mtd_info *master,             // MTD device info
                       const struct mtd_partition *parts,   // partition table
                       int nbparts)                         // number of partitions
{
    struct mtd_part *slave;
    uint64_t cur_offset = 0;
    int i, ret;

    printk(KERN_NOTICE "Creating %d MTD partitions on \"%s\":\n", nbparts, master->name);

    for (i = 0; i < nbparts; i++) {                         // walk the partition table
        slave = allocate_partition(master, parts + i, i, cur_offset);   // allocate an mtd_part
        if (IS_ERR(slave)) {
            ret = PTR_ERR(slave);
            goto err_del_partitions;
        }

        mutex_lock(&mtd_partitions_mutex);
        list_add(&slave->list, &mtd_partitions);            // add slave to the mtd_partitions list
        mutex_unlock(&mtd_partitions_mutex);

        // Register an MTD device for this partition; this leads to a
        // /dev/mtdblock%d block device node being created
        ret = add_mtd_device(&slave->mtd);
        if (ret) {
            mutex_lock(&mtd_partitions_mutex);
            list_del(&slave->list);
            mutex_unlock(&mtd_partitions_mutex);

            free_partition(slave);
            goto err_del_partitions;
        }

        mtd_add_partition_attrs(slave);
        /* Look for subpartitions */
        parse_mtd_partitions(&slave->mtd, parts[i].types, NULL);

        cur_offset = slave->offset + slave->mtd.size;
    }

    return 0;

err_del_partitions:
    del_mtd_partitions(master);

    return ret;
}
3.2.1 allocate_partition
allocate_partition is defined in drivers/mtd/mtdpart.c. Given the master MTD and the partno-th (0-based) entry of the partition table, it dynamically allocates and initializes an mtd_part:
static struct mtd_part *allocate_partition(struct mtd_info *parent, const struct mtd_partition *part, int partno, uint64_t cur_offset) { int wr_alignment = (parent->flags & MTD_NO_ERASE) ? parent->writesize : parent->erasesize; struct mtd_part *slave; u32 remainder; char *name; u64 tmp; /* allocate the partition structure */ slave = kzalloc(sizeof(*slave), GFP_KERNEL); name = kstrdup(part->name, GFP_KERNEL); if (!name || !slave) { printk(KERN_ERR"memory allocation error while creating partitions for \"%s\"\n", parent->name); kfree(name); kfree(slave); return ERR_PTR(-ENOMEM); } /* set up the MTD object for this partition */ slave->mtd.type = parent->type; slave->mtd.flags = parent->orig_flags & ~part->mask_flags; slave->mtd.orig_flags = slave->mtd.flags; slave->mtd.size = part->size; slave->mtd.writesize = parent->writesize; slave->mtd.writebufsize = parent->writebufsize; slave->mtd.oobsize = parent->oobsize; slave->mtd.oobavail = parent->oobavail; slave->mtd.subpage_sft = parent->subpage_sft; slave->mtd.pairing = parent->pairing; slave->mtd.name = name; slave->mtd.owner = parent->owner; /* NOTE: Historically, we didn't arrange MTDs as a tree out of * concern for showing the same data in multiple partitions. * However, it is very useful to have the master node present, * so the MTD_PARTITIONED_MASTER option allows that. The master * will have device nodes etc only if this is set, so make the * parent conditional on that option. Note, this is a way to * distinguish between the master and the partition in sysfs. */ slave->mtd.dev.parent = IS_ENABLED(CONFIG_MTD_PARTITIONED_MASTER) || mtd_is_partition(parent) ? 
&parent->dev : parent->dev.parent; slave->mtd.dev.of_node = part->of_node; if (parent->_read) slave->mtd._read = part_read; if (parent->_write) slave->mtd._write = part_write; if (parent->_panic_write) slave->mtd._panic_write = part_panic_write; if (parent->_point && parent->_unpoint) { slave->mtd._point = part_point; slave->mtd._unpoint = part_unpoint; } if (parent->_read_oob) slave->mtd._read_oob = part_read_oob; if (parent->_write_oob) slave->mtd._write_oob = part_write_oob; if (parent->_read_user_prot_reg) slave->mtd._read_user_prot_reg = part_read_user_prot_reg; if (parent->_read_fact_prot_reg) slave->mtd._read_fact_prot_reg = part_read_fact_prot_reg; if (parent->_write_user_prot_reg) slave->mtd._write_user_prot_reg = part_write_user_prot_reg; if (parent->_lock_user_prot_reg) slave->mtd._lock_user_prot_reg = part_lock_user_prot_reg; if (parent->_get_user_prot_info) slave->mtd._get_user_prot_info = part_get_user_prot_info; if (parent->_get_fact_prot_info) slave->mtd._get_fact_prot_info = part_get_fact_prot_info; if (parent->_sync) slave->mtd._sync = part_sync; if (!partno && !parent->dev.class && parent->_suspend && parent->_resume) { slave->mtd._suspend = part_suspend; slave->mtd._resume = part_resume; } if (parent->_writev) slave->mtd._writev = part_writev; if (parent->_lock) slave->mtd._lock = part_lock; if (parent->_unlock) slave->mtd._unlock = part_unlock; if (parent->_is_locked) slave->mtd._is_locked = part_is_locked; if (parent->_block_isreserved) slave->mtd._block_isreserved = part_block_isreserved; if (parent->_block_isbad) slave->mtd._block_isbad = part_block_isbad; if (parent->_block_markbad) slave->mtd._block_markbad = part_block_markbad; if (parent->_max_bad_blocks) slave->mtd._max_bad_blocks = part_max_bad_blocks; if (parent->_get_device) slave->mtd._get_device = part_get_device; if (parent->_put_device) slave->mtd._put_device = part_put_device; slave->mtd._erase = part_erase; slave->parent = parent; // 分区的主分区 slave->offset = part->offset; // 
分区偏移地址 if (slave->offset == MTDPART_OFS_APPEND) slave->offset = cur_offset; if (slave->offset == MTDPART_OFS_NXTBLK) { tmp = cur_offset; slave->offset = cur_offset; remainder = do_div(tmp, wr_alignment); if (remainder) { slave->offset += wr_alignment - remainder; printk(KERN_NOTICE "Moving partition %d: " "0x%012llx -> 0x%012llx\n", partno, (unsigned long long)cur_offset, (unsigned long long)slave->offset); } } if (slave->offset == MTDPART_OFS_RETAIN) { slave->offset = cur_offset; if (parent->size - slave->offset >= slave->mtd.size) { slave->mtd.size = parent->size - slave->offset - slave->mtd.size; } else { printk(KERN_ERR "mtd partition \"%s\" doesn't have enough space: %#llx < %#llx, disabled\n", part->name, parent->size - slave->offset, slave->mtd.size); /* register to preserve ordering */ goto out_register; } } if (slave->mtd.size == MTDPART_SIZ_FULL) slave->mtd.size = parent->size - slave->offset; printk(KERN_NOTICE "0x%012llx-0x%012llx : \"%s\"\n", (unsigned long long)slave->offset, (unsigned long long)(slave->offset + slave->mtd.size), slave->mtd.name); /* let's do some sanity checks */ if (slave->offset >= parent->size) { /* let's register it anyway to preserve ordering */ slave->offset = 0; slave->mtd.size = 0; /* Initialize ->erasesize to make add_mtd_device() happy. 
*/ slave->mtd.erasesize = parent->erasesize; printk(KERN_ERR"mtd: partition \"%s\" is out of reach -- disabled\n", part->name); goto out_register; } if (slave->offset + slave->mtd.size > parent->size) { slave->mtd.size = parent->size - slave->offset; printk(KERN_WARNING"mtd: partition \"%s\" extends beyond the end of device \"%s\" -- size truncated to %#llx\n", part->name, parent->name, (unsigned long long)slave->mtd.size); } if (parent->numeraseregions > 1) { /* Deal with variable erase size stuff */ int i, max = parent->numeraseregions; u64 end = slave->offset + slave->mtd.size; struct mtd_erase_region_info *regions = parent->eraseregions; /* Find the first erase regions which is part of this * partition. */ for (i = 0; i < max && regions[i].offset <= slave->offset; i++) ; /* The loop searched for the region _behind_ the first one */ if (i > 0) i--; /* Pick biggest erasesize */ for (; i < max && regions[i].offset < end; i++) { if (slave->mtd.erasesize < regions[i].erasesize) { slave->mtd.erasesize = regions[i].erasesize; } } BUG_ON(slave->mtd.erasesize == 0); } else { /* Single erase size */ slave->mtd.erasesize = parent->erasesize; } /* * Slave erasesize might differ from the master one if the master * exposes several regions with different erasesize. Adjust * wr_alignment accordingly. 
*/ if (!(slave->mtd.flags & MTD_NO_ERASE)) wr_alignment = slave->mtd.erasesize; tmp = part_absolute_offset(parent) + slave->offset; remainder = do_div(tmp, wr_alignment); if ((slave->mtd.flags & MTD_WRITEABLE) && remainder) { /* Doesn't start on a boundary of major erase size */ /* FIXME: Let it be writable if it is on a boundary of * _minor_ erase size though */ slave->mtd.flags &= ~MTD_WRITEABLE; printk(KERN_WARNING"mtd: partition \"%s\" doesn't start on an erase/write block boundary -- force read-only\n", part->name); } tmp = part_absolute_offset(parent) + slave->mtd.size; remainder = do_div(tmp, wr_alignment); if ((slave->mtd.flags & MTD_WRITEABLE) && remainder) { slave->mtd.flags &= ~MTD_WRITEABLE; printk(KERN_WARNING"mtd: partition \"%s\" doesn't end on an erase/write block -- force read-only\n", part->name); } mtd_set_ooblayout(&slave->mtd, &part_ooblayout_ops); slave->mtd.ecc_step_size = parent->ecc_step_size; slave->mtd.ecc_strength = parent->ecc_strength; slave->mtd.bitflip_threshold = parent->bitflip_threshold; if (parent->_block_isbad) { uint64_t offs = 0; while (offs < slave->mtd.size) { if (mtd_block_isreserved(parent, offs + slave->offset)) slave->mtd.ecc_stats.bbtblocks++; else if (mtd_block_isbad(parent, offs + slave->offset)) slave->mtd.ecc_stats.badblocks++; offs += slave->mtd.erasesize; } } out_register: return slave; }
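The offset and size handling in allocate_partition is easier to follow in isolation. The sketch below is a simplified user-space model under stated assumptions: the OFS_APPEND, OFS_NXTBLK and SIZ_FULL constants are local stand-ins for MTDPART_OFS_APPEND, MTDPART_OFS_NXTBLK and MTDPART_SIZ_FULL, resolve_part is a hypothetical helper, and the MTDPART_OFS_RETAIN case, variable erase regions and the read-only forcing are left out.

```c
#include <stdint.h>

/* Local stand-ins for the kernel's special offset/size markers. */
#define OFS_APPEND ((uint64_t)-1)  /* start right after the previous partition */
#define OFS_NXTBLK ((uint64_t)-2)  /* start at the next aligned block boundary */
#define SIZ_FULL   ((uint64_t)0)   /* extend to the end of the parent */

struct part_span {
    uint64_t offset;
    uint64_t size;
};

/*
 * Resolve the requested offset/size of one partition.
 * parent_size: size of the master device
 * align:       write alignment (erase size for Nand)
 * cur_offset:  end of the previously allocated partition
 */
struct part_span resolve_part(uint64_t parent_size, uint64_t align,
                              uint64_t cur_offset,
                              uint64_t req_offset, uint64_t req_size)
{
    struct part_span p = { req_offset, req_size };

    if (p.offset == OFS_APPEND)
        p.offset = cur_offset;

    if (p.offset == OFS_NXTBLK) {
        uint64_t rem = cur_offset % align;

        p.offset = cur_offset;
        if (rem)
            p.offset += align - rem;  /* round up to the next boundary */
    }

    if (p.size == SIZ_FULL)
        p.size = parent_size - p.offset;

    /* Sanity checks, in the same order as the kernel code. */
    if (p.offset >= parent_size) {
        p.offset = 0;                 /* out of reach: partition disabled */
        p.size = 0;
        return p;
    }
    if (p.offset + p.size > parent_size)
        p.size = parent_size - p.offset;  /* truncate to the device end */

    return p;
}
```

For a 256 MiB device with 128 KiB erase blocks, a partition requested with OFS_NXTBLK after a previous partition ending at 0x50000 starts at 0x60000, and a final partition with SIZ_FULL takes everything that is left.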
3.2.2 mtd_partitions
The list mtd_partitions is defined in drivers/mtd/mtdpart.c:
static LIST_HEAD(mtd_partitions);
3.3 mtd_device_register
mtd_device_register also registers an MTD device, but it wraps add_mtd_device and add_mtd_partitions and decides which one to call based on the partition arguments.
The macro mtd_device_register is defined in include/linux/mtd/mtd.h:
#define mtd_device_register(master, parts, nr_parts) \
    mtd_device_parse_register(master, NULL, NULL, parts, nr_parts)
The function mtd_device_parse_register is defined in drivers/mtd/mtdcore.c:
/** * mtd_device_parse_register - parse partitions and register an MTD device. * * @mtd: the MTD device to register * @types: the list of MTD partition probes to try, see * 'parse_mtd_partitions()' for more information * @parser_data: MTD partition parser-specific data * @parts: fallback partition information to register, if parsing fails; * only valid if %nr_parts > %0 * @nr_parts: the number of partitions in parts, if zero then the full * MTD device is registered if no partition info is found * * This function aggregates MTD partitions parsing (done by * 'parse_mtd_partitions()') and MTD device and partitions registering. It * basically follows the most common pattern found in many MTD drivers: * * * If the MTD_PARTITIONED_MASTER option is set, then the device as a whole is * registered first. * * Then It tries to probe partitions on MTD device @mtd using parsers * specified in @types (if @types is %NULL, then the default list of parsers * is used, see 'parse_mtd_partitions()' for more information). If none are * found this functions tries to fallback to information specified in * @parts/@nr_parts. * * If no partitions were found this function just registers the MTD device * @mtd and exits. * * Returns zero in case of success and a negative error code in case of failure. 
*/ int mtd_device_parse_register(struct mtd_info *mtd, const char * const *types, struct mtd_part_parser_data *parser_data, const struct mtd_partition *parts, // 分区表 int nr_parts) // 分区个数 { int ret; mtd_set_dev_defaults(mtd); if (IS_ENABLED(CONFIG_MTD_PARTITIONED_MASTER)) { // 将Nand Flash当做一个分区注册进内核 ret = add_mtd_device(mtd); // 注册MTD设备 if (ret) return ret; } /* Prefer parsed partitions over driver-provided fallback */ ret = parse_mtd_partitions(mtd, types, parser_data); if (ret > 0) ret = 0; else if (nr_parts) // 多个分区 注册MTD设备 ret = add_mtd_partitions(mtd, parts, nr_parts); else if (!device_is_registered(&mtd->dev)) // 只有一个分区 ret = add_mtd_device(mtd); else ret = 0; if (ret) goto out; /* * FIXME: some drivers unfortunately call this function more than once. * So we have to check if we've already assigned the reboot notifier. * * Generally, we can make multiple calls work for most cases, but it * does cause problems with parse_mtd_partitions() above (e.g., * cmdlineparts will register partitions more than once). */ WARN_ONCE(mtd->_reboot && mtd->reboot_notifier.notifier_call, "MTD already registered\n"); if (mtd->_reboot && !mtd->reboot_notifier.notifier_call) { mtd->reboot_notifier.notifier_call = mtd_reboot_notifier; register_reboot_notifier(&mtd->reboot_notifier); } out: if (ret && device_is_registered(&mtd->dev)) del_mtd_device(mtd); // 卸载MTD设备 return ret; }
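The if/else chain in the middle of mtd_device_parse_register can be condensed into a small decision function. This is only a sketch of the control flow in the listing above: enum reg_action and choose_registration are hypothetical names, parsed_parts stands for the return value of parse_mtd_partitions, and master_registered stands for device_is_registered(&mtd->dev).

```c
/* What mtd_device_parse_register does after trying the partition parsers. */
enum reg_action {
    REG_PARSED_ONLY,          /* parser found partitions: fallback is skipped */
    REG_FALLBACK_PARTITIONS,  /* call add_mtd_partitions(mtd, parts, nr_parts) */
    REG_WHOLE_DEVICE,         /* call add_mtd_device(mtd) */
    REG_ALREADY_DONE          /* master already registered, nothing to add */
};

enum reg_action choose_registration(int parsed_parts, int nr_parts,
                                    int master_registered)
{
    if (parsed_parts > 0)
        return REG_PARSED_ONLY;
    if (nr_parts)
        return REG_FALLBACK_PARTITIONS;
    if (!master_registered)
        return REG_WHOLE_DEVICE;
    return REG_ALREADY_DONE;
}
```

A driver that passes a static partition table and has no matching parser therefore ends up in the add_mtd_partitions branch, while a driver that passes no table at all gets the whole device registered once.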
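4. mtdblock.c
The mtdblock.c file was introduced earlier; it implements the MTD block device interface. We go straight to drivers/mtd/mtdblock.c and walk through the source.
4.1 Module entry function
The entry function of the MTD block device module is: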
static struct mtd_blktrans_ops mtdblock_tr = {  // MTD block device information and operations
    .name = "mtdblock",
    .major = MTD_BLOCK_MAJOR,       // MTD block device major number, 31
    .part_bits = 0,                 // partition bits of the disk: 0 = no partitions, 1 = 2 partitions, 2 = 4 partitions ...
    .blksize = 512,                 // sector size
    .open = mtdblock_open,
    .flush = mtdblock_flush,
    .release = mtdblock_release,
    .readsect = mtdblock_readsect,
    .writesect = mtdblock_writesect,
    .add_mtd = mtdblock_add_mtd,
    .remove_dev = mtdblock_remove_dev,
    .owner = THIS_MODULE,
};

static int __init init_mtdblock(void)
{
    return register_mtd_blktrans(&mtdblock_tr);
}
4.2 register_mtd_blktrans
register_mtd_blktrans is located in drivers/mtd/mtd_blkdevs.c:
int register_mtd_blktrans(struct mtd_blktrans_ops *tr)
{
    struct mtd_info *mtd;
    int ret;

    /* Register the notifier if/when the first device type is
       registered, to prevent the link/init ordering from fucking
       us over. */
    if (!blktrans_notifier.list.next)               // next is still NULL on the first call
        register_mtd_user(&blktrans_notifier);      // add blktrans_notifier to the mtd_notifiers list

    mutex_lock(&mtd_table_mutex);

    ret = register_blkdev(tr->major, tr->name);     // register the block device; major is MTD_BLOCK_MAJOR (31)
    if (ret < 0) {
        printk(KERN_WARNING "Unable to register %s block device on major %d: %d\n",
               tr->name, tr->major, ret);
        mutex_unlock(&mtd_table_mutex);
        return ret;
    }

    if (ret)
        tr->major = ret;

    tr->blkshift = ffs(tr->blksize) - 1;

    INIT_LIST_HEAD(&tr->devs);
    list_add(&tr->list, &blktrans_majors);          // add tr to the blktrans_majors list

    mtd_for_each_device(mtd)
        if (mtd->type != MTD_ABSENT)
            tr->add_mtd(tr, mtd);

    mutex_unlock(&mtd_table_mutex);
    return 0;
}
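This function mainly does the following:
- calls register_mtd_user, which adds blktrans_notifier to the mtd_notifiers list, then walks the global mtd_idr to obtain each mtd and calls blktrans_notify_add(mtd);
- calls register_blkdev to register the block device with major number 31 and name mtdblock;
- adds mtdblock_tr to the blktrans_majors list, which is defined as static LIST_HEAD(blktrans_majors);
- then walks the global mtd_idr to obtain each mtd and calls mtdblock_add_mtd(mtdblock_tr, mtd);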
4.2.1 mtd_notifier
mtd_notifier is defined in include/linux/mtd/mtd.h:
struct mtd_notifier {
    void (*add)(struct mtd_info *mtd);
    void (*remove)(struct mtd_info *mtd);
    struct list_head list;
};
4.2.2 blktrans_notifier
Next, look at register_mtd_user(&blktrans_notifier). The variable blktrans_notifier is defined in drivers/mtd/mtd_blkdevs.c:
static struct mtd_notifier blktrans_notifier = {
    .add = blktrans_notify_add,
    .remove = blktrans_notify_remove,
};
4.2.3 register_mtd_user
register_mtd_user adds new->list to the mtd_notifiers list and then calls new->add for every MTD device that is already registered:
/**
 * register_mtd_user - register a 'user' of MTD devices.
 * @new: pointer to notifier info structure
 *
 * Registers a pair of callbacks function to be called upon addition
 * or removal of MTD devices. Causes the 'add' callback to be immediately
 * invoked for each MTD device currently present in the system.
 */
void register_mtd_user (struct mtd_notifier *new)
{
    struct mtd_info *mtd;

    mutex_lock(&mtd_table_mutex);           // take the mutex

    list_add(&new->list, &mtd_notifiers);   // add to the list

    __module_get(THIS_MODULE);

    mtd_for_each_device(mtd)                // walk mtd_idr to obtain each mtd
        new->add(mtd);                      // ends up calling blktrans_notify_add(mtd)

    mutex_unlock(&mtd_table_mutex);         // release the mutex
}
4.2.4 mtd_for_each_device
The mtd_for_each_device macro is defined in drivers/mtd/mtdcore.h:
#define mtd_for_each_device(mtd) \
    for ((mtd) = __mtd_next_device(0); \
         (mtd) != NULL; \
         (mtd) = __mtd_next_device(mtd->index + 1))
__mtd_next_device is defined in drivers/mtd/mtdcore.c:
struct mtd_info *__mtd_next_device(int i)
{
    return idr_get_next(&mtd_idr, &i);
}
This simply walks every node of the mtd_idr radix tree and yields the mtd associated with each node.
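The iteration idiom ("give me the first entry with ID >= i, then continue from index + 1") also works when IDs have gaps. The user-space sketch below models it with a sparse array in place of the IDR; next_device, for_each_device, insert_device and count_devices are hypothetical names used only for this illustration.

```c
#include <stddef.h>

#define TABLE_SIZE 8

struct dev {
    int index;
};

static struct dev *table[TABLE_SIZE];  /* sparse ID -> device table */

/* Return the first populated entry with ID >= start, or NULL.
 * This plays the role of idr_get_next() on mtd_idr. */
struct dev *next_device(int start)
{
    for (int i = start; i >= 0 && i < TABLE_SIZE; i++)
        if (table[i])
            return table[i];
    return NULL;
}

/* Same shape as mtd_for_each_device. */
#define for_each_device(d) \
    for ((d) = next_device(0); \
         (d) != NULL; \
         (d) = next_device((d)->index + 1))

/* Place a device at a fixed ID; returns -1 if the slot is invalid or taken. */
int insert_device(struct dev *d, int id)
{
    if (id < 0 || id >= TABLE_SIZE || table[id])
        return -1;
    d->index = id;
    table[id] = d;
    return 0;
}

/* Count all devices by walking the table with the macro. */
int count_devices(void)
{
    struct dev *d;
    int n = 0;

    for_each_device(d)
        n++;
    return n;
}
```

Because each step restarts the search at index + 1, removed devices simply leave holes that the loop skips over.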
4.2.5 blktrans_notify_add
Execution then enters blktrans_notify_add(), the add callback of blktrans_notifier.
static void blktrans_notify_add(struct mtd_info *mtd)
{
    struct mtd_blktrans_ops *tr;

    if (mtd->type == MTD_ABSENT)
        return;

    list_for_each_entry(tr, &blktrans_majors, list)     // walk the blktrans_majors list
        tr->add_mtd(tr, mtd);                           // call add_mtd of the mtd_blktrans_ops structure
}
The entry function of the MTD block device driver adds mtdblock_tr to the blktrans_majors list, so the tr obtained while walking blktrans_majors is mtdblock_tr, and the call made is mtdblock_tr.add_mtd(mtdblock_tr, mtd).
The add_mtd function of mtdblock_tr is mtdblock_add_mtd.
4.2.6 mtdblock_add_mtd
static void mtdblock_add_mtd(struct mtd_blktrans_ops *tr, struct mtd_info *mtd)
{
    struct mtdblk_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL);

    if (!dev)
        return;

    dev->mbd.mtd = mtd;                 // raw MTD device
    dev->mbd.devnum = mtd->index;       // used to derive the first minor number
    dev->mbd.size = mtd->size >> 9;     // total number of 512-byte sectors
    dev->mbd.tr = tr;

    if (!(mtd->flags & MTD_WRITEABLE))
        dev->mbd.readonly = 1;

    if (add_mtd_blktrans_dev(&dev->mbd))
        kfree(dev);
}
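The mtdblock_add_mtd function:
- allocates a mtdblk_dev structure variable dev;
- initializes the members of dev;
- calls add_mtd_blktrans_dev(&dev->mbd);
The mtdblk_dev structure describes one MTD block device and contains the raw MTD device; it is defined in drivers/mtd/mtdblock.c: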
struct mtdblk_dev {
    struct mtd_blktrans_dev mbd;
    int count;
    struct mutex cache_mutex;
    unsigned char *cache_data;
    unsigned long cache_offset;
    unsigned int cache_size;
    enum { STATE_EMPTY, STATE_CLEAN, STATE_DIRTY } cache_state;
};

struct mtd_blktrans_dev {
    struct mtd_blktrans_ops *tr;    // MTD block device information and operations
    struct list_head list;
    struct mtd_info *mtd;           // raw MTD device
    struct mutex lock;
    int devnum;                     // used to compute the first minor number (devnum << tr->part_bits,
                                    // a shift of 0 for mtdblock). A block device may have several
                                    // partitions: with 2 partitions their minors follow the first
                                    // minor (+1, +2), and the first minor denotes the whole disk
    bool bg_stop;
    unsigned long size;             // number of sectors
    int readonly;
    int open;
    struct kref ref;
    struct gendisk *disk;           // disk device
    struct attribute_group *disk_attributes;
    struct request_queue *rq;       // request queue
    struct list_head rq_list;
    struct blk_mq_tag_set *tag_set; // tag set
    spinlock_t queue_lock;
    void *priv;
    fmode_t file_mode;
};
4.2.7 add_mtd_blktrans_dev
add_mtd_blktrans_dev is defined in drivers/mtd/mtd_blkdevs.c:
int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new) { struct mtd_blktrans_ops *tr = new->tr; struct mtd_blktrans_dev *d; int last_devnum = -1; struct gendisk *gd; int ret; if (mutex_trylock(&mtd_table_mutex)) { mutex_unlock(&mtd_table_mutex); BUG(); } mutex_lock(&blktrans_ref_mutex); list_for_each_entry(d, &tr->devs, list) { // tr->devs是个链表,遍历链表得到mtd_blktrans_dev if (new->devnum == -1) { // new设备未设置devnum号,分配一个空闲的devnum,默认从0开始分配,逐渐递增..... /* Use first free number */ if (d->devnum != last_devnum+1) { /* Found a free devnum. Plug it in here */ new->devnum = last_devnum+1; // 新的devnum list_add_tail(&new->list, &d->list); // 将当前new添加到链表尾部 goto added; } } else if (d->devnum == new->devnum) { // new设置的devnum已经被占用 /* Required number taken */ mutex_unlock(&blktrans_ref_mutex); return -EBUSY; } else if (d->devnum > new->devnum) { /* Required number was free */ list_add_tail(&new->list, &d->list); goto added; } last_devnum = d->devnum; // 更新最新设备分配的次设备号 } ret = -EBUSY; if (new->devnum == -1) new->devnum = last_devnum+1; /* Check that the device and any partitions will get valid * minor numbers and that the disk naming code below can cope * with this number. 
*/ if (new->devnum > (MINORMASK >> tr->part_bits) || (tr->part_bits && new->devnum >= 27 * 26)) { mutex_unlock(&blktrans_ref_mutex); goto error1; } list_add_tail(&new->list, &tr->devs); added: mutex_unlock(&blktrans_ref_mutex); mutex_init(&new->lock); kref_init(&new->ref); if (!tr->writesect) new->readonly = 1; /* Create gendisk */ ret = -ENOMEM; gd = alloc_disk(1 << tr->part_bits); // 分配一个gendisk结构体,设置分区个数 if (!gd) goto error2; new->disk = gd; gd->private_data = new; // 私有数据 gd->major = tr->major; // 设置主设备号 gd->first_minor = (new->devnum) << tr->part_bits; // 设置起始次设备号 gd->fops = &mtd_block_ops; // 设置块设备操作函数 if (tr->part_bits) //0 if (new->devnum < 26) snprintf(gd->disk_name, sizeof(gd->disk_name), "%s%c", tr->name, 'a' + new->devnum); else snprintf(gd->disk_name, sizeof(gd->disk_name), "%s%c%c", tr->name, 'a' - 1 + new->devnum / 26, 'a' + new->devnum % 26); else // 设置磁盘名 即/dev/mtdblock%d snprintf(gd->disk_name, sizeof(gd->disk_name), "%s%d", tr->name, new->devnum); set_capacity(gd, ((u64)new->size * tr->blksize) >> 9); // 设置容量 单位扇区 /* Create the request queue */ spin_lock_init(&new->queue_lock); INIT_LIST_HEAD(&new->rq_list); new->tag_set = kzalloc(sizeof(*new->tag_set), GFP_KERNEL); if (!new->tag_set) goto error3; new->rq = blk_mq_init_sq_queue(new->tag_set, &mtd_mq_ops, 2, BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING); // 设置请求队列,同时设置块设备驱动行为的回调函数为mtd_mq_ops if (IS_ERR(new->rq)) { ret = PTR_ERR(new->rq); new->rq = NULL; goto error4; } if (tr->flush) blk_queue_write_cache(new->rq, true, false); new->rq->queuedata = new; blk_queue_logical_block_size(new->rq, tr->blksize); blk_queue_flag_set(QUEUE_FLAG_NONROT, new->rq); blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, new->rq); if (tr->discard) { blk_queue_flag_set(QUEUE_FLAG_DISCARD, new->rq); blk_queue_max_discard_sectors(new->rq, UINT_MAX); } gd->queue = new->rq; // 设置请求队列 if (new->readonly) set_disk_ro(gd, 1); device_add_disk(&new->mtd->dev, gd, NULL); // 向内核注册gendisk if (new->disk_attributes) { ret = 
sysfs_create_group(&disk_to_dev(gd)->kobj, new->disk_attributes); WARN_ON(ret); } return 0; error4: kfree(new->tag_set); error3: put_disk(new->disk); error2: list_del(&new->list); error1: return ret; }
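This function shows that no matter how many MTD block devices are registered, they all share major number 31 and differ only in their minor numbers. The major number identifies a particular driver; the minor number identifies the individual devices handled by that driver.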
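The minor number and disk name logic of add_mtd_blktrans_dev can be tried out in user space. The sketch below copies the two expressions from the listing above into standalone helpers; blk_first_minor and blk_disk_name are hypothetical names, and the "nftl" name in the usage note is just an illustrative translation layer with a non-zero part_bits.

```c
#include <stddef.h>
#include <stdio.h>

/* gd->first_minor = devnum << part_bits */
int blk_first_minor(int devnum, int part_bits)
{
    return devnum << part_bits;
}

/*
 * Build the gendisk name the same way add_mtd_blktrans_dev does:
 * - part_bits != 0: letter suffixes (a..z, then aa, ab, ...)
 * - part_bits == 0: numeric suffix, e.g. mtdblock3
 */
void blk_disk_name(char *buf, size_t len, const char *tr_name,
                   int devnum, int part_bits)
{
    if (part_bits) {
        if (devnum < 26)
            snprintf(buf, len, "%s%c", tr_name, 'a' + devnum);
        else
            snprintf(buf, len, "%s%c%c", tr_name,
                     'a' - 1 + devnum / 26, 'a' + devnum % 26);
    } else {
        snprintf(buf, len, "%s%d", tr_name, devnum);
    }
}
```

Since mtdblock_tr sets part_bits to 0, device 3 gets the name mtdblock3 and first minor 3. A layer registered with part_bits = 2 under the name "nftl" would instead name device 1 "nftlb" and give it first minor 4, leaving minors 5 to 7 for its partitions.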
4.2.8 mtd_block_ops
Next, look at the MTD block device operation set mtd_block_ops, defined in drivers/mtd/mtd_blkdevs.c.
static const struct block_device_operations mtd_block_ops = {
    .owner = THIS_MODULE,
    .open = blktrans_open,
    .release = blktrans_release,
    .ioctl = blktrans_ioctl,
    .getgeo = blktrans_getgeo,
};
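The meaning of some of these function pointers:
- open: called when an MTD block device is opened;
- release: called when an MTD block device is closed;
- getgeo: returns the drive geometry; the information is filled into a hd_geometry structure;
- ioctl: called to perform special operations on the MTD block device;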
4.2.9 blktrans_open
static int blktrans_open(struct block_device *bdev, fmode_t mode)
{
    struct mtd_blktrans_dev *dev = blktrans_dev_get(bdev->bd_disk);
    int ret = 0;

    if (!dev)
        return -ERESTARTSYS; /* FIXME: busy loop! -arnd*/

    mutex_lock(&mtd_table_mutex);
    mutex_lock(&dev->lock);

    if (dev->open)
        goto unlock;

    kref_get(&dev->ref);
    __module_get(dev->tr->owner);

    if (!dev->mtd)
        goto unlock;

    if (dev->tr->open) {
        ret = dev->tr->open(dev);       // calls the open function of mtd_blktrans_ops
        if (ret)
            goto error_put;
    }

    ret = __get_mtd_device(dev->mtd);
    if (ret)
        goto error_release;
    dev->file_mode = mode;

unlock:
    dev->open++;
    mutex_unlock(&dev->lock);
    mutex_unlock(&mtd_table_mutex);
    blktrans_dev_put(dev);
    return ret;

error_release:
    if (dev->tr->release)
        dev->tr->release(dev);
error_put:
    module_put(dev->tr->owner);
    kref_put(&dev->ref, blktrans_dev_release);
    mutex_unlock(&dev->lock);
    mutex_unlock(&mtd_table_mutex);
    blktrans_dev_put(dev);
    return ret;
}
4.2.10 blktrans_ioctl
static int blktrans_ioctl(struct block_device *bdev, fmode_t mode,
                          unsigned int cmd, unsigned long arg)
{
    struct mtd_blktrans_dev *dev = blktrans_dev_get(bdev->bd_disk);
    int ret = -ENXIO;

    if (!dev)
        return ret;

    mutex_lock(&dev->lock);

    if (!dev->mtd)
        goto unlock;

    switch (cmd) {
    case BLKFLSBUF:
        ret = dev->tr->flush ? dev->tr->flush(dev) : 0;
        break;
    default:
        ret = -ENOTTY;
    }
unlock:
    mutex_unlock(&dev->lock);
    blktrans_dev_put(dev);
    return ret;
}
4.2.11 mtd_mq_ops
Next, look at the blk-mq operation set of the MTD block device driver, defined in drivers/mtd/mtd_blkdevs.c.
static const struct blk_mq_ops mtd_mq_ops = {
    .queue_rq = mtd_queue_rq,
};
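From the analysis in the previous section we know that queue_rq is called when a request is dispatched to the block device driver. In essence it transfers data between the disk and memory, for example writing memory data to the disk or reading disk data into memory.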
static blk_status_t mtd_queue_rq(struct blk_mq_hw_ctx *hctx,
                                 const struct blk_mq_queue_data *bd)
{
    struct mtd_blktrans_dev *dev;

    dev = hctx->queue->queuedata;
    if (!dev) {
        blk_mq_start_request(bd->rq);
        return BLK_STS_IOERR;
    }

    spin_lock_irq(&dev->queue_lock);
    list_add_tail(&bd->rq->queuelist, &dev->rq_list);
    // Not examined in detail here: reads end up in mtdblock_tr.readsect
    // and writes end up in mtdblock_tr.writesect
    mtd_blktrans_work(dev);
    spin_unlock_irq(&dev->queue_lock);

    return BLK_STS_OK;
}
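4.3 MTD block device flow chart
The execution flow of register_mtd_blktrans is shown in the figure:
The entry function of the MTD block device:
- adds blktrans_notifier to the mtd_notifiers list;
- in the first loop in the figure above, the mtd_idr tree has only its root node (no MTD device has been added yet), so the loop body is never executed;
- next registers the block device major number, which is 31, under the name mtdblock;
- then reaches the second loop, which for the same reason is not entered either.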
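Then, in add_mtd_device(mtd) (if the MTD device has several partitions, this function is called several times, registering one MTD device per partition):
- allocates a node for the raw MTD device;
- sets erasesize_shift, writesize_shift, erasesize_mask, writesize_mask and related fields of the raw MTD device;
- sets the class of the struct device embedded in the raw MTD device to mtd_class, and sets its device number, type, name and driver_data; calls device_register to register the MTD character device named mtd%d;
- calls device_create to create, initialize and register the MTD character device named mtd%dro;
- walks the mtd_notifiers list and, on finding blktrans_notifier, calls its add(mtd) callback, which:
  - allocates a gendisk structure and sets its members:
    - private_data;
    - the major number major (MTD_BLOCK_MAJOR, value 31);
    - the first minor number first_minor (it increases as more MTD devices are registered);
    - the disk name disk_name, set to mtdblock%d; the matching node is created under /dev;
    - the block device operation set fops;
  - initializes the request queue;
  - finally registers the gendisk.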
For example, after the development board boots and the Nand Flash driver is loaded, the following can be seen:
[root@zy:/]# ls /sys/class/mtd/ -l
total 0
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd0 -> ../../devices/virtual/mtd/mtd0
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd0ro -> ../../devices/virtual/mtd/mtd0ro
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd1 -> ../../devices/virtual/mtd/mtd1
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd1ro -> ../../devices/virtual/mtd/mtd1ro
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd2 -> ../../devices/virtual/mtd/mtd2
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd2ro -> ../../devices/virtual/mtd/mtd2ro
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd3 -> ../../devices/virtual/mtd/mtd3
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd3ro -> ../../devices/virtual/mtd/mtd3ro
[root@zy:/]# ls -l /dev/mtd*
crw-rw---- 1 0 0 90, 0 Jan 1 00:00 /dev/mtd0
crw-rw---- 1 0 0 90, 1 Jan 1 00:00 /dev/mtd0ro
crw-rw---- 1 0 0 90, 2 Jan 1 00:00 /dev/mtd1
crw-rw---- 1 0 0 90, 3 Jan 1 00:00 /dev/mtd1ro
crw-rw---- 1 0 0 90, 4 Jan 1 00:00 /dev/mtd2
crw-rw---- 1 0 0 90, 5 Jan 1 00:00 /dev/mtd2ro
crw-rw---- 1 0 0 90, 6 Jan 1 00:00 /dev/mtd3
crw-rw---- 1 0 0 90, 7 Jan 1 00:00 /dev/mtd3ro
brw-rw---- 1 0 0 31, 0 Jan 1 00:00 /dev/mtdblock0
brw-rw---- 1 0 0 31, 1 Jan 1 00:00 /dev/mtdblock1
brw-rw---- 1 0 0 31, 2 Jan 1 00:00 /dev/mtdblock2
brw-rw---- 1 0 0 31, 3 Jan 1 00:00 /dev/mtdblock3
5. mtdchar.c
The mtdchar.c file was introduced earlier; it implements the MTD character device interface. We go straight to drivers/mtd/mtdchar.c and walk through the source.
5.1 Module entry function
static const struct file_operations mtd_fops = {    // character device operation set
    .owner = THIS_MODULE,
    .llseek = mtdchar_lseek,
    .read = mtdchar_read,
    .write = mtdchar_write,
    .unlocked_ioctl = mtdchar_unlocked_ioctl,
#ifdef CONFIG_COMPAT
    .compat_ioctl = mtdchar_compat_ioctl,
#endif
    .open = mtdchar_open,
    .release = mtdchar_close,
    .mmap = mtdchar_mmap,
#ifndef CONFIG_MMU
    .get_unmapped_area = mtdchar_get_unmapped_area,
    .mmap_capabilities = mtdchar_mmap_capabilities,
#endif
};

int __init init_mtdchar(void)
{
    int ret;

    // MTD character device major number 90, MINORBITS = 20;
    // the range is registered under the name "mtd"
    ret = __register_chrdev(MTD_CHAR_MAJOR, 0, 1 << MINORBITS,
                            "mtd", &mtd_fops);
    if (ret < 0) {
        pr_err("Can't allocate major number %d for MTD\n",
               MTD_CHAR_MAJOR);
        return ret;
    }

    return ret;
}
5.2 __register_chrdev
__register_chrdev is located in fs/char_dev.c:
/**
 * __register_chrdev() - create and register a cdev occupying a range of minors
 * @major: major device number or 0 for dynamic allocation
 * @baseminor: first of the requested range of minor numbers
 * @count: the number of minor numbers required
 * @name: name of this range of devices
 * @fops: file operations associated with this devices
 *
 * If @major == 0 this functions will dynamically allocate a major and return
 * its number.
 *
 * If @major > 0 this function will attempt to reserve a device with the given
 * major number and will return zero on success.
 *
 * Returns a -ve errno on failure.
 *
 * The name of this device has nothing to do with the name of the device in
 * /dev. It only helps to keep track of the different owners of devices. If
 * your module name has only one type of devices it's ok to use e.g. the name
 * of the module here.
 */
int __register_chrdev(unsigned int major, unsigned int baseminor,
                      unsigned int count, const char *name,
                      const struct file_operations *fops)
{
    struct char_device_struct *cd;
    struct cdev *cdev;
    int err = -ENOMEM;

    cd = __register_chrdev_region(major, baseminor, count, name);   // statically register a range of character device numbers
    if (IS_ERR(cd))
        return PTR_ERR(cd);

    cdev = cdev_alloc();                // dynamically allocate the character device
    if (!cdev)
        goto out2;

    cdev->owner = fops->owner;          // initialize the character device
    cdev->ops = fops;
    kobject_set_name(&cdev->kobj, "%s", name);

    err = cdev_add(cdev, MKDEV(cd->major, baseminor), count);      // register the character device with the system
    if (err)
        goto out;

    cd->cdev = cdev;

    return major ? 0 : cd->major;
out:
    kobject_put(&cdev->kobj);
out2:
    kfree(__unregister_chrdev_region(cd->major, baseminor, count));
    return err;
}
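So the module entry function mainly does the following:
- requests the character device numbers: major 90, with 1<<20 minor numbers;
- dynamically allocates the character device;
- registers the character device;
It does not, however, create the class or the entries under it; those are handled by the MTD core:
- the mtd class (mtd_class) is registered by the MTD core, and add_mtd_device calls device_register and device_create to create the device entries under /sys/class/mtd, which mdev scans to create the nodes under /dev;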