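Linux Driver Porting: NAND Flash ONFI Standard and the MTD Subsystem [Repost]
Reposted from: https://www.cnblogs.com/zyly/p/16756273.html#_label0
Contents
- I. The ONFI Standard
- II. MTD Device Drivers
- III. MTD Device Registration
- IV. mtdblock.c
- V. mtdchar.c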
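I. The ONFI Standard
NAND Flash is a common storage device in the embedded world. For embedded development it falls into two broad categories, Serial NAND and Raw NAND, and the two differ considerably.
Raw NAND is defined in contrast to Serial NAND: Serial NAND is NAND Flash with a serial interface, for example one that uses the SPI protocol, whereas Raw NAND is NAND Flash with a parallel interface.
We introduce the ONFI protocol first mainly because it comes up when analyzing the NAND Flash driver source. The K9F2G08U0C chip we use does not support ONFI, which becomes apparent when the commands it supports are compared with those specified by ONFI 1.0.
1.1 The ONFI Standard
Early Raw NAND had no unified standard. Although Toshiba published the NAND Flash structure as early as 1989, each vendor designed its Raw NAND chips freely, so package sizes differed, storage organization varied widely, and interface commands were not interchangeable, all of which made life hard for customers.
To change this, in 2006 several mainstream Raw NAND vendors (Hynix, Intel, Micron, Phison, Sony, ST) joined forces to define a Raw NAND standard called Open NAND Flash Interface, or ONFI. ONFI 1.0 was released in December 2006. Many Raw NAND vendors have since designed and manufactured their parts to the ONFI standard, so ONFI-compliant Raw NAND from different vendors looks nearly identical to the embedded designer, at least at the driver-code level.
The ONFI specification can be downloaded from the ONFI website: http://www.onfi.org/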
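1.2 Raw NAND Classification
1.2.1 Bits per Cell
NAND Flash memory cells can be classified by the number of bits stored per cell:
- Single Level Cell (SLC): this type of flash is the most accurate when reading and writing data and has the longest read/write lifetime, roughly 90,000 to 100,000 program/erase cycles. Because of its lifetime, accuracy and overall performance it is very popular in the enterprise market. Its high cost per bit and relatively small capacity make it less attractive in the consumer market.
- Multi Level Cell (MLC): the name comes from moving from SLC's 1 bit per cell to 2 bits per cell. The main advantage is a much lower cost for high-capacity flash; endurance is roughly 3,000 to 10,000 program/erase cycles.
- Triple Level Cell (TLC): TLC is the cheapest grade of flash to manufacture and stores 3 bits per cell. The high density gives cheap, large capacities, but the lifetime is greatly shortened to only about 500 to 1,000 program/erase cycles, and read/write speed is poorer. It suits ordinary consumers and does not meet industrial requirements.
- Quad Level Cell (QLC): each QLC cell stores 4 bits, so storage density is 33% higher than TLC. According to the original author, QLC withstands about 1,000 program/erase cycles (comparable to TLC or better) while offering more capacity at lower cost.
Conclusion: in endurance and performance, SLC > MLC > TLC.
At the time of writing most USB flash drives use TLC chips, which are cheap but only moderately fast and relatively short-lived.
Among SSDs, MLC drives are the mainstream, with a moderate price and comparatively good speed and lifetime, while low-cost SSDs generally use TLC chips. The cell type can be found in the product specifications when buying an SSD.
SLC SSDs appear mainly in some high-end products, most of which sell for over a thousand yuan or even more.
Most smartphones also use TLC storage. Some iPhone 6 units use TLC chips, while others use MLC chips. Overall, MLC flash is the current mainstream, balanced in speed, lifetime and price, and is the more advisable choice.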
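1.2.2 Data Bus Width
The data bus width can be x8 or x16.
1.2.3 Data Sampling Mode
The data sampling mode can be SDR or DDR.
1.2.4 Interface Command Standard
The interface command standard can be non-standard (vendor-specific) or ONFI.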
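1.3 Raw NAND Memory Model
ONFI divides Raw NAND memory, from largest to smallest, into at most the following units: Device, LUN (Die, Target), Plane, Block, Page, Cell.
- Device: a single NAND Flash part, the packaged chip presented to the outside world. One Device contains one or more LUNs.
- LUN (Die, Target): the basic unit that receives and executes Flash commands. One LUN contains one or more Planes.
- Plane: one Plane contains multiple Blocks.
- Block: the smallest unit that can be erased, usually made up of multiple Pages.
- Page: the smallest unit that can be programmed and read, typically 2 KB or similar in size.
- Cell: the smallest storage element within a Page. It corresponds to one floating-gate transistor and can store 1 bit or several bits.
Page and Block are always present, because Page is the smallest read/program unit and Block is the smallest erase unit. LUN and Plane are not always present (when absent, treat them as LUN = 1, Plane = 1) and generally appear only on large-capacity Raw NAND (8 Gb and above).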
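A typical NAND Flash contains a single chip (LUN) with a single plane. More complex, higher-capacity NAND Flash contains multiple chips, each with multiple planes. Such a part is essentially a controller stacking several Flash dies together, as shown in the figure below:
Note: as I understand it, "chip" here means the LUN described above. Any NAND Flash part number can be called a chip, but here the term refers to the inside of the package, that is, how many chips a given part contains. For example:
- Samsung's 2 GB K9WAG08U1A (think of it as the external chip or part number) contains two 1 GB K9K8G08U0A dies, so K9WAG08U1A is said to contain 2 chips;
- Some single chips in turn contain multiple planes. The K9K8G08U0A above, for example, contains four 2 Gb planes;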
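1.4 Raw NAND Signals and Packages
ONFI specifies the Raw NAND signal lines and packages. The figure below shows the internal structure of a typical x8 Raw NAND:
Besides the memory array there are two other major components, the I/O control unit and the logic control unit, and the signal lines attach mainly to these two. An x8 Raw NAND has 15 main signal lines, of which 13 are mandatory; CE# and R/B# may be left unused.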
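Pin | Description |
CLE | Command latch enable. While CLE is high, the I/O inputs are latched into the command register on the rising edge of WE# |
ALE | Address latch enable. While ALE is high, the I/O inputs are latched into the address register on the rising edge of WE# |
CE# | Chip enable, active low |
RE# | Read enable, active low |
WE# | Write enable. The rising edge of WE# latches the I/O inputs into the command, address or data register |
WP# | Write protect |
R/B# | Ready/busy output (low means an operation is still in progress, high means it has completed) |
VCC | Power supply |
VSS | Ground |
NC | Not connected |
I/O0 ~ I/O7 | Data input/output (commands, addresses and data share the data bus) |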
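ONFI specifies many package types, such as TSOP48, LGA52 and BGA63/100/132/152/272/316. For embedded development the most common is the flat TSOP-48 package shown below. It is used for lower-capacity Raw NAND (1/2/4/8/16/32 Gb), a range that is largely sufficient for embedded designs, and it is easy to route on a PCB, which explains its popularity.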
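1.5 Raw NAND Interface Commands
ONFI 1.0 specifies the Raw NAND interface commands, shown in the table below. Some are mandatory (M) and others optional (O). The most frequently used mandatory commands are Read (Read Page), Page Program and Block Erase, which cover the three basic operations of reading, programming and erasing.
Also important are:
- Read Status: returns the execution status and result of a command.
- Read Parameter Page: returns the factory information stored in the chip (memory organization, features, timing, other behavioral parameters, and so on). Its layout is defined by ONFI as in the table below, and a NAND software driver can read the Parameter Page to keep its code generic.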
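II. MTD Device Drivers
MTD (Memory Technology Device) is the Linux subsystem for accessing memory devices (ROM, Flash). Its main purpose is to make drivers for new memory devices simpler, and to that end it provides an abstract interface between the hardware and the upper layers.
2.1 MTD Subsystem Overview
Before introducing MTD, consider one question: why did the Linux kernel abstract out an MTD subsystem at all?
Recall the steps for writing a block device driver from the previous article:
- call register_blkdev to register the block device major number;
- call alloc_disk to allocate a generic disk object, gendisk;
- call blk_mq_init_sq_queue to initialize a request queue;
- set the members major, first_minor, disk_name and fops;
- set the request queue member queue to the queue initialized earlier;
- set the remaining members of the gendisk structure;
- call add_disk to register the gendisk;
For every model of Flash device we would have to repeat all of these steps when writing a block device driver. So what actually differs between Flash models? Taking NAND Flash as an example, it is mainly the memory model (page size, block size, pages per block, OOB, and so on) and slightly different timing parameters. The parts that are tightly bound to the NAND Flash can therefore be split out and supplied by the NAND Flash driver layer, while the common parts are factored out separately. That is exactly what the MTD subsystem does.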
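2.2 MTD Subsystem Framework
As the figure above shows, the MTD framework can be divided into four layers, from top to bottom: device nodes, the MTD device layer, the MTD raw device layer, and the Flash driver layer.
- Device nodes: mknod creates MTD block device nodes (major number 31) and MTD character device nodes (major number 90) under /dev. The MTD character and block devices are accessed through these nodes.
- MTD device layer: on top of the MTD raw devices, Linux defines MTD block devices (major number 31) and character devices (major number 90). Here:
  - mtdchar.c: implements the MTD character device interface;
  - mtdblock.c: implements the MTD block device interface. This part handles device creation, data reads and writes, optimizations, and so on. As with a conventional block device driver, a major number must be requested, a gendisk allocated and configured, and a queue initialized, but here the kernel does all of this automatically.
- MTD raw device layer: the data structure that describes an MTD raw device is mtd_info, which defines a large amount of MTD data and operation functions. Here:
  - mtdcore.c: implements the MTD raw device interface;
  - mtdpart.c: implements the MTD partition interface;
- Flash driver layer: responsible for reading, writing and erasing the Flash hardware. NAND Flash and NOR Flash have different protocols and hardware details. This layer knows what to send (which commands identify, read, write or erase the chip) and how the hardware sends it. NAND Flash follows the NAND protocol and NOR Flash the NOR protocol, each with its own functions, and the operating environment is built from the corresponding structures and functions. The driver writer only needs to allocate, set up and register the Flash driver layer structures and establish the mapping from the concrete device to the MTD raw device.
  - NAND Flash chip drivers live under drivers/mtd/nand/ and use the nand_chip structure;
  - NOR Flash chip drivers live under drivers/mtd/chips/ and use the map_info structure;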
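2.2.1 Flash Driver Layer
(1) NOR Flash drivers
The Linux kernel implements generic NOR Flash drivers for interface standards such as CFI and JEDEC. On top of these, a chip-level driver is fairly simple: define a concrete memory-mapping structure, map_info, then call do_map_probe with the interface type.
Taking scb2_flash.c (in drivers/mtd/maps/) as an example:
- define a map_info structure and initialize its members name, size, phys and bankwidth;
- map the member virt (the virtual address) with ioremap;
- call simple_map_init to initialize the map_info member functions read, write, copy_from and copy_to;
- call do_map_probe to probe the CFI interface, which returns an mtd_info structure;
- register the MTD raw device through parse_mtd_partitions and add_mtd_partitions;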
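(2) NAND Flash drivers
The Linux kernel implements a generic NAND Flash driver (drivers/mtd/nand/raw/nand_base.c). A chip-level driver needs to fill in a nand_chip structure.
MTD uses nand_chip to represent one NAND Flash chip. The structure holds the NAND Flash memory model, the read/write methods, the ECC mode, hardware control, and a range of other low-level mechanisms.
Taking s3c2410.c (in drivers/mtd/nand/raw) as an example: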
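- allocate memory for the nand_chip;
- initialize the nand_chip members according to the SoC NAND controller, for example chip->legacy (members write_buf, read_buf, select_chip, cmd_ctrl, dev_ready, IO_ADDR_R, IO_ADDR_W) and chip->controller;
- set chip->priv to the mtd_info;
- call nand_scan() with the mtd_info as argument to probe the NAND Flash. nand_scan() reads the NAND chip ID and then:
  - initializes chip->base.mtd (members writesize, oobsize, erasesize, and so on);
  - initializes chip->base.memorg (members bits_per_cell, pagesize, oobsize, pages_per_eraseblock, planes_per_lun, luns_per_target, ntargets, and so on);
  - initializes chip->options and chip->base.eccreq;
  - initializes the members of chip->ecc (sets the ECC mode and its handler functions);
  - points every function pointer in chip that is still unset at the default implementation in nand_base.c;
- call mtd_device_register() with the mtd_info and the mtd_partition array to register the MTD device;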
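2.3 Core Structures
2.3.1 struct mtd_info
The Linux kernel uses the mtd_info structure to represent an MTD raw device. It describes either a whole device or one partition of a multi-partition device, and it defines a large amount of MTD data and operation functions. All registered mtd_info structures are tracked by the MTD core (historically in the array mtd_table; in the kernel version analyzed here, in the IDR mtd_idr).
mtd_info is defined in include/linux/mtd/mtd.h: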
struct mtd_info {
    u_char type;          // MTD device type, e.g. MTD_NORFLASH, MTD_NANDFLASH
    uint32_t flags;       // flags such as MTD_WRITEABLE, MTD_NO_ERASE
    uint32_t orig_flags;  /* Flags as before running mtd checks */
    uint64_t size;        // Total size of the MTD

    /* "Major" erase size for the device. Naïve users may take this
     * to be the only erase size available, or may use the more detailed
     * information below if they desire
     */
    uint32_t erasesize;   // erase unit size; for NAND Flash this is the Block size

    /* Minimal writable flash unit size. In case of NOR flash it is 1 (even
     * though individual bits can be cleared), in case of NAND flash it is
     * one NAND page (or half, or one-fourths of it), in case of ECC-ed NOR
     * it is of ECC block size, etc. It is illegal to have writesize = 0.
     * Any driver registering a struct mtd_info must ensure a writesize of
     * 1 or larger.
     */
    uint32_t writesize;   // minimum writable size: one byte for NOR Flash, one page for NAND Flash

    /*
     * Size of the write buffer used by the MTD. MTD devices having a write
     * buffer can write multiple writesize chunks at a time. E.g. while
     * writing 4 * writesize bytes to a device with 2 * writesize bytes
     * buffer the MTD driver can (but doesn't have to) do 2 writesize
     * operations, but not 4. Currently, all NANDs have writebufsize
     * equivalent to writesize (NAND page size). Some NOR flashes do have
     * writebufsize greater than writesize.
     */
    uint32_t writebufsize;

    uint32_t oobsize;     // Amount of OOB data per block (e.g. 16)
    uint32_t oobavail;    // Available OOB bytes per block

    /*
     * If erasesize is a power of 2 then the shift is stored in
     * erasesize_shift otherwise erasesize_shift is zero. Ditto writesize.
     */
    unsigned int erasesize_shift;  // erase shift, derived from erasesize
    unsigned int writesize_shift;  // write shift, derived from writesize
    /* Masks based on erasesize_shift and writesize_shift */
    unsigned int erasesize_mask;   // erase size mask, derived from erasesize_shift
    unsigned int writesize_mask;   // write size mask, derived from writesize_shift

    /*
     * read ops return -EUCLEAN if max number of bitflips corrected on any
     * one region comprising an ecc step equals or exceeds this value.
     * Settable by driver, else defaults to ecc_strength. User can override
     * in sysfs. N.B. The meaning of the -EUCLEAN return code has changed;
     * see Documentation/ABI/testing/sysfs-class-mtd for more detail.
     */
    unsigned int bitflip_threshold;

    /* Kernel-only stuff starts here. */
    const char *name;     // MTD device name
    int index;            // index

    /* OOB layout description */
    const struct mtd_ooblayout_ops *ooblayout;

    /* NAND pairing scheme, only provided for MLC/TLC NANDs */
    const struct mtd_pairing_scheme *pairing;

    /* the ecc step size. */
    unsigned int ecc_step_size;

    /* max number of correctible bit errors per ecc step */
    unsigned int ecc_strength;

    /* Data for variable erase regions. If numeraseregions is zero,
     * it means that the whole device has erasesize as given above.
     */
    int numeraseregions;  // number of variable erase regions (0 for a uniform device)
    struct mtd_erase_region_info *eraseregions;  // variable erase regions

    /*
     * Do not call via these pointers, use corresponding mtd_*()
     * wrappers instead.
     */
    int (*_erase) (struct mtd_info *mtd, struct erase_info *instr);  // erase
    int (*_point) (struct mtd_info *mtd, loff_t from, size_t len,
                   size_t *retlen, void **virt, resource_size_t *phys);
    int (*_unpoint) (struct mtd_info *mtd, loff_t from, size_t len);
    int (*_read) (struct mtd_info *mtd, loff_t from, size_t len,     // read
                  size_t *retlen, u_char *buf);
    int (*_write) (struct mtd_info *mtd, loff_t to, size_t len,      // write
                   size_t *retlen, const u_char *buf);
    int (*_panic_write) (struct mtd_info *mtd, loff_t to, size_t len,
                         size_t *retlen, const u_char *buf);
    int (*_read_oob) (struct mtd_info *mtd, loff_t from,
                      struct mtd_oob_ops *ops);
    int (*_write_oob) (struct mtd_info *mtd, loff_t to,
                       struct mtd_oob_ops *ops);
    int (*_get_fact_prot_info) (struct mtd_info *mtd, size_t len,
                                size_t *retlen, struct otp_info *buf);
    int (*_read_fact_prot_reg) (struct mtd_info *mtd, loff_t from,
                                size_t len, size_t *retlen, u_char *buf);
    int (*_get_user_prot_info) (struct mtd_info *mtd, size_t len,
                                size_t *retlen, struct otp_info *buf);
    int (*_read_user_prot_reg) (struct mtd_info *mtd, loff_t from,
                                size_t len, size_t *retlen, u_char *buf);
    int (*_write_user_prot_reg) (struct mtd_info *mtd, loff_t to,
                                 size_t len, size_t *retlen, u_char *buf);
    int (*_lock_user_prot_reg) (struct mtd_info *mtd, loff_t from,
                                size_t len);
    int (*_writev) (struct mtd_info *mtd, const struct kvec *vecs,
                    unsigned long count, loff_t to, size_t *retlen);
    void (*_sync) (struct mtd_info *mtd);
    int (*_lock) (struct mtd_info *mtd, loff_t ofs, uint64_t len);
    int (*_unlock) (struct mtd_info *mtd, loff_t ofs, uint64_t len);
    int (*_is_locked) (struct mtd_info *mtd, loff_t ofs, uint64_t len);
    int (*_block_isreserved) (struct mtd_info *mtd, loff_t ofs);
    int (*_block_isbad) (struct mtd_info *mtd, loff_t ofs);
    int (*_block_markbad) (struct mtd_info *mtd, loff_t ofs);
    int (*_max_bad_blocks) (struct mtd_info *mtd, loff_t ofs, size_t len);
    int (*_suspend) (struct mtd_info *mtd);
    void (*_resume) (struct mtd_info *mtd);
    void (*_reboot) (struct mtd_info *mtd);
    /*
     * If the driver is something smart, like UBI, it may need to maintain
     * its own reference counting. The below functions are only for driver.
     */
    int (*_get_device) (struct mtd_info *mtd);
    void (*_put_device) (struct mtd_info *mtd);

    struct notifier_block reboot_notifier;  /* default mode before reboot */

    /* ECC status information */
    struct mtd_ecc_stats ecc_stats;
    /* Subpage shift (NAND) */
    int subpage_sft;

    void *priv;

    struct module *owner;
    struct device dev;
    int usecount;
    struct mtd_debug_info dbg;
    struct nvmem_device *nvmem;
};
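The _read(), _write(), _read_oob(), _write_oob() and _erase() members of mtd_info are the main functions an MTD device driver has to implement. They form the interface between the MTD raw device layer and the Flash driver layer. Linux already provides a set of mtd_info member functions that suits most Flash devices.
2.3.2 mtd_part
MTD uses mtd_part to represent a partition. It embeds an mtd_info, so every partition is treated as an MTD raw device in its own right and is registered as one. Most of the data in mtd_part.mtd is derived from the partition's parent device, mtd_part->parent (called the master in older kernels). The master itself is not registered as an MTD raw device unless CONFIG_MTD_PARTITIONED_MASTER is set.
mtd_part is defined in drivers/mtd/mtdpart.c: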
/**
 * struct mtd_part - our partition node structure
 *
 * @mtd: struct holding partition details
 * @parent: parent mtd - flash device or another partition
 * @offset: partition offset relative to the *flash device*
 */
struct mtd_part {
    struct mtd_info mtd;      // partition information
    struct mtd_info *parent;  // parent of this partition
    uint64_t offset;          // offset of the partition
    struct list_head list;    // doubly linked list node chaining all mtd_part entries
};
2.3.3 struct mtd_partition
MTD uses mtd_partition to describe a partition. mtd_partition is defined in include/linux/mtd/partitions.h:
/*
 * Partition definition structure:
 *
 * An array of struct partition is passed along with a MTD object to
 * mtd_device_register() to create them.
 *
 * For each partition, these fields are available:
 * name: string that will be used to label the partition's MTD device.
 * types: some partitions can be containers using specific format to describe
 *	embedded subpartitions / volumes. E.g. many home routers use "firmware"
 *	partition that contains at least kernel and rootfs. In such case an
 *	extra parser is needed that will detect these dynamic partitions and
 *	report them to the MTD subsystem. If set this property stores an array
 *	of parser names to use when looking for subpartitions.
 * size: the partition size; if defined as MTDPART_SIZ_FULL, the partition
 *	will extend to the end of the master MTD device.
 * offset: absolute starting position within the master MTD device; if
 *	defined as MTDPART_OFS_APPEND, the partition will start where the
 *	previous one ended; if MTDPART_OFS_NXTBLK, at the next erase block;
 *	if MTDPART_OFS_RETAIN, consume as much as possible, leaving size
 *	after the end of partition.
 * mask_flags: contains flags that have to be masked (removed) from the
 *	master MTD flag set for the corresponding MTD partition.
 *	For example, to force a read-only partition, simply adding
 *	MTD_WRITEABLE to the mask_flags will do the trick.
 *
 * Note: writeable partitions require their size and offset be
 * erasesize aligned (e.g. use MTDPART_OFS_NEXTBLK).
 */
struct mtd_partition {
    const char *name;          /* identifier string (partition name) */
    const char *const *types;  /* names of parsers to use if any */
    uint64_t size;             /* partition size */
    uint64_t offset;           /* offset within the master MTD space */
    uint32_t mask_flags;       /* master MTD flags to mask out for this partition */
    struct device_node *of_node;
};
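2.3.4 struct nand_chip
nand_chip is one of the more important data structures. MTD uses it to represent one chip inside a NAND Flash package. It holds the NAND Flash memory model, the read/write methods, the ECC mode, hardware control, and a range of other low-level mechanisms. It is defined in include/linux/mtd/rawnand.h: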
/** * struct nand_chip - NAND Private Flash Chip Data * @base: Inherit from the generic NAND device * @legacy: All legacy fields/hooks. If you develop a new driver, * don't even try to use any of these fields/hooks, and if * you're modifying an existing driver that is using those * fields/hooks, you should consider reworking the driver * avoid using them. * @setup_read_retry: [FLASHSPECIFIC] flash (vendor) specific function for * setting the read-retry mode. Mostly needed for MLC NAND. * @ecc: [BOARDSPECIFIC] ECC control structure * @buf_align: minimum buffer alignment required by a platform * @oob_poi: "poison value buffer," used for laying out OOB data * before writing * @page_shift: [INTERN] number of address bits in a page (column * address bits). * @phys_erase_shift: [INTERN] number of address bits in a physical eraseblock * @bbt_erase_shift: [INTERN] number of address bits in a bbt entry * @chip_shift: [INTERN] number of address bits in one chip * @options: [BOARDSPECIFIC] various chip options. They can partly * be set to inform nand_scan about special functionality. * See the defines for further explanation. * @bbt_options: [INTERN] bad block specific options. All options used * here must come from bbm.h. By default, these options * will be copied to the appropriate nand_bbt_descr's. * @badblockpos: [INTERN] position of the bad block marker in the oob * area. * @badblockbits: [INTERN] minimum number of set bits in a good block's * bad block marker position; i.e., BBM == 11110111b is * not bad when badblockbits == 7 * @onfi_timing_mode_default: [INTERN] default ONFI timing mode. This field is * set to the actually used ONFI mode if the chip is * ONFI compliant or deduced from the datasheet if * the NAND chip is not ONFI compliant. * @pagemask: [INTERN] page number mask = number of (pages / chip) - 1 * @data_buf: [INTERN] buffer for data, size is (page size + oobsize). 
* @pagecache: Structure containing page cache related fields * @pagecache.bitflips: Number of bitflips of the cached page * @pagecache.page: Page number currently in the cache. -1 means no page is * currently cached * @subpagesize: [INTERN] holds the subpagesize * @id: [INTERN] holds NAND ID * @parameters: [INTERN] holds generic parameters under an easily * readable form. * @data_interface: [INTERN] NAND interface timing information * @cur_cs: currently selected target. -1 means no target selected, * otherwise we should always have cur_cs >= 0 && * cur_cs < nanddev_ntargets(). NAND Controller drivers * should not modify this value, but they're allowed to * read it. * @read_retries: [INTERN] the number of read retry modes supported * @lock: lock protecting the suspended field. Also used to * serialize accesses to the NAND device. * @suspended: set to 1 when the device is suspended, 0 when it's not. * @bbt: [INTERN] bad block table pointer * @bbt_td: [REPLACEABLE] bad block table descriptor for flash * lookup. * @bbt_md: [REPLACEABLE] bad block table mirror descriptor * @badblock_pattern: [REPLACEABLE] bad block scan pattern used for initial * bad block scan. * @controller: [REPLACEABLE] a pointer to a hardware controller * structure which is shared among multiple independent * devices. 
* @priv: [OPTIONAL] pointer to private chip data * @manufacturer: [INTERN] Contains manufacturer information * @manufacturer.desc: [INTERN] Contains manufacturer's description * @manufacturer.priv: [INTERN] Contains manufacturer private information */ struct nand_chip { struct nand_device base; // 可以看作mtd_info子类 struct nand_legacy legacy; // 硬件操作函数 int (*setup_read_retry)(struct nand_chip *chip, int retry_mode); unsigned int options; // 与具体的nand芯片相关的一些选项,如NAND_BUSWIDTH_16等 unsigned int bbt_options; int page_shift; // 用来表示nand芯片的page大小,如某nand芯片的一个page有512个字节,那么该值就是9 int phys_erase_shift; // 用来表示nand芯片每次可擦除的大小,如某nand芯片每次可擦除16kb(通常为一个block大小),那么该值就是14 int bbt_erase_shift; // 用来表示bad block table的大小,通常bbt占用一个block,所以该值通常和phys_erase_shift相同 int chip_shift; // 使用位表示nand芯片的容量 int pagemask; // nand总容量/每页字节数 - 1 得到页掩码 u8 *data_buf; struct { unsigned int bitflips; int page; } pagecache; int subpagesize; int onfi_timing_mode_default; unsigned int badblockpos; int badblockbits; struct nand_id id; // 保存从nand读取到的设备id信息,包含厂家ID、设备ID等 struct nand_parameters parameters; struct nand_data_interface data_interface; int cur_cs; // 当前选中的目标 int read_retries; struct mutex lock; unsigned int suspended : 1; uint8_t *oob_poi; struct nand_controller *controller; // nand controller struct nand_ecc_ctrl ecc; // ecc校验结构体,里面有大量函数进行ecc校验 unsigned long buf_align; uint8_t *bbt; struct nand_bbt_descr *bbt_td; struct nand_bbt_descr *bbt_md; struct nand_bbt_descr *badblock_pattern; void *priv; struct { const struct nand_manufacturer *desc; void *priv; } manufacturer; // 厂家ID信息 };
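The ecc member of nand_chip handles ECC-related operations such as read_page_raw and write_page_raw, and contains a large number of functions for ECC checking.
The read/write functions in the legacy member of nand_chip, such as read_buf and cmdfunc, depend on the specific NAND controller. They talk to the hardware and usually have to be implemented by the driver writer for the SoC's NAND controller.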
2.3.5 struct nand_legacy
The nand_legacy structure holds the functions that depend on the SoC's NAND controller hardware:
/**
 * struct nand_legacy - NAND chip legacy fields/hooks
 * @IO_ADDR_R: address to read the 8 I/O lines of the flash device
 * @IO_ADDR_W: address to write the 8 I/O lines of the flash device
 * @select_chip: select/deselect a specific target/die
 * @read_byte: read one byte from the chip
 * @write_byte: write a single byte to the chip on the low 8 I/O lines
 * @write_buf: write data from the buffer to the chip
 * @read_buf: read data from the chip into the buffer
 * @cmd_ctrl: hardware specific function for controlling ALE/CLE/nCE. Also used
 *	      to write command and address
 * @cmdfunc: hardware specific function for writing commands to the chip.
 * @dev_ready: hardware specific function for accessing device ready/busy line.
 *	       If set to NULL no access to ready/busy is available and the
 *	       ready/busy information is read from the chip status register.
 * @waitfunc: hardware specific function for wait on ready.
 * @block_bad: check if a block is bad, using OOB markers
 * @block_markbad: mark a block bad
 * @set_features: set the NAND chip features
 * @get_features: get the NAND chip features
 * @chip_delay: chip dependent delay for transferring data from array to read
 *		regs (tR).
 * @dummy_controller: dummy controller implementation for drivers that can
 *		      only control a single chip
 *
 * If you look at this structure you're already wrong. These fields/hooks are
 * all deprecated.
 */
struct nand_legacy {
    void __iomem *IO_ADDR_R;  // address for reading the 8 I/O lines, e.g. the data register NFDATA on the S3C2440
    void __iomem *IO_ADDR_W;  // address for writing the 8 I/O lines, e.g. the data register NFDATA on the S3C2440
    void (*select_chip)(struct nand_chip *chip, int cs);                // select/deselect the chip
    u8 (*read_byte)(struct nand_chip *chip);                            // read one byte
    void (*write_byte)(struct nand_chip *chip, u8 byte);                // write one byte
    void (*write_buf)(struct nand_chip *chip, const u8 *buf, int len);  // write len bytes
    void (*read_buf)(struct nand_chip *chip, u8 *buf, int len);         // read len bytes
    void (*cmd_ctrl)(struct nand_chip *chip, int dat, unsigned int ctrl);  // hardware control: write a command or an address
    void (*cmdfunc)(struct nand_chip *chip, unsigned command, int column,  // send a command together with column and page address
                    int page_addr);
    int (*dev_ready)(struct nand_chip *chip);                  // query NAND state: busy or ready
    int (*waitfunc)(struct nand_chip *chip);                   // wait until the NAND is ready
    int (*block_bad)(struct nand_chip *chip, loff_t ofs);      // check whether a block is bad
    int (*block_markbad)(struct nand_chip *chip, loff_t ofs);  // mark a block bad
    int (*set_features)(struct nand_chip *chip, int feature_addr,
                        u8 *subfeature_para);
    int (*get_features)(struct nand_chip *chip, int feature_addr,
                        u8 *subfeature_para);
    int chip_delay;                           // delay time
    struct nand_controller dummy_controller;
};
2.3.6 struct nand_ecc_ctrl
The read/write functions in nand_ecc_ctrl, such as read_page_raw and write_page_raw, mainly handle ECC-related operations:
/** * struct nand_ecc_ctrl - Control structure for ECC * @mode: ECC mode * @algo: ECC algorithm * @steps: number of ECC steps per page * @size: data bytes per ECC step * @bytes: ECC bytes per step * @strength: max number of correctible bits per ECC step * @total: total number of ECC bytes per page * @prepad: padding information for syndrome based ECC generators * @postpad: padding information for syndrome based ECC generators * @options: ECC specific options (see NAND_ECC_XXX flags defined above) * @priv: pointer to private ECC control data * @calc_buf: buffer for calculated ECC, size is oobsize. * @code_buf: buffer for ECC read from flash, size is oobsize. * @hwctl: function to control hardware ECC generator. Must only * be provided if an hardware ECC is available * @calculate: function for ECC calculation or readback from ECC hardware * @correct: function for ECC correction, matching to ECC generator (sw/hw). * Should return a positive number representing the number of * corrected bitflips, -EBADMSG if the number of bitflips exceed * ECC strength, or any other error code if the error is not * directly related to correction. * If -EBADMSG is returned the input buffers should be left * untouched. * @read_page_raw: function to read a raw page without ECC. This function * should hide the specific layout used by the ECC * controller and always return contiguous in-band and * out-of-band data even if they're not stored * contiguously on the NAND chip (e.g. * NAND_ECC_HW_SYNDROME interleaves in-band and * out-of-band data). * @write_page_raw: function to write a raw page without ECC. This function * should hide the specific layout used by the ECC * controller and consider the passed data as contiguous * in-band and out-of-band data. ECC controller is * responsible for doing the appropriate transformations * to adapt to its specific layout (e.g. * NAND_ECC_HW_SYNDROME interleaves in-band and * out-of-band data). 
* @read_page: function to read a page according to the ECC generator * requirements; returns maximum number of bitflips corrected in * any single ECC step, -EIO hw error * @read_subpage: function to read parts of the page covered by ECC; * returns same as read_page() * @write_subpage: function to write parts of the page covered by ECC. * @write_page: function to write a page according to the ECC generator * requirements. * @write_oob_raw: function to write chip OOB data without ECC * @read_oob_raw: function to read chip OOB data without ECC * @read_oob: function to read chip OOB data * @write_oob: function to write chip OOB data */ struct nand_ecc_ctrl { nand_ecc_modes_t mode; enum nand_ecc_algo algo; int steps; int size; int bytes; int total; int strength; int prepad; int postpad; unsigned int options; void *priv; u8 *calc_buf; u8 *code_buf; void (*hwctl)(struct nand_chip *chip, int mode); int (*calculate)(struct nand_chip *chip, const uint8_t *dat, uint8_t *ecc_code); int (*correct)(struct nand_chip *chip, uint8_t *dat, uint8_t *read_ecc, uint8_t *calc_ecc); int (*read_page_raw)(struct nand_chip *chip, uint8_t *buf, int oob_required, int page); int (*write_page_raw)(struct nand_chip *chip, const uint8_t *buf, int oob_required, int page); int (*read_page)(struct nand_chip *chip, uint8_t *buf, int oob_required, int page); int (*read_subpage)(struct nand_chip *chip, uint32_t offs, uint32_t len, uint8_t *buf, int page); int (*write_subpage)(struct nand_chip *chip, uint32_t offset, uint32_t data_len, const uint8_t *data_buf, int oob_required, int page); int (*write_page)(struct nand_chip *chip, const uint8_t *buf, int oob_required, int page); int (*write_oob_raw)(struct nand_chip *chip, int page); int (*read_oob_raw)(struct nand_chip *chip, int page); int (*read_oob)(struct nand_chip *chip, int page); int (*write_oob)(struct nand_chip *chip, int page); };
2.3.7 struct nand_manufacturer
nand_manufacturer holds the manufacturer information and is defined in drivers/mtd/nand/raw/internals.h:
/* * NAND Flash Manufacturer ID Codes */ #define NAND_MFR_AMD 0x01 #define NAND_MFR_ATO 0x9b #define NAND_MFR_EON 0x92 #define NAND_MFR_ESMT 0xc8 #define NAND_MFR_FUJITSU 0x04 #define NAND_MFR_HYNIX 0xad #define NAND_MFR_INTEL 0x89 #define NAND_MFR_MACRONIX 0xc2 #define NAND_MFR_MICRON 0x2c #define NAND_MFR_NATIONAL 0x8f #define NAND_MFR_RENESAS 0x07 #define NAND_MFR_SAMSUNG 0xec // 三星厂家 #define NAND_MFR_SANDISK 0x45 #define NAND_MFR_STMICRO 0x20 #define NAND_MFR_TOSHIBA 0x98 #define NAND_MFR_WINBOND 0xef /** * struct nand_manufacturer_ops - NAND Manufacturer operations * @detect: detect the NAND memory organization and capabilities * @init: initialize all vendor specific fields (like the ->read_retry() * implementation) if any. * @cleanup: the ->init() function may have allocated resources, ->cleanup() * is here to let vendor specific code release those resources. * @fixup_onfi_param_page: apply vendor specific fixups to the ONFI parameter * page. This is called after the checksum is verified. */ struct nand_manufacturer_ops { void (*detect)(struct nand_chip *chip); int (*init)(struct nand_chip *chip); void (*cleanup)(struct nand_chip *chip); void (*fixup_onfi_param_page)(struct nand_chip *chip, struct nand_onfi_params *p); }; /** * struct nand_manufacturer - NAND Flash Manufacturer structure * @name: Manufacturer name * @id: manufacturer ID code of device. * @ops: manufacturer operations */ struct nand_manufacturer { int id; // 厂家ID char *name; // 厂家名字 const struct nand_manufacturer_ops *ops; // 操作函数 };
2.3.8 struct nand_device
struct nand_device is defined in include/linux/mtd/nand.h:
/** * struct nand_device - NAND device * @mtd: MTD instance attached to the NAND device * @memorg: memory layout * @eccreq: ECC requirements * @rowconv: position to row address converter * @bbt: bad block table info * @ops: NAND operations attached to the NAND device * * Generic NAND object. Specialized NAND layers (raw NAND, SPI NAND, OneNAND) * should declare their own NAND object embedding a nand_device struct (that's * how inheritance is done). * struct_nand_device->memorg and struct_nand_device->eccreq should be filled * at device detection time to reflect the NAND device * capabilities/requirements. Once this is done nanddev_init() can be called. * It will take care of converting NAND information into MTD ones, which means * the specialized NAND layers should never manually tweak * struct_nand_device->mtd except for the ->_read/write() hooks. */ struct nand_device { struct mtd_info mtd; struct nand_memory_organization memorg; struct nand_ecc_req eccreq; struct nand_row_converter rowconv; struct nand_bbt bbt; const struct nand_ops *ops; };
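2.3.9 Structure Relationship Diagram
2.4 Core Functions
If the MTD device is not partitioned (the whole device is a single partition), register and unregister it with the following two functions:
int add_mtd_device(struct mtd_info *mtd);
int del_mtd_device(struct mtd_info *mtd);
If the MTD device has additional partitions, register and unregister it with the following two functions:
int add_mtd_partitions(struct mtd_info *master, const struct mtd_partition *parts, int nbparts);
int del_mtd_partitions(struct mtd_info *master);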
III. MTD Device Registration
3.1 add_mtd_device
add_mtd_device is defined in drivers/mtd/mtdcore.c:
/** * add_mtd_device - register an MTD device * @mtd: pointer to new MTD device info structure * * Add a device to the list of MTD devices present in the system, and * notify each currently active MTD 'user' of its arrival. Returns * zero on success or non-zero on failure. */ int add_mtd_device(struct mtd_info *mtd) { struct mtd_notifier *not; int i, error; /* * May occur, for instance, on buggy drivers which call * mtd_device_parse_register() multiple times on the same master MTD, * especially with CONFIG_MTD_PARTITIONED_MASTER=y. */ if (WARN_ONCE(mtd->dev.type, "MTD already registered\n")) return -EEXIST; BUG_ON(mtd->writesize == 0); /* * MTD drivers should implement ->_{write,read}() or * ->_{write,read}_oob(), but not both. */ if (WARN_ON((mtd->_write && mtd->_write_oob) || // 校验函数指针 (mtd->_read && mtd->_read_oob))) return -EINVAL; if (WARN_ON((!mtd->erasesize || !mtd->_erase) && !(mtd->flags & MTD_NO_ERASE))) return -EINVAL; mutex_lock(&mtd_table_mutex); // 互斥锁 i = idr_alloc(&mtd_idr, mtd, 0, 0, GFP_KERNEL); // 为mtd设备分配index if (i < 0) { error = i; goto fail_locked; } mtd->index = i; mtd->usecount = 0; /* default value if not set by driver */ if (mtd->bitflip_threshold == 0) // 计算擦除数据偏移 mtd->bitflip_threshold = mtd->ecc_strength; if (is_power_of_2(mtd->erasesize)) mtd->erasesize_shift = ffs(mtd->erasesize) - 1; else mtd->erasesize_shift = 0; if (is_power_of_2(mtd->writesize)) // 计算写入数据偏移值 mtd->writesize_shift = ffs(mtd->writesize) - 1; else mtd->writesize_shift = 0; mtd->erasesize_mask = (1 << mtd->erasesize_shift) - 1; // 计算擦除数据大小掩码 mtd->writesize_mask = (1 << mtd->writesize_shift) - 1; // 计算写入数据大小掩码 /* Some chips always power up locked. Unlock them now */ if ((mtd->flags & MTD_WRITEABLE) && (mtd->flags & MTD_POWERUP_LOCK)) { // 有些芯片总是通电锁定,立即解锁(一般flash芯片都支持lock机制,在驱动上很少使用) error = mtd_unlock(mtd, 0, mtd->size); if (error && error != -EOPNOTSUPP) printk(KERN_WARNING "%s: unlock failed, writes may not work\n", mtd->name); /* Ignore unlock failures? 
*/ error = 0; } /* Caller should have set dev.parent to match the * physical device, if appropriate. */ mtd->dev.type = &mtd_devtype; // 设置设备类型 mtd->dev.class = &mtd_class; // 设置设备类 会在/syc/class创建mtd类 mtd->dev.devt = MTD_DEVT(i); // 设置设备号,关于设备号的申请是在mtdchar.c模块入口函数中完成的 dev_set_name(&mtd->dev, "mtd%d", i); // 设置设备节点名字mtd%d dev_set_drvdata(&mtd->dev, mtd); // mtd->dev.driver_data = mtd; of_node_get(mtd_get_of_node(mtd)); error = device_register(&mtd->dev); // 注册MTD字符设备,会在/sys/class/mtd类下创建mtd%d文件,然后mdev通过这个自动创建/dev/mtd%d这个字符设备节点 if (error) goto fail_added; /* Add the nvmem provider */ error = mtd_nvmem_add(mtd); if (error) goto fail_nvmem_add; if (!IS_ERR_OR_NULL(dfs_dir_mtd)) { mtd->dbg.dfs_dir = debugfs_create_dir(dev_name(&mtd->dev), dfs_dir_mtd); if (IS_ERR_OR_NULL(mtd->dbg.dfs_dir)) { pr_debug("mtd device %s won't show data in debugfs\n", dev_name(&mtd->dev)); } } device_create(&mtd_class, mtd->dev.parent, MTD_DEVT(i) + 1, NULL, // 创建MTD字符设备,内部调用了device_register 在/sys/class/mtd下创建mtd%dro设备,然后mdev通过这个自动创建/dev/mtd%dro这个字符设备节点 "mtd%dro", i); pr_debug("mtd: Giving out device %d to %s\n", i, mtd->name); /* No need to get a refcount on the module containing the notifier, since we hold the mtd_table_mutex */ list_for_each_entry(not, &mtd_notifiers, list) // 调用mtd子系统的notify机制,实现针对mtd设备添加、移除,移除notify机制,实现注册的notify hook not->add(mtd); mutex_unlock(&mtd_table_mutex); // 解锁 /* We _know_ we aren't being removed, because our caller is still holding us here. So none of this try_ nonsense, and no bitching about it either. :) */ __module_get(THIS_MODULE); return 0; fail_nvmem_add: device_unregister(&mtd->dev); fail_added: of_node_put(mtd_get_of_node(mtd)); idr_remove(&mtd_idr, i); fail_locked: mutex_unlock(&mtd_table_mutex); return error; }
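This function mainly performs the following operations:
(1) Validates the required fields and function pointers of the raw MTD device;
(2) Allocates a node for the raw MTD device in the mtd_idr tree and returns the ID of the allocated node: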
i = idr_alloc(&mtd_idr, mtd, 0, 0, GFP_KERNEL); // allocate an ID; mtd_idr is a radix tree, and mtd is associated with the newly allocated ID
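idr_alloc adds a new node to the mtd_idr tree. The node has a unique ID within the tree and is associated with mtd, so mtd can later be located through that ID.
The third and fourth parameters of idr_alloc give the start and the end of the allowed ID range. Here the start is 0, and an end value of 0 means there is no explicit upper bound, i.e. any ID up to the maximum that the mtd_idr tree allows may be returned.
The global variable mtd_idr is defined in drivers/mtd/mtdcore.c: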
static DEFINE_IDR(mtd_idr);
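IDR itself is not covered in detail here; its main job is to bind an ID to a data structure. For details, see "linux内核IDR机制详解(一)".
The later registration of the character and block devices needs this ID. For example, the device number of the struct device embedded in the MTD device is later set to MTD_DEVT(i):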
#define MTD_DEVT(index) MKDEV(MTD_CHAR_MAJOR, (index)*2)
The major number is MTD_CHAR_MAJOR, i.e. 90, and the minor number is index*2;
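To make the numbering concrete, here is a minimal user-space sketch of the same arithmetic. It is not kernel code: every SKETCH_* macro and sketch_* helper is a hypothetical stand-in, and it assumes the minor number occupies the low 20 bits of the device number (MINORBITS = 20, the value noted in the mtdchar.c section).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel macros (assumption:
 * the minor number occupies the low 20 bits, like MINORBITS). */
#define SKETCH_MINORBITS 20u
#define SKETCH_MINORMASK ((1u << SKETCH_MINORBITS) - 1u)
#define SKETCH_MKDEV(ma, mi) \
    (((uint32_t)(ma) << SKETCH_MINORBITS) | (uint32_t)(mi))
#define SKETCH_MAJOR(dev) ((uint32_t)(dev) >> SKETCH_MINORBITS)
#define SKETCH_MINOR(dev) ((uint32_t)(dev) & SKETCH_MINORMASK)

#define SKETCH_MTD_CHAR_MAJOR 90u

/* Device number of the read-write node mtd<index>: minor = index * 2. */
static uint32_t sketch_mtd_devt(unsigned int index)
{
    /* both index*2 and index*2+1 must fit into the minor field */
    assert(index <= (SKETCH_MINORMASK >> 1));
    return SKETCH_MKDEV(SKETCH_MTD_CHAR_MAJOR, index * 2u);
}

/* Device number of the read-only node mtd<index>ro: MTD_DEVT(index) + 1. */
static uint32_t sketch_mtd_ro_devt(unsigned int index)
{
    return sketch_mtd_devt(index) + 1u;
}
```

Under this scheme mtd1 gets 90:2 and mtd1ro gets 90:3, which is consistent with the ls -l /dev/mtd* listing at the end of section 4.3: even minors are the read-write nodes, odd minors the read-only ones.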
(3) Sets the raw MTD device's erasesize_shift, writesize_shift, erasesize_mask, writesize_mask and related fields;
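Step (3) can be sketched in user space as follows. This is only an illustration: struct sketch_geom is a hypothetical subset of struct mtd_info, and sketch_log2 replaces the kernel's ffs(n) - 1, which gives the same result for powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of struct mtd_info, for illustration only. */
struct sketch_geom {
    uint32_t erasesize;
    uint32_t writesize;
    unsigned int erasesize_shift;
    unsigned int writesize_shift;
    uint32_t erasesize_mask;
    uint32_t writesize_mask;
};

static int sketch_is_power_of_2(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* log2 of a power of two; equals ffs(n) - 1 for such values. */
static unsigned int sketch_log2(uint32_t n)
{
    unsigned int shift = 0;

    while ((n >>= 1) != 0)
        shift++;
    return shift;
}

static void sketch_set_shifts(struct sketch_geom *g)
{
    assert(g->writesize != 0);  /* mirrors BUG_ON(mtd->writesize == 0) */

    /* a shift of 0 means "not a power of two, fall back to division" */
    g->erasesize_shift = sketch_is_power_of_2(g->erasesize) ?
                         sketch_log2(g->erasesize) : 0;
    g->writesize_shift = sketch_is_power_of_2(g->writesize) ?
                         sketch_log2(g->writesize) : 0;

    g->erasesize_mask = (1u << g->erasesize_shift) - 1u;
    g->writesize_mask = (1u << g->writesize_shift) - 1u;
}
```

Assuming a chip with 2 KiB pages and 128 KiB erase blocks, this gives shifts of 11 and 17 and masks of 0x7FF and 0x1FFFF, so offset >> erasesize_shift is the block number and offset & erasesize_mask is the offset inside the block.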
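(4) For chips that are marked writable and that power up locked, calls the unlock interface to unlock them (most Flash chips support a lock mechanism, but drivers rarely use it);
(5) Sets the class of the struct device embedded in the raw MTD device to mtd_class, and sets its device number, type, name and driver_data;
mtd_class is defined as: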
static struct class mtd_class = {
    .name = "mtd",
    .owner = THIS_MODULE,
    .pm = MTD_CLS_PM_OPS,
};
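(6) Calls device_register to register the MTD character device named mtd%d;
(7) Calls device_create to create, initialize and register the MTD character device named mtd%dro;
(8) Invokes the MTD subsystem's notifier mechanism: the add hook of every registered notifier is called for the newly added MTD device (the matching remove hook is called when a device is removed);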
list_for_each_entry(not, &mtd_notifiers, list)
    not->add(mtd);
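list_for_each_entry is a macro that takes three parameters, in order pos, head and member. It expands to a for loop that uses pos as the loop cursor, starts from the list head, and advances pos entry by entry in the next direction until it arrives back at head.
The mtd_notifiers list is defined as: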
static LIST_HEAD(mtd_notifiers);
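In other words, the loop walks this list, obtains the current element not (of type struct mtd_notifier) and calls not->add(mtd). For the notifier registered by the MTD block layer, this add method is where the MTD block device named mtdblock%d gets registered.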
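The traversal can be imitated in user space with a small intrusive circular list. The sketch below is an assumption-laden illustration rather than the kernel implementation: all sketch_* names are hypothetical, and the for loop is written out by hand to show what list_for_each_entry expands to in spirit (walk from head->next until head is reached again, recovering the enclosing structure from the embedded list node).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space imitation of the kernel's struct list_head. */
struct sketch_list_head {
    struct sketch_list_head *next, *prev;
};

/* An empty list is a head that points to itself in both directions. */
#define SKETCH_LIST_HEAD(name) \
    struct sketch_list_head name = { &(name), &(name) }

/* Insert new_node right after head, as list_add() does. */
static void sketch_list_add(struct sketch_list_head *new_node,
                            struct sketch_list_head *head)
{
    new_node->next = head->next;
    new_node->prev = head;
    head->next->prev = new_node;
    head->next = new_node;
}

/* Recover the enclosing structure from a pointer to its member. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_mtd { int index; };

struct sketch_notifier {
    void (*add)(struct sketch_mtd *mtd);
    struct sketch_list_head list;
};

static SKETCH_LIST_HEAD(sketch_notifiers);

static void sketch_register_user(struct sketch_notifier *n)
{
    sketch_list_add(&n->list, &sketch_notifiers);
}

/* Call the add hook of every registered notifier; returns the hook count. */
static int sketch_notify_add(struct sketch_mtd *mtd)
{
    struct sketch_list_head *pos;
    int called = 0;

    assert(mtd != NULL);
    for (pos = sketch_notifiers.next; pos != &sketch_notifiers; pos = pos->next) {
        struct sketch_notifier *n =
            sketch_container_of(pos, struct sketch_notifier, list);
        n->add(mtd);
        called++;
    }
    return called;
}

/* Example hook standing in for the block layer's add callback. */
static int sketch_seen_index = -1;

static void sketch_blktrans_add(struct sketch_mtd *mtd)
{
    sketch_seen_index = mtd->index;
}
```

With an empty list the loop body never runs, because head.next is the head itself; once sketch_register_user has linked a notifier in, every later sketch_notify_add call reaches its add hook.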
3.2 add_mtd_partitions
add_mtd_partitions is defined in drivers/mtd/mtdpart.c:
/*
 * This function, given a master MTD object and a partition table, creates
 * and registers slave MTD objects which are bound to the master according to
 * the partition definitions.
 *
 * For historical reasons, this function's caller only registers the master
 * if the MTD_PARTITIONED_MASTER config option is set.
 */
int add_mtd_partitions(struct mtd_info *master,            // MTD device info
                       const struct mtd_partition *parts,  // partition table
                       int nbparts)                        // number of partitions
{
    struct mtd_part *slave;
    uint64_t cur_offset = 0;
    int i, ret;

    printk(KERN_NOTICE "Creating %d MTD partitions on \"%s\":\n", nbparts, master->name);

    for (i = 0; i < nbparts; i++) {                        // walk the partition table
        slave = allocate_partition(master, parts + i, i, cur_offset);  // allocate an mtd_part
        if (IS_ERR(slave)) {
            ret = PTR_ERR(slave);
            goto err_del_partitions;
        }

        mutex_lock(&mtd_partitions_mutex);
        list_add(&slave->list, &mtd_partitions);           // add slave to the mtd_partitions list
        mutex_unlock(&mtd_partitions_mutex);

        /*
         * Register an MTD device for each partition; this is what
         * eventually produces the /dev/mtdblock%d block device file.
         */
        ret = add_mtd_device(&slave->mtd);
        if (ret) {
            mutex_lock(&mtd_partitions_mutex);
            list_del(&slave->list);
            mutex_unlock(&mtd_partitions_mutex);

            free_partition(slave);
            goto err_del_partitions;
        }

        mtd_add_partition_attrs(slave);
        /* Look for subpartitions */
        parse_mtd_partitions(&slave->mtd, parts[i].types, NULL);

        cur_offset = slave->offset + slave->mtd.size;
    }

    return 0;

err_del_partitions:
    del_mtd_partitions(master);

    return ret;
}
3.2.1 allocate_partition
allocate_partition is defined in drivers/mtd/mtdpart.c:
static struct mtd_part *allocate_partition(struct mtd_info *parent, const struct mtd_partition *part, int partno, uint64_t cur_offset) { int wr_alignment = (parent->flags & MTD_NO_ERASE) ? parent->writesize : parent->erasesize; struct mtd_part *slave; u32 remainder; char *name; u64 tmp; /* allocate the partition structure */ slave = kzalloc(sizeof(*slave), GFP_KERNEL); name = kstrdup(part->name, GFP_KERNEL); if (!name || !slave) { printk(KERN_ERR"memory allocation error while creating partitions for \"%s\"\n", parent->name); kfree(name); kfree(slave); return ERR_PTR(-ENOMEM); } /* set up the MTD object for this partition */ slave->mtd.type = parent->type; slave->mtd.flags = parent->orig_flags & ~part->mask_flags; slave->mtd.orig_flags = slave->mtd.flags; slave->mtd.size = part->size; slave->mtd.writesize = parent->writesize; slave->mtd.writebufsize = parent->writebufsize; slave->mtd.oobsize = parent->oobsize; slave->mtd.oobavail = parent->oobavail; slave->mtd.subpage_sft = parent->subpage_sft; slave->mtd.pairing = parent->pairing; slave->mtd.name = name; slave->mtd.owner = parent->owner; /* NOTE: Historically, we didn't arrange MTDs as a tree out of * concern for showing the same data in multiple partitions. * However, it is very useful to have the master node present, * so the MTD_PARTITIONED_MASTER option allows that. The master * will have device nodes etc only if this is set, so make the * parent conditional on that option. Note, this is a way to * distinguish between the master and the partition in sysfs. */ slave->mtd.dev.parent = IS_ENABLED(CONFIG_MTD_PARTITIONED_MASTER) || mtd_is_partition(parent) ? 
&parent->dev : parent->dev.parent; slave->mtd.dev.of_node = part->of_node; if (parent->_read) slave->mtd._read = part_read; if (parent->_write) slave->mtd._write = part_write; if (parent->_panic_write) slave->mtd._panic_write = part_panic_write; if (parent->_point && parent->_unpoint) { slave->mtd._point = part_point; slave->mtd._unpoint = part_unpoint; } if (parent->_read_oob) slave->mtd._read_oob = part_read_oob; if (parent->_write_oob) slave->mtd._write_oob = part_write_oob; if (parent->_read_user_prot_reg) slave->mtd._read_user_prot_reg = part_read_user_prot_reg; if (parent->_read_fact_prot_reg) slave->mtd._read_fact_prot_reg = part_read_fact_prot_reg; if (parent->_write_user_prot_reg) slave->mtd._write_user_prot_reg = part_write_user_prot_reg; if (parent->_lock_user_prot_reg) slave->mtd._lock_user_prot_reg = part_lock_user_prot_reg; if (parent->_get_user_prot_info) slave->mtd._get_user_prot_info = part_get_user_prot_info; if (parent->_get_fact_prot_info) slave->mtd._get_fact_prot_info = part_get_fact_prot_info; if (parent->_sync) slave->mtd._sync = part_sync; if (!partno && !parent->dev.class && parent->_suspend && parent->_resume) { slave->mtd._suspend = part_suspend; slave->mtd._resume = part_resume; } if (parent->_writev) slave->mtd._writev = part_writev; if (parent->_lock) slave->mtd._lock = part_lock; if (parent->_unlock) slave->mtd._unlock = part_unlock; if (parent->_is_locked) slave->mtd._is_locked = part_is_locked; if (parent->_block_isreserved) slave->mtd._block_isreserved = part_block_isreserved; if (parent->_block_isbad) slave->mtd._block_isbad = part_block_isbad; if (parent->_block_markbad) slave->mtd._block_markbad = part_block_markbad; if (parent->_max_bad_blocks) slave->mtd._max_bad_blocks = part_max_bad_blocks; if (parent->_get_device) slave->mtd._get_device = part_get_device; if (parent->_put_device) slave->mtd._put_device = part_put_device; slave->mtd._erase = part_erase; slave->parent = parent; slave->offset = part->offset; if (slave->offset 
== MTDPART_OFS_APPEND) slave->offset = cur_offset; if (slave->offset == MTDPART_OFS_NXTBLK) { tmp = cur_offset; slave->offset = cur_offset; remainder = do_div(tmp, wr_alignment); if (remainder) { slave->offset += wr_alignment - remainder; printk(KERN_NOTICE "Moving partition %d: " "0x%012llx -> 0x%012llx\n", partno, (unsigned long long)cur_offset, (unsigned long long)slave->offset); } } if (slave->offset == MTDPART_OFS_RETAIN) { slave->offset = cur_offset; if (parent->size - slave->offset >= slave->mtd.size) { slave->mtd.size = parent->size - slave->offset - slave->mtd.size; } else { printk(KERN_ERR "mtd partition \"%s\" doesn't have enough space: %#llx < %#llx, disabled\n", part->name, parent->size - slave->offset, slave->mtd.size); /* register to preserve ordering */ goto out_register; } } if (slave->mtd.size == MTDPART_SIZ_FULL) slave->mtd.size = parent->size - slave->offset; printk(KERN_NOTICE "0x%012llx-0x%012llx : \"%s\"\n", (unsigned long long)slave->offset, (unsigned long long)(slave->offset + slave->mtd.size), slave->mtd.name); /* let's do some sanity checks */ if (slave->offset >= parent->size) { /* let's register it anyway to preserve ordering */ slave->offset = 0; slave->mtd.size = 0; /* Initialize ->erasesize to make add_mtd_device() happy. */ slave->mtd.erasesize = parent->erasesize; printk(KERN_ERR"mtd: partition \"%s\" is out of reach -- disabled\n", part->name); goto out_register; } if (slave->offset + slave->mtd.size > parent->size) { slave->mtd.size = parent->size - slave->offset; printk(KERN_WARNING"mtd: partition \"%s\" extends beyond the end of device \"%s\" -- size truncated to %#llx\n", part->name, parent->name, (unsigned long long)slave->mtd.size); } if (parent->numeraseregions > 1) { /* Deal with variable erase size stuff */ int i, max = parent->numeraseregions; u64 end = slave->offset + slave->mtd.size; struct mtd_erase_region_info *regions = parent->eraseregions; /* Find the first erase regions which is part of this * partition. 
*/ for (i = 0; i < max && regions[i].offset <= slave->offset; i++) ; /* The loop searched for the region _behind_ the first one */ if (i > 0) i--; /* Pick biggest erasesize */ for (; i < max && regions[i].offset < end; i++) { if (slave->mtd.erasesize < regions[i].erasesize) { slave->mtd.erasesize = regions[i].erasesize; } } BUG_ON(slave->mtd.erasesize == 0); } else { /* Single erase size */ slave->mtd.erasesize = parent->erasesize; } /* * Slave erasesize might differ from the master one if the master * exposes several regions with different erasesize. Adjust * wr_alignment accordingly. */ if (!(slave->mtd.flags & MTD_NO_ERASE)) wr_alignment = slave->mtd.erasesize; tmp = part_absolute_offset(parent) + slave->offset; remainder = do_div(tmp, wr_alignment); if ((slave->mtd.flags & MTD_WRITEABLE) && remainder) { /* Doesn't start on a boundary of major erase size */ /* FIXME: Let it be writable if it is on a boundary of * _minor_ erase size though */ slave->mtd.flags &= ~MTD_WRITEABLE; printk(KERN_WARNING"mtd: partition \"%s\" doesn't start on an erase/write block boundary -- force read-only\n", part->name); } tmp = part_absolute_offset(parent) + slave->mtd.size; remainder = do_div(tmp, wr_alignment); if ((slave->mtd.flags & MTD_WRITEABLE) && remainder) { slave->mtd.flags &= ~MTD_WRITEABLE; printk(KERN_WARNING"mtd: partition \"%s\" doesn't end on an erase/write block -- force read-only\n", part->name); } mtd_set_ooblayout(&slave->mtd, &part_ooblayout_ops); slave->mtd.ecc_step_size = parent->ecc_step_size; slave->mtd.ecc_strength = parent->ecc_strength; slave->mtd.bitflip_threshold = parent->bitflip_threshold; if (parent->_block_isbad) { uint64_t offs = 0; while (offs < slave->mtd.size) { if (mtd_block_isreserved(parent, offs + slave->offset)) slave->mtd.ecc_stats.bbtblocks++; else if (mtd_block_isbad(parent, offs + slave->offset)) slave->mtd.ecc_stats.badblocks++; offs += slave->mtd.erasesize; } } out_register: return slave; }
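The offset and size handling in allocate_partition can be condensed into a short user-space sketch. It is a simplified illustration under stated assumptions: the sketch_* names are hypothetical, the SKETCH_* placeholder values mirror MTDPART_OFS_APPEND (-1), MTDPART_OFS_NXTBLK (-2) and MTDPART_SIZ_FULL (0), and the MTDPART_OFS_RETAIN case, the multiple erase-region case and the read-only forcing are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values mirroring the MTDPART_* constants (assumption:
 * -1, -2 and 0, compared against 64-bit offsets and sizes). */
#define SKETCH_OFS_APPEND ((uint64_t)-1)
#define SKETCH_OFS_NXTBLK ((uint64_t)-2)
#define SKETCH_SIZ_FULL   ((uint64_t)0)

struct sketch_part {
    uint64_t offset;
    uint64_t size;
};

/*
 * Resolve the placeholders of one partition request.
 * cur_offset: end of the previous partition.
 * alignment:  erase size (or write size for devices that need no erase).
 */
static struct sketch_part sketch_resolve(struct sketch_part req,
                                         uint64_t parent_size,
                                         uint64_t cur_offset,
                                         uint64_t alignment)
{
    struct sketch_part out = req;

    assert(alignment != 0);

    if (out.offset == SKETCH_OFS_APPEND)
        out.offset = cur_offset;            /* start where the last one ended */

    if (out.offset == SKETCH_OFS_NXTBLK) {  /* round up to the next block */
        uint64_t remainder = cur_offset % alignment;

        out.offset = cur_offset;
        if (remainder)
            out.offset += alignment - remainder;
    }

    if (out.size == SKETCH_SIZ_FULL)        /* take the rest of the device */
        out.size = parent_size - out.offset;

    if (out.offset >= parent_size) {        /* out of reach: disabled */
        out.offset = 0;
        out.size = 0;
        return out;
    }

    if (out.offset + out.size > parent_size)    /* truncate to the device end */
        out.size = parent_size - out.offset;

    return out;
}
```

For example, on a 256 MiB device with 128 KiB erase blocks, a request of {SKETCH_OFS_NXTBLK, SKETCH_SIZ_FULL} with cur_offset 0x61000 resolves to offset 0x80000 and size 0xFF80000.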
3.2.2 mtd_partitions
The mtd_partitions list is defined in drivers/mtd/mtdpart.c:
static LIST_HEAD(mtd_partitions);
3.3 mtd_device_register
The macro mtd_device_register is defined in include/linux/mtd/mtd.h:
#define mtd_device_register(master, parts, nr_parts) \
    mtd_device_parse_register(master, NULL, NULL, parts, nr_parts)
The function mtd_device_parse_register is defined in drivers/mtd/mtdcore.c:
/**
 * mtd_device_parse_register - parse partitions and register an MTD device.
 *
 * @mtd: the MTD device to register
 * @types: the list of MTD partition probes to try, see
 *         'parse_mtd_partitions()' for more information
 * @parser_data: MTD partition parser-specific data
 * @parts: fallback partition information to register, if parsing fails;
 *         only valid if %nr_parts > %0
 * @nr_parts: the number of partitions in parts, if zero then the full
 *            MTD device is registered if no partition info is found
 *
 * This function aggregates MTD partitions parsing (done by
 * 'parse_mtd_partitions()') and MTD device and partitions registering. It
 * basically follows the most common pattern found in many MTD drivers:
 *
 * * If the MTD_PARTITIONED_MASTER option is set, then the device as a whole is
 *   registered first.
 * * Then It tries to probe partitions on MTD device @mtd using parsers
 *   specified in @types (if @types is %NULL, then the default list of parsers
 *   is used, see 'parse_mtd_partitions()' for more information). If none are
 *   found this functions tries to fallback to information specified in
 *   @parts/@nr_parts.
 * * If no partitions were found this function just registers the MTD device
 *   @mtd and exits.
 *
 * Returns zero in case of success and a negative error code in case of failure.
 */
int mtd_device_parse_register(struct mtd_info *mtd, const char * const *types,
                              struct mtd_part_parser_data *parser_data,
                              const struct mtd_partition *parts,  // partition table
                              int nr_parts)                       // number of partitions
{
    int ret;

    mtd_set_dev_defaults(mtd);

    /* register the whole Nand Flash as one MTD device as well */
    if (IS_ENABLED(CONFIG_MTD_PARTITIONED_MASTER)) {
        ret = add_mtd_device(mtd);              // register the MTD device
        if (ret)
            return ret;
    }

    /* Prefer parsed partitions over driver-provided fallback */
    ret = parse_mtd_partitions(mtd, types, parser_data);
    if (ret > 0)
        ret = 0;
    else if (nr_parts)                          // register the partitions supplied by the driver
        ret = add_mtd_partitions(mtd, parts, nr_parts);
    else if (!device_is_registered(&mtd->dev))
        ret = add_mtd_device(mtd);
    else
        ret = 0;

    if (ret)
        goto out;

    /*
     * FIXME: some drivers unfortunately call this function more than once.
     * So we have to check if we've already assigned the reboot notifier.
     *
     * Generally, we can make multiple calls work for most cases, but it
     * does cause problems with parse_mtd_partitions() above (e.g.,
     * cmdlineparts will register partitions more than once).
     */
    WARN_ONCE(mtd->_reboot && mtd->reboot_notifier.notifier_call,
              "MTD already registered\n");
    if (mtd->_reboot && !mtd->reboot_notifier.notifier_call) {
        mtd->reboot_notifier.notifier_call = mtd_reboot_notifier;
        register_reboot_notifier(&mtd->reboot_notifier);
    }

out:
    if (ret && device_is_registered(&mtd->dev))
        del_mtd_device(mtd);                    // unregister the MTD device

    return ret;
}
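4. mtdblock.c
We have already introduced mtdblock.c, the file that implements the MTD block device interfaces. Let us go straight to drivers/mtd/mtdblock.c and walk through the source.
4.1 Module entry function
We start from the entry function of the MTD block device module: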
static struct mtd_blktrans_ops mtdblock_tr = {  // describes the MTD block device and its operations
    .name = "mtdblock",
    .major = MTD_BLOCK_MAJOR,     // major number of the MTD block device: 31
    .part_bits = 0,               // minor-number bits reserved for disk partitions: 0 means no partitions, 1 means 2 minors, 2 means 4 minors, ...
    .blksize = 512,               // sector size
    .open = mtdblock_open,
    .flush = mtdblock_flush,
    .release = mtdblock_release,
    .readsect = mtdblock_readsect,
    .writesect = mtdblock_writesect,
    .add_mtd = mtdblock_add_mtd,
    .remove_dev = mtdblock_remove_dev,
    .owner = THIS_MODULE,
};

static int __init init_mtdblock(void)
{
    return register_mtd_blktrans(&mtdblock_tr);
}
4.2 register_mtd_blktrans
Next, look at register_mtd_blktrans, which lives in drivers/mtd/mtd_blkdevs.c:
int register_mtd_blktrans(struct mtd_blktrans_ops *tr)
{
    struct mtd_info *mtd;
    int ret;

    /* Register the notifier if/when the first device type is
       registered, to prevent the link/init ordering from fucking
       us over. */
    if (!blktrans_notifier.list.next)             // next is still NULL, so this branch is taken
        register_mtd_user(&blktrans_notifier);    // add blktrans_notifier to the mtd_notifiers list

    mutex_lock(&mtd_table_mutex);

    ret = register_blkdev(tr->major, tr->name);   // register the block device; the major is MTD_BLOCK_MAJOR, defined as 31
    if (ret < 0) {
        printk(KERN_WARNING "Unable to register %s block device on major %d: %d\n",
               tr->name, tr->major, ret);
        mutex_unlock(&mtd_table_mutex);
        return ret;
    }

    if (ret)
        tr->major = ret;

    tr->blkshift = ffs(tr->blksize) - 1;

    INIT_LIST_HEAD(&tr->devs);
    list_add(&tr->list, &blktrans_majors);        // add tr to the blktrans_majors list

    mtd_for_each_device(mtd)
        if (mtd->type != MTD_ABSENT)
            tr->add_mtd(tr, mtd);

    mutex_unlock(&mtd_table_mutex);
    return 0;
}
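This function mainly does the following:
- Calls register_mtd_user: adds blktrans_notifier to the mtd_notifiers list, then iterates over the global mtd_idr to obtain each mtd and runs blktrans_notify_add(mtd);
- Calls register_blkdev to register the block device, with major number 31 and name mtdblock;
- Adds mtdblock_tr to the blktrans_majors list, which is defined as static LIST_HEAD(blktrans_majors);
- Then iterates over the global mtd_idr again to obtain each mtd and runs mtdblock_add_mtd(&mtdblock_tr, mtd);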
4.2.1 mtd_notifier
mtd_notifier is defined in include/linux/mtd/mtd.h:
struct mtd_notifier {
    void (*add)(struct mtd_info *mtd);
    void (*remove)(struct mtd_info *mtd);
    struct list_head list;
};
4.2.2 blktrans_notifier
Here we take a closer look at register_mtd_user(&blktrans_notifier). The variable blktrans_notifier is defined in drivers/mtd/mtd_blkdevs.c:
static struct mtd_notifier blktrans_notifier = {
    .add = blktrans_notify_add,
    .remove = blktrans_notify_remove,
};
4.2.3 register_mtd_user
register_mtd_user adds new->list to the mtd_notifiers list:
/**
 *  register_mtd_user - register a 'user' of MTD devices.
 *  @new: pointer to notifier info structure
 *
 *  Registers a pair of callbacks function to be called upon addition
 *  or removal of MTD devices. Causes the 'add' callback to be immediately
 *  invoked for each MTD device currently present in the system.
 */
void register_mtd_user (struct mtd_notifier *new)
{
    struct mtd_info *mtd;

    mutex_lock(&mtd_table_mutex);          // take the mutex

    list_add(&new->list, &mtd_notifiers);  // add to the list

    __module_get(THIS_MODULE);

    mtd_for_each_device(mtd)               // iterate over mtd_idr to obtain each mtd
        new->add(mtd);                     // for blktrans_notifier this ends up in blktrans_notify_add(mtd)

    mutex_unlock(&mtd_table_mutex);        // release the mutex
}
4.2.4 mtd_for_each_device
The mtd_for_each_device macro is defined in drivers/mtd/mtdcore.h:
#define mtd_for_each_device(mtd)                \
    for ((mtd) = __mtd_next_device(0);          \
         (mtd) != NULL;                         \
         (mtd) = __mtd_next_device(mtd->index + 1))
__mtd_next_device is defined in drivers/mtd/mtdcore.c:
struct mtd_info *__mtd_next_device(int i)
{
    return idr_get_next(&mtd_idr, &i);
}
In effect this walks all the nodes of the mtd_idr radix tree and yields the mtd associated with each node.
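The iteration pattern can be imitated with a small sparse table. This is a minimal sketch under stated assumptions: a fixed array stands in for the IDR, and sketch_get_next only imitates the "first populated ID at or after the given one" behaviour that the macro relies on; all sketch_* and SKETCH_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_IDS 8

struct sketch_mtd_info { int index; };

/* Hypothetical stand-in for mtd_idr: slot i holds the object with ID i. */
static struct sketch_mtd_info *sketch_table[SKETCH_MAX_IDS];

/*
 * Return the first populated entry whose ID is >= *nextid and store that
 * ID back through nextid; return NULL when no such entry exists.
 */
static struct sketch_mtd_info *sketch_get_next(int *nextid)
{
    int id;

    assert(nextid != NULL);
    for (id = (*nextid < 0 ? 0 : *nextid); id < SKETCH_MAX_IDS; id++) {
        if (sketch_table[id] != NULL) {
            *nextid = id;
            return sketch_table[id];
        }
    }
    return NULL;
}

static struct sketch_mtd_info *sketch_next_device(int i)
{
    return sketch_get_next(&i);
}

/* Same shape as mtd_for_each_device(): restart the search at index + 1. */
#define sketch_for_each_device(mtd)                 \
    for ((mtd) = sketch_next_device(0);             \
         (mtd) != NULL;                             \
         (mtd) = sketch_next_device((mtd)->index + 1))

static int sketch_count_devices(void)
{
    struct sketch_mtd_info *mtd;
    int n = 0;

    sketch_for_each_device(mtd)
        n++;
    return n;
}
```

When the table is empty the first lookup already returns NULL, so the loop body never runs; gaps between IDs are skipped because each step searches from index + 1 onwards.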
4.2.5 blktrans_notify_add
Next we enter blktrans_notify_add(), the add callback of blktrans_notifier.
static void blktrans_notify_add(struct mtd_info *mtd)
{
    struct mtd_blktrans_ops *tr;

    if (mtd->type == MTD_ABSENT)
        return;

    list_for_each_entry(tr, &blktrans_majors, list)  // walk the blktrans_majors list
        tr->add_mtd(tr, mtd);                        // call add_mtd of the mtd_blktrans_ops structure
}
The entry function of the MTD block device driver adds mtdblock_tr to the blktrans_majors list, so the tr obtained when walking blktrans_majors here is mtdblock_tr, and the call becomes mtdblock_tr.add_mtd(&mtdblock_tr, mtd).
The add_mtd callback of mtdblock_tr is mtdblock_add_mtd.
4.2.6 mtdblock_add_mtd
static void mtdblock_add_mtd(struct mtd_blktrans_ops *tr, struct mtd_info *mtd)
{
    struct mtdblk_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL);

    if (!dev)
        return;

    dev->mbd.mtd = mtd;             // set the raw MTD device
    dev->mbd.devnum = mtd->index;   // used to derive the first minor number
    dev->mbd.size = mtd->size >> 9; // total number of sectors
    dev->mbd.tr = tr;

    if (!(mtd->flags & MTD_WRITEABLE))
        dev->mbd.readonly = 1;

    if (add_mtd_blktrans_dev(&dev->mbd))
        kfree(dev);
}
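mtdblock_add_mtd does the following:
- Allocates an mtdblk_dev structure, pointed to by the variable dev;
- Initializes the members of dev;
- Calls add_mtd_blktrans_dev(&dev->mbd);
The mtdblk_dev structure describes one MTD block device and contains the raw MTD device through its mbd member, a struct mtd_blktrans_dev. mtdblk_dev is defined in drivers/mtd/mtdblock.c; both structures are shown below: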
struct mtdblk_dev {
    struct mtd_blktrans_dev mbd;
    int count;
    struct mutex cache_mutex;
    unsigned char *cache_data;
    unsigned long cache_offset;
    unsigned int cache_size;
    enum { STATE_EMPTY, STATE_CLEAN, STATE_DIRTY } cache_state;
};

struct mtd_blktrans_dev {
    struct mtd_blktrans_ops *tr;    // MTD block device description and operations
    struct list_head list;
    struct mtd_info *mtd;           // raw MTD device
    struct mutex lock;
    /*
     * Used to compute the first minor number (devnum << tr->part_bits;
     * the shift is 0 for mtdblock). An MTD block device may carry several
     * partitions: with 2 partitions their minors would be first_minor+1
     * and first_minor+2, while first_minor itself denotes the whole disk.
     */
    int devnum;
    bool bg_stop;
    unsigned long size;             // number of sectors
    int readonly;
    int open;
    struct kref ref;
    struct gendisk *disk;           // disk device
    struct attribute_group *disk_attributes;
    struct request_queue *rq;       // request queue
    struct list_head rq_list;
    struct blk_mq_tag_set *tag_set; // tag set
    spinlock_t queue_lock;
    void *priv;
    fmode_t file_mode;
};
4.2.7 add_mtd_blktrans_dev
add_mtd_blktrans_dev is defined in drivers/mtd/mtd_blkdevs.c:
int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
{
    struct mtd_blktrans_ops *tr = new->tr;
    struct mtd_blktrans_dev *d;
    int last_devnum = -1;
    struct gendisk *gd;
    int ret;

    if (mutex_trylock(&mtd_table_mutex)) {
        mutex_unlock(&mtd_table_mutex);
        BUG();
    }

    mutex_lock(&blktrans_ref_mutex);
    list_for_each_entry(d, &tr->devs, list) {   // tr->devs is a list of mtd_blktrans_dev entries
        if (new->devnum == -1) {                // no devnum requested: pick the first free one, counting up from 0
            /* Use first free number */
            if (d->devnum != last_devnum+1) {
                /* Found a free devnum. Plug it in here */
                new->devnum = last_devnum+1;          // the new devnum
                list_add_tail(&new->list, &d->list);  // insert new just before d
                goto added;
            }
        } else if (d->devnum == new->devnum) {  // the requested devnum is already taken
            /* Required number taken */
            mutex_unlock(&blktrans_ref_mutex);
            return -EBUSY;
        } else if (d->devnum > new->devnum) {
            /* Required number was free */
            list_add_tail(&new->list, &d->list);
            goto added;
        }
        last_devnum = d->devnum;                // remember the devnum of the entry just visited
    }

    ret = -EBUSY;
    if (new->devnum == -1)
        new->devnum = last_devnum+1;

    /* Check that the device and any partitions will get valid
     * minor numbers and that the disk naming code below can cope
     * with this number. */
    if (new->devnum > (MINORMASK >> tr->part_bits) ||
        (tr->part_bits && new->devnum >= 27 * 26)) {
        mutex_unlock(&blktrans_ref_mutex);
        goto error1;
    }

    list_add_tail(&new->list, &tr->devs);
 added:
    mutex_unlock(&blktrans_ref_mutex);

    mutex_init(&new->lock);
    kref_init(&new->ref);
    if (!tr->writesect)
        new->readonly = 1;

    /* Create gendisk */
    ret = -ENOMEM;
    gd = alloc_disk(1 << tr->part_bits);        // allocate a gendisk with 1 << part_bits minors

    if (!gd)
        goto error2;

    new->disk = gd;
    gd->private_data = new;                             // private data
    gd->major = tr->major;                              // set the major number
    gd->first_minor = (new->devnum) << tr->part_bits;   // set the first minor number
    gd->fops = &mtd_block_ops;                          // set the block device operations

    if (tr->part_bits)                          // 0 for mtdblock, so the final else branch is taken
        if (new->devnum < 26)
            snprintf(gd->disk_name, sizeof(gd->disk_name),
                     "%s%c", tr->name, 'a' + new->devnum);
        else
            snprintf(gd->disk_name, sizeof(gd->disk_name),
                     "%s%c%c", tr->name,
                     'a' - 1 + new->devnum / 26,
                     'a' + new->devnum % 26);
    else                                        // set the disk name, i.e. /dev/mtdblock%d
        snprintf(gd->disk_name, sizeof(gd->disk_name),
                 "%s%d", tr->name, new->devnum);

    set_capacity(gd, ((u64)new->size * tr->blksize) >> 9);  // set the capacity, in sectors

    /* Create the request queue */
    spin_lock_init(&new->queue_lock);
    INIT_LIST_HEAD(&new->rq_list);

    new->tag_set = kzalloc(sizeof(*new->tag_set), GFP_KERNEL);
    if (!new->tag_set)
        goto error3;

    /* set up the request queue; mtd_mq_ops supplies the driver callbacks */
    new->rq = blk_mq_init_sq_queue(new->tag_set, &mtd_mq_ops, 2,
                                   BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING);
    if (IS_ERR(new->rq)) {
        ret = PTR_ERR(new->rq);
        new->rq = NULL;
        goto error4;
    }

    if (tr->flush)
        blk_queue_write_cache(new->rq, true, false);

    new->rq->queuedata = new;
    blk_queue_logical_block_size(new->rq, tr->blksize);

    blk_queue_flag_set(QUEUE_FLAG_NONROT, new->rq);
    blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, new->rq);

    if (tr->discard) {
        blk_queue_flag_set(QUEUE_FLAG_DISCARD, new->rq);
        blk_queue_max_discard_sectors(new->rq, UINT_MAX);
    }

    gd->queue = new->rq;                        // attach the request queue

    if (new->readonly)
        set_disk_ro(gd, 1);

    device_add_disk(&new->mtd->dev, gd, NULL);  // register the gendisk with the kernel

    if (new->disk_attributes) {
        ret = sysfs_create_group(&disk_to_dev(gd)->kobj,
                                 new->disk_attributes);
        WARN_ON(ret);
    }
    return 0;
error4:
    kfree(new->tag_set);
error3:
    put_disk(new->disk);
error2:
    list_del(&new->list);
error1:
    return ret;
}
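From this function we can see that no matter how many MTD block devices are registered, they all share major number 31 and differ only in their minor numbers. The major number identifies a particular driver, while the minor number identifies each device handled by that driver.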
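The numbering and naming rules in add_mtd_blktrans_dev can be tried out in user space. The following is a minimal sketch, not the kernel code: the sketch_* helpers are hypothetical names, the used-devnum list is modelled as a sorted array instead of the tr->devs list, and only the "first free number", "first minor" and "disk name" rules are reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * First free devnum, given the devnums already in use in ascending order
 * (the role played by walking tr->devs when new->devnum == -1).
 */
static int sketch_first_free_devnum(const int *used, size_t count)
{
    int last_devnum = -1;
    size_t i;

    for (i = 0; i < count; i++) {
        if (used[i] != last_devnum + 1)
            return last_devnum + 1;     /* found a gap */
        last_devnum = used[i];
    }
    return last_devnum + 1;             /* no gap: append at the end */
}

/* First minor number of the disk: devnum << part_bits. */
static int sketch_first_minor(int devnum, int part_bits)
{
    assert(devnum >= 0 && part_bits >= 0);
    return devnum << part_bits;
}

/* Disk name: a number without partitions, letters when part_bits != 0. */
static void sketch_disk_name(char *buf, size_t len, const char *tr_name,
                             int devnum, int part_bits)
{
    if (part_bits) {
        if (devnum < 26)
            snprintf(buf, len, "%s%c", tr_name, 'a' + devnum);
        else
            snprintf(buf, len, "%s%c%c", tr_name,
                     'a' - 1 + devnum / 26, 'a' + devnum % 26);
    } else {
        snprintf(buf, len, "%s%d", tr_name, devnum);
    }
}
```

For mtdblock, where part_bits is 0, devnum 3 gives first minor 3 and the name mtdblock3. A hypothetical translation layer with part_bits set to 1 would instead give first minor 6 for devnum 3 and letter-based names such as mtdblocka.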
4.2.8 mtd_block_ops
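Here we look at the MTD block device operations mtd_block_ops, defined in drivers/mtd/mtd_blkdevs.c.
static const struct block_device_operations mtd_block_ops = {
    .owner = THIS_MODULE,
    .open = blktrans_open,
    .release = blktrans_release,
    .ioctl = blktrans_ioctl,
    .getgeo = blktrans_getgeo,
};
The meaning of some of these function pointers:
- open: called when an MTD block device is opened;
- release: called when an MTD block device is closed;
- getgeo: retrieves the drive geometry; the result is filled into an hd_geometry structure;
- ioctl: called to perform special operations on the MTD block device;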
4.2.9 blktrans_open
static int blktrans_open(struct block_device *bdev, fmode_t mode)
{
    struct mtd_blktrans_dev *dev = blktrans_dev_get(bdev->bd_disk);
    int ret = 0;

    if (!dev)
        return -ERESTARTSYS; /* FIXME: busy loop! -arnd*/

    mutex_lock(&mtd_table_mutex);
    mutex_lock(&dev->lock);

    if (dev->open)
        goto unlock;

    kref_get(&dev->ref);
    __module_get(dev->tr->owner);

    if (!dev->mtd)
        goto unlock;

    if (dev->tr->open) {
        ret = dev->tr->open(dev);   // this calls the open function of mtd_blktrans_ops
        if (ret)
            goto error_put;
    }

    ret = __get_mtd_device(dev->mtd);
    if (ret)
        goto error_release;
    dev->file_mode = mode;

unlock:
    dev->open++;
    mutex_unlock(&dev->lock);
    mutex_unlock(&mtd_table_mutex);
    blktrans_dev_put(dev);
    return ret;

error_release:
    if (dev->tr->release)
        dev->tr->release(dev);
error_put:
    module_put(dev->tr->owner);
    kref_put(&dev->ref, blktrans_dev_release);
    mutex_unlock(&dev->lock);
    mutex_unlock(&mtd_table_mutex);
    blktrans_dev_put(dev);
    return ret;
}
4.2.10 blktrans_ioctl
static int blktrans_ioctl(struct block_device *bdev, fmode_t mode,
                          unsigned int cmd, unsigned long arg)
{
    struct mtd_blktrans_dev *dev = blktrans_dev_get(bdev->bd_disk);
    int ret = -ENXIO;

    if (!dev)
        return ret;

    mutex_lock(&dev->lock);

    if (!dev->mtd)
        goto unlock;

    switch (cmd) {
    case BLKFLSBUF:
        ret = dev->tr->flush ? dev->tr->flush(dev) : 0;
        break;
    default:
        ret = -ENOTTY;
    }
unlock:
    mutex_unlock(&dev->lock);
    blktrans_dev_put(dev);
    return ret;
}
4.2.11 mtd_mq_ops
Here we look at the blk-mq operations of the MTD block device driver, defined in drivers/mtd/mtd_blkdevs.c.
static const struct blk_mq_ops mtd_mq_ops = {
    .queue_rq = mtd_queue_rq,
};
As analysed in the previous section, queue_rq is called when a request is dispatched to the block device driver. In essence it carries out the data transfer between disk and memory, for example writing memory data to disk or reading data from disk into memory.
static blk_status_t mtd_queue_rq(struct blk_mq_hw_ctx *hctx,
                                 const struct blk_mq_queue_data *bd)
{
    struct mtd_blktrans_dev *dev;

    dev = hctx->queue->queuedata;
    if (!dev) {
        blk_mq_start_request(bd->rq);
        return BLK_STS_IOERR;
    }

    spin_lock_irq(&dev->queue_lock);
    list_add_tail(&bd->rq->queuelist, &dev->rq_list);
    /*
     * Not examined in detail here: read requests end up in
     * mtdblock_tr.readsect and write requests in mtdblock_tr.writesect.
     */
    mtd_blktrans_work(dev);
    spin_unlock_irq(&dev->queue_lock);

    return BLK_STS_OK;
}
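4.3 MTD block device flow chart
The execution flow of register_mtd_blktrans is shown in the figure:
In the entry function of the MTD block device module:
- blktrans_notifier is added to the mtd_notifiers list;
- In the first loop in the figure above, the mtd_idr tree is still empty (no MTD device has been registered yet), so the loop body is not executed;
- Next the block device major number is registered: the major is 31 and the block device name is mtdblock;
- Then comes the second loop below, which for the same reason is not entered either.
Later, in add_mtd_device(mtd):
- A node is allocated for the raw MTD device;
- The raw MTD device's erasesize_shift, writesize_shift, erasesize_mask, writesize_mask and related fields are set;
- The class of the struct device embedded in the raw MTD device is set to mtd_class, together with its device number, type, name and driver_data; device_register is called to register the MTD character device named mtd%d;
- device_create is called to create, initialize and register the MTD character device named mtd%dro;
- The mtd_notifiers list is walked; when blktrans_notifier is found, blktrans_notifier.add(mtd) is called:
- A gendisk structure is allocated and its members are set:
- private_data;
- the major number major (MTD_BLOCK_MAJOR, value 31);
- the first minor number first_minor (if several MTD devices are registered, this value increases one by one);
- the disk name disk_name, set to mtdblock%d; a file of that name is created under /dev;
- the block device operations fops;
- The request queue is initialized;
- Finally the gendisk is registered.
For example, after the development board boots and the Nand Flash driver is loaded, we can see the following: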
[root@zy:/]# ls /sys/class/mtd/ -l
total 0
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd0 -> ../../devices/virtual/mtd/mtd0
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd0ro -> ../../devices/virtual/mtd/mtd0ro
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd1 -> ../../devices/virtual/mtd/mtd1
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd1ro -> ../../devices/virtual/mtd/mtd1ro
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd2 -> ../../devices/virtual/mtd/mtd2
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd2ro -> ../../devices/virtual/mtd/mtd2ro
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd3 -> ../../devices/virtual/mtd/mtd3
lrwxrwxrwx 1 0 0 0 Jan 1 01:19 mtd3ro -> ../../devices/virtual/mtd/mtd3ro
[root@zy:/]# ls -l /dev/mtd*
crw-rw---- 1 0 0 90, 0 Jan 1 00:00 /dev/mtd0
crw-rw---- 1 0 0 90, 1 Jan 1 00:00 /dev/mtd0ro
crw-rw---- 1 0 0 90, 2 Jan 1 00:00 /dev/mtd1
crw-rw---- 1 0 0 90, 3 Jan 1 00:00 /dev/mtd1ro
crw-rw---- 1 0 0 90, 4 Jan 1 00:00 /dev/mtd2
crw-rw---- 1 0 0 90, 5 Jan 1 00:00 /dev/mtd2ro
crw-rw---- 1 0 0 90, 6 Jan 1 00:00 /dev/mtd3
crw-rw---- 1 0 0 90, 7 Jan 1 00:00 /dev/mtd3ro
brw-rw---- 1 0 0 31, 0 Jan 1 00:00 /dev/mtdblock0
brw-rw---- 1 0 0 31, 1 Jan 1 00:00 /dev/mtdblock1
brw-rw---- 1 0 0 31, 2 Jan 1 00:00 /dev/mtdblock2
brw-rw---- 1 0 0 31, 3 Jan 1 00:00 /dev/mtdblock3
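5. mtdchar.c
We have already introduced mtdchar.c, the file that implements the MTD character device interfaces. Let us go straight to drivers/mtd/mtdchar.c and walk through the source.
5.1 Module entry function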
static const struct file_operations mtd_fops = {  // character device operations
    .owner = THIS_MODULE,
    .llseek = mtdchar_lseek,
    .read = mtdchar_read,
    .write = mtdchar_write,
    .unlocked_ioctl = mtdchar_unlocked_ioctl,
#ifdef CONFIG_COMPAT
    .compat_ioctl = mtdchar_compat_ioctl,
#endif
    .open = mtdchar_open,
    .release = mtdchar_close,
    .mmap = mtdchar_mmap,
#ifndef CONFIG_MMU
    .get_unmapped_area = mtdchar_get_unmapped_area,
    .mmap_capabilities = mtdchar_mmap_capabilities,
#endif
};

int __init init_mtdchar(void)
{
    int ret;

    /* MTD character device major number is 90; MINORBITS = 20 */
    ret = __register_chrdev(MTD_CHAR_MAJOR, 0, 1 << MINORBITS,
                            "mtd", &mtd_fops);  // the device range is registered under the name "mtd"
    if (ret < 0) {
        pr_err("Can't allocate major number %d for MTD\n",
               MTD_CHAR_MAJOR);
        return ret;
    }

    return ret;
}
5.2 __register_chrdev
Next, look at __register_chrdev, which lives in fs/char_dev.c:
/**
 * __register_chrdev() - create and register a cdev occupying a range of minors
 * @major: major device number or 0 for dynamic allocation
 * @baseminor: first of the requested range of minor numbers
 * @count: the number of minor numbers required
 * @name: name of this range of devices
 * @fops: file operations associated with this devices
 *
 * If @major == 0 this functions will dynamically allocate a major and return
 * its number.
 *
 * If @major > 0 this function will attempt to reserve a device with the given
 * major number and will return zero on success.
 *
 * Returns a -ve errno on failure.
 *
 * The name of this device has nothing to do with the name of the device in
 * /dev. It only helps to keep track of the different owners of devices. If
 * your module name has only one type of devices it's ok to use e.g. the name
 * of the module here.
 */
int __register_chrdev(unsigned int major, unsigned int baseminor,
                      unsigned int count, const char *name,
                      const struct file_operations *fops)
{
    struct char_device_struct *cd;
    struct cdev *cdev;
    int err = -ENOMEM;

    cd = __register_chrdev_region(major, baseminor, count, name);  // statically register a range of character device numbers
    if (IS_ERR(cd))
        return PTR_ERR(cd);

    cdev = cdev_alloc();                // dynamically allocate the character device
    if (!cdev)
        goto out2;

    cdev->owner = fops->owner;          // initialize the character device
    cdev->ops = fops;
    kobject_set_name(&cdev->kobj, "%s", name);

    err = cdev_add(cdev, MKDEV(cd->major, baseminor), count);  // register the character device with the system
    if (err)
        goto out;

    cd->cdev = cdev;

    return major ? 0 : cd->major;
out:
    kobject_put(&cdev->kobj);
out2:
    kfree(__unregister_chrdev_region(cd->major, baseminor, count));
    return err;
}
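So the module entry function mainly does the following:
- Reserves the character device numbers: major number 90, with 1<<20 minor numbers;
- Dynamically allocates the character device;
- Registers the character device;
However, neither the class nor the entries under the class are created here; that part is done around add_mtd_device:
- The mtd class under /sys/class comes from the statically defined mtd_class, and add_mtd_device calls device_register and device_create to create the device entries under that class, which the mdev program scans to generate the nodes under /dev;
References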
[2]痞子衡嵌入式:并行NAND接口标准(ONFI)及SLC Raw NAND简介