About MTD (Memory Technology Device): Frequently Asked Questions (FAQ)

Is an MTD device a block device or a char device?

First off, an MTD is a "Memory Technology Device", so it's just "MTD". An "MTD device" is a pleonasm.

Unix traditionally only knew block devices and character devices. Character devices were things like keyboards or mice, which you could read current data from, but which could not be seeked and had no size. Block devices had a fixed size and could be seeked. They also happened to be organized in blocks of multiple bytes, usually 512.

Flash doesn't match the description of either block or character devices. It behaves similarly to a block device, but has differences. For example, block devices don't distinguish between write and erase operations. Therefore, a special device type to match flash characteristics was created: MTD.

So MTD is neither a block nor a char device. There are translations to use them, as if they were. But those translations are nowhere near the original, just like translated Chinese poems.

What are the differences between flash devices and block drives?

The following table describes the differences between block devices and raw flashes. Note, SSD, MMC, eMMC, RS-MMC, SD, mini-SD, micro-SD, USB flash drive, CompactFlash, MemoryStick, MemoryStick Micro, and other FTL devices are block devices, not raw flash devices. Of course, hard drives are also block devices.

| Block device | MTD device |
|---|---|
| Consists of sectors | Consists of eraseblocks |
| Sectors are small (512, 1024 bytes) | Eraseblocks are larger (typically 128KiB) |
| Maintains 2 main operations: read sector and write sector | Maintains 3 main operations: read from eraseblock, write to eraseblock, and erase eraseblock |
| Bad sectors are re-mapped and hidden by hardware (at least in modern LBA hard drives); in case of FTL devices it is the responsibility of FTL to provide this | Bad eraseblocks are not hidden and should be dealt with in software |
| Sectors are devoid of the wear-out property (in FTL devices it is the responsibility of FTL to provide this) | Eraseblocks wear out and become bad and unusable after about 10^3 (for MLC NAND) to 10^5 (NOR, SLC NAND) erase cycles |

So, as one can see, flashes (MTD devices) are somewhat more difficult to work with.

Can I mount ext2 over an MTD device?

Ext2, ext3, XFS, JFS, FAT and other "conventional" file systems work with block devices. They are designed this way. Flashes are not block devices; they are very different beasts. Please read the two FAQ entries above.

Please do not be confused by USB sticks, MMC, SD, CompactFlash and other popular removable devices. Although they are also called "flash", they are not MTD devices. They are out of the MTD subsystem's scope. Please read the FAQ entry on the differences between flash devices and block drives.

In order to use one of the conventional file systems over an MTD device, you need a software layer which emulates a block device over the MTD device. These layers are often called Flash Translation Layers (FTLs).

There is an extremely simple FTL layer in the Linux MTD subsystem: mtdblock. It emulates block devices over MTD devices. There is also an mtdblock_ro module which emulates read-only block devices. When you load this module, it creates a block device for each MTD device in the system. The block devices are then accessible via /dev/mtdblockX device nodes.

But in many cases using mtdblock is a very bad idea. If you change any sector of your mtdblockX device, it basically reads the whole corresponding eraseblock into memory, erases the eraseblock, changes the sector in RAM, and writes the whole eraseblock back. This is very straightforward. If you have a power failure while the eraseblock is being erased, you lose all the block device sectors in it. The flash will also likely decay soon because you will wear a few eraseblocks out, most probably the ones which contain the FAT, bitmap, inode table, etc.
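The read-modify-write cycle described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the mtdblock code: the class name, the 128 KiB eraseblock size and the in-memory "flash" are assumptions made for the example, and it ignores any caching the real driver may do.

```python
SECTOR = 512
ERASEBLOCK = 128 * 1024  # assumed eraseblock size for this sketch


class NaiveBlockEmulation:
    """Toy model of erase-on-every-write block emulation (not real mtdblock code)."""

    def __init__(self, n_eraseblocks):
        # Erased flash reads back as all 0xFF bytes.
        self.flash = [bytearray(b"\xff" * ERASEBLOCK) for _ in range(n_eraseblocks)]
        self.erase_counts = [0] * n_eraseblocks

    def write_sector(self, sector_no, data):
        assert len(data) == SECTOR
        eb, offset = divmod(sector_no * SECTOR, ERASEBLOCK)
        cached = bytearray(self.flash[eb])                # 1. read whole eraseblock into RAM
        self.flash[eb] = bytearray(b"\xff" * ERASEBLOCK)  # 2. erase it (data is at risk here)
        self.erase_counts[eb] += 1
        cached[offset:offset + SECTOR] = data             # 3. change the sector in RAM
        self.flash[eb] = cached                           # 4. write whole eraseblock back

    def read_sector(self, sector_no):
        eb, offset = divmod(sector_no * SECTOR, ERASEBLOCK)
        return bytes(self.flash[eb][offset:offset + SECTOR])


dev = NaiveBlockEmulation(n_eraseblocks=4)
# Update the same "hot" sector (think FAT or inode table) 1000 times.
for i in range(1000):
    dev.write_sector(0, bytes([i % 256]) * SECTOR)
print(dev.erase_counts)
```

In this model every erase lands on the first eraseblock while the others stay untouched, which is the uneven wear the entry warns about.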

Unfortunately, it is a rather difficult task to create a good FTL layer, and nobody has yet managed to implement one for Linux. But now that we have UBI, it is much easier to do it on top of UBI.

It makes sense to use mtdblock_ro for read-only file systems or read-only mounts. For example, one may use SquashFS as it compresses data quite well. But think twice before using mtdblock in read-write mode. And don't try to use it on NAND flash as it does not handle bad eraseblocks.

What are the point() and unpoint() functions used for?

Mainly for NOR flash. As long as the flash is only read, it behaves just like normal memory. The read() function for NOR chips is essentially a memcpy(). For some purposes the extra memcpy() is a waste of time, so things can be optimized.

So the point() function does just that, it returns a pointer to the raw flash, so callers can operate directly on the flash.

But of course, things are a little more complicated than that. NOR flash chips can be in several different modes and only when in read mode will the above work. Therefore point() also locks the flash chip in addition to returning a pointer. And while locked, writes to the same flash chips have to wait. So callers have to call unpoint() soon after to release the chip again.

How is the flash partition layout specified?

Most programmers are used to dealing with hard disks. The hard disk partition table resides in a reserved sector on the hard disk. In an IBM-PC, all system software reads the partition layout from the partition table on the disk.

If a flash device emulates a hard disk, the partition table method may still be used. For example, CompactFlash flash modules look like an IDE hard drive to the Linux kernel.

In most embedded systems, the flash device is used directly with no hard disk emulation. The MTD mapping driver provides accessor functions to read and write flash memory. The mapping driver can either specify a hard-coded partition layout, read the partition layout from the kernel command line passed in from the boot loader (e.g. U-Boot), or read the partition layout from flash storage (e.g. the RedBoot boot loader).

Even without partition support, the MTD layer provides access to the entire flash chip as an MTD device. With partition support, each MTD partition will be exported as a separate MTD device. Each device has a descriptive name which can be viewed using the following command:

cat /proc/mtd
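
For scripts that need the partition names and sizes, the output of this command is easy to parse. The sketch below assumes the usual layout of /proc/mtd (a header line, then one line per device with hexadecimal size and erase size and a quoted name); the sample text and the helper name parse_proc_mtd are made up for illustration.

```python
import re

# Hypothetical sample of /proc/mtd output; real names and sizes depend on the board.
SAMPLE = '''dev:    size   erasesize  name
mtd0: 00040000 00020000 "bootloader"
mtd1: 00200000 00020000 "kernel"
mtd2: 00dc0000 00020000 "rootfs"
'''

LINE = re.compile(r'^(mtd\d+):\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+"(.*)"$')


def parse_proc_mtd(text):
    """Return a list of (device, size_bytes, erasesize_bytes, name) tuples."""
    parts = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:  # the header line does not match and is skipped
            dev, size, erasesize, name = m.groups()
            parts.append((dev, int(size, 16), int(erasesize, 16), name))
    return parts


for dev, size, erasesize, name in parse_proc_mtd(SAMPLE):
    print(f"{dev}: {name!r} {size // 1024} KiB, eraseblock {erasesize // 1024} KiB")
```

On a real system the text would come from reading /proc/mtd instead of the sample string.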

What filesystem types are suitable for flash?

For a conventional NOR flash, the MTD block device provides a crude block device similar to a hard disk. Traditional hard drives use a 512-byte sector. The MTD block device emulates this sector layout, but there is a severe performance penalty for writes. Since most flash device sectors are >= 64 KiB in size, updating a 512-byte sector requires a read-modify-write sequence for the entire flash sector! This kind of write is slow and causes many extra erase cycles on the flash. A flash sector is typically rated for 100K to 1 million erase cycles over the device lifetime, so it is wise to limit erase cycles.

Because of this write performance issue, the MTD block device is suitable for read-only filesystems. Some typical read-only filesystems for embedded use are CRAMFS and ROMFS.

CRAMFS has the advantage of compressing each 4Kbyte cluster, providing 2:1 compression. Read-write capability is possible using flash-oriented filesystems such as JFFS2.

JFFS2 is a journaling flash filesystem (hence the name) – the ‘2’ distinguishes JFFS2 from the JFFS filesystem, a largely defunct predecessor. JFFS2 bypasses the block device layer (with its associated buffer cache) and writes directly to the underlying flash device. Naturally, JFFS2 supports both read-write and read-only modes of operation.

For a NAND flash, the filesystem MUST be NAND-aware because both reads and writes must implement ECC error detection/correction. The JFFS2 and YAFFS filesystems are NAND-compatible.

What is the purpose of the MTD character device?

The MTD devices come in two flavors: MTD block device drivers, and MTD character device drivers. The block devices provide a 512-byte-per-sector layout, for use by the filesystems (see "What filesystem types are suitable for flash?"). The character device provides a linear view of an MTD device or an MTD partition. You can read this device as you would any file. Standard UNIX utilities may be used to read the flash. Assuming MTD device 0 is the entire flash, the following command will dump the entire flash image to a file:

cat /dev/mtdchar0 > /tmp/flash.bin

Writing the flash is different. What happens if you run the following commands on a flash partition that already contains valid data?

cat new.bin > /dev/mtdchar0
cmp /dev/mtdchar0 new.bin
/dev/mtdchar0 new.bin differ: char n, line x

The MTD character device will write the data to the flash, but it will not perform a flash erase command. On a NOR flash device, the write command can only change 1 bits into 0 bits. To change a bit from 0 to 1 requires an erase command. The MTD character device provides ioctls to facilitate erasing. The flash sector geometry may be determined approximately using the MEMGETINFO command: it returns the ‘least common denominator’ erase size (usually 64KiB or 128KiB), ignoring the smaller boot blocks if present. The exact flash layout may be determined using the MEMGETREGIONCOUNT and MEMGETREGIONINFO commands. Once the flash sector geometry is determined, the MEMERASE command may be issued to erase the desired blocks.
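
The reason cmp reports a difference can be seen in a tiny model of NOR programming. This is a sketch of the bit-level behaviour only (no real device is touched), and the function names are invented for the example: writing without erasing ends up storing the bitwise AND of the old and new contents.

```python
def nor_program(old: bytes, new: bytes) -> bytes:
    """Model a NOR write without erase: bits can only go from 1 to 0."""
    return bytes(o & n for o, n in zip(old, new))


def nor_erase(length: int) -> bytes:
    """Model an erase: every bit in the block is set back to 1."""
    return b"\xff" * length


old = bytes([0b11110000, 0x00, 0xFF])   # data already in the flash
new = bytes([0b10101010, 0xFF, 0x5A])   # image we want to store

# Writing over old data leaves a mix of both images.
print(nor_program(old, new).hex())
# Writing after an erase stores the new image exactly.
print(nor_program(nor_erase(len(new)), new) == new)
```

Only bytes that were still 0xFF before the write come out as intended, which is why the erase step has to happen first.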

MTD provides user-space applications to automate the erasing process. The following commands will correctly write the new image to flash. We assume that the flash does not support locking, or that the sectors are already unlocked; otherwise the flash_unlock utility could be used to unlock the appropriate sectors.

flash_eraseall /dev/mtdchar0
cat new.bin > /dev/mtdchar0

Table 1: MTD IOCTL commands

| Name | Description | Argument |
|---|---|---|
| MEMGETINFO | Get layout and capabilities | struct mtd_info_user * |
| MEMERASE | Erase flash blocks | struct erase_info_user * |
| MEMLOCK | Lock flash blocks to disallow changes | struct erase_info_user * |
| MEMUNLOCK | Unlock flash to allow changes | struct erase_info_user * |
| MEMGETREGIONCOUNT | Return number of erase block regions | int * |
| MEMGETREGIONINFO | | struct region_info_user * |
| MEMWRITEOOB | NAND only: write out-of-band info (ECC) | struct mtd_oob_buf * |
| MEMREADOOB | NAND only: read out-of-band info (ECC) | struct mtd_oob_buf * |
| MEMSETOOBSEL | NAND only: set default OOB info | struct nand_oobinfo * |
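
As a rough illustration of how a user-space program might issue MEMGETINFO, here is a Python sketch. It rests on two assumptions that should be checked against your kernel headers: the field layout of struct mtd_info_user as found in the Linux mtd-abi.h header, and the generic ioctl number encoding used on x86 and ARM (some architectures encode the direction bits differently). The helper names are invented for the example, and the demonstration at the end decodes a hand-built buffer so that no flash device is needed.

```python
import struct

# Assumed layout of struct mtd_info_user (Linux <mtd/mtd-abi.h>):
# __u8 type; __u32 flags, size, erasesize, writesize, oobsize; __u64 padding
MTD_INFO_USER = struct.Struct("=B3x5IQ")

# Assumed generic Linux ioctl encoding (x86, ARM); some architectures differ.
_IOC_READ = 2


def _ior(magic: str, nr: int, size: int) -> int:
    """Build the number of a 'read' ioctl from its magic, index and argument size."""
    return (_IOC_READ << 30) | (size << 16) | (ord(magic) << 8) | nr


MEMGETINFO = _ior("M", 1, MTD_INFO_USER.size)


def decode_mtd_info(raw: bytes) -> dict:
    """Unpack the buffer that a MEMGETINFO ioctl would fill in."""
    mtd_type, flags, size, erasesize, writesize, oobsize, _pad = MTD_INFO_USER.unpack(raw)
    return {"type": mtd_type, "flags": flags, "size": size,
            "erasesize": erasesize, "writesize": writesize, "oobsize": oobsize}


def get_mtd_info(path: str) -> dict:
    """Query a real MTD character device (needs Linux and read permission)."""
    import fcntl
    buf = bytearray(MTD_INFO_USER.size)
    with open(path, "rb") as dev:
        fcntl.ioctl(dev, MEMGETINFO, buf)  # the kernel fills buf in place
    return decode_mtd_info(bytes(buf))


# Offline demonstration with a hand-built buffer (values are made up).
fake = MTD_INFO_USER.pack(3, 0xC00, 16 * 1024 * 1024, 128 * 1024, 1, 0, 0)
print(hex(MEMGETINFO), decode_mtd_info(fake)["erasesize"])
```

On a target board, get_mtd_info("/dev/mtdchar0") would return the same dictionary filled in by the driver; the erasesize field is the value needed before issuing MEMERASE.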

How do I access the flash from user-space applications?

If the flash is mounted as a filesystem, the normal open/close/read/write system calls will work (obviously write() will not function on a read-only filesystem).

Otherwise, the flash may be accessed using the MTD character device (see "What is the purpose of the MTD character device?").

How do I upgrade the boot loader or kernel or root filesystem?

Each component (boot loader, kernel, root filesystem) usually has its own MTD device partition, which can be accessed by the MTD character device. Usually the kernel is executing instructions from RAM, although some handheld computers do execute from flash, a.k.a. XIP (execute-in-place). When the kernel is executing from RAM, the kernel flash partition may be updated freely. The root filesystem is a special case: if files are open in the root filesystem (e.g. executables) during the update, confusion will result. Even without open files, a root JFFS2 filesystem would get its internal data structures out of sync with the flash contents.

Upgrading the root filesystem usually is done on a file-by-file basis. Sometimes it is convenient to package the upgrade as a .tar or .tar.gz archive.

How much redundancy or error recovery do I need for my upgrade procedure?

Most redundancy schemes require some support from your boot-loader. At a minimum, you should store the image in RAM or a ramdisk and verify the image before writing it into flash. The amount of redundancy needed depends on your application reliability and cost requirements. Many inexpensive Linux devices, such as the Linksys WRT54G, do not have redundant images due to cost concerns.
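
The minimum step mentioned above, verifying the image in RAM before touching the flash, might look like the following sketch. The choice of SHA-256, the size limit and the function name are assumptions made for the example; the FAQ does not prescribe a particular check.

```python
import hashlib


def image_is_valid(image: bytes, expected_sha256: str, max_size: int) -> bool:
    """Check an upgrade image held in RAM before any flash erase is started."""
    if not image or len(image) > max_size:
        return False  # empty, or would not fit in the target partition
    return hashlib.sha256(image).hexdigest() == expected_sha256.lower()


# Hypothetical usage: the digest would normally ship alongside the image.
image = b"example firmware payload"
digest = hashlib.sha256(image).hexdigest()
print(image_is_valid(image, digest, max_size=2 * 1024 * 1024))
print(image_is_valid(image[:-1], digest, max_size=2 * 1024 * 1024))
```

Only when the check passes should the erase and write commands be run, because after the erase there is no old image left to fall back on unless the boot loader keeps a second copy.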

How should I create an empty JFFS2 partition?

As noted in the JFFS2 FAQ, the JFFS2 filesystem marks erased blocks with ‘cleanmarkers’. The cleanmarker was introduced to address the scenario where the device powers down during a flash block erase. If the cleanmarker or another node type is not present in the block, JFFS2 will redo the erase operation and write the cleanmarker at the beginning of the block.

The ‘-j’ option to the flash_eraseall command inserts the cleanmarker at the beginning of each block, so that JFFS2 won’t redo the erase operation.

How much overhead space does JFFS2 use for its own book-keeping?

JFFS2 requires five spare erase blocks to implement garbage collection. On a two bit-per-cell device such as Intel StrataFlash or Spansion MirrorBit, the erase block size is 128KiB, so the wasted space is more than half a megabyte.

The spare erase block requirements are defined in fs/jffs2/nodelist.h. The JFFS2_RESERVED_BLOCK_BASE parameter is 3 by default. If you change this value to 1, you’ll save two erase blocks. If you change this value, you should do some stress testing to verify nothing was broken; the default has been left at 3 to maximize reliability.
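
The arithmetic behind these numbers is simple enough to check in a few lines. The sketch below uses only the figures quoted in this entry (five spare blocks by default, 128KiB erase blocks, two blocks saved after the change); the function name is made up.

```python
KIB = 1024
ERASE_BLOCK = 128 * KIB          # erase block size quoted above
DEFAULT_SPARE_BLOCKS = 5         # spare blocks with the default setting


def reserved_bytes(spare_blocks: int, erase_block: int = ERASE_BLOCK) -> int:
    """Space set aside for garbage collection, in bytes."""
    return spare_blocks * erase_block


default = reserved_bytes(DEFAULT_SPARE_BLOCKS)
reduced = reserved_bytes(DEFAULT_SPARE_BLOCKS - 2)  # two blocks saved, per the text
print(default // KIB, "KiB by default;", reduced // KIB, "KiB after the change")
```

On a small flash part the difference can matter, but it has to be weighed against the reliability concern mentioned above.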

Should I provide any options when I mount a JFFS2 filesystem?

A useful mount option for a read-write JFFS2 filesystem is ‘noatime’. The ‘noatime’ option turns off the updating of file access times, which would cause a flash write every time a file is read. If the filesystem is the root filesystem, the option can be supplied one of two ways:

1) Pass the following parameter to the kernel command line:
rootflags=noatime

2) Remount the root filesystem with the noatime option:
mount -t jffs2 -o remount,noatime /dev/mtdblock3 /

Is flash chip X supported?

Most NOR flash chips are supported. In the old JEDEC drivers, you had to add an entry for each new flash to specify the sector layout and programming algorithms. The entry was indexed by the Manufacturer and Device ID numbers.

The MTD CFI driver uses the Common Flash Interface (CFI). The following description of CFI is excerpted from the latest CFI 2.0 standard:

The Common Flash Interface (CFI) specification outlines device and host system software interrogation handshake that allows specific vendor-specified software algorithms to be used for entire families of devices. This allows device-independent, JEDEC ID-independent and forward- and backward-compatible software support for the specified flash device families. It allows flash vendors to standardize their existing interfaces for long-term compatibility.

The MTD CFI support probes the hardware for the CFI data. The CFI data includes the chip ID, command set ID, flash geometry and supported command types. The MTD CFI code supports three command sets: Intel (0001), AMD (0002) and ST Advanced Architecture (0020). You can even compile in support for all of these command sets in the same kernel. Once the command set support is present, you can use any CFI-compliant chip, assuming your low-level chip select timings and address range are compatible with the new flash device.

How do I add support for my platform?

In the kernel source, the drivers/mtd/maps directory contains the mapping drivers. You may be able to use the generic physmap.c driver. Specify the base address, chip size and bus width in the kernel configuration, and the physmap.c driver will probe the flash type. Generic memory accesses are used to read the flash. The physmap.c driver can even handle several flash chips in a contiguous memory area.

Some flashes do not use straightforward memory mappings, due to external bus addressing limitations. Or you may have more than one flash with non-adjacent memory mappings. In this case, you should write your own mapping driver. You can use physmap.c as a reference.

Posted 2011-04-26 17:15 by leao