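On IA-32 systems, the first 1 GB of physical memory is called "low memory", and the part above 1 GB is called "high memory". Earlier Linux kernel versions required that data buffers bound for storage devices reside in the low-memory region of physical RAM, even though applications could use both high and low memory. I/O requests whose data buffers are in low memory can therefore use direct memory access (DMA) directly. However, when an application issues an I/O request whose data buffer lies in high memory, the kernel is forced to allocate a temporary buffer in low memory and copy the application's data from high memory into it; this temporary buffer acts as an intermediate stop. For example, when an older device can address only the first 16 MB of memory but the DMA target address lies above 16 MB, a buffer must be set up within the 16 MB that the device can reach to serve as the intermediate stop. This extra data copy is called "bounce buffering". It can noticeably reduce the performance of I/O-intensive database applications, because the many bounce buffers that are allocated consume a lot of memory, and copying through them adds load to the system memory bus.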
http://www.linuxdoc.org/HOWTO/IO-Perf-HOWTO/overview.html
3. Avoiding Bounce Buffers
This section provides information on applying and using the bounce buffer patch on the Linux 2.4 kernel. The bounce buffer patch, written by Jens Axboe, enables device drivers that support direct memory access (DMA) I/O to high-address physical memory to avoid bounce buffers.
This document provides a brief overview of memory and addressing in the Linux kernel, followed by information on why and how to make use of the bounce buffer patch.
3.1. Memory and Addressing in the Linux 2.4 Kernel
The Linux 2.4 kernel includes configuration options for specifying the amount of physical memory in the target computer. By default, the configuration is limited to the amount of memory that can be directly mapped into the kernel's virtual address space starting at PAGE_OFFSET. On i386 systems the default mapping scheme limits kernel-mode addressability to the first gigabyte (GB) of physical memory, also known as low memory. Conversely, high memory is normally the memory above 1 GB. High memory is not directly accessible or permanently mapped by the kernel. Support for high memory is an option that is enabled during configuration of the Linux kernel.
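The split between low and high memory can be modeled in a few lines of user-space C. This is only a sketch, not kernel code: the constant names are made up for illustration, PAGE_OFFSET is assumed to be 0xC0000000 (a common i386 default), and the low-memory boundary is taken as the 1 GB figure given above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; a real kernel derives them from its configuration. */
#define PAGE_OFFSET_SKETCH  0xC0000000ULL  /* assumed i386 default for PAGE_OFFSET */
#define LOWMEM_LIMIT_SKETCH 0x40000000ULL  /* 1 GB boundary, as described in the text */

/* True if the physical address lies in the permanently mapped low-memory region. */
static bool is_lowmem(uint64_t phys)
{
    return phys < LOWMEM_LIMIT_SKETCH;
}

/* Kernel virtual address for a low-memory physical address.
   Returns 0 for high memory, which has no permanent mapping in this model. */
static uint64_t lowmem_phys_to_virt(uint64_t phys)
{
    return is_lowmem(phys) ? phys + PAGE_OFFSET_SKETCH : 0;
}
```

In this model a physical address of 1 MB maps to 0xC0100000, while an address at 2 GB has no fixed kernel virtual address and must be mapped temporarily or, for I/O, bounced.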
3.2. The Problem with Bounce Buffers
When DMA I/O is performed to or from high memory, an area is allocated in low memory known as a bounce buffer. When data travels between a device and high memory, it is first copied through the bounce buffer.
Systems with a large amount of high memory and intense I/O activity can create a large number of bounce buffers that can cause memory shortage problems. In addition, the excessive number of bounce buffer data copies can lead to performance degradation.
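The cost of the extra copy can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel's implementation: the "device" is just another memory buffer, the bounce buffer is a single 4 KB page, and all names (bounced_write, device_write, bounce_bytes_copied) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define BOUNCE_SIZE 4096  /* hypothetical bounce buffer size: one 4 KB page */

/* Running total of bytes copied through the bounce buffer. */
static size_t bounce_bytes_copied;

/* Stand-in for a device that can only read from the low-memory buffer. */
static void device_write(char *device, const char *low_buf, size_t len)
{
    memcpy(device, low_buf, len);
}

/* Write len bytes from a "high memory" buffer to the device, staging each
   chunk in the low-memory bounce buffer first. Returns the number of chunks. */
static size_t bounced_write(char *device, const char *high_buf, size_t len)
{
    static char bounce[BOUNCE_SIZE];
    size_t chunks = 0;

    for (size_t done = 0; done < len; done += BOUNCE_SIZE) {
        size_t n = len - done < BOUNCE_SIZE ? len - done : BOUNCE_SIZE;
        memcpy(bounce, high_buf + done, n);      /* the extra copy */
        bounce_bytes_copied += n;
        device_write(device + done, bounce, n);  /* stands in for the DMA transfer */
        chunks++;
    }
    return chunks;
}
```

Every byte of the request passes through memory twice in this model, which is the overhead that direct DMA to high memory avoids.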
Peripheral Component Interconnect (PCI) devices normally address up to 4 GB of physical memory. When a bounce buffer is used for high memory that is below 4 GB, time and memory are wasted because the peripheral has the ability to address that memory directly. Using the bounce buffer patch can decrease, and possibly eliminate, the use of bounce buffers.
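The difference between the two policies can be written as two predicates. This is a simplified user-space model, assuming the 1 GB low-memory boundary from the text and a 32-bit PCI device; the function and constant names are hypothetical, and len is assumed to be greater than zero.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOWMEM_LIMIT   0x40000000ULL  /* 1 GB low-memory boundary from the text */
#define PCI_32BIT_MASK 0xFFFFFFFFULL  /* highest address a 32-bit PCI device can reach */

/* Behaviour described for the unpatched kernel:
   any buffer that reaches into high memory is bounced. */
static bool bounces_without_patch(uint64_t phys, uint64_t len)
{
    return phys + len > LOWMEM_LIMIT;
}

/* With the device's reach taken into account:
   bounce only if the last byte of the buffer lies beyond the mask. */
static bool bounces_with_mask(uint64_t phys, uint64_t len, uint64_t dma_mask)
{
    return phys + len - 1 > dma_mask;
}
```

A 4 KB buffer at physical address 2 GB is bounced under the first rule but not under the second; a buffer at or above 4 GB is bounced under both.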
3.3. Locating the Patch
The latest version of the bounce buffer patch is block-highmem-all-18b.bz2, and it is available from Andrea Arcangeli's -aa series kernels at http://kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/.
3.3.1. Configuring the Linux Kernel to Avoid Bounce Buffers
This section includes information on configuring the Linux kernel to avoid bounce buffers. The Linux Kernel-HOWTO at http://www.linuxdoc.org/HOWTO/Kernel-HOWTO.html explains the process of re-compiling the Linux kernel.
The following kernel configuration options are required to enable the bounce buffer patch:
Development Code - To enable the configurator to display the High I/O Support option, select the Code maturity level options category and specify "y" to Prompt for development and/or incomplete code/drivers.
High Memory Support - To enable support for physical memory that is greater than 1 GB, select the Processor type and features category, and select a value from the High Memory Support option.
High Memory I/O Support - To enable DMA I/O to physical addresses greater than 1 GB, select the Processor type and features category, and enter "y" to the HIGHMEM I/O support option. This configuration option is a new option introduced by the bounce buffer patch.
3.3.2. Enabled Device Drivers
The bounce buffer patch provides the kernel infrastructure, as well as the SCSI and IDE mid-level driver modifications to support DMA I/O to high memory. Updates for several device drivers to make use of the added support are also included with the patch.
If the bounce buffer patch is applied and you configure the kernel to support high memory I/O, many IDE configurations and the device drivers listed below perform DMA I/O without the use of bounce buffers:
aic7xxx_drv.o
aic7xxx_old.o
cciss.o
cpqarray.o
megaraid.o
qlogicfc.o
sym53c8xx.o
3.4. Modifying Your Device Driver to Avoid Bounce Buffers
If your device drivers are not listed above in the Enabled Device Drivers section, and the device is capable of high-memory DMA I/O, you can modify your device driver to make use of the bounce buffer patch as follows. More information on rebuilding a Linux device driver is available at http://www.xml.com/ldd/chapter/book/index.html.
1. Enable high-memory I/O in the adapter structure:
A.) For SCSI Adapter Drivers: set the highmem_io bit in the Scsi_Host_Template structure.
B.) For IDE Adapter Drivers: set the highmem bit in the ide_hwif_t structure.
2. Call pci_set_dma_mask(struct pci_dev *pdev, dma_addr_t mask) to specify the address bits that the device can successfully use on DMA operations.
If DMA I/O can be supported with the specified mask, pci_set_dma_mask() will set pdev->dma_mask and return 0. For SCSI or IDE, the mask value will also be passed by the mid-level drivers to blk_queue_bounce_limit(request_queue_t *q, u64 dma_addr) so that bounce buffers are not created for memory directly addressable by the device. Drivers other than SCSI or IDE must call blk_queue_bounce_limit() directly.
3. Use pci_map_page(dev, page, offset, size, direction) instead of pci_map_single(dev, address, size, direction) to map a memory region so that it is accessible by the peripheral device. pci_map_page() supports both high and low memory.
The address parameter for pci_map_single() correlates to the page and offset parameters for pci_map_page(). Use the virt_to_page() macro to convert an address to a page and offset. The virt_to_page() macro is defined by including pci.h. For example:
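void *address;          /* kernel virtual address of a low-memory buffer */
struct page *page;      /* page structure that contains the buffer */
unsigned long offset;   /* byte offset of the buffer within that page */
page = virt_to_page(address);
offset = (unsigned long) address & ~PAGE_MASK;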
Call pci_unmap_page() after the DMA I/O transfer is complete to remove the mapping established by pci_map_page().
pci_map_single() is implemented using virt_to_bus(). virt_to_bus() handles low memory addresses only. Drivers supporting high memory should no longer call virt_to_bus() or bus_to_virt().
4. If your driver calls pci_map_sg() to map a scatter-gather DMA operation, your driver should set the page and offset fields instead of the address field of the scatterlist structure. Refer to step 3 for converting an address to a page and offset.
If your driver is already using the PCI DMA API, continue to use pci_map_page() or pci_map_sg() as appropriate. However, do not use the address field of the scatterlist structure.
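Two of the ideas in the steps above can be sketched in user-space C: turning a device's usable address bits into a DMA mask, and describing a buffer as per-page (page, offset, length) entries instead of a single address. This is an illustrative model only. It assumes 4 KB pages, and struct sg_entry_sketch, build_sg(), and dma_mask_for_bits() are hypothetical names, not the kernel's scatterlist structure or PCI DMA API.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH ((uintptr_t)4096)  /* assumes 4 KB pages */

/* DMA mask for a device that can drive `bits` address bits (1..64). */
static uint64_t dma_mask_for_bits(unsigned bits)
{
    return bits >= 64 ? UINT64_MAX : (1ULL << bits) - 1;
}

/* Simplified stand-in for a scatterlist entry: a page (represented here by
   its base address), an offset into that page, and a length in bytes. */
struct sg_entry_sketch {
    uintptr_t page_base;
    unsigned int offset;
    unsigned int length;
};

/* Split [address, address + len) into per-page entries. Returns the number
   of entries written, or 0 if `max` entries are not enough. */
static size_t build_sg(uintptr_t address, size_t len,
                       struct sg_entry_sketch *sg, size_t max)
{
    size_t n = 0;

    while (len > 0) {
        uintptr_t offset = address & (PAGE_SIZE_SKETCH - 1);
        size_t chunk = PAGE_SIZE_SKETCH - offset;  /* room left in this page */

        if (chunk > len)
            chunk = len;
        if (n == max)
            return 0;
        sg[n].page_base = address - offset;
        sg[n].offset = (unsigned int)offset;
        sg[n].length = (unsigned int)chunk;
        n++;
        address += chunk;
        len -= chunk;
    }
    return n;
}
```

For instance, a 0x1200-byte buffer starting at address 0x10F00 becomes three entries in this model: 0x100 bytes at offset 0xF00 of the first page, one full page, and 0x100 bytes at the start of the third page. A 24-bit mask corresponds to the 16 MB limit of older devices, and a 32-bit mask to the 4 GB reach of typical PCI devices.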