Author: JIURL

Home: http://jiurl.yeah.net

Date: 2005-06-02


1 PageFile Swap File Introduction

Windows 2000 uses page-based virtual memory management. The contents of some physical pages can be swapped out to the PageFile swap file, so that those physical pages can be reused for other purposes.

In Windows 2000, the file "pagefile.sys" is the PageFile swap file.

The physical pages that can be swapped out to the PageFile include: a thread's kernel-mode stack; the pages in a process's WorkingSet other than file-mapping pages, such as the process heap, a thread's user-mode stack, the process environment block, the thread environment block, and the process's page tables; and the pages of PagedPool in the system address space. Under some conditions, even a process's page directory and WorkingSet may be swapped out to the PageFile. There is one more special case: a file mapping created by calling CreateFileMapping with the 'hFile' parameter set to INVALID_HANDLE_VALUE uses the file-mapping mechanism, but is backed by the PageFile.

2 PageFile Swap File Details

The entire PageFile is divided, from beginning to end, into consecutive blocks of PAGE_SIZE bytes. On a 32-bit CPU, PAGE_SIZE is 4KB. When the contents of physical pages are swapped out to the PageFile, the system finds unused blocks in the PageFile, writes the contents of the physical pages into those blocks, sets flag bits in each PTE to indicate that the page's content is now in the PageFile, and saves the block number in the PTE. When the content of a page is swapped in from the PageFile, the system uses the information in the PTE to find the corresponding block in the PageFile, then reads the content of the block into a physical page.

In memory, the system maintains an MMPAGING_FILE structure for every PageFile. From the information in this structure, the system knows which file the PageFile is, and which blocks in the PageFile are in use and which are not, so that it can find an unused block.

The MMPAGING_FILE structure is defined as follows:

typedef struct _MMPAGING_FILE {
PFN_NUMBER Size;
PFN_NUMBER MaximumSize;
PFN_NUMBER MinimumSize;
PFN_NUMBER FreeSpace;
PFN_NUMBER CurrentUsage;
PFN_NUMBER PeakUsage;
PFN_NUMBER Hint;
PFN_NUMBER HighestPage;
PMMMOD_WRITER_MDL_ENTRY Entry[2];
PRTL_BITMAP Bitmap;
PFILE_OBJECT File;
UNICODE_STRING PageFileName;
ULONG PageFileNumber;
BOOLEAN Extended;
BOOLEAN HintSetToZero;
} MMPAGING_FILE, *PMMPAGING_FILE;

The RTL_BITMAP structure is defined as follows:

typedef struct _RTL_BITMAP {
ULONG SizeOfBitMap; // Number of bits in bit map
PULONG Buffer; // Pointer to the bit map itself
} RTL_BITMAP;

Through "PFILE_OBJECT File", the file that backs the PageFile can be found.
Through "PRTL_BITMAP Bitmap", the unused blocks in the PageFile can be found.

Each bit in Bitmap->Buffer corresponds to one block in the PageFile. A bit value of 0 means that the corresponding block is unused; a bit value of 1 means that the block is in use. RtlFindClearBitsAndSet is used to find unused blocks in the Bitmap. SizeOfBitMap depends on the PageFile's MaximumSize.
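
The bitmap search described above can be illustrated with a small user-mode sketch. This is not the kernel's RtlFindClearBitsAndSet (which searches for a run of clear bits); it is a simplified stand-in that finds a single clear bit. The names block_bitmap, find_clear_bit_and_set, release_block and NO_FREE_BLOCK are hypothetical, and the hint parameter is only assumed to play a role similar to the Hint field of MMPAGING_FILE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_FREE_BLOCK UINT32_MAX   /* hypothetical "nothing found" value */

/* Simplified stand-in for RTL_BITMAP: one bit per PageFile block. */
typedef struct {
    uint32_t  size_in_bits;   /* number of blocks tracked */
    uint32_t *buffer;         /* bit i set => block i is in use */
} block_bitmap;

/* Find the first clear bit at or after `hint` (wrapping around once),
 * set it, and return its index, which is also the block number.
 * Returns NO_FREE_BLOCK when every block is in use. */
static uint32_t find_clear_bit_and_set(block_bitmap *bm, uint32_t hint)
{
    assert(bm != NULL && bm->buffer != NULL);
    if (bm->size_in_bits == 0)
        return NO_FREE_BLOCK;
    uint32_t start = hint % bm->size_in_bits;
    for (uint32_t n = 0; n < bm->size_in_bits; n++) {
        uint32_t i = (start + n) % bm->size_in_bits;
        uint32_t mask = 1u << (i % 32u);
        if ((bm->buffer[i / 32u] & mask) == 0) {
            bm->buffer[i / 32u] |= mask;   /* mark the block as used */
            return i;
        }
    }
    return NO_FREE_BLOCK;
}

/* Mark a block as unused again. */
static void release_block(block_bitmap *bm, uint32_t block)
{
    assert(bm != NULL && block < bm->size_in_bits);
    bm->buffer[block / 32u] &= ~(1u << (block % 32u));
}
```

With a 40-bit bitmap backed by two zeroed 32-bit words, two calls with hint 0 return blocks 0 and 1; a call with hint 39 returns block 39, and the next call with the same hint wraps around to block 2.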

If bit0, bit10 and bit11 of a PTE are all 0, the content of the page is in a PageFile. Bit0 is the Valid flag; when it is 0 the page is invalid, and all the other bits of an invalid PTE are defined and used by the system. For an invalid PTE, bit10 is the Prototype flag. For an invalid PTE whose Prototype flag is 0, bit11 is the Transition flag; a Transition flag of 0 means that the content of the page is in a PageFile.

The structure of a PTE that points to a block in a PageFile is defined as follows:
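
As a rough illustration, the three-bit test can be written as a small classifier. The enum and function names are hypothetical, and the sketch looks only at the three flag bits discussed here; any further distinctions the real memory manager makes are ignored.

```c
#include <stdint.h>

typedef enum {
    PTE_VALID,        /* bit0 set: the page is in physical memory */
    PTE_PROTOTYPE,    /* invalid PTE with bit10 set */
    PTE_TRANSITION,   /* invalid PTE with bit10 clear and bit11 set */
    PTE_IN_PAGEFILE   /* bit0, bit10 and bit11 all clear */
} pte_kind;

/* Classify a 32-bit PTE value using only bit0, bit10 and bit11. */
static pte_kind classify_pte(uint32_t pte)
{
    if (pte & (1u << 0))
        return PTE_VALID;
    if (pte & (1u << 10))
        return PTE_PROTOTYPE;
    if (pte & (1u << 11))
        return PTE_TRANSITION;
    return PTE_IN_PAGEFILE;
}
```

The order of the checks matters: bit10 is only meaningful once bit0 is known to be 0, and bit11 only once bit10 is known to be 0.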

{
ULONG Valid : 1; // equal to 0
ULONG PageFileNumber : 4;
ULONG Protection : 5;
ULONG Prototype : 1; // equal to 0
ULONG Transition : 1; // equal to 0
ULONG PageFileBlockNumber : 20;
}

PageFileNumber:
4 bits, used to find the corresponding PageFile in the PMMPAGING_FILE MmPagingFile[] array.
The system has a PMMPAGING_FILE MmPagingFile[] array that holds a pointer to every PageFile's MMPAGING_FILE structure. Because PageFileNumber is 4 bits, there can be at most 16 PageFiles in the system.

PageFileBlockNumber:
20 bits, used as an index to find the corresponding block in the PageFile.
Because PageFileBlockNumber is 20 bits, it can index at most 1M blocks. On a 32-bit CPU the block size is 4KB, so the maximum possible size of a PageFile is 1M blocks * 4KB/block = 4GB.
Every block needs one bit in Bitmap->Buffer to indicate whether it is in use: 1M blocks * 1 bit/block = 1M bits = 128KB. So the Bitmap->Buffer of a maximum-size PageFile needs only 128KB of memory.
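
The field layout and the size limits can be checked with a few shifts and masks. The sketch below assumes the bit fields are packed from the lowest bit upward, in declaration order (Valid in bit0, PageFileBlockNumber in bits 12 to 31); the type and function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u   /* 4KB blocks */

/* Fields of a PTE that points into a PageFile. */
typedef struct {
    uint32_t pagefile_number;   /* bits 1..4   */
    uint32_t protection;        /* bits 5..9   */
    uint32_t block_number;      /* bits 12..31 */
} pagefile_pte_fields;

static pagefile_pte_fields decode_pagefile_pte(uint32_t pte)
{
    pagefile_pte_fields f;
    f.pagefile_number = (pte >> 1) & 0xFu;
    f.protection      = (pte >> 5) & 0x1Fu;
    f.block_number    = pte >> 12;
    return f;
}

/* Byte offset of a block inside its PageFile. */
static uint64_t block_file_offset(uint32_t block_number)
{
    return (uint64_t)block_number << PAGE_SHIFT;
}

/* Largest PageFile a 20-bit block number can address: 2^20 blocks of 4KB. */
static uint64_t max_pagefile_bytes(void)
{
    return (1ull << 20) << PAGE_SHIFT;
}

/* Bitmap size for that PageFile: 2^20 bits, 8 bits per byte. */
static uint64_t max_bitmap_bytes(void)
{
    return (1ull << 20) / 8u;
}
```

For example, block number 0x12345 starts at byte offset 0x12345000 in its PageFile.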

3 Out Swap

Swapping out a page takes two steps. In the first step, the page goes from the Valid state to the Transition state. In the second step, the physical page in the Transition state is written to the PageFile. Only modified physical pages can be, and need to be, written to the PageFile, so only the modified-page case is described here.

When a page goes from the Valid state to the Transition state, its PTE changes from valid to invalid. The high-order 20 bits of the invalid PTE still hold a physical frame number: it is the same physical page that the PTE pointed to when it was valid, and the content of that physical page is unchanged. The physical page's PFN (PageFrameNumber) structure also changes from the Valid state to the Modified state, and is linked into the MmModifiedPageListHead list.

Every physical page has a PFN structure that records information about the page, such as whether the page is in use and whether it has been modified. The size of a PFN structure is 24 bytes. The system has a structure called the PFN database: an array of PFN structures, one per physical page, in physical page order. The system also has four lists: MmZeroedPageListHead, MmFreePageListHead, MmStandbyPageListHead and MmModifiedPageListHead. The PFN structures of all modified Transition-state physical pages are linked in the MmModifiedPageListHead list. The PFN structures of all unmodified Transition-state physical pages are linked in the MmStandbyPageListHead list. The PFN structures of all free, zeroed physical pages are linked in the MmZeroedPageListHead list. The PFN structures of all free, non-zeroed physical pages are linked in the MmFreePageListHead list.

A PFN has a different structure definition in each state. The PFN structure in the Modified state is defined as follows:

{
ULONG Flink;
PULONG PteAddress;
ULONG Blink;
USHORT Flags;
USHORT ReferenceCount;
ULONG OriginalPte;
ULONG PteFrame;
}

Flink: the physical frame number of the next PFN structure in the list.
Blink: the physical frame number of the previous PFN structure in the list.

There are two main cases in which a page goes from the Valid state to the Transition state: swapping out a thread's kernel-mode stack, and trimming a process's WorkingSet.

Periodically (for example, every 4 seconds), the system thread KeSwapProcessOrStack runs. It finds the threads that have been in the wait state longer than the stack protect time, and turns those threads' kernel-mode stack pages from the Valid state to the Transition state.
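
Because Flink and Blink hold frame numbers rather than pointers, a page list is a doubly linked list threaded through the PFN array by index. A minimal sketch of that idea follows; the 8-entry array, the LIST_END marker and all names are hypothetical, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_FRAMES 8u            /* tiny stand-in for the PFN database size */
#define LIST_END   0xFFFFFFFFu   /* hypothetical end-of-list marker */

typedef struct {
    uint32_t flink;   /* frame number of the next entry in the list */
    uint32_t blink;   /* frame number of the previous entry in the list */
} pfn_entry;

typedef struct {
    uint32_t head, tail, count;
} pfn_list;

static pfn_entry pfn_database[NUM_FRAMES];   /* indexed by frame number */

static void pfn_list_init(pfn_list *l)
{
    l->head = l->tail = LIST_END;
    l->count = 0;
}

/* Link a frame at the tail of a list. */
static void pfn_list_push_tail(pfn_list *l, uint32_t frame)
{
    assert(frame < NUM_FRAMES);
    pfn_database[frame].flink = LIST_END;
    pfn_database[frame].blink = l->tail;
    if (l->tail != LIST_END)
        pfn_database[l->tail].flink = frame;
    else
        l->head = frame;
    l->tail = frame;
    l->count++;
}

/* Unlink a specific frame from anywhere in the list. */
static void pfn_list_unlink(pfn_list *l, uint32_t frame)
{
    uint32_t next = pfn_database[frame].flink;
    uint32_t prev = pfn_database[frame].blink;
    if (prev != LIST_END) pfn_database[prev].flink = next; else l->head = next;
    if (next != LIST_END) pfn_database[next].blink = prev; else l->tail = prev;
    l->count--;
}

/* Remove and return the frame at the head, or LIST_END if the list is empty. */
static uint32_t pfn_list_pop_head(pfn_list *l)
{
    uint32_t frame = l->head;
    if (frame != LIST_END)
        pfn_list_unlink(l, frame);
    return frame;
}
```

Since the index of a PFN entry is itself the frame number, finding the entry for a given physical page, or the page for a given entry, needs no extra field.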

When available memory is very tight, the system thread KeBalanceSetManager calls MmWorkingSetManager, which calls MiTrimWorkingSet to trim WorkingSets. A process's WorkingSet structure records the virtual addresses of all the process's valid pages, including all valid pages in the user address space, the process's page tables, and so on. The modified valid pages in a WorkingSet include the process heap, threads' user-mode stacks, the process environment block, thread environment blocks, the process's page tables, and so on. Trimming a WorkingSet turns some valid pages from the Valid state to the Transition state and removes them from the WorkingSet. The pages removed this way may be modified or unmodified, but all the modified ones are linked into the MmModifiedPageListHead list.

When MmAvailablePages < MmMinimumFreePages, or the number of physical pages in MmModifiedPageList is greater than or equal to MmModifiedPageMaximum, the system thread MiModifiedPageWriter runs and eventually calls MiGatherPagefilePages to write the physical pages to the PageFile.

MiGatherPagefilePages calls RtlFindClearBitsAndSet to find an unused block in the MMPAGING_FILE's Bitmap; the position of the found bit in the Bitmap gives the block number. It then removes the physical page from MmModifiedPageList, saves the block number, PageFileNumber, etc. in the OriginalPte field of the physical page's PFN structure, and writes the content of the physical page into the corresponding block in the PageFile.

Call stack of a write to the PageFile:
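
The trigger condition can be restated as a small predicate. The struct below is a hypothetical snapshot of the counters named in the text, not a kernel structure, and the sketch says nothing about how the thread is actually signaled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the counters named in the text. */
typedef struct {
    uint32_t available_pages;         /* MmAvailablePages */
    uint32_t minimum_free_pages;      /* MmMinimumFreePages */
    uint32_t modified_page_count;     /* pages in MmModifiedPageList */
    uint32_t modified_page_maximum;   /* MmModifiedPageMaximum */
} mm_counters;

/* True when either condition for running the modified page writer holds. */
static bool should_run_modified_writer(const mm_counters *c)
{
    return c->available_pages < c->minimum_free_pages ||
           c->modified_page_count >= c->modified_page_maximum;
}
```

Note that the first comparison is strict while the second is not: the writer runs as soon as the modified list reaches its maximum, even if plenty of pages are still available.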
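
To make the "save the block number and PageFileNumber in OriginalPte" step concrete, here is a sketch that packs those values into the PageFile PTE format shown earlier. make_pagefile_pte is a hypothetical helper, and the bit positions assume the fields are packed from bit0 upward in declaration order.

```c
#include <assert.h>
#include <stdint.h>

/* Build a PTE value that points at a PageFile block.
 * Valid (bit0), Prototype (bit10) and Transition (bit11) are left at 0. */
static uint32_t make_pagefile_pte(uint32_t pagefile_number,
                                  uint32_t protection,
                                  uint32_t block_number)
{
    assert(pagefile_number < 16u);        /* 4-bit field  */
    assert(protection < 32u);             /* 5-bit field  */
    assert(block_number < (1u << 20));    /* 20-bit field */
    return (block_number << 12) | (protection << 5) | (pagefile_number << 1);
}
```

For PageFile 3, protection value 0x11 and block 0x12345, the result is 0x12345226; masking it with 0xC01 (bit0, bit10 and bit11) gives 0, which is exactly the pattern that marks a page as being in a PageFile.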


00 nt!IoAsynchronousPageWrite
01 nt!MiGatherPagefilePages+0x290
02 nt!MiModifiedPageWriterWorker+0x224
03 nt!MiModifiedPageWriter+0x14c
04 nt!PspSystemThreadStartup+0x69
05 nt!KiThreadStartup+0x16

When the write of a physical page to the PageFile completes, MiWriteComplete is executed. MiWriteComplete links the physical page into the MmStandbyPageList list. At this moment, the content of the physical page is still what it was when the PTE was valid, and the corresponding PTE is still in the Transition state and still points to this physical page.

When the system needs physical pages, it calls routines such as MiRemoveAnyPage and MiRemoveZeroPage to obtain them. These routines first take physical pages from ZeroedPageList and FreePageList; when both lists are empty, they take physical pages from StandbyPageList. While removing a physical page from StandbyPageList, the system sets the corresponding PTE from the OriginalPte field of the page's PFN structure. OriginalPte holds the number of the block that stores the page's content, along with other information, so the PTE now points to the corresponding block in the PageFile. The physical page removed from StandbyPageList can then be used for another purpose.
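
The two ideas in this paragraph, the fallback order between lists and the PTE rewrite when a standby page is reclaimed, can be sketched as follows. All names are hypothetical. The zeroed-then-free order simply follows the order in which the lists are listed in the text; the real routines may prefer one list over the other depending on the caller.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { FROM_ZEROED, FROM_FREE, FROM_STANDBY, FROM_NONE } page_source;

/* Hypothetical page counts for the three lists. */
typedef struct {
    uint32_t zeroed;
    uint32_t free_pages;
    uint32_t standby;
} page_list_counts;

/* Pick the list to take a page from: standby is used only as a fallback. */
static page_source pick_page_source(const page_list_counts *c)
{
    if (c->zeroed > 0)     return FROM_ZEROED;
    if (c->free_pages > 0) return FROM_FREE;
    if (c->standby > 0)    return FROM_STANDBY;
    return FROM_NONE;
}

/* The parts of a standby page's PFN entry that matter here. */
typedef struct {
    uint32_t *pte_address;    /* Transition PTE that still refers to the frame */
    uint32_t  original_pte;   /* PageFile PTE saved when the page was written */
} standby_pfn;

/* Detach a standby frame from its owner: after this the PTE no longer
 * names the frame, it names the PageFile block holding the page's content. */
static void reclaim_standby_page(standby_pfn *pfn)
{
    *pfn->pte_address = pfn->original_pte;
    pfn->pte_address = NULL;
}
```

After reclaim_standby_page runs, a later access through that PTE can no longer be satisfied from memory and has to read the block back from the PageFile.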

4 In Swap

When program code accesses a page that has been swapped out to a PageFile, the access raises an exception, and the exception handler swaps the page's content back in. The PTE of a swapped-out page is invalid and points to a block in a PageFile, so the access causes a Page-Fault exception. The interrupt vector of the Page-Fault exception is 0xe, and the entry point of its handler is KiTrap0E. During Page-Fault processing, the system finds the PTE for the faulting address; if bit0, bit10 and bit11 of the PTE are all 0, the content of the page is in a PageFile. PageFileNumber in the PTE selects the PageFile in MmPagingFile[], and PageFileBlockNumber selects the block within that PageFile. The system reads the block into a physical page, sets the PTE to point to that physical page, and marks the PTE valid. The swap-in is then complete.

During Page-Fault processing, when bit10 and bit11 of the PTE are both found to be 0, MiDispatchFault calls MiResolvePageFileFault. MiResolvePageFileFault locates the block from the PageFileNumber and PageFileBlockNumber in the PTE, prepares the read, and returns. MiDispatchFault then calls IoPageRead to read the block from the PageFile into a physical page.
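
The swap-in sequence can be simulated in user mode, with an in-memory byte array standing in for the PageFile. Everything here is hypothetical scaffolding (sim_pagefile, sim_paging_files, sim_in_swap); in particular, the valid PTE written at the end sets only the frame number and the Valid bit and leaves out the other hardware bits.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE     4096u
#define MAX_PAGEFILES 16u      /* PageFileNumber is a 4-bit field */

/* In-memory stand-in for one PageFile. */
typedef struct {
    const uint8_t *data;       /* block_count * PAGE_SIZE bytes */
    uint32_t block_count;
} sim_pagefile;

/* Stand-in for MmPagingFile[]: indexed by PageFileNumber. */
static const sim_pagefile *sim_paging_files[MAX_PAGEFILES];

/* Resolve a fault on a PTE that points into a PageFile.
 * Copies the block into `frame` and rewrites the PTE as valid.
 * Returns 0 on success, -1 if the PTE does not name a readable block. */
static int sim_in_swap(uint32_t *pte, uint8_t *frame, uint32_t frame_number)
{
    uint32_t v = *pte;
    if (v & ((1u << 0) | (1u << 10) | (1u << 11)))
        return -1;                          /* not a PageFile PTE */

    uint32_t file_number = (v >> 1) & 0xFu;
    uint32_t block       = v >> 12;
    const sim_pagefile *file = sim_paging_files[file_number];
    if (file == NULL || block >= file->block_count)
        return -1;

    /* "Read" the block into the physical page. */
    memcpy(frame, file->data + (size_t)block * PAGE_SIZE, PAGE_SIZE);

    /* Point the PTE at the frame and set the Valid bit. */
    *pte = (frame_number << 12) | 1u;
    return 0;
}
```

A second call on the same PTE fails, because the PTE is now valid and no longer describes a PageFile block.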

Call stack of MiResolvePageFileFault:

00 nt!MiResolvePageFileFault
01 nt!MiDispatchFault+0x13c
02 nt!MmAccessFault+0xc28
03 nt!KiTrap0E+0xc3
...

Call stack of IoPageRead:

00 nt!IoPageRead
01 nt!MiDispatchFault+0x231
02 nt!MmAccessFault+0xc28
03 nt!KiTrap0E+0xc3
...

END.

Homepage: http://jiurl.nease.net Email: microjiurl@163.com

posted on 2005-09-20 15:15  stone