[Repost] Understanding !PTE, Part 2: Flags and Large Pages
Hello, it's Ryan Mangipano with part two of my PTE series. Today I'll discuss PDE/PTE flags, the TLB, and show you a manual conversion of x86 PAE Large Page Virtual Addresses to Physical. If you haven’t read the first part of this series please find it here. It's a good primer before proceeding.
PDE and PTE flags
I'll start with a discussion about the PDE/PTE flags. If you recall from part one, not all of the bits of a Page Directory Entry (PDE) are used to form the pointer to the base of the next level. This is true of the table entries at all levels. For example, on a PAE x86 system the low 12 bits of a PTE (page table entry) are not part of the page base address. During our previous conversion, we used only some of the bits of each entry to locate the next table; the rest we simply dropped and replaced with zeros as needed. So what are the other bits used for? They hold a series of flags. !PTE displays the state of these flags in the following manner: (-G-DA--KWEV).
These flags are documented in the Intel Manuals. Intel and AMD reserved some of the flags for use by the Operating System. All of these are also documented in chapter 9 (Memory Management) of “Windows Internals, 5th edition”. Let’s dump the PDE from the virtual address we dissected last time. This will allow you to see some of the flags that are present in the other bits.
Obtaining the Virtual Address of the PDE
1: kd> !pte 0xf9a10054
VA f9a10054
PDE at 00000000C0603E68 PTE at 00000000C07CD080
contains 000000000102D963 contains 0000000002010121
pfn 102d -G-DA--KWEV pfn 2010 -G--A--KREV
Here is the data-type of our PDE
1: kd> dt nt!_MMPTE u.Hard
+0x000 u :
+0x000 Hard : _MMPTE_HARDWARE
Dumping the PDE and flags
1: kd> dt _MMPTE_HARDWARE 00000000C0603E68
nt!_MMPTE_HARDWARE
+0x000 Valid : 0y1
+0x000 Writable : 0y1
+0x000 Owner : 0y0
+0x000 WriteThrough : 0y0
+0x000 CacheDisable : 0y0
+0x000 Accessed : 0y1
+0x000 Dirty : 0y1
+0x000 LargePage : 0y0
+0x000 Global : 0y1
+0x000 CopyOnWrite : 0y0
+0x000 Prototype : 0y0
+0x000 Write : 0y1
+0x000 PageFrameNumber : 0y00000000000001000000101101 (0x102d)
+0x000 reserved1 : 0y00000000000000000000000000 (0)
Take note of the letters in the PDE and PTE sections of the !pte output, such as -G-DA--KWEV. These letters represent various flags. The presence or absence of a letter in the !PTE output tells you the state of the flag. These flags can also be seen in the _MMPTE_HARDWARE output above.
Valid (V) - Indicates that the data is located in physical memory. If this flag is not set, then the software can use ALL of the rest of the bits for whatever it wants (like storing the pagefile number and offset where the page is stored).
Write (W/R) - Indicates if the data is writeable or read-only. The hardware bit (bit 1, shown as Writable above) is documented in the processor manuals. The use of reserved bit 11 (shown as Write above) on multiprocessor systems or Vista and later is documented in Windows Internals, Chap. 9.
Owner (K/U) - Indicates if the page is owned by kernel mode or user mode. Kernel if cleared. User if set.
WriteThrough (T) - When set, indicates a write-through caching policy. When not set, indicates a write-back caching policy.
CacheDisable (N) - If set, the page translation table or physical page it points to cannot be cached.
Accessed (A) - Set when the page itself, or the table referencing it, has been read from or written to.
Dirty (D) - Indicates if any data on this page has been updated.
LargePage (L) - This field is only used on PDEs, not PTEs. It is the page size bit: it indicates whether the PDE is the last table level (meaning that the entry references an actual page in memory) or instead references a Page Table. If this bit is cleared, the final destination page is 4 KB and is found through the page table that this PDE points to. If this bit is set, the PDE points directly to a large page (2 MB when PAE is in use, 4 MB when it is not), and the page is located using this PDE alone since it becomes the last level. Keep in mind that a larger offset is needed to reach every position in the larger page. On non-PAE systems, the PSE bit (bit 4, the fifth bit from the right) must be set in CR4 to use this feature; per the Intel manuals, PAE paging honors the PDE's page size bit regardless of CR4.PSE. The CR4 setting is global and enables the use of large pages on the whole system, while the flag in the PDE applies only to that individual PDE.
Global (G) - If not set, TLB flushes remove this translation. If set, other processes use this translation as well, so it is not flushed from the Translation Lookaside Buffer (TLB) on process context switches.
CopyOnWrite (C) - Intel states this is a software field. Windows uses this to let processes share the same copy of a page. If a process attempts to write to the page, the system gives that process a private copy of it. Any attempt to execute code in this page on a No-Execute system will cause an access violation.
Prototype (P) - Intel states this is a software field. Windows uses this to indicate that this is a prototype PTE.
Reserved0 - These bits are reserved.
E (E) - Executable page. E is always displayed on platforms that do not support hardware No-Execute.
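The letter-to-bit mapping above can be sketched in a few lines of Python. This is only an illustrative decoder, not how !pte is implemented. The function name pte_flags, the left-to-right letter order (inferred from the outputs shown in this post), the choice of hardware bit 1 for W/R, and the use of bit 63 as the No-Execute bit are all assumptions.

```python
# Sketch: render the flag bits of a valid x86 PAE PDE/PTE in the letter
# style that !pte prints. The letter order is inferred from this post.

# (bit position, letter shown when the bit is set), leftmost letter first
FLAG_LETTERS = [
    (9, "C"),  # CopyOnWrite (software field)
    (8, "G"),  # Global
    (7, "L"),  # LargePage (meaningful in PDEs only)
    (6, "D"),  # Dirty
    (5, "A"),  # Accessed
    (4, "N"),  # CacheDisable
    (3, "T"),  # WriteThrough
]

def pte_flags(entry, nx_supported=False):
    """Return an 11-character flag string for a valid hardware entry."""
    def bit(n):
        return (entry >> n) & 1

    out = [letter if bit(pos) else "-" for pos, letter in FLAG_LETTERS]
    out.append("U" if bit(2) else "K")  # Owner: user if set, kernel if clear
    out.append("W" if bit(1) else "R")  # hardware write bit (assumption)
    # Without hardware No-Execute, E is always shown; with it, E is shown
    # only when bit 63 (assumed No-Execute bit) is clear.
    if nx_supported and bit(63):
        out.append("-")
    else:
        out.append("E")
    out.append("V" if bit(0) else "-")  # Valid
    return "".join(out)

print(pte_flags(0x000000000102D963))  # PDE from the first !pte example
print(pte_flags(0x0000000002010121))  # PTE from the first !pte example
```

For the two entries from the first !pte example, this produces -G-DA--KWEV and -G--A--KREV, matching the debugger output.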
Inspecting the state of the flags is important when attempting to manually convert addresses from Virtual Addresses to Physical. For example, since the valid bit is not set in the following invalid PTE, all of the fields are available for Windows to use. This means the information in the processor manuals doesn’t apply. Instead it is an nt!_MMPTE_SOFTWARE which references data located in the page file.
3: kd> !pte b8ae900c
VA b8ae900c
PDE at 00000000C0602E28 PTE at 00000000C05C5748
contains 000000000B880863 contains 000B8AF500000000
pfn b880 ---DA--KWEV not valid
PageFile: 0
Offset: b8af5
Protect: 0
For more information on the different types of invalid PTEs, refer to page 775 of “Windows Internals, 5th edition”.
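As a rough illustration of how an invalid PTE like the one above can be read, the sketch below pulls the page file number, protection, and page file offset out of the raw 64-bit value. The bit positions are an assumption about the x86 PAE layout of nt!_MMPTE_SOFTWARE (PageFileLow in bits 1-4, Protection in bits 5-9, PageFileHigh in the upper 32 bits), so check them with dt nt!_MMPTE_SOFTWARE on your own target. The function name is hypothetical, and the sketch does not try to distinguish the other invalid PTE types.

```python
def decode_pagefile_pte(entry):
    """Split an invalid (paged-out) PAE PTE into its software fields."""
    if entry & 1:
        raise ValueError("Valid bit is set; this is a hardware PTE")
    return {
        "pagefile": (entry >> 1) & 0xF,         # assumed PageFileLow, bits 1-4
        "protect": (entry >> 5) & 0x1F,         # assumed Protection, bits 5-9
        "offset": (entry >> 32) & 0xFFFFFFFF,   # assumed PageFileHigh, bits 32-63
    }

fields = decode_pagefile_pte(0x000B8AF500000000)
print("PageFile: %x  Offset: %x  Protect: %x"
      % (fields["pagefile"], fields["offset"], fields["protect"]))
```

With the PTE value from the example, the fields come out as PageFile 0, Offset b8af5, and Protect 0, the same values !pte reported.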
Manually Converting x86 PAE Large Page Virtual Address to Physical
In part one of this blog, we manually translated a PAE 4-KByte Page Virtual Address (VA). Now we are going to manually translate a VA that represents a Large Page from our PAE system. As discussed in the previous section on PTE flags, a large page allocation means that the page size is larger and the PDE points directly to the page itself. The PDE will not point to the base of a page table. This means that there will be one less level of tables used in the translation. This also means that more bits will be needed to represent the offsets in the large page. I found the following address on my system that references a Large Page, 8054099e. Once again, all the required information was obtained from the processor manuals, debugger help file, and Windows Internals Book.
1: kd> !pte 8054099e
VA 8054099e
PDE at 00000000C0602010 PTE at 00000000C0402A00
contains 00000000004009E3 contains 0000000000000000
pfn 400 -GLDA--KWEV LARGE PAGE pfn 540
Below is the Virtual Address in binary.
1: kd> .formats 8054099e
Binary: 10000000 01010100 00001001 10011110
I have split this VA into its three parts.
10 Page Directory Pointer Table Offset
000000 010 Page Directory Table Offset
10100 00001001 10011110 This is the Offset into the large page
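The same three-way split can be expressed with shifts and masks. The Python sketch below assumes the x86 PAE large-page layout described in this post (2 bits, 9 bits, 21 bits); the function name is just for illustration.

```python
def split_pae_large_va(va):
    """Split a 32-bit VA for the x86 PAE large-page (2 MB) case."""
    pdpt_index = (va >> 30) & 0x3   # top 2 bits: PDPT entry
    pd_index = (va >> 21) & 0x1FF   # next 9 bits: page directory entry
    offset = va & 0x1FFFFF          # low 21 bits: offset into the large page
    return pdpt_index, pd_index, offset

pdpt_index, pd_index, offset = split_pae_large_va(0x8054099E)
print(format(pdpt_index, "02b"), format(pd_index, "09b"), format(offset, "021b"))
```

This prints 10, 000000010, and 101000000100110011110, the same three groups as the manual split above.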
Let’s get the base of the Page Directory Pointer Table and identify which of the four entries we will need to follow.
1: kd> !dq (@cr3 & 0xffffffe0) + ( 0y10 * 8) L1
# 23406f0 00000000`06c46801
Now take the value we just read, replace its low 12 bits with zeros, and we have the base of the Page Directory Table. Then add the offset from our Virtual Address (the index times 8 bytes per entry) and we'll dump out the PDE.
1: kd> !dq (6c46801 & 0xFFFFFF000) + ( 0y000000010 * 8) L1
# 6c46010 00000000`004009e3
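The arithmetic inside the two !dq commands can be mirrored in Python to check the entry addresses by hand. This is a sketch using the same masks as the debugger expressions above. The CR3 value is hypothetical: it is inferred from the address 23406f0 that !dq printed, not read from the system, and the function names are illustrative.

```python
ENTRY_SIZE = 8  # each PAE table entry is 64 bits wide

def pdpte_address(cr3, pdpt_index):
    """Physical address of a Page Directory Pointer Table entry."""
    return (cr3 & 0xFFFFFFE0) + pdpt_index * ENTRY_SIZE

def pde_address(pdpte, pd_index):
    """Physical address of a PDE, given the PDPTE value that was read."""
    return (pdpte & 0xFFFFFF000) + pd_index * ENTRY_SIZE

cr3 = 0x023406E0  # hypothetical: inferred from the !dq address 23406f0
print(hex(pdpte_address(cr3, 0b10)))               # where the PDPTE lives
print(hex(pde_address(0x06C46801, 0b000000010)))   # where the PDE lives
```

The two results, 0x23406f0 and 0x6c46010, are the physical addresses shown in the !dq output lines.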
Let’s convert the PDE to binary format to analyze the lower 12 bits. This will allow us to analyze the flags. The last twelve bits (0-11) are not used for the PFN. They are used for the flags that we discussed earlier.
1: kd> .formats (00000000`004009e3 & 0x0000000FFF)
Binary: 00000000 00000000 00001001 11100011
Let’s analyze the flags from this VA using the information we learned earlier....
· Bit Zero is set indicating that the page is Valid and located in physical memory, so all other bits have their hardware-defined meaning
· Bit One is set indicating that this page is Writeable (Hardware Field)
· Bit Two is cleared indicating that this is a Kernel Mode Page
· Bit Three is cleared indicating a Write-Back Caching policy (caching of writes to the page is enabled)
· Bit Four is cleared indicating that caching is not disabled for the page.
· Bit Five is set indicating this page has been Accessed
· Bit Six is set indicating that this page is Dirty
· Bit Seven is set indicating that this is a Large Page. This PDE points directly to a page, not a Page Table.
· Bit Eight is set indicating that other processes share this Global PDE. It is not removed when the TLB is flushed on process context switches.
· Bit Nine is cleared indicating this page is not Copy-On-Write
· Bit Ten is cleared indicating this is NOT a Prototype PTE
· Bit Eleven is set also indicating this page is Writeable (Reserved Field, See Windows Internals, Chap. 9.)
...and compare our findings to the Flags output from !PTE, -GLDA--KWEV. My system doesn’t support No-Execute, so the E is also displayed. For more information, run .hh !pte in WinDbg.
We know this is a Large Page and is Valid, so we can obtain the base address of our 2-MB Large Page (on this PAE system) from this PDE. The Intel Manual states that in our PDE the low 21 bits (0-20) aren’t part of the page base address.
1: kd> .formats (004009e3 & 0y11111111111000000000000000000000)
Binary: 00000000 01000000 00000000 00000000
So let’s combine the data from the PDE (the upper 11 bits) with the offset from the VA (the lower 21 bits).
00000000 010 10100 00001001 10011110
Now I'll remove the spaces, precede this binary value with 0y, and send it to .formats.
1: kd> .formats 0y00000000010101000000100110011110
Hex: 0054099e
We could have obtained the same data in this manner
1: kd> ? (004009e3 & 0y11111111111000000000000000000000) + (8054099e & 0y00000000000111111111111111111111)
Evaluate expression: 5507486 = 0054099e
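The final combination step can also be written as a small Python function. This is a minimal sketch of the calculation above, with two additions that are my own assumptions: it checks the Valid and LargePage bits first, and it masks the page base with bits 21-51 of the 64-bit PDE rather than the 32-bit mask used in the debugger expression. The function name is illustrative.

```python
LARGE_PAGE_OFFSET_MASK = (1 << 21) - 1  # low 21 bits: offset in a 2 MB page

def large_page_physical(pde, va):
    """Combine a large-page PDE with a VA to get the physical address."""
    if not pde & 0x01:
        raise ValueError("PDE is not valid")
    if not pde & 0x80:
        raise ValueError("PDE does not map a large page")
    # Bits 21-51 are treated as the page base; the upper limit is an
    # assumption (the debugger expression above masks only 32 bits).
    base = pde & 0x000FFFFFFFE00000
    return base + (va & LARGE_PAGE_OFFSET_MASK)

print(hex(large_page_physical(0x00000000004009E3, 0x8054099E)))
```

For the PDE and VA used in this walkthrough, the result is 0x54099e, the same physical address the debugger evaluated.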
Now let’s dump the data in memory at this physical address
1: kd> !db 0054099e
# 54099e 33 db 8b 75 18 8b 7d 1c-0f 23 fb 0f 23 c6 8b 5d 3..u..}..#..#..]
# 5409ae 20 0f 23 cf 0f 23 d3 8b-75 24 8b 7d 28 8b 5d 2c .#..#..u$.}(.],
# 5409be 0f 23 de 0f 23 f7 0f 23-fb e9 43 ff ff ff 8b 44 .#..#..#..C....D
Now let’s dump the same data using the virtual address
1: kd> db 8054099e
8054099e 33 db 8b 75 18 8b 7d 1c-0f 23 fb 0f 23 c6 8b 5d 3..u..}..#..#..]
805409ae 20 0f 23 cf 0f 23 d3 8b-75 24 8b 7d 28 8b 5d 2c .#..#..u$.}(.],
805409be 0f 23 de 0f 23 f7 0f 23-fb e9 43 ff ff ff 8b 44 .#..#..#..C....D
So now you can see how I used the debugger to translate virtual addresses to physical addresses. This concludes part two of this blog. In part three we will cover x86 Non-PAE Virtual Address Translation, x64 Address Translation, and the TLB.