Virtual Memory (3)

Integrating Caches and VM 

TODO

Speeding Up Address Translation with a TLB

Many systems try to eliminate even the cost of fetching a PTE from memory by including a small cache of PTEs in the MMU called a translation lookaside buffer (TLB).

A TLB is a small, virtually addressed cache where each line holds a block consisting of a single PTE.

A TLB usually has a high degree of associativity.  

As shown in Figure 9.15, the index and tag fields that are used for set selection and line matching are extracted from the virtual page number in the virtual address. 

If the TLB has T = 2^t sets, then the TLB index (TLBI) consists of the t least significant bits of the VPN, and the TLB tag (TLBT) consists of the remaining bits in the VPN. 
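As a quick sketch, the two fields can be extracted with a shift and a mask. This is illustrative C; the helper names are my own, not from the text:

    #include <stdint.h>

    /* For a TLB with T = 2^t sets: the index is the t least significant
       bits of the VPN, and the tag is the remaining high-order bits. */
    static inline uint64_t tlb_index(uint64_t vpn, unsigned t) {
        return vpn & ((1ULL << t) - 1);
    }

    static inline uint64_t tlb_tag(uint64_t vpn, unsigned t) {
        return vpn >> t;
    }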

Figure 9.16(a) shows the steps involved when there is a TLB hit (the usual case).

The key point here is that all of the address translation steps are performed inside the on-chip MMU, and thus are fast.

  • Step 1: The CPU generates a virtual address.

  • Steps 2 and 3: The MMU fetches the appropriate PTE from the TLB.

  • Step 4: The MMU translates the virtual address to a physical address and sends it to the cache/main memory.

  • Step 5: The cache/main memory returns the requested data word to the CPU.

When there is a TLB miss, the MMU must fetch the PTE from the L1 cache, as shown in Figure 9.16(b).

The newly fetched PTE is stored in the TLB, possibly overwriting an existing entry. 
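Putting the hit and miss paths together, here is a toy C sketch of the lookup logic. It uses a direct-mapped table for brevity (real TLBs are set associative, as noted above), and walk_page_table is a hypothetical stand-in for fetching the PTE from the memory hierarchy:

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SHIFT  12               /* assume 4 KB pages for the sketch */
    #define TLB_ENTRIES 16

    struct tlb_entry { bool valid; uint64_t vpn; uint64_t ppn; };
    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Hypothetical stand-in for fetching the PTE from the L1 cache/memory. */
    extern uint64_t walk_page_table(uint64_t vpn);

    uint64_t translate(uint64_t va) {
        uint64_t vpn = va >> PAGE_SHIFT;                 /* step 1 */
        uint64_t vpo = va & ((1ULL << PAGE_SHIFT) - 1);
        struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];   /* steps 2 and 3 */

        if (!e->valid || e->vpn != vpn) {                /* TLB miss */
            e->vpn = vpn;
            e->ppn = walk_page_table(vpn);  /* fetch the PTE, then cache it, */
            e->valid = true;                /* possibly overwriting an entry */
        }
        return (e->ppn << PAGE_SHIFT) | vpo;             /* step 4 */
    }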

Multi-Level Page Tables 

To this point we have assumed that the system uses a single page table to do address translation.

But if we had a 32-bit address space, 4 KB pages, and a 4-byte PTE, then we would need a 4 MB page table (2^20 PTEs × 4 bytes each) resident in memory at all times, even if the application referenced only a small chunk of the virtual address space.

The common approach for compacting the page table is to use a hierarchy of page tables instead.  

In the two-level example for this address space, each PTE in the level 1 table maps a 4 MB chunk of the virtual address space, and each PTE in a level 2 table maps a single 4 KB page. Notice that with 4-byte PTEs, each level 1 and level 2 page table is 4K bytes (1,024 PTEs), which conveniently is the same size as a page.

This scheme reduces memory requirements in two ways.

First, if a PTE in the level 1 table is null, then the corresponding level 2 page table does not even have to exist. This represents a significant potential savings, since most of the 4 GB virtual address space for a typical program is unallocated.  

Second, only the level 1 table needs to be in main memory at all times. The level 2 page tables can be created and paged in and out by the VM system as they are needed, which reduces pressure on main memory.

Only the most heavily used level 2 page tables need to be cached in main memory. 

Figure 9.18 summarizes address translation with a k-level page table hierarchy.

The virtual address is partitioned into k VPNs and a VPO. 

Each VPN i, 1 ≤ i ≤ k, is an index into a page table at level i.  

Each PTE in a level-j table, 1 ≤ j ≤ k − 1, points to the base of some page table at level j + 1. 

Each PTE in a level-k table contains either the PPN of some physical page or the address of a disk block. 

To construct the physical address, the MMU must access k PTEs before it can determine the PPN.
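In code, the walk is just a loop over the k levels. A minimal sketch, assuming x86-64-style parameters (9 VPN bits per level, 4 KB pages) and a hypothetical read_pte helper that reads one PTE word from physical memory:

    #include <stdint.h>

    #define LEVELS     4     /* k (4 on x86-64; an assumption here)     */
    #define VPN_BITS   9     /* bits in each VPN_i                      */
    #define PAGE_SHIFT 12    /* the VPO is the low 12 bits (4 KB pages) */

    extern uint64_t read_pte(uint64_t paddr);   /* hypothetical helper */

    uint64_t walk(uint64_t table_base, uint64_t va) {
        for (int level = 1; level <= LEVELS; level++) {
            /* VPN_1 occupies the most significant bits, VPN_k the least. */
            int shift = PAGE_SHIFT + (LEVELS - level) * VPN_BITS;
            uint64_t idx = (va >> shift) & ((1ULL << VPN_BITS) - 1);
            uint64_t pte = read_pte(table_base + idx * sizeof(uint64_t));
            if (level == LEVELS)
                return pte;       /* level-k PTE: PPN or disk address    */
            table_base = pte;     /* otherwise: base of the next table   */
        }
        return 0;                 /* not reached */
    }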

Accessing k PTEs may seem expensive and impractical at first glance.

However, the TLB comes to the rescue here by caching PTEs from the page tables at the different levels.

In practice, address translation with multi-level page tables is not significantly slower than with single-level page tables. 

 

Putting It Together: End-to-End Address Translation

To keep things manageable, we make the following assumptions: 

  • The memory is byte addressable.

  • Memory accesses are to 1-byte words (not 4-byte words).

  • Virtual addresses are 14 bits wide (n = 14).

  • Physical addresses are 12 bits wide (m = 12).

  • The page size is 64 bytes (P = 64).

  • The TLB is four-way set associative with 16 total entries.

  • The L1 d-cache is physically addressed and direct mapped, with a 4-byte line size and 16 total sets.

Since each page is 2^6 = 64 bytes, the low-order 6 bits of the virtual and physical addresses serve as the VPO and PPO, respectively.

The high-order 8 bits of the virtual address serve as the VPN, and the high-order 6 bits of the physical address serve as the PPN.

Suppose the CPU executes a load instruction that reads the byte at virtual address 0x03d4.

To begin, the MMU extracts the VPN (0x0F) from the virtual address and checks with the TLB to see if it has cached a copy of PTE 0x0F from some previous memory reference.  

The TLB extracts the TLB index (0x3) and the TLB tag (0x03) from the VPN, hits on a valid match in the second entry of Set 0x3, and returns the cached PPN (0x0D) to the MMU.

If the TLB had missed, then the MMU would need to fetch the PTE from main memory.

The MMU now has everything it needs to form the physical address.

It does this by concatenating the PPN (0x0D) from the PTE with the VPO (0x14) from the virtual address, which forms the physical address (0x354). 

Next, the MMU sends the physical address to the cache, which extracts the cache offset CO (0x0), the cache set index CI (0x5), and the cache tag CT (0x0D) from the physical address. 

Since the tag in Set 0x5 matches CT, the cache detects a hit, reads out the data byte (0x36) at offset CO, and returns it to the MMU, which then passes it back to the CPU. 
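All of these field values can be checked mechanically. The following self-contained C program recomputes every field of the example from the virtual address 0x03d4 and the PPN 0x0D:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint16_t va   = 0x03d4;          /* 14-bit virtual address     */
        uint16_t vpn  = va >> 6;         /* high 8 bits     -> 0x0f    */
        uint16_t vpo  = va & 0x3f;       /* low 6 bits      -> 0x14    */
        uint16_t tlbi = vpn & 0x3;       /* low 2 VPN bits  -> 0x3     */
        uint16_t tlbt = vpn >> 2;        /* high 6 VPN bits -> 0x03    */

        uint16_t ppn = 0x0d;             /* PPN from the TLB hit       */
        uint16_t pa  = (ppn << 6) | vpo; /* concatenation   -> 0x354   */

        uint16_t co = pa & 0x3;          /* byte within line -> 0x0    */
        uint16_t ci = (pa >> 2) & 0xf;   /* cache set index  -> 0x5    */
        uint16_t ct = pa >> 6;           /* cache tag        -> 0x0d   */

        printf("VPN=0x%02x VPO=0x%02x TLBI=0x%x TLBT=0x%02x\n",
               vpn, vpo, tlbi, tlbt);
        printf("PA=0x%03x CO=0x%x CI=0x%x CT=0x%02x\n", pa, co, ci, ct);
        return 0;
    }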

Other paths through the translation process are also possible.

For example, if the TLB misses, then the MMU must fetch the PPN from a PTE in the page table.

If the resulting PTE is invalid, then there is a page fault and the kernel must page in the appropriate page and rerun the load instruction.

Another possibility is that the PTE is valid, but the necessary memory block misses in the cache. 

Notes on the terms:

TLB: The TLB is virtually addressed using the bits of the VPN.

Since the TLB has four sets, the 2 low-order bits of the VPN serve as the set index (TLBI).

The remaining 6 high-order bits serve as the tag (TLBT) that distinguishes the different VPNs that might map to the same TLB set. 

Page table: 

The page table is a single-level design with a total of 2^8 = 256 page table entries (PTEs).

For convenience, we have labeled each PTE with the VPN that indexes it; but keep in mind that these VPNs are not part of the page table and are not stored in memory.

Also, notice that the PPN of each invalid PTE is denoted with a dash to reinforce the idea that whatever bit values might happen to be stored there are not meaningful. 
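For concreteness, one might picture each entry of this toy page table as the following C struct (an illustrative layout, not how real hardware packs a PTE):

    #include <stdbool.h>
    #include <stdint.h>

    /* Toy PTE for this example: when valid is false, the ppn bits are
       "don't care" values, which the figure draws as a dash. */
    struct pte {
        bool    valid;
        uint8_t ppn;   /* the 6-bit PPN fits easily (m = 12, P = 64) */
    };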

Core i7 Address Translation 

TODO

Linux Virtual Memory System 

Linux maintains a separate virtual address space for each process of the form shown in Figure 9.26. 

The kernel virtual memory contains the code and data structures in the kernel.

Some regions of the kernel virtual memory are mapped to physical pages that are shared by all processes.  

For example, each process shares the kernel’s code and global data structures.  

Interestingly, Linux also maps a set of contiguous virtual pages (equal in size to the total amount of DRAM in the system) to the corresponding set of contiguous physical pages.

This provides the kernel with a convenient way to access any specific location in physical memory, for example, when it needs to access page tables, or to perform memory-mapped I/O operations on devices that are mapped to particular physical memory locations.
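Because this mapping is a fixed offset, converting between kernel virtual and physical addresses needs no page-table walk at all, which is what Linux's __va and __pa helpers exploit. A simplified sketch, using the classic 32-bit x86 value of PAGE_OFFSET:

    #include <stdint.h>

    #define PAGE_OFFSET 0xC0000000UL   /* kernel direct map starts at 3 GB */

    /* cf. Linux's __va(pa) and __pa(va): the direct map makes the
       conversion a constant offset in each direction. */
    static inline void *phys_to_virt(uintptr_t pa) {
        return (void *)(pa + PAGE_OFFSET);
    }

    static inline uintptr_t virt_to_phys(const void *va) {
        return (uintptr_t)va - PAGE_OFFSET;
    }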

Other regions of kernel virtual memory contain data that differs for each process.

 
