Virtual memory is a technique used by practically every modern operating system.

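The basic idea of virtual memory is that every process has its own logical address space, which is divided into equal-sized blocks called pages. Each page is a contiguous range of addresses. From the process's point of view there appears to be a large amount of memory. Some pages map to a block of physical memory (called a page frame, normally the same size as a page), while pages that are not loaded in memory live on disk. Introducing per-process logical addresses separates the process address space from the actual storage space and makes memory management more flexible.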

The two basic concepts, address space and storage space, are defined as follows:

 

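Address space: the object program produced by compiling a source program lives within a bounded range of addresses, and this range is called the address space. An address space is a set of logical addresses.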

 

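Storage space: the set of physical units in main memory that store information. The numbers of these units are called physical addresses. The storage space is the set of physical addresses.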

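Three management schemes derive from this:
paged memory management, segmented memory management, and segmented-paged memory management. This article focuses on paging.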

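In a paged system, when a process is created the operating system allocates page frames for all of the process's pages, and when the process is destroyed it reclaims every frame allocated to it. If processes are allowed to request memory dynamically while running, the operating system must also allocate physical frames for that memory. To do all this, the operating system has to keep track of which frames of physical memory are actually in use. On a context switch it must also correctly switch the mapping from address space to physical memory from one process to the other. To understand how the operating system meets these requirements, we first look at page tables. (The original post shows a page table diagram here, reproduced from 51CTO.)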

An entry in a page table is called a page table entry (PTE). Each PTE records the mapping from one range of virtual addresses to physical addresses.
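
To make the role of a page table entry concrete, here is a minimal single-level sketch. It is not OS161 code: the toy table size, the PTE_PRESENT bit and the function names map_page and translate are all assumptions made for illustration. It only shows how a virtual address splits into a page number and an offset, and how a PTE supplies the frame.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE   4096u   /* 4 KB pages, as used later in this article */
#define PAGE_SHIFT  12u
#define NUM_PAGES   16u     /* toy address space of 16 pages (assumption) */
#define PTE_PRESENT 0x1u    /* hypothetical "mapped" bit, not an OS161 flag */

/* One PTE per virtual page: frame address in the high bits, flag in bit 0. */
static uint32_t page_table[NUM_PAGES];

/* Record that virtual page vpn is backed by physical frame pfn. */
static void map_page(uint32_t vpn, uint32_t pfn)
{
    assert(vpn < NUM_PAGES);
    page_table[vpn] = (pfn << PAGE_SHIFT) | PTE_PRESENT;
}

/* Returns 0 and stores the physical address on success, -1 on a fault. */
static int translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;          /* which page */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);   /* where inside the page */

    if (vpn >= NUM_PAGES || !(page_table[vpn] & PTE_PRESENT)) {
        return -1;                               /* no mapping: page fault */
    }
    *paddr = (page_table[vpn] & ~(PAGE_SIZE - 1)) | offset;
    return 0;
}
```

With virtual page 2 mapped to frame 7, the address 0x2abc translates to 0x7abc: the page number changes and the offset 0xabc is carried over unchanged.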

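Because the page table is itself stored in memory, every memory read a program performs costs at least two memory accesses. Compared with the single access needed without an MMU, this is a large slowdown, and it is even more noticeable when the memory is slow. (MMU stands for Memory Management Unit. It is a hardware unit integrated into the CPU that translates virtual addresses into physical addresses and gives software a hardware memory-protection mechanism.) How to keep the benefits of the MMU while minimizing this overhead therefore became a pressing problem.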

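This is where the TLB comes in. What is a TLB? The name stands for translation lookaside buffer, and it is in effect a set of relocation registers inside the MMU that temporarily hold translation data. Since the TLB is essentially a set of registers, it is easy to see that accessing it is much faster than accessing the page table in memory. If the whole page table could be kept in the TLB, the efficiency problem would be solved.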

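However, because of manufacturing cost and other constraints, keeping every page table in the TLB is practically impossible. All we can do is keep the most frequently used page table entries in the limited-capacity TLB, which improves the efficiency of the MMU to a degree.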

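The reason this works is the principle of locality of memory references: while a program runs, it is more likely to access code and data close to what it accessed most recently. In theory, then, the TLB holds most of the page table entries needed during the current period of execution, which greatly improves the efficiency of the MMU.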
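
The effect of locality can be seen in a small simulation. The sketch below is an illustration only: the direct-mapped four-slot TLB, the counters, and the stand-in walk_page_table function (which simply maps page n to frame n + 100) are assumptions, not how real MIPS hardware or OS161 organizes its TLB.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u
#define TLB_SLOTS  4u   /* tiny on purpose (assumption) */

struct tlb_slot {
    int      valid;
    uint32_t vpn;   /* virtual page number */
    uint32_t pfn;   /* physical frame number */
};

static struct tlb_slot tlb[TLB_SLOTS];
static unsigned tlb_hits, tlb_misses;

/* Stand-in for the slow page table walk: page n lives in frame n + 100. */
static uint32_t walk_page_table(uint32_t vpn)
{
    return vpn + 100u;
}

/* Translate vaddr, consulting the TLB first and refilling it on a miss. */
static uint32_t lookup(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    struct tlb_slot *s = &tlb[vpn % TLB_SLOTS];   /* direct-mapped slot */

    if (s->valid && s->vpn == vpn) {
        tlb_hits++;                               /* fast path */
    } else {
        tlb_misses++;                             /* slow path, then cache it */
        s->valid = 1;
        s->vpn = vpn;
        s->pfn = walk_page_table(vpn);
    }
    assert(s->vpn == vpn);   /* the slot now describes this page */
    return (s->pfn << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1));
}
```

Accessing 0x1000, 0x1004 and 0x1008 in a row costs one miss and two hits, because all three addresses fall in the same page. Only the first touch of each new page pays for the page table walk.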

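We use a two-level page table here. With a two-level page table, translation takes two lookups: the virtual address first indexes an entry in the first table, that entry is then used to find the second table, and only after that is the physical address known. We call the first table the first-level page table and the second table the second-level page table. The main purpose of the two-level scheme is to reduce the memory taken up by the page tables themselves. The drawback is that it makes address translation even slower.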
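
The two lookups can be sketched as follows, assuming a 32-bit address split into a 10-bit first-level index, a 10-bit second-level index and a 12-bit offset. The names TOP_TEN and MID_TEN appear in the fault handler later in this article, but their values here are my assumption, and pt2_map and pt2_translate are hypothetical helpers that use heap memory rather than kernel pages.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOP_TEN  0xffc00000u   /* bits 31..22: first-level index (assumed value) */
#define MID_TEN  0x003ff000u   /* bits 21..12: second-level index (assumed value) */
#define LOW_BITS 0x00000fffu   /* bits 11..0: offset within the page */
#define ENTRIES  1024
#define VALID    0x1u          /* hypothetical valid bit kept in bit 0 */

/* First-level table: each slot points to a second-level table, or is NULL. */
static uint32_t *level1[ENTRIES];

/* Map the page containing vaddr to the frame at frame_paddr. */
static int pt2_map(uint32_t vaddr, uint32_t frame_paddr)
{
    uint32_t i1 = (vaddr & TOP_TEN) >> 22;
    uint32_t i2 = (vaddr & MID_TEN) >> 12;

    assert((frame_paddr & LOW_BITS) == 0);   /* frames are page-aligned */
    if (level1[i1] == NULL) {
        /* Second-level tables are created only when first needed. */
        level1[i1] = calloc(ENTRIES, sizeof(uint32_t));
        if (level1[i1] == NULL) {
            return -1;
        }
    }
    level1[i1][i2] = frame_paddr | VALID;
    return 0;
}

/* Returns 0 and stores the physical address, or -1 if nothing is mapped. */
static int pt2_translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t i1 = (vaddr & TOP_TEN) >> 22;
    uint32_t i2 = (vaddr & MID_TEN) >> 12;

    if (level1[i1] == NULL || !(level1[i1][i2] & VALID)) {
        return -1;
    }
    *paddr = (level1[i1][i2] & ~LOW_BITS) | (vaddr & LOW_BITS);
    return 0;
}
```

The memory saving comes from the NULL slots: a process that touches only a few regions needs only a few 4 KB second-level tables instead of one flat table covering the whole 4 GB address space.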

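That covers the background. Now for the real content: implementing VM on OS161, a teaching operating system developed at Harvard University. OS161 runs on MIPS-I hardware.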

The code is on GitHub: https://github.com/tian-jiang/OS161-VirtualMemory

First, a piece of code from kern/arch/mips/include/vm.h, where the hardwired MIPS memory layout is defined:

/*
 * MIPS-I hardwired memory layout:
 *    0xc0000000 - 0xffffffff   kseg2 (kernel, tlb-mapped)
 *    0xa0000000 - 0xbfffffff   kseg1 (kernel, unmapped, uncached)
 *    0x80000000 - 0x9fffffff   kseg0 (kernel, unmapped, cached)
 *    0x00000000 - 0x7fffffff   kuseg (user, tlb-mapped)
 *
 * (mips32 is a little different)
 */

#define MIPS_KUSEG  0x00000000
#define MIPS_KSEG0  0x80000000
#define MIPS_KSEG1  0xa0000000
#define MIPS_KSEG2  0xc0000000


The original post shows a figure here illustrating how memory is laid out in OS161.
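
Because kseg0 is unmapped, a kernel virtual address in that segment and the physical address behind it differ only by the constant MIPS_KSEG0. The following sketch shows the arithmetic that conversions such as PADDR_TO_KVADDR rely on. The helper names, the enum and the plain uint32_t types are assumptions for illustration, not the OS161 definitions.

```c
#include <assert.h>
#include <stdint.h>

#define MIPS_KUSEG 0x00000000u
#define MIPS_KSEG0 0x80000000u
#define MIPS_KSEG1 0xa0000000u
#define MIPS_KSEG2 0xc0000000u

enum segment { SEG_KUSEG, SEG_KSEG0, SEG_KSEG1, SEG_KSEG2 };

/* Classify a virtual address by the hardwired MIPS-I segment it falls in. */
static enum segment segment_of(uint32_t vaddr)
{
    if (vaddr >= MIPS_KSEG2) return SEG_KSEG2;
    if (vaddr >= MIPS_KSEG1) return SEG_KSEG1;
    if (vaddr >= MIPS_KSEG0) return SEG_KSEG0;
    return SEG_KUSEG;
}

/* Physical address -> kernel virtual address in kseg0 (direct-mapped). */
static uint32_t paddr_to_kvaddr(uint32_t paddr)
{
    return paddr + MIPS_KSEG0;
}

/* Kernel virtual address in kseg0 -> physical address. */
static uint32_t kvaddr_to_paddr(uint32_t vaddr)
{
    assert(segment_of(vaddr) == SEG_KSEG0);   /* only valid inside kseg0 */
    return vaddr - MIPS_KSEG0;
}
```

This is why the kernel can reach any physical frame without a TLB entry: physical address 0x1000 is always visible at kernel virtual address 0x80001000.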

Let's start from the beginning: kern/startup/main.c

    /* Early initialization. */
    ram_bootstrap();
    .......

    /* Late phase of initialization. */
    vm_bootstrap();
    ........

When the operating system boots, it calls ram_bootstrap() and vm_bootstrap() to start the VM subsystem. Where are these two functions declared and used? The code below answers that.

kern/include/vm.h and kern/arch/mips/include/vm.h

/* Initialization function */
void vm_bootstrap(void);
......

  /* Allocate/free kernel heap pages (called by kmalloc/kfree) */

  void frametable_bootstrap(void);

/*
 * Interface to the low-level module that looks after the amount of
 * physical memory we have.
 *
 * ram_getsize returns the lowest valid physical address, and one past
 * the highest valid physical address. (Both are page-aligned.) This
 * is the memory that is available for use during operation, and
 * excludes the memory the kernel is loaded into and memory that is
 * grabbed in the very early stages of bootup.
 *
 * ram_stealmem can be used before ram_getsize is called to allocate
 * memory that cannot be freed later. This is intended for use early
 * in bootup before VM initialization is complete.
 */

void ram_bootstrap(void);
paddr_t ram_stealmem(unsigned long npages);
void ram_getsize(paddr_t *lo, paddr_t *hi);

The two functions are declared here. So what do they actually do?

kern/arch/mips/vm/ram.c, kern/arch/mips/vm/vm.c, kern/vm/frametable.c

vaddr_t firstfree;   /* first free virtual address; set by start.S */

static paddr_t firstpaddr;  /* address of first free physical page */
static paddr_t lastpaddr;   /* one past end of last free physical page */

/*
 * Called very early in system boot to figure out how much physical
 * RAM is available.
 */
void
ram_bootstrap(void)
{
    size_t ramsize;
    
    /* Get size of RAM. */
    ramsize = mainbus_ramsize();

    /*
     * This is the same as the last physical address, as long as
     * we have less than 508 megabytes of memory. If we had more,
     * various annoying properties of the MIPS architecture would
     * force the RAM to be discontiguous. This is not a case we 
     * are going to worry about.
     */
    if (ramsize > 508*1024*1024) {
        ramsize = 508*1024*1024;
    }

    lastpaddr = ramsize;

    /* 
     * Get first free virtual address from where start.S saved it.
     * Convert to physical address.
     */
    firstpaddr = firstfree - MIPS_KSEG0;

    kprintf("%uk physical memory available\n", 
        (lastpaddr-firstpaddr)/1024);
}
/*
 * Initialise the frame table
 */
void
vm_bootstrap(void)
{
    frametable_bootstrap();
}
/*
 * Keep these variables static so that other files cannot access them
 */
static struct frame_table_entry *frame_table;
static paddr_t frametop, freeframe;

/*
 * initialise frame table
 */
void
frametable_bootstrap(void)
{
    struct frame_table_entry *p;
    paddr_t firsta, lasta, paddr;
    unsigned long framenum, entry_num, frame_table_size, i;
    
    // get the useable range of physical memory
    ram_getsize(&firsta, &lasta);
    KASSERT((firsta & PAGE_FRAME) == firsta);
    KASSERT((lasta & PAGE_FRAME) == lasta);
    
    framenum = (lasta - firsta) / PAGE_SIZE;
    
    // calculate the size of the whole framemap
    frame_table_size = framenum * sizeof(struct frame_table_entry);
    frame_table_size = ROUNDUP(frame_table_size, PAGE_SIZE);
    entry_num = frame_table_size / PAGE_SIZE;
    KASSERT((frame_table_size & PAGE_FRAME) == frame_table_size);
    
    frametop = firsta;
    freeframe = firsta + frame_table_size;
    
    if (freeframe >= lasta) {
        // This should almost never happen
        panic("vm: framemap consume physical memory?\n");
    }
    
    // keep the frame table at the start of the usable range of physical memory;
    // the free frames begin right after the end of the frame table
    frame_table = (struct frame_table_entry *) PADDR_TO_KVADDR(firsta);
    
    // Initialise the frame list. Each entry corresponds to one frame
    // and stores the address of the next free frame.
    // A next-frame address of zero means the frame is allocated or,
    // for the last frame, that the free list ends here.
    p = frame_table;
    for (i = 0; i < framenum; i++) {
        if (i < entry_num || i == framenum - 1) {
            // frames holding the frame table itself, and the list tail
            p->next_freeframe = 0;
            p += 1;
            continue;
        }
        paddr = frametop + (i+1) * PAGE_SIZE;
        p->next_freeframe = paddr;
        p += 1;
    }
}

kern/include/vm.h

struct frame_table_entry {
    // address of next free frame
    size_t next_freeframe;
};

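ram_bootstrap is called during system initialization to find out how much physical memory is available. vm_bootstrap simply calls frametable_bootstrap(), which divides the usable physical memory into pages of 4 KB each and keeps a linked list of the free pages in memory, stored at the start of the free memory. Before the list can be stored, the code has to work out how much space the frame table needs. So the first half of the function computes the size of the frame table, and the second half initializes the frame table as a linked list. Since every frame is free at initialization, each entry simply points to the address of the next page.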
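
The free list described above can be tried outside the kernel. The sketch below mirrors the idea in user space with eight frames: each table entry holds the address of the next free frame, zero marks an allocated frame or the end of the list, allocation pops the head and freeing pushes onto it. The names ft_init, ft_alloc and ft_free, the BASE address and the single reserved frame are assumptions for illustration, and there is no locking.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NFRAMES   8u
#define BASE      0x00100000u   /* hypothetical first usable physical address */
#define RESERVED  1u            /* frames occupied by the table itself (assumed) */

static uint32_t next_free[NFRAMES];   /* one entry per frame */
static uint32_t free_head;            /* address of the first free frame, 0 if none */

static void ft_init(void)
{
    for (uint32_t i = 0; i < NFRAMES; i++) {
        if (i < RESERVED || i == NFRAMES - 1) {
            next_free[i] = 0;                        /* in use, or list tail */
        } else {
            next_free[i] = BASE + (i + 1) * PAGE_SIZE;
        }
    }
    free_head = BASE + RESERVED * PAGE_SIZE;
}

/* Pop one frame off the free list; returns 0 when memory is exhausted. */
static uint32_t ft_alloc(void)
{
    uint32_t paddr = free_head;
    if (paddr == 0) {
        return 0;
    }
    uint32_t i = (paddr - BASE) / PAGE_SIZE;
    free_head = next_free[i];
    next_free[i] = 0;
    return paddr;
}

/* Push a frame back onto the front of the free list. */
static void ft_free(uint32_t paddr)
{
    assert(paddr >= BASE + RESERVED * PAGE_SIZE);
    assert(paddr < BASE + NFRAMES * PAGE_SIZE);
    assert((paddr - BASE) % PAGE_SIZE == 0);
    uint32_t i = (paddr - BASE) / PAGE_SIZE;
    next_free[i] = free_head;
    free_head = paddr;
}
```

Both operations take constant time, and the list needs no storage beyond the table itself. The price is that only single frames can be handed out, since frames that are adjacent in the list need not be adjacent in memory.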

That completes the operating system's VM initialization. So how is the VM system used? See below.

kern/include/vm.h

/* Fault handling function called by trap code */
int vm_fault(int faulttype, vaddr_t faultaddress);

vaddr_t alloc_kpages(int npages);
void free_kpages(vaddr_t addr);

kern/include/addrspace.h, implemented in kern/vm/addrspace.c

/* 
 * Address space - data structure associated with the virtual memory
 * space of a process.
 *
 * You write this.
 */

/*
 * A linked list that stores information about the regions (text, data, bss...)
 */
struct as_region {
    vaddr_t as_vbase;    /* the starting virtual address of the region */
    size_t as_npages;    /* how many pages this region occupies, counted from vbase */
    unsigned int as_permissions;    /* is this region readable? writable? executable? */
    struct as_region *as_next_region;    /* address of the following region */
};

struct addrspace {
#if OPT_DUMBVM
        vaddr_t as_vbase1;
        paddr_t as_pbase1;
        size_t as_npages1;
        vaddr_t as_vbase2;
        paddr_t as_pbase2;
        size_t as_npages2;
        paddr_t as_stackpbase;
#else
        /* Put stuff here for your VM system */
    struct as_region *as_regions_start;    /* header of the regions linked list */
    vaddr_t as_pagetable;               /* address of the first-level page table */
#endif
};

/*
 * The structure of a PTE in the page table:
 * |           address            |    PTE_VALID    |      PF_W      |     PF_R      |     PF_X
 *  the virtual address of frame  | valid indicator | writeable flag | readable flag | executable flag
 * I don't use a structure to represent a PTE, just the type vaddr_t. Because the low 12 bits
 * of a frame's virtual address are free, some of them can be used for the flags.
 */

/*
 * Functions in addrspace.c:
 *
 *    as_create - create a new empty address space. You need to make 
 *                sure this gets called in all the right places. You
 *                may find you want to change the argument list. May
 *                return NULL on out-of-memory error.
 *
 *    as_copy   - create a new address space that is an exact copy of
 *                an old one. Probably calls as_create to get a new
 *                empty address space and fill it in, but that's up to
 *                you.
 *
 *    as_activate - make the specified address space the one currently
 *                "seen" by the processor. Argument might be NULL, 
 *                meaning "no particular address space".
 *
 *    as_destroy - dispose of an address space. You may need to change
 *                the way this works if implementing user-level threads.
 *
 *    as_define_region - set up a region of memory within the address
 *                space.
 *
 *    as_prepare_load - this is called before actually loading from an
 *                executable into the address space.
 *
 *    as_complete_load - this is called when loading from an executable
 *                is complete.
 *
 *    as_define_stack - set up the stack region in the address space.
 *                (Normally called *after* as_complete_load().) Hands
 *                back the initial stack pointer for the new process.
 *
 *    as_zero_region - zero out newly allocated pages.
 *
 *    as_destroy_regions - free all the space allocated for region storage.
 */

struct addrspace *as_create(void);
int               as_copy(struct addrspace *src, struct addrspace **ret);
void              as_activate(struct addrspace *);
void              as_destroy(struct addrspace *);

int               as_define_region(struct addrspace *as, 
                                   vaddr_t vaddr, size_t sz,
                                   int readable, 
                                   int writeable,
                                   int executable);
int               as_prepare_load(struct addrspace *as);
int               as_complete_load(struct addrspace *as);
int               as_define_stack(struct addrspace *as, vaddr_t *initstackptr);
void          as_zero_region(vaddr_t vaddr, unsigned npages);
void          as_destroy_regions(struct as_region *ar);
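
The PTE layout described in the comment above, with a page-aligned frame address in the high 20 bits and flags in the free low 12 bits, can be sketched like this. The PF_R, PF_W and PF_X values follow the standard ELF segment permission bits. The PTE_VALID value and the helper functions are my assumptions, since the article does not show how the author defines them.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_FRAME 0xfffff000u   /* mask for the page-aligned address part */
#define PF_X       0x1u          /* ELF permission bits */
#define PF_W       0x2u
#define PF_R       0x4u
#define PTE_VALID  0x8u          /* assumed value; any free low bit would do */

/* Pack a page-aligned frame address and its permissions into one word. */
static uint32_t pte_make(uint32_t frame_kvaddr, uint32_t perms)
{
    assert((frame_kvaddr & ~PAGE_FRAME) == 0);   /* low 12 bits must be free */
    return frame_kvaddr | PTE_VALID | (perms & (PF_R | PF_W | PF_X));
}

/* Recover the frame address by masking off the flag bits. */
static uint32_t pte_frame(uint32_t pte)
{
    return pte & PAGE_FRAME;
}

static int pte_is_valid(uint32_t pte)
{
    return (pte & PTE_VALID) != 0;
}

static int pte_is_writable(uint32_t pte)
{
    return (pte & PF_W) != 0;
}
```

An all-zero word is then automatically an invalid entry, which is why zeroing a freshly allocated page table page is enough to mark every slot as unmapped.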

kern/vm/frametable.c

/*
 * Allocate n pages. 
 * Before frame table initialisation, using ram_stealmem
 */
static
paddr_t
getppages(int npages)
{
    paddr_t paddr;
    struct frame_table_entry *p;
    int i;
    
    spinlock_acquire(&frametable_lock);
    if (frame_table == 0)
        paddr = ram_stealmem(npages);
    else
    {
        if (npages > 1){
            spinlock_release(&frametable_lock);
            return 0;
        }
        
        // Freeframe equals zero means all the frames have been allocated
        // and there is no frame to use.
        if (freeframe == 0){
            spinlock_release(&frametable_lock);
            return 0;
        }
        
        // Get the current free frame's entry id 
        // and retrieve the next free frame 
        paddr = freeframe;
        i = (freeframe - frametop) / PAGE_SIZE;
        p = frame_table + i;
        
        freeframe = p->next_freeframe;
        p->next_freeframe = 0;
    }
    spinlock_release(&frametable_lock);
    
    return paddr;
}

/*
 * Allocation function for public accessing
 * Returning virtual address of frame
 */
vaddr_t
alloc_kpages(int npages)
{
    paddr_t paddr = getppages(npages);
    
    if(paddr == 0)
        return 0;
    
    return PADDR_TO_KVADDR(paddr);
}

/*
 * Free page
 * Stores the address of the current freeframe into the entry of the frame to be freed
 * and update the address of the freeframe.
 */
static
void
freeppages(paddr_t paddr)
{
    struct frame_table_entry *p;
    int i;
    spinlock_acquire(&frametable_lock);
    i = (paddr - frametop) / PAGE_SIZE;
    p = frame_table + i;
    p->next_freeframe = freeframe;
    freeframe = paddr;
    spinlock_release(&frametable_lock);
}

/*
 * Free page function for public accessing
 */
void
free_kpages(vaddr_t addr)
{
    KASSERT(addr >= MIPS_KSEG0);
    
    paddr_t paddr = KVADDR_TO_PADDR(addr);
    if (paddr <= frametop) {
        // memory leakage
    }
    else {
        freeppages(paddr);
    }
}

kern/arch/mips/vm

This is the most important function: it decides what happens when the TLB has no entry for the virtual page a user application needs.

/*
 * A TLB miss triggers a page fault, which is handled as follows:
 * 1. Check what kind of fault it is. If it is a READONLY fault,
 *    do nothing except return an error, which kills the process.
 * 2. If it is a read fault or a write fault:
 *    1. First check whether the virtual address lies within one of the regions
 *       or the stack of the current addrspace. If it does not, return an error,
 *       which kills the process. If it does, carry on.
 *    2. Then try to find the mapping in the page table.
 *       If a page table entry exists for this virtual address, insert it into the TLB.
 *    3. If this virtual address is not mapped yet, allocate a frame for it,
 *       update the page table, then insert the mapping into the TLB.
 */
int
vm_fault(int faulttype, vaddr_t faultaddress)
{
    vaddr_t *vaddr1, *vaddr2, vaddr, vbase, vtop, faultadd = 0;
    paddr_t paddr;
    struct addrspace *as;
    struct as_region *s;
    uint32_t ehi, elo;
    int i, index1, index2, spl;
    unsigned int permis = 0;
    
    switch (faulttype) {
        case VM_FAULT_READONLY:
            return EFAULT;
        case VM_FAULT_READ:
        case VM_FAULT_WRITE:
            break;
        default:
            return EINVAL;
    }
    
    as = curthread -> t_addrspace;
    if (as == NULL) {
        return EFAULT;
    }
    
    // Align faultaddress
    faultaddress &= PAGE_FRAME;
    
    // Go through the link list of regions 
    // Check the validation of the faultaddress
    KASSERT(as->as_regions_start != 0);
    s = as->as_regions_start;
    while (s != 0) {
        KASSERT(s->as_vbase != 0);
        KASSERT(s->as_npages != 0);
        KASSERT((s->as_vbase & PAGE_FRAME) == s->as_vbase);
        vbase = s->as_vbase;
        vtop = vbase + s->as_npages * PAGE_SIZE;
        if (faultaddress >= vbase && faultaddress < vtop) {
            faultadd = faultaddress;
            permis = s->as_permissions;
            break;
        }
        s = s->as_next_region;
    }
    
    if (faultadd == 0) {
        vtop = USERSTACK;
        vbase = vtop - VM_STACKPAGES * PAGE_SIZE;
        if (faultaddress >= vbase && faultaddress < vtop) {
            faultadd = faultaddress;
            // Stack is readable, writable but not executable
            permis |= (PF_W | PF_R);
        }
        
        // faultaddress is not within any range of the regions and stack
        if (faultadd == 0) {
            return EFAULT;
        }
    }
    
    index1 = (faultaddress & TOP_TEN) >> 22;
    index2 = (faultaddress & MID_TEN) >> 12;

    vaddr1 = (vaddr_t *)(as->as_pagetable + index1 * 4);
    if (*vaddr1) {
        vaddr2 = (vaddr_t *)(*vaddr1 + index2 * 4);
        // If the mapping exists in the page table,
        // get the address stored in the PTE,
        // translate it into a physical address,
        // check the writeable flag,
        // and prepare the physical address for TLBLO
        if (*vaddr2 & PTE_VALID) {
            vaddr = *vaddr2 & PAGE_FRAME;
            paddr = KVADDR_TO_PADDR(vaddr);
            if (permis & PF_W) {
                paddr |= TLBLO_DIRTY;
            }
        }
        // If not exists, do the mapping, 
        // update the PTE of the second page table,
        // check writeable flag,
        // and prepare the physical address for TLBLO
        else {
            vaddr = alloc_kpages(1);
            KASSERT(vaddr != 0);
            
            as_zero_region(vaddr, 1);
            *vaddr2 |= (vaddr | PTE_VALID);
            
            paddr = KVADDR_TO_PADDR(vaddr);
            if (permis & PF_W) {
                paddr |= TLBLO_DIRTY;
            }
        }
    }
    // If the second-level page table does not exist yet,
    // create it,
    // do the mapping,
    // update the PTE,
    // and prepare the physical address.
    else {
        *vaddr1 = alloc_kpages(1);
        KASSERT(*vaddr1 != 0);
        as_zero_region(*vaddr1, 1);
        
        vaddr2 = (vaddr_t *)(*vaddr1 + index2 * 4);
        vaddr = alloc_kpages(1);
        KASSERT(vaddr != 0);
        as_zero_region(vaddr, 1);
        *vaddr2 |= (vaddr | PTE_VALID);

        paddr = KVADDR_TO_PADDR(vaddr);
        if (permis & PF_W) {
            paddr |= TLBLO_DIRTY;
        }
    }
        
    spl = splhigh();
    
    // update TLB entry
    // if there is still an empty TLB entry, insert the new one there
    // if not, let a randomly chosen entry be evicted and insert the new one
    for (i=0; i<NUM_TLB; i++) {
        tlb_read(&ehi, &elo, i);
        if (elo & TLBLO_VALID) {
            continue;
        }
        ehi = faultaddress;
        elo = paddr | TLBLO_VALID;
        tlb_write(ehi, elo, i);
        splx(spl);
        return 0;
    }
    
    // FIXME, TLB replacement algo.
    ehi = faultaddress;
    elo = paddr | TLBLO_VALID;
    tlb_random(ehi, elo);
    splx(spl);
    return 0;
}

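While the system runs, page faults occur all the time. Even though the system has allocated pages to the running program (the allocation functions are in kern/vm/frametable.c), the TLB may hold no entry mapping a page's virtual address to its physical address, so the page cannot be used yet. When the program actually touches the page, the TLB is consulted first to get the physical address. On a miss, vm_fault finds or creates the mapping in the page table and loads it into the TLB.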
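
The refill policy at the end of vm_fault, which uses the first invalid slot if there is one and otherwise evicts a random entry, can be modelled in user space. This is a sketch under stated assumptions: the array stands in for the hardware TLB, rand() stands in for the random replacement done by tlb_random, and SIM_NUM_TLB, SIM_VALID and the two functions are hypothetical names rather than OS161 definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SIM_NUM_TLB 8        /* small for the demo (assumption) */
#define SIM_VALID   0x200u   /* hypothetical valid bit in the low word */
#define PAGE_FRAME  0xfffff000u

struct sim_tlb_entry {
    uint32_t hi;   /* virtual page address */
    uint32_t lo;   /* physical frame address plus flag bits */
};

static struct sim_tlb_entry sim_tlb[SIM_NUM_TLB];

/* Insert a mapping and return the index of the slot that was used. */
static int sim_tlb_insert(uint32_t vpage, uint32_t pframe)
{
    int i;

    assert((vpage & ~PAGE_FRAME) == 0);
    for (i = 0; i < SIM_NUM_TLB; i++) {
        if (!(sim_tlb[i].lo & SIM_VALID)) {
            break;                       /* found an empty slot */
        }
    }
    if (i == SIM_NUM_TLB) {
        i = rand() % SIM_NUM_TLB;        /* TLB full: evict a random victim */
    }
    sim_tlb[i].hi = vpage;
    sim_tlb[i].lo = pframe | SIM_VALID;
    return i;
}

/* Return the slot holding vpage, or -1 if it is not in the TLB. */
static int sim_tlb_find(uint32_t vpage)
{
    for (int i = 0; i < SIM_NUM_TLB; i++) {
        if ((sim_tlb[i].lo & SIM_VALID) && sim_tlb[i].hi == vpage) {
            return i;
        }
    }
    return -1;
}
```

Random replacement is simple and needs no bookkeeping, but it can evict an entry that is about to be used again. The evicted page is not lost: its mapping stays in the page table and is reloaded on the next miss.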

       

 

posted on 2015-06-03 12:27 by ingenuity