MIT_JOS_Lab5
This lab covers the file system in JOS. It deals with how files are laid out on disk and with the file system itself. JOS implements the file system as a special user-level environment (process) that performs the basic file operations; other environments operate on files by sending requests to it over IPC (inter-process communication). The lab ends with spawn, which reads a program file from disk and creates a new environment to run it, much like fork followed by exec.
File system preliminaries
Our file system is much simpler than a UNIX file system. It does not support multiple users or permissions, and it currently does not support hard links, symbolic links, time stamps, or special device files as most UNIX file systems do.
On-Disk File System Structure
In most UNIX file systems, the disk is divided into two kinds of regions: inode regions and data regions. The size of the inode region is usually fixed, which is why a disk that still has free data blocks can refuse to store a new file: its inode region is full. JOS simplifies this. There are no inodes on disk; a file's meta-data is stored directly in the directory entry that describes the file. We can see this in the definition of struct File:
struct File {
	char f_name[MAXNAMELEN];	// filename
	off_t f_size;			// file size in bytes
	uint32_t f_type;		// file type

	// Block pointers.
	// A block is allocated iff its value is != 0.
	uint32_t f_direct[NDIRECT];	// direct blocks
	uint32_t f_indirect;		// indirect block

	// Pad out to 256 bytes; must do arithmetic in case we're compiling
	// fsformat on a 64-bit machine.
	uint8_t f_pad[256 - MAXNAMELEN - 8 - 4*NDIRECT - 4];
} __attribute__((packed));	// required only on some 64-bit machines
Sectors and Blocks
Files are stored on disk, and the disk is read and written in units of sectors, usually 512 bytes each. The boot process itself begins by reading a specific sector of the disk. The file system, however, allocates storage in units of blocks. In JOS the block size is 4096 bytes, which is equal to the size of one page.
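As a small illustration of the sector/block relationship, here is a sketch of the arithmetic, assuming 512-byte sectors and 4096-byte blocks as stated above. The helper names block_first_sector and sector_to_block are hypothetical and are not part of JOS.

```c
#include <assert.h>
#include <stdint.h>

#define SECTSIZE 512u                   /* bytes per disk sector */
#define BLKSIZE  4096u                  /* bytes per file system block */
#define BLKSECTS (BLKSIZE / SECTSIZE)   /* sectors per block */

/* First sector occupied by the given block (hypothetical helper). */
static uint32_t block_first_sector(uint32_t blockno)
{
	return blockno * BLKSECTS;
}

/* Block that contains the given sector (hypothetical helper). */
static uint32_t sector_to_block(uint32_t secno)
{
	return secno / BLKSECTS;
}
```

With these sizes one block spans eight consecutive sectors, which is why the block cache code later multiplies a block number by BLKSECTS before calling the IDE driver.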
Superblocks
The file system lives on disk, and every file is found by walking down from the root directory, so the block that describes the root directory is special. This block is usually called the superblock. In JOS the superblock is the second block on the disk, block 1, because block 0 holds the boot loader. Some operating systems keep more than one superblock.
File Meta-data
The layout of the meta-data describing a file in our file system is described by struct File in inc/fs.h. This meta-data includes the file's name, size, type (regular file or directory), and pointers to the blocks comprising the file. As mentioned above, we do not have inodes, so this meta-data is stored in a directory entry on disk. The meta-data uses the same format, struct File, both on disk and in memory.
The data blocks of a file are recorded in two parts. f_direct[NDIRECT] holds the block numbers of the first NDIRECT blocks of the file directly. f_indirect holds the block number of a single indirect block, and that block in turn stores up to 1024 further block numbers (4096 bytes per block divided by 4 bytes per block number).
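To make the direct/indirect split concrete, the following sketch decides which table a given file block number falls into. It assumes NDIRECT is 10, the value I believe JOS uses in inc/fs.h; the names locate_slot, slot_kind, and max_file_size are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BLKSIZE   4096u
#define NDIRECT   10u              /* assumption: value from JOS inc/fs.h */
#define NINDIRECT (BLKSIZE / 4u)   /* 1024 block numbers fit in one block */

enum slot_kind { SLOT_DIRECT, SLOT_INDIRECT, SLOT_NONE };

/* Report where the block number of file block `filebno` is stored and
 * write the index within that table to *index. */
static enum slot_kind locate_slot(uint32_t filebno, uint32_t *index)
{
	if (filebno < NDIRECT) {
		*index = filebno;
		return SLOT_DIRECT;
	}
	if (filebno < NDIRECT + NINDIRECT) {
		*index = filebno - NDIRECT;
		return SLOT_INDIRECT;
	}
	return SLOT_NONE;              /* beyond the largest possible file */
}

/* Largest file size in bytes that this layout can describe. */
static uint32_t max_file_size(void)
{
	return (NDIRECT + NINDIRECT) * BLKSIZE;
}
```

Under these assumptions a file can span at most 1034 blocks, a little over 4 MB.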
Directories versus Regular Files
A File structure in our file system can represent either a regular file or a directory; these two types of "files" are distinguished by the type field in the File structure. The file system interprets the contents of a directory-file as a series of File structures describing the files and subdirectories within the directory.
In the JOS file system, superblock contains the file structure of the root directory. The contents of this directory-file is a sequence of File structures describing the files and directories located within the root directory of the file system.
The File System
This lab does not implement a file system from scratch. The main parts to implement are:
- Reading blocks from disk
- Writing blocks from memory back to disk
- Allocating and managing disk blocks
- Using IPC (inter-process communication) so that other environments can read, write, and open files through the file system's interface
The x86 processor uses the IOPL bits in the EFLAGS register to determine whether protected-mode code is allowed to perform special device I/O instructions such as the IN and OUT instructions. On x86, device I/O ports live in their own I/O address space, and we want only the file system environment to be able to access it. In effect, the IOPL bits in the EFLAGS register provide the kernel with a simple "all-or-nothing" method of controlling whether user-mode code can access I/O space.
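The IOPL check can be sketched as a comparison between the current privilege level and the two IOPL bits (bits 12 and 13 of EFLAGS). This is a simplified model: it ignores the TSS I/O permission bitmap, and the helper names iopl_of and io_allowed are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FL_IOPL_MASK 0x00003000u   /* IOPL field: bits 12-13 of EFLAGS */

/* Extract the I/O privilege level (0..3) from an EFLAGS value. */
static unsigned iopl_of(uint32_t eflags)
{
	return (eflags & FL_IOPL_MASK) >> 12;
}

/* Simplified rule: IN/OUT are allowed when CPL <= IOPL. */
static int io_allowed(uint32_t eflags, unsigned cpl)
{
	return cpl <= iopl_of(eflags);
}
```

Setting both bits, as the FL_IOPL_MASK assignment in Exercise 1 does, raises IOPL to 3 so that code running at CPL 3 passes the check; leaving them clear restricts port I/O to the kernel.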
Exercise 1
Give the file system environment I/O privilege when it is created:
	if (type == ENV_TYPE_FS) {
		env->env_tf.tf_eflags |= FL_IOPL_MASK;
	}
The Block Cache
The file system environment has its own virtual address space, separate from that of every other Env, and a large part of it is tied directly to the disk. The key region starts at 0x10000000 (DISKMAP), where the disk is mapped. Since the only thing the file system environment needs to do is implement file access, it is reasonable to reserve most of its address space in this way.
In principle we could read the entire contents of the disk into this region up front, but that would be far too slow and would need far more physical memory than the machine has. Instead, we use a page fault handler that loads a block from disk the first time it is touched.
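The mapping between block numbers and addresses in this region is plain arithmetic. The sketch below models it with 32-bit integers instead of real pointers. It assumes block n is mapped at DISKMAP + n * BLKSIZE and that the region is 3 GB long (DISKSIZE 0xC0000000), as I recall from fs/fs.h; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DISKMAP  0x10000000u   /* start of the disk map region */
#define DISKSIZE 0xC0000000u   /* assumption: 3 GB, as in JOS fs/fs.h */
#define BLKSIZE  4096u

/* Does va fall inside the disk map region? */
static int va_in_disk_map(uint32_t va)
{
	return va >= DISKMAP && va - DISKMAP < DISKSIZE;
}

/* Virtual address at which a block is mapped. */
static uint32_t block_va(uint32_t blockno)
{
	return DISKMAP + blockno * BLKSIZE;
}

/* Block number that a (possibly unaligned) address belongs to. */
static uint32_t va_blockno(uint32_t va)
{
	return (va - DISKMAP) / BLKSIZE;
}

/* Round an address down to the start of its block. */
static uint32_t va_block_start(uint32_t va)
{
	return va & ~(BLKSIZE - 1u);
}
```

A faulting address rarely lands exactly on a block boundary, so the handler needs both the block number and the rounded-down address.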
Exercise 2
Implement the bc_pgfault and flush_block functions in fs/bc.c. bc_pgfault is a page fault handler, just like the one you wrote in the previous lab for copy-on-write fork, except that its job is to load pages in from the disk in response to a page fault. When writing this, keep in mind that (1) addr may not be aligned to a block boundary and (2) ide_read operates in sectors, not blocks.
	// LAB 5: your code here:
	// Round addr down to the start of the page (and block) it falls in.
	addr = ROUNDDOWN(addr, PGSIZE);
	// Allocate a page to hold the block.
	if ((r = sys_page_alloc(0, addr, PTE_W | PTE_U | PTE_P)) != 0) {
		panic("bc_pgfault: %e", r);
	}
	// ide_read works in sectors, not blocks, so convert the block number.
	if ((r = ide_read(blockno * BLKSECTS, addr, BLKSECTS)) != 0) {
		panic("bc_pgfault: %e", r);
	}

	// Clear the dirty bit for the disk block page since we just read the
	// block from disk
	if ((r = sys_page_map(0, addr, 0, addr, uvpt[PGNUM(addr)] & PTE_SYSCALL)) < 0)
		panic("in bc_pgfault, sys_page_map: %e", r);

	// Check that the block we read was allocated. (exercise for
	// the reader: why do we do this *after* reading the block in?)
	if (bitmap && block_is_free(blockno))
		panic("reading free block %08x\n", blockno);
flush_block writes the block containing addr back to disk, if necessary:
void flush_block(void *addr)
{
	uint32_t blockno = ((uint32_t)addr - DISKMAP) / BLKSIZE;
	int r;

	// Check that addr lies inside the disk map region.
	if (addr < (void*)DISKMAP || addr >= (void*)(DISKMAP + DISKSIZE))
		panic("flush_block of bad va %08x", addr);

	// LAB 5: Your code here.
	addr = ROUNDDOWN(addr, PGSIZE);
	// Nothing to do if the block is not in memory (not mapped) or has
	// not been modified since it was read (not dirty).
	if (!va_is_mapped(addr) || !va_is_dirty(addr)) {
		return;
	}
	if ((r = ide_write(blockno * BLKSECTS, addr, BLKSECTS)) != 0) {
		panic("flush_block: %e", r);
	}
	// Clear the dirty bit for the disk block page since we have just
	// written the block back.
	if ((r = sys_page_map(0, addr, 0, addr, uvpt[PGNUM(addr)] & PTE_SYSCALL)) != 0) {
		panic("flush_block: %e", r);
	}
	// panic("flush_block not implemented");
}
The fs_init function in fs/fs.c is a prime example of how to use the block cache. After initializing the block cache, it simply stores pointers into the disk map region in the super global variable. After this point, we can simply read from the super structure as if it were in memory, and our page fault handler will read it from disk as necessary.
The Block Bitmap
After fs_init sets the bitmap pointer, we can treat bitmap as a packed array of bits, one for each block on the disk. See, for example, block_is_free, which simply checks whether a given block is marked free in the bitmap.
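The packed-bit layout can be tried out on an ordinary array. The sketch below is a stand-alone model rather than the JOS code: it follows the convention that a set bit means "free", the same convention the alloc_block code in Exercise 3 relies on when it clears a bit to mark a block used, and all toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A set bit means the block is free; 32 blocks per array word. */
static int toy_block_is_free(const uint32_t *bitmap, uint32_t blockno)
{
	return (bitmap[blockno / 32] >> (blockno % 32)) & 1u;
}

static void toy_mark_used(uint32_t *bitmap, uint32_t blockno)
{
	bitmap[blockno / 32] &= ~(1u << (blockno % 32));
}

static void toy_mark_free(uint32_t *bitmap, uint32_t blockno)
{
	bitmap[blockno / 32] |= 1u << (blockno % 32);
}

/* Find the first free block, mark it used, and return its number;
 * return -1 when every block is in use. */
static int toy_alloc(uint32_t *bitmap, uint32_t nblocks)
{
	uint32_t i;

	for (i = 0; i < nblocks; i++) {
		if (toy_block_is_free(bitmap, i)) {
			toy_mark_used(bitmap, i);
			return (int)i;
		}
	}
	return -1;
}
```

The real allocator additionally flushes the bitmap block that changed, which this in-memory model has no need for.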
Exercise 3
Use free_block as a model to implement alloc_block in fs/fs.c, which should find a free disk block in the bitmap, mark it used, and return the number of that block. When you allocate a block, you should immediately flush the changed bitmap block to disk with flush_block, to help file system consistency.
int alloc_block(void)
{
	// The bitmap consists of one or more blocks. A single bitmap block
	// contains the in-use bits for BLKBITSIZE blocks. There are
	// super->s_nblocks blocks in the disk altogether.
	uint32_t i;

	// Look for the first free block and allocate it by clearing its bit,
	// then flush the block that holds the changed part of bitmap[].
	for (i = 0; i < super->s_nblocks; i++) {
		if (block_is_free(i)) {
			bitmap[i / 32] &= ~(1 << (i % 32));
			flush_block(&bitmap[i / 32]);
			return i;
		}
	}
	// panic("alloc_block not implemented");
	return -E_NO_DISK;
}
File Operations
We have provided a variety of functions in fs/fs.c to implement the basic facilities you will need to interpret and manage File structures, scan and manage the entries of directory-files, and walk the file system from the root to resolve an absolute pathname. Since the files are stored on the disk, the main job of this part is to modify the files on the disk.
Exercise 4
Implement file_block_walk and file_get_block. file_block_walk maps from a block offset within a file to the pointer for that block in the struct File or the indirect block, very much like what pgdir_walk did for page tables.
file_block_walk finds the slot that holds the disk block number of the filebno-th block of file f. Note that it yields the address of that slot (inside f_direct[] or inside the indirect block), not the address of the data block itself.
static int file_block_walk(struct File *f, uint32_t filebno, uint32_t **ppdiskbno, bool alloc)
{
	// LAB 5: Your code here.
	int newbno;

	if (filebno >= NDIRECT + NINDIRECT) {
		return -E_INVAL;
	}
	if (filebno < NDIRECT) {
		*ppdiskbno = &f->f_direct[filebno];
		return 0;
	}
	if (!f->f_indirect) {
		if (!alloc) {
			return -E_NOT_FOUND;
		}
		// alloc_block returns a negative error code on failure, so the
		// result must be held in a signed int before it is tested.
		if ((newbno = alloc_block()) < 0) {
			return -E_NO_DISK;
		}
		f->f_indirect = newbno;
		memset(diskaddr(newbno), 0, BLKSIZE);
	}
	*ppdiskbno = &((uint32_t *)diskaddr(f->f_indirect))[filebno - NDIRECT];
	return 0;
}
file_get_block works with two pointers. pdiskbno, filled in by file_block_walk, points to the slot in f_direct[] or in the indirect block that holds the block number, so *pdiskbno is the disk block number. blk receives the virtual address at which that block is mapped.
int file_get_block(struct File *f, uint32_t filebno, char **blk)
{
	// LAB 5: Your code here.
	uint32_t *pdiskbno;
	int newbno;
	int r;

	if ((r = file_block_walk(f, filebno, &pdiskbno, 1)) != 0) {
		return r;
	}
	if (!*pdiskbno) {
		// Keep the result in a signed int so the error check works.
		if ((newbno = alloc_block()) < 0) {
			return -E_NO_DISK;
		}
		*pdiskbno = newbno;
		memset(diskaddr(newbno), 0, BLKSIZE);
	}
	// Not *blk = (char *)pdiskbno: the address of the slot is not the
	// address of the block. Translate the block number instead.
	*blk = diskaddr(*pdiskbno);
	return 0;
	// panic("file_get_block not implemented");
}
The file system interface
This part puzzled me a great deal at first. Once I drew a flowchart of the whole process, both the flow and the code became very clear. I use a diagram to explain the file reading process and the interfaces involved.
We must first understand a union, Fsipc. Its role is to pass information between the client Env and the server Env. Because the two communicate by sharing a page over IPC, a Fsipc occupies exactly one page; the virtual address of this page is 0x0ffff000. The client fills in a Fsipc, and the server reads the request from it. A Fsipc can express several kinds of request, and its structure is as follows:
union Fsipc {
	struct Fsreq_open {
		char req_path[MAXPATHLEN];
		int req_omode;
	} open;
	struct Fsreq_set_size {
		int req_fileid;
		off_t req_size;
	} set_size;
	struct Fsreq_read {
		int req_fileid;
		size_t req_n;
	} read;
	struct Fsret_read {
		char ret_buf[PGSIZE];
	} readRet;
	struct Fsreq_write {
		int req_fileid;
		size_t req_n;
		char req_buf[PGSIZE - (sizeof(int) + sizeof(size_t))];
	} write;
	struct Fsreq_stat {
		int req_fileid;
	} stat;
	struct Fsret_stat {
		char ret_name[MAXNAMELEN];
		off_t ret_size;
		int ret_isdir;
	} statRet;
	struct Fsreq_flush {
		int req_fileid;
	} flush;
	struct Fsreq_remove {
		char req_path[MAXPATHLEN];
	} remove;

	// Ensure Fsipc is one page
	char _pad[PGSIZE];
};
A client therefore makes a request as follows:
- fill in fsipcbuf
- call fsipc()
- fsipc() sends the message to the file system Env
- receive the result
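The four steps above can be sketched with a toy request union and an in-process "server". This is only a model of the pattern: there is no real IPC, the server is an ordinary function call, and every name (ToyIpc, toy_serve, toy_fsipc, the TOY_REQ_ codes) is hypothetical.

```c
#include <assert.h>

#define PGSIZE 4096

enum { TOY_REQ_SET_SIZE = 1, TOY_REQ_STAT };   /* hypothetical codes */

/* One page-sized union; the reply overlays the request. */
union ToyIpc {
	struct { int fileid; long size; } set_size;
	struct { int fileid; } stat;
	struct { long size; } stat_ret;
	char pad[PGSIZE];
};

#define TOY_NFILES 4
static long toy_sizes[TOY_NFILES];    /* the server's private state */
static union ToyIpc toy_ipcbuf;       /* the shared request page */

/* Server side: decode the request by type and act on it. */
static int toy_serve(int type, union ToyIpc *ipc)
{
	int id;

	switch (type) {
	case TOY_REQ_SET_SIZE:
		id = ipc->set_size.fileid;
		if (id < 0 || id >= TOY_NFILES)
			return -1;
		toy_sizes[id] = ipc->set_size.size;
		return 0;
	case TOY_REQ_STAT:
		id = ipc->stat.fileid;
		if (id < 0 || id >= TOY_NFILES)
			return -1;
		ipc->stat_ret.size = toy_sizes[id];
		return 0;
	default:
		return -1;
	}
}

/* Stand-in for the send/receive pair done by the real fsipc(). */
static int toy_fsipc(int type)
{
	return toy_serve(type, &toy_ipcbuf);
}

/* Client side: fill in the buffer, send, read the reply. */
static int toy_set_size(int fileid, long size)
{
	toy_ipcbuf.set_size.fileid = fileid;
	toy_ipcbuf.set_size.size = size;
	return toy_fsipc(TOY_REQ_SET_SIZE);
}

static long toy_stat_size(int fileid)
{
	toy_ipcbuf.stat.fileid = fileid;
	if (toy_fsipc(TOY_REQ_STAT) < 0)
		return -1;
	return toy_ipcbuf.stat_ret.size;
}
```

Padding the union to one page is what lets the real client hand the whole request to the server as a single shared page.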
The OpenFile structure is a mapping maintained by the server. It links the struct File of a real file to the file descriptor, struct Fd, that the client opened. The struct Fd of each open file is mapped on its own physical page, in a region that starts at FILEVA (0xD0000000). The server and the client that opened the file share this physical page. When the client communicates with the file system server, it uses o_fileid to name the file it wants to operate on.
struct OpenFile {
	uint32_t o_fileid;	// file id
	struct File *o_file;	// mapped descriptor for open file
	int o_mode;		// open mode
	struct Fd *o_fd;	// Fd page
};
By default the file system allows at most 1024 files to be open at the same time, so there are 1024 struct OpenFile entries, matched by 1024 pages starting at 0xD0000000 in the server's address space that hold the corresponding struct Fd.
struct Fd is the file descriptor that represents an open file. Note that this descriptor is not the same thing as the file's on-disk meta-data: it exists only while the file is open. Its main fields are:
struct Fd {
	int fd_dev_id;		// device ID
	off_t fd_offset;	// current offset within the file
	int fd_omode;		// open mode
	union {
		// File server files
		struct FdFile fd_file;
	};
};
All open files are tracked in an array of OpenFile, opentab[]. The server locates an open file through its index in this array. The array itself is stored in the data segment of the server Env.
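The relationship between a file id, its slot in the table, and its Fd page can be sketched as follows. This reflects my recollection of fs/serv.c, where a file id is reduced modulo the table size to find its slot and a slot's id grows by the table size each time the slot is reused; treat those details, the MAXOPEN name, and the helper names as assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE  4096u
#define MAXOPEN 1024u          /* number of OpenFile slots */
#define FILEVA  0xD0000000u    /* first Fd page in the server */

/* Slot in the open file table that a file id refers to. */
static uint32_t slot_of(uint32_t fileid)
{
	return fileid % MAXOPEN;
}

/* Address of the Fd page that belongs to a slot. */
static uint32_t fd_page_va(uint32_t slot)
{
	return FILEVA + slot * PGSIZE;
}

/* Id handed out the next time the same slot is reused, so that a
 * stale id from an earlier open no longer matches. */
static uint32_t next_fileid(uint32_t fileid)
{
	return fileid + MAXOPEN;
}
```

Because every slot owns a fixed page, the server can share an open file with a client simply by mapping that one page into the client.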
Exercise 5
With the mechanism above in mind, the code is fairly straightforward.
int serve_read(envid_t envid, union Fsipc *ipc)
{
	struct Fsreq_read *req = &ipc->read;
	struct Fsret_read *ret = &ipc->readRet;
	struct OpenFile *o;
	size_t n;
	int r;

	if (debug)
		cprintf("serve_read %08x %08x %08x\n", envid, req->req_fileid, req->req_n);
	// Look up the OpenFile struct for this request.
	if ((r = openfile_lookup(envid, req->req_fileid, &o)) != 0) {
		return r;
	}
	// Read up to req_n bytes, starting at the current offset, into
	// ret_buf, but never more than ret_buf can hold.
	n = MIN(req->req_n, sizeof(ret->ret_buf));
	if ((r = file_read(o->o_file, ret->ret_buf, n, o->o_fd->fd_offset)) > 0) {
		o->o_fd->fd_offset += r;
	}
	return r;
}
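The bookkeeping in serve_read (read from the current offset, then advance the offset by the number of bytes actually read) can be modelled on an in-memory buffer. This is a sketch, not the JOS file_read; toy_read and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy up to n bytes of `data` (which is `size` bytes long) starting at
 * *offset into buf, never more than bufcap bytes, then advance *offset.
 * Returns the number of bytes copied; 0 means end of file. */
static size_t toy_read(const char *data, size_t size, size_t *offset,
                       char *buf, size_t bufcap, size_t n)
{
	size_t avail = *offset < size ? size - *offset : 0;

	if (n > bufcap)
		n = bufcap;            /* do not overflow the reply buffer */
	if (n > avail)
		n = avail;             /* do not read past the end of file */
	if (n == 0)
		return 0;
	memcpy(buf, data + *offset, n);
	*offset += n;
	return n;
}
```

Because the offset lives in the shared Fd page in the real system, consecutive reads continue where the previous one stopped without the client passing a position.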
Exercise 6
Like other *nix systems, user applications on JOS use a file descriptor to access a file, and it is created by sending an open request to the file server. The file server uses struct OpenFile to record an opened file, and these records are saved in opentab[]. Before performing file operations like read or write, the file must be opened first.
int
serve_write(envid_t envid, struct Fsreq_write *req)
{
	struct OpenFile *o;
	size_t n;
	int r;

	if (debug)
		cprintf("serve_write %08x %08x %08x\n", envid, req->req_fileid, req->req_n);
	if ((r = openfile_lookup(envid, req->req_fileid, &o)) != 0) {
		return r;
	}
	// Never trust req_n beyond the size of req_buf.
	n = MIN(req->req_n, sizeof(req->req_buf));
	if ((r = file_write(o->o_file, req->req_buf, n, o->o_fd->fd_offset)) > 0) {
		o->o_fd->fd_offset += r;
	}
	return r;
}
devfile_write() prepares the write request and sends it to the server. Both the request and the response need to be checked. The data copied into req_buf must never exceed PGSIZE - (sizeof(int) + sizeof(size_t)) bytes, as the definition of struct Fsreq_write indicates, and a successful response must be non-negative and no larger than n.
static ssize_t devfile_write(struct Fd *fd, const void *buf, size_t n)
{
	int r;

	fsipcbuf.write.req_fileid = fd->fd_file.id;
	fsipcbuf.write.req_n = n;
	assert(n <= PGSIZE - (sizeof(int) + sizeof(size_t)));
	memmove(fsipcbuf.write.req_buf, buf, n);
	if ((r = fsipc(FSREQ_WRITE, NULL)) < 0)
		return r;
	assert(r <= n);
	return r;
}
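An alternative to asserting on the request size is to clamp it and let the caller loop, since writing fewer bytes than requested is a normal outcome for a write call. The sketch below only models the size arithmetic. It assumes 4-byte int and size_t, as on 32-bit JOS, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE      4096u
/* Payload capacity of one write request: a page minus the two header
 * fields (4 bytes each under the 32-bit assumption). */
#define REQ_BUF_MAX (PGSIZE - (4u + 4u))

/* Bytes that fit into a single request when n bytes remain. */
static uint32_t write_chunk(uint32_t n)
{
	return n < REQ_BUF_MAX ? n : REQ_BUF_MAX;
}

/* Number of requests needed to send n bytes, one chunk at a time. */
static uint32_t requests_needed(uint32_t n)
{
	uint32_t count = 0;

	while (n > 0) {
		n -= write_chunk(n);
		count++;
	}
	return count;
}
```

With clamping, a large write degrades into several short writes instead of tripping an assertion in the library.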
Spawning Processes
spawn() in lib/spawn.c creates a new process, loads the user program from the file system, and then starts the process to run the program. spawn() is like fork() in UNIX followed immediately by exec().
- Open the program file prog from the file system and check its ELF header
	elf = (struct Elf*) elf_buf;
	if (readn(fd, elf_buf, sizeof(elf_buf)) != sizeof(elf_buf)
	    || elf->e_magic != ELF_MAGIC) {
		close(fd);
		cprintf("elf magic %08x want %08x\n", elf->e_magic, ELF_MAGIC);
		return -E_NOT_EXEC;
	}
- Call the system call sys_exofork() to create a new Env structure
	// Create new child environment
	if ((r = sys_exofork()) < 0)
		return r;
	child = r;
- Prepare the Trapframe for the new Env (this structure holds the register state); it is installed later with the system call sys_env_set_trapframe()
	// Set up trap frame, including initial stack.
	child_tf = envs[ENVX(child)].env_tf;
	child_tf.tf_eip = elf->e_entry;
	if ((r = init_stack(child, argv, &child_tf.tf_esp)) < 0)
		return r;
- Following the program headers in the ELF file, read each loadable segment of the user program into memory and map it at the linear address it specifies, then install the trap frame
	// Set up program segments as defined in ELF header.
	ph = (struct Proghdr*) (elf_buf + elf->e_phoff);
	for (i = 0; i < elf->e_phnum; i++, ph++) {
		if (ph->p_type != ELF_PROG_LOAD)
			continue;
		perm = PTE_P | PTE_U;
		if (ph->p_flags & ELF_PROG_FLAG_WRITE)
			perm |= PTE_W;
		if ((r = map_segment(child, ph->p_va, ph->p_memsz,
				     fd, ph->p_filesz, ph->p_offset, perm)) < 0)
			goto error;
	}
	close(fd);
	fd = -1;

	// Copy shared library state.
	if ((r = copy_shared_pages(child)) < 0)
		panic("copy_shared_pages: %e", r);

	child_tf.tf_eflags |= FL_IOPL_3;   // devious: see user/faultio.c
	if ((r = sys_env_set_trapframe(child, &child_tf)) < 0)
		panic("sys_env_set_trapframe: %e", r);
- Call the system call sys_env_set_status() to set the status of the new Env structure to ENV_RUNNABLE
	if ((r = sys_env_set_status(child, ENV_RUNNABLE)) < 0)
		panic("sys_env_set_status: %e", r);
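Two small decisions in the steps above, checking the ELF magic number and turning a segment's flags into page permissions, can be isolated and tested on their own. The sketch below assumes the usual constant values (the magic is the bytes 0x7F 'E' 'L' 'F' read as a little-endian word, the ELF write flag is 2, and the x86 page bits are P=1, W=2, U=4); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ELF_MAGIC           0x464C457Fu   /* "\x7FELF", little endian */
#define ELF_PROG_FLAG_WRITE 2u
#define PTE_P 0x001u
#define PTE_W 0x002u
#define PTE_U 0x004u

/* Assemble the first four bytes of a file into the magic word,
 * independent of the host's byte order. */
static uint32_t elf_magic_of(const unsigned char *b)
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Page permissions for a loadable segment: always present and
 * user-accessible, writable only if the segment asks for it. */
static uint32_t segment_perm(uint32_t p_flags)
{
	uint32_t perm = PTE_P | PTE_U;

	if (p_flags & ELF_PROG_FLAG_WRITE)
		perm |= PTE_W;
	return perm;
}
```

Text segments therefore end up read-only in the child, while data segments are mapped writable.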
Exercise 7
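static int
sys_env_set_trapframe(envid_t envid, struct Trapframe *tf)
{
	struct Env *e;
	int r;

	if ((r = envid2env(envid, &e, 1)) != 0) {
		return r;
	}
	// tf points into the caller's address space, so it must be checked
	// against curenv rather than against the target environment e.
	user_mem_assert(curenv, tf, sizeof(struct Trapframe), PTE_U);
	// Copy first, then sanitize the kernel's copy, so that user memory
	// is never written.
	e->env_tf = *tf;
	e->env_tf.tf_cs |= 3;			// run at CPL 3
	e->env_tf.tf_ss |= 3;
	e->env_tf.tf_eflags |= FL_IF;		// interrupts enabled
	e->env_tf.tf_eflags &= ~FL_IOPL_3;	// IOPL 0: no I/O privilege
	return 0;
}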
case SYS_env_set_trapframe:
return sys_env_set_trapframe(a1, (struct Trapframe *)a2);
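The sanitizing done by this system call can be checked in isolation. The sketch below applies the same four adjustments to a cut-down trap frame. The struct is a hypothetical stand-in with only the fields involved, and the constants assume the standard EFLAGS layout (IF is bit 9, IOPL is bits 12-13).

```c
#include <assert.h>
#include <stdint.h>

#define FL_IF        0x00000200u   /* interrupt enable flag */
#define FL_IOPL_MASK 0x00003000u   /* I/O privilege level field */

/* Hypothetical cut-down trap frame: only the fields that matter here. */
struct ToyTrapframe {
	uint16_t cs;
	uint16_t ss;
	uint32_t eflags;
};

/* Force a user-supplied frame to be safe to run in user mode. */
static struct ToyTrapframe sanitize(struct ToyTrapframe tf)
{
	tf.cs |= 3;                   /* requested privilege level 3 */
	tf.ss |= 3;
	tf.eflags |= FL_IF;           /* user code runs with interrupts on */
	tf.eflags &= ~FL_IOPL_MASK;   /* and without I/O privilege */
	return tf;
}
```

This is why the FL_IOPL_3 that spawn sets on purpose (the line marked "devious" in its code) has no effect once the kernel has installed the frame.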
Sharing library state across fork and spawn
In UNIX the file system is very important, because one of UNIX's design ideas is that everything is a file.
We mentioned earlier that struct Fd is the descriptor of an open file, and that it can describe various kinds of files. In each environment's virtual address space the descriptors live in a table starting at FDTABLE (0xD0000000). An application can have at most MAXFD (currently 32) file descriptors open at once. Each file descriptor also has an optional "data page" in the region starting at FILEDATA (FDTABLE + MAXFD*PGSIZE), which devices can use if they choose.
We would like to share file descriptor state across fork and spawn, but file descriptor state is kept in user-space memory. With the unmodified fork(), these pages are treated as copy-on-write like all other memory: the child copies the parent's mappings, and the first write afterwards triggers a page fault that gives the writer a private copy, so the state is duplicated rather than shared. With the unmodified spawn(), the pages are not carried over to the child at all. We will change fork to know that certain regions of memory are used by the "library operating system" and should always be shared. Rather than hard-code a list of regions somewhere, we will set an otherwise-unused bit in the page table entries (just like we did with the PTE_COW bit in fork).
We have added the PTE_SHARE bit to the page table permissions. If this bit is set in a PTE, the corresponding physical page is shared, so we change duppage() accordingly. A shared page differs from a copy-on-write page in that the PTE_COW bit is not set: the page is mapped into the child with the same permissions and is never copied.
Exercise 8
	if ((uvpt[pn] & PTE_SHARE) == PTE_SHARE) {
		if ((r = sys_page_map(parent_envid, va, envid, va, uvpt[pn] & PTE_SYSCALL)) != 0) {
			panic("duppage: %e", r);
		}
	}
	else if ((uvpt[pn] & PTE_W) == PTE_W || (uvpt[pn] & PTE_COW) == PTE_COW) {
		if ((r = sys_page_map(parent_envid, va, envid, va, PTE_COW | PTE_U | PTE_P)) != 0) {
			panic("duppage: %e", r);
		}
		if ((r = sys_page_map(parent_envid, va, parent_envid, va, PTE_COW | PTE_U | PTE_P)) != 0) {
			panic("duppage: %e", r);
		}
	} else {
		if ((r = sys_page_map(parent_envid, va, envid, va, PTE_U | PTE_P)) != 0) {
			panic("duppage: %e", r);
		}
	}
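The three-way decision in duppage can be written as a pure function from the parent's page table entry to the permissions used for the child's mapping. The bit values below are the ones I believe JOS defines in inc/mmu.h and inc/lib.h (PTE_SHARE 0x400, PTE_COW 0x800); treat them, and the name child_perm, as assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_P       0x001u
#define PTE_W       0x002u
#define PTE_U       0x004u
#define PTE_AVAIL   0xE00u   /* bits free for software use */
#define PTE_SHARE   0x400u   /* assumption: value from inc/lib.h */
#define PTE_COW     0x800u   /* assumption: value from inc/lib.h */
#define PTE_SYSCALL (PTE_AVAIL | PTE_P | PTE_W | PTE_U)

/* Permissions for the child's mapping of a page, given the parent's PTE. */
static uint32_t child_perm(uint32_t pte)
{
	if (pte & PTE_SHARE)
		return pte & PTE_SYSCALL;        /* same page, same rights */
	if (pte & (PTE_W | PTE_COW))
		return PTE_COW | PTE_U | PTE_P;  /* copy on first write */
	return PTE_U | PTE_P;                /* read-only stays read-only */
}
```

The PTE_SHARE test has to come first: a shared page is normally writable as well, and it would otherwise be caught by the copy-on-write branch.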
The keyboard interface
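In Lab 1 the console's input and output code was already implemented, including the port I/O done through inline assembly. What is missing is interrupt handling: the kernel must now pass keyboard and serial interrupts on to that code, so that a user Env can read console input and write console output.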
Exercise 9
	// Handle keyboard and serial interrupts.
	// LAB 5: Your code here.
	if (tf->tf_trapno == IRQ_OFFSET+IRQ_KBD) {
		kbd_intr();
		return;
	}
	if (tf->tf_trapno == IRQ_OFFSET+IRQ_SERIAL) {
		serial_intr();
		return;
	}
The Shell
UNIX treats everything as a file, and the console is no exception. In JOS, run make run-icode or make run-icode-nox. This will run your kernel and start user/icode. icode execs init, which will set up the console as file descriptors 0 and 1 (standard input and standard output). The lab code already implements output redirection; we only need to add input redirection, which means making the file to be read appear as standard input, that is, file descriptor 0.
Exercise 10
	// Open 't' for reading as file descriptor 0
	// (which environments use as standard input).
	// We can't open a file onto a particular descriptor,
	// so open the file as 'fd',
	// then check whether 'fd' is 0.
	// If not, dup 'fd' onto file descriptor 0,
	// then close the original 'fd'.
	// LAB 5: Your code here.
	if ((fd = open(t, O_RDONLY)) < 0) {
		cprintf("open %s for read: %e", t, fd);
		exit();
	}
	if (fd != 0) {
		dup(fd, 0);
		close(fd);
	}
If the descriptor returned for t is not already 0, we redirect it with dup. The idea behind I/O redirection is simple: it works by changing what a file descriptor refers to.
// Make file descriptor 'newfdnum' a duplicate of file descriptor 'oldfdnum'.
// For instance, writing onto either file descriptor will affect the
// file and the file offset of the other.
// Closes any previously open file descriptor at 'newfdnum'.
// This is implemented using virtual memory tricks (of course!).
int
dup(int oldfdnum, int newfdnum)
{
	int r;
	char *ova, *nva;
	pte_t pte;
	struct Fd *oldfd, *newfd;

	if ((r = fd_lookup(oldfdnum, &oldfd)) < 0)
		return r;
	close(newfdnum);

	newfd = INDEX2FD(newfdnum);
	ova = fd2data(oldfd);
	nva = fd2data(newfd);

	if ((uvpd[PDX(ova)] & PTE_P) && (uvpt[PGNUM(ova)] & PTE_P))
		if ((r = sys_page_map(0, ova, 0, nva, uvpt[PGNUM(ova)] & PTE_SYSCALL)) < 0)
			goto err;
	if ((r = sys_page_map(0, oldfd, 0, newfd, uvpt[PGNUM(oldfd)] & PTE_SYSCALL)) < 0)
		goto err;

	return newfdnum;

err:
	sys_page_unmap(0, newfd);
	sys_page_unmap(0, nva);
	return r;
}
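The effect of dup, two descriptor numbers that refer to one shared descriptor, can be modelled without any page mapping by letting two table entries point at the same object. This is a toy model only: JOS achieves the sharing by mapping the same Fd page at two addresses, and every name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXFD 32

/* Stand-in for one Fd page: shared state plus a reference count. */
struct ToyFd {
	int refs;
	long offset;
};

static struct ToyFd *toy_fds[TOY_MAXFD];

/* Drop one reference and clear the table entry. */
static void toy_close(int fdnum)
{
	if (fdnum < 0 || fdnum >= TOY_MAXFD || toy_fds[fdnum] == NULL)
		return;
	toy_fds[fdnum]->refs--;
	toy_fds[fdnum] = NULL;
}

/* Make newfd refer to the same descriptor object as oldfd. */
static int toy_dup(int oldfd, int newfd)
{
	struct ToyFd *f;

	if (oldfd < 0 || oldfd >= TOY_MAXFD ||
	    newfd < 0 || newfd >= TOY_MAXFD || toy_fds[oldfd] == NULL)
		return -1;
	f = toy_fds[oldfd];
	if (oldfd != newfd) {
		toy_close(newfd);      /* whatever was there is replaced */
		toy_fds[newfd] = f;
		f->refs++;
	}
	return newfd;
}
```

After the dup, advancing the offset through one number is visible through the other, which is the behaviour the comment above dup() describes; closing the original number afterwards leaves descriptor 0 fully usable.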
The diagram's formatting was garbled by a rendering problem on cnblogs, so here is its address instead: https://app.diagrams.net/?client=1&lightbox=1&edit=_blank