LAB3 User Process Management
Relevant source files
inc/ env.h Public definitions for user-mode environments
trap.h Public definitions for trap handling
syscall.h Public definitions for system calls from user environments to the kernel
lib.h Public definitions for the user-mode support library
kern/ env.h Kernel-private definitions for user-mode environments
env.c Kernel code implementing user-mode environments
trap.h Kernel-private trap handling definitions
trap.c Trap handling code
trapentry.S Assembly-language trap handler entry-points
syscall.h Kernel-private definitions for system call handling
syscall.c System call implementation code
lib/ Makefrag Makefile fragment to build user-mode library, obj/lib/libjos.a
entry.S Assembly-language entry-point for user environments
libmain.c User-mode library setup code called from entry.S
syscall.c User-mode system call stub functions
console.c User-mode implementations of putchar and getchar, providing console I/O
exit.c User-mode implementation of exit
panic.c User-mode implementation of panic
user/ * Various test programs to check kernel lab 3 code
User environment design
JOS uses an Env structure to keep track of the current state of each user environment (process).
struct Env {
    struct Trapframe env_tf;    // Saved registers
    struct Env *env_link;       // Next free Env
    envid_t env_id;             // Unique environment identifier
    envid_t env_parent_id;      // env_id of this env's parent
    enum EnvType env_type;      // Indicates special system environments
    unsigned env_status;        // Status of the environment
    uint32_t env_runs;          // Number of times environment has run
    // Address space
    pde_t *env_pgdir;           // Kernel virtual address of page dir
};
Field descriptions:
env_tf:
This structure, defined in inc/trap.h, holds the saved register values for the environment while that environment is not running: i.e., when the kernel or a different environment is running. The kernel saves these when switching from user to kernel mode, so that the environment can later be resumed where it left off.
env_link:
This is a link to the next Env on the env_free_list. env_free_list points to the first free environment on the list.
env_id:
The kernel stores here a value that uniquely identifies the environment currently using this Env structure (i.e., using this particular slot in the envs array). After a user environment terminates, the kernel may re-allocate the same Env structure to a different environment - but the new environment will have a different env_id from the old one even though the new environment is re-using the same slot in the envs array.
env_parent_id:
The kernel stores here the env_id of the environment that created this environment. In this way the environments can form a “family tree,” which will be useful for making security decisions about which environments are allowed to do what to whom.
env_type:
This is used to distinguish special environments. For most environments, it will be ENV_TYPE_USER. We'll introduce a few more types for special system service environments in later labs.
env_status:
This variable holds one of the following values:
ENV_FREE:
Indicates that the Env structure is inactive, and therefore on the env_free_list.
ENV_RUNNABLE:
Indicates that the Env structure represents an environment that is waiting to run on the processor.
ENV_RUNNING:
Indicates that the Env structure represents the currently running environment.
ENV_NOT_RUNNABLE:
Indicates that the Env structure represents a currently active environment, but it is not currently ready to run: for example, because it is waiting for an interprocess communication (IPC) from another environment.
ENV_DYING:
Indicates that the Env structure represents a zombie environment. A zombie environment will be freed the next time it traps to the kernel. We will not use this flag until Lab 4.
env_pgdir:
This variable holds the kernel virtual address of this environment's page directory.
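The relationship between env_id and the slot in the envs array can be illustrated with a small sketch. It assumes the constants used by JOS (NENV = 1024 slots, generation bits starting at bit 12) and follows the spirit of the id computation in env_alloc(); the function name next_env_id is hypothetical, and unsigned arithmetic is used here to avoid signed overflow.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t envid_t;

#define LOG2NENV    10
#define NENV        (1 << LOG2NENV)          /* number of Env slots */
#define ENVX(envid) ((envid) & (NENV - 1))   /* slot index encoded in an env_id */
#define ENVGENSHIFT 12                       /* generation counter sits above the index */

/* Compute a fresh env_id for slot `slot`, given the id previously stored
 * in that slot (0 if the slot has never been used). Hypothetical helper. */
static envid_t next_env_id(envid_t old_id, int slot)
{
    assert(slot >= 0 && slot < NENV);
    /* Bump the generation and clear the low index bits. */
    int32_t generation =
        (int32_t)(((uint32_t)old_id + (1u << ENVGENSHIFT)) & ~(uint32_t)(NENV - 1));
    if (generation <= 0)                     /* never hand out a non-positive id */
        generation = 1 << ENVGENSHIFT;
    return generation | slot;
}
```

Reusing a slot therefore yields a new id whose low bits still name the same slot, which is why ENVX(env_id) can be used to index envs.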
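The following functions are implemented in kern/env.c:
- env_init()
  Initializes the envs array and adds every entry to the env_free_list. It also calls env_init_percpu(), which configures separate segments for kernel level and user level.
- env_setup_vm()
  Allocates a page directory for the given environment and initializes the kernel portion of that environment's address space.
- region_alloc()
  Allocates physical memory and maps it into the environment's virtual address space.
- load_icode()
  Works somewhat like the boot loader: it parses an ELF executable and loads it into the environment's address space. It is normally invoked by env_create().
- env_create()
  The kernel uses this function to create a user environment.
- env_run()
  Switches the kernel to user mode and runs a user environment.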
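As a rough illustration of what env_init() does with the free list, here is a minimal user-space sketch. The trimmed-down Env structure, the tiny NENV, and the names env_init_sketch and env_alloc_sketch are assumptions made for the example; it also assumes the list should hand out slots in array order, so that the first allocation returns envs[0].

```c
#include <assert.h>
#include <stddef.h>

#define NENV 8                 /* tiny table for illustration; JOS uses 1024 */
enum { ENV_FREE = 0 };

struct Env {
    struct Env *env_link;      /* next free Env */
    int env_id;
    unsigned env_status;
};

static struct Env envs[NENV];
static struct Env *env_free_list;

/* Mark every slot free and push the slots in reverse order, so that
 * envs[0] ends up at the head of env_free_list. */
static void env_init_sketch(void)
{
    env_free_list = NULL;
    for (int i = NENV - 1; i >= 0; i--) {
        envs[i].env_id = 0;
        envs[i].env_status = ENV_FREE;
        envs[i].env_link = env_free_list;
        env_free_list = &envs[i];
    }
}

/* Pop the first free Env, or return NULL when none are left. */
static struct Env *env_alloc_sketch(void)
{
    struct Env *e = env_free_list;
    if (e != NULL)
        env_free_list = e->env_link;
    return e;
}
```

Building the list back to front is one simple way to get array order without keeping a tail pointer.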
To summarize so far, i386_init in kern/init.c does the following, in order:
- cons_init
- mem_init
- env_init
- trap_init
- env_create
- env_run
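If we build the OS at this point and run the tests, we hit a problem: in user mode, the CPU does not know what to do when it encounters an int instruction, and the machine resets. We have not yet set up any handling for exceptions, interrupts, or system calls raised in user mode, so once the processor enters user mode it has no way to get back to kernel mode.
Next we fill this gap by implementing exception/interrupt handling, which makes it possible to return from user mode to kernel mode.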
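What are interrupts, exceptions, and system calls in an OS?
What the three have in common: when one occurs, the CPU suspends what it is currently executing and jumps to the handler named by the corresponding IDT entry. The CPU either switches from user mode to kernel mode or, if it was already in kernel mode, stays there.
An interrupt is caused by something external to the running software, such as a timer interrupt or a signal from an I/O device.
An exception is caused by the current process, for example a divide by zero or an invalid memory access (such as one resulting from a buffer overflow). Exceptions usually stem from oversights by the user program's developer, who should do their best to avoid them.
A system call is also caused by the current process, for example when user code calls read() / write() / fork(). Unlike an exception, a system call is a legitimate, intentional operation.
Exceptions and system calls are collectively called traps; a trap is user mode "trapping" into kernel mode.
Interrupts, exceptions, and system calls all converge on the same path. In order of events, each one looks up the IDT, switches to kernel mode to do some work, and then returns to user mode.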
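Handling interrupts/traps
We need a mechanism for dealing with interrupts/traps that occur in user mode: when one occurs, control returns to kernel mode, the kernel handles it, and (where appropriate) execution resumes in user mode at the point of interruption.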
Basics of Protected Control Transfer
Let's look at what a protected control transfer is:
Exceptions and interrupts are both "protected control transfers," which cause the processor to switch from user to kernel mode (CPL=0) without giving the user-mode code any opportunity to interfere with the functioning of the kernel or other environments. In Intel's terminology, an interrupt is a protected control transfer that is caused by an asynchronous event usually external to the processor, such as notification of external device I/O activity. An exception, in contrast, is a protected control transfer caused synchronously by the currently running code, for example due to a divide by zero or an invalid memory access.
The Interrupt Descriptor Table. The processor ensures that interrupts and exceptions can only cause the kernel to be entered at a few specific, well-defined entry-points determined by the kernel itself, and not by the code running when the interrupt or exception is taken.
The x86 allows up to 256 different interrupt or exception entry points into the kernel, each with a different interrupt vector. A vector is a number between 0 and 255. An interrupt's vector is determined by the source of the interrupt: different devices, error conditions, and application requests to the kernel generate interrupts with different vectors. The CPU uses the vector as an index into the processor's interrupt descriptor table (IDT), which the kernel sets up in kernel-private memory, much like the GDT. From the appropriate entry in this table the processor loads:
the value to load into the instruction pointer (EIP) register, pointing to the kernel code designated to handle that type of exception.
the value to load into the code segment (CS) register, which includes in bits 0-1 the privilege level at which the exception handler is to run. (In JOS, all exceptions are handled in kernel mode, privilege level 0.)
The Task State Segment. The processor needs a place to save the old processor state before the interrupt or exception occurred, such as the original values of EIP and CS before the processor invoked the exception handler, so that the exception handler can later restore that old state and resume the interrupted code from where it left off. But this save area for the old processor state must in turn be protected from unprivileged user-mode code; otherwise buggy or malicious user code could compromise the kernel.
For this reason, when an x86 processor takes an interrupt or trap that causes a privilege level change from user to kernel mode, it also switches to a stack in the kernel's memory. A structure called the task state segment (TSS) specifies the segment selector and address where this stack lives. The processor pushes (on this new stack) SS, ESP, EFLAGS, CS, EIP, and an optional error code. Then it loads the CS and EIP from the interrupt descriptor, and sets the ESP and SS to refer to the new stack.
Although the TSS is large and can potentially serve a variety of purposes, JOS only uses it to define the kernel stack that the processor should switch to when it transfers from user to kernel mode. Since "kernel mode" in JOS is privilege level 0 on the x86, the processor uses the ESP0 and SS0 fields of the TSS to define the kernel stack when entering kernel mode. JOS doesn't use any other TSS fields.
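JOS interrupt control flow
Setup:
trap_init() installs the entry address of every interrupt handler into the interrupt descriptor table (IDT).
When an interrupt occurs:
- Whether the interrupt is external or internal, the processor catches it, enters kernel mode, and uses the interrupt vector to look up the corresponding IDT entry.
- The context at the point of interruption is saved on the kernel stack, and the handler named by that entry is called.
- The handler runs.
- When the handler finishes, the saved context is restored and the processor returns to user mode, where the environment continues running.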
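The "handler runs" step above usually comes down to dispatching on the trap number saved in the trap frame. The following is a simplified sketch, not the actual kern/trap.c code: the vector numbers match JOS's inc/trap.h, but the one-field Trapframe, the action enum, and trap_dispatch_sketch are assumptions introduced so the example stands alone.

```c
#include <assert.h>
#include <stdint.h>

/* Vector numbers as defined in JOS's inc/trap.h. */
#define T_BRKPT   3     /* breakpoint */
#define T_GPFLT   13    /* general protection fault */
#define T_PGFLT   14    /* page fault */
#define T_SYSCALL 48    /* system call */

/* Hypothetical summary of what the kernel decides to do with a trap. */
enum action { ACT_PAGE_FAULT, ACT_MONITOR, ACT_SYSCALL, ACT_DESTROY_ENV };

/* Only the field this sketch needs; the real Trapframe holds all saved registers. */
struct Trapframe { uint32_t tf_trapno; };

static enum action trap_dispatch_sketch(const struct Trapframe *tf)
{
    assert(tf != 0);
    switch (tf->tf_trapno) {
    case T_PGFLT:   return ACT_PAGE_FAULT;   /* would call page_fault_handler() */
    case T_BRKPT:   return ACT_MONITOR;      /* would drop into the kernel monitor */
    case T_SYSCALL: return ACT_SYSCALL;      /* would call the syscall dispatcher */
    default:        return ACT_DESTROY_ENV;  /* unexpected trap from user code */
    }
}
```

Returning an enum instead of calling real handlers keeps the sketch testable outside the kernel.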
Handling Page Faults
When JOS takes a page fault, it calls page_fault_handler() in kern/trap.c to handle it.
Breakpoint Exception
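The breakpoint exception, vector 3, lets a debugger set breakpoints in a program: the instruction at the breakpoint is replaced with an int3 instruction, which raises a software interrupt when execution reaches it. JOS turns this exception into a pseudo system call, so that any user environment can use it to invoke the JOS kernel monitor.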
System calls
System calls are the interface through which a user program can deliberately request kernel services, via functions such as write()/fork()/execve().
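To make the interface concrete, here is a minimal sketch of how a system call trap might be decoded. It assumes the JOS convention that the system call number arrives in eax, up to five arguments arrive in edx, ecx, ebx, edi, and esi, and the result is written back to eax. RegsSketch, FAKE_ENVID, syscall_sketch, and handle_syscall_trap are hypothetical names, and only one call is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Syscall numbers and error code following JOS's inc/syscall.h and inc/error.h. */
enum { SYS_cputs = 0, SYS_cgetc, SYS_getenvid, SYS_env_destroy, NSYSCALLS };
#define E_INVAL 3

/* Subset of the saved general-purpose registers (simplified). */
struct RegsSketch {
    uint32_t eax, edx, ecx, ebx, edi, esi;
};

#define FAKE_ENVID 0x1000   /* hypothetical id of the calling environment */

/* Dispatch on the system call number; everything except SYS_getenvid is
 * left unimplemented in this sketch and reports -E_INVAL. */
static int32_t syscall_sketch(uint32_t no, uint32_t a1, uint32_t a2,
                              uint32_t a3, uint32_t a4, uint32_t a5)
{
    (void)a1; (void)a2; (void)a3; (void)a4; (void)a5;
    switch (no) {
    case SYS_getenvid:
        return FAKE_ENVID;
    default:
        return -E_INVAL;
    }
}

/* Pull the number and arguments out of the saved registers and store the
 * return value where the user program will find it after the trap returns. */
static void handle_syscall_trap(struct RegsSketch *r)
{
    assert(r != 0);
    int32_t ret = syscall_sketch(r->eax, r->edx, r->ecx, r->ebx, r->edi, r->esi);
    r->eax = (uint32_t)ret;
}
```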
User-mode startup
A user program actually starts running in lib/entry.S. After some setup, lib/entry.S calls libmain() in lib/libmain.c, which initializes the global pointer thisenv and then calls umain().
void libmain(int argc, char **argv)
{
    // set thisenv to point at our Env structure in envs[]
    thisenv = &envs[ENVX(sys_getenvid())];
    // save the name of the program so that panic() can use it
    if (argc > 0)
        binaryname = argv[0];
    // call user main routine
    umain(argc, argv);
    // exit gracefully
    exit();
}
Question
- What is the purpose of having an individual handler function for each exception/interrupt? (i.e., if all exceptions/interrupts were delivered to the same handler, what feature that exists in the current implementation could not be provided?)
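Different interrupts/exceptions need different handling. For example, some exceptions indicate a faulty instruction, so control does not return to the interrupted instruction; some interrupts merely service an external I/O event, in which case execution returns to the interrupted program once the handler finishes.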
- Did you have to do anything to make the user/softint program behave correctly? The grade script expects it to produce a general protection fault (trap 13), but softint's code says int $14. Why should this produce interrupt vector 13? What happens if the kernel actually allows softint's int $14 instruction to invoke the kernel's page fault handler (which is interrupt vector 14)?
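The program is running in user mode at privilege level 3, while the IDT gate for vector 14 is installed with descriptor privilege level (DPL) 0. Code at privilege level 3 may not use the int instruction to invoke a gate whose DPL is 0, so the processor raises a General Protection Exception instead, i.e. trap 13.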
- The break point test case will either generate a break point exception or a general protection fault depending on how you initialized the break point entry in the IDT (i.e., your call to SETGATE from trap_init). Why? How do you need to set it up in order to get the breakpoint exception to work as specified above and what incorrect setup would cause it to trigger a general protection fault?
- What do you think is the point of these mechanisms, particularly in light of what the user/softint test program does?
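The privilege check behind the softint and breakpoint questions above can be modeled in a few lines. This is a sketch of the x86 rule for software-generated interrupts (int n, int3): the current privilege level must be numerically less than or equal to the gate's DPL, otherwise the processor delivers a general protection fault. The function name soft_int_delivered is hypothetical.

```c
#include <assert.h>

#define T_BRKPT 3
#define T_GPFLT 13
#define T_PGFLT 14

/* Return the vector that is actually delivered when code running at
 * privilege level `cpl` executes a software interrupt for `vector`,
 * given the DPL stored in that vector's IDT gate. */
static int soft_int_delivered(int vector, int gate_dpl, int cpl)
{
    assert(cpl >= 0 && cpl <= 3);
    assert(gate_dpl >= 0 && gate_dpl <= 3);
    return (cpl <= gate_dpl) ? vector : T_GPFLT;
}
```

With this model, int $14 from user mode against a DPL 0 gate yields vector 13, while a breakpoint gate installed with DPL 3 lets user code reach vector 3 directly. The check applies only to software-generated interrupts; faults raised by the hardware itself are not subject to it.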
Page faults & memory protection
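Operating systems usually rely on hardware support to implement memory protection. How so? Lab 2 already covered how the virtual address space is divided into a kernel region and a user region by the UTOP macro. When a user program tries to access an invalid address, or an address it has no permission to access, the processor stops the program at the offending instruction, raises an exception, and traps into the kernel, reporting information about the fault. If the fault is fixable, the kernel fixes it and the program continues running; if not, the program can never continue.
An automatically extended stack (not to be confused with a stack overflow!) is one example of a fixable fault. In many systems the kernel initially allocates a single stack page; when the program accesses stack addresses beyond that page, a fault is raised, the kernel automatically allocates more pages for the program, and the program continues.
More generally, a page fault can be a fixable fault.
Most system call interfaces let user programs pass pointers to the kernel. These pointers refer to user buffers, which the kernel dereferences while carrying out the system call. This raises two problems:
- A page fault in kernel mode is usually far more serious than a page fault in user mode. A page fault that the kernel cannot attribute to a user program brings down the whole kernel.
- If the kernel dereferences such a pointer blindly, it may leak sensitive kernel information, whereas the OS design requires that a user program's memory accesses stay within its permitted range.
Exercise 9 asks us to modify kern/trap.c so that the kernel panics when a page fault occurs in kernel mode.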
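One way to address both problems is to validate a user-supplied pointer before the kernel touches it. The sketch below covers only the address-range part of such a check; a real check would also walk the page tables and verify permission bits on every page in the range. ULIM_SKETCH is an assumed illustrative boundary (0xef800000 in the JOS memory layout, as far as I recall), and user_range_ok is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed boundary above which user code may not access memory. */
#define ULIM_SKETCH 0xef800000u

/* Return true if the byte range [va, va + len) lies entirely below the
 * boundary and does not wrap around the top of the 32-bit address space. */
static bool user_range_ok(uint32_t va, uint32_t len)
{
    uint32_t end = va + len;      /* unsigned arithmetic, may wrap */
    if (end < va)
        return false;             /* wrapped past 4 GB */
    return end <= ULIM_SKETCH;
}
```

A system call would run a check like this on each pointer argument and refuse the request, rather than fault inside the kernel, when the check fails.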
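Telling a kernel-mode fault from a user-mode fault comes down to inspecting the saved code segment selector: its low two bits hold the privilege level at the time of the fault. A minimal sketch, assuming the JOS selector values GD_KT = 0x08 and GD_UT = 0x18; the helper name fault_from_kernel is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GD_KT 0x08   /* kernel text selector (assumed JOS value) */
#define GD_UT 0x18   /* user text selector (assumed JOS value) */

/* The low two bits of the saved CS are the privilege level of the
 * interrupted code: 0 means the fault happened in kernel mode. */
static bool fault_from_kernel(uint16_t tf_cs)
{
    return (tf_cs & 3) == 0;
}
```

In page_fault_handler() the kernel would panic when this predicate holds for the saved tf_cs, and otherwise treat the fault as belonging to the user environment.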