Demystifying Process Switching
Student ID: SA12**6112
The previous post analyzed the main work the kernel does when a process moves from user mode to kernel mode. This post looks at what the kernel does during a process switch.
In kernel mode, a process switch consists of two main steps:
1. Switch the page global directory.
2. Switch the kernel stack and the hardware context.
Throughout, prev points to the descriptor of the process being replaced, and next points to the descriptor of the process being activated.
The rest of this post analyzes the second step, which is implemented mainly by the switch_to macro:
In the 3.3 kernel, for x86, at line 48 of /arch/x86/include/asm/system.h:
#define switch_to(prev, next, last)					\
do {									\
	/*								\
	 * Context-switching clobbers all registers, so we clobber	\
	 * them explicitly, via unused output variables.		\
	 * (EAX and EBP is not listed because EBP is saved/restored	\
	 * explicitly for wchan access and EAX is the return value of	\
	 * __switch_to())						\
	 */								\
	unsigned long ebx, ecx, edx, esi, edi;				\
									\
	asm volatile("pushfl\n\t"		/* save flags */	\
		     "pushl %%ebp\n\t"		/* save EBP */		\
		     "movl %%esp,%[prev_sp]\n\t" /* save ESP */		\
		     "movl %[next_sp],%%esp\n\t" /* restore ESP */	\
		     "movl $1f,%[prev_ip]\n\t"	/* save EIP */		\
		     "pushl %[next_ip]\n\t"	/* restore EIP */	\
		     __switch_canary					\
		     "jmp __switch_to\n"	/* regparm call */	\
		     "1:\t"						\
		     "popl %%ebp\n\t"		/* restore EBP */	\
		     "popfl\n"			/* restore flags */	\
									\
		     /* output parameters */				\
		     : [prev_sp] "=m" (prev->thread.sp),		\
		       [prev_ip] "=m" (prev->thread.ip),		\
		       "=a" (last),					\
									\
		       /* clobbered output registers: */		\
		       "=b" (ebx), "=c" (ecx), "=d" (edx),		\
		       "=S" (esi), "=D" (edi)				\
									\
		       __switch_canary_oparam				\
									\
		       /* input parameters: */				\
		     : [next_sp] "m" (next->thread.sp),			\
		       [next_ip] "m" (next->thread.ip),			\
									\
		       /* regparm parameters for __switch_to(): */	\
		       [prev] "a" (prev),				\
		       [next] "d" (next)				\
									\
		       __switch_canary_iparam				\
									\
		     : /* reloaded segment registers */			\
		       "memory");					\
} while (0)
Part 1: As the code above shows, switching the kernel stack involves:
1. Push the eflags and ebp registers onto prev's kernel stack.
2. Save esp into prev->thread.sp, and save the address of label 1: into prev->thread.ip, i.e. the eip at which prev will resume the next time it is scheduled.
3. Load next->thread.sp into esp, and push next->thread.ip onto the new stack; when __switch_to returns, execution continues at next's saved eip.
At this point the kernel-stack switch is complete.
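The save-esp/eip then restore-esp/eip dance above can be mimicked in user space. Below is a minimal sketch (not the kernel's code) using the POSIX ucontext API, where swapcontext plays the role of switch_to: it saves the current stack pointer and resume point into one context and restores them from another. The names run_demo, task, and trace are made up for illustration:

```c
#define _XOPEN_SOURCE 700
#include <string.h>
#include <ucontext.h>

static ucontext_t uctx_main, uctx_task;   /* analogous to prev's and next's saved contexts */
static char trace[64];

static void task(void)
{
	strcat(trace, "task1 ");
	/* save task's sp/ip, restore main's: like switch_to(task, main) */
	swapcontext(&uctx_task, &uctx_main);
	strcat(trace, "task2 ");          /* resumes here on the next switch */
}

const char *run_demo(void)
{
	static char stack[16384];         /* task's private stack, like a per-process kernel stack */

	getcontext(&uctx_task);
	uctx_task.uc_stack.ss_sp   = stack;
	uctx_task.uc_stack.ss_size = sizeof stack;
	uctx_task.uc_link          = &uctx_main;  /* where control goes when task() returns */
	makecontext(&uctx_task, task, 0);

	strcat(trace, "main1 ");
	swapcontext(&uctx_main, &uctx_task);  /* save main's sp/ip, jump to task */
	strcat(trace, "main2 ");
	swapcontext(&uctx_main, &uctx_task);  /* resume task just after its swapcontext */
	strcat(trace, "main3 ");              /* task finished; uc_link brought us back */
	return trace;
}
```

Each swapcontext call saves esp/eip-equivalents into the first argument and restores them from the second, which is exactly the shape of the inline assembly above, minus the per-process kernel-stack and register-clobber details.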
Part 2: After the kernel stack is switched, the TSS must be updated accordingly.
This is because under Linux all processes on a given CPU share a single TSS; when the running process changes, the TSS must change with it.
Linux uses the TSS in two main ways:
(1) Whenever a process traps from user mode into kernel mode, the kernel stack pointer is fetched from the TSS.
(2) User-mode I/O port access is checked against the I/O permission bitmap in the TSS.
Therefore a process switch must also update the esp0 field and the I/O permission bitmap in the TSS. This is done mainly in the __switch_to function:
In the 3.3 kernel, for x86, at line 296 of /arch/x86/kernel/process_32.c:
__notrace_funcgraph struct task_struct *
__switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
	struct thread_struct *prev = &prev_p->thread,
			     *next = &next_p->thread;
	int cpu = smp_processor_id();
	struct tss_struct *tss = &per_cpu(init_tss, cpu);
	fpu_switch_t fpu;

	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */

	fpu = switch_fpu_prepare(prev_p, next_p, cpu);

	/*
	 * Reload esp0.
	 */
	load_sp0(tss, next);

	/*
	 * Save away %gs. No need to save %fs, as it was saved on the
	 * stack on entry. No need to save %es and %ds, as those are
	 * always kernel segments while inside the kernel. Doing this
	 * before setting the new TLS descriptors avoids the situation
	 * where we temporarily have non-reloadable segments in %fs
	 * and %gs. This could be an issue if the NMI handler ever
	 * used %fs or %gs (it does not today), or if the kernel is
	 * running inside of a hypervisor layer.
	 */
	lazy_save_gs(prev->gs);

	/*
	 * Load the per-thread Thread-Local Storage descriptor.
	 */
	load_TLS(next, cpu);

	/*
	 * Restore IOPL if needed. In normal use, the flags restore
	 * in the switch assembly will handle this. But if the kernel
	 * is running virtualized at a non-zero CPL, the popf will
	 * not restore flags, so it must be done in a separate step.
	 */
	if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
		set_iopl_mask(next->iopl);

	/*
	 * Now maybe handle debug registers and/or IO bitmaps
	 */
	if (unlikely(task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV ||
		     task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
		__switch_to_xtra(prev_p, next_p, tss);

	/*
	 * Leave lazy mode, flushing any hypercalls made here.
	 * This must be done before restoring TLS segments so
	 * the GDT and LDT are properly updated, and must be
	 * done before math_state_restore, so the TS bit is up
	 * to date.
	 */
	arch_end_context_switch(next_p);

	/*
	 * Restore %gs if needed (which is common)
	 */
	if (prev->gs | next->gs)
		lazy_load_gs(next->gs);

	switch_fpu_finish(next_p, fpu);

	percpu_write(current_task, next_p);

	return prev_p;
}
As the code above shows, the TSS update consists mainly of:
1. load_sp0(tss, next): take sp0 from the next process's thread field and use it to update the sp0 in the TSS.
2. __switch_to_xtra(prev_p, next_p, tss): update the I/O permission bitmap when necessary.
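The esp0 update is simple enough to model in a few lines. Here is a toy sketch of what load_sp0 boils down to on this path; the types toy_tss and toy_thread are hypothetical stand-ins, not the kernel's real tss_struct and thread_struct:

```c
/* toy stand-ins for the kernel's tss_struct and thread_struct */
struct toy_tss    { unsigned long sp0; };  /* per-CPU TSS: stack ptr used on user->kernel traps */
struct toy_thread { unsigned long sp0; };  /* per-process kernel stack top */

/* what load_sp0(tss, next) amounts to: repoint the shared,
 * per-CPU TSS at the incoming process's kernel stack */
void toy_load_sp0(struct toy_tss *tss, const struct toy_thread *next)
{
	tss->sp0 = next->sp0;
}
```

After this assignment, the next trap from user mode on this CPU picks up next's kernel stack from the TSS, which is precisely why the update must happen during the switch and not later.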