Linux Kernel Study, Part 5: Analysis of the system_call Process
I. Tracing a self-added system call with gdb
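An application process normally runs in user space. When it makes a system call, the process enters kernel space and executes code inside the kernel, which gives it the privilege to run privileged instructions and carry out the requested function.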
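As a small illustration of that user-to-kernel transition, the sketch below reaches the same kernel service in two ways: through the C library's getuid() wrapper and through the generic syscall(2) entry point with the number SYS_getuid. It assumes Linux with glibc, and getuid_paths_agree is a hypothetical helper name. Either way the process traps into the kernel, runs the kernel's getuid handler in kernel mode, and returns to user mode with the result.

```c
#define _GNU_SOURCE            /* make sure syscall() is declared */
#include <sys/syscall.h>       /* SYS_getuid */
#include <sys/types.h>
#include <unistd.h>            /* getuid(), syscall() */

/* Hypothetical helper: returns 1 when the libc wrapper and the raw
 * system call report the same user id, 0 otherwise.
 * (On i386, SYS_getuid is the legacy 16-bit call, so the two can
 * differ for user ids above 65535.) */
int getuid_paths_agree(void)
{
    uid_t via_wrapper = getuid();            /* glibc wrapper */
    long via_syscall = syscall(SYS_getuid);  /* system call by number */
    return via_syscall >= 0 && (uid_t)via_syscall == via_wrapper;
}
```

Calling getuid_paths_agree() from any test program should return 1 for ordinary user ids.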
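Building on the previous experiment, I modified test.c to add two commands of my own that call the getuid/setuid system calls, once through the C library and once through inline assembly. The relevant part of the code: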
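#include <stdio.h>
#include <unistd.h>

/* PrintMenuOS, SetPrompt, MenuConfig, ExecuteMenu, Quit, Time and TimeAsm
   come from the existing MenuOS sources. */

/* Read the uid, change it to 200, read it again, using the libc wrappers. */
int uid_c()
{
    int i = 65535, k = 0;
    i = getuid();
    printf("current user id is:%d\n", i);
    setuid(200);
    k = getuid();
    printf("after change uid:%d\n", k);
    return 0;
}

/* Same steps, issuing int $0x80 directly. On i386, 0x18 (24) is getuid and
   0x17 (23) is setuid (the legacy 16-bit entries sys_getuid16/sys_setuid16).
   The number goes in eax, the first argument in ebx, the result comes back in eax. */
int uid_asm()
{
    int i = 65535, j = 200, k = 0;
    asm volatile(
        "mov $0,%%ebx\n\t"
        "mov $0x18,%%eax\n\t"   /* getuid */
        "int $0x80\n\t"
        "mov %%eax,%0\n\t"
        : "=m"(i)
        :
        : "eax", "ebx");        /* tell GCC these registers are overwritten */
    printf("current user id is:%d\n", i);
    asm volatile(
        "mov $0x17,%%eax\n\t"   /* setuid */
        "mov %1,%%ebx\n\t"      /* argument j, passed in through ecx */
        "int $0x80\n\t"
        "mov %%eax,%0\n\t"
        : "=m"(i)
        : "c"(j)
        : "eax", "ebx");
    asm volatile(
        "mov $0,%%ebx\n\t"
        "mov $0x18,%%eax\n\t"   /* getuid again */
        "int $0x80\n\t"
        "mov %%eax,%0\n\t"
        : "=m"(k)
        :
        : "eax", "ebx");
    printf("after change user id is:%d\n", k);
    return 0;
}

int main()
{
    PrintMenuOS();
    SetPrompt("MenuOS>>");
    MenuConfig("version", "MenuOS V1.0(Based on Linux 3.18.6)", NULL);
    MenuConfig("quit", "Quit from MenuOS", Quit);
    MenuConfig("time", "Show System Time", Time);
    MenuConfig("time-asm", "Show System Time(asm)", TimeAsm);
    MenuConfig("uid", "Show user id", uid_c);             /* added */
    MenuConfig("uid-asm", "Show user id(asm)", uid_asm);  /* added */
    ExecuteMenu();
    return 0;
}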
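In uid_asm the inline assembly overwrites eax and ebx; unless those registers are listed as clobbers, GCC may assume they still hold earlier values. A more compact alternative is to let extended-asm constraints load the registers for you. The sketch below assumes GCC or Clang on Linux; raw_getuid is a hypothetical helper name, and the x86-64 branch (the syscall instruction with SYS_getuid, which clobbers rcx and r11) goes beyond the 32-bit experiment in this post and is only there so the sketch also runs on a 64-bit host.

```c
#define _GNU_SOURCE            /* make sure syscall() is declared */
#include <sys/syscall.h>       /* SYS_getuid */
#include <unistd.h>            /* syscall(), getuid() */

/* Hypothetical helper: getuid issued with inline assembly, where the
 * "a"/"0" constraints put the call number in eax/rax and read the
 * result back from the same register. */
long raw_getuid(void)
{
    long ret;
#if defined(__x86_64__)
    /* x86-64: syscall instruction; the CPU overwrites rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"((long)SYS_getuid)
                      : "rcx", "r11", "memory");
#elif defined(__i386__)
    /* i386: int $0x80 with 0x18 in eax, the same number uid_asm uses. */
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "0"(0x18L)
                      : "memory");
#else
    /* other architectures: fall back to the libc helper */
    ret = syscall(SYS_getuid);
#endif
    return ret;
}
```

Because the compiler chooses how to load eax/rax, there is no hand-written mov to get wrong, and the clobber list documents exactly what the trap destroys.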
Recompile and run. Two commands, uid and uid-asm, have been added to the original program, as shown in the figure below:
Next, debug with gdb:
1. Set breakpoints
Type c (continue) to run:
Execution stops at start_kernel.
Continue:
Execution stops at rest_init.
Continue again:
Next, enter the commands added earlier, uid and uid-asm.
Execution stops in sys_getuid16; continuing from there prints the result:
The output:
II. Analysis of the system_call process
The system_call entry code (arch/x86/kernel/entry_32.S, Linux 3.18.6):
ENTRY(system_call)
	RING0_INT_FRAME			# can't unwind into user space anyway
	ASM_CLAC
	pushl_cfi %eax			# save orig_eax
	SAVE_ALL
	GET_THREAD_INFO(%ebp)
					# system call tracing in operation / emulation
	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
	jnz syscall_trace_entry
	cmpl $(NR_syscalls), %eax
	jae syscall_badsys
syscall_call:
	call *sys_call_table(,%eax,4)
syscall_after_call:
	movl %eax,PT_EAX(%esp)		# store the return value
syscall_exit:
	LOCKDEP_SYS_EXIT
	DISABLE_INTERRUPTS(CLBR_ANY)	# make sure we don't miss an interrupt
					# setting need_resched or sigpending
					# between sampling and the iret
	TRACE_IRQS_OFF
	movl TI_flags(%ebp), %ecx
	testl $_TIF_ALLWORK_MASK, %ecx	# current->work
	jne syscall_exit_work

restore_all:
	TRACE_IRQS_IRET
restore_all_notrace:
#ifdef CONFIG_X86_ESPFIX32
	movl PT_EFLAGS(%esp), %eax	# mix EFLAGS, SS and CS
	# Warning: PT_OLDSS(%esp) contains the wrong/random values if we
	# are returning to the kernel.
	# See comments in process.c:copy_thread() for details.
	movb PT_OLDSS(%esp), %ah
	movb PT_CS(%esp), %al
	andl $(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
	cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax
	CFI_REMEMBER_STATE
	je ldt_ss			# returning to user-space with LDT SS
#endif
restore_nocheck:
	RESTORE_REGS 4			# skip orig_eax/error_code
irq_return:
	INTERRUPT_RETURN
.section .fixup,"ax"
ENTRY(iret_exc)
	pushl $0			# no error code
	pushl $do_iret_error
	jmp error_code
.previous
	_ASM_EXTABLE(irq_return,iret_exc)

#ifdef CONFIG_X86_ESPFIX32
	CFI_RESTORE_STATE
ldt_ss:
#ifdef CONFIG_PARAVIRT
	/*
	 * The kernel can't run on a non-flat stack if paravirt mode
	 * is active. Rather than try to fixup the high bits of
	 * ESP, bypass this code entirely. This may break DOSemu
	 * and/or Wine support in a paravirt VM, although the option
	 * is still available to implement the setting of the high
	 * 16-bits in the INTERRUPT_RETURN paravirt-op.
	 */
	cmpl $0, pv_info+PARAVIRT_enabled
	jne restore_nocheck
#endif

/*
 * Setup and switch to ESPFIX stack
 *
 * We're returning to userspace with a 16 bit stack. The CPU will not
 * restore the high word of ESP for us on executing iret... This is an
 * "official" bug of all the x86-compatible CPUs, which we can work
 * around to make dosemu and wine happy. We do this by preloading the
 * high word of ESP with the high word of the userspace ESP while
 * compensating for the offset by changing to the ESPFIX segment with
 * a base address that matches for the difference.
 */
#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
	mov %esp, %edx			/* load kernel esp */
	mov PT_OLDESP(%esp), %eax	/* load userspace esp */
	mov %dx, %ax			/* eax: new kernel esp */
	sub %eax, %edx			/* offset (low word is 0) */
	shr $16, %edx
	mov %dl, GDT_ESPFIX_SS + 4	/* bits 16..23 */
	mov %dh, GDT_ESPFIX_SS + 7	/* bits 24..31 */
	pushl_cfi $__ESPFIX_SS
	pushl_cfi %eax			/* new kernel esp */
	/* Disable interrupts, but do not irqtrace this section: we
	 * will soon execute iret and the tracer was already set to
	 * the irqstate after the iret */
	DISABLE_INTERRUPTS(CLBR_EAX)
	lss (%esp), %esp		/* switch to espfix segment */
	CFI_ADJUST_CFA_OFFSET -8
	jmp restore_nocheck
#endif
	CFI_ENDPROC
ENDPROC(system_call)
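The arithmetic in the ldt_ss block is easy to misread. The sketch below replays it with plain 32-bit integers, assuming (as the code implies by writing only bytes 4 and 7) that base bits 0..15 of the ESPFIX descriptor are zero; espfix_compute and struct espfix are hypothetical names. It checks the property the kernel comment promises: the new ESP carries the user's high word, so that word is what remains after a 16-bit iret, while segment base plus new ESP still lands exactly on the real kernel stack address, modulo 2^32.

```c
#include <stdint.h>

/* Results of the ldt_ss arithmetic, as plain numbers (no real GDT access). */
struct espfix {
    uint32_t new_esp;    /* eax after "mov %dx, %ax" */
    uint32_t base;       /* resulting ESPFIX segment base (low word assumed 0) */
    uint8_t  base_16_23; /* byte written to GDT_ESPFIX_SS + 4 */
    uint8_t  base_24_31; /* byte written to GDT_ESPFIX_SS + 7 */
};

struct espfix espfix_compute(uint32_t kernel_esp, uint32_t user_esp)
{
    struct espfix r;
    /* high word of the user ESP, low word of the kernel ESP */
    r.new_esp = (user_esp & 0xFFFF0000u) | (kernel_esp & 0x0000FFFFu);
    /* sub %eax,%edx: the low words are equal, so the offset's low word is 0 */
    uint32_t offset = kernel_esp - r.new_esp;
    /* shr $16,%edx, then store %dl and %dh into the descriptor */
    r.base_16_23 = (uint8_t)(offset >> 16);
    r.base_24_31 = (uint8_t)(offset >> 24);
    r.base = offset;
    return r;
}
```

For example, with a kernel ESP of 0xC12345F0 and a user ESP of 0x0804ABCD, the new ESP is 0x080445F0 and the base is 0xB91F0000; their sum wraps back to 0xC12345F0. The unsigned subtraction also handles the case where the user's high word is larger than the kernel's.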
Based on my own understanding, I drew a simple flowchart:
III. Summary
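My understanding of the system call process: from the previous lesson we learned that a user-mode process requests a system call by executing int $0x80, which switches the CPU from user mode to kernel mode; this lesson shows that, more precisely, kernel execution begins at system_call. On entry, the CPU switches to kernel privilege level and to the kernel stack, and the user-mode registers are saved (SAVE_ALL). The system call number, passed in eax, is then used to look up the corresponding service routine in sys_call_table. When the routine finishes, its return value is stored in the saved eax slot. Before returning to user space the kernel still performs a number of checks: modern operating systems are multitasking, so on the way back it must check whether the calling process should keep the CPU, whether signals are pending, whether a reschedule is needed, and so on. If a reschedule happens, the context that gets restored is one that another process saved at some earlier point, and the cycle repeats. In short, the system call process is a textbook example of interrupt handling.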
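The dispatch steps described in this summary (and in the assembly earlier) can be modeled in ordinary C. This is a hypothetical user-space sketch, not kernel code: model_regs, model_table, the two model handlers and the MODEL_* flag values are invented for illustration, and only slots 23 and 24 are filled so the numbers match the setuid/getuid calls in uid_asm. It mirrors three details of the assembly: the unsigned comparison (jae) that sends out-of-range numbers, including negative ones, to -ENOSYS; the indexed call through the table; and the single flags test on the exit path that decides whether extra work such as signal delivery or rescheduling is needed.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the registers saved by SAVE_ALL. */
struct model_regs {
    long eax;            /* system call number on entry, return value on exit */
    long orig_eax;       /* copy saved by "pushl_cfi %eax" */
    long ebx, ecx, edx;  /* first three arguments */
};

typedef long (*model_syscall_fn)(long, long, long);

static long model_current_uid = 0;  /* start as root, like the MenuOS init process */

static long model_sys_getuid(long a, long b, long c)
{
    (void)a; (void)b; (void)c;
    return model_current_uid;
}

static long model_sys_setuid(long uid, long b, long c)
{
    (void)b; (void)c;
    if (uid < 0)
        return -EINVAL;
    model_current_uid = uid;
    return 0;
}

/* Slots 23/24 match the i386 setuid/getuid numbers used in uid_asm; every
   other slot is NULL and treated like an unimplemented call (-ENOSYS). */
static const model_syscall_fn model_table[] = {
    [23] = model_sys_setuid,
    [24] = model_sys_getuid,
};
#define MODEL_NR_SYSCALLS (sizeof model_table / sizeof model_table[0])

/* Illustrative thread-info flag bits checked on the exit path. */
#define MODEL_TIF_SIGPENDING   (1u << 2)
#define MODEL_TIF_NEED_RESCHED (1u << 3)
#define MODEL_ALLWORK_MASK     (MODEL_TIF_SIGPENDING | MODEL_TIF_NEED_RESCHED)

long model_system_call(struct model_regs *regs)
{
    regs->orig_eax = regs->eax;                  /* pushl_cfi %eax */
    unsigned long nr = (unsigned long)regs->eax; /* jae compares unsigned */
    if (nr >= MODEL_NR_SYSCALLS || model_table[nr] == NULL)
        regs->eax = -ENOSYS;                     /* bad or unimplemented number */
    else                                         /* call *sys_call_table(,%eax,4) */
        regs->eax = model_table[nr](regs->ebx, regs->ecx, regs->edx);
    return regs->eax;                            /* movl %eax,PT_EAX(%esp) */
}

/* testl $_TIF_ALLWORK_MASK,%ecx ; jne syscall_exit_work */
int model_needs_exit_work(unsigned int ti_flags)
{
    return (ti_flags & MODEL_ALLWORK_MASK) != 0;
}
```

Running getuid (24), then setuid (23) with ebx = 200, then getuid again reproduces the uid / uid-asm experiment in miniature: the second getuid returns 200.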
by:方龙伟
Original work; please credit the source when reposting.
"Linux Kernel Analysis" (《Linux内核分析》) MOOC course: http://mooc.study.163.com/course/USTC-1000029000