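Understanding System Calls in Depth
Assignment requirements:
- Pick a system call whose number matches the last two digits of your student ID
- Trigger that system call with assembly instructions
- Trace the kernel's handling of the system call with gdb
- Read and analyze how the system call entry saves context, restores context, and returns, paying particular attention to how the kernel stack changes during the call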
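1. Choosing a system call
My student ID ends in 31, but the syscall_32.tbl table lists system call 31 as stty. Searching the system call description file further shows that both it and number 32, gtty, map to sys_ni_syscall. Both system calls turn out to be obsolete, which is why their service routine is set to sys_ni_syscall.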
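Background:
Even though system calls 31 and 32 are obsolete, their slots cannot be reassigned to other system calls, because old code may still use them. If a slot were reused, an application that invoked the retired call would get a completely unexpected result, such as a file being opened. The "ni" in sys_ni_syscall stands for "not implemented".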
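The idea of a reserved slot can be illustrated with a small user-space sketch. This is not kernel code: the table, its size, and every demo_ name are hypothetical, and the only behavior assumed from the kernel is that a not-implemented stub fails with -ENOSYS.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature of a system call table; not kernel code. */
typedef long (*demo_syscall_fn)(void);

#define DEMO_NR_SYSCALLS 34

/* Stand-in for sys_ni_syscall: every retired slot fails the same way. */
static long demo_ni_syscall(void)
{
    return -ENOSYS;
}

/* Stand-in for a live handler such as utime (slot 30). */
static long demo_utime(void)
{
    return 0;
}

static demo_syscall_fn demo_table[DEMO_NR_SYSCALLS];

static void demo_table_init(void)
{
    /* Point every slot at the stub first, then fill in the live entries. */
    for (size_t i = 0; i < DEMO_NR_SYSCALLS; i++)
        demo_table[i] = demo_ni_syscall;
    demo_table[30] = demo_utime;
    /* Slots 31 (stty) and 32 (gtty) keep the stub: the numbers stay reserved. */
}

static long demo_dispatch(unsigned int nr)
{
    if (nr >= DEMO_NR_SYSCALLS)
        return -ENOSYS;
    return demo_table[nr]();
}
```

After demo_table_init(), demo_dispatch(31) fails cleanly with -ENOSYS instead of running an unrelated handler, which is the point of keeping the slot reserved.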
I therefore analyze the entry just before 31 instead, number 30, utime.
# The format is:
# <number> <abi> <name> <entry point> <compat entry point>
30 i386 utime sys_utime32 __ia32_sys_utime32
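utime changes a file's access time and modification time. Its 32-bit entry point is sys_utime32. Searching for sys_utime32 leads to its implementation in utimes.c, which does its work by calling do_utimes. The code of do_utimes is: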
/*
* do_utimes - change times on filename or file descriptor
* @dfd: open file descriptor, -1 or AT_FDCWD
* @filename: path name or NULL
* @times: new times or NULL
* @flags: zero or more flags (only AT_SYMLINK_NOFOLLOW for the moment)
*
* If filename is NULL and dfd refers to an open file, then operate on
* the file. Otherwise look up filename, possibly using dfd as a
* starting point.
*
* If times==NULL, set access and modification to current time,
* must be owner or have write permission.
* Else, update from *times, must be owner or super user.
*/
long do_utimes(int dfd, const char __user *filename, struct timespec64 *times,
int flags)
{
int error = -EINVAL;
if (times && (!nsec_valid(times[0].tv_nsec) ||
!nsec_valid(times[1].tv_nsec))) {
goto out;
}
if (flags & ~AT_SYMLINK_NOFOLLOW)
goto out;
if (filename == NULL && dfd != AT_FDCWD) {
struct fd f;
if (flags & AT_SYMLINK_NOFOLLOW)
goto out;
f = fdget(dfd);
error = -EBADF;
if (!f.file)
goto out;
error = utimes_common(&f.file->f_path, times);
fdput(f);
} else {
struct path path;
int lookup_flags = 0;
if (!(flags & AT_SYMLINK_NOFOLLOW))
lookup_flags |= LOOKUP_FOLLOW;
retry:
error = user_path_at(dfd, filename, lookup_flags, &path);
if (error)
goto out;
error = utimes_common(&path, times);
path_put(&path);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
goto retry;
}
}
out:
return error;
}
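The first two checks in do_utimes reject bad arguments before any file is touched. The following user-space sketch mirrors them under stated assumptions: demo_nsec_valid reproduces what the kernel's nsec_valid helper is understood to do (accept UTIME_NOW, UTIME_OMIT, or 0 to 999999999), and the demo_ names are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>      /* AT_SYMLINK_NOFOLLOW */
#include <stdbool.h>
#include <sys/stat.h>   /* UTIME_NOW, UTIME_OMIT */
#include <time.h>       /* struct timespec */

/* Fallbacks with the Linux values, in case the headers do not expose them. */
#ifndef UTIME_NOW
#define UTIME_NOW  ((1L << 30) - 1L)
#endif
#ifndef UTIME_OMIT
#define UTIME_OMIT ((1L << 30) - 2L)
#endif
#ifndef AT_SYMLINK_NOFOLLOW
#define AT_SYMLINK_NOFOLLOW 0x100
#endif

/* Assumed equivalent of the kernel's nsec_valid(). */
static bool demo_nsec_valid(long nsec)
{
    if (nsec == UTIME_OMIT || nsec == UTIME_NOW)
        return true;
    return nsec >= 0 && nsec <= 999999999L;
}

/* Returns 0 if the arguments would pass the first two checks of do_utimes,
 * -EINVAL otherwise. */
static int demo_check_utimes_args(const struct timespec *times, int flags)
{
    if (times && (!demo_nsec_valid(times[0].tv_nsec) ||
                  !demo_nsec_valid(times[1].tv_nsec)))
        return -EINVAL;
    if (flags & ~AT_SYMLINK_NOFOLLOW)
        return -EINVAL;
    return 0;
}
```

A NULL times pointer passes the check, which matches the comment above: NULL means "set both timestamps to the current time".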
2. Triggering the system call (directly and via assembly)
The following code triggers the utime system call directly:
#include <sys/stat.h>
#include <utime.h>
#include <stdio.h>
#include <string.h>   /* strcmp */
int main(int argc, char *argv[])
{
    char *pathname;
    struct stat sb;
    struct utimbuf utb;
    if (argc != 2 || strcmp(argv[1], "--help") == 0) {
        printf("%s file\n", argv[0]);
        return 1;
    }
    pathname = argv[1];
    // Get the file's current timestamps
    if (stat(pathname, &sb) == -1)
        return 1;
    // Set the modification time to the access time
    utb.actime = sb.st_atime;
    utb.modtime = sb.st_atime; /* Make modify time same as access time */
    // Call utime
    if (utime(pathname, &utb) == -1) /* Update file times */
        return 1;
    return 0;
}
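Next, modify the program to invoke utime from assembly. In effect, the assembly passes the arguments of utime in registers and then traps into the kernel through software interrupt 0x80. The kernel's system call entry code (called system_call in older kernels) dispatches to the service routine, here sys_utime32. Because the kernel is acting on behalf of the user process, this code runs in process context rather than interrupt context: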
#include <sys/stat.h>
#include <utime.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
    char *pathname;
    struct stat sb;
    struct utimbuf utb;
    if (argc != 2 || strcmp(argv[1], "--help") == 0) {
        printf("%s file\n", argv[0]);
        return 1;
    }
    pathname = argv[1];
    // Get the file's current timestamps
    if (stat(pathname, &sb) == -1)
        return 1;
    // Set the modification time to the access time
    utb.actime = sb.st_atime;
    utb.modtime = sb.st_atime; /* Make modify time same as access time */
    int flag;
    // Build as 32-bit code (gcc -m32): int $0x80 uses the i386 system call numbers
    asm volatile(
        "movl $30, %%eax\n\t"       // system call number 30 (utime) goes in EAX
        "int $0x80\n\t"             // trap into the kernel through software interrupt 0x80
        : "=a"(flag)                // the kernel's return value comes back in EAX
        : "b"(pathname), "c"(&utb)  // EBX = pathname, ECX = pointer to utb
        : "memory"                  // the kernel reads utb through the pointer
    );
    if (flag < 0) /* a raw system call returns -errno on failure, not -1 */
        return 1;
    return 0;
}
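One detail matters when checking the result of a hand-issued system call: the kernel reports failure as a negative errno value in the return register, and it is the C library wrapper that turns this into -1 plus errno. A minimal sketch of that translation follows; demo_syscall_ret is a hypothetical name, and the [-4095, -1] error range is the Linux convention.

```c
#include <errno.h>

/* Convert a raw kernel return value into the libc convention.
 * Values in [-4095, -1] are error codes; anything else is a result. */
static long demo_syscall_ret(long raw)
{
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

Passing the value left in EAX through such a helper gives the same -1/errno behavior that utime() provides.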
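3. Tracing the kernel's handling of the system call with gdb
3.1 Setting up the gdb environment
First start qemu with qemu-system-x86_64 -kernel ../arch/x86/boot/bzImage -initrd rootfs.cpio.gz (adjust the paths to your setup). Then copy the compiled executable that triggers utime through assembly into rootfs/home/ and create a file named b.test in the same directory. Repack the root filesystem image with the following command (run it inside rootfs), then restart qemu.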
find . -print0 | cpio --null -ov --format=newc | gzip -9 > ../rootfs.cpio.gz
# Run qemu again
qemu-system-x86_64 -kernel ../arch/x86/boot/bzImage -initrd rootfs.cpio.gz
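Shut down qemu, then run qemu-system-x86_64 -kernel ./arch/x86/boot/bzImage -initrd ./busybox-1.31.1/rootfs.cpio.gz -S -s -nographic -append "console=ttyS0" in a terminal to start qemu in the shell for debugging (to quit, use killall qemu-system-x86_64). Open a second terminal and run the following commands to load vmlinux and connect to the gdb server, then try setting a breakpoint at start_kernel. qemu stops once it reaches "Booting the kernel":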
gdb
file vmlinux
target remote:1234
b start_kernel
c
....
Possible errors and fixes:
- ERROR: running file vmlinux may fail with an auto-load error for scripts/gdb/vmlinux-gdb.py.
- Fix:
vi ~/.gdbinit
================ add the following ================
add-auto-load-safe-path /home/dfx/linux-5.4.34/scripts/gdb/vmlinux-gdb.py
set auto-load safe-path /
python sys.path.append("/home/dfx/linux-5.4.34/scripts/gdb/vmlinux-gdb.py")
- ERROR: Remote 'g' packet reply is too long
- Fix: https://stackoverflow.com/questions/8662468/remote-g-packet-reply-is-too-long
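3.2 Analyzing the system call
Compile a.c into a 32-bit executable with gcc a.c -static -m32, then disassemble it with objdump -S a.out > a32.s to see how utime is invoked.
The disassembly shows that utime does not execute a system call instruction itself but makes a call through address 0x80ea9f0. Running x 0x80ea9f0 in gdb shows the value stored at that address.
With the 32-bit path hard to follow, I turned to the 64-bit utime instead. The same method gives the following 64-bit disassembly (partial):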
000000000043f250 <utime>:
43f250: b8 84 00 00 00 mov $0x84,%eax
43f255: 0f 05 syscall
43f257: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
43f25d: 0f 83 4d 52 00 00 jae 4444b0 <__syscall_error>
43f263: c3 retq
43f264: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
43f26b: 00 00 00
43f26e: 66 90 xchg %ax,%ax
The code above shows that the system call number of utime is 0x84 (132). The system call table shows that the corresponding function is __x64_sys_utime.
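3.3 Tracing with gdb
Set a breakpoint on __x64_sys_utime, then run the 64-bit program in qemu (remember to repack rootfs first). gdb stops in the utime code in fs/utimes.c.
The surrounding code includes do_futimesat, and utimes.c contains the following comment:
futimesat(), utimes() and utime() are older versions of utimensat() that are provided for compatibility with traditional C libraries.
On modern architectures, we always use libc wrappers around utimensat() instead.
In other words, utime exists for compatibility with older C libraries, and utimensat is what is used now. utimensat is system call number 320 in the i386 table (280 on x86-64), and both utime and utimensat end up calling do_utimes().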
======================= do_utimes description ==========================
/*
* do_utimes - change times on filename or file descriptor
* @dfd: open file descriptor, -1 or AT_FDCWD
* @filename: path name or NULL
* @times: new times or NULL
* @flags: zero or more flags (only AT_SYMLINK_NOFOLLOW for the moment)
*
* If filename is NULL and dfd refers to an open file, then operate on
* the file. Otherwise look up filename, possibly using dfd as a
* starting point.
*
* If times==NULL, set access and modification to current time,
* must be owner or have write permission.
* Else, update from *times, must be owner or super user.
*/
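The trace is shown in the two listings below. The first walks through the overall call flow and watches the stack with bt; the second steps into some of the functions to look at the details: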
(gdb) b __x64_sys_utime
Note: breakpoints 1, 2, 3, 4, 5 and 6 also set at pc 0xffffffff81206f07.
Breakpoint 8 at 0xffffffff81206f07: file fs/utimes.c, line 204.
(gdb) c
Continuing.
(gdb) bt
#0 __x64_sys_utime (regs=0xffffc900001b7f58) at fs/utimes.c:204
#1 0xffffffff81002603 in do_syscall_64 (nr=<optimized out>, regs=0xffffc900001b7f58) at arch/x86/entry/common.c:290
#2 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#3 0x0000000000000000 in ?? ()
(gdb) n
Breakpoint 7, do_utimes (dfd=-100, filename=0x4a1024 "./b.test", times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:90
90 {
(gdb) n
93 if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
94 !nsec_valid(times[1].tv_nsec))) {
(gdb) n
93 if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
98 if (flags & ~AT_SYMLINK_NOFOLLOW)
(gdb) n
101 if (filename == NULL && dfd != AT_FDCWD) {
(gdb) n
119 lookup_flags |= LOOKUP_FOLLOW;
(gdb) n
121 error = user_path_at(dfd, filename, lookup_flags, &path);
(gdb) n
122 if (error)
(gdb) n
125 error = utimes_common(&path, times);
(gdb) n
126 path_put(&path);
(gdb) n
127 if (retry_estale(error, lookup_flags)) {
(gdb) n
135 }
(gdb) bt
#0 do_utimes (dfd=-100, filename=0x4a1024 "./b.test", times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:119
#1 0xffffffff81206f64 in __do_sys_utime (times=<optimized out>, filename=<optimized out>) at fs/utimes.c:215
#2 __se_sys_utime (times=<optimized out>, filename=<optimized out>) at fs/utimes.c:204
#3 __x64_sys_utime (regs=<optimized out>) at fs/utimes.c:204
#4 0xffffffff81002603 in do_syscall_64 (nr=<optimized out>, regs=0x4a1024) at arch/x86/entry/common.c:290
#5 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#6 0x0000000000000000 in ?? ()
(gdb) n
__x64_sys_utime (regs=0xffff8880070a4140) at fs/utimes.c:204
204 SYSCALL_DEFINE2(utime, char __user *, filename, struct utimbuf __user *, times)
(gdb) bt
#0 __x64_sys_utime (regs=0xffff8880070a4140) at fs/utimes.c:204
#1 0xffffffff81002603 in do_syscall_64 (nr=<optimized out>, regs=0x0 <fixed_percpu_data>) at arch/x86/entry/common.c:290
#2 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#3 0x0000000000000000 in ?? ()
(gdb) n
do_syscall_64 (nr=18446612682188144960, regs=0xffffc900001b7f58) at arch/x86/entry/common.c:300
300 syscall_return_slowpath(regs);
(gdb) n
301 }
(gdb) bt
#0 do_syscall_64 (nr=<optimized out>, regs=<optimized out>) at arch/x86/entry/common.c:301
#1 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#2 0x0000000000000000 in ?? ()
(gdb) n
entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:184
184 movq RCX(%rsp), %rcx
(gdb) bt
#0 entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:184
#1 0x0000000000000000 in ?? ()
(gdb) n
185 movq RIP(%rsp), %r11
(gdb) n
187 cmpq %rcx, %r11 /* SYSRET requires RCX == RIP */
(gdb) n
188 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
205 shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
206 sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
210 cmpq %rcx, %r11
(gdb) n
211 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
213 cmpq $__USER_CS, CS(%rsp) /* CS must match SYSRET */
(gdb) n
214 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
216 movq R11(%rsp), %r11
(gdb) n
217 cmpq %r11, EFLAGS(%rsp) /* R11 == RFLAGS */
(gdb) n
218 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
238 testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
(gdb) n
239 jnz swapgs_restore_regs_and_return_to_usermode
(gdb) n
243 cmpq $__USER_DS, SS(%rsp) /* SS must match SYSRET */
(gdb) n
244 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:253
253 POP_REGS pop_rdi=0 skip_r11rcx=1
(gdb) bt
#0 syscall_return_via_sysret () at arch/x86/entry/entry_64.S:253
#1 0x0000000000000000 in ?? ()
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:259
259 movq %rsp, %rdi
(gdb) n
260 movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:262
262 pushq RSP-RDI(%rdi) /* RSP */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:263
263 pushq (%rdi) /* RDI */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:271
271 SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
(gdb) n
273 popq %rdi
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:274
274 popq %rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:275
275 USERGS_SYSRET64
(gdb) n
0x000000000043f257 in ?? ()
(gdb) n
Cannot find bounds of current function
(gdb)
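The operands RCX(%rsp), RIP(%rsp), R11(%rsp) and EFLAGS(%rsp) in the trace are offsets into the register frame that the entry code saved on the kernel stack. The sketch below mirrors that frame in user space so the offsets can be computed. The field order follows the x86-64 struct pt_regs as I understand it; the demo_ names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the x86-64 register frame saved at system call entry. */
struct demo_pt_regs {
    uint64_t r15, r14, r13, r12, bp, bx;            /* callee-saved registers  */
    uint64_t r11, r10, r9, r8, ax, cx, dx, si, di;  /* caller-clobbered        */
    uint64_t orig_ax;                               /* system call number      */
    uint64_t ip, cs, flags, sp, ss;                 /* user-mode return frame  */
};

/* Offsets corresponding to the symbolic names used in entry_64.S. */
enum {
    DEMO_R11    = offsetof(struct demo_pt_regs, r11),
    DEMO_RCX    = offsetof(struct demo_pt_regs, cx),
    DEMO_RIP    = offsetof(struct demo_pt_regs, ip),
    DEMO_EFLAGS = offsetof(struct demo_pt_regs, flags),
    DEMO_RSP    = offsetof(struct demo_pt_regs, sp)
};
```

The frame is 168 bytes. Adding that to the regs pointer in the trace, 0xffffc900001b7f58, gives 0xffffc900001b8000, which suggests the saved frame sits at the very top of the thread's kernel stack.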
Breakpoint 1, __x64_sys_utime (regs=0xffffc900001b7f58) at fs/utimes.c:204
204 SYSCALL_DEFINE2(utime, char __user *, filename, struct utimbuf __user *, times)
(gdb) n
Breakpoint 7, do_utimes (dfd=-100, filename=0x4a1024 "./b.test", times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:90
90 {
(gdb) n
93 if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
94 !nsec_valid(times[1].tv_nsec))) {
(gdb) n
93 if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
98 if (flags & ~AT_SYMLINK_NOFOLLOW)
(gdb) n
101 if (filename == NULL && dfd != AT_FDCWD) {
(gdb) n
119 lookup_flags |= LOOKUP_FOLLOW;
(gdb) n
121 error = user_path_at(dfd, filename, lookup_flags, &path);
(gdb) n
122 if (error)
(gdb) n
125 error = utimes_common(&path, times);
(gdb) n
126 path_put(&path);
(gdb) n
127 if (retry_estale(error, lookup_flags)) {
(gdb) s
retry_estale (flags=<optimized out>, error=<optimized out>) at ./include/linux/namei.h:91
91 return error == -ESTALE && !(flags & LOOKUP_REVAL);
(gdb) n
do_utimes (dfd=118112576, filename=0x64 <error: Cannot access memory at address 0x64>, times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:135
135 }
(gdb) n
__x64_sys_utime (regs=0xffff8880070a4140) at fs/utimes.c:204
204 SYSCALL_DEFINE2(utime, char __user *, filename, struct utimbuf __user *, times)
(gdb) n
do_syscall_64 (nr=18446612682188144960, regs=0xffffc900001b7f58) at arch/x86/entry/common.c:300
300 syscall_return_slowpath(regs);
(gdb) s
syscall_return_slowpath (regs=<optimized out>) at arch/x86/entry/common.c:300
300 syscall_return_slowpath(regs);
(gdb) s
get_current () at ./arch/x86/include/asm/current.h:15
15 return this_cpu_read_stable(current_task);
(gdb) s
syscall_return_slowpath (regs=<optimized out>) at arch/x86/entry/common.c:256
256 u32 cached_flags = READ_ONCE(ti->flags);
(gdb) n
270 if (unlikely(cached_flags & SYSCALL_EXIT_WORK_FLAGS))
(gdb) n
273 local_irq_disable();
(gdb) s
arch_local_irq_disable () at arch/x86/entry/common.c:273
273 local_irq_disable();
(gdb) s
native_irq_disable () at ./arch/x86/include/asm/irqflags.h:49
49 asm volatile("cli": : :"memory");
(gdb) s
syscall_return_slowpath (regs=<optimized out>) at arch/x86/entry/common.c:274
274 prepare_exit_to_usermode(regs);
(gdb) n
do_syscall_64 (nr=<optimized out>, regs=<optimized out>) at arch/x86/entry/common.c:300
300 syscall_return_slowpath(regs);
(gdb) n
301 }
(gdb) n
entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:184
184 movq RCX(%rsp), %rcx
(gdb) n
185 movq RIP(%rsp), %r11
(gdb) n
187 cmpq %rcx, %r11 /* SYSRET requires RCX == RIP */
(gdb) n
188 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
205 shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
206 sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
210 cmpq %rcx, %r11
(gdb) n
211 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
213 cmpq $__USER_CS, CS(%rsp) /* CS must match SYSRET */
(gdb) n
214 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
216 movq R11(%rsp), %r11
(gdb) n
217 cmpq %r11, EFLAGS(%rsp) /* R11 == RFLAGS */
(gdb) n
218 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
238 testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
(gdb) n
239 jnz swapgs_restore_regs_and_return_to_usermode
(gdb) n
243 cmpq $__USER_DS, SS(%rsp) /* SS must match SYSRET */
(gdb) n
244 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:253
253 POP_REGS pop_rdi=0 skip_r11rcx=1
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:259
259 movq %rsp, %rdi
(gdb) n
260 movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:262
262 pushq RSP-RDI(%rdi) /* RSP */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:263
263 pushq (%rdi) /* RDI */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:271
271 SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
(gdb) n
273 popq %rdi
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:274
274 popq %rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:275
275 USERGS_SYSRET64
(gdb) n
0x000000000043f257 in ?? ()
(gdb) n
Cannot find bounds of current function
(gdb)
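The shl/sar pair at lines 205 and 206 of entry_64.S checks that the return address is canonical before SYSRET is used: shifting left and then arithmetically right sign-extends the address from its top implemented bit, and the result must equal the original. A minimal C sketch follows, assuming __VIRTUAL_MASK_SHIFT is 47 (4-level paging); the demo_ names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VIRTUAL_MASK_SHIFT 47   /* assumption: 4-level paging */

/* True if addr survives sign extension from bit DEMO_VIRTUAL_MASK_SHIFT. */
static bool demo_is_canonical(uint64_t addr)
{
    const int shift = 64 - (DEMO_VIRTUAL_MASK_SHIFT + 1);
    /* shl then sar. This relies on arithmetic right shift of signed values,
     * which gcc and clang provide. */
    int64_t extended = (int64_t)(addr << shift) >> shift;
    return (uint64_t)extended == addr;
}
```

The user return address from the trace, 0x43f257, passes this check, so the jne to swapgs_restore_regs_and_return_to_usermode is not taken.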
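4. Analysis and summary
The rough flow of the utime system call is as follows (corrections are welcome):
- The user-space utime function puts the system call number (0x84) in %eax and executes the syscall instruction, which enters the kernel at entry_SYSCALL_64.
- entry_SYSCALL_64 saves the user context on the kernel stack as a pt_regs structure and calls do_syscall_64. do_syscall_64 takes the system call number that arrived in %rax, looks up the handler in sys_call_table, and calls it with regs, from which the handler takes its arguments. Here the handler is __x64_sys_utime.
- __x64_sys_utime does its work mainly by calling do_utimes. do_utimes operates on the open file referred to by dfd when filename is NULL, and otherwise looks up filename. If times == NULL, it sets the access and modification times to the current time.
- Before the system call returns, syscall_return_slowpath calls prepare_exit_to_usermode to do the exit work. Back in entry_SYSCALL_64, a series of checks, each followed by jne swapgs_restore_regs_and_return_to_usermode, decides whether the fast SYSRET path can be used. In the trace above every check passes, so syscall_return_via_sysret restores the registers with POP_REGS, switches back to the user stack, and returns to user mode with USERGS_SYSRET64.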
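The dispatch step in the summary can be sketched in user space. This is a simplified model, not kernel code: the register structure keeps only four fields, the table has one live entry, and every demo_ name is hypothetical. It shows the two ideas visible in the backtrace: the handler receives only a regs pointer, and a thin wrapper unpacks the arguments from it.

```c
#include <errno.h>
#include <stdint.h>

/* Reduced register frame: only the fields this sketch needs. */
struct demo_regs {
    uint64_t ax;        /* return value is written here   */
    uint64_t di, si;    /* first and second arguments     */
    uint64_t orig_ax;   /* system call number             */
};

typedef long (*demo_sys_fn)(const struct demo_regs *);

/* Plays the role of __do_sys_utime: the body with typed arguments. */
static long demo_do_utime(const char *filename, const void *times)
{
    (void)times;
    return filename ? 0 : -EFAULT;   /* toy behavior only */
}

/* Plays the role of __x64_sys_utime: unpack arguments from regs. */
static long demo_x64_sys_utime(const struct demo_regs *regs)
{
    return demo_do_utime((const char *)(uintptr_t)regs->di,
                         (const void *)(uintptr_t)regs->si);
}

#define DEMO_NR_utime 132
#define DEMO_NR_MAX   133

static const demo_sys_fn demo_sys_call_table[DEMO_NR_MAX] = {
    [DEMO_NR_utime] = demo_x64_sys_utime,
};

/* Plays the role of do_syscall_64: look up the handler and store its result. */
static void demo_do_syscall_64(unsigned long nr, struct demo_regs *regs)
{
    if (nr < DEMO_NR_MAX && demo_sys_call_table[nr])
        regs->ax = (uint64_t)demo_sys_call_table[nr](regs);
    else
        regs->ax = (uint64_t)-ENOSYS;
}
```

Because the result is stored in regs->ax, it is restored into RAX together with the other saved registers on the way back to user mode.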
References:
- https://blog.csdn.net/CSLQM/article/details/53202225
- http://www.daileinote.com/computer/linux_sys/13
- https://stackoverflow.com/questions/31062010/ubuntu-14-04-gcc-4-8-4-gdb-pretty-printing-doesnt-work-because-of-python-issu
- https://www.cnblogs.com/guxuanqing/p/5638363.html
- https://blog.csdn.net/zhaoxd200808501/article/details/77838933
- https://www.binss.me/blog/the-analysis-of-linux-system-call/