Drawing on my own experience and on articles found online, this note describes a general method for debugging kernel bugs by hand.
1. Steps
1) Collect the oops output, System.map, /proc/ksyms, vmlinux and /proc/modules.
2) Use ksymoops to interpret the oops.
Instructions are in /usr/src/linux/Documentation/oops-tracing.txt
ksymoops(8) man page (http://www.die.net/doc/linux/man/man8/ksymoops.8.html)
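2. Basic analysis
1) ksymoops disassembles the bytes in the Code: section.
2) EIP points to the faulting instruction.
3) The Call Trace section shows the call chain that led there.
Caution: the stack may contain noise. The i386 kernel builds the Call Trace by scanning the stack for any word that looks like a kernel text address, so some entries can be stale return addresses left over from earlier calls rather than real callers.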
3. Locating the faulting code
An oops example
C Source Code
int test_read_proc(char *buf, char **start, off_t offset, int count, int *eof, void *data)
{
    int *ptr;
    ptr = 0;                /* deliberate bug: ptr is a NULL pointer */
    printk("%d\n", *ptr);   /* reading *ptr triggers the oops */
    return 0;
}
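Write a kernel module, test.c, that uses the function above as the read handler of a file it creates under /proc.
Reading that file calls this function, which produces the following oops: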
Unable to handle kernel NULL pointer dereference at virtual address 00000000
c2483069 <---EIP (Instruction Pointer or Program Counter)
*pde = 00000000
Oops: 0000
CPU:    0
EIP: 0010:[ipv6:__insmod_ipv6_O/lib/modules/2.4.10-4GB/kernel/net/ipv6
ipv6+-472895383/96]
EFLAGS: 00010283
eax: db591f98 ebx: de2aeb60 ecx: de2aeb80 edx: c2483060
esi: 00000c00 edi: d41d0000 ebp: db591f5c esp: db591f4c
ds: 0018 es: 0018 ss: 0018
Process cat (pid: 1986, stackpage=db591000)
Stack: c012ca65 000001f0 ffffffea 00000000 00001000 c014e878 d41d0000 db591f98
00000000 00000c00 db591f94 00000000 de2aeb60 ffffffea 00000000 00001000
deae6f40 00000000 00000000 00000000 c01324d6 de2aeb60 0804db50 00001000
Call Trace: [__alloc_pages+65/452] [proc_file_read+204/420] [sys_read+146/200]
[system_call+51/64]
Code:a1 00 00 00 00 50 68 10 31 48 c2 e8 67 38 c9 fd 31 c0 89 ec
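Use ksymoops to map the addresses to kernel functions (alternatively, read /proc/kallsyms or /proc/ksyms to look up the addresses of exported functions). This matters here: the EIP line above attributes the address to ipv6 with a negative offset, which cannot be right, because an address inside a function lies at a non-negative offset from its start. The ksymoops output: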
Using defaults from ksymoops -t elf32-i386 -a i386
Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: a1 00 00 00 00 mov 0x0,%eax
Code; 00000004 Before first symbol
5: 50 push %eax
Code; 00000006 Before first symbol
6: 68 10 31 48 c2 push $0xc2483110
Code; 0000000a Before first symbol
b: e8 67 38 c9 fd call fdc93877 <_EIP+0xfdc93877> fdc93876 <END_OF_CODE+1e1fa3d8/????>
Code; 00000010 Before first symbol
10: 31 c0 xor %eax,%eax
Code; 00000012 Before first symbol
12: 89 ec mov %ebp,%esp
The corresponding entries from the kernel symbol table (address, name, [module]):
c2483060 test_read_proc [test]
c2483000 __insmod_test_O/home/ross/prog/test.o_M3 [test]
c2483110 __insmod_test_S.rodata_L68 [test]
c2483060 __insmod_test_S.text_L176 [test]
c2483080 foo [test]
de79c340 ip6_frag_mem [ipv6]
de783d00 addrconf_del_ifaddr [ipv6]
de78a5bc ipv6_packet_init [ipv6]
de78fd70 ipv6_sock_mc_drop [ipv6]
de781ee4 ip6_call_ra_chain [ipv6]
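So EIP c2483069 lies inside test_read_proc in the [test] module: it is above the function's start address c2483060 and below the next symbol, foo at c2483080.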
Offset = (EIP) - (base address of the routine) = c2483069 - c2483060 = 9
The next step is to disassemble test.o and find the instruction at offset 9; that is the line of code that faulted.
Use objdump to locate it:
Excerpt from "objdump -D test.o":
test.o:     file format elf32-i386
Disassembly of section .text:
00000000 <test_read_proc>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 08 sub $0x8,%esp
6: 83 c4 f8 add $0xfffffff8,%esp
9: a1 00 00 00 00 mov 0x0,%eax
e: 50 push %eax
f: 68 00 00 00 00 push $0x0
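The instruction at offset 9, mov 0x0,%eax, loads a value from address 0 so it can be pushed as printk's argument: it is the *ptr dereference in the source, which matches the NULL pointer dereference reported in the oops.
C Source Code
int test_read_proc(char *buf, char **start, off_t offset, int count, int *eof, void *data)
{
    int *ptr;
    ptr = 0;                /* deliberate bug: ptr is a NULL pointer */
    printk("%d\n", *ptr);   /* offset 9: mov 0x0,%eax reads *ptr */
    return 0;
}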