菠菜

Understanding a Kernel Oops! (translation)

Understanding a kernel panic and doing the forensics to trace the bug is often considered a hacker’s job. It is a complex task that requires sound knowledge of both the architecture you are working on and the internals of the Linux kernel. Depending on the type of error the kernel detects, Linux kernel panics are classified as hard panics (Aiee!) and soft panics (Oops!). This article explains how a Linux kernel ‘Oops’ works, shows how to trigger a simple one, and then debugs it. It is mainly intended for beginners getting into Linux kernel development who need to debug the kernel. Familiarity with the Linux kernel and with C programming is assumed.

An “Oops” is what the kernel emits when it finds something faulty, or an exception, in kernel code. It is somewhat like a segmentation fault in user space. An Oops dumps its message on the console; the message contains the processor status and the contents of the CPU registers at the time the fault occurred. The offending process that triggered the Oops is killed without releasing its locks or cleaning up its data structures. As a result, the system may not be able to resume normal operation; this is called an unstable state. Once an Oops has occurred, the system can no longer be fully trusted.

Let’s try to generate an Oops message with sample code, and try to understand the dump.

Setting up the machine to capture an Oops

The running kernel should be compiled with CONFIG_DEBUG_INFO, and syslogd should be running so that the Oops message is captured in the system log. To generate and understand an Oops message, let’s write a sample kernel module, oops.c:

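#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>

/* Deliberately store to address 0 to trigger a NULL pointer dereference.
 * The compiler is free to inline this into its caller. */
static void create_oops(void)
{
        *(int *)0 = 0;
}

static int __init my_oops_init(void)
{
        printk(KERN_INFO "oops from the module\n");
        create_oops();
        return 0;
}

static void __exit my_oops_exit(void)
{
        printk(KERN_INFO "Goodbye world\n");
}

module_init(my_oops_init);
module_exit(my_oops_exit);
/* No MODULE_LICENSE() is declared, so the kernel treats this module as
 * proprietary and marks itself tainted (P). Newer kernel build systems
 * may refuse to build a module without MODULE_LICENSE(). */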

The associated Makefile for this module is as follows:

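obj-m := oops.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

# M= is the supported way to build an external module; the older
# SUBDIRS= form was removed in Linux 5.3. Recipe lines must start with a tab.
all:
	$(MAKE) -C $(KDIR) M=$(PWD) modules

clean:
	$(MAKE) -C $(KDIR) M=$(PWD) clean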

Loading the module with insmod runs its init function, which generates the following Oops:

BUG: unable to handle kernel NULL pointer dereference at (null) 
IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops] 
PGD 7a719067 PUD 7b2b3067 PMD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/virtual/misc/kvm/uevent 
CPU 1 
Pid: 2248, comm: insmod Tainted: P 2.6.33.3-85.fc13.x86_64 
RIP: 0010:[<ffffffffa03e1012>] [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops] 
RSP: 0018:ffff88007ad4bf08 EFLAGS: 00010292 
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7 
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246 
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004 
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000 
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010 
FS: 00007fb79dadf700(0000) GS:ffff880001e80000(0000) knlGS:0000000000000000 
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
CR2: 0000000000000000 CR3: 000000007a0f1000 CR4: 00000000000006e0 
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
Process insmod (pid: 2248, threadinfo ffff88007ad4a000, task ffff88007a222ea0) 
Stack: 
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060 
0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9 
ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000 
Call Trace: 
[<ffffffff8100205f>] do_one_initcall+0x59/0x154 
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230 
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b 
Code: <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3 00 00 00 00 00 00 00 
RIP [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops] 
RSP <ffff88007ad4bf08> 
CR2: 0000000000000000

Understanding the Oops dump

Let’s have a closer look at the above dump, to understand some of the important bits of information.

BUG: unable to handle kernel NULL pointer dereference at (null)

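The first line says that the kernel tried to dereference a NULL pointer: the faulting address is 0, printed as (null). The same address appears in the CR2 register at the end of the dump.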

IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]

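IP is the instruction pointer: the address of the faulting instruction, followed by the symbol it belongs to (my_oops_init, at offset 0x12) and the module name in brackets ([oops]).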

Oops: 0002 [#1] SMP

0002 is the page-fault error code, in hex. Each of its low bits has its own meaning:

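bit 0 == 0 means the page was not present, 1 means a protection fault

bit 1 == 0 means a read access, 1 means a write access

bit 2 == 0 means kernel mode, 1 means user mode

Here 0002 has only bit 1 set: a write, from kernel mode, to a page that was not present. That is exactly what the store to address 0 in create_oops() does.

[#1] is the Oops counter: this is the first Oops since boot. Multiple Oopses can be triggered as a cascading effect of the first one, so the first one is usually the most useful to debug. SMP indicates that the kernel was built with SMP support.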

CPU 1

This is the CPU on which the error occurred.

Pid: 2248, comm: insmod Tainted: P 2.6.33.3-85.fc13.x86_64

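Pid and comm identify the process that was running: insmod, with PID 2248. The taint flag here is P. The most likely cause is the oops module itself: oops.c declares no MODULE_LICENSE, and the kernel treats a module with an unspecified license as proprietary. Each flag has its own meaning. Here are a few flags and their meanings, taken from kernel/panic.c (when no proprietary module is loaded, G is printed in place of P):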

P — Proprietary module has been loaded.

F — Module has been forcibly loaded.

S — SMP with a CPU not designed for SMP.

R — User forced a module unload.

M — System experienced a machine check exception.

B — System has hit bad_page.

U — Userspace-defined naughtiness.

A — ACPI table overridden.

W — Taint on warning.

RIP: 0010:[<ffffffffa03e1012>] [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]

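RIP is the CPU register that holds the address of the instruction being executed; for a fault like this one, it points at the faulting instruction itself. 0010 is the value of the code segment (CS) register. my_oops_init+0x12/0x21 is <symbol>+<offset>/<length>: the faulting instruction is 0x12 bytes into my_oops_init, a function 0x21 bytes long.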

RSP: 0018:ffff88007ad4bf08 EFLAGS: 00010292 
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7 
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246 
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004 
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000 
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010

This is a dump of the contents of some of the CPU registers.

Stack: 
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060 
0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9 
ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000

The above is a raw hex dump of the words at the top of the kernel stack, starting at the address in RSP.

Call Trace: 
[<ffffffff8100205f>] do_one_initcall+0x59/0x154 
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230 
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b

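The above is the call trace: the chain of calls that led to the Oops, most recent first. Read from the bottom up, the init_module system call (system_call_fastpath, then sys_init_module) invoked do_one_initcall, which called the module’s init function, my_oops_init.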

Code: <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3 00 00 00 00 00 00 00

Code is a hex dump of the machine code that was being executed when the Oops occurred. The byte in angle brackets, <c7>, is the first byte of the instruction that caused the fault.

Debugging an Oops dump

The first step is to load the offending module into the GDB debugger, as follows:

[root@DELL-RnD-India oops]# gdb oops.ko 
GNU gdb (GDB) Fedora (7.1-18.fc13) 
Reading symbols from /code/oops/oops.ko...done. 

Next, add the symbol file to the debugger. The add-symbol-file command’s first argument is oops.o and the second argument is the address of the text section of the module. You can obtain this address from /sys/module/oops/sections/.init.text (where oops is the module name):

(gdb) add-symbol-file oops.o 0xffffffffa03e1000
add symbol table from file "oops.o" at 
.text_addr = 0xffffffffa03e1000 
(y or n) y 
Reading symbols from /code/oops/oops.o...done.

From the RIP instruction line, we can get the name of the offending function, and disassemble it.

(gdb) disassemble my_oops_init 
Dump of assembler code for function my_oops_init: 
0x0000000000000038 <+0>: push %rbp 
0x0000000000000039 <+1>: mov $0x0,%rdi 
0x0000000000000040 <+8>: xor %eax,%eax 
0x0000000000000042 <+10>: mov %rsp,%rbp 
0x0000000000000045 <+13>: callq 0x4a <my_oops_init+18> 
0x000000000000004a <+18>: movl $0x0,0x0 
0x0000000000000055 <+29>: xor %eax,%eax 
0x0000000000000057 <+31>: leaveq 
0x0000000000000058 <+32>: retq
End of assembler dump.

Now, to pinpoint the offending line of code, we add the function’s starting address (from the disassembly) to the offset from the RIP line. In our case, 0x0000000000000038 + 0x12 = 0x000000000000004a. This points to the movl $0x0,0x0 instruction, a store of zero to address 0.

(gdb) list *0x000000000000004a 
0x4a is in my_oops_init (/code/oops/oops.c:6). 
1 #include <linux/kernel.h> 
2 #include <linux/module.h> 
3 #include <linux/init.h> 
4
5 static void create_oops() { 
6 *(int *)0 = 0; 
7 }

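This gives the offending statement: line 6 of oops.c, inside create_oops. GDB attributes address 0x4a to my_oops_init because the compiler inlined create_oops into my_oops_init. That is also why the disassembly contains no call to create_oops: the only callq is the printk call, whose target is not yet relocated in the object file.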

References

The kerneloops.org website collects a large number of Oops messages that can be used for debugging practice. The kernel documentation also covers Oopses, in Documentation/oops-tracing.txt in the kernel source tree. This file, and numerous other online resources, were used while writing this article.

Original English article: http://www.linuxforu.com/2011/01/understanding-a-kernel-oops/

posted on 2012-09-05 08:48  ~菠菜~
