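Kernel Driver Series -- Kernel Debugging Methods

This article draws mainly on Chapter 4 of LDD3 and Chapter 22 of Song Baohua's "Linux设备驱动开发详解" (Linux Device Driver Development Explained).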

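The ways to debug an embedded kernel are as follows:

I. First, turn on plenty of debugging options when building the kernel, so that when a driver misbehaves the kernel prints as much debug information as possible.

II. Observe and debug through printk, Oops messages, strace, /proc, and so on (the most common approach).

III. Instrument the target, for example by applying the KGDB patch, and debug the target from the host with gdb and kgdb.

IV. Use an emulator that connects directly to the target's JTAG/BDM port, so that GDB on the host can debug the target by talking to the emulator.

Each one is briefly introduced below: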

1 Kernel options (this part is copied directly from ELDD)

Several options exist under Kernel hacking in the kernel configuration menu that can emit valuable debug information. If you enable an option, corresponding debug code gets compiled when you build the kernel. Here are a few examples:

1. Show Timing information on printks (CONFIG_PRINTK_TIME) adds timing instrumentation to printk() output, so you can use printks as checkpoints for measuring execution times and identifying slow code regions.

2. Using freed memory results in memory poisoning. Debug slab memory allocations (CONFIG_DEBUG_SLAB) helps you detect such problems.

3. Spinlock and rw-lock debugging: basic checks (CONFIG_DEBUG_SPINLOCK) finds lock-related problems such as uninitialized spinlock usage and helps catch code that is not SMP-safe.

4. You have already worked with Magic SysRq key (CONFIG_MAGIC_SYSRQ) when you learned to use kdump. If you turn this on, you will have some avenues left even if the kernel crashes during debugging. For example, pressing Alt-Sysrq-t produces a dump of current tasks, whereas Alt-Sysrq-p prints the contents of processor registers.

5. Detect Soft Lockups (CONFIG_DETECT_SOFTLOCKUP) utilizes the services of a watchdog to detect tight loops in kernel code that last for more than 10 seconds. We looked at this when we analyzed a kernel hang using kdump. Note that hardware lockups cannot be found this way. For that, use the services of a Non-Maskable Interrupt (NMI)-watchdog if your platform supports it.

6. If you enable CONFIG_DEBUG_SLAB, CONFIG_DEBUG_HIMEM, or CONFIG_DEBUG_PAGE_ALLOC while configuring your kernel, additional error-checking code gets compiled that helps debug problems related to memory management.

7. Check for stack overflows (CONFIG_DEBUG_STACKOVERFLOW) adds code to emit warnings if the available stack space falls below a threshold. Stack utilization instrumentation (CONFIG_DEBUG_STACK_USAGE) adds stack space instrumentation to the magic Sysrq key output. Another related option, CONFIG_4KSTACKS, lets you set the kernel stack size to 4KB rather than 8KB.

   
8. Verbose BUG() reporting (CONFIG_DEBUG_BUGVERBOSE) produces extra debug information when any kernel code invokes BUG(), assuming that you have CONFIG_BUG turned on during kernel configuration.

Some debug options live outside the Kernel hacking submenu, too. For example, we enabled CONFIG_KALLSYMS in this chapter to debug an "oops" message using gdb and to kprobe a kernel module. This option lives under General setup -> Configure Standard Kernel Features (for small systems) in the configuration menu.

Kernel hacking options result in overhead and increased footprint, so don't leave them on in production-level kernels.

2 About printk & strace & Oops

  1) printk 

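  printk writes kernel messages into the kernel message buffer. That buffer is a ring buffer, so if too many messages are pushed in, the earliest ones get overwritten. You can display kernel messages with cat /proc/kmsg, or read what is in the ring buffer with dmesg.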
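  To make the overwrite behavior concrete, here is a small user-space sketch of a ring of log messages. It is only a simplified model written for this note, not the kernel's actual log buffer code; the names ring_log, SLOTS, and MSG_LEN are made up for illustration, and the capacity is counted in messages rather than bytes to keep it short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SLOTS   4    /* hypothetical capacity, counted in messages */
#define MSG_LEN 64   /* hypothetical maximum message length */

struct ring_log {
    char msg[SLOTS][MSG_LEN];
    unsigned long head;          /* total number of messages ever written */
};

static void ring_log_init(struct ring_log *r)
{
    memset(r, 0, sizeof(*r));
}

/* Store a message; once the ring is full this overwrites the oldest slot. */
static void ring_log_write(struct ring_log *r, const char *text)
{
    assert(r != NULL && text != NULL);
    snprintf(r->msg[r->head % SLOTS], MSG_LEN, "%s", text);
    r->head++;
}

/* Number of messages that can still be read back. */
static unsigned long ring_log_count(const struct ring_log *r)
{
    return r->head < SLOTS ? r->head : SLOTS;
}

/* Return the i-th oldest surviving message (0 = oldest), or NULL. */
static const char *ring_log_get(const struct ring_log *r, unsigned long i)
{
    unsigned long n = ring_log_count(r);

    if (i >= n)
        return NULL;
    return r->msg[(r->head - n + i) % SLOTS];
}
```

  Writing six messages into this four-slot ring leaves only the last four readable; the first two are gone, which is the same effect you see when dmesg no longer shows early boot messages on a chatty system.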

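  printk defines 8 message levels, 0-7; the smaller the number, the more urgent the message. Running echo 8 > /proc/sys/kernel/printk makes every printk message show up on the console (in a text-console environment).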
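  Why 8? The first number in /proc/sys/kernel/printk is the console log level, and a message is shown on the console only when its level is numerically lower than that value. The user-space sketch below parses the four numbers of that file from a string and applies the comparison. The struct and function names are my own for illustration; the field names follow the kernel's sysctl documentation.

```c
#include <assert.h>
#include <stdio.h>

/* The four fields of /proc/sys/kernel/printk, in file order. */
struct printk_levels {
    int console_loglevel;
    int default_message_loglevel;
    int minimum_console_loglevel;
    int default_console_loglevel;
};

/* Parse text as read from /proc/sys/kernel/printk; returns 0 on success. */
static int parse_printk_levels(const char *text, struct printk_levels *out)
{
    int n;

    assert(text != NULL && out != NULL);
    n = sscanf(text, "%d %d %d %d",
               &out->console_loglevel,
               &out->default_message_loglevel,
               &out->minimum_console_loglevel,
               &out->default_console_loglevel);
    return n == 4 ? 0 : -1;
}

/* A message reaches the console only if its level is below console_loglevel. */
static int reaches_console(int msg_level, const struct printk_levels *lv)
{
    return msg_level < lv->console_loglevel;
}
```

  With a console log level of 4, a level-7 (debug) message stays in the ring buffer only, while a level-3 (error) message also reaches the console; with 8, all levels 0-7 pass the check.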

  Real drivers usually use macros that wrap printk, such as pr_debug, dev_dbg, dev_err, and dev_info.

  2) Oops

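  When the kernel hits its own equivalent of a segmentation fault (for example, dereferencing an invalid pointer), an Oops is printed on the console and written to the ring buffer.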

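  In an Oops message, the first line states the cause. Another important line is the one that gives the pc register (EIP on x86), because it pinpoints the scene of the fault: disassemble with objdump, look up the corresponding offset, and you land directly on the offending line of assembly. The backtrace that follows is also quite useful.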
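  The location in that pc/EIP line usually has the form symbol+offset/size. As a small aid, the user-space sketch below splits such a string into its parts; the offset is what you add to the function's start address in the objdump -d listing. The struct and function names are invented for illustration, and the sample string in the note after the code is made up as well.

```c
#include <assert.h>
#include <stdio.h>

/* Parts of an Oops location of the form "symbol+0xOFF/0xSIZE". */
struct oops_loc {
    char symbol[64];        /* function name */
    unsigned long offset;   /* byte offset of the faulting instruction */
    unsigned long size;     /* total size of the function in bytes */
};

/* Parse the location string; returns 0 on success, -1 otherwise. */
static int parse_oops_loc(const char *text, struct oops_loc *out)
{
    int n;

    assert(text != NULL && out != NULL);
    /* %63[^+] reads the symbol up to '+'; %lx accepts an optional 0x prefix. */
    n = sscanf(text, "%63[^+]+%lx/%lx",
               out->symbol, &out->offset, &out->size);
    return n == 3 ? 0 : -1;
}
```

  For example, parsing "my_write+0x4/0x10 [mydrv]" yields the symbol my_write, offset 4, and size 16, so the instruction to look at sits 4 bytes after the start of my_write in the disassembly.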

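  3) strace

  strace is another rather magical tool; I was already marveling at it back when I wrote application programs. It traces the system calls made by a program or process, printing each one as "left = right", where the left side is the call and the right side is its return value. Although it cannot trace directly into the functions of a device driver, it helps engineers deduce where things went wrong, and for application developers it is simply priceless.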
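  To have something to trace, a tiny helper like the one below is enough. It is only a sketch: try_open is a name invented here, and it simply attempts to open a path and reports the errno of a failure, which is exactly the information strace puts on the right-hand side of the "=".

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open a path read-only.
 * Returns 0 on success, or the errno value of the failed open.
 */
static int try_open(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;   /* e.g. ENOENT if the path does not exist */
    close(fd);
    return 0;
}
```

  If a program calling try_open() on a missing device node is run under strace, the trace contains a line for the open (or openat) call whose right side is -1 together with ENOENT, so you can see at a glance which call into the driver's node failed and why.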

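  4) proc

  The kernel gurus discourage driver engineers from adding nodes to proc for output, because the proc filesystem is seen as being abused. For debugging, though, proc is still a good way for user space and kernel space to interact.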

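  The corresponding set of interfaces:

  create_proc_entry, create_proc_read_entry, proc_mkdir, read_proc_t, write_proc_t, remove_proc_entry

  (These are the interfaces of kernels from that era; create_proc_entry and create_proc_read_entry were removed in kernel 3.10, where proc_create with file operations takes their place.)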

  Both LDD3 and Song's book have simple examples, so I won't paste them here.
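  The user-space half is worth a quick sketch, though, since it is just ordinary file I/O. The helper below reads the first line of a node; the function name read_proc_line and the node path /proc/mydrv_debug mentioned afterwards are hypothetical, and the kernel side that would create such a node is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a (proc) file into buf, without the newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty.
 */
static int read_proc_line(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(path != NULL && buf != NULL && len > 0);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline, if any */
    return 0;
}
```

  Assuming a driver exported its state under a node such as /proc/mydrv_debug, calling read_proc_line("/proc/mydrv_debug", buf, sizeof(buf)) from a test program would fetch the current value; from a shell, cat on the same path does the same job.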

3 Using a debugger

  -------- I have not used one before either; taking a break, will fill this in later ----------

posted @ 2011-10-28 20:10  jialejiahi