How to analyze a vmcore with crash: basic approach, case 1
Viewing the kernel log with dmesg
crash> dmesg
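From the dmesg log, two lines tell us where in the code the bug fired:
1. [2493420.219336] kernel BUG at fs/ext4/super.c:879!
2. [2493420.273425] RIP: 0010:[<ffffffffa031a8df>] [<ffffffffa031a8df>] ext4_put_super+0x36f/0x3c0 [ext4]
In line 2, 0x36f is the byte offset of the faulting instruction from the entry of ext4_put_super, and 0x3c0 is the total size of the function in bytes (it is not a base address).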
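Locating the crash from line 2: first convert the offset to decimal.
(gdb) p 0x36f
$11 = 879
This is the same offset that bt reports as ext4_put_super+879. It happens to equal the source line number 879 in this case, but the offset counts bytes of machine code, not source lines.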
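The symbol+offset/size notation can also be unpacked mechanically. The following is a minimal sketch, not part of the original analysis: the helper name `parse_rip` is hypothetical, and the only input is the RIP line quoted above. It recovers the function's start address by subtracting the offset from RIP.

```python
import re

# The RIP line as quoted from the dmesg log (timestamp dropped).
RIP_LINE = ("RIP: 0010:[<ffffffffa031a8df>] [<ffffffffa031a8df>] "
            "ext4_put_super+0x36f/0x3c0 [ext4]")


def parse_rip(line):
    """Split a '[<addr>] symbol+offset/size' RIP line into its parts.

    parse_rip is a hypothetical helper for illustration only.
    """
    m = re.search(r"\[<([0-9a-f]+)>\]\s+(\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)", line)
    if m is None:
        raise ValueError("not a RIP line")
    rip = int(m.group(1), 16)
    offset = int(m.group(3), 16)   # bytes from function entry to the faulting instruction
    size = int(m.group(4), 16)     # total size of the function in bytes
    start = rip - offset           # address of the function entry
    return {"symbol": m.group(2), "rip": rip, "offset": offset,
            "size": size, "start": start, "end": start + size}


info = parse_rip(RIP_LINE)
print(info["symbol"], info["offset"], hex(info["start"]), hex(info["end"]))
```

The computed start address should agree with the first address printed by dis -l ext4_put_super, which is a quick sanity check that the module symbols match the dump.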
Disassemble the function and look for that offset:
crash> dis -l ext4_put_super
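Viewing source code in crash
crash can display source code itself, provided the module's symbols are loaded first. For example, to load the ext4 module:
crash> mod -s ext4
crash> mod <<---- lists all modules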
Line 879:
crash> l *ext4_put_super+0x36f
0xffffffffa031a8df is in ext4_put_super (fs/ext4/super.c:879).
874 * isn't empty. The on-disk one can be non-empty if we've
875 * detected an error and taken the fs readonly, but the
876 * in-memory list had better be clean by this point. */
877 if (!list_empty(&sbi->s_orphan))
878 dump_orphan_list(sb, sbi);
879 J_ASSERT(list_empty(&sbi->s_orphan));
880
881 sync_blockdev(sb->s_bdev);
882 invalidate_bdev(sb->s_bdev);
883 if (sbi->journal_bdev && sbi->journal_bdev != sb->s_bdev) {
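Here the J_ASSERT on line 879 fired because sbi->s_orphan, the in-memory orphan list, was not empty at unmount time. Only once we have found the exact line can we analyze why it crashed, for example by asking what values the function's arguments (often a pointer to some struct) actually held.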
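Printing the stack with bt
The [exception RIP: ext4_put_super+879] line in the bt output places the fault in ext4_put_super, 879 bytes (0x36f) past the function entry, the same location the dmesg RIP line reports. The +879 is a byte offset, not a source line number.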
crash> bt
PID: 1 TASK: ffff887e45918000 CPU: 58 COMMAND: "systemd-shutdow"
#0 [ffffc90000017a58] machine_kexec at ffffffff810603e8
#1 [ffffc90000017ab8] __crash_kexec at ffffffff811211cd
#2 [ffffc90000017b80] __crash_kexec at ffffffff811212a5
#3 [ffffc90000017b98] crash_kexec at ffffffff811212eb
#4 [ffffc90000017bb8] oops_end at ffffffff81030905
#5 [ffffc90000017be0] die at ffffffff81030ddb
#6 [ffffc90000017c10] do_trap at ffffffff8102df02
#7 [ffffc90000017c60] do_error_trap at ffffffff8102e2d9
#8 [ffffc90000017d20] do_invalid_op at ffffffff8102e830
#9 [ffffc90000017d30] invalid_op at ffffffff8171b63e
    [exception RIP: ext4_put_super+879]
    RIP: ffffffffa031a8df RSP: ffffc90000017de8 RFLAGS: 00010206
    RAX: ffff88490edabf50 RBX: ffff887e43299000 RCX: 00000001949b336d
    RDX: 0000000000000000 RSI: 0000000000000206 RDI: 0000000000000206
    RBP: ffffc90000017e18 R8: 00000000000081a4 R9: 0000000000000000
    R10: 0000000000000cb8 R11: 0000000000001e92 R12: ffff887e43299278
    R13: ffff887e43298800 R14: ffff887e43299278 R15: ffffffffa034ff88
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffffc90000017de0] ext4_put_super at ffffffffa031a91c [ext4]
#11 [ffffc90000017e20] generic_shutdown_super at ffffffff81244aaf
#12 [ffffc90000017e40] kill_block_super at ffffffff81244e37
#13 [ffffc90000017e60] deactivate_locked_super at ffffffff81244f73
#14 [ffffc90000017e80] deactivate_super at ffffffff8124547a
#15 [ffffc90000017e98] cleanup_mnt at ffffffff81264b2f
#16 [ffffc90000017eb0] __cleanup_mnt at ffffffff81264bc2
#17 [ffffc90000017ec0] task_work_run at ffffffff810a7b50
#18 [ffffc90000017f00] exit_to_usermode_loop at ffffffff810032ba
#19 [ffffc90000017f30] syscall_return_slowpath at ffffffff81003baa
#20 [ffffc90000017f50] entry_SYSCALL_64_fastpath at ffffffff8171a783
    RIP: 00007f3241195c47 RSP: 00007fffb3db5438 RFLAGS: 00000246
    RAX: 0000000000000000 RBX: 0000560b87fbd920 RCX: 00007f3241195c47
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b87fbdd10
    RBP: 0000560b87fbda00 R8: 0000000000000000 R9: 00007f32410e416d
    R10: 0000000000000021 R11: 0000000000000246 R12: 0000560b87fbdd10
    R13: 00007fffb3db5538 R14: 00007fffb3db5523 R15: 0000000000000000
    ORIG_RAX: 00000000000000a6 CS: 0033 SS: 002b
crash>
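Disassembling the caller and the callee
Once we know the exact line that failed, the next step is to examine the arguments and structs that were passed in.
First, look at the prototype of ext4_put_super: static void ext4_put_super(struct super_block *sb). It takes a single argument, a pointer to struct super_block. So what is the value of the sb pointer? It must have been passed in by the caller, generic_shutdown_super.
The key question is therefore: when generic_shutdown_super called ext4_put_super, what pointer did it pass? The address ffffffff81244aaf shown in frame #11 is the return address, so the call itself is the instruction immediately before it.
First, disassemble generic_shutdown_super and find address ffffffff81244aaf: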
crash> dis -l generic_shutdown_super
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 436
0xffffffff81244aa0 <generic_shutdown_super+96>: mov 0x30(%r12),%rax
0xffffffff81244aa5 <generic_shutdown_super+101>: test %rax,%rax
0xffffffff81244aa8 <generic_shutdown_super+104>: je 0xffffffff81244aaf <generic_shutdown_super+111>
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 437
0xffffffff81244aaa <generic_shutdown_super+106>: mov %rbx,%rdi <=== rdi (the first argument) is copied from rbx, so both hold the same value
0xffffffff81244aad <generic_shutdown_super+109>: callq *%rax <=== the callee (ext4_put_super) is called here
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/include/linux/compiler.h: 243
0xffffffff81244aaf <generic_shutdown_super+111>: mov 0x608(%rbx),%rax
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 439
0xffffffff81244ab6 <generic_shutdown_super+118>: lea 0x608(%rbx),%rdx
0xffffffff81244abd <generic_shutdown_super+125>: cmp %rax,%rdx
0xffffffff81244ac0 <generic_shutdown_super+128>: jne 0xffffffff81244b1f <generic_shutdown_super+223>
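Finding the call site from a return address can be sketched in a few lines. This is an illustrative sketch only: `parse_dis` and `insn_before` are hypothetical helper names, and the input is a four-line excerpt of the disassembly above with the annotations removed.

```python
import re

# Excerpt of the `dis -l generic_shutdown_super` output shown above.
DIS = """\
0xffffffff81244aa8 <generic_shutdown_super+104>: je 0xffffffff81244aaf <generic_shutdown_super+111>
0xffffffff81244aaa <generic_shutdown_super+106>: mov %rbx,%rdi
0xffffffff81244aad <generic_shutdown_super+109>: callq *%rax
0xffffffff81244aaf <generic_shutdown_super+111>: mov 0x608(%rbx),%rax
"""


def parse_dis(text):
    """Return a list of (address, instruction text) pairs from dis output."""
    insns = []
    for line in text.splitlines():
        m = re.match(r"\s*0x([0-9a-f]+) <[^>]+>:\s+(.*)", line)
        if m:
            insns.append((int(m.group(1), 16), m.group(2).strip()))
    return insns


def insn_before(insns, return_addr):
    """Return the instruction that immediately precedes return_addr, or None."""
    prev = None
    for addr, text in insns:
        if addr == return_addr:
            return prev
        prev = (addr, text)
    return None


# The address shown for frame #11 in bt is the return address.
call_site = insn_before(parse_dis(DIS), 0xffffffff81244aaf)
print(hex(call_site[0]), call_site[1])
```

The instruction found this way is the indirect call, and the one before it (mov %rbx,%rdi) is where the first argument is set up.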
Next, disassemble ext4_put_super. Its prologue pushes several register values onto the stack:
crash> dis -l ext4_put_super
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 824
0xffffffffa031a570 <ext4_put_super>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa031a575 <ext4_put_super+5>: push %rbp
0xffffffffa031a576 <ext4_put_super+6>: mov %rsp,%rbp
0xffffffffa031a579 <ext4_put_super+9>: push %r15 <=== 1st register pushed
0xffffffffa031a57b <ext4_put_super+11>: push %r14 <=== 2nd register pushed
0xffffffffa031a57d <ext4_put_super+13>: push %r13 <=== 3rd register pushed
0xffffffffa031a57f <ext4_put_super+15>: push %r12 <=== 4th register pushed
0xffffffffa031a581 <ext4_put_super+17>: mov %rdi,%r13
0xffffffffa031a584 <ext4_put_super+20>: push %rbx <=== 5th register pushed (rbx still holds the caller's value, which the caller copied into rdi just before the call, so the value saved here is the first argument of ext4_put_super, i.e. sb)
0xffffffffa031a585 <ext4_put_super+21>: sub $0x8,%rsp
0xffffffffa031a589 <ext4_put_super+25>: mov 0x460(%rdi),%rbx
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 826
0xffffffffa031a590 <ext4_put_super+32>: mov 0xe0(%rbx),%r14
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 830
0xffffffffa031a597 <ext4_put_super+39>: callq 0xffffffffa03133f0 <ext4_unregister_li_request>
crash> bt -f
#10 [ffffc90000017de0] ext4_put_super at ffffffffa031a91c [ext4]
ffffc90000017de8: 9cbae75a00000000 (slot reserved by sub $0x8,%rsp) ffff887e43298800 (5th pushed register, rbx)
ffffc90000017df8: ffffffffa034a5e0 (4th pushed register, r12) ffff887e3818c7b8 (3rd pushed register, r13)
ffffc90000017e08: 0000000000000000 (2nd pushed register, r14) ffff887e45918bb0 (1st pushed register, r15)
ffffc90000017e18: ffffc90000017e38 ffffffff81244aaf (these two are the saved rbp and the return address, not part of the five pushes counted above)
#11 [ffffc90000017e20] generic_shutdown_super at ffffffff81244aaf
ffffc90000017e28: 0000000000000083 ffff887e357b8680
ffffc90000017e38: ffffc90000017e58 ffffffff81244e37
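The mapping from prologue pushes to stack slots can be cross-checked with a little arithmetic. This is a minimal sketch under two assumptions: the function uses the frame-pointer prologue shown in the disassembly (push %rbp; mov %rsp,%rbp), and RBP is taken from the exception register dump in the bt output. The names `STACK`, `PUSHED` and `saved_registers` are hypothetical.

```python
# Words from the `bt -f` dump of frame #10, keyed by stack address.
# Each dump line shows two consecutive 8-byte words.
STACK = {
    0xffffc90000017de8: 0x9cbae75a00000000,
    0xffffc90000017df0: 0xffff887e43298800,
    0xffffc90000017df8: 0xffffffffa034a5e0,
    0xffffc90000017e00: 0xffff887e3818c7b8,
    0xffffc90000017e08: 0x0000000000000000,
    0xffffc90000017e10: 0xffff887e45918bb0,
    0xffffc90000017e18: 0xffffc90000017e38,
    0xffffc90000017e20: 0xffffffff81244aaf,
}

# Registers pushed after `push %rbp; mov %rsp,%rbp`, in prologue order.
PUSHED = ["r15", "r14", "r13", "r12", "rbx"]


def saved_registers(rbp, pushed, stack):
    """Map each pushed register to the value stored in its stack slot.

    Once rbp has been set to rsp, the n-th subsequent push lands at
    rbp - 8*n (n starting at 1).
    """
    return {reg: stack[rbp - 8 * (i + 1)] for i, reg in enumerate(pushed)}


RBP = 0xffffc90000017e18  # RBP from the exception register dump
saved = saved_registers(RBP, PUSHED, STACK)
for reg, value in saved.items():
    print(f"{reg}: {value:#018x}")
print(f"return address: {STACK[RBP + 8]:#x}")
```

The value recovered for rbx is the caller's copy of sb. It also matches R13 in the exception register dump, which is consistent with the mov %rdi,%r13 in the prologue.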
crash> struct super_block ffff887e43298800
struct super_block {
  s_list = {
    next = 0xffffffff81cb3db0 <super_blocks>, <======= s_list.next points at the global super_blocks list, which supports the conclusion that ffff887e43298800 really is a struct super_block
    prev = 0xffff887e43968800
  },
  s_dev = 271581185,
  s_blocksize_bits = 12 '\f',
  s_blocksize = 4096,
  s_maxbytes = 17592186040320,
  s_type = 0xffffffffa03589c0 <ext4_fs_type>,
  s_op = 0xffffffffa034a5e0 <ext4_sops>,
  dq_op = 0xffffffffa034a720 <ext4_quota_operations>,
  s_qcop = 0xffffffff81843f60 <dquot_quotactl_sysfile_ops>,
  s_export_op = 0xffffffffa034a580 <ext4_export_ops>,
  s_flags = 805371904,
  s_iflags = 1,
  s_magic = 61267,
  s_root = 0x0,
  s_umount = {
    count = {
      counter = -4294967295
    },
    wait_list = {
      next = 0xffff887e43298878,
      prev = 0xffff887e43298878
    },
    wait_lock = {
      raw_lock = {
        val = {
          counter = 0
        }
      }
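Two of the decimal fields in the dump are easier to read once decoded. The sketch below is an addition, not part of the original analysis. It assumes the kernel's internal dev_t layout (minor number in the low 20 bits, major number above it) and the ext4 magic number 0xEF53; `decode_dev` is a hypothetical helper name.

```python
MINORBITS = 20             # the kernel's internal dev_t keeps the minor in the low 20 bits
EXT4_SUPER_MAGIC = 0xEF53  # magic number shared by ext2/ext3/ext4 superblocks


def decode_dev(dev):
    """Split a kernel-internal dev_t value into (major, minor)."""
    return dev >> MINORBITS, dev & ((1 << MINORBITS) - 1)


# Values copied from the struct super_block dump above.
s_dev = 271581185
s_magic = 61267

major, minor = decode_dev(s_dev)
print(f"s_dev   = {major}:{minor}")
print(f"s_magic = {s_magic:#x} (ext4 magic: {s_magic == EXT4_SUPER_MAGIC})")
```

A magic number that matches ext4, together with s_op pointing at ext4_sops, is further evidence that the recovered pointer is the right super_block.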
References
Reposted from: https://www.cnblogs.com/muahao/p/9925629.html