Linux Basics: Troubleshooting an Unexpected Crash with vmcore on BClinux8.2

 

I. No dump file is generated in /var/crash

1. Reference configuration:

https://cloud.tencent.cn/developer/article/2367955

 

2. Adjust the configuration on BCoe8.2
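The adjusted settings themselves are not reproduced in this post. As a rough sketch of what is commonly checked on RHEL 8 family systems (an assumption, not necessarily the changes made here): the kernel command line should carry a crashkernel= reservation, and the path directive in /etc/kdump.conf decides where dumps are written. The helper names below are made up for illustration.

```shell
# Hypothetical helpers; not the actual configuration steps used in this post.

# Succeeds if the given kernel command line reserves memory for kdump.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the dump directory set by the "path" directive of a kdump.conf-style
# file, falling back to /var/crash (the kdump default) when none is set.
kdump_path() {
    local conf="$1" p=""
    if [ -r "$conf" ]; then
        p=$(awk '$1 == "path" { v = $2 } END { print v }' "$conf")
    fi
    printf '%s\n' "${p:-/var/crash}"
}

# Typical use on the host:
#   has_crashkernel "$(cat /proc/cmdline)" || echo "crashkernel= is missing"
#   kdump_path /etc/kdump.conf
```

If crashkernel= is missing, no memory is reserved for the capture kernel, which is one common reason for /var/crash staying empty.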

 

 

 

3. Trigger a crash manually

i. Reference: parameter details

https://blog.csdn.net/tombaby_come/article/details/134038949

echo 1 > /proc/sys/kernel/sysrq

echo c > /proc/sysrq-trigger

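Note: running the commands above panics the kernel immediately. With kdump working, the capture kernel then dumps the memory contents into /var/crash and the host reboots.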
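Because the second write takes the host down at once, it can help to wrap the two commands in a guard. The following is a minimal sketch; the function name trigger_crash and the CONFIRM variable are invented for this example.

```shell
# Hypothetical wrapper: prints the SysRq commands by default and only runs
# them when CONFIRM=yes is set, because the second write panics the kernel.
trigger_crash() {
    if [ "${CONFIRM:-no}" != "yes" ]; then
        echo "dry run: echo 1 > /proc/sys/kernel/sysrq"
        echo "dry run: echo c > /proc/sysrq-trigger"
        return 0
    fi
    echo 1 > /proc/sys/kernel/sysrq   # enable all SysRq functions
    echo c > /proc/sysrq-trigger      # crash immediately; nothing after this runs
}

# Real run (as root, on a test machine only):
#   CONFIRM=yes trigger_crash
```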

 

4. Check kdump

i. Reference: how kdump works

https://zhuanlan.zhihu.com/p/684699511
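A quick way to confirm that a capture kernel is actually loaded is to read /sys/kernel/kexec_crash_loaded, which holds 1 when a crash kernel has been loaded and 0 otherwise. The sketch below wraps that check; the function name is hypothetical, and the optional argument exists only so the check can be pointed at another file.

```shell
# Succeeds if a capture kernel is loaded (file content is exactly "1").
crash_kernel_loaded() {
    local f="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}

# On the host, the service state can also be checked with:
#   systemctl is-active kdump
#   kdumpctl status
```

If this check fails, a manual SysRq crash will reboot or hang the host without leaving a vmcore.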

 

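II. Consistency check between the crash tool and the vmlinux kernel

1. Check that /boot/vmlinuz-4.19.0-240.23.35.el8_2.bclinux.x86_64 and /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux belong to the same kernel build, i.e. carry the same release string. (vmlinuz is a compressed boot image and the debuginfo vmlinux is an uncompressed ELF with symbols, so their md5 checksums are not expected to be equal; compare the release, not the checksums.)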
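One way to sanity-check that the two files belong to the same kernel build is to compare the release string embedded in each path. This is a minimal sketch with hypothetical helper names; it only inspects the path names, so it assumes the standard /boot and kernel-debuginfo layouts shown in this post.

```shell
# /boot/vmlinuz-<release>  ->  <release>
release_from_vmlinuz() {
    local base="${1##*/}"
    printf '%s\n' "${base#vmlinuz-}"
}

# /usr/lib/debug/usr/lib/modules/<release>/vmlinux  ->  <release>
release_from_vmlinux() {
    local dir="${1%/vmlinux}"
    printf '%s\n' "${dir##*/}"
}

# Succeeds when both paths name the same kernel release.
same_kernel_release() {
    [ "$(release_from_vmlinuz "$1")" = "$(release_from_vmlinux "$2")" ]
}
```

The crash banner also prints a RELEASE line for the dump, which should show the same string as the vmlinux path passed on the command line.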

 

2. Location of the host kernel's vmlinux

/usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux

 

3. Location of the vmcore file from the crash

/var/crash/127.0.0.1-2024-05-06-03\:24\:36/vmcore
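Each crash gets its own timestamped directory, so after several crashes it is easy to open the wrong dump. A small sketch for picking the most recently modified vmcore, assuming the default layout of one subdirectory per dump (the function name is hypothetical):

```shell
# Prints the newest <root>/*/vmcore by modification time; fails if none exist.
latest_vmcore() {
    local root="${1:-/var/crash}" newest="" f
    for f in "$root"/*/vmcore; do
        [ -e "$f" ] || continue          # unmatched glob stays literal; skip it
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Typical use:
#   latest_vmcore /var/crash
```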

 

 

 

III. Analyzing the vmcore

 

1. Open the vmcore with the crash tool

 

[root@NewOSBC8 127.0.0.1-2024-05-06-03:24:36]# crash vmcore /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux

crash 7.2.7-3.el8.1
Copyright (C) 2002-2020  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [178MB]: patching 97096 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Mon May  6 03:24:31 2024
      UPTIME: 00:12:44
LOAD AVERAGE: 0.00, 0.02, 0.03
       TASKS: 346
    NODENAME: NewOSBC8.2
     RELEASE: 4.19.0-240.23.35.el8_2.bclinux.x86_64
     VERSION: #1 SMP Wed Sep 27 10:49:35 EDT 2023
     MACHINE: x86_64  (1796 Mhz)
      MEMORY: 2 GB
       PANIC: "sysrq: SysRq : Trigger a crash"
         PID: 2289
     COMMAND: "bash"
        TASK: ffff8d1122bf0000  [THREAD_INFO: ffff8d1122bf0000]
         CPU: 0
       STATE: TASK_RUNNING (SYSRQ)

crash> bt
PID: 2289   TASK: ffff8d1122bf0000  CPU: 0   COMMAND: "bash"
 #0 [ffffa2ab80cefbe8] machine_kexec at ffffffff8c25fabe
 #1 [ffffa2ab80cefc40] __crash_kexec at ffffffff8c3658ba
 #2 [ffffa2ab80cefd00] crash_kexec at ffffffff8c36678d
 #3 [ffffa2ab80cefd18] oops_end at ffffffff8c2259fd
 #4 [ffffa2ab80cefd38] no_context at ffffffff8c26fd4e
 #5 [ffffa2ab80cefd90] do_page_fault at ffffffff8c270872
 #6 [ffffa2ab80cefdc0] page_fault at ffffffff8cc0122e
    [exception RIP: sysrq_handle_crash+18]
    RIP: ffffffff8c74eb12  RSP: ffffa2ab80cefe78  RFLAGS: 00010246
    RAX: ffffffff8c74eb00  RBX: 0000000000000063  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff8d1131017108  RDI: 0000000000000063
    RBP: 0000000000000004   R8: 00000000000005ce   R9: 000000000000002d
    R10: 0000000000000000  R11: ffffa2ab80cefd30  R12: 0000000000000000
    R13: 0000000000000000  R14: ffffffff8d53c3e0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffa2ab80cefe78] __handle_sysrq.cold.10 at ffffffff8c74f6f8
 #8 [ffffa2ab80cefea8] write_sysrq_trigger at ffffffff8c74f5bb
 #9 [ffffa2ab80cefeb8] proc_reg_write at ffffffff8c55de29
#10 [ffffa2ab80cefed0] vfs_write at ffffffff8c4e0db5
#11 [ffffa2ab80ceff00] ksys_write at ffffffff8c4e102f
#12 [ffffa2ab80ceff38] do_syscall_64 at ffffffff8c2041ab
#13 [ffffa2ab80ceff50] entry_SYSCALL_64_after_hwframe at ffffffff8cc000ad
    RIP: 00007f515c78ab28  RSP: 00007ffc1172a678  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f515c78ab28
    RDX: 0000000000000002  RSI: 000055b65d8c05c0  RDI: 0000000000000001
    RBP: 000055b65d8c05c0   R8: 000000000000000a   R9: 00007f515c81bc80
    R10: 000000000000000a  R11: 0000000000000246  R12: 00007f515ca5b6c0
    R13: 0000000000000002  R14: 00007f515ca56880  R15: 0000000000000002
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash> dis -l sysrq_handle_crash+18
/usr/src/debug/kernel-4.19.0-240.23.35.el8/linux-4.19.0-240.23.35.el8_2.bclinux.x86_64/drivers/tty/sysrq.c: 159
0xffffffff8c74eb12 <sysrq_handle_crash+18>:     movb   $0x1,0x0
crash> dis -l 0xffffffff8c74eb12
/usr/src/debug/kernel-4.19.0-240.23.35.el8/linux-4.19.0-240.23.35.el8_2.bclinux.x86_64/drivers/tty/sysrq.c: 159
0xffffffff8c74eb12 <sysrq_handle_crash+18>:     movb   $0x1,0x0
crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM   458790       1.8 GB         ----
         FREE   194411     759.4 MB   42% of TOTAL MEM
         USED   264379         1 GB   57% of TOTAL MEM
       SHARED    50717     198.1 MB   11% of TOTAL MEM
      BUFFERS      530       2.1 MB    0% of TOTAL MEM
       CACHED   103545     404.5 MB   22% of TOTAL MEM
         SLAB    31239       122 MB    6% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP   532479         2 GB         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE   532479         2 GB  100% of TOTAL SWAP

 COMMIT LIMIT   761874       2.9 GB         ----
    COMMITTED   511634         2 GB   67% of TOTAL LIMIT
crash> sys
      KERNEL: /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Mon May  6 03:24:31 2024
      UPTIME: 00:12:44
LOAD AVERAGE: 0.00, 0.02, 0.03
       TASKS: 346
    NODENAME: NewOSBC8.2
     RELEASE: 4.19.0-240.23.35.el8_2.bclinux.x86_64
     VERSION: #1 SMP Wed Sep 27 10:49:35 EDT 2023
     MACHINE: x86_64  (1796 Mhz)
      MEMORY: 2 GB
       PANIC: "sysrq: SysRq : Trigger a crash"
crash> p cpu_info:1
per_cpu(cpu_info, 1) = $1 = {
  x86 = 23 '\027',
  x86_vendor = 2 '\002',
  x86_model = 104 'h',
  x86_stepping = 1 '\001',
  x86_tlbsize = 3072,
  x86_virt_bits = 48 '0',
  x86_phys_bits = 45 '-',
  x86_coreid_bits = 0 '\000',
  cu_id = 255 '\377',
  extended_cpuid_level = 2147483680,
  cpuid_level = 16,
  x86_capability = {126614527, 802421759, 0, 129319184, 4277678595, 0, 4195321, 376123396, 557056, 563872169, 15, 0, 0, 17584641, 4, 0, 4194308, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 229696, 0},
  x86_vendor_id = "AuthenticAMD\000\000\000",
  x86_model_id = "AMD Ryzen 7 5700U with Radeon Graphics\000        \000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
  x86_cache_size = 512,
  x86_cache_alignment = 64,
  x86_cache_max_rmid = -1,
  x86_cache_occ_scale = -1,
  x86_power = 256,
  loops_per_jiffy = 1796624,
  x86_max_cores = 1,
  apicid = 2,
  initial_apicid = 2,
  x86_clflush_size = 64,
  booted_cores = 1,
  phys_proc_id = 2,
  logical_proc_id = 1,
  cpu_core_id = 0,
  cpu_index = 1,
  microcode = 0,
  x86_cache_bits = 45 '-',
  initialized = 1,
  cpuinfo_x86_extended_size_rh = 0,
  _rh = {
    cpu_die_id = 0,
    logical_die_id = 1,
    vmx_capability = {0, 0, 0}
  }
}
crash>  ps 1489
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
   1489   1382   0  ffff8d110eb20000  IN  11.9 3106588 249348  llvmpipe-1
crash>

 

crash vmcore /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux

Time the vmcore was generated: DATE: Mon May  6 03:24:31 2024

Panic reason: PANIC: "sysrq: SysRq : Trigger a crash"
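When the banner or sys output has been saved to a text file, the two fields above can be pulled out with a small filter instead of reading the whole listing. This is a sketch that assumes the right-aligned "FIELD: value" layout shown in the session; crash_field is a hypothetical name.

```shell
# Reads crash "sys"-style output on stdin and prints the value of one field.
crash_field() {
    awk -v key="$1" '
        { line = $0; sub(/^[ \t]+/, "", line) }     # drop the alignment padding
        index(line, key ": ") == 1 { print substr(line, length(key) + 3); exit }
    '
}

# Typical use, with sys.txt holding saved output:
#   crash_field DATE  < sys.txt
#   crash_field PANIC < sys.txt
```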

 

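2. Inspect the register state and the faulting function at RIP

i. Work out which process was running and reached sysrq_handle_crash when the system went down (in the bt output above: PID 2289, COMMAND "bash");

ii. Reference: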

https://blog.csdn.net/weixin_43564241/article/details/130692946

 

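3. View the code reached by the user-space application

i. Take the "[exception RIP: sysrq_handle_crash+18]" line from the bt output (highlighted in the original session) and look up the code with dis -l. Here it resolves to drivers/tty/sysrq.c line 159, where the instruction movb $0x1,0x0 writes to address 0, a deliberate NULL-pointer write that forces the panic;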
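With a saved backtrace, the symbol and offset to hand to dis -l can be extracted mechanically. A minimal sketch, assuming the bt output format shown in the session (the function name is hypothetical):

```shell
# Reads "bt" output on stdin and prints the first exception RIP, e.g.
# "sysrq_handle_crash+18", ready to be used as: dis -l <symbol+offset>
exception_rip() {
    sed -n 's/.*\[exception RIP: \(.*\)\].*/\1/p' | head -n 1
}

# Typical use, with bt.txt holding saved output:
#   exception_rip < bt.txt
```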

 

 

4. Check memory usage at the time of the crash (the kmem -i output above)
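The PAGES column of kmem -i can be cross-checked against the TOTAL and PERCENTAGE columns with simple integer arithmetic. The sketch below assumes 4 KiB pages, which is the base page size on x86_64; the helper names are made up.

```shell
# Pages -> MiB, assuming 4096-byte pages unless a size is given as $2.
pages_to_mib() {
    echo $(( $1 * ${2:-4096} / 1048576 ))
}

# Integer percentage of part ($1) in total ($2), truncated like the listing.
percent_of() {
    echo $(( $1 * 100 / $2 ))
}

# Values from the kmem -i output in this post:
#   pages_to_mib 194411        # FREE, about 759 MiB
#   percent_of 194411 458790   # FREE as a share of TOTAL MEM, 42
#   percent_of 264379 458790   # USED as a share of TOTAL MEM, 57
```

With 42% of memory free and no swap in use, memory pressure does not look like a factor in this crash, which is consistent with the SysRq panic message.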

 

5. Triggered from user space

i. The dump of memory into /var/crash was triggered manually by the user.

 

posted on 2024-05-06 17:47  gkhost
