安庆

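A strange crash on SUSE

I recently ran into a crash on SUSE:

I never ran mkswap /dev/<disk path> to create a swap partition, the machine has plenty of free memory, and I set nr_swapfiles to 0, yet I still hit a swap-related crash.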

We got more kernel crashes at swapin_readahead() on our kernel:
[  948.894273] BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
[  948.894283] IP: [<ffffffff81133662>] valid_swaphandles+0x72/0x150
[  948.894292] PGD 3e4b007067 PUD 3e4b008067 PMD 0 
[  948.894296] Oops: 0000 [#1] SMP 
[  948.894301] CPU 29 
[  948.894302] Modules linked in: xfs w83627dhg(EN) af_packet tipc(EX) ossmod(EN) iptable_filter ip_tables x_tables bonding edd cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf binfmt_misc fuse loop dm_mod ipv6 ipv6_lib pcspkr i40e(EX) igb ses enclosure dca iTCO_wdt sg ptp pps_core i2c_i801 iTCO_vendor_support mei mptctl mptbase rtc_cmos button acpi_power_meter container ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect i2c_core syscopyarea ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_rdac scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh ahci libahci libata mpt3sas(EX) configfs scsi_transport_sas raid_class scsi_mod [last unloaded: witdriver]
[  948.894346] Supported: No, Unsupported modules are loaded
[  948.894348] 
[  948.894350] Pid: 31427, comm: nginx Tainted: G           ENX 3.0.101-0.47.52-default #1 ZTE Grantley/S1008
[  948.894354] RIP: 0010:[<ffffffff81133662>]  [<ffffffff81133662>] valid_swaphandles+0x72/0x150
[  948.894357] RSP: 0000:ffff883e4b021ca8  EFLAGS: 00010216
[  948.894359] RAX: 0000000000000008 RBX: 00181818182f98a0 RCX: 0000000000000003
[  948.894362] RDX: 000000000000408e RSI: 0000000000000000 RDI: ffffffff81e56ce0
[  948.894364] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000029
[  948.894366] R10: ffff883e4b011218 R11: ffff883e4dac4bc0 R12: 00181818182f9898
[  948.894368] R13: 00181818182f989a R14: 0000000000000000 R15: ffff883e4b021d30
[  948.894371] FS:  00007f1c66ca7720(0000) GS:ffff88407dda0000(0000) knlGS:0000000000000000
[  948.894373] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  948.894375] CR2: 000000000000000c CR3: 0000003e4b006000 CR4: 00000000001407e0
[  948.894377] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  948.894380] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  948.894382] Process nginx (pid: 31427, threadinfo ffff883e4b020000, task ffff883e4b01e540)
[  948.894384] Stack:
[  948.894385]  ffff883e4b021cc4 30181818182f989a 0000000000000000 00007f1ac86072fc
[  948.894391]  ffff883f89f0c038 ffff883fbe860d78 00000000000200da ffffffff81132bd6
[  948.894397]  0000000000000000 30181818182f989a 0000000000000000 0000000000000000
[  948.894401] Call Trace:
[  948.894411]  [<ffffffff81132bd6>] swapin_readahead+0x26/0xd0
[  948.894417]  [<ffffffff81122a8d>] do_swap_page+0xed/0x5f0
[  948.894422]  [<ffffffff81123ab1>] handle_pte_fault+0x1e1/0x230
[  948.894429]  [<ffffffff8146873d>] do_page_fault+0x1fd/0x4c0
[  948.894434]  [<ffffffff81465345>] page_fault+0x25/0x30
[  948.894440]  [<00007f1c65877dab>] 0x7f1c65877daa
[  948.894441] Code: ff ff ff ff 01 49 21 c5 4c 89 eb 48 d3 eb 48 d3 e3 48 85 db 4c 0f 45 e3 e8 cc 15 33 00 b8 01 00 00 00 89 e9 d3 e0 48 98 48 01 c3 
[  948.894455]  8b 46 0c 48 39 c3 48 89 c7 49 8d 45 01 48 0f 46 fb 48 39 f8 
[  948.894462] RIP  [<ffffffff81133662>] valid_swaphandles+0x72/0x150
[  948.894465]  RSP <ffff883e4b021ca8>
[  948.894467] CR2: 000000000000000c

We also collected a vmcore:
crash> dis -l valid_swaphandles+0x72
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2318
0xffffffff81133662 <valid_swaphandles+114>:     mov    0xc(%r14),%eax

crash> dis -l valid_swaphandles
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2305
0xffffffff811335f0 <valid_swaphandles>: push   %r15
0xffffffff811335f2 <valid_swaphandles+2>:       mov    %rsi,%r15
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2312
0xffffffff811335f5 <valid_swaphandles+5>:       xor    %esi,%esi
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2305
0xffffffff811335f7 <valid_swaphandles+7>:       push   %r14
0xffffffff811335f9 <valid_swaphandles+9>:       push   %r13
0xffffffff811335fb <valid_swaphandles+11>:      push   %r12
0xffffffff811335fd <valid_swaphandles+13>:      push   %rbp
0xffffffff811335fe <valid_swaphandles+14>:      push   %rbx
0xffffffff811335ff <valid_swaphandles+15>:      sub    $0x8,%rsp
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2307
0xffffffff81133603 <valid_swaphandles+19>:      mov    0xd233c3(%rip),%ebp        # 0xffffffff81e569cc
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2312
0xffffffff81133609 <valid_swaphandles+25>:      test   %ebp,%ebp
0xffffffff8113360b <valid_swaphandles+27>:      je     0xffffffff81133723 <valid_swaphandles+307>
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2315
0xffffffff81133611 <valid_swaphandles+33>:      mov    %rdi,%rax
/home/chengry/linux-3.0.101-0.47.52/include/linux/swapops.h: 49
0xffffffff81133614 <valid_swaphandles+36>:      mov    %rdi,%r13
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2317
0xffffffff81133617 <valid_swaphandles+39>:      mov    %ebp,%ecx
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2315
0xffffffff81133619 <valid_swaphandles+41>:      shr    $0x39,%rax
/home/chengry/linux-3.0.101-0.47.52/include/linux/spinlock.h: 293
0xffffffff8113361d <valid_swaphandles+45>:      mov    $0xffffffff81e56ce0,%rdi
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2319
0xffffffff81133624 <valid_swaphandles+52>:      mov    $0x1,%r12d
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2315
0xffffffff8113362a <valid_swaphandles+58>:      mov    -0x7e1a9300(,%rax,8),%r14
/home/chengry/linux-3.0.101-0.47.52/include/linux/swapops.h: 49
0xffffffff81133632 <valid_swaphandles+66>:      movabs $0x1ffffffffffffff,%rax
0xffffffff8113363c <valid_swaphandles+76>:      and    %rax,%r13
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2317
0xffffffff8113363f <valid_swaphandles+79>:      mov    %r13,%rbx
0xffffffff81133642 <valid_swaphandles+82>:      shr    %cl,%rbx
0xffffffff81133645 <valid_swaphandles+85>:      shl    %cl,%rbx
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2319
0xffffffff81133648 <valid_swaphandles+88>:      test   %rbx,%rbx
0xffffffff8113364b <valid_swaphandles+91>:      cmovne %rbx,%r12
/home/chengry/linux-3.0.101-0.47.52/include/linux/spinlock.h: 293
0xffffffff8113364f <valid_swaphandles+95>:      callq  0xffffffff81464c20 <_raw_spin_lock>
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2318
0xffffffff81133654 <valid_swaphandles+100>:     mov    $0x1,%eax
0xffffffff81133659 <valid_swaphandles+105>:     mov    %ebp,%ecx
0xffffffff8113365b <valid_swaphandles+107>:     shl    %cl,%eax
0xffffffff8113365d <valid_swaphandles+109>:     cltq   
0xffffffff8113365f <valid_swaphandles+111>:     add    %rax,%rbx
0xffffffff81133662 <valid_swaphandles+114>:     mov    0xc(%r14),%eax
0xffffffff81133666 <valid_swaphandles+118>:     cmp    %rax,%rbx
0xffffffff81133669 <valid_swaphandles+121>:     mov    %rax,%rdi
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2331
0xffffffff8113366c <valid_swaphandles+124>:     lea    0x1(%r13),%rax
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2318
0xffffffff81133670 <valid_swaphandles+128>:     cmovbe %rbx,%rdi
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2331
0xffffffff81133674 <valid_swaphandles+132>:     cmp    %rdi,%rax
0xffffffff81133677 <valid_swaphandles+135>:     jae    0xffffffff81133734 <valid_swaphandles+324>
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2333
0xffffffff8113367d <valid_swaphandles+141>:     mov    0x10(%r14),%rcx
0xffffffff81133681 <valid_swaphandles+145>:     movzbl 0x1(%rcx,%r13,1),%edx
0xffffffff81133687 <valid_swaphandles+151>:     test   %dl,%dl
0xffffffff81133689 <valid_swaphandles+153>:     je     0xffffffff81133734 <valid_swaphandles+324>
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2335
0xffffffff8113368f <valid_swaphandles+159>:     and    $0xffffffbf,%edx
0xffffffff81133692 <valid_swaphandles+162>:     cmp    $0x3f,%dl
0xffffffff81133695 <valid_swaphandles+165>:     je     0xffffffff81133734 <valid_swaphandles+324>
0xffffffff8113369b <valid_swaphandles+171>:     add    %r13,%rcx
0xffffffff8113369e <valid_swaphandles+174>:     xor    %esi,%esi
0xffffffff811336a0 <valid_swaphandles+176>:     jmp    0xffffffff811336bc <valid_swaphandles+204>
0xffffffff811336a2 <valid_swaphandles+178>:     nopw   0x0(%rax,%rax,1)
/home/chengry/linux-3.0.101-0.47.52/mm/swapfile.c: 2333
0xffffffff811336a8 <valid_swaphandles+184>:     movzbl 0x2(%rcx),%edx
0xffffffff811336ac <valid_swaphandles+188>:     test   %dl,%dl
0xffffffff811336ae <valid_swaphandles+190>:     je     0xffffffff811336c8 <valid_swaphandles+216>

The corresponding source code should be:
int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
{
    struct swap_info_struct *si;
    int our_page_cluster = page_cluster;
    pgoff_t target, toff;
    pgoff_t base, end;
    int nr_pages = 0;

    if (!our_page_cluster)    /* no readahead */
        return 0;

    si = swap_info[swp_type(entry)];
    target = swp_offset(entry);
    base = (target >> our_page_cluster) << our_page_cluster;
    end = base + (1 << our_page_cluster);
    if (!base)        /* first page is swap header */
        base++;

    spin_lock(&swap_lock);
    if (frontswap_test(si, target)) {
        spin_unlock(&swap_lock);
        return 0;
    }
    if (end > si->max)    /* don't go beyond end of map */
        end = si->max;

This means that si (the struct swap_info_struct pointer) is NULL: %r14 is 0, and the faulting load from 0xc(%r14) matches CR2 = 0xc.

The related values from the vmcore:
crash> p swap_info
swap_info = $11 = 
 {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 
0x0}

crash>     p swap_list
swap_list = $14 = {
  head = -1, 
  next = -1
}

crash>     p nr_swapfiles
nr_swapfiles = $17 = 0

crash> p total_swap_pages
total_swap_pages = $19 = 0
crash> p least_priority
least_priority = $20 = 0

crash> p nr_swap_pages
nr_swap_pages = $7 = 0
crash> p vm_swappiness
vm_swappiness = $8 = 0
crash> p min_free_kbytes
min_free_kbytes = $9 = 2048000

crash>  kmem -i
              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM  65934474     251.5 GB         ----
      FREE  28076248     107.1 GB   42% of TOTAL MEM
      USED  37858226     144.4 GB   57% of TOTAL MEM
    SHARED  2194009       8.4 GB    3% of TOTAL MEM
   BUFFERS    16214      63.3 MB    0% of TOTAL MEM
    CACHED  24341721      92.9 GB   36% of TOTAL MEM
      SLAB  1415022       5.4 GB    2% of TOTAL MEM

TOTAL SWAP        0            0         ----
 SWAP USED        0            0  100% of TOTAL SWAP
 SWAP FREE        0            0    0% of TOTAL SWAP

#5 [ffff883e4b021bf0] page_fault at ffffffff81465345
    [exception RIP: valid_swaphandles+114]
    RIP: ffffffff81133662  RSP: ffff883e4b021ca8  RFLAGS: 00010216
    RAX: 0000000000000008  RBX: 00181818182f98a0  RCX: 0000000000000003
    RDX: 000000000000408e  RSI: 0000000000000000  RDI: ffffffff81e56ce0
    RBP: 0000000000000003   R8: 0000000000000000   R9: 0000000000000029
    R10: ffff883e4b011218  R11: ffff883e4dac4bc0  R12: 00181818182f9898
    R13: 00181818182f989a  R14: 0000000000000000  R15: ffff883e4b021d30
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
    ffff883e4b021bf8: ffff883e4b021d30 0000000000000000 
    ffff883e4b021c08: 00181818182f989a 00181818182f9898 
    ffff883e4b021c18: 0000000000000003 00181818182f98a0 
    ffff883e4b021c28: ffff883e4dac4bc0 ffff883e4b011218 
    ffff883e4b021c38: 0000000000000029 0000000000000000 
    ffff883e4b021c48: 0000000000000008 0000000000000003 
    ffff883e4b021c58: 000000000000408e 0000000000000000 
    ffff883e4b021c68: ffffffff81e56ce0 ffffffffffffffff 
    ffff883e4b021c78: ffffffff81133662 0000000000000010 
    ffff883e4b021c88: 0000000000010216 ffff883e4b021ca8 
    ffff883e4b021c98: 0000000000000000 ffffffff81133654 
    ffff883e4b021ca8: ffff883e4b021cc4 30181818182f989a 
    ffff883e4b021cb8: 0000000000000000 00007f1ac86072fc 
    ffff883e4b021cc8: ffff883f89f0c038 ffff883fbe860d78 
    ffff883e4b021cd8: 00000000000200da ffffffff81132bd6 
 #6 [ffff883e4b021ce0] swapin_readahead at ffffffff81132bd6
    ffff883e4b021ce8: 0000000000000000 30181818182f989a 
    ffff883e4b021cf8: 0000000000000000 0000000000000000 
    ffff883e4b021d08: 0000000000000000 ffffffff81257735 
    ffff883e4b021d18: 0000000000000000 ffffffff8146508e 
    ffff883e4b021d28: ffff88407ddb3a20 ffff881fcf625800 
    ffff883e4b021d38: ffffffff81103809 30181818182f989a 
    ffff883e4b021d48: 0000000000000000 0000000000000000 
    ffff883e4b021d58: ffff883f89f0c038 0000000000000029 
    ffff883e4b021d68: ffff883e4dac4bc0 ffffffff81122a8d 
 #7 [ffff883e4b021d70] do_swap_page at ffffffff81122a8d
    ffff883e4b021d78: ffff883e4dac4bc0 ffff883e4b011218 
    ffff883e4b021d88: ffff883e4b011218 00007f1ac86072fc 
    ffff883e4b021d98: ffff883fbe860d78 ffff883f89f0c038 
    ffff883e4b021da8: ffffea00de62cab0 ffff883fbe860d78 
    ffff883e4b021db8: ffffea00de62cab0 303030305f313430 
    ffff883e4b021dc8: 0000000000000000 ffff883fbe860d78 
    ffff883e4b021dd8: ffff883f89f0c038 0000000000000029 
    ffff883e4b021de8: 00007f1ac86072fc ffffffff81123ab1 
 #8 [ffff883e4b021df0] handle_pte_fault at ffffffff81123ab1
    ffff883e4b021df8: 303030305f313430 ffffffff81123b52   <-- 303030305f313430 should be orig_pte
    ffff883e4b021e08: 000000014b011218 ffff880000000358 
    ffff883e4b021e18: 00000029de62caa0 ffff883fbe860d78 
    ffff883e4b021e28: 00007f1ac86072fc 0000000000000006 
    ffff883e4b021e38: ffff883e4b021f58 0000000000000029 
    ffff883e4b021e48: ffff883e4dac4bc0 ffffffff8146873d 
 #9 [ffff883e4b021e50] do_page_fault at ffffffff8146873d
    ffff883e4b021e58: ffff883e4dac4c20 ffff883e4b01e540 
    ffff883e4b021e68: ffff883e4b021fd8 ffff883e4b021fd8 
    ffff883e4b021e78: 0000000000010900 ffff883e4b01e540 
    ffff883e4b021e88: ffff883e4dd18480 ffff883e4b01e540 
    ffff883e4b021e98: ffffffff81055e30 dead000000100100 
    ffff883e4b021ea8: dead000000200200 ffffffff8146da6e 
    ffff883e4b021eb8: ffff883e4b021f70 ffffffff8146508e 
    ffff883e4b021ec8: 00000000000006fe ffffffff8146da6e 
    ffff883e4b021ed8: ffff883e4dac4bc0 ffff883e4b011218 
    ffff883e4b021ee8: 0000000000000029 0000000000000001 
    ffff883e4b021ef8: 0000000000000000 0000000400000063 
    ffff883e4b021f08: 0000000000000001 00007f1ac86072fc

We have plenty of free memory, and I set nr_swapfiles to zero when the machine starts. I know that nr_swapfiles being 0 does not by itself mean swap is disabled, but I am still confused: why would the machine try to swap in at this point, and why do I get a NULL pointer? I never enabled a swap partition.

posted on 2018-02-13 10:19  _备忘录