Troubleshooting Database Access Failures Caused by the Linux OOM Killer

Access to the database on the server started failing. /var/log/messages showed the following:

Sep 22 16:08:21 safeserver kernel: java invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
Sep 22 16:08:21 safeserver kernel: java cpuset=/ mems_allowed=0
Sep 22 16:08:21 safeserver kernel: Pid: 14859, comm: java Not tainted 2.6.32-754.30.2.el6.x86_64 #1

How does the OOM killer work? How can the system be configured to prevent this from happening? And how do you troubleshoot memory on Linux?

First, look at memory usage:
$ free
             total       used       free     shared    buffers     cached
Mem:       4040360    4012200      28160          0     176628    3571348
-/+ buffers/cache:     264224    3776136
Swap:            0          0          0

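Look at the "-/+ buffers/cache" row rather than the first one: the 28160 under "free" in the "Mem:" row is not the memory that is really available. Red Hat explains: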
In this example the total amount of available memory is 4040360 KB. 264224 KB are used by processes and 3776136 KB are free for other applications. Do not get confused by the first line which shows that 28160KB are free! If you look at the usage figures you can see that most of the memory use is for buffers and cache. Linux always tries to use RAM to speed up disk operations by using available memory for buffers (file system metadata) and cache (pages with actual contents of files or block devices). This helps the system to run faster because disk information is already in memory which saves I/O operations. If space is needed by programs or applications like Oracle, then Linux will free up the buffers and cache to yield memory for the applications. If your system runs for a while you will usually see a small number under the field "free" on the first line.
--from redhat
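
The arithmetic behind the "-/+ buffers/cache" row can be checked by hand. The sketch below is only an illustration using the numbers from the free output above. The class and method names are made up for this example, and it assumes the older free layout in which buffers and cached are reported as separate columns.

```java
// Illustrative only: FreeMemory and its methods are hypothetical names, not part of any tool.
class FreeMemory {

    // Memory applications can actually obtain: free + buffers + cached (all in KB).
    static long availableKb(long freeKb, long buffersKb, long cachedKb) {
        return freeKb + buffersKb + cachedKb;
    }

    // Memory really held by processes: used - buffers - cached (all in KB).
    static long usedByProcessesKb(long usedKb, long buffersKb, long cachedKb) {
        return usedKb - buffersKb - cachedKb;
    }

    public static void main(String[] args) {
        // Values copied from the "Mem:" row of the free output above.
        long used = 4012200, free = 28160, buffers = 176628, cached = 3571348;
        System.out.println("used by processes: " + usedByProcessesKb(used, buffers, cached) + " KB");
        System.out.println("available:         " + availableKb(free, buffers, cached) + " KB");
    }
}
```

Both results match the second row of the free output: 264224 KB used by processes and 3776136 KB available.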

The server turned out to have no swap configured, which is why the OOM killer was firing so often.

So how do you check the swap configuration?

Any of the following commands shows whether swap is enabled:
cat /proc/swaps
grep Swap /proc/meminfo
swapon -s
free -m
vmstat
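
The same check can be done from a program by reading the SwapTotal line of /proc/meminfo. This is a minimal sketch, assuming the usual "SwapTotal:   N kB" line format; SwapCheck and swapTotalKb are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Illustrative only: SwapCheck is a hypothetical helper, not an existing tool.
class SwapCheck {

    // Returns the SwapTotal value in kB from /proc/meminfo-style lines, or -1 if the line is absent.
    static long swapTotalKb(List<String> meminfoLines) {
        for (String line : meminfoLines) {
            if (line.startsWith("SwapTotal:")) {
                // The line looks like: "SwapTotal:       4194300 kB"
                String[] parts = line.trim().split("\\s+");
                return Long.parseLong(parts[1]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        long kb = swapTotalKb(Files.readAllLines(Paths.get("/proc/meminfo")));
        if (kb < 0) {
            System.out.println("SwapTotal not found");
        } else if (kb == 0) {
            System.out.println("no swap configured");
        } else {
            System.out.println("swap enabled: " + kb + " kB");
        }
    }
}
```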

How large should swap be?

https://access.redhat.com/solutions/15244

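For RHEL 6 and 7 the usual recommendation, for machines with 4 to 8 GB of RAM, is swap equal to the amount of RAM; see the link above for the details.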

Enabling swap:

Swap can live on a logical volume or in a file. The steps below use a file.

[root@safedemo bin]# dd if=/dev/zero of=/swapfile bs=1G count=4
4+0 records in
4+0 records out
4294967296 bytes (4.3 GB) copied, 37.4051 s, 115 MB/s
[root@safedemo bin]# chmod 600 /swapfile
[root@safedemo bin]# mkswap /swapfile
mkswap: /swapfile: warning: don't erase bootbits sectors
        on whole disk. Use -f to force.
Setting up swapspace version 1, size = 4194300 KiB
no label, UUID=96e8b638-b36c-4660-8667-5654a92dc520
[root@safedemo bin]# swapon /swapfile
[root@safedemo bin]# vi /etc/fstab
# add this line so the swap file is activated again after a reboot:
/swapfile    swap    swap   defaults 0 0

Here is a small program that reproduces the OOM killer:

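import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class OOMTest {

    private static final Scanner scanner = new Scanner(System.in);

    public static void main(String[] args) {
        // Keep a reference to every array so the garbage collector cannot reclaim them.
        List<int[]> l = new ArrayList<>();

        try {
            for (int i = 0; i < 1000; i++) {
                System.out.println("Please press any text to allocate ~100M memory:");
                scanner.nextLine(); // wait for Enter before each allocation
                System.out.println("new memory(~100M)");
                // 26,107,200 ints * 4 bytes each is roughly 100 MB
                l.add(new int[26107200]);
            }
        } catch (Throwable t) {
            // A JVM-level OutOfMemoryError lands here; a kernel OOM kill does not.
            t.printStackTrace();
        }
    }

}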

Run it:
[root@safedemo bin]# java -Xmx2g OOMTest
Picked up JAVA_TOOL_OPTIONS: -Dhttps.protocols=TLSv1.2
Please press any text to allocate ~100M memory:

new memory(~100M)
Please press any text to allocate ~100M memory:

new memory(~100M)
Please press any text to allocate ~100M memory:

new memory(~100M)
Please press any text to allocate ~100M memory:

new memory(~100M)
Please press any text to allocate ~100M memory:

new memory(~100M)
Please press any text to allocate ~100M memory:

new memory(~100M)
Please press any text to allocate ~100M memory:

new memory(~100M)
Killed   <- the process triggered the system OOM killer itself and ended up being the one that was killed.


//check /var/log/messages.
Sep 22 16:08:21 safeserver kernel: java invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
Sep 22 16:08:21 safeserver kernel: java cpuset=/ mems_allowed=0
Sep 22 16:08:21 safeserver kernel: Pid: 14859, comm: java Not tainted 2.6.32-754.30.2.el6.x86_64 #1
// 14859 is the task that triggered the OOM killer (the OOMTest above)
....
Sep 22 16:08:21 safeserver kernel: Out of memory: Kill process 14857 (java) score 142 or sacrifice child
Sep 22 16:08:21 safeserver kernel: Killed process 14857, UID 0, (java) total-vm:3191104kB, anon-rss:676096kB, file-rss:68kB
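
When scanning a large log for OOM kills, the "Killed process" line carries the useful fields. The following is a rough sketch that pulls them out with a regular expression. It assumes exactly the line format shown in the log above (a RHEL 6 kernel); other kernel versions print this line differently. OomLogParser and parseKilled are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: OomLogParser is a hypothetical helper for the log format shown above.
class OomLogParser {

    // Matches the "Killed process" line as printed in the log above.
    private static final Pattern KILLED = Pattern.compile(
        "Killed process (\\d+), UID (\\d+), \\((\\S+)\\) total-vm:(\\d+)kB, anon-rss:(\\d+)kB");

    // Returns {pid, uid, command, total-vm kB, anon-rss kB}, or null if the line does not match.
    static String[] parseKilled(String line) {
        Matcher m = KILLED.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4), m.group(5) };
    }

    public static void main(String[] args) {
        String line = "Sep 22 16:08:21 safeserver kernel: Killed process 14857, UID 0, (java) "
                    + "total-vm:3191104kB, anon-rss:676096kB, file-rss:68kB";
        String[] f = parseKilled(line);
        long anonRssMb = Long.parseLong(f[4]) / 1024;
        System.out.println("pid=" + f[0] + " comm=" + f[2] + " anon-rss=" + anonRssMb + " MB");
    }
}
```

For the line above this reports an anon-rss of about 660 MB, which is in the same range as the seven ~100 MB allocations made by OOMTest.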

Can the OOM killer be disabled?
//Disable OOM killer  in redhat
Red Hat Enterprise Linux 5, 6 and 7 do not have the ability to completely disable OOM-KILLER. Please see the following section for tuning OOM-KILLER operation within RHEL 5, RHEL 6 and RHEL 7.

So the answer is that it cannot be disabled completely.


You can, however, adjust the score of a particular process to keep the OOM killer away from it:
There is also a special value of -17, which disables oom_killer for that process. In the example below, oom_score returns a value of 0, indicating that this process would not be killed.

    # cat /proc/12465/oom_score
    78           
    # echo -17 > /proc/12465/oom_adj           
    # cat /proc/12465/oom_score
    0


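You can also tune vm.overcommit_memory. With a value of 2, allocations fail with an error when memory is insufficient, which controls the OOM killer indirectly (the official documentation notes that the OOM killer can still be triggered in some cases).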
The settings go in /etc/sysctl.conf:
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
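
With vm.overcommit_memory = 2 the kernel refuses allocations that would push committed memory past a commit limit. According to the kernel's overcommit documentation, that limit is roughly swap plus overcommit_ratio percent of RAM. The sketch below works this out with this server's numbers; it ignores huge pages, and CommitLimit and commitLimitKb are hypothetical names used only for illustration.

```java
// Illustrative only: an approximation of the kernel's commit limit, ignoring huge pages.
class CommitLimit {

    // Approximate commit limit in kB: swap + RAM * overcommit_ratio / 100.
    static long commitLimitKb(long ramKb, long swapKb, int overcommitRatio) {
        return swapKb + ramKb * overcommitRatio / 100;
    }

    public static void main(String[] args) {
        long ramKb = 4040360;   // "total" from the free output above
        long swapKb = 4194300;  // size reported by mkswap for the 4 GB swap file

        // No swap, ratio 100: the limit equals the amount of RAM.
        System.out.println("no swap:   " + commitLimitKb(ramKb, 0, 100) + " kB");
        // With the swap file enabled the limit grows by the swap size.
        System.out.println("with swap: " + commitLimitKb(ramKb, swapKb, 100) + " kB");
    }
}
```

The value the kernel actually uses can be read from the CommitLimit line in /proc/meminfo, next to Committed_AS.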






over

posted on 2020-09-23 12:49 by bjfarmer