RHCA RH442 008 — OOM, fault trigger (sysrq-trigger), out-of-memory, swap, NUMA, tmpfs, shm
OOM: the out-of-memory killer
Symptom: processes get killed for no apparent reason.
It triggers when all RAM plus swap is in use,
or when ZONE_NORMAL has no free space (rarely a concern nowadays on 64-bit machines with tens of GB of RAM).
Without intervention, true memory exhaustion would hang the machine,
so before memory runs out completely the kernel kills processes instead.
Hence the OOM killer.
[root@servera tuned]# pidof vsftpd
828
[root@servera tuned]# cd /proc/828/
[root@servera 828]# ls
attr cpuset limits net personality smaps_rollup timerslack_ns
autogroup cwd loginuid ns projid_map stack uid_map
auxv environ map_files numa_maps root stat wchan
cgroup exe maps oom_adj sched statm
clear_refs fd mem oom_score schedstat status
cmdline fdinfo mountinfo oom_score_adj sessionid syscall
comm gid_map mounts pagemap setgroups task
coredump_filter io mountstats patch_state smaps timers
[root@servera 828]# cat oom_score
0
[root@servera 828]# cat oom_adj
0
[root@servera 828]# cat oom_score_adj
0
[root@servera 828]#
For every process the kernel keeps an OOM adjustment value. The default is 0 and the range is -1000 to 1000; the lower (more negative) the value, the less likely the process is to be killed.
[root@servera 828]# echo -1000 > oom_score_adj
[root@servera 828]# cat oom_score_adj
-1000
[root@servera 828]# cat oom_adj
-17
oom_adj tracks oom_score_adj (both tune the same thing; changing one updates the other).
/proc/PID/oom_score_adj  range -1000 to 1000
/proc/PID/oom_adj        legacy interface, range -17 to 15; the kill likelihood scales roughly between 2^-17 and 2^15
The kernel computes a badness score per process and kills the one with the highest score first.
By default every process is equal (adjustment 0), so when memory runs out the victim is chosen purely by the computed score.
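A quick way to survey the adjustments across all running processes (a throwaway sketch that only reads the /proc files shown above):
for p in /proc/[0-9]*; do
    printf '%6s %6s %s\n' "${p##*/}" "$(cat $p/oom_score_adj 2>/dev/null)" "$(cat $p/comm 2>/dev/null)"
done | sort -k2 -n | tail      # the ten most killable processes (highest oom_score_adj) last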
yum -y install kernel-doc
[root@servera Documentation]# ls -lR | grep sysrq
-r--r--r--. 1 root root 12491 Mar 13 2019 sysrq.rst
-r--r--r--. 1 root root 35958 Mar 13 2019 sysrq.html
-r--r--r--. 1 root root 12491 Mar 13 2019 sysrq.rst.txt
[root@servera Documentation]# pwd
/usr/share/doc/kernel-doc-4.18.0/Documentation
[root@servera Documentation]#
[root@servera admin-guide]# vi sysrq.rst
[root@servera admin-guide]# pwd
/usr/share/doc/kernel-doc-4.18.0/Documentation/admin-guide
[root@servera admin-guide]#
Simulating a system hang, crash, or out-of-memory condition:
sysrq.rst
(On a normal shutdown the system runs sync first.)
[root@servera admin-guide]# echo f > /proc/sysrq-trigger
[root@servera admin-guide]# systemctl status vsftpd.service
● vsftpd.service - Vsftpd ftp daemon
Loaded: loaded (/usr/lib/systemd/system/vsftpd.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/vsftpd.service.d
└─01-limits.conf
Active: failed (Result: signal) since Mon 2022-07-04 19:21:43 CST; 2min 22s ago
Process: 827 ExecStart=/usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf (code=exited, status=0/SUCCESS)
Main PID: 828 (code=killed, signal=KILL)
Jul 4 19:21:43 servera kernel: Out of memory: Kill process 828 (vsftpd) score 999 or sacrifice child
Jul 4 19:21:43 servera kernel: Killed process 828 (vsftpd) total-vm:26952kB, anon-rss:28kB, file-rss:0kB, shmem-rss:0kB
Jul 4 19:21:43 servera kernel: oom_reaper: reaped process 828 (vsftpd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Check the logs.
sysrq.rst describes how to simulate various system faults.
The kernel decided this process was the big memory consumer and killed it.
If you do not want a process killed, lower its oom_score_adj (make it more negative).
Out of memory: Kill process 785 (firewalld) score 2 or sacrifice child
Jul 4 19:31:42 servera kernel: Killed process 785 (firewalld) total-vm:454412kB, anon-rss:112kB, file-rss:76kB, shmem-rss:0kB
Jul 4 19:31:42 servera kernel: oom_reaper: reaped process 785 (firewalld), now anon-rss:0kB, file-rss:8kB, shmem-rss:0kB
Jul 4 19:31:42 servera systemd[1]: firewalld.service: Main process exited, code=killed, status=9/KILL
With vsftpd set to -1000, the kernel no longer kills vsftpd and kills firewalld instead.
/proc/sysrq-trigger is a very useful trigger; a few common keys are sketched below.
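A few sysrq keys worth knowing (see sysrq.rst for the full list; only use these on a test machine):
echo m > /proc/sysrq-trigger   # dump current memory info to the kernel log
echo f > /proc/sysrq-trigger   # manually invoke the OOM killer
echo s > /proc/sysrq-trigger   # emergency sync of all mounted filesystems
echo c > /proc/sysrq-trigger   # crash the kernel (useful for testing kdump)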
When a service starts you can set its oom_score_adj — the weight the OOM killer uses when choosing a victim.
Put it in a tuned script so it takes effect at boot, or apply the profile manually.
[root@servera supermao12]# tail -n 4 tuned.conf
[my_script2]
type=script
script=vsftpd.sh
[root@servera supermao12]# cat vsftpd.sh
#!/bin/bash
echo 909 > /proc/$(pidof vsftpd)/oom_score_adj
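To activate and verify the custom profile (assuming it lives under /etc/tuned/supermao12/, matching the prompt above):
tuned-adm profile supermao12
tuned-adm active                              # confirm the profile switched
cat /proc/$(pidof vsftpd)/oom_score_adj       # should now read 909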
Second method: apply the setting when the service starts.
[root@servera system]# cat vsftpd.service
[Unit]
Description=Vsftpd ftp daemon
After=network.target
[Service]
Type=forking
ExecStart=/usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf
ExecStartPost=/usr/local/bin/vsftpd.sh
[Install]
WantedBy=multi-user.target
[root@servera system]# cat /usr/local/bin/vsftpd.sh
#!/bin/bash
echo 1000 > /proc/$(pidof vsftpd)/oom_score_adj
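systemd can also set this directly, without a helper script, via its OOMScoreAdjust= directive (documented in man systemd.exec); a minimal drop-in sketch, here protecting the service rather than marking it a preferred victim (the drop-in file name is my own choice):
# /etc/systemd/system/vsftpd.service.d/10-oom.conf
[Service]
OOMScoreAdjust=-1000
After systemctl daemon-reload and systemctl restart vsftpd, /proc/$(pidof vsftpd)/oom_score_adj should read -1000.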
Out of memory: all memory is used up.
Memory leak: while running, an application may need 100 MB, but when the application is closed that memory is never fully released.
Windows releases memory poorly; eventually you have to reboot.
Linux does better — it can run for years without a reboot and still keep its performance.
yum -y install valgrind
[root@servera ~]# valgrind --tool=memcheck ls
==2739== Memcheck, a memory error detector
==2739== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2739== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==2739== Command: ls
==2739==
anaconda-ks.cfg bigmem cache-lab-8.0 cache-lab-8.0.tar.gz oom_score_adj~ original-ks.cfg sar1.data sar-disk.log
==2739==
==2739== HEAP SUMMARY:
==2739== in use at exit: 22,016 bytes in 15 blocks
==2739== total heap usage: 50 allocs, 35 frees, 59,971 bytes allocated
==2739==
==2739== LEAK SUMMARY:
==2739== definitely lost: 0 bytes in 0 blocks
==2739== indirectly lost: 0 bytes in 0 blocks
==2739== possibly lost: 0 bytes in 0 blocks
==2739== still reachable: 22,016 bytes in 15 blocks
==2739== suppressed: 0 bytes in 0 blocks
==2739== Rerun with --leak-check=full to see details of leaked memory
==2739==
==2739== For counts of detected and suppressed errors, rerun with: -v
==2739== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[root@servera ~]#
If ls leaked a little memory on every run that would be a disaster; fortunately its leak summary is all zeros.
[root@servera ~]# cp /usr/local/bin/bigmem .
[root@servera ~]# valgrind --tool=memcheck ./bigmem 512M
==2863== Memcheck, a memory error detector
==2863== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2863== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==2863== Command: ./bigmem 512M
==2863==
Process PID: 2863
Allocating 512 MiB of resident memory (in 1 MiB chunks)...
Done
Press <Enter> to exit
==2863==
==2863== HEAP SUMMARY:
==2863== in use at exit: 536,870,912 bytes in 512 blocks
==2863== total heap usage: 514 allocs, 2 frees, 536,872,960 bytes allocated
==2863==
==2863== LEAK SUMMARY:
==2863== definitely lost: 531,628,032 bytes in 507 blocks
==2863== indirectly lost: 0 bytes in 0 blocks
==2863== possibly lost: 5,242,880 bytes in 5 blocks
==2863== still reachable: 0 bytes in 0 blocks
==2863== suppressed: 0 bytes in 0 blocks
==2863== Rerun with --leak-check=full to see details of leaked memory
==2863==
==2863== For counts of detected and suppressed errors, rerun with: -v
==2863== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[root@servera ~]#
definitely lost: 531,628,032 bytes in 507 blocks
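As the summary itself suggests, rerunning with --leak-check=full shows where the lost blocks were allocated (same binary and size as above):
valgrind --tool=memcheck --leak-check=full ./bigmem 512M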
(bigmem first had to be copied out of /usr/local/bin with the cp command above before running it under valgrind.)
Rebooting the system releases the leaked memory.
Windows gets very slow after a couple of years without a reboot.
oom-kill is not a fix for memory leaks
The OOM-kill mechanism cannot solve a memory leak (killing is only the system saving itself; if memory is short and still leaking away, killing processes does not help).
The real problem is in the application's design.
swap
When memory runs short, some pages are moved out to swap.
Swap space is best placed on an SSD.
Make swap as large as practical.
Use a partition rather than a swap file.
[root@servera system]# mkswap /dev/vdd1
Setting up swapspace version 1, size = 1024 MiB (1073737728 bytes)
no label, UUID=9a5b7ced-ec93-4e77-a54d-4706b59afef0
[root@servera system]# mkswap /dev/vdc1
Setting up swapspace version 1, size = 1024 MiB (1073737728 bytes)
no label, UUID=f4791546-5242-4ca1-be7a-604f928a5985
[root@servera system]# vi /etc/fstab
[root@servera system]# free -m
total used free shared buff/cache available
Mem: 1829 188 1478 16 161 1482
Swap: 0 0 0
[root@servera system]# vi /etc/fstab
UUID=f4791546-5242-4ca1-be7a-604f928a5985 swap swap defaults 0 0
UUID=9a5b7ced-ec93-4e77-a54d-4706b59afef0 swap swap defaults 0 0
[root@servera system]# swapon -a
[root@servera system]# free -m
total used free shared buff/cache available
Mem: 1829 189 1477 16 161 1481
Swap: 2047 0 2047
[root@servera system]# swapon -s
Filename Type Size Used Priority
/dev/vdc1 partition 1048572 0 -2
/dev/vdd1 partition 1048572 0 -3
[root@servera system]#
The higher the number, the higher the priority.
UUID=f4791546-5242-4ca1-be7a-604f928a5985 swap swap defaults,pri=1 0 0
UUID=9a5b7ced-ec93-4e77-a54d-4706b59afef0 swap swap defaults,pri=2 0 0
[root@servera system]# swapon -s
Filename Type Size Used Priority
/dev/vdc1 partition 1048572 0 1
/dev/vdd1 partition 1048572 0 2
When I allocated 2000M with bigmem, the swap device with priority 2 started to fill; the priority-1 device stayed unused.
[root@servera ~]# watch -n 1 'swapon -s'
Filename Type Size Used Priority
/dev/vdc1 partition 1048572 0 1
/dev/vdd1 partition 1048572 532480 2
[root@servera ~]# watch -n 1 'swapon -s'
Filename Type Size Used Priority
/dev/vdc1 partition 1048572 0 1
/dev/vdd1 partition 1048572 532480 1
Filename Type Size Used Priority
/dev/vdc1 partition 1048572 250632 1
/dev/vdd1 partition 1048572 250944 1
With equal priorities the two devices are used in a balanced (striped) way, reducing the load on each disk.
Without swap, the 2000M allocation would either fail, trigger the OOM killer, or hang the machine.
[root@servera system]# swapoff -a     # this command is slow because it has to move everything in swap back into RAM
The OOM killer also targets whichever process is consuming the most memory.
Once 256 GB of RAM comes under pressure, 2 GB of swap is far from enough; swap absorbs short spikes by holding pages that are temporarily unused.
You can create up to 32 swap devices,
which allows this kind of balancing (though 32 is extreme).
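Priorities can also be set at activation time instead of in fstab, with swapon's -p/--priority option; a quick sketch:
swapoff /dev/vdc1 /dev/vdd1
swapon -p 5 /dev/vdc1          # same priority on both devices ...
swapon -p 5 /dev/vdd1          # ... so pages are striped across them
swapon -s                      # or: swapon --show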
NUMA
Non-Uniform Memory Access
Traditionally all CPUs reach memory through the front-side bus, and that bus becomes the bottleneck; NUMA gives each node its own local memory instead.
[root@servera system]# yum -y install numactl
[root@servera system]# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1
node 0 size: 1829 MB
node 0 free: 1400 MB
node distances:
node 0
0: 10
[root@workstation ~]# lab memtuning-numa start
[root@servera grub2]# numastat -c
Per-node numastat info (in MBs):
Node 0 Node 1 Node 2 Node 3 Total
------ ------ ------ ------ -----
Numa_Hit 929 1429 27 28 2413
Numa_Miss 185 0 1873 1216 3273
Numa_Foreign 0 3273 0 0 3273
Interleave_Hit 26 26 26 26 104
Local_Node 905 1425 1 1 2333
Other_Node 209 4 1899 1242 3354
[root@servera ~]# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1
node 0 size: 478 MB
node 0 free: 305 MB
node 1 cpus: 0 1
node 1 size: 343 MB
node 1 free: 118 MB
node 2 cpus: 0 1
node 2 size: 503 MB
node 2 free: 499 MB
node 3 cpus: 0 1
node 3 size: 502 MB
node 3 free: 497 MB
node distances:
node 0 1 2 3
0: 10 10 10 10
1: 10 10 10 10
2: 10 10 10 10
3: 10 10 10 10
On real NUMA hardware the distances between nodes would not all be 10;
this topology is emulated (fake).
On real hardware the distance table is how you judge how far apart the NUMA nodes are.
Red Hat's lab command edited the GRUB defaults, regenerated the grub2 config, and rebooted, which is what produced the four fake nodes above.
[root@foundation0 boot]# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="resume=UUID=c3393fce-c67b-4d64-962c-bfc6f9f0fbfa rhgb quiet rd.shell=0"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
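The lab presumably added a NUMA-emulation parameter to the kernel command line; a sketch of doing the same by hand (the node count is my assumption, and numa=fake needs a kernel built with NUMA emulation support):
grubby --update-kernel=ALL --args="numa=fake=4"   # emulate 4 NUMA nodes on the next boot
reboot
cat /proc/cmdline                                 # verify the parameter took effect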
[root@servera ~]# bigmem 1204M
[root@servera grub2]# numastat -c bigmem
Per-node process memory usage (in MBs) for PID 1638 (bigmem)
Node 0 Node 1 Node 2 Node 3 Total
------ ------ ------ ------ -----
Huge 0 0 0 0 0
Heap 0 0 0 0 0
Stack 0 0 0 0 0
Private 182 96 467 465 1210
------- ------ ------ ------ ------ -----
Total 182 97 467 465 1210
NUMA is in effect: the process's memory is spread across all four NUMA nodes.
[root@servera grub2]# numactl --membind 1 -- bigmem 128M
[root@servera ~]# numastat -c bigmem
Per-node process memory usage (in MBs) for PID 1686 (bigmem)
Node 0 Node 1 Node 2 Node 3 Total
------ ------ ------ ------ -----
Huge 0 0 0 0 0
Heap 0 0 0 0 0
Stack 0 0 0 0 0
Private 0 130 0 0 130
------- ------ ------ ------ ------ -----
Total 0 130 0 0 130
[root@servera ~]#
[root@servera grub2]# numactl --membind 1,2 -- bigmem 128M
Bind the process's memory to NUMA nodes 1 and 2.
The ~1500 MB of RAM here is split across four NUMA nodes,
so no single node has much free memory.
If you insist on binding 500 MB to one node, the allocation will fail.
--preferred means "prefer node 1 but spill over elsewhere if needed":
[root@servera grub2]# numactl --preferred=1 -- bigmem 210M
Process PID: 1746
Allocating 210 MiB of resident memory (in 1 MiB chunks)...
[root@servera ~]# numastat -c bigmem
Per-node process memory usage (in MBs) for PID 1746 (bigmem)
Node 0 Node 1 Node 2 Node 3 Total
------ ------ ------ ------ -----
Huge 0 0 0 0 0
Heap 0 0 0 0 0
Stack 0 0 0 0 0
Private 0 141 72 0 212
------- ------ ------ ------ ------ -----
Total 0 141 72 0 212
In a star topology the distances between NUMA nodes are equal.
numactl --cpunodebind=2 --preferred=2 -- myprogram
run on node 2's CPUs and prefer node 2's memory
numactl --cpunodebind=2 --membind=2,3 -- myprogram
run on node 2's CPUs and take memory only from nodes 2 and 3
numactl --interleave=all -- mydatabase
interleave allocations across all nodes ("NUMA off"-like behaviour), using every node's resources; a verification sketch follows below
NUMA placement can also be adjusted automatically (e.g. numad / automatic NUMA balancing);
with that enabled, a virtual machine's memory and vCPUs are kept within a single NUMA node where possible.
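A quick way to confirm a binding really took effect (reusing the bigmem test program and the node numbers from the examples above):
numactl --cpunodebind=2 --membind=2,3 -- bigmem 128M &
sleep 2                   # give it a moment to allocate
numastat -c bigmem        # the Private row should show memory only under nodes 2 and 3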
tmpfs
[root@servera shm]# dd if=/dev/zero of=test1 bs=1M count=512 oflag=direct
dd: failed to open 'test1': Invalid argument
[root@servera shm]# dd if=/dev/zero of=test1 bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.468107 s, 1.1 GB/s
[root@servera shm]# ls
test1
[root@servera shm]#
[root@servera grub2]# free -m
total used free shared buff/cache available
Mem: 1828 183 1007 528 636 927
Swap: 0 0 0
[root@servera grub2]#
buff/cache and shared have grown large,
and sysctl vm.drop_caches=3 cannot reclaim it — tmpfs data is counted as cache/shared but is not droppable; a short demonstration follows.
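A minimal demonstration that the memory only comes back when the tmpfs file itself is removed (continuing with the test1 file created above):
sync; sysctl vm.drop_caches=3
free -m                  # shared / buff-cache stay high: the tmpfs file still lives in RAM
rm /dev/shm/test1
free -m                  # now the 512 MiB is released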
tmpfs is a virtual-memory file system:
it is much faster to access,
because it uses RAM as if it were a disk,
which gives a big speed boost.
Cache servers are a typical user:
the first request fetches from the Internet and the result is stored in the cache;
every later request is served from the cache.
It only accelerates; reliability is not high — but it is only a cache, so losing its contents does not matter.
[root@servera shm]# yum -y install squid
[root@servera shm]# vim /etc/squid/squid.conf
:65
#cache_dir ufs /var/spool/squid 100 16 256
100 MB of cache, 16 first-level directories, 256 second-level directories.
[root@servera shm]# mount --bind /dev/shm/ /var/spool/squid/
[root@servera shm]# cd /var/spool/squid/
[root@servera squid]# ls
test1
/etc/fstab
/dev/shm /var/spool/squid none defaults,bind 0 0
Bind-mounting attaches an already-mounted directory to a second directory.
[root@servera ~]# mount --bind /dev/shm/ /var/spool/squid/
[root@servera ~]# cd /var/spool/squid/
[root@servera squid]# ls
[root@servera squid]# ls
[root@servera squid]# systemctl start squid.service
[root@servera squid]# ls
squid-cf__metadata.shm squid-cf__queues.shm squid-cf__readers.shm
Important:
mkdir /data /redhat
mount --bind /data /redhat
Sometimes data would otherwise have to be copied over with cp; a bind mount gets the same effect without copying.
By default tmpfs (/dev/shm) is sized at half of RAM.
How do you change that? Edit its /etc/fstab entry:
nodev /dev/shm tmpfs defaults,size=1400M 0 0
[root@servera data]# mount -o remount /dev/shm/
[root@servera data]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
devtmpfs devtmpfs 891M 0 891M 0% /dev
tmpfs tmpfs 1.4G 12K 1.4G 1% /dev/shm
cgroup and tmpfs are both memory-backed mechanisms.
shm
shm: shared memory
[root@servera data]# free -m
total used free shared buff/cache available
Mem: 1828 199 1395 16 232 1369
Swap: 2047 0 2047
shared = shared memory:
multiple processes map the same memory,
used for inter-process communication (IPC).
[root@servera data]# sysctl -a | grep shm
kernel.shm_next_id = -1
kernel.shm_rmid_forced = 0
kernel.shmall = 18446744073692774399   system-wide cap on shared memory, in pages (course rule of thumb: shmmax * shmmni should stay below shmall)
kernel.shmmax = 18446744073692774399   maximum size a single shared memory segment (one process) can request, in bytes
kernel.shmmni = 4096                   maximum number of shared memory segments system-wide
vm.hugetlb_shm_group = 0
(Roughly speaking, each such process holds one shared memory segment.)
Mainly used by databases, though ordinary applications use it too.
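To see which segments are actually allocated right now, ipcs (from util-linux) is handy:
ipcs -m        # list current System V shared memory segments with owner and size
ipcs -m -u     # summary: segments allocated, pages resident, pages swapped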
man proc
/shmall
/proc/sys/kernel/shmall (since Linux 2.2)
This file contains the system-wide limit on the total number of pages of System V shared memory.
Multiple processes sharing the same resource.
shm case study
A machine had been up for 500+ days.
The server had only 32 GB of RAM: 26 GB used, 6 GB available,
plus 32 GB of swap, of which 18 GB was already used.
The DB2 database was reported to be using 40 GB of memory.
pmap 11345
Which mappings were eating all that memory?
shm — shared memory.
The shared memory had been handed to Oracle, and anything on the system that wants that resource can take from it.
[root@servera data]# bc
bc 1.07.1
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
16777216*4096/1024/1024/1024
64
On that system shmall was 16777216 pages; at 4 KiB per page that allows 64 GB of shared memory. With shmall set that high, Oracle grabs a huge amount once it starts, and that shared memory stays claimed even when Oracle is not actively using it.
How much memory should be set aside as shared memory for Oracle?
Give Oracle 16 GB of shared memory,
leave 16 GB of ordinary memory for other applications,
and lower shmall accordingly to enforce it.
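A minimal sketch of capping System V shared memory at the 16 GB suggested above (file name and exact values are my own choices; 16 GiB = 4194304 pages of 4 KiB):
echo "kernel.shmall = 4194304"     >  /etc/sysctl.d/90-shm.conf   # 16 GiB expressed in 4 KiB pages
echo "kernel.shmmax = 17179869184" >> /etc/sysctl.d/90-shm.conf   # 16 GiB in bytes, cap for one segment
sysctl -p /etc/sysctl.d/90-shm.conf                               # load the new limits now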
** vi tip: ignore case when searching — :set ignorecase **
man systemd.service