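Linux Process Management and System Load
1. Process Management
1.1 Processes
- Process: a command, script, or program that is currently running. A running process lives in memory.
  - CPU: executes the instructions.
  - Memory: holds the running software.
  - Disk: stores data permanently.
- Daemon (service): a process that keeps running in the background.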
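1.2 Zombie Processes ⭐️⭐️⭐️⭐️⭐️
1.2.1 Overview
A zombie is an abnormal process state.
Zombie process: a process that has already terminated, but whose parent has not yet collected its exit status (the parent never called wait()). It no longer runs and uses no CPU or memory, but it still occupies a PID and an entry in the kernel process table.
Zombies should be investigated and cleaned up promptly: if they keep accumulating, the system can run out of PIDs and fail to start new processes. They usually also point to a bug in the parent process.
1.2.2 Handling
# kill, and even kill -9, has no effect on a zombie: the process is already dead and cannot handle signals.
# Solution: terminate the zombie's parent process. The zombie is then adopted by PID 1, which reaps it.
🅰 If the zombie's parent is the main process (PID 1) and the zombie is not being reaped, rebooting Linux is usually the only option.
🅱 If the zombie's parent is not the main process (PID 1), terminate the parent with kill.
1.2.3 Inspection
Check whether the system has any zombie processes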
[root@Kylin-V10-sp3 ~/app/packages]# top
top - 08:43:01 up 2 days, 9:30, 3 users, load average: 0.00, 0.00, 0.00
Tasks: 160 total, 1 running, 159 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 948.2 total, 149.4 free, 246.3 used, 552.5 buff/cache
MiB Swap: 2156.0 total, 2106.5 free, 49.5 used. 524.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 106680 7336 4696 S 0.0 0.8 0:14.10 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
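Filter out the zombie processes
# In ps aux output, a STAT value of Z marks a zombie process. Note that grep Z also matches the header line (VSZ) and the grep command itself.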
[root@Kylin-V10-sp3 ~/app/packages]# ps aux | grep Z
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 24185 0.0 0.0 213156 892 pts/2 R+ 08:44 0:00 grep Z
[root@Kylin-V10-sp3 ~/app/packages]#
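Because grep Z matches any line containing the letter Z, a stricter filter can test the STAT column itself. The following is a minimal sketch, assuming the input comes from ps -eo pid,ppid,stat,comm; the function name filter_zombies is made up for illustration.

```shell
#!/usr/bin/env bash
# Keep only rows whose STAT column starts with Z.
# Expects `ps -eo pid,ppid,stat,comm` output on stdin; prints "PID PPID COMMAND".
filter_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Typical use on a live system:
# ps -eo pid,ppid,stat,comm | filter_zombies
```

The header line and the filtering command never appear in the result, because only the third column is tested.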
1.2.4 Simulating a zombie process
Simulation
# Compile with gcc zombie.c -o zombie  (-o names the output binary)
[root@Kylin-V10-sp3 ~/app/packages]# gcc zombie.c -o zombie
[root@Kylin-V10-sp3 ~/app/packages]# ll
总用量 2520
-rwxr-xr-x 1 root root 17104 8月 31 08:48 zombie
-rw-r--r-- 1 root root 591 9月 3 2024 zombie.c
[root@Kylin-V10-sp3 ~/app/packages]#
# Run ./zombie
[root@Kylin-V10-sp3 ~/app/packages]# ./zombie
I am parent,24237
sleep....
I am child,24238
Child exits
^C
# Run it in the background: ./zombie &
[root@Kylin-V10-sp3 ~/app/packages]# ./zombie &
[1] 24241
[root@Kylin-V10-sp3 ~/app/packages]# I am parent,24241
sleep....
I am child,24260
Child exits
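The source of zombie.c is not shown in these notes. As a rough stand-in, the same situation (a child exits while its parent never calls wait()) can be sketched in bash alone. The function names make_zombie and proc_state are illustrative assumptions, and the sketch relies on /proc being mounted.

```shell
#!/usr/bin/env bash
# Start a parent that never calls wait(): the subshell forks a short-lived
# child, records the child's PID in a file, then replaces itself with `sleep`.
# $1: file that receives the child's PID; $2: parent lifetime in seconds.
make_zombie() {
    local pidfile=$1 lifetime=${2:-30}
    ( sleep 1 & echo "$!" > "$pidfile"; exec sleep "$lifetime" ) >/dev/null 2>&1 &
}

# Print the one-letter state of a PID as recorded in /proc/<pid>/status.
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

pidfile=$(mktemp)
make_zombie "$pidfile" 10
sleep 2                        # give the child time to exit
child=$(cat "$pidfile")
state=$(proc_state "$child")   # Z while the parent is still alive
echo "child $child state: $state"
rm -f "$pidfile"
```

Once the parent sleep ends, the zombie is handed to PID 1 and disappears, which mirrors the kill-the-parent fix described in this section.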
Troubleshooting
#1. Check whether there are zombie processes
[root@Kylin-V10-sp3 ~/app/packages]# top
top - 08:58:05 up 2 days, 9:45, 3 users, load average: 0.00, 0.00, 0.00
Tasks: 166 total, 1 running, 162 sleeping, 0 stopped, 3 zombie
%Cpu(s): 0.0 us, 6.2 sy, 0.0 ni, 93.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 948.2 total, 148.9 free, 246.8 used, 552.5 buff/cache
MiB Swap: 2156.0 total, 2106.5 free, 49.5 used. 524.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 106680 7336 4696 S 0.0 0.8 0:14.15 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
#2. Find out which processes are zombies: ps aux | grep Z   # note down the PIDs
[root@Kylin-V10-sp3 ~/app/packages]# ps aux | grep Z
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 24242 0.0 0.0 0 0 pts/2 Z 08:50 0:00 [zombie] <defunct>
root 24258 0.0 0.0 0 0 pts/2 Z 08:56 0:00 [zombie] <defunct>
root 24260 0.0 0.0 0 0 pts/2 Z 08:57 0:00 [zombie] <defunct>
root 24270 0.0 0.0 213156 892 pts/1 R+ 08:59 0:00 grep Z
[root@Kylin-V10-sp3 ~/app/packages]#
#3. Find the zombies' parent processes: read them straight from pstree -p, or look at the third column (PPID) of ps -ef
[root@Kylin-V10-sp3 ~/app/packages]# pstree -p
systemd(1)─┬─NetworkManager(843)─┬─{NetworkManager}(851)
│ └─{NetworkManager}(852)
├─abrtd(753)─┬─{abrtd}(817)
│ └─{abrtd}(825)
├─atd(878)
├─chronyd(740)
├─crond(879)
├─dbus-daemon(723)
├─gssproxy(861)─┬─{gssproxy}(868)
│ ├─{gssproxy}(869)
│ ├─{gssproxy}(870)
│ ├─{gssproxy}(871)
│ └─{gssproxy}(872)
├─kylin_sock_serv(732)
├─login(2962)───bash(2984)
├─lsmd(733)
├─mdadm(717)
├─nginx(22441)───nginx(22442)
├─polkitd(736)─┬─{polkitd}(802)
│ ├─{polkitd}(806)
│ ├─{polkitd}(807)
│ ├─{polkitd}(808)
│ └─{polkitd}(828)
├─rngd(739)───{rngd}(779)
├─rsyslogd(1014)─┬─{rsyslogd}(1036)
│ └─{rsyslogd}(1040)
├─smartd(743)
├─sshd(856)─┬─sshd(23287)───sshd(23297)───bash(23302)───pstree(24272)
│ └─sshd(23442)───sshd(23452)───bash(23457)─┬─zombie(24241)───zombie(24242)
│ ├─zombie(24257)───zombie(24258)
│ └─zombie(24259)───zombie(24260)
├─sssd(744)─┬─sssd_be(819)
│ ├─sssd_nss(826)
│ └─sssd_pam(827)
├─systemd(12999)───(sd-pam)(13004)
├─systemd(2969)───(sd-pam)(2974)
├─systemd-journal(610)
├─systemd-logind(842)
├─systemd-udevd(628)
├─tuned(857)─┬─{tuned}(1153)
│ ├─{tuned}(1154)
│ └─{tuned}(1168)
└─zabbix_agentd(16013)─┬─zabbix_agentd(16014)
├─zabbix_agentd(16015)
├─zabbix_agentd(16016)
├─zabbix_agentd(16017)
└─zabbix_agentd(16018)
[root@Kylin-V10-sp3 ~/app/packages]#
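#4. If the parent is not the main process (PID 1), terminate it with kill <pid>. (Tip: in some terminal emulators, Alt + left-click drag selects a column.)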
[root@Kylin-V10-sp3 ~/app/packages]# kill 24241 24257 24259
[root@Kylin-V10-sp3 ~/app/packages]#
#5. Check the number of zombie processes again
[root@Kylin-V10-sp3 ~/app/packages]# ps aux | grep Z
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 24302 0.0 0.0 213156 892 pts/1 S+ 09:03 0:00 grep Z
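1. Use top and ps to check whether there are zombie processes and to identify exactly which ones they are.
2. If the parent is not the main process (PID 1), terminate the parent with kill <pid>.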
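The two steps above can be combined into one pipeline that lists the parents to terminate. This is only a sketch under the assumption that the input is ps -eo pid,ppid,stat; zombie_parents is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the unique parent PIDs of all zombies; expects `ps -eo pid,ppid,stat` on stdin.
# A parent of 1 is only reported on stderr, because PID 1 must not be killed.
zombie_parents() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $2 }' | sort -nu |
    while read -r ppid; do
        if [ "$ppid" -eq 1 ]; then
            echo "PID 1 owns a zombie: do not kill it" >&2
        else
            echo "$ppid"
        fi
    done
}

# Review the list first, then pass it to kill by hand:
# ps -eo pid,ppid,stat | zombie_parents
```

Printing the candidates instead of killing them directly leaves room to check what each parent is before it is terminated.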
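1.3 Orphan Processes
They have little impact on the system.
When a child's parent process exits for some reason while the child is still running, the child becomes an orphan process. An orphan is adopted by PID 1 (or by a designated subreaper process), which reaps it when it exits.
Detection: keep a list of the services you need to monitor, then inspect them with pstree -p or ps -ef.
Fix: restart the service.
1.4 Process Monitoring Commands ⭐️⭐️⭐️⭐️⭐️
1.4.1 ps command and its output
1. ps -ef
2. ps aux
3. Sort processes by CPU or memory usage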
# CPU
[root@Kylin-V10-sp3 ~/app/packages]# ps aux | awk 'NR>1' | sort -rnk3 | head
zabbix 16018 0.0 0.5 38436 5200 ? S 8月30 0:01 zabbix_agentd: active checks #1 [idle 1 sec]
zabbix 16017 0.0 0.5 38436 5204 ? S 8月30 0:00 zabbix_agentd: listener #3 [waiting for connection]
zabbix 16016 0.0 0.5 38436 5204 ? S 8月30 0:00 zabbix_agentd: listener #2 [waiting for connection]
zabbix 16015 0.0 0.5 38436 5204 ? S 8月30 0:00 zabbix_agentd: listener #1 [waiting for connection]
zabbix 16014 0.0 0.2 38292 2688 ? S 8月30 0:04 zabbix_agentd: collector [idle 1 sec]
zabbix 16013 0.0 0.3 38292 2980 ? S 8月30 0:00 zabbix_agentd
xk2 2984 0.0 0.2 223484 2020 tty1 Ss+ 8月29 0:00 -bash
xk2 2974 0.0 0.0 125240 324 ? S 8月29 0:00 (sd-pam)
xk2 2969 0.0 0.3 20572 3800 ? Ss 8月29 0:00 /usr/lib/systemd/systemd --user
root 99 0.0 0.0 0 0 ? S 8月28 0:00 [irq/30-pciehp]
[root@Kylin-V10-sp3 ~/app/packages]#
# MEM
[root@Kylin-V10-sp3 ~/app/packages]# ps aux | awk 'NR>1' | sort -rnk4 | head
root 826 0.0 3.1 264528 30188 ? S 8月28 0:01 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
root 23442 0.0 1.0 250384 9776 ? Ss 05:35 0:00 sshd: root [priv]
root 23287 0.0 1.0 250384 9836 ? Ss 05:18 0:00 sshd: root [priv]
root 610 0.0 0.9 49180 9220 ? Ss 8月28 0:03 /usr/lib/systemd/systemd-journald
root 1 0.0 0.7 106680 7336 ? Ss 8月28 0:14 /usr/lib/systemd/systemd --switched-root --system --deserialize 18
root 23302 0.0 0.6 224124 6084 pts/1 Ss 05:18 0:00 -bash
zabbix 16018 0.0 0.5 38436 5200 ? S 8月30 0:01 zabbix_agentd: active checks #1 [idle 1 sec]
zabbix 16017 0.0 0.5 38436 5204 ? S 8月30 0:00 zabbix_agentd: listener #3 [waiting for connection]
zabbix 16016 0.0 0.5 38436 5204 ? S 8月30 0:00 zabbix_agentd: listener #2 [waiting for connection]
zabbix 16015 0.0 0.5 38436 5204 ? S 8月30 0:00 zabbix_agentd: listener #1 [waiting for connection]
[root@Kylin-V10-sp3 ~/app/packages]#
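The awk 'NR>1' | sort -rnk3 pipeline drops the header and sorts on everything from the third column to the end of the line. A small variation, sketched below with the hypothetical name top_by_column, keeps the header and restricts the sort key to a single column.

```shell
#!/usr/bin/env bash
# Sort `ps aux` output (stdin) by one numeric column, highest first,
# keeping the header line.
# $1: column number (3 = %CPU, 4 = %MEM); $2: number of rows to show.
top_by_column() {
    local col=$1 rows=${2:-10} header
    IFS= read -r header          # consume only the first line
    printf '%s\n' "$header"
    sort -k"$col,$col"nr | head -n "$rows"
}

# ps aux | top_by_column 3      # top 10 by CPU
# ps aux | top_by_column 4 5    # top 5 by memory
```

With procps-ng, ps can also sort by itself, for example ps aux --sort=-%cpu | head.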
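4. Difference between VSZ and RSS in ps aux output
# VSZ: virtual memory size, i.e. all address space the process has mapped (resident pages, swapped-out pages, and pages mapped but never touched)
# RSS: resident set size, i.e. the part that currently sits in physical memory
'''
VSZ and RSS report the memory used by a process, in KB.
Physical memory: check it with free -h.
Space that temporarily stands in for memory when RAM runs short:
windows: virtual memory (page file)
Linux: swap (swap partition)
'''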
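The same two figures can be read straight from the kernel. The sketch below assumes /proc is mounted and uses the VmSize and VmRSS fields of /proc/<pid>/status, which correspond to VSZ and RSS; mem_of is an illustrative name.

```shell
#!/usr/bin/env bash
# Print "VSZ_KB RSS_KB" for a PID, read from /proc/<pid>/status.
# VmSize corresponds to VSZ and VmRSS to RSS; both are reported in kB.
mem_of() {
    awk '/^VmSize:/ { vsz = $2 } /^VmRSS:/ { rss = $2 } END { print vsz, rss }' "/proc/$1/status"
}

mem_of $$    # the current shell
```

For an ordinary process the first number is at least as large as the second, since resident pages are a subset of the mapped address space.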
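1.4.2 Process states ⭐️
Basic state | Description |
---|---|
Z | Zombie: terminated but not yet reaped by its parent. |
R | Running or runnable: on the CPU or waiting in the run queue. |
S | Interruptible sleep: waiting for an event, not running. |
D | Uninterruptible sleep, usually waiting on I/O (Input/Output, e.g. disk reads and writes). |
T | Stopped by a job control signal (for example Ctrl+Z). |

Combined state | Description |
---|---|
R+, S+, or D+ (with +) | The process is in the foreground process group. |
Ss and similar (with s) | The process is a session leader. |
R< or S< (with <) | High-priority process. |
RN or SN (with N) | Low-priority process. |
Sl (with l) | The process is multi-threaded. |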
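A STAT value is read as one state letter followed by optional modifier characters. The helper below, with the made-up name describe_stat, is a small sketch of that decoding and covers only the codes listed in these notes.

```shell
#!/usr/bin/env bash
# Translate a STAT value such as "Ss+" into words.
# The first character is the state; the remaining characters are modifiers.
describe_stat() {
    local stat=$1 desc flag i
    case ${stat:0:1} in
        R) desc="running or runnable" ;;
        S) desc="interruptible sleep" ;;
        D) desc="uninterruptible sleep" ;;
        T) desc="stopped" ;;
        Z) desc="zombie" ;;
        I) desc="idle kernel thread" ;;
        *) desc="unknown" ;;
    esac
    for ((i = 1; i < ${#stat}; i++)); do
        flag=${stat:i:1}
        case $flag in
            s) desc+=", session leader" ;;
            l) desc+=", multi-threaded" ;;
            +) desc+=", foreground" ;;
            '<') desc+=", high priority" ;;
            N) desc+=", low priority" ;;
        esac
    done
    echo "$desc"
}

describe_stat 'Ss+'
```

Modifiers the sketch does not know (such as L) are silently skipped.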
Other process states
[root@Kylin-V10-sp3 ~/app/packages]# man ps
# Search for stat
PROCESS STATE CODES
Here are the different values that the s, stat and
state output specifiers (header "STAT" or "S") will display to
describe the state of a process:
D uninterruptible sleep (usually IO)
I Idle kernel thread
R running or runnable (on run queue)
S interruptible sleep (waiting for an event
to complete)
T stopped by job control signal
t stopped by debugger during the tracing
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct ("zombie") process, terminated but
not reaped by its parent
For BSD formats and when the stat keyword is used,
additional characters may be displayed:
< high-priority (not nice to other users)
N low-priority (nice to other users)
L has pages locked into memory (for realtime and custom IO)
s is a session leader
l is multi-threaded (using CLONE_THREAD,
like NPTL pthreads do)
+ is in the foreground process group
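1.4.3 Running in the background (&) ⭐️⭐️⭐️⭐️⭐️
A command running in the foreground is terminated when the remote connection drops.
Sending the command to the background, ideally combined with nohup, keeps it from being killed when the connection drops.
Method | Description | Use case |
---|---|---|
command & | Most common. The command may still exit when the current connection ends. | Most of the time |
nohup command & | Also common. Survives disconnects; output is saved to the file nohup.out. | Stable, keeps the output |
ctrl + z | Suspends (pauses) the foreground job; fg resumes it in the foreground, bg resumes it in the background. | Rarely used |
screen command | Must be installed first. screen creates a virtual terminal window in which commands can run. | Explore on your own |
# & method: run a command in the background ⭐ ⭐ ⭐ ⭐ ⭐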
[root@Kylin-V10-sp3 ~/app/packages]# ping qq.com &
[1] 24763
[root@Kylin-V10-sp3 ~/app/packages]# PING qq.com (203.205.254.157) 56(84) bytes of data.
64 bytes from 203.205.254.157 (203.205.254.157): icmp_seq=1 ttl=128 time=247 ms
64 bytes from 203.205.254.157 (203.205.254.157): icmp_seq=2 ttl=128 time=248 ms
64 bytes from 203.205.254.157 (203.205.254.157): icmp_seq=3 ttl=128 time=249 ms
64 bytes from 203.205.254.157 (203.205.254.157): icmp_seq=4 ttl=128 time=247 ms
64 bytes from 203.205.254.157 (203.205.254.157): icmp_seq=5 ttl=128 time=247 ms
# nohup command &: run a command in the background and write its output to a file ⭐ ⭐ ⭐ ⭐ ⭐
# By default the output goes to nohup.out in the current directory
[root@Kylin-V10-sp3 ~/app/packages]# nohup ping qq.com >>/var/log/ping.log &
[1] 24787
[root@Kylin-V10-sp3 ~/app/packages]# nohup: 忽略输入重定向错误到标准输出端
[root@Kylin-V10-sp3 ~/app/packages]# tail /var/log/ping.log
64 bytes from 157.255.219.143 (157.255.219.143): icmp_seq=46 ttl=128 time=50.0 ms
64 bytes from 157.255.219.143 (157.255.219.143): icmp_seq=47 ttl=128 time=46.5 ms
64 bytes from 157.255.219.143 (157.255.219.143): icmp_seq=48 ttl=128 time=47.1 ms
64 bytes from 157.255.219.143 (157.255.219.143): icmp_seq=49 ttl=128 time=47.8 ms
64 bytes from 157.255.219.143 (157.255.219.143): icmp_seq=50 ttl=128 time=45.7 ms
64 bytes from 157.255.219.143 (157.255.219.143): icmp_seq=51 ttl=128 time=46.1 ms
64 bytes from 157.255.219.143 (157.255.219.143): icmp_seq=52 ttl=128 time=46.4 ms
64 bytes from 157.255.219.143 (157.255.219.143): icmp_seq=53 ttl=128 time=48.1 ms
64 bytes from 157.255.219.143 (157.255.219.143): icmp_seq=54 ttl=128 time=46.9 ms
64 bytes from 157.255.219.143 (157.255.219.143): icmp_seq=55 ttl=128 time=46.8 ms
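In the nohup example above only standard output is redirected, so error messages do not reach the log file. One possible wrapper that captures both streams and detaches standard input is sketched here; run_detached is a hypothetical name.

```shell
#!/usr/bin/env bash
# Start a command detached from the terminal, appending stdout and stderr
# to the same log file.
# $1: log file; remaining arguments: the command to run.
run_detached() {
    local log=$1
    shift
    nohup "$@" >>"$log" 2>&1 </dev/null &
}

# Example:
# run_detached /var/log/ping.log ping qq.com
```

Because none of the three standard streams point at the terminal any more, nohup has nothing to redirect on its own and prints no notice.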
Example: a yum/apt installation hangs in the foreground and ctrl+c cannot cancel it.
#1. Suspend it to the background with ctrl+z
[root@Kylin-V10-sp3 ~/app/packages]# yum reinstall tree
上次元数据过期检查:0:41:46 前,执行于 2024年08月31日 星期六 10时34分32秒。
^Z
[1]+ 已停止 yum reinstall tree
[root@Kylin-V10-sp3 ~/app/packages]#
#2. List background jobs with jobs
[root@Kylin-V10-sp3 ~/app/packages]# jobs
[1]+ 已停止 yum reinstall tree
[root@Kylin-V10-sp3 ~/app/packages]#
#3. Bring the job back to the foreground with fg
[root@Kylin-V10-sp3 ~/app/packages]# fg
yum reinstall tree
依赖关系解决。
===========================================================================================================================================================================
Package Architecture Version Repository Size
===========================================================================================================================================================================
重新安装:
tree x86_64 1.8.0-2.ky10 ks10-adv-os 51 k
事务概要
===========================================================================================================================================================================
总计:51 k
安装大小:115 k
确定吗?[y/N]:
1.4.4 top command format and usage
1.4.5 ps and top in practice ⭐️⭐️⭐️⭐️⭐️
1: Filter for processes named crond or sshd (egrep is equivalent to grep -E)
[root@Kylin-V10-sp3 ~/app/packages]# ps -ef | egrep 'crond|sshd'
root 856 1 0 8月28 ? 00:00:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root 879 1 0 8月28 ? 00:00:00 /usr/sbin/crond -n
root 23287 856 0 05:18 ? 00:00:00 sshd: root [priv]
root 23297 23287 0 05:18 ? 00:00:00 sshd: root@pts/1
root 23442 856 0 05:35 ? 00:00:00 sshd: root [priv]
root 23452 23442 0 05:35 ? 00:00:00 sshd: root@pts/2
root 24877 23302 0 11:31 pts/1 00:00:00 grep -E crond|sshd
[root@Kylin-V10-sp3 ~/app/packages]#
2: Count the crond processes
[root@Kylin-V10-sp3 ~/app/packages]# ps -ef | grep crond
root 879 1 0 8月28 ? 00:00:00 /usr/sbin/crond -n
root 24885 23302 0 11:32 pts/1 00:00:00 grep crond
[root@Kylin-V10-sp3 ~/app/packages]# ps -ef | grep crond | grep -v grep
root 879 1 0 8月28 ? 00:00:00 /usr/sbin/crond -n
[root@Kylin-V10-sp3 ~/app/packages]#
[root@Kylin-V10-sp3 ~/app/packages]# ps -ef | grep crond | grep -v grep | wc -l
1
[root@Kylin-V10-sp3 ~/app/packages]#
# Advanced: the pattern [c]rond does not match grep's own command line, so grep -v grep is unnecessary
[root@Kylin-V10-sp3 ~/app/packages]# ps -ef |grep '[c]rond' |wc -l
1
3: Print the PID of the rsyslog process
[root@Kylin-V10-sp3 ~/app/packages]# ps -ef | grep rsyslog
root 1014 1 0 8月28 ? 00:00:07 /usr/sbin/rsyslogd -n -i/var/run/rsyslogd.pid
root 24903 23302 0 11:35 pts/1 00:00:00 grep rsyslog
[root@Kylin-V10-sp3 ~/app/packages]#
[root@Kylin-V10-sp3 ~/app/packages]#
[root@Kylin-V10-sp3 ~/app/packages]# ps -ef | grep rsyslog | grep -v grep
root 1014 1 0 8月28 ? 00:00:07 /usr/sbin/rsyslogd -n -i/var/run/rsyslogd.pid
[root@Kylin-V10-sp3 ~/app/packages]#
[root@Kylin-V10-sp3 ~/app/packages]# ps -ef | grep rsyslog | grep -v grep | awk '{print $2}'
1014
[root@Kylin-V10-sp3 ~/app/packages]#
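Filtering with grep and then removing grep itself works, but matching on the command column avoids the problem altogether. The sketch below assumes the default ps -ef layout, where the command starts in column 8; pids_of is an illustrative name.

```shell
#!/usr/bin/env bash
# Print the PIDs of processes whose command (8th column of `ps -ef`)
# has the given basename. Only that one field is compared, so a process
# that merely carries the search word in its arguments is not matched.
pids_of() {
    awk -v name="$1" 'NR > 1 { n = split($8, parts, "/"); if (parts[n] == name) print $2 }'
}

# ps -ef | pids_of rsyslogd
# ps -ef | pids_of crond | wc -l
```

Where the procps tools are installed, pgrep -x rsyslogd is a shorter way to get the same PIDs.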
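4: top hotkeys
q      quit
space  refresh immediately
P      sort by CPU usage (the default). Essential
M      sort by memory usage. Essential
1      show the usage of each CPU core. Essential
z, then x, then > or <   z turns on color, x highlights the sort column, > and < move the sort column right or left
5: top in non-interactive mode, filtering for specific content
# Run top non-interactively: top -bn1
-b batch mode: print plain text for all processes instead of the interactive screen, so the output can be piped.
-n1 run 1 iteration, then exit.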
[root@Kylin-V10-sp3 ~/app/packages]# top -bn1 |grep 'zombie'
Tasks: 161 total, 2 running, 159 sleeping, 0 stopped, 0 zombie
[root@Kylin-V10-sp3 ~/app/packages]#
[root@Kylin-V10-sp3 ~/app/packages]#
[root@Kylin-V10-sp3 ~/app/packages]# top -bn1 |grep 'zombie' |awk '{print $(NF-1)}'
0
[root@Kylin-V10-sp3 ~/app/packages]#
6: Extract the number of logged-in users.
[root@Kylin-V10-sp3 ~/app/packages]# top bn1 | head
top - 11:36:23 up 2 days, 12:23, 3 users, load average: 0.00, 0.00, 0.00
Tasks: 160 total, 1 running, 159 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 6.2 sy, 0.0 ni, 93.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 948.2 total, 149.9 free, 246.0 used, 552.3 buff/cache
MiB Swap: 2156.0 total, 2106.5 free, 49.5 used. 525.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 106680 7336 4696 S 0.0 0.8 0:14.59 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
[root@Kylin-V10-sp3 ~/app/packages]#
[root@Kylin-V10-sp3 ~/app/packages]# top bn1 | awk 'NR==1'
top - 11:37:23 up 2 days, 12:24, 3 users, load average: 0.00, 0.00, 0.00
[root@Kylin-V10-sp3 ~/app/packages]#
[root@Kylin-V10-sp3 ~/app/packages]# top bn1 | awk 'NR==1' | awk '{print $8}'
3
[root@Kylin-V10-sp3 ~/app/packages]# top bn1 | grep users
top - 11:38:36 up 2 days, 12:25, 3 users, load average: 0.00, 0.00, 0.00
[root@Kylin-V10-sp3 ~/app/packages]#
[root@Kylin-V10-sp3 ~/app/packages]# top bn1 | grep users | awk '{print $(NF-6)}'
3
[root@Kylin-V10-sp3 ~/app/packages]#
7: Extract the number of zombie processes
[root@Kylin-V10-sp3 ~/app/packages]# top bn1 | grep zombie
Tasks: 161 total, 1 running, 160 sleeping, 0 stopped, 0 zombie
[root@Kylin-V10-sp3 ~/app/packages]#
[root@Kylin-V10-sp3 ~/app/packages]# top bn1 | grep zombie | awk '{print $(NF-1)}'
0
[root@Kylin-V10-sp3 ~/app/packages]# top bn1 | awk 'NR==2'
Tasks: 161 total, 1 running, 160 sleeping, 0 stopped, 0 zombie
[root@Kylin-V10-sp3 ~/app/packages]#
[root@Kylin-V10-sp3 ~/app/packages]# top bn1 | awk 'NR==2 {print $(NF-1)}'
0
[root@Kylin-V10-sp3 ~/app/packages]#
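Fixed column numbers such as $8 depend on the uptime format: when the machine has been up for less than a day, the first line has fewer fields and the user count moves. A sketch that locates each value by the label next to it is shown below; user_count and zombie_count are made-up names, and the labels are assumed to be in English as in the transcripts above.

```shell
#!/usr/bin/env bash
# Both functions read `top -bn1` output on stdin.

# Number of logged-in users: the field right before "user," or "users,".
user_count() {
    awk 'NR == 1 { for (i = 2; i <= NF; i++) if ($i ~ /^users?,$/) { print $(i - 1); exit } }'
}

# Number of zombies: the field right before "zombie" on the Tasks line.
zombie_count() {
    awk '/^Tasks:/ { for (i = 2; i <= NF; i++) if ($i == "zombie") { print $(i - 1); exit } }'
}

# top -bn1 | user_count
# top -bn1 | zombie_count
```

Searching for the label costs one loop but keeps working when the uptime part of the line changes shape.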
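1.4.6 Summary
Core fields of ps and top (PID, PPID, CPU usage, memory usage, VSZ, RSS)
Process states: Z, R, D, T
Filtering ps and top output with the three text tools (grep, sed, awk)
Counting occurrences
Extracting rows and columns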
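2. System Load Troubleshooting Case
2.1 System load
2.1.1 Overview
- System load: an average over the last 1, 5, and 15 minutes. It is also called the load average.
- On Linux it is commonly used to measure how busy the system is.
- Definition: the average number of processes, per unit of time, that are runnable (state R: running or waiting for a CPU) or in uninterruptible sleep (state D, typically disk I/O).
2.1.2 How to judge whether the load is high:
- Compare the load value with the total number of CPU cores (lscpu).
- A load close to the total number of CPU cores means the system is fairly busy.
- As a rule of thumb, be alert once the load reaches 60%-70% of the total core count. This is the threshold.
4 cores: the warning range for the load is 2.4 to 2.8.
4 cores: a single process can show up to 400% CPU usage.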
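The rule of thumb above is simple arithmetic on the load value and the core count. A minimal sketch follows; the name load_status and the three labels ok, warn, and high are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Classify a load value against the core count using the 60%/70% rule of thumb.
# $1: load average; $2: number of CPU cores. Prints ok, warn, or high.
load_status() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (load >= cores * 0.7)      print "high"
        else if (load >= cores * 0.6) print "warn"
        else                          print "ok"
    }'
}

# Live check: 1-minute load from /proc/loadavg, core count from nproc.
# read -r load1 _ < /proc/loadavg
# load_status "$load1" "$(nproc)"
```

For 4 cores this gives warn from 2.4 and high from 2.8, in line with the 60%-70% band.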
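2.1.3 Common causes of high system load:
- CPU
- Disk I/O
2.1.4 Troubleshooting workflow for high load ⭐️⭐️⭐️⭐️⭐️
1. Monitoring software reports a fault.
2. Connect to the affected machine through the bastion host and confirm the fault # w, top, lscpu (check the number of cores)
3. Determine whether CPU or I/O is the cause # top command
If us (user) or sy (system) on the %Cpu(s) line of top is high, the load is caused by CPU usage # check us and sy in top
If wa (iowait) on the %Cpu(s) line of top is high, processes are queuing for reads and writes, so the load is caused by disk I/O # check wa in top
4. Pinpoint the specific process
If CPU is the cause: sort by CPU in top, or filter and sort ps aux output.
If I/O is the cause: iotop -o
5. Once the culprit (the problem process) is found, trace it back to its service and inspect that service.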
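Step 3 of the workflow can be sketched as a small filter over the %Cpu(s) line of top -bn1. The function name load_cause and the 50 and 20 percent cut-offs are illustrative assumptions, not fixed rules; tune them to your own baseline.

```shell
#!/usr/bin/env bash
# Read `top -bn1` output (stdin) and report whether CPU or I/O dominates.
# Prints io, cpu, or unclear. The 20% (wa) and 50% (us + sy) cut-offs are assumptions.
load_cause() {
    awk '/^%Cpu/ {
        gsub(/[:,]/, " ")            # "ni,100.0 id" would otherwise fuse two fields
        for (i = 2; i <= NF; i++) {
            if ($i == "us") us = $(i - 1)
            if ($i == "sy") sy = $(i - 1)
            if ($i == "wa") wa = $(i - 1)
        }
        if (wa + 0 >= 20)       print "io"
        else if (us + sy >= 50) print "cpu"
        else                    print "unclear"
        exit
    }'
}

# top -bn1 | load_cause
```

A result of io points to iotop -o as the next step, while cpu points to sorting by CPU in top or ps aux.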
2.1.5 Simulating high system load ⭐⭐⭐⭐⭐
stress: a stress-testing (benchmarking) tool
# Temporarily switch the language to English
[root@Kylin-V10-sp3 ~]# export LANG=en_US
[root@Kylin-V10-sp3 ~]#
[root@Kylin-V10-sp3 ~]# top -bn1
top - 17:18:06 up 5:26, 2 users, load average: 0.02, 1.27, 1.29
Tasks: 149 total, 1 running, 148 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 5.6 sy, 0.0 ni, 83.3 id, 5.6 wa, 0.0 hi, 5.6 si, 0.0 st
MiB Mem : 948.2 total, 508.1 free, 267.0 used, 173.1 buff/cache
MiB Swap: 2156.0 total, 2115.1 free, 40.9 used. 539.9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 105780 4584 3080 S 0.0 0.5 0:01.97 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
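# Check the number of CPU cores: the CPU(s) line gives the number of logical CPUs (1 here); Socket(s) is the number of physical CPU sockets
# In the top output, load average: 0.02, 1.27, 1.29. With 1 core, anything below 70% of 1 (0.7) is normal; with two cores the limit is 70% of 2 (1.4)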
[root@Kylin-V10-sp3 ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 45 bits physical, 48 bits virtual
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
#1. Install
[root@Kylin-V10-sp3 ~]# yum install -y stress
#2. Simulate high load caused by CPU: watch us and sy on the %Cpu(s) line of top
[root@Kylin-V10-sp3 ~]# stress --cpu 2 --timeout 1000
#3. Simulate high load caused by I/O: watch wa on the %Cpu(s) line of top
[root@Kylin-V10-sp3 ~]# stress --io 2 --hdd 3 --hdd-bytes 1g --timeout 10000s