Antiy antivirus, Kylin OS, out of memory: troubleshooting

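Problem:

Today a customer ran into this problem: business logins to the system failed, ssh could not log in either, and a monitor attached to the machine showed errors being reported continuously.
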
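Resolution steps:

Are these two machines running the same application?

The one we just looked at is running the nginx service, right?

It hit OOM.

It ran out of memory.

Commands:

Collect a sosreport -a and pack up /var/log.

sosreport -a, OK.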
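The "pack up /var/log" request above could be carried out roughly as in the sketch below. The helper name pack_logs and the output location are illustrative assumptions, not something the vendor specified.

```shell
# Sketch: archive a log directory into a timestamped tarball.
# pack_logs is a hypothetical helper name; all paths are examples only.
pack_logs() {
    local src_dir="$1"      # directory to archive, e.g. /var/log
    local out_dir="$2"      # where to write the tarball, e.g. /var/tmp
    local archive
    archive="${out_dir}/logs-$(date +%Y%m%d%H%M%S).tar.gz"
    # -C stores paths relative to the parent directory of src_dir.
    # tar may warn about files that change while being read; the archive
    # is still written, so only check that it exists and is non-empty.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" 2>/dev/null
    [ -s "$archive" ] && echo "$archive"
}

# Example: pack_logs /var/log /var/tmp
```

The function prints the archive path so it can be handed on together with the sosreport tarball.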

 

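There is no useful information around 17:30.

If the system is not activated through KMS, you can stop this activation service.

Is the problem caused by this KMS issue?

No. The log has no records for the time period when the problem occurred. This service just keeps flooding the log, so we would like to turn it off.

We bought a genuine license.

This is a screenshot from somewhere else.

It may be a different problem. An out of memory event should leave log records, but on this machine there are no related logs for the time period when the problem occurred.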
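One way to check whether the kernel logged an OOM event is to search the system log for the OOM killer's usual messages. This is a minimal sketch: find_oom_events is a hypothetical helper, and the search patterns are the common kernel wordings, which may vary between kernel versions.

```shell
# Sketch: search a kernel log file for OOM-killer traces.
# find_oom_events is a hypothetical helper; the patterns are assumptions
# based on the usual kernel messages.
find_oom_events() {
    local log_file="$1"     # e.g. /var/log/messages
    grep -E -i 'out of memory|oom-killer|killed process' "$log_file"
}

# Example usage on a live system (paths are assumptions):
#   find_oom_events /var/log/messages
#   dmesg -T | grep -E -i 'out of memory|oom-killer|killed process'
```

An empty result for the time window in question would match what the vendor reported here: no OOM record in the logs for that period.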

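What do I need to do so that we can find the problem?

First check rpm -qa | grep audit and rpm -qa | grep mate-indi

The time was around 17:30 yesterday, right?

The problem had already started at 3 pm yesterday.

sar -r 5 -o /var/log/sarcpu.log

Please run this command to monitor memory in real time. When the problem shows up again, send us that log file so we can take a look.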
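The file written by sar -o is binary, so it is read back with sar -r -f /var/log/sarcpu.log rather than with cat. Where sar is unavailable or its data cannot be decoded, a plain-text fallback can be built from /proc/meminfo. The sketch below is only an illustration: mem_used_pct is a hypothetical helper, and "used" is assumed to mean MemTotal minus MemAvailable.

```shell
# Sketch: compute used memory as a percentage from a meminfo-style file.
# mem_used_pct is a hypothetical helper; "used" is assumed to be
# MemTotal - MemAvailable, which is one common definition, not the only one.
mem_used_pct() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' "$meminfo"
}

# Example: append one timestamped sample every 5 seconds to a text log
# (the log path is an assumption):
#   while true; do echo "$(date '+%F %T') $(mem_used_pct)%"; sleep 5; done >> /var/log/mem_used.log
```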

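Give me a reliable solution. Do you call this a method for finding the problem?

Please check the sysstat version for us; run rpm -qa | grep sysstat

蕃茄你个西红柿: Is this machine on an internal network or an external one? How did an el6 package end up on it?

zxg: Internal network.

蕃茄你个西红柿: OK, then let's update this sysstat package first; it will make it easier to locate the problem later.

zxg: What is el6?

蕃茄你个西红柿: A package built for RHEL 6.

zxg: What impact does it have?

蕃茄你个西红柿: The system resource data it collects cannot be decoded on our system; the el6 version is too old.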
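To see whether other packages built for a different distribution are installed, the rpm -qa output can be filtered by release tag. This is a rough sketch: foreign_packages is a hypothetical helper, and the default tag ky10 is taken from the package names mentioned in this conversation.

```shell
# Sketch: list packages that carry an "elN" release tag instead of the
# expected distribution tag. foreign_packages is a hypothetical helper.
foreign_packages() {
    local expected_tag="${1:-ky10}"
    # Reads one package name per line on stdin, e.g. from `rpm -qa`
    grep -E '\.el[0-9]+' | grep -v -F ".${expected_tag}"
}

# Example: rpm -qa | foreign_packages ky10
```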

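zxg: Antivirus software is installed.

zxg: What do you need to look at?

zxg: It is still reporting errors now; the web page, ssh, and so on cannot log in.

zxg: The keyboard does not respond.

蕃茄你个西红柿: The application running behind the nginx service consumed the memory,

蕃茄你个西红柿: which triggered the OOM.

zxg: Can you be sure?

zxg: Is there any log that confirms it? I will have the software vendor take a look.

蕃茄你个西红柿: That is our R&D team's preliminary analysis from this morning.

zxg: There has to be something conclusive. I went to the software vendor first; they could not find the problem in their logs, and only then did I come to you.

蕃茄你个西红柿: First uninstall the el6 sysstat package and install our ky10 version.

蕃茄你个西红柿: The nginx log keeps reporting processes exiting abnormally.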
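To turn "the log keeps reporting abnormal exits" into something countable, the nginx error log can be summarized per signal. The sketch assumes the usual nginx wording "worker process <pid> exited on signal <n>"; the exact messages in this customer's log are not shown in the conversation, and count_worker_exits is a hypothetical helper.

```shell
# Sketch: count abnormal worker exits per signal in an nginx error log.
# count_worker_exits is a hypothetical helper; the message wording it
# matches is an assumption about the log format.
count_worker_exits() {
    local error_log="$1"    # e.g. /var/log/nginx/error.log (assumed path)
    grep -o 'exited on signal [0-9]*' "$error_log" | sort | uniq -c | sort -rn
}

# Example: count_worker_exits /var/log/nginx/error.log
```

Signal 9 is SIGKILL, which is what the kernel's OOM killer sends, so a large count for signal 9 would be consistent with the OOM theory without proving it.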

 

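zxg: How do I do that?

蕃茄你个西红柿: rpm -e sysstat

蕃茄你个西红柿: Try adding the version number: rpm -e sysstat-9.0.4-27

zxg: Once this is uninstalled, there will be logs afterwards, right?

蕃茄你个西红柿: Check rpm -qa | grep sysstat again

蕃茄你个西红柿: Then run sar -V to check the version.

蕃茄你个西红柿: That is an uppercase V.

蕃茄你个西红柿: Is there a network repository?

zxg: No.

蕃茄你个西红柿: Which version is this? Run nkvers

I will find an installation package for you, one moment.

蕃茄你个西红柿: Reinstall this package.

zxg: How do I install it?

蕃茄你个西红柿: rpm -ivh sysstat-12.2.1-6.ky10.x86_64.rpm

What about rpm -Uvh?

Then check sar -V again.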
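The version check after the reinstall can also be scripted, for example to confirm on several machines that the old 9.x build is gone. This is a small sketch; sysstat_major is a hypothetical helper that expects the "sysstat version X.Y.Z" banner line.

```shell
# Sketch: extract the major version from `sar -V` style output.
# sysstat_major is a hypothetical helper name.
sysstat_major() {
    # Expects a line such as "sysstat version 12.2.1" on stdin
    sed -n 's/^sysstat version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Example (2>&1 in case the banner is printed on stderr):
#   sar -V 2>&1 | sysstat_major
```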

 


OK, that works now.

For sysstat, run one more command: systemctl restart sysstat; systemctl status sysstat

Check top

top -c -M

Check memory

free -hm
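Because nginx and similar services run several worker processes, a per-process listing can hide which program uses the most memory in total. The sketch below sums resident memory per command name. rss_by_command is a hypothetical helper, and it only looks at the first word of the command name.

```shell
# Sketch: sum resident memory (RSS, in KiB) per command name.
# rss_by_command is a hypothetical helper; it reads "rss command" pairs
# on stdin, as produced by `ps -eo rss=,comm=`. Command names that
# contain spaces are truncated to their first word.
rss_by_command() {
    awk '{ rss[$2] += $1 }
         END { for (c in rss) printf "%d %s\n", rss[c], c }' | sort -rn
}

# Example: ps -eo rss=,comm= | rss_by_command | head -n 10
```

Saving this output at regular intervals gives the software vendor something concrete to compare before and after the problem appears.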

 -------------------------------------------------------------------------------------------------------------------------------------------------

Addendum:

After every reboot, remember to run this manually:  nohup /usr/bin/nmon -ft -d 256 -s 60 -c 1440 -m /opt/nmondata/ &
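With -s 60 -c 1440, nmon takes 1440 snapshots 60 seconds apart, which is 24 hours of data, and then exits. To avoid starting it by hand, it could be scheduled through cron, as sketched below. This assumes the cron daemon supports @reboot and that a file under /etc/cron.d is acceptable on this system; nmon_cron_entries and the schedule are hypothetical.

```shell
# Sketch: print cron.d-style entries that start nmon at boot and once a day.
# nmon_cron_entries is a hypothetical helper; the schedule is an assumption.
# Note: the boot-time run and the first midnight run can overlap for a while.
nmon_cron_entries() {
    local out_dir="${1:-/opt/nmondata/}"
    local cmd="/usr/bin/nmon -ft -d 256 -s 60 -c 1440 -m ${out_dir}"
    # /etc/cron.d format: schedule, user, command
    echo "@reboot root ${cmd}"
    echo "0 0 * * * root ${cmd}"
}

# Example (assumed file name): nmon_cron_entries > /etc/cron.d/nmon
```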

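Command collection:

Test how long it takes to write a large file

It can be run in any directory. Could you help me check how long this command takes?

It has no real impact. The command just generates a large file (10 GiB with bs=1G count=10) to test disk read/write performance.

dd if=/dev/zero of=largefile bs=1G count=10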
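In the session the duration was measured by running date before and after dd. The same measurement can be wrapped in a small function, sketched here with a configurable size so that a smaller test file can be used. timed_write is a hypothetical helper.

```shell
# Sketch: time a sequential write of N MiB and print the elapsed seconds.
# timed_write is a hypothetical helper; sizes and paths are examples.
timed_write() {
    local target="$1"       # file to create
    local size_mib="$2"     # how many MiB to write
    local start end
    start=$(date +%s)
    # bs is given in bytes (1 MiB); stderr is discarded to hide dd's summary
    dd if=/dev/zero of="$target" bs=1048576 count="$size_mib" 2>/dev/null || return 1
    sync                    # flush cached data so the disk write is included
    end=$(date +%s)
    echo $((end - start))
}

# Example: timed_write ./largefile 1024 && rm -f ./largefile
```

Using a 1 MiB block size also avoids allocating a 1 GiB buffer on a machine that is already short of memory.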

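Check the current memory usage:

sar   -r   2

Run this command to monitor memory in real time, then send us the log file so we can take a look

sar -r 5 -o /var/log/sarcpu.log

Check the sar version

sar -V

Check the system version

nkvers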

--------------------------------------------------------------------------------------------------

Commands executed:

[root@localhost home]# history
1 ifconfig
2 vi /etc/sysconfig/network-scripts/ifcfg-em1
3 Z
4 ifdown em1
5 ifup em1
6 ifconfig
7 poweroff
8 cd /data0/app_new/newhtdocs/linuxsetup/x86_64
9 ls
10 ll
11 cd /home/
12 tar -zxvf 20221207.tar.gz
13 \cp -rf /home/20221207/* /data0/cloud/hashdb/
14 /data0/bin/supervisorctl.sh restart server_cloud
15 poweroff
16 cd /data0/app_new/newhtdocs/download
17 ls
18 du -sh *
19 vi /etc/sysconfig/network-scripts/em1
20 vi /etc/sysconfig/network-scripts/
21 vi /etc/sysconfig/network-scripts/em1
22 cd /etc/sysconfig/network-scripts/
23 ls
24 vi ifcfg-em1
25 ifdown em1
26 ifdown ifcfg-em1
27 qu
28 ifconfig
29 cd /etc/sysconfig/network-scripts/
30 ls
31 vi ifcfg-em1
32 ls
33 ifdown ifcfg-em1
34 ifdown em1
35 ls
36 vi em1
37 ls
38 reboot
39 declare -x SUPPORTCONF="/data0/server_manage/support/conf/support.conf"
40 cd /data0/app_new/
41 LD_LIBRARY_PATH=./env/posix/lib7/ python share/patch.pyc -f /data0/server_manage/newhtdocs/download/
42 vi /etc/nginx/conf.d/apsc.conf
43 systemctl restart nginx
44 cd /data0/conf/sysconf/
45 vi patch_version
46 ifconfig
47 ifdown em1
48 ifup em1
49 ping 1.1.1.100
50 ping 1.1.1.2
51 ip add
52 vi /etc/sysconfig/network-scripts/ifcfg-em1
53 ifdown em1
54 ifup em1
55 ip add
56 poweroff
57 cd /etc/sysconfig/network-scripts/
58 ls
59 vi ifcfg-em2
60 cd /etc/sysconfig/network-scripts/
61 ls
62 bi ifcfg-em2
63 vi ifcfg-em2
64 ifdown em2
65 ifup em2
66 ifconfig
67 vi ifcfg-em2
68 reboot
69 ip addr
70 cd /etc/sysconfig/network-scripts/
71 ls
72 vi ifcfg-em2
73 vim -r ifcfg-em2
74 vi ifcfg-em2
75 vi ifcfg-em1
76 ifdown em2
77 ifup em2
78 ifconfig
79 vi ifcfg-em2
80 ls
81 ifdown em2
82 if dowm ifcfg-em2
83 ifdown em2
84 ifuo em2
85 ifup em2
86 vi ifcfg-em2
87 vi ifcfg-em1
88 vi ifcfg-em2
89 ifdown em2
90 ls
91 rm .ifcfg-em2.swp
92 vi ifcfg-em2
93 ifdown em2
94 ifup em2
95 ip addr
96 vi ifcfg-em2
97 ifup em2
98 ifdown em2
99 ifup em2
100 ipaddr
101 ip addr
102 vi ifcfg-em2
103 ifdown em2
104 ifup em2
105 ip addr
106 vi ifcfg-em2
107 ifdown em2
108 ifup em2
109 ip addr
110 ping 10.54.142.32
111 ping 10.54.142.253
112 cd /etc/sysconfig/network-scripts/
113 vi ifcfg-em1
114 ifdown em1
115 ls
116 ifup em1
117 ping 1.1.1.1
118 ping 10.54.142.253
119 cd /
120 cd home/
121 ls
122 tar -xf packages-x86_64-0524-20230216.tar.gz
123 ls
124 cd packages
125 yum localinstall *rpm -y
126 systemctl daemon-reload
127 systemctl restart sshd
128 systemctl restart auditd
129 vi /etc/security/pwquality.conf
130 chage -M 90 root
131 chage -m 1 root
132 chage -W 3 root
133 vi /etc/pam.d/system-auth
134 security-switch --set default
135 systemctl enable --now firewalld
136 firewall-cmd --add-service=http --permanent
137 firewall-cmd --add-service=https --permanent
138 firewall-cmd --add-service=ssh --permanent
139 firewall-cmd reload
140 chmod -R 700 /var/log/audit
141 chown root:root -R /var/log/audit
142 vi /etc/sysconfig/network-scripts/
143 cd /etc/sysconfig/network-scripts/
144 vi ifcfg-em1
145 vi ifcfg-em2
146 systemctl status firewall
147 systemctl status firewalld
148 firewalld-cmd --add-service=ssh --permanent
149 firewall-cmd --add-service=ssh --permanent
150 systemctl status firewalld
151 firewall-cmd --reload
152 systemctl status firewalld
153 chage -l root
154 vi /etc/security/pwquality.conf
155 curl http://10.52.9.168:80/cloud/
156 curl http://10.52.9.168:443/cloud/
157 cd /data0/server_manage/newhtdocs/download/
158 du -sh
159 getstatus
160 cd /data0/app_new/newhtdocs/download/
161 ls
162 cd
163 /data0/bin/supervisorctl.sh status soft_es
164 vi /data0/conf/sysconf/createindex.conf
165 /data0/bin/supervisorctl.sh status soft_es
166 cd /data0/app_new/newhtdocs/download/
167 ls
168 cd
169 vi /data0/conf/sysconf/createindex.conf
170 getstatus
171 ls
172 cd /hoe
173 cd /home
174 ls
175 chmod 755 client\(10.54.142.32_80_20_1_1\).deb
176 ./client\(10.54.142.32_80_20_1_1\).deb
177 .//client\(10.54.142.32_80_20_1_1\).deb'
178 ./'client\(10.54.142.32_80_20_1_1\).deb'
179 .client\(10.54.142.32_80_20_1_1\).deb
180 sudo dpki client\(10.54.142.32_80_20_1_1\).deb
181 sudo dpki -i client\(10.54.142.32_80_20_1_1\).deb
182 sudo dpkg -i client\(10.54.142.32_80_20_1_1\).deb
183 ls
184 ll
185 ./client\(10.54.142.32_80_20_1_1\).deb
186 ./"client(10.54.142.32_80_20_1_1).deb"
187 dpkg -i client\(10.54.142.32_80_20_1_1\).deb
188 vi client\(10.54.142.32_80_20_1_1\).deb
189 chmod 755 client\(10.54.142.32_80_20_1_1\).rpm
190 ./client\(10.54.142.32_80_20_1_1\).rpm
191 ./'client\(10.54.142.32_80_20_1_1\).rpm'
192 ./"client\(10.54.142.32_80_20_1_1\).rpm"
193 ./client\(10.54.142.32_80_20_1_1\).rpm
194 ll
195 chmod 755 client\(10.54.142.32_80_20_1_1\).deb
196 ./client\(10.54.142.32_80_20_1_1\).deb
197 ./client\(10.54.142.32_80_20_1_1\).rpm
198 sudo ./client\(10.54.142.32_80_20_1_1\).deb
199 chmod 777 client\(10.54.142.32_80_20_1_1\).deb
200 ./client\(10.54.142.32_80_20_1_1\).deb
201 vi client\(10.54.142.32_80_20_1_1\).deb
202 chmod 755 client\(10.54.142.32_80_20_1_1\).deb
203 ./client\(10.54.142.32_80_20_1_1\).deb
204 chmod 755 client\(10.54.142.32_80_20_1_1\).deb
205 ./client\(10.54.142.32_80_20_1_1\).deb
206 dpkg -i client\(10.54.142.32_80_20_1_1\).deb
207 ls
208 rpm -ivh client\(10.54.142.32_80_20_1_1\).rpm
209 ps -ef | grep iep
210 ps -ef | grep antiy
211 ifconfig
212 df -h
213 ps -ef
214 pstree
215 df -h
216 ll
217 curl http://10.52.9.168:443/clooud
218 curl http://10.52.9.168:80/cloud/
219 du -sh /data0/tmp/*
220 passwd
221 exit
222 ls
223 top
224 df -h
225 du -sh /data0/tmp/*
226 cd /tmp/
227 ls
228 free -h
229 vi /etc/ssh/sshd_config
230 vi /etc/login.defs
231 sudo chage -M 90 username
232 vi /etc/login.defs
233 sudo chage -M 90 username
234 sudo chage -M 99999 root
235 ll
236 cd /root
237 ll
238 sosreport -a
239 ll
240 cd /var/tmp/
241 ll
242 rpm -qa | grep audit 和 rpm -qa| grep mate-indi
243 rpm -qa | grep audit
244 rpm -qa| grep mate-indi
245 message
246 sar -r 5 -o /var/log/sarcpu.log
247 cd "/var/tmp"
248 rpm -qa |grep sysstat
249 rpm -e sysstat-9.0.4-27
250 rpm -qa |grep sysstat
251 rpm -e sysstat-9.0.4-27.el6.x86_64
252 rpm -qa|grep sysstat
253 top
254 ps aux |grep
255 ps aux |grep 769236
256 sar -V
257 sar -r 5 -o /1.txt
258 nkvers
259 rpm -ivh sysstat-12.2.1-6.ky10.x86_64.rpm
260 cd /root
261 ll
262 rpm -ivh sysstat-12.2.1-6.ky10.x86_64.rpm
263 sar -V
264 rpm -Uvh sysstat-12.2.1-6.ky10.x86_64.rpm
265 sar -V
266 a
267 top
268 top -c -M
269 systemctl restart sysstat
270 systemctl status sysstat
271 top
272 sar -r 2
273 free -hm
274 cat /var/log/sarcpu.log
275 tail -50f /var/log/sarcpu.log
276 free -hm
277 top
278 top -h -m
279 top -c -M
280 cd /home
281 df -h
282 ll
283 date
284 date
285 dd if=/dev/zero of=largefile bs=1G count=10
286 date
287 all
288 ll
289 ip a
290 ps aux --sort=-pmem
291 ps aux --sort=-pmem | head 11
292 ps aux --sort=-pmem | head -11
293 free -hm
294 top
295 free -h
296 crontab -l
297 top
298 sar -w 2
299 ps -aux
300 ps -aux |sort -k 4nr |head
301 journalctl
302 rpm -qa | grep audit
303 rpm -qa | grep mate-indic
304 runlevel
305 dmesg -T
306 cpupower --help
307 cpupower info
308 lscpu
309 top
310 ps -ef |grep python
311 ss -tnlp | grep 9090
312 ps -ef |grep cockpit
313 top
314 vmstat -h
315 vmstat
316 vmstat 1 10
317 vmstat -w 1 10
318 vmstat -w 1 1000
319 iostat
320 iostat 2
321 cd /etc/yum.repos.d/
322 ls
323 cat kylin_x86_64.repo
324 lsblk
325 df -h
326 du -sh /var
327 du -sh /data0/
328 df -h
329 ll /var/spool/cron/
330 crontab -l
331 systemctl status auditd.service
332 nkvers
333 cd /home/
334 ls
335 rpm -ivh nmon-16m-1.ky10.x86_64.rpm
336 which nmon
337 mkdir /opt/nmondata
338 nmon -h
339 nohup /usr/bin/nmon -ft -d 256 -s 60 -c 1440 -m /opt/nmondata/ &
340 ps -ef |grep nmon
341 ll /opt/nmondata/
342 ll /opt/nmondata/ -h
343 df -h
344 ll /opt/nmondata/ -h
345 history
[root@localhost home]#

-------------------------------------------------------------------------------------------------------------

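Changes made by the OS vendor:

We have deployed atop monitoring on this machine. It looks fine at the moment; we need to wait for the problem to recur before investigating further.

Changes made on our side:
Dependency package: ncurses-devel-6.2-1.ky10.x86_64
atop program: under the /home/atop-2.8.1 directory
yum repository: the /yumrpm directory and the /etc/yum.repos.d/kylin_x86_64.repo file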

 

-----------------------------------------------------------------------------------

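Antiy scripts

Script 1: status_supervisorctl.sh

#!/bin/bash

# Show the status of the services managed by supervisorctl

pushd .
cd /data0/app_new/

# check_os.sh is assumed to print the library directory name for this OS
# (for example lib7, the value used in the command history)
os=$(/data0/sharedir/share/check_os.sh)

LD_LIBRARY_PATH=./env/posix/${os}/ python /data0/bin/supervisorctl -c /data0/conf/sysconf/supervisord.conf status

popd

Script 2: check.sh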

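#!/bin/bash -x
# Print a timestamp, then a snapshot of memory, processes, and service status
time=`date +%Y%m%d%H%M%S`
echo $time
free -mh                 # memory usage
ps aux                   # full process list
top -c -M -b -n 1        # one batch-mode top sample
w                        # logged-in users and load average
/data0/bin/status_supervisorctl.sh   # status of supervisor-managed services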
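
check.sh prints its snapshot to standard output, so something has to store it if it is meant to survive a hang. One possible wrapper is sketched below: it saves each run to a timestamped file and keeps only the newest few. save_snapshot, the output directory, and the location of check.sh are all assumptions.

```shell
# Sketch: run a command, save its output to a timestamped file, and keep
# only the newest N files. save_snapshot is a hypothetical helper.
save_snapshot() {
    local out_dir="$1"      # e.g. /opt/checkdata (assumed path)
    local keep="$2"         # number of snapshot files to keep
    shift 2
    mkdir -p "$out_dir" || return 1
    local out_file
    out_file="${out_dir}/check_$(date +%Y%m%d%H%M%S)_$$.log"
    "$@" > "$out_file" 2>&1
    # Delete everything except the newest $keep snapshots
    ls -1t "$out_dir"/check_*.log | tail -n +"$((keep + 1))" |
        while read -r old; do rm -f "$old"; done
    echo "$out_file"
}

# Example (the path to check.sh is an assumption):
#   save_snapshot /opt/checkdata 100 /data0/bin/check.sh
```

Run from cron every minute or so, this leaves a short history of snapshots leading up to the moment the machine stops responding.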

posted on 2023-12-01 14:27 by 叶子在行动
