
Linux System Fault

Posted by Sxcan

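2021-05-09: After a maintenance reboot, the system could not be logged into through iLO (the console showed only a black screen), SSH clients could not connect, and monitoring raised a host-down alert.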

Database server information:
Database: Oracle 11.2.0.4
OS: Red Hat Enterprise Linux Server release 7.6 (Maipo)
Resolution:

1. Checked the server for hardware problems. None were found.

2. Disabled the HBA card. Login then succeeded, but after a while new login sessions could no longer be opened.

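3. Suspected that some process was putting too much load on the system (CPU, memory). With the HBA card disabled or the fibre cable unplugged, checked the vmstat logs: no abnormal spikes were found.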

zzz ***Sun May 9 12:11:56 CST 2021
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 15717112     24 82470040    0    0  2908   695    0    0 11  3 83  3  0
 1  0      0 15719288     24 82470080    0    0  5000    45 4250 4337  1  0 98  0  0
 1  2      0 15783448     24 82472392    0    0  3332  3291 6834 6997  1  1 97  1  0
zzz ***Sun May 9 12:12:29 CST 2021
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 21664552     24 79940448    0    0  2908   695    0    0 11  3 83  3  0
 3  7      0 21657500     24 79940624    0    0  1144 42016 15363 30440  2  1 97  1  0
 1  3      0 21653532     24 79943136    0    0     0 210522 44690 130180  1  1 95  2  0
zzz ***Sun May 9 12:13:02 CST 2021
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 99840808     80 2259040    0    0  2908   695    0    0 11  3 83  3  0
 2  0      0 99844848     80 2259056    0    0    64     0 8102 7758  2  1 98  0  0
 0  0      0 99848672     80 2259492    0    0    24     0 3535 2067  1  0 98  0  0
zzz ***Sun May 9 12:13:36 CST 2021
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 99858632     80 2258984    0    0  2908   695    0    0 11  3 83  3  0
 1  0      0 99859952     80 2259144    0    0     0     0 3764 2371  1  0 98  0  0
 1  0      0 99858624     80 2258960    0    0     0     0 3970 2551  1  0 98  0  0
zzz ***Sun May 9 12:14:10 CST 2021
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 99838520     80 2267132    0    0  2908   695    0    0 11  3 83  3  0
10  0      0 99829304     80 2267564    0    0     0    20 18294 20611  2  4 94  0  0
 1  0      0 99835808     80 2265804    0    0     0    24 16739 17576  2  3 95  0  0
zzz ***Sun May 9 12:14:44 CST 2021
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 99859136     80 2262756    0    0  2908   695    0    0 11  3 83  3  0
 1  0      0 99857792     80 2262628    0    0     0     0 5279 1988  1  0 98  0  0
 0  0      0 99854688     80 2262912    0    0     0     4 4837 2356  1  0 98  0  0
zzz ***Sun May 9 12:15:17 CST 2021
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 99857664     80 2262476    0    0  2908   695    0    0 11  3 83  3  0
 1  0      0 99854832     80 2262628    0    0     0     0 4546 3287  1  0 98  0  0
 1  0      0 99855280     80 2262864    0    0     0     0 3563 2420  1  0 98  0  0
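Reading these samples by eye is easy to get wrong. For example, the first row of every sample is identical (2908 695 ... 11 3 83 3 0) because vmstat's first report is the average since boot, so only the later rows show the current load. The timestamped "zzz ***" layout looks like the one written by an archiver such as Oracle OSWatcher. The sketch below only assumes that layout, not any particular tool. The function name and the default thresholds are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print vmstat samples that look overloaded in an archive laid out like
# the log above: a "zzz ***<timestamp>" marker, two header lines, then data rows
# with columns r b swpd free buff cache si so bi bo in cs us sy id wa st.
# The default thresholds (idle below 20, more than 10 blocked processes) are
# hypothetical; tune them for your own hosts.
flag_vmstat_samples() {
  local min_idle=${1:-20}     # flag rows whose CPU idle % (id) is below this
  local max_blocked=${2:-10}  # flag rows with more blocked processes (b) than this
  awk -v min_idle="$min_idle" -v max_blocked="$max_blocked" '
    /^zzz/ {                       # new sample: remember its timestamp
      ts = $0
      sub(/^zzz \*\*\*/, "", ts)
      row = 0
      next
    }
    $1 ~ /^[0-9]+$/ {              # data rows start with the numeric r column
      row++
      if (row == 1) next           # first row is the since-boot average, skip it
      if ($15 + 0 < min_idle + 0 || $2 + 0 > max_blocked + 0)
        printf "%s r=%s b=%s free=%s id=%s wa=%s\n", ts, $1, $2, $4, $15, $16
    }
  '
}
```

Usage: `flag_vmstat_samples 20 10 < vmstat_archive.dat` (hypothetical file name). Run against the samples above with the default thresholds, it should print nothing, which matches the conclusion that there were no spikes.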

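4. Suspected that a key system process had not started, so listed the systemd processes and compared them with a healthy system. This server was missing one process, systemd-journald (line shown for reference):

root      9830     1  0 10:48 ?        00:00:01 /usr/lib/systemd/systemd-journald

Process list on the affected server: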
[root@host01 ~]$ ps -ef|grep -i systemd
root         1     0  0 10:48 ?        00:00:04 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
root      9859     1  0 10:48 ?        00:00:02 /usr/lib/systemd/systemd-udevd
dbus     22465     1  0 10:48 ?        00:00:02 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root     22481     1  0 10:48 ?        00:00:00 /usr/lib/systemd/systemd-logind
Monitor+ 42713 42233  0 12:01 pts/0    00:00:00 grep --color=auto -i systemd
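The comparison against a healthy system can be scripted instead of eyeballed. Below is a minimal sketch, assuming you first collect the expected daemon names from a known-good host. The function name and the example list are hypothetical. One detail: `ps -eo comm=` truncates names to 15 characters, so systemd-journald appears as "systemd-journal", the same name seen in its log lines in the next step.

```shell
#!/usr/bin/env bash
# Sketch: print every expected daemon name that is missing from a process list.
# stdin: one command name per line, e.g. from `ps -eo comm=`.
# args:  the expected names, ideally collected from a healthy host.
# Note: comm is truncated to 15 characters, so systemd-journald shows up as
# "systemd-journal".
missing_daemons() {
  local running name
  running=$(cat)
  for name in "$@"; do
    # -x: whole-line match, -F: fixed string, so "systemd" does not match
    # "systemd-udevd".
    if ! grep -qxF -- "$name" <<<"$running"; then
      printf '%s\n' "$name"
    fi
  done
}
```

Usage: `ps -eo comm= | missing_daemons systemd systemd-journal systemd-udevd systemd-logind dbus-daemon`. On a host in the state shown above, this should print `systemd-journal`.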

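5. Checked the status of the service and found it was masked, i.e. locked so that it cannot be started. The unit-file state in the list of unit files was also masked.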

[root@host01 ~]# systemctl status systemd-journald
● systemd-journald.service
   Loaded: masked (/dev/null; bad)
   Active: inactive (dead) since Thu 2021-05-13 09:04:48 CST; 46min ago
 Main PID: 452 (code=exited, status=0/SUCCESS)
   Status: "Processing requests..."

May 13 09:04:33 localhost.localdomain systemd-journal[452]: Runtime journal is using 8.0M (ma…G).
May 13 09:04:33 localhost.localdomain systemd-journal[452]: Journal started
May 13 09:04:48 localhost.localdomain systemd-journal[452]: Journal stopped
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
Warning: systemd-journald.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Hint: Some lines were ellipsized, use -l to show in full.

[root@host01 ~]# systemctl list-unit-files|grep -i journald
systemd-journald.service                      masked  
systemd-journald.socket                       static  
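A masked unit is a symlink to /dev/null under /etc/systemd/system, as the unmask output in step 6 shows for systemd-journald.service. The sketch below offers two hypothetical helper functions for spotting such units: one filters captured `systemctl list-unit-files` output, the other checks a unit path directly. For a single unit, `systemctl is-enabled <unit>` also reports "masked".

```shell
#!/usr/bin/env bash
# Sketch: two ways to spot masked units (function names are hypothetical).

# 1) Read `systemctl list-unit-files` output on stdin and print the names
#    whose state column is "masked".
list_masked_units() {
  awk '$2 == "masked" { print $1 }'
}

# 2) Succeed if the given unit path is a symlink pointing at /dev/null,
#    which is how systemctl mask locks a unit.
is_masked_link() {
  local path=$1
  [ -L "$path" ] && [ "$(readlink "$path")" = /dev/null ]
}
```

Usage: `systemctl list-unit-files | list_masked_units`, or `is_masked_link /etc/systemd/system/systemd-journald.service && echo masked`.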

6. Unmasked the service with systemctl unmask, then started it with systemctl restart.

[root@host01 083729c7194e4009b815519a35942f9b]# systemctl unmask systemd-journald 
Removed symlink /etc/systemd/system/systemd-journald.service.

[root@host01 083729c7194e4009b815519a35942f9b]# systemctl status systemd-journald -l
 systemd-journald.service - Journal Service
   Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static; vendor preset: disabled)
   Active: inactive (dead) since Thu 2021-05-13 09:04:48 CST; 1h 25min ago
     Docs: man:systemd-journald.service(8)
           man:journald.conf(5)
 Main PID: 452 (code=exited, status=0/SUCCESS)
   Status: "Processing requests..."

May 13 09:04:33 localhost.localdomain systemd-journal[452]: Runtime journal is using 8.0M (max allowed 4.0G, trying to leave 4.0G free of 125.7G available  current limit 4.0G).
May 13 09:04:33 localhost.localdomain systemd-journal[452]: Journal started
May 13 09:04:48 localhost.localdomain systemd-journal[452]: Journal stopped
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

[root@host01 083729c7194e4009b815519a35942f9b]# systemctl restart systemd-journald       

[root@host01 083729c7194e4009b815519a35942f9b]# systemctl status systemd-journald 
 systemd-journald.service - Journal Service
   Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static; vendor preset: disabled)
   Active: active (running) since Thu 2021-05-13 10:30:33 CST; 3s ago
     Docs: man:systemd-journald.service(8)
           man:journald.conf(5)
 Main PID: 56691 (systemd-journal)
   Status: "Processing requests..."
   CGroup: /system.slice/systemd-journald.service
           └─56691 /usr/lib/systemd/systemd-journald

May 13 10:30:33 host01 systemd-journal[56691]: Runtime journal is using 8.0M (max allowed 4.0G, trying to leave 4.0G free of 125.7G available  curr…imit 4.0G).
May 13 10:30:33 host01 systemd-journal[56691]: Journal started
May 13 09:04:48 host01 systemd[1]: Current command vanished from the unit file, execution of the command list won't be resumed.
May 13 09:04:48 host01 systemd[1]: Cannot add dependency job for unit systemd-journald.service, ignoring: Unit is masked.
May 13 09:04:48 host01 systemd[1]: Stopped systemd-journald.service.
Hint: Some lines were ellipsized, use -l to show in full.
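The fix in step 6 can be wrapped in one function that fails loudly if the unit does not end up running. This is a sketch, not the exact commands used above: it adds an explicit `daemon-reload`, as the "changed on disk" warning in step 5 suggests. The `SYSTEMCTL` variable is a hypothetical convenience for dry runs with a stub, not a systemd feature.

```shell
#!/usr/bin/env bash
# Sketch of the step 6 fix. SYSTEMCTL may point at a stub for a dry run
# (hypothetical convenience); by default the real systemctl is used.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

restore_masked_unit() {
  local unit=$1
  # Remove the /dev/null symlink that masks the unit.
  "$SYSTEMCTL" unmask "$unit" || return 1
  # Reload unit files so systemd sees the real unit definition again.
  "$SYSTEMCTL" daemon-reload || return 1
  # systemd-journald is a static unit (no [Install] section), so it is
  # started rather than enabled.
  "$SYSTEMCTL" restart "$unit" || return 1
  # Succeed only if the unit is actually running now.
  "$SYSTEMCTL" is-active --quiet "$unit"
}
```

Usage (as root): `restore_masked_unit systemd-journald`.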

About the systemd-journald service:

http://www.jinbuguo.com/systemd/systemd-journald.service.html
