Troubleshooting Approach for Linux System Anomalies
16 Troubleshooting approach for system anomalies
16.1 View user information
16.1.1 View currently logged-in users
# w
04:39:39 up 1:30, 1 user, load average: 0.01, 0.01, 0.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
root pts/0 192.168.215.1 04:27 0.00s 0.16s 0.02s w
16.1.2 View recently logged-in users
# last
***************
root pts/2 hadoop2 Sun Oct 16 15:52 - 15:52 (00:00)
root pts/1 192.168.215.1 Sun Oct 16 15:39 - down (00:23)
hadoop pts/0 :0.0 Sun Oct 16 00:33 - down (15:30)
hadoop tty1 :0 Sun Oct 16 00:31 - down (15:31)
reboot system boot 2.6.32-573.el6.x Sun Oct 16 08:16 - 16:03 (07:47)
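When the `last` output is long, a per-user login count can make an unexpected account stand out faster. The following is a minimal sketch; the function name `logins_per_user` is a hypothetical helper, and it assumes the `last` layout shown above, where the user name is the first column.

```shell
# Count logins per user from `last`-style output on stdin.
# Skips the "reboot" pseudo-user, the trailing "wtmp begins ..." line, and blank lines.
logins_per_user() {
  awk 'NF && $1 != "reboot" && $1 != "wtmp" { count[$1]++ }
       END { for (u in count) print count[u], u }' |
    sort -k1,1nr -k2,2
}

# Demo on the sample output shown above.
logins_per_user <<'EOF'
root pts/2 hadoop2 Sun Oct 16 15:52 - 15:52 (00:00)
root pts/1 192.168.215.1 Sun Oct 16 15:39 - down (00:23)
hadoop pts/0 :0.0 Sun Oct 16 00:33 - down (15:30)
hadoop tty1 :0 Sun Oct 16 00:31 - down (15:31)
reboot system boot 2.6.32-573.el6.x Sun Oct 16 08:16 - 16:03 (07:47)
EOF
```

On a live system the same function can be fed directly: `last | logins_per_user`.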
16.2 View previously executed commands
# history
***************
683 last
684 clear
685 last
686 clear
687 history
16.3 View running processes
# pstree -a
init
├─NetworkManager --pid-file=/var/run/NetworkManager/NetworkManager.pid
├─abrtd
├─acpid
├─atd
├─auditd
│ └─{auditd}
├─bonobo-activati --ac-activate --ior-output-fd=12
*******************
# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 19352 1544 ? Ss 03:09 0:02 /sbin/init
root 2 0.0 0.0 0 0 ? S 03:09 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S 03:09 0:00 [migration/0]
root 4 0.0 0.0 0 0 ? S 03:09 0:00 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S 03:09 0:00 [stopper/0]
16.4 View network service processes
16.4.1 View listening TCP ports
# netstat -ntl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:6010 0.0.0.0:* LISTEN
tcp 0 0 :::2181 :::* LISTEN
tcp 0 0 :::37129 :::* LISTEN
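To compare the listening ports against what the server is supposed to expose, it can help to reduce the output to a plain list of port numbers. This is a sketch under the assumption that the input has the `netstat -ntl` column layout shown above (state in the sixth column); `listening_ports` is a hypothetical helper name.

```shell
# Extract the unique listening TCP port numbers from `netstat -ntl` output on stdin.
# The port is whatever follows the last ":" of the Local Address column,
# which also handles IPv6-style addresses such as ":::2181".
listening_ports() {
  awk '$1 ~ /^tcp/ && $6 == "LISTEN" { n = split($4, part, ":"); print part[n] }' |
    sort -n | uniq
}

# Demo on the sample output shown above.
listening_ports <<'EOF'
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:6010 0.0.0.0:* LISTEN
tcp 0 0 :::2181 :::* LISTEN
tcp 0 0 :::37129 :::* LISTEN
EOF
```

On a live system: `netstat -ntl | listening_ports`.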
16.4.2 View listening UDP ports
# netstat -nulp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
udp 0 0 0.0.0.0:631 0.0.0.0:* 2089/cupsd
16.4.3 View listening UNIX domain sockets
# netstat -nxlp
Active UNIX domain sockets (only servers)
Proto RefCnt Flags Type State I-Node PID/Program name Path
unix 2 [ ACC ] STREAM LISTENING 13954 2136/hald @/var/run/hald/dbus-WAkpL6y5o7
unix 2 [ ACC ] STREAM LISTENING 16245 2614/gnome-session @/tmp/.ICE-unix/2614
unix 2 [ ACC ] STREAM LISTENING 15966 2524/Xorg @/tmp/.X11-unix/X0
unix 2 [ ACC ] STREAM LISTENING 13947 2136/hald @/var/run/hald/dbus-QUMwKtSaJ5
unix 2 [ ACC ] STREAM LISTENING 13818 2089/cupsd /var/run/cups/cups.sock
*********************
16.5 View CPU and memory
16.5.1 View free memory and swap usage
# free -m
total used free shared buffers cached
Mem: 1862 475 1386 1 27 202
-/+ buffers/cache: 245 1616
Swap: 2047 0 2047
# free -g
total used free shared buff/cache available
Mem: 15 7 1 0 6 6
Swap: 1 0 1
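The two `free` layouts above answer "how much memory can applications still get" differently: the older one needs free + buffers + cached, while the newer one prints an `available` column. The sketch below handles both, assuming English (`LANG=C`) headers; `usable_memory` is a hypothetical helper name. For the first sample it yields 1615 rather than the 1616 on the `-/+ buffers/cache` line, presumably because `free` rounds each column to whole megabytes independently.

```shell
# Print the memory still usable by applications from `free` output on stdin.
# New layout (header contains "available"): use that column directly.
# Old layout: approximate it as free + buffers + cached.
usable_memory() {
  awk 'NR == 1 { has_available = ($0 ~ /available/) }
       $1 == "Mem:" { print (has_available ? $7 : $4 + $6 + $7) }'
}

# Demo on the older-layout sample shown above.
usable_memory <<'EOF'
total used free shared buffers cached
Mem: 1862 475 1386 1 27 202
-/+ buffers/cache: 245 1616
Swap: 2047 0 2047
EOF
```

On a live system: `LANG=C free -m | usable_memory`.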
16.6 View uptime and load
# uptime
04:59:59 up 1:50, 1 user, load average: 0.00, 0.00, 0.00
Current time: 04:59:59
Time since boot: 1:50
Users currently logged in: 1 user
Load average: 0.00, 0.00, 0.00 (system load over the last 1, 5, and 15 minutes)
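A load average is easier to judge relative to the number of CPUs: a value near or above 1.0 per CPU suggests work is queuing. The following is a small sketch of that normalization; `load_per_cpu` is a hypothetical helper, and the live-usage line assumes a Linux system with `/proc/loadavg` and `getconf _NPROCESSORS_ONLN`.

```shell
# Divide a load average by the number of CPUs and print it with two decimals.
# Prints nothing if the CPU count is not positive.
load_per_cpu() {
  awk -v load="$1" -v cpus="$2" 'BEGIN { if (cpus > 0) printf "%.2f\n", load / cpus }'
}

# Live usage on Linux:
#   load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(getconf _NPROCESSORS_ONLN)"

# Demo: a 1-minute load of 2.00 on a 4-CPU machine.
load_per_cpu 2.00 4
```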
16.7 Dynamically view memory, CPU, and other runtime information
# top
top - 12:26:46 up 16:21, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 82 total, 1 running, 81 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.7%id, 0.1%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 1895288k total, 665188k used, 1230100k free, 20628k buffers
Swap: 2097144k total, 0k used, 2097144k free, 80392k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2269 root 20 0 15056 1080 832 R 2.0 0.1 0:00.01 top
1 root 20 0 19356 1536 1228 S 0.0 0.1 0:01.81 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:01.13 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
6 root RT 0 0 0 0 S 0.0 0.0 0:00.14 watchdog/0
7 root 20 0 0 0 0 S 0.0 0.0 0:41.30 events/0
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cgroup
9 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khelper
***********************
16.8 Hardware information
16.8.1 List all PCI buses and the devices connected to them
# lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 01)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 01)
00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 08)
00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
16.8.2 View hardware information
# ethtool eth0    # NIC link settings
# dmidecode
*******************
Handle 0x0229, DMI type 33, 31 bytes
64-bit Memory Error Information
Type: OK
Granularity: Unknown
Operation: Unknown
Vendor Syndrome: Unknown
Memory Array Address: Unknown
Device Address: Unknown
Resolution: Unknown
Handle 0x022A, DMI type 126, 4 bytes
Inactive
Handle 0x022B, DMI type 127, 4 bytes
End Of Table
16.9 I/O performance
16.9.1 View disk I/O statistics
# iostat
Linux 2.6.32-573.el6.x86_64 (hadoop1) 10/21/2016 _x86_64_(1 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.17 0.00 0.56 2.15 0.00 97.11
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.49 75.27 10.68 645224 91568
16.9.2 Dynamically view server status values
# vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 1322196 30688 298892 0 0 37 5 39 57 0 1 97 2 0
0 0 0 1322140 30688 298920 0 0 0 0 57 84 1 1 99 0 0
*********************
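With many `vmstat` samples it is easier to look at an average than at individual rows. The sketch below averages the `wa` (I/O wait) column, assuming the 17-column layout shown above where `wa` is the 16th field; `avg_iowait` is a hypothetical helper name. Note that the first `vmstat` row reports averages since boot, so it can skew a short run.

```shell
# Average the "wa" (I/O wait %) column of `vmstat` output read from stdin.
# Only rows that start with a number are counted, so header lines are ignored.
avg_iowait() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 16 { sum += $16; n++ }
       END { if (n) printf "%.2f\n", sum / n }'
}

# Demo on the sample output shown above.
avg_iowait <<'EOF'
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 1322196 30688 298892 0 0 37 5 39 57 0 1 97 2 0
0 0 0 1322140 30688 298920 0 0 0 0 57 84 1 1 99 0 0
EOF
```

On a live system: `vmstat 2 10 | avg_iowait`.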
16.9.3 Monitor CPU usage in real time
# mpstat 2 10
Linux 2.6.32-573.el6.x86_64 (hadoop1) 10/21/2016 _x86_64_(1 CPU)
05:37:26 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
05:37:28 AM all 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:37:30 AM all 0.00 0.00 0.50 0.00 0.00 0.00 0.00 0.00 99.50
05:37:32 AM all 0.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 99.50
*********************
16.9.4 Dynamically show the processes currently performing I/O
# yum -y install dstat
# dstat --top-io --top-bio
----most-expensive---- ----most-expensive----
i/o process | block i/o process
bash 53k 316B|init 19k 198B
sshd: root@ 301B 340B|tpvmlpd2 0 4096B
sshd: root@ 136B 180B|jbd2/sda2-8 0 56k
16.10 File systems and attached disks
16.10.1 View currently mounted devices
# mount
/dev/sda2 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/sda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
vmware-vmblock on /var/run/vmblock-fuse type fuse.vmware-vmblock (rw,nosuid,nodev,default_permissions,allow_other)
16.10.2 Check for dedicated file systems
View the following file:
# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Sun Oct 16 07:55:57 2016
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=b89c0aae-3284-4835-9b1b-04986146cd96 / ext4 defaults 1 1
UUID=a1313d92-6873-402d-95a6-add6cd1321c6 /boot ext4 defaults 1 2
UUID=6a5cde98-2fc5-4d8f-976c-92acb39ab2a9 swap swap defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
16.10.3 View LVM volume group information
# vgs
16.10.4 View physical volume information
# pvs
16.11 View remaining disk space
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 18G 6.2G 11G 38% /
tmpfs 932M 72K 932M 1% /dev/shm
/dev/sda1 283M 41M 228M 16% /boot
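On a server with many mounts, filtering the `df` output by usage percentage points straight at the file systems that are close to full. This is a minimal sketch, assuming the six-column layout shown above; `filesystems_over` and the threshold value are illustrative choices, not part of the original checklist.

```shell
# Print "mountpoint usage" for every file system at or above a usage threshold (percent).
# Reads `df` output on stdin; the header line is skipped.
filesystems_over() {
  awk -v limit="$1" 'NR > 1 { pct = $5; sub(/%/, "", pct)
                              if (pct + 0 >= limit + 0) print $6, $5 }'
}

# Demo on the sample output shown above, with a 30% threshold.
filesystems_over 30 <<'EOF'
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 18G 6.2G 11G 38% /
tmpfs 932M 72K 932M 1% /dev/shm
/dev/sda1 283M 41M 228M 16% /boot
EOF
```

On a live system, `df -P | filesystems_over 80` is a reasonable starting point; `-P` keeps each file system on a single line even when the device name is long.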
16.12 List the files currently open on the system
# lsof +D /    # beware not to kill your box
***************
lsof 3907 root mem REG 8,2 22536 265965 /lib64/libdl-2.12.so
lsof 3907 root mem REG 8,2 1926480 265960 /lib64/libc-2.12.so
lsof 3907 root mem REG 8,2 124624 265966 /lib64/libselinux.so.1
lsof 3907 root mem REG 8,2 99158576 394281 /usr/lib/locale/locale-archive
16.12 Kernel and network
16.12.1 Show the kernel parameters under /proc/sys
# sysctl -a
**************
net.ipv6.nf_conntrack_frag6_high_thresh = 4194304
net.ipv6.ip6frag_secret_interval = 600
net.ipv6.mld_max_msf = 64
net.nf_conntrack_max = 65536
net.unix.max_dgram_qlen = 10
abi.vsyscall32 = 1
crypto.fips_enabled = 0
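16.12.2 Show per-device interrupt details
Columns: IRQ number, number of interrupts handled on each CPU, programmable interrupt controller, and device name (the dev_name field of request_irq).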
# cat /proc/interrupts
CPU0
0: 261 IO-APIC-edge timer
1: 8 IO-APIC-edge i8042
4: 4838 IO-APIC-edge
8: 1 IO-APIC-edge rtc0
9: 0 IO-APIC-fasteoi acpi
View the connection tracking table
# cat /proc/net/ip_conntrack    # may take some time on busy servers
**************
16.13 View network socket connections
# netstat
************
unix 3 [ ] STREAM CONNECTED 13648
unix 3 [ ] STREAM CONNECTED 13647
unix 3 [ ] DGRAM 10073
unix 3 [ ] DGRAM 10072
16.14 Get socket statistics
# ss -s
Total: 602 (kernel 610)
TCP: 15 (estab 4, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 8
Transport Total IP IPv6
* 610 - -
RAW 0 0 0
UDP 1 1 0
TCP 15 5 10
INET 16 6 10
FRAG 0 0 0
16.15 View log messages and kernel information
16.15.1 Show the Linux kernel ring buffer
# dmesg | less    # or pipe to tail, grep, or more
*************
eth0: no IPv6 routers present
lp: driver loaded but no devices found
ppdev: user-space parallel port driver
hrtimer: interrupt took 2588670 ns
16.15.2 View the system log
# less /var/log/messages
Oct 16 08:16:22 localhost kernel: imklog 5.8.10, log source = /proc/kmsg started.
Oct 16 08:16:22 localhost rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1604" x-info="http://www.rsyslog.com"] start
Oct 16 08:16:22 localhost kernel: Initializing cgroup subsys cpuset
Oct 16 08:16:22 localhost kernel: Initializing cgroup subsys cpu
*************
16.15.3 Security, login, and network connection information
# less /var/log/secure
Oct 16 08:17:06 localhost sshd[8287]: Server listening on 0.0.0.0 port 22.
Oct 16 08:17:06 localhost sshd[8287]: Server listening on :: port 22.
Oct 16 00:22:58 localhost polkitd(authority=local): Registered Authentication Agent for session /org/freedesktop/ConsoleKit/Session1 (system bus name :1.25 [/usr/libexec/polkit-gnome-authentication-agent-1], object path /org/gnome/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8)
********************
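One common thing to look for in `/var/log/secure` is repeated failed SSH logins from the same address. The sketch below counts them per source IP. It assumes the usual sshd message form `Failed password for <user> from <ip> port <n> ssh2`; the sample lines in the demo are illustrative and are not taken from the log shown above, and `failed_logins_by_ip` is a hypothetical helper name.

```shell
# Count "Failed password" sshd messages per source address, most frequent first.
# Reads log lines on stdin and prints "count ip".
failed_logins_by_ip() {
  awk '/Failed password/ {
         for (i = 1; i < NF; i++) if ($i == "from") { print $(i + 1); break }
       }' |
    sort | uniq -c | awk '{ print $1, $2 }' | sort -k1,1nr -k2,2
}

# Demo on illustrative log lines (not from the original log).
failed_logins_by_ip <<'EOF'
Oct 16 09:01:10 localhost sshd[9001]: Failed password for root from 10.0.0.5 port 40122 ssh2
Oct 16 09:01:14 localhost sshd[9001]: Failed password for root from 10.0.0.5 port 40122 ssh2
Oct 16 09:02:30 localhost sshd[9010]: Failed password for invalid user test from 10.0.0.9 port 51514 ssh2
Oct 16 09:03:00 localhost sshd[9020]: Accepted password for root from 192.168.215.1 port 50000 ssh2
EOF
```

On a live system, run as root: `failed_logins_by_ip < /var/log/secure`.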
16.16 View scheduled tasks
16.16.1 View how often scheduled tasks run
# ls /etc/cron*    # then cat the files of interest
/etc/cron.daily:
cups logrotate makewhatis.cron mlocate.cron prelink readahead.cron tmpwatch
/etc/cron.hourly:
0anacron
/etc/cron.monthly:
readahead-monthly.cron
/etc/cron.weekly:
16.16.2 Check whether any user has hidden scheduled commands
# for user in $(cut -f1 -d: /etc/passwd); do crontab -l -u "$user"; done
no crontab for root
no crontab for bin
no crontab for daemon
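On a system with many accounts, the "no crontab for ..." lines drown out the few users that do have jobs. The variant below is a sketch that prints only users with a non-empty crontab. The helper names `passwd_users` and `list_user_crontabs` are hypothetical, and it assumes a `crontab` implementation that supports `-l -u`, as used in the loop above.

```shell
# Print the login names (first field) of a passwd-format file; defaults to /etc/passwd.
passwd_users() {
  cut -d: -f1 "${1:-/etc/passwd}"
}

# Print the crontab of every user that has one, silently skipping the rest.
# Needs root to read other users' crontabs.
list_user_crontabs() {
  passwd_users "$@" | while IFS= read -r user; do
    if entries=$(crontab -l -u "$user" 2>/dev/null) && [ -n "$entries" ]; then
      printf '### %s\n%s\n' "$user" "$entries"
    fi
  done
}
```

Run `list_user_crontabs` as root; an empty result means no user has a personal crontab, which matches the output shown above.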