Zabbix troubleshooting: resuming Zabbix agent checks on host "192.168.200.38": connection restored
I. Check the Zabbix server log
The Zabbix server log contains a large number of "first network error" and "another network error" messages for this host:
12522:20200915:003129.375 resuming Zabbix agent checks on host "192.168.200.38": connection restored
12511:20200915:003211.039 Zabbix agent item "agent.hostname" on host "192.168.200.38" failed: first network error, wait for 15 seconds
12513:20200915:003226.526 Zabbix agent item "system.cpu.util[,system,avg15]" on host "192.168.200.38" failed: another network error, wait for 15 seconds
12526:20200915:003237.372 Zabbix agent item "vm.memory.size[total]" on host "192.168.200.38" failed: another network error, wait for 15 seconds
12526:20200915:003256.377 resuming Zabbix agent checks on host "192.168.200.38": connection restored
12511:20200915:003330.181 Zabbix agent item "system.swap.size[,free]" on host "192.168.200.38" failed: first network error, wait for 15 seconds
12526:20200915:003350.383 resuming Zabbix agent checks on host "192.168.200.38": connection restored
12515:20200915:003426.192 Zabbix agent item "system.cpu.util[,system,avg15]" on host "192.168.200.38" failed: first network error, wait for 15 seconds
12524:20200915:003441.390 resuming Zabbix agent checks on host "192.168.200.38": connection restored
12515:20200915:003520.333 Zabbix agent item "perf_counter[\2\16]" on host "192.168.200.38" failed: first network error, wait for 15 seconds
12520:20200915:003537.209 Zabbix agent item "vm.memory.size[total]" on host "192.168.200.38" failed: another network error, wait for 15 seconds
12526:20200915:003559.396 resuming Zabbix agent checks on host "192.168.200.38": connection restored
12513:20200915:003630.007 Zabbix agent item "system.swap.size[,free]" on host "192.168.200.38" failed: first network error, wait for 15 seconds
12526:20200915:003650.401 resuming Zabbix agent checks on host "192.168.200.38": connection restored
12514:20200915:003731.023 Zabbix agent item "system.swap.size[,pfree]" on host "192.168.200.38" failed: first network error, wait for 15 seconds
12524:20200915:003750.409 resuming Zabbix agent checks on host "192.168.200.38": connection restored
12510:20200915:003759.024 Zabbix agent item "net.if.out[Nutanix VirtIO Ethernet Adapter,bytes]" on host "192.168.200.38" failed: first network error, wait for 15 seconds
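To see how often the checks flap, the log lines above can be tallied by message type. The following is a minimal sketch, assuming the line format seen in the excerpt (`PID:YYYYMMDD:HHMMSS.mmm message`). The function name `summarize_agent_log` and both regular expressions are illustrative and not part of Zabbix.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpt above:
#   PID:YYYYMMDD:HHMMSS.mmm <message>
LINE_RE = re.compile(r'^(\d+):(\d{8}):(\d{6})\.(\d{3}) (.*)$')
HOST_RE = re.compile(r'on host "([^"]+)"')


def summarize_agent_log(lines):
    """Count first/another network errors and restores per host."""
    summary = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a log line in the assumed format
        message = m.group(5)
        host = HOST_RE.search(message)
        if not host:
            continue  # message does not mention a host
        counts = summary.setdefault(host.group(1), Counter())
        if "first network error" in message:
            counts["first_error"] += 1
        elif "another network error" in message:
            counts["another_error"] += 1
        elif "connection restored" in message:
            counts["restored"] += 1
    return summary


# Three lines copied from the log excerpt above.
sample = [
    '12522:20200915:003129.375 resuming Zabbix agent checks on host "192.168.200.38": connection restored',
    '12511:20200915:003211.039 Zabbix agent item "agent.hostname" on host "192.168.200.38" failed: first network error, wait for 15 seconds',
    '12513:20200915:003226.526 Zabbix agent item "system.cpu.util[,system,avg15]" on host "192.168.200.38" failed: another network error, wait for 15 seconds',
]
result = summarize_agent_log(sample)
print(dict(result["192.168.200.38"]))
```

A host whose "restored" count is close to its "first_error" count over a short window is repeatedly losing and regaining its connection, which is the pattern shown in this log.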
II. Check the host's TCP connections
The host has a large number of connections in the TIME_WAIT state.
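One way to quantify this is to count connections per state in the output of `netstat -an`. The sketch below assumes the four-column Windows layout (Proto, Local Address, Foreign Address, State). The sample text is made up for illustration, including the peer address 192.168.200.10, and `count_tcp_states` is a hypothetical helper name.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state in `netstat -an` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto, Local Address, Foreign Address, State.
        # Header and UDP lines do not match this shape and are skipped.
        if len(fields) == 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states


# Made-up sample in the Windows `netstat -an` layout, for illustration only.
sample_output = """
  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:10050          0.0.0.0:0              LISTENING
  TCP    192.168.200.38:10050   192.168.200.10:51514   TIME_WAIT
  TCP    192.168.200.38:10050   192.168.200.10:51620   TIME_WAIT
  TCP    192.168.200.38:49155   192.168.200.10:445     ESTABLISHED
  UDP    0.0.0.0:123            *:*
"""
states = count_tcp_states(sample_output)
print(states["TIME_WAIT"])
```

On the affected host, the same function could be fed real output, for example from `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`.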
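III. Finding the cause (via a Baidu search)
All the TCP/IP ports that are in a TIME_WAIT status are not closed after 497 days from system startup in Windows Vista, in Windows 7, in Windows Server 2008 and in Windows Server 2008 R2
In other words, once the system has been up for 497 days, TCP connections in the TIME_WAIT state are never closed. The TCP ports are gradually used up until no new TCP/IP connection can be created.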
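To judge whether a given host is affected, its uptime can be compared against the 497-day mark. This is a rough sketch under stated assumptions: the boot time below is hypothetical, and `uptime_status` is an illustrative helper, not part of any tool mentioned here.

```python
from datetime import datetime, timedelta

# Threshold quoted in the Microsoft article title above.
TIME_WAIT_BUG_UPTIME = timedelta(days=497)


def uptime_status(boot_time, now):
    """Return (uptime, remaining) relative to the 497-day mark.

    `remaining` is negative once the host has been up longer than 497 days.
    """
    uptime = now - boot_time
    return uptime, TIME_WAIT_BUG_UPTIME - uptime


# Hypothetical boot time, chosen only for illustration.
boot = datetime(2019, 5, 1, 8, 0, 0)
# Timestamp of the first line in the Zabbix server log excerpt.
now = datetime(2020, 9, 15, 0, 31, 29)
uptime, remaining = uptime_status(boot, now)
print(uptime.days, remaining < timedelta(0))
```

The same comparison can be run ahead of time to plan a reboot before `remaining` reaches zero.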
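IV. Solutions
1. Restart the server
Restarting the server fixes the problem temporarily, but it will come back once the server has been running for another 497 days.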
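2. Install the patch
Microsoft's announcement: https://support.microsoft.com/zh-cn/help/2553549/all-the-tcp-ip-ports-that-are-in-a-time-wait-status-are-not-closed-aft
Because Microsoft has stopped updating these systems, the patch package can no longer be downloaded separately. Use Windows Update to install the patch instead.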
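After updating, one possible check is to look for the KB number from the article URL (2553549) in a list of installed hotfixes, such as the text produced by `wmic qfe get HotFixID`. This is only a sketch: the listing below is made up, the helper names are hypothetical, and a missing entry is not conclusive, since the fix may also arrive inside a later update listed under a different KB number.

```python
import re


def installed_hotfix_ids(listing):
    """Extract KB identifiers (e.g. 'KB2553549') from a hotfix listing."""
    found = re.findall(r'\bKB\d+\b', listing, flags=re.IGNORECASE)
    return {kb.upper() for kb in found}


def has_hotfix(listing, kb_id="KB2553549"):
    """Return True if `kb_id` appears in the listing (case-insensitive)."""
    return kb_id.upper() in installed_hotfix_ids(listing)


# Made-up listing in the shape of `wmic qfe get HotFixID` output.
sample_listing = """
HotFixID
KB2534111
KB2553549
KB976902
"""
print(has_hotfix(sample_listing))
```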