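Clearing the semaphore queue caused zabbix to shut itself down
A few days ago I deployed a zabbix proxy and zabbix agentd on overseas UCloud machines. Early the next morning I got an email saying zabbix_proxy had died. Logging in, I found that on one of the two machines both the proxy and agentd were down, while the other machine was fine. The logs showed: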
zabbix_agentd [12977]: [file:'cpustat.c',line:235] lock failed: [22] Invalid argument
12976:20150305:022001.966 One child process died (PID:12977,exitcode/signal:255). Exiting ...
12976:20150305:022003.967 Zabbix Agent stopped. Zabbix 2.0.13 (revision 48919).

zabbix_proxy [12970]: [file:'selfmon.c',line:341] lock failed: [22] Invalid argument
zabbix_proxy [12972]: [file:'selfmon.c',line:341] lock failed: [22] Invalid argument
zabbix_proxy [12973]: [file:'selfmon.c',line:341] lock failed: [22] Invalid argument
12951:20150305:022001.362 One child process died (PID:12970,exitcode/signal:255). Exiting ...
12951:20150305:022003.365 syncing history data...
zabbix_proxy [12951]: [file:'dbcache.c',line:2196] lock failed: [22] Invalid argument
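My first guess was that crontab had run some script that deleted something, and sure enough, the cause was a cron script that removed the semaphores zabbix was using (for an introduction to semaphores, see the expert blog post "ipcs introduction"). The cleanup script looked like this: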
#!/bin/sh
for semid in `ipcs -s | cut -f2 -d" "`
do
    ipcrm -s $semid
done
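Removing every semaphore this crudely was bound to cause trouble. Add a filter so that the semaphores owned by zabbix are skipped: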
#!/bin/sh
for semid in `ipcs -s | grep -v zabbix | cut -f2 -d" "`
do
    ipcrm -s $semid
done
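The grep -v zabbix filter drops any line that contains the string zabbix anywhere, and the cut call relies on the key and semid columns being separated by exactly one space. A slightly stricter variant is sketched below. It assumes the Linux (util-linux) ipcs -s column layout of key, semid, owner, perms, nsems. The function name list_semids_except_owner is hypothetical, and the loop only prints what it would remove, so it can be checked before ipcrm is swapped in.

```shell
#!/bin/sh
# Sketch only: assumes `ipcs -s` prints the columns key, semid, owner, perms, nsems.
# list_semids_except_owner is a hypothetical helper name, not part of any tool.

# Read `ipcs -s` style output on stdin and print the semid of every
# semaphore array whose owner column is NOT the user given as $1.
# Header and blank lines are skipped because their second field is not numeric.
list_semids_except_owner() {
    awk -v skip="$1" '$2 ~ /^[0-9]+$/ && $3 != skip { print $2 }'
}

# Dry run: list the semaphore arrays that would be removed.
ipcs -s | list_semids_except_owner zabbix | while read -r semid
do
    echo "would remove semaphore array $semid"   # replace echo with: ipcrm -s "$semid"
done
```

Matching on the owner column instead of the whole line means a semaphore is skipped only when zabbix actually owns it, and the numeric check keeps header text from being passed to ipcrm.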
I ran the script again, and there were no more problems ^_^