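Monitoring with monit and god

Two process-monitoring tools for automated operations, noted here for future reference. A quick record first: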

 

god

Written in Ruby; reportedly used at Xiaomi.

http://godrb.com/

 

monit

Written in C; currently being trialed on the IDOL system.

https://mmonit.com/monit/documentation/monit.html

 

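Below are the monit configuration file and the exec script it calls:

The configuration file goes in the /monit/etc/monit.d directory and is named <service process name>.monit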

check process content-hist-b with pidfile /app/cfg/content-hist-b/content.pid
    start program = "/app/etc/init.d/content-hist-b start"
        as uid idol and gid users
    stop program = "/app/etc/init.d/content-hist-b stop"
        as uid idol and gid users
    if failed host 127.0.0.1 port 9500 then restart
    if 3 restarts within 3 cycles then exec "/usr/local/monit/bin/restart.sh"
        as uid idol and gid users

ver1:

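#!/bin/bash
# Config directory, init script, and process-name pattern of the monitored service
Condir=/app/cfg/content-hist-b
Startfile=/app/etc/init.d/content-hist-b
Pid=content-hist-b

# Drop the page cache, dentries and inodes; sudo -S reads the password from stdin
echo "D&Gby1900d129" | sudo -S /usr/sbin/sysctl -w vm.drop_caches=3

# PIDs of every process whose command line matches the service name
count=$(pgrep -f "$Pid") && echo "$count"

if [ -n "$count" ]; then
    sleep 1
    echo "Process is busy"
    kill -9 $count    # left unquoted: may hold several PIDs
    sleep 60
    cd "$Condir" && sh clean.sh && "$Startfile" restart
else
    echo "Process is stopped"
    cd "$Condir" && sh clean.sh && "$Startfile" restart
fi
exit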

 

ver2:

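#!/bin/bash
# sysctl -w and su below need root privileges; the service itself is restarted as user idol
Condir=/app/cfg/content-hist-b
Startfile=/app/etc/init.d/content-hist-b
Pid=content-hist-b

# Drop the page cache, dentries and inodes
sysctl -w vm.drop_caches=3
# PIDs of every process whose command line matches the service name
count=$(pgrep -f "$Pid") && echo "$count"

if [ -n "$count" ]; then
    sleep 1
    echo "Process is busy"
    kill -9 $count    # left unquoted: may hold several PIDs
    sleep 30
    su - idol -c "cd $Condir && sh clean.sh && $Startfile restart"
else
    echo "Process is stopped"
    su - idol -c "cd $Condir && sh clean.sh && $Startfile restart"
fi

exit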
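
Both versions go straight to kill -9, which gives the service no chance to shut down cleanly. A gentler variant might send SIGTERM first and only fall back to SIGKILL after a timeout. The sketch below is only an illustration: stop_gracefully and its timeout parameter are hypothetical names that do not appear in the original scripts, and it assumes the service actually exits on SIGTERM.

```shell
#!/bin/bash
# stop_gracefully PID [TIMEOUT]
# Hypothetical helper (not part of the original scripts): send SIGTERM,
# wait up to TIMEOUT seconds (default 10) for the process to exit,
# then fall back to SIGKILL.
stop_gracefully() {
    local pid=$1 timeout=${2:-10} waited=0
    # If the signal cannot be delivered, the process is already gone
    kill -TERM "$pid" 2>/dev/null || return 0
    # kill -0 only tests whether the PID still exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Possible use in restart.sh, once per PID returned by pgrep:
# for p in $count; do stop_gracefully "$p" 30; done
```

With this in place the fixed sleep after the kill could probably be shortened, since the function already waits for the process to disappear.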

 

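In practice this did not feel very reliable, so I wrote another shell script that checks whether the monitored port is reachable and restarts the service automatically: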

portdd.sh

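#!/bin/bash
# Word that nc prints as the 7th field when the connection succeeds
state="succeeded!"
Condir=/app/cfg/content-shortterm-9000
Startfile=/app/etc/init.d/content-shortterm-9000
Pid=content-shortterm-9000
DATAFILE=/app/data/content-shortterm-9000

while :
do
    count=$(pgrep -f "$Pid")
    # nc -v writes its status message to stderr, so merge it into stdout before parsing
    port=$(nc -vz -w 10 192.168.5.137 9000 2>&1 | awk '{print $7}')
    if [ "$port" = "$state" ]; then
        sleep 1
        echo "process is ok"
    else
        sleep 1
        echo "Process is bad or busy"
        # Kill any surviving instance; skip if nothing is running
        [ -n "$count" ] && kill -9 $count
        sleep 60
        # Take the timestamp at restart time so each archived data directory gets its own name
        DATE=$(date +%Y-%m-%d-%H:%M)
        mv "$DATAFILE" "$DATAFILE$DATE"
        su idol -c "cd $Condir && sh clean.sh && $Startfile restart"
        echo "process restart complete"
    fi

    sysctl -w vm.drop_caches=3
    sleep 3600
done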
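
The check in portdd.sh depends on the exact wording of nc's message, which differs between netcat implementations. A variant that relies on nc's exit status instead might look like the sketch below. port_is_open is a hypothetical helper name, and the sketch assumes an nc build that supports -z and -w, as the one used in portdd.sh does.

```shell
#!/bin/bash
# port_is_open HOST PORT [TIMEOUT]
# Hypothetical helper: succeeds (exit status 0) if a TCP connection to
# HOST:PORT can be opened within TIMEOUT seconds (default 10).
port_is_open() {
    local host=$1 port=$2 timeout=${3:-10}
    # -z: connect and close without sending data; -w: connection timeout
    nc -z -w "$timeout" "$host" "$port" >/dev/null 2>&1
}

# Possible use inside the monitoring loop:
# if port_is_open 192.168.5.137 9000; then
#     echo "process is ok"
# else
#     echo "process is not answering, restarting"
# fi
```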

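portdd.sh uses the netcat command to check the port state. nmap can be used for the same purpose: "nmap 192.168.155.249 -p 22 | grep 22". For application processes that do not communicate over a port, the stat command can be used instead to check how long ago the process's log file was last updated, and to act on the process when the log has gone quiet for too long.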
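
The stat-based check could be sketched as follows. This is only an illustration under stated assumptions: log_is_stale is a hypothetical function name, the log path and the 600-second threshold in the usage comment are made up, and stat -c %Y is the GNU coreutils form (BSD stat uses different options).

```shell
#!/bin/bash
# log_is_stale FILE MAX_AGE_SECONDS
# Hypothetical helper: returns 0 (true) when FILE was last modified more
# than MAX_AGE_SECONDS ago, or when FILE cannot be read at all.
log_is_stale() {
    local file=$1 max_age=$2 mtime now
    # GNU stat: %Y prints the modification time as seconds since the epoch
    mtime=$(stat -c %Y "$file" 2>/dev/null) || return 0
    now=$(date +%s)
    [ $((now - mtime)) -gt "$max_age" ]
}

# Possible use (path and threshold are placeholders):
# if log_is_stale /path/to/application.log 600; then
#     echo "log has not been updated for 10 minutes, restarting"
# fi
```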

 

posted @ 2015-12-14 14:40  夜岚の馨语