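11 - Monitoring per-process CPU and memory usage with Zabbix low-level discovery (LLD) (combining it with active mode should work even better)
Original post: https://blog.csdn.net/u013272009/article/details/90486079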
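Zabbix low-level discovery (LLD)
LLD: Low-level discovery
Official documentation: https://www.zabbix.com/documentation/4.0/manual/discovery/low_level_discovery
Purpose: you define a rule, and Zabbix uses it to generate the configuration for a number of items that is not known in advance.
For writing your own LLD rules, the "Creating custom LLD rules" section of the documentation above is particularly useful.
For example, monitoring the CPU and memory usage of server processes with Zabbix is a good fit for LLD.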
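A custom LLD rule in Zabbix 4.0 has to print a JSON object whose data array holds one object per discovered entity, keyed by LLD macros such as {#PROCESSNAME}. As a minimal sketch of that format, the hypothetical function lld_json below turns one name per input line into such an object. It escapes backslashes and double quotes so that arbitrary command lines still produce valid JSON, and it emits plain values, without the extra \" quotes used by the scripts later in this post.

```shell
#!/bin/bash
# Hypothetical helper: read one entity name per line on stdin and print
# Zabbix 4.0 LLD JSON: {"data":[{"{#PROCESSNAME}":"..."},...]}
lld_json() {
  # Escape backslashes first, then double quotes, so each name is a valid JSON string.
  sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
  awk 'BEGIN { printf "{\"data\":[" }
       # Pass the name as a printf argument, never as the format string.
       { printf "%s{\"{#PROCESSNAME}\":\"%s\"}", (NR > 1 ? "," : ""), $0 }
       END { print "]}" }'
}
```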
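A practical example
In the figure above (the Server CPU all graph), the number of service processes is only known once they have been started, and it may be different the next time they are started.
Below, LLD is used to build that graph.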
1. Write a script that gets the service names
For example, kgetserver.sh:
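#!/bin/bash
# LLD rule: print every running process whose command line contains "Server"
# as a {#PROCESSNAME} entry in Zabbix LLD JSON.
echo '{"data":['
# Drop grep itself, this script and the other helper scripts from the list.
# Running ps only once avoids a count/list mismatch that could break the JSON.
ps aux | grep Server | grep -v grep | grep -v "$0" | grep -v kgetcpu | grep -v kgetmem | grep -v tail |
awk '{
    # Fields 11..NF of "ps aux" hold the full command line.
    if (NR > 1) printf ",\n"
    # Wrap the name in literal \" quotes, as in the output below.
    printf "{\"{#PROCESSNAME}\":\"\\\""
    for (i = 11; i <= NF; i++) { printf "%s", $i; if (i < NF) printf " " }
    printf "\\\"\"}"
}
END { printf "\n" }'
echo ']}'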
Running it prints:
{"data":[
{"{#PROCESSNAME}":"\"./MgrServer.dbg\""},
{"{#PROCESSNAME}":"\"./LogServer_Ex.dbg\""},
{"{#PROCESSNAME}":"\"./RecordServer_Ex.dbg --stderrthreshold 0 --log_dir ../log -s 1\""},
{"{#PROCESSNAME}":"\"./RecordServer_Ex.dbg --stderrthreshold 0 --log_dir ../log -s 2\""},
{"{#PROCESSNAME}":"\"./RecordServer_Ex.dbg --stderrthreshold 0 --log_dir ../log -s 3\""},
{"{#PROCESSNAME}":"\"./RecordServer_Ex.dbg --stderrthreshold 0 --log_dir ../log -s 4\""},
{"{#PROCESSNAME}":"\"./RecordServer_Ex.dbg --stderrthreshold 0 --log_dir ../log -s 5\""},
{"{#PROCESSNAME}":"\"./RecordServer_Ex.dbg --stderrthreshold 0 --log_dir ../log -s 6\""},
{"{#PROCESSNAME}":"\"./ProxyServer_Ex.dbg\""},
{"{#PROCESSNAME}":"\"./RedisSyncServer_Ex.dbg -s 1\""},
{"{#PROCESSNAME}":"\"./RedisSyncServer_Ex.dbg -s 2\""},
{"{#PROCESSNAME}":"\"./RedisSyncServer_Ex.dbg -s 3\""},
{"{#PROCESSNAME}":"\"./RedisSyncServer_Ex.dbg -s 4\""},
{"{#PROCESSNAME}":"\"./RedisSyncServer_Ex.dbg -s 5\""},
{"{#PROCESSNAME}":"\"./RedisSyncServer_Ex.dbg -s 6\""},
{"{#PROCESSNAME}":"\"./RobotServer_Ex.dbg\""},
{"{#PROCESSNAME}":"\"./LoginServer.dbg --stderrthreshold 0 --log_dir ../log -s 1\""},
{"{#PROCESSNAME}":"\"./LoginServer.dbg --stderrthreshold 0 --log_dir ../log -s 2\""},
{"{#PROCESSNAME}":"\"./LoginServer.dbg --stderrthreshold 0 --log_dir ../log -s 3\""}
]}
This script is the rule: it finds the services that need to be monitored.
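Each element of that data array is later combined with the item prototypes: Zabbix substitutes the LLD macros into every prototype key. The hypothetical helper expand_prototype below only mimics that substitution locally, for a key containing a single {#PROCESSNAME}, to show which item keys the rule above would yield. It is an illustration, not Zabbix code.

```shell
#!/bin/bash
# Hypothetical illustration: for each discovered value, print the item key an
# item prototype would produce by replacing its single {#PROCESSNAME} macro.
expand_prototype() {
  local template=$1 macro='{#PROCESSNAME}' value prefix suffix
  shift
  # No macro in the template: every value yields the same key.
  if [[ $template != *"$macro"* ]]; then
    printf '%s\n' "$template"
    return
  fi
  # Split the template around the macro (quoted, so it is matched literally).
  prefix=${template%%"$macro"*}
  suffix=${template#*"$macro"}
  for value in "$@"; do
    printf '%s%s%s\n' "$prefix" "$value" "$suffix"
  done
}
```

With the values printed by kgetserver.sh, which already carry their own quotes, the keys come out as quoted key parameters such as myGraph.server_cpu["./MgrServer.dbg"].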
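Another example, kgetproc.sh, which takes the process pattern as an argument and also returns the PID and a sequence number: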
#!/bin/bash
# LLD rule: like kgetserver.sh, but matches the pattern given in $1 and also
# exposes the PID ({#PROCESSPID}) and a sequence number ({#PROCESSNO}).
echo '{"data":['
ps aux | grep "$1" | grep -v grep | grep -v "$0" | grep -v kgetcpu | grep -v kgetmem | grep -v tail | grep -v defunct |
awk '{
    if (NR > 1) printf ",\n"
    printf "{\"{#PROCESSNAME}\":\"\\\""
    for (i = 11; i <= NF; i++) { printf "%s", $i; if (i < NF) printf " " }
    # Field 2 of "ps aux" is the PID; NR numbers the matches from 1.
    printf "\\\"\", \"{#PROCESSPID}\":%s,\"{#PROCESSNO}\":%d}", $2, NR
}
END { printf "\n" }'
echo ']}'
Running it prints:
[root@host-192-168-21-36 opt]# ./kgetproc.sh codis-server
{"data":[
{"{#PROCESSNAME}":"\"/home/fananchong/go/src/github.com/CodisLabs/codis/admin/../bin/codis-server 127.0.0.1:23790\"", "{#PROCESSPID}":28381,"{#PROCESSNO}":1},
{"{#PROCESSNAME}":"\"/home/fananchong/go/src/github.com/CodisLabs/codis/admin/../bin/codis-server 127.0.0.1:23791\"", "{#PROCESSPID}":28486,"{#PROCESSNO}":2},
{"{#PROCESSNAME}":"\"/home/fananchong/go/src/github.com/CodisLabs/codis/admin/../bin/codis-server 127.0.0.1:23792\"", "{#PROCESSPID}":28523,"{#PROCESSNO}":3},
{"{#PROCESSNAME}":"\"/home/fananchong/go/src/github.com/CodisLabs/codis/admin/../bin/codis-server 127.0.0.1:23793\"", "{#PROCESSPID}":28576,"{#PROCESSNO}":4},
{"{#PROCESSNAME}":"\"/home/fananchong/go/src/github.com/CodisLabs/codis/admin/../bin/codis-server 127.0.0.1:23794\"", "{#PROCESSPID}":28597,"{#PROCESSNO}":5},
{"{#PROCESSNAME}":"\"/home/fananchong/go/src/github.com/CodisLabs/codis/admin/../bin/codis-server 127.0.0.1:23795\"", "{#PROCESSPID}":28633,"{#PROCESSNO}":6},
{"{#PROCESSNAME}":"\"/home/fananchong/go/src/github.com/CodisLabs/codis/admin/../bin/codis-server 127.0.0.1:23796\"", "{#PROCESSPID}":28671,"{#PROCESSNO}":7},
{"{#PROCESSNAME}":"\"/home/fananchong/go/src/github.com/CodisLabs/codis/admin/../bin/codis-server 127.0.0.1:23797\"", "{#PROCESSPID}":28707,"{#PROCESSNO}":8},
{"{#PROCESSNAME}":"\"/home/fananchong/go/src/github.com/CodisLabs/codis/admin/../bin/codis-server 127.0.0.1:23798\"", "{#PROCESSPID}":28735,"{#PROCESSNO}":9},
{"{#PROCESSNAME}":"\"/home/fananchong/go/src/github.com/CodisLabs/codis/admin/../bin/codis-server 127.0.0.1:23799\"", "{#PROCESSPID}":28780,"{#PROCESSNO}":10}
]}
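The grep -v chain above filters out helper processes by name and depends on the column layout of ps aux. A possible alternative, sketched below under the assumption of a Linux host with pgrep (procps) available, finds matching PIDs with pgrep -f and reads each command line from /proc/PID/cmdline. The function name discover_procs is hypothetical; like the original it skips lines containing $0, and it emits names without the extra \" quotes.

```shell
#!/bin/bash
# Hypothetical alternative to kgetproc.sh (Linux only, needs pgrep and /proc):
# print LLD JSON with {#PROCESSNAME}, {#PROCESSPID} and {#PROCESSNO} for every
# process whose full command line matches the regex in $1.
discover_procs() {
  local pattern=$1 pid cmd sep="" no=0
  printf '{"data":['
  for pid in $(pgrep -f -- "$pattern"); do
    # Skip this shell and the current subshell; pgrep never lists itself.
    [[ $pid == "$$" || $pid == "$BASHPID" ]] && continue
    # /proc/PID/cmdline is NUL-separated; the process may already be gone.
    cmd=$(tr '\0' ' ' 2>/dev/null < "/proc/$pid/cmdline") || continue
    cmd=${cmd% }
    # Skip empty command lines (zombies) and anything mentioning this script,
    # mirroring the original "grep -v $0".
    [[ -z $cmd || $cmd == *"$0"* ]] && continue
    # Escape backslashes and double quotes for JSON.
    cmd=$(printf '%s' "$cmd" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    no=$((no + 1))
    printf '%s{"{#PROCESSNAME}":"%s","{#PROCESSPID}":%s,"{#PROCESSNO}":%s}' \
      "$sep" "$cmd" "$pid" "$no"
    sep=","
  done
  printf ']}\n'
}
```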
2. Write scripts that get the CPU and memory usage of a process
For example, kgetcpu.sh:
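#!/bin/bash
# Print the %CPU (from one batch iteration of top) of the first process whose
# command line contains $1; print 0 if no such process is found.
mypid=$(ps aux | grep -F -- "$1" | grep -v grep | grep -v "$0" | grep -v tail | grep -v defunct | grep -v vi | awk 'NR == 1 {print $2}')
# In top's default layout, column 1 is PID and column 9 is %CPU.
getactive=$(top -b -n 1 | awk -v v="$mypid" '$1 == v {print $9}')
if [[ -n $mypid && -n $getactive ]]; then
    echo "$getactive"
else
    echo "0"
fi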
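kgetcpu.sh reads %CPU from a single iteration of top. One alternative, sketched here for Linux only, samples the process's utime and stime counters (fields 14 and 15 of /proc/PID/stat, in clock ticks) twice and divides the difference by the elapsed time. The names proc_ticks and cpu_percent are hypothetical. As in top's default Irix mode, 100 corresponds to one fully used core.

```shell
#!/bin/bash
# Hypothetical helper: print utime + stime (in clock ticks) of PID $1.
proc_ticks() {
  local stat rest
  stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
  # Drop "pid (comm)" first, because comm may itself contain spaces.
  rest=${stat##*")"}
  local -a f
  read -r -a f <<< "$rest"
  # f[0] is field 3 (state), so fields 14/15 (utime/stime) are f[11]/f[12].
  echo $(( f[11] + f[12] ))
}

# Hypothetical helper: CPU usage of PID $1 over an interval of $2 seconds
# (default 1). Prints 0 if the process does not exist or exits meanwhile.
cpu_percent() {
  local pid=$1 interval=${2:-1} t1 t2 hz
  hz=$(getconf CLK_TCK)
  t1=$(proc_ticks "$pid") || { echo 0; return; }
  sleep "$interval"
  t2=$(proc_ticks "$pid") || { echo 0; return; }
  awk -v d=$((t2 - t1)) -v hz="$hz" -v s="$interval" \
    'BEGIN { printf "%.1f\n", d * 100 / (hz * s) }'
}
```

The sampling interval adds directly to the run time of each check, so it should stay well below the agent's Timeout setting (3 seconds by default).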
And kgetmem.sh:
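#!/bin/bash
# Print the resident memory (RES) in bytes of the first process whose
# command line contains $1; print 0 if no such process is found.
mypid=$(ps aux | grep -F -- "$1" | grep -v grep | grep -v "$0" | grep -v tail | grep -v defunct | grep -v vi | awk 'NR == 1 {print $2}')
# In top's default layout, column 1 is PID and column 6 is RES.
getactive=$(top -b -n 1 | awk -v v="$mypid" '$1 == v {print $6}')
if [[ -n $mypid && -n $getactive ]]; then
    # RES is in KiB, or carries an m/g/t suffix when the value is large.
    awk -v r="$getactive" 'BEGIN {
        m = 1024
        if (r ~ /t$/) m = 1024 * 1024 * 1024 * 1024
        else if (r ~ /g$/) m = 1024 * 1024 * 1024
        else if (r ~ /m$/) m = 1024 * 1024
        sub(/[mgt]$/, "", r)
        printf "%.0f\n", r * m
    }'
else
    echo "0"
fi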
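The RES column of top is in KiB by default, but top switches to a suffixed value such as 1.5g when the number no longer fits the column (the suffixes m, g and t are assumed here). The hypothetical helper res_to_bytes below isolates that conversion to bytes, using awk so that fractional values work, which bash's integer arithmetic cannot handle.

```shell
#!/bin/bash
# Hypothetical helper: convert a value from top's RES column to bytes.
# Plain numbers are KiB; m, g and t suffixes scale by further powers of 1024.
res_to_bytes() {
  local v=$1 mult=1024
  case $v in
    *t) mult=$((1024 ** 4)); v=${v%t} ;;
    *g) mult=$((1024 ** 3)); v=${v%g} ;;
    *m) mult=$((1024 ** 2)); v=${v%m} ;;
  esac
  # awk does floating-point math, so values such as 1.5 are handled.
  awk -v v="$v" -v m="$mult" 'BEGIN { printf "%.0f\n", v * m }'
}
```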
These scripts define what each item measures.
3. Configure the items
For example, /etc/zabbix/zabbix_agentd.d/userparameter_mygraph.conf:
# Quote $1 so that process names containing spaces reach the script as one argument.
UserParameter=myGraph.server_cpu[*],sudo /opt/kgetcpu.sh "$1"
UserParameter=myGraph.server_mem[*],sudo /opt/kgetmem.sh "$1"
UserParameter=myGraph.server_process[*],sudo /opt/kgetserver.sh
UserParameter=myGraph.proc[*],sudo /opt/kgetproc.sh "$1"
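Restart the agent so that it loads the new file. The agent runs these commands as the zabbix user, so sudo must allow that user to run the /opt scripts without a password (for example via a NOPASSWD rule in sudoers):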
systemctl restart zabbix-agent.service
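The rest is done in the Zabbix frontend.
4. Create a discovery rule on the template
Configured as in the figure above; its key is the discovery UserParameter, e.g. myGraph.server_process.
5. Create item prototypes on the template
Configured as in the figure above, e.g. with the keys myGraph.server_cpu[{#PROCESSNAME}] and myGraph.server_mem[{#PROCESSNAME}].
6. Create a graph prototype on the template
Configured as in the figure above.
At this point, all items are generated automatically.
Creating the Server CPU all graph on the host
(For now I create it manually; in principle it could also be generated automatically. I will add that once I have had time to go through the documentation.)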