Prometheus: using the latest Pushgateway in real enterprise scenarios
Reposted from the blog: https://blog.csdn.net/shm19990131/article/details/107221127
Custom monitoring metrics with Pushgateway
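1. Monitoring TCP connections in wait states
TCP connections sitting in the various wait states are an important indicator that operations staff rely on in day-to-day troubleshooting (network load, server load, DB).
netstat (CLOSE_WAIT, TIME_WAIT)
As a rule, a large number of TCP connections in wait states points to a problem with the system's network (traffic load).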
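To see which states dominate before pushing anything to Pushgateway, a per-state tally can be sketched as follows. This is a minimal sketch, assuming the Linux `netstat -ant` layout in which TCP rows start with "tcp" and carry the state in the sixth field. The function name `count_tcp_states` and the sample rows are illustrative and not part of the original post.

```shell
#!/bin/bash
# count_tcp_states: read `netstat -ant`-style lines on stdin and print
# "STATE count" pairs, one per line, sorted by state name.
# Assumption: TCP rows start with "tcp" and carry the state in field 6.
count_tcp_states() {
  awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' | sort
}

# Illustrative sample input (not real output from any host).
sample='Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 192.168.168.11:9091 192.168.168.12:40112 TIME_WAIT
tcp 0 0 192.168.168.11:9091 192.168.168.12:40114 TIME_WAIT
tcp 0 0 192.168.168.11:22 192.168.168.1:51000 ESTABLISHED
tcp 0 0 192.168.168.11:8080 192.168.168.12:40200 CLOSE_WAIT'

printf '%s\n' "$sample" | count_tcp_states
```

On a live host the same function would be fed with `netstat -ant | count_tcp_states`.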
vim /usr/local/node_exporter/shell/tcp_connection.sh
#!/bin/bash
# Count TCP connections by state and push the counts to Pushgateway.
instance_name=$(hostname -f)
if [ "$instance_name" == "localhost" ]; then
  echo "Hostname must be an FQDN"
  exit 1
fi

# Metric names
label_1="count_netstat_wait_connections"
label_2="count_netstat_listen_connections"
label_3="count_netstat_established_connections"
label_4="count_netstat_total_connections"

# Metric values (the total also counts netstat's header lines)
count_netstat_wait_connections=$(netstat -anpt | grep -i 'TIME[_-]WAIT' | wc -l)
count_netstat_listen_connections=$(netstat -anpt | grep -i 'LISTEN' | wc -l)
count_netstat_established_connections=$(netstat -anpt | grep -i 'ESTABLISHED' | wc -l)
count_netstat_total_connections=$(netstat -anpt | wc -l)

# Print the values locally for a quick check
echo "$label_1:$count_netstat_wait_connections"
echo "$label_2:$count_netstat_listen_connections"
echo "$label_3:$count_netstat_established_connections"
echo "$label_4:$count_netstat_total_connections"

# Push each metric; the instance label is this host's FQDN
url="http://192.168.168.11:9091/metrics/job/pushgateway/instance/$instance_name"
echo "$label_1 $count_netstat_wait_connections" | curl --data-binary @- "$url"
echo "$label_2 $count_netstat_listen_connections" | curl --data-binary @- "$url"
echo "$label_3 $count_netstat_established_connections" | curl --data-binary @- "$url"
echo "$label_4 $count_netstat_total_connections" | curl --data-binary @- "$url"
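The script above sends one request per metric. Because a Pushgateway request body is plain text with one "name value" sample per line, the four samples could also travel in a single request. The following is a minimal sketch of that idea; the helper name `build_tcp_payload` and the numbers passed to it are illustrative, and the actual push is left as a comment so the block runs without network access.

```shell
#!/bin/bash
# build_tcp_payload: print four samples in the Prometheus text exposition
# format (one "name value" pair per line) so they can be sent in one push.
# Arguments: wait, listen, established and total connection counts.
build_tcp_payload() {
  printf 'count_netstat_wait_connections %s\n' "$1"
  printf 'count_netstat_listen_connections %s\n' "$2"
  printf 'count_netstat_established_connections %s\n' "$3"
  printf 'count_netstat_total_connections %s\n' "$4"
}

# Illustrative values; on a real host these would come from netstat.
build_tcp_payload 12 8 40 75

# To push, pipe the payload straight into curl (address as in the script above):
#   build_tcp_payload 12 8 40 75 | curl --data-binary @- \
#     "http://192.168.168.11:9091/metrics/job/pushgateway/instance/$(hostname -f)"
```

Piping the function output directly keeps the trailing line feed that the text format expects on the last line.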
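2. Network packet loss rate
In real enterprise use, ping latency and packet loss rate are monitored for traffic on the servers' internal network.
lostpk: packet loss rate
rrt: network latency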
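ping options:
-q: quiet output; nothing is shown except the summary lines at the start and the end
-A: adaptive ping; the interval between packets adapts to the round-trip time, which sets how fast the pings are sent
-s: sets the ICMP payload size, here 500 bytes (528 bytes once the 8-byte ICMP header and 20-byte IP header are added)
-W: maximum time to wait for a reply; with the Linux iputils ping shown here the unit is seconds, so -W 1000 waits up to 1000 seconds, and -W 1 would give a one-second limit
-c: count; send 100 ICMP packets in total, then stop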
[root@node1 ~]# timeout 5 ping -q -A -s 500 -W 1000 -c 100 node2
PING node2 (192.168.168.12) 500(528) bytes of data.
--- node2 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 27ms
rtt min/avg/max/mdev = 0.117/0.203/1.113/0.182 ms, ipg/ewma 0.278/0.125 ms
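First summary line: packets sent, packets received, packet loss rate, elapsed time
Second summary line: minimum/average/maximum round-trip time and its mean deviation (mdev); ipg is the inter-packet gap and ewma is the exponentially weighted moving average of the round-trip time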
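The reading of the two summary lines can be sketched as a pair of small parsers. This is a minimal sketch, assuming the iputils output format shown above; the function names `ping_loss_percent` and `ping_avg_rtt_ms` are illustrative and not part of the original post.

```shell
#!/bin/bash
# Sample output copied from the run shown above.
ping_output='PING node2 (192.168.168.12) 500(528) bytes of data.
--- node2 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 27ms
rtt min/avg/max/mdev = 0.117/0.203/1.113/0.182 ms, ipg/ewma 0.278/0.125 ms'

# ping_loss_percent: print the packet loss percentage without the "%" sign.
# Looking for the field that ends in "%" avoids relying on a fixed column.
ping_loss_percent() {
  awk '/packet loss/ {
         for (i = 1; i <= NF; i++)
           if ($i ~ /%$/) { sub(/%/, "", $i); print $i }
       }'
}

# ping_avg_rtt_ms: print the average round-trip time in milliseconds.
# Splitting the "rtt" line on "/" leaves the average in field 5.
ping_avg_rtt_ms() {
  awk -F'/' '/^rtt/ { print $5 }'
}

printf '%s\n' "$ping_output" | ping_loss_percent
printf '%s\n' "$ping_output" | ping_avg_rtt_ms
```

For the sample above the two calls print 0 and 0.203.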
Pushgateway script for packet loss rate and latency
#!/bin/bash
# Ping target; in production, point this at the domain name or IP of the externally exposed service
instance_name=$(hostname -f)
host=baidu.com
# Prometheus metric names allow only letters, digits, underscores and colons,
# so any other character in the host name (for example a dot) becomes "_"
metric_host=${instance_name//[^a-zA-Z0-9_]/_}

# Run ping once and read both values from its "packets transmitted" summary line
summary=$(ping -q -A -s 500 -W 1000 -c 100 "$host" | grep transmitted)
lostpk=$(echo "$summary" | awk '{print $6}')   # e.g. "0%"
rrt=$(echo "$summary" | awk '{print $10}')     # e.g. "27ms", the total time on the summary line
value_lostpk=$(echo "$lostpk" | awk -F"%" '{print $1}')
value_rrt=$(echo "$rrt" | awk -F"ms" '{print $1}')

url="http://192.168.168.11:9091/metrics/job/pushgateway/instance/localhost:9091"
echo "lostpk_${metric_host}_to_baidu:$value_lostpk"
echo "lostpk_${metric_host}_to_baidu $value_lostpk" | curl --data-binary @- "$url"
echo "rrt_${metric_host}_to_baidu:$value_rrt"
echo "rrt_${metric_host}_to_baidu $value_rrt" | curl --data-binary @- "$url"
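The script above builds the host name into the metric name. An alternative, sketched here only as an assumption about how one might lay out the data, is to keep the metric names fixed and carry the source host and ping target as labels, which also sidesteps the character restrictions on metric names. The function name `format_ping_metrics`, the metric names `lostpk_percent` and `rrt_ms`, and the label names `source` and `target` are all hypothetical.

```shell
#!/bin/bash
# format_ping_metrics: print two samples in the text exposition format,
# carrying the source host and ping target as labels instead of embedding
# them in the metric name. Metric and label names here are hypothetical.
# Arguments: source host, target host, loss percentage, latency in ms.
format_ping_metrics() {
  local src=$1 dst=$2 loss=$3 rrt=$4
  printf 'lostpk_percent{source="%s",target="%s"} %s\n' "$src" "$dst" "$loss"
  printf 'rrt_ms{source="%s",target="%s"} %s\n' "$src" "$dst" "$rrt"
}

# Illustrative values only.
format_ping_metrics node1.example.com baidu.com 0 27
```

The output can be piped to the same curl command used in the script, and queries can then filter on the labels rather than on one metric name per host.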
posted on 2024-08-21 16:37 luzhouxiaoshuai