Monitoring and Alerting on Response Time and HTTP Status Codes for Requests from the LB Layer to Real Servers

 

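To monitor the access quality of each service, the Nginx logs on the LB layer are used to monitor and alert on the response time (upstream_response_time) and the HTTP status code (upstream_status) of requests from the LB layer to the Real Servers. The steps are recorded below: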

Basic information:
Load balancing uses Nginx + Keepalived
Load-balanced domain: bs7001.kevin-inc.com (there are many load-balanced domains; this one is used as the example)
Log: bs7001.kevin-inc.com-access.log
 
1) Setting the log_format on the LB layer's Nginx (for reference: http://www.cnblogs.com/kevingrace/p/5893499.html)
[root@inner-lb01 ~]# cat /data/nginx/conf/nginx.conf
......
######
    ## set access log format
    ######
    log_format  main  '$remote_addr $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '$http_user_agent $http_x_forwarded_for $request_time $upstream_response_time $upstream_addr $upstream_status';
  
    #######
.....
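
To see how awk's whitespace-separated columns map onto this log_format, the sketch below runs the column selection used by the monitoring scripts against a single log line. The sample line and the function name `extract_fields` are made up for illustration and are not taken from the real log.

```shell
#!/bin/bash
# Print time, referer, upstream_response_time, upstream_addr and
# upstream_status from access-log lines read on stdin.
extract_fields() {
  awk '{print $3,$10,$(NF-2),$(NF-1),$(NF)}'
}

# Hypothetical sample line in the "main" format defined above
# (request_time 0.007, upstream_response_time 0.006).
sample='10.0.8.5 - [01/Feb/2018:15:06:02 +0800] "GET /portal/main_new.do HTTP/1.1" 200 512 "http://bs7001.kevin-inc.com/portal/main_new.do" Mozilla/5.0 - 0.007 0.006 192.168.1.21:7001 200'

printf '%s\n' "$sample" | extract_fields
```

Columns 3 and 10 are counted from the left, so they assume the request line has exactly three space-separated parts. The last three columns are counted from the right, so a user agent containing spaces does not shift them.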
 
2) Monitoring and alerting scripts
Log path
[root@inner-lb01 ~]# ll /data/nginx/logs/bs7001.kevin-inc.com-access.log
-rw-r--r-- 1 root root 0 Dec 13 17:00 /data/nginx/logs/bs7001.kevin-inc.com-access.log
 
Installing and configuring sendEmail (for installation, see: http://www.cnblogs.com/kevingrace/p/5961861.html)
[root@inner-lb01 ~]# cat /opt/sendemail.sh        # this script can be reused as-is
#!/bin/bash
# Filename: SendEmail.sh
# Notes: uses sendEmail
#
# Log file for this script
LOGFILE="/tmp/Email.log"
:>"$LOGFILE"
exec 1>"$LOGFILE"
exec 2>&1
SMTP_server='smtp.kevin.com'
username='notice@kevin.com'
password='notice@123'
from_email_address='notice@kevin.com'
to_email_address="$1"
message_subject_utf8="$2"
message_body_utf8="$3"
# Convert the subject to GB2312 so that a subject containing Chinese characters is not garbled in the received email.
message_subject_gb2312=`iconv -t GB2312 -f UTF-8 << EOF
$message_subject_utf8
EOF`
[ $? -eq 0 ] && message_subject="$message_subject_gb2312" || message_subject="$message_subject_utf8"
# Convert the body to GB2312 so that the received body is not garbled.
message_body_gb2312=`iconv -t GB2312 -f UTF-8 << EOF
$message_body_utf8
EOF`
[ $? -eq 0 ] && message_body="$message_body_gb2312" || message_body="$message_body_utf8"
# Send the email
sendEmail='/usr/local/bin/sendEmail'
set -x
$sendEmail -s "$SMTP_server" -xu "$username" -xp "$password" -f "$from_email_address" -t "$to_email_address" -u "$message_subject" -m "$message_body" -o message-content-type=text -o message-charset=gb2312
 
 
[root@inner-lb01 ~]# cd /opt/lb_log_monit.sh/
[root@inner-lb01 lb_log_monit.sh]# ll
total 12
-rwxr-xr-x 1 root root 1180 Feb  1 13:03 bs7001_request_status_monit.sh
-rwxr-xr-x 1 root root  821 Feb  1 11:20 bs7001_request_time_monit_request.sh
-rwxr-xr-x 1 root root  559 Feb  1 13:01 bs7001_request_time_monit.sh
 
 
Response-time monitoring and alerting script (the script below extracts columns 3 and 10 and the last three columns of the log file)
[root@inner-lb01 lb_log_monit.sh]# cat bs7001_request_time_monit.sh
#!/bin/bash
# Extract columns 3 and 10 plus the last three columns from the most recent 1000 log lines.
/usr/bin/tail -1000 /data/nginx/logs/bs7001.kevin-inc.com-access.log|awk '{print $3,$10,$(NF-2),$(NF-1),$(NF)}' > /root/lb_log_check/bs7001.kevin-inc.com-check.log

# Print every line whose upstream_response_time (column 3) is at least 1 second.
# A single awk pass prints each slow request exactly once; grepping the file for
# each slow value would repeat lines and also match the value in other columns.
# Non-numeric values such as "-" are skipped.
awk '$3 ~ /^[0-9.]+$/ && $3+0 >= 1' /root/lb_log_check/bs7001.kevin-inc.com-check.log
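
The 1-second threshold is hard-coded in the script above. A minimal sketch of the same check with the threshold passed as a parameter, assuming the five-column check file produced above; the function name `slow_requests` and the demo data are hypothetical.

```shell
#!/bin/bash
# slow_requests THRESHOLD_SECONDS FILE
# Print lines of the five-column check file whose third column
# (upstream_response_time) is numeric and >= THRESHOLD_SECONDS.
slow_requests() {
  awk -v limit="$1" '$3 ~ /^[0-9]+(\.[0-9]+)?$/ && $3 + 0 >= limit + 0' "$2"
}

# Demo with made-up data: one fast request, one slow request,
# and one line where no upstream was contacted ("-").
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
[01/Feb/2018:15:06:02 "http://bs7001.kevin-inc.com/portal/main_new.do" 0.006 192.168.1.21:7001 200
[01/Feb/2018:15:07:12 "http://bs7001.kevin-inc.com/portal/main_new.do" 1.250 192.168.1.22:7001 200
[01/Feb/2018:15:07:51 "http://bs7001.kevin-inc.com/portal/main_new.do" - - -
EOF
slow_requests 1 "$tmp"
rm -f "$tmp"
```

Comparing in awk keeps the floating-point handling in one place, so no conversion to milliseconds through bc is needed.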
 
[root@inner-lb01 lb_log_monit.sh]# cat bs7001_request_time_monit_request.sh
#!/bin/bash
/bin/bash -x /opt/lb_log_monit.sh/bs7001_request_time_monit.sh > /root/lb_log_check/bs7001.kevin-inc.com_request_time.log

# Send an alert only when the check produced at least one slow request.
NUM=`cat /root/lb_log_check/bs7001.kevin-inc.com_request_time.log|wc -l`
if [ "$NUM" -ne 0 ];then
  /bin/bash /opt/sendemail.sh wangshibo@kevin.com "Response time of requests from the LB layer to bs7001.kevin-inc.com" "Response time exceeded 1 second!\nDetails:\n`cat /root/lb_log_check/bs7001.kevin-inc.com_request_time.log`"
  /bin/bash /opt/sendemail.sh linan@kevin.com "Response time of requests from the LB layer to bs7001.kevin-inc.com" "Response time exceeded 1 second!\nDetails:\n`cat /root/lb_log_check/bs7001.kevin-inc.com_request_time.log`"
else
  echo "Requests from the LB layer to bs7001.kevin-inc.com are responding normally"
fi
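
The wrapper above repeats the mail command once per recipient. One possible way to keep the recipient list in a single place is a small helper like the following sketch; the function name `notify_all` and the `MAILER` variable are hypothetical, and the mail script is assumed to take recipient, subject and body as its three arguments, as /opt/sendemail.sh does.

```shell
#!/bin/bash
# Path of the mail script; defaults to the script shown earlier and can be
# overridden (for example with a stub) for a dry run.
MAILER=${MAILER:-/opt/sendemail.sh}

# notify_all SUBJECT BODY RECIPIENT...
# Call the mail script once per recipient with the same subject and body.
notify_all() {
  local subject=$1 body=$2 rcpt
  shift 2
  for rcpt in "$@"; do
    bash "$MAILER" "$rcpt" "$subject" "$body"
  done
}

# Example call (commented out so that sourcing this file sends nothing):
# notify_all "Response time alert" "Details..." wangshibo@kevin.com linan@kevin.com
```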
 
[root@inner-lb01 lb_log_monit.sh]# ll /root/lb_log_check/
total 152
-rw-r--r-- 1 root root 147766 Feb  1 15:00 bs7001.kevin-inc.com-check.log
-rw-r--r-- 1 root root    216 Feb  1 15:00 bs7001.kevin-inc.com_request_time.log
 
 
HTTP status code monitoring and alerting script (alerts on status codes 500, 502, 503 and 504)
[root@inner-lb01 lb_log_monit.sh]# cat bs7001_request_status_monit.sh
#!/bin/bash
# Extract columns 3 and 10 plus the last three columns from the most recent 1000 log lines.
/usr/bin/tail -1000 /data/nginx/logs/bs7001.kevin-inc.com-access.log|awk '{print $3,$10,$(NF-2),$(NF-1),$(NF)}' > /root/lb_log_check/bs7001.kevin-inc.com-check.log

# Loop over the distinct upstream status codes (column 5) and alert on 500, 502, 503 and 504.
for i in `awk '{print $5}' /root/lb_log_check/bs7001.kevin-inc.com-check.log|sort|uniq`
do
  case "$i" in
    500|502|503|504)
      # Match on the status column only, so that "500" inside a URL or a time value is not picked up.
      detail=$(awk -v s="$i" '$5 == s' /root/lb_log_check/bs7001.kevin-inc.com-check.log)
      /bin/bash /opt/sendemail.sh wangshibo@kevin.com "HTTP status code of requests from the LB layer to bs7001.kevin-inc.com" "HTTP status code: ${i}\nDetails:\n${detail}"
      ;;
    *)
      echo "it is ok"
      ;;
  esac
done
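
When an alert fires, it can help to know which Real Server produced the errors. The following sketch counts 5xx responses per upstream address, assuming the five-column check file produced above; the function name `count_5xx` is hypothetical.

```shell
#!/bin/bash
# count_5xx FILE
# Count 5xx responses per upstream address in the five-column check file
# (column 4 = upstream_addr, column 5 = upstream_status).
# Output format: COUNT UPSTREAM_ADDR STATUS, highest count first.
count_5xx() {
  awk '$5 ~ /^5[0-9][0-9]$/ {n[$4 " " $5]++}
       END {for (k in n) print n[k], k}' "$1" | sort -k1,1nr -k2
}

# Example call (assumes the check file from the script above exists):
# count_5xx /root/lb_log_check/bs7001.kevin-inc.com-check.log
```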
 
3) Scheduled monitoring with crontab
[root@inner-lb01 lb_log_monit.sh]# crontab -l
# Monitor request response time and HTTP status codes for each service between the LB and the backend servers
*/2 * * * * /bin/bash -x /opt/lb_log_monit.sh/bs7001_request_time_monit_request.sh >/dev/null 2>&1
*/2 * * * * /bin/bash -x /opt/lb_log_monit.sh/bs7001_request_status_monit.sh >/dev/null 2>&1
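
Because each run re-reads the last 1000 log lines, a slow or failed request may be reported again on later runs for as long as it stays inside that window. One possible way to look only at lines appended since the previous run is to remember the log's byte size in a state file. This is a sketch and not part of the original setup; the function name `new_log_lines` and the state-file path in the example are assumptions.

```shell
#!/bin/bash
# new_log_lines LOG STATE
# Print only the bytes appended to LOG since the previous call and remember
# the last seen size in STATE. If LOG shrank (for example after rotation),
# start again from the beginning.
new_log_lines() {
  local log=$1 state=$2 last=0 size
  size=$(( $(wc -c < "$log") ))
  [ -f "$state" ] && last=$(cat "$state")
  [ "$last" -gt "$size" ] && last=0
  # Skip the bytes already seen and stop at the size measured above, so that
  # lines written while this runs are picked up on the next call.
  tail -c +"$((last + 1))" "$log" | head -c "$((size - last))"
  echo "$size" > "$state"
}

# Example call (the state-file path is hypothetical):
# new_log_lines /data/nginx/logs/bs7001.kevin-inc.com-access.log /root/lb_log_check/bs7001.offset \
#   | awk '{print $3,$10,$(NF-2),$(NF-1),$(NF)}'
```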
 
Extract columns 3 and 10 and the last three columns from the corresponding log file
[root@inner-lb01 lb_log_monit.sh]# /usr/bin/tail -10 /data/nginx/logs/bs7001.kevin-inc.com-access.log|awk '{print $3,$10,$(NF-2),$(NF-1),$(NF)}'
[01/Feb/2018:15:05:41 "http://bs7001.kevin-inc.com/I8QW/dataimport/qw/found_analysis_imp8.jsp" 0.002 192.168.1.22:7001 304
[01/Feb/2018:15:05:41 "http://bs7001.kevin-inc.com/I8QW/dataimport/qw/found_analysis_imp8.jsp" 0.001 192.168.1.22:7001 304
[01/Feb/2018:15:05:41 "http://bs7001.kevin-inc.com/I8QW/dataimport/qw/found_analysis_imp8.jsp" 0.002 192.168.1.22:7001 304
[01/Feb/2018:15:05:41 "http://bs7001.kevin-inc.com/I8QW/dataimport/qw/found_analysis_imp8.jsp" 0.001 192.168.1.22:7001 304
[01/Feb/2018:15:05:41 "http://bs7001.kevin-inc.com/I8QW/dataimport/qw/found_analysis_imp8.jsp" 0.001 192.168.1.22:7001 304
[01/Feb/2018:15:05:41 "http://bs7001.kevin-inc.com/I8QW/dataimport/qw/found_analysis_imp8.jsp" 0.002 192.168.1.22:7001 304
[01/Feb/2018:15:06:02 "http://bs7001.kevin-inc.com/portal/main_new.do" 0.006 192.168.1.21:7001 200
[01/Feb/2018:15:07:12 "http://bs7001.kevin-inc.com/portal/main_new.do" 0.003 192.168.1.22:7001 200
[01/Feb/2018:15:07:51 "http://bs7001.kevin-inc.com/portal/main_new.do" 0.003 192.168.1.21:7001 200
[01/Feb/2018:15:07:57 "http://bs7001.kevin-inc.com/portal/main_new.do" 0.007 192.168.1.22:7001 200
