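nagios series (4): active monitoring of common TCP ports such as 80 and 3306 for the web, rsyncd, mysql and URL services
This part covers active nagios checks for TCP services (web, rsyncd, mysql) and for URLs.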
cd /usr/local/nagios/libexec/
[root@node4 libexec]# ./check_tcp -H 192.168.8.40 -p 80
TCP OK - 0.010 second response time on port 80|time=0.010334s;;;0.000000;10.000000
[root@node4 libexec]# ./check_tcp -H 192.168.8.198 -p 8888
TCP OK - 0.002 second response time on port 8888|time=0.001964s;;;0.000000;10.000000
# ./check_tcp -H 192.168.8.198 -p 22
TCP OK - 0.002 second response time on port 22|time=0.001633s;;;0.000000;10.000000
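Each line of plugin output above has two parts separated by `|`: the status text, then performance data in the form label=value;warn;crit;min;max. The following is a minimal sketch of splitting such a line in a script; the function names plugin_status, plugin_perfdata and perf_value are made up for this illustration and are not part of nagios.

```shell
# Print the human-readable part of a plugin output line
# (everything before the first '|').
plugin_status() {
    status=${1%%"|"*}
    printf '%s\n' "$status"
}

# Print the performance data (everything after the first '|'),
# or an empty line if the plugin emitted none.
plugin_perfdata() {
    case "$1" in
        *"|"*) perf=${1#*"|"} ;;
        *)     perf= ;;
    esac
    printf '%s\n' "$perf"
}

# Print the measured value of the first perfdata metric,
# e.g. "time=0.010334s;;;0.000000;10.000000" gives "0.010334s".
perf_value() {
    metric=$(plugin_perfdata "$1")
    metric=${metric%% *}      # keep only the first metric
    metric=${metric#*=}       # drop the label
    metric=${metric%%";"*}    # drop warn/crit/min/max
    printf '%s\n' "$metric"
}

line='TCP OK - 0.010 second response time on port 80|time=0.010334s;;;0.000000;10.000000'
plugin_status "$line"
plugin_perfdata "$line"
perf_value "$line"
```

In the sample line the warn and crit fields are empty because check_tcp was run without -w and -c.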
1. Add monitoring for a URL
① Add a custom services directory
cd /usr/local/nagios/etc/objects/
# mkdir services
[root@node4 objects]# chown -R nagios.nagios services
Edit /usr/local/nagios/etc/nagios.cfg
and add this setting:
cfg_dir=/usr/local/nagios/etc/objects/services
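Keeping a separate services directory has a practical advantage: nagios loads every file in it whose name matches *.cfg, so a script that deploys checks in bulk can name its configuration files freely.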
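Because every *.cfg file in the directory is loaded, bulk deployment can be as simple as writing one file per check. A minimal sketch under that assumption follows; the function write_tcp_service, the file naming scheme and the tcp_port_PORT description are hypothetical choices for this example, and the demo writes into a temporary directory rather than the real cfg_dir.

```shell
# write_tcp_service DIR HOST PORT
# Writes DIR/HOST_tcp_PORT.cfg with one service definition that
# uses the check_tcp!PORT form, then prints the file name.
write_tcp_service() {
    dir=$1
    host=$2
    port=$3
    file="$dir/${host}_tcp_${port}.cfg"
    cat > "$file" <<EOF
define service{
        use                     generic-service
        host_name               $host
        service_description     tcp_port_$port
        check_command           check_tcp!$port
}
EOF
    printf '%s\n' "$file"
}

# Demo in a throwaway directory; on a real server DIR would be
# the directory named by cfg_dir in nagios.cfg.
demo_dir=$(mktemp -d)
write_tcp_service "$demo_dir" centossz008 3306
write_tcp_service "$demo_dir" node3.chinasoft.com 873
ls "$demo_dir"
```

Generated files still need the usual syntax check and a reload before nagios picks them up.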
② Add a blog.cfg file to the services directory
For its contents, use the service section of /usr/local/nagios/etc/objects/templates.cfg as a reference.
# cat services/blog.cfg
define service{
use generic-service
host_name centossz008
service_description blog_url
check_command check_weburl!-I 192.168.8.40
check_period 24x7
max_check_attempts 3
normal_check_interval 2
retry_check_interval 1
contact_groups admins
notification_period 24x7
notification_options w,u,c,r
notification_interval 30
}
③ Add the check_weburl command to commands.cfg
# 'check_weburl' command definition
define command{
command_name check_weburl
command_line $USER1$/check_http $ARG1$ -w 10 -c 30
}
Check the syntax with service nagios checkconfig, then reload the nagios configuration with service nagios reload.
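The service and the command are tied together by the `!` separator: the text after it in check_command becomes $ARG1$ in command_line. The sketch below imitates that substitution for a check_command with a single argument. The function expand_check is invented for illustration; real nagios also expands $USER1$ (the plugin directory) and further $ARGn$ macros, which this sketch leaves untouched.

```shell
# expand_check COMMAND_LINE CHECK_COMMAND
# Replaces the first $ARG1$ in COMMAND_LINE with the text that
# follows the first '!' in CHECK_COMMAND.
expand_check() {
    command_line=$1
    check_command=$2
    case "$check_command" in
        *"!"*) arg1=${check_command#*"!"} ;;
        *)     arg1= ;;
    esac
    case "$command_line" in
        *'$ARG1$'*)
            prefix=${command_line%%'$ARG1$'*}
            suffix=${command_line#*'$ARG1$'}
            printf '%s%s%s\n' "$prefix" "$arg1" "$suffix"
            ;;
        *)
            # Nothing to substitute.
            printf '%s\n' "$command_line"
            ;;
    esac
}

expand_check '$USER1$/check_http $ARG1$ -w 10 -c 30' 'check_weburl!-I 192.168.8.40'
# prints: $USER1$/check_http -I 192.168.8.40 -w 10 -c 30
```

Running the expanded command by hand from the libexec directory is a quick way to test a new service before reloading nagios.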
2. Monitoring a specific URL
Add a name resolution entry for blog.chinasoft.com
to /etc/hosts:
192.168.8.40 blog.chinasoft.com
# curl -I http://blog.chinasoft.com
HTTP/1.1 200 OK
Date: Fri, 22 Jul 2016 01:55:44 GMT
Server: Apache/2.4.9 (Unix)
X-Powered-By: PHP/5.4.26
Content-Type: text/html
Note: a URI such as /pma/index.php must be enclosed in double quotes.
define service{
use generic-service
host_name centossz008
service_description phpadmin_url
check_command check_weburl!-H blog.chinasoft.com -u "/pma/index.php"
check_period 24x7
max_check_attempts 3
normal_check_interval 2
retry_check_interval 1
contact_groups admins
notification_period 24x7
notification_options w,u,c,r
notification_interval 30
}
Break the page on the client on purpose, and nagios reports the problem.
In /web/a.com/htdocs/pma:
# mv index.php index.php.bak
The status of the service then reads:
phpadmin_url  WARNING  07-22-2016 10:27:16  0d 0h 22m 2s  3/3  HTTP WARNING: HTTP/1.1 404 Not Found - 217 bytes in 0.006 second response time
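The 404 above is reported as WARNING rather than CRITICAL. As far as I know, check_http by default treats 2xx and 3xx responses as OK, 4xx as WARNING and 5xx as CRITICAL; the sketch below encodes that assumption together with the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names are hypothetical, and reporting every other code as UNKNOWN is a simplification of this sketch, not check_http's behavior.

```shell
# http_state CODE
# Prints the nagios state assumed for an HTTP status code.
http_state() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) echo OK ;;
        4[0-9][0-9])             echo WARNING ;;
        5[0-9][0-9])             echo CRITICAL ;;
        *)                       echo UNKNOWN ;;
    esac
}

# state_exit_code STATE
# Prints the plugin exit code that corresponds to a state name.
state_exit_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;
    esac
}

for code in 200 404 500; do
    state=$(http_state "$code")
    echo "$code -> $state (exit $(state_exit_code "$state"))"
done
```

With notification_options w,u,c,r, as in the service definitions in this post, a WARNING like this one still triggers a notification.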
3. Monitoring TCP services by checking their ports
① Monitoring the rsyncd service
On the client node3.chinasoft.com, run:
# touch /etc/rsyncd.conf
[root@node3 ~]# rsync --daemon
[root@node3 ~]# lsof -i :873
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rsync 59285 root 4u IPv4 237676 0t0 TCP *:rsync (LISTEN)
rsync 59285 root 5u IPv6 237677 0t0 TCP *:rsync (LISTEN)
vim /usr/local/nagios/etc/objects/services/blog.cfg
define service{
use generic-service
host_name node3.chinasoft.com
service_description sync_port
check_command check_tcp!837
check_period 24x7
max_check_attempts 3
normal_check_interval 2
retry_check_interval 1
contact_groups admins
notification_period 24x7
notification_options w,u,c,r
notification_interval 30
}
# ../../libexec/check_tcp -H 192.168.8.41 -p 873
TCP OK - 0.001 second response time on port 873|time=0.001093s;;;0.000000;10.000000
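The manual check against port 873 succeeds, so the fault is the port in the service definition, which should be 873 and not 837: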
check_command check_tcp!873
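A transposed port number like this is easy to overlook in a long file. One way to review the ports is to list them next to their service descriptions. This is a minimal sketch, assuming service_description appears before check_command in each block and contains no spaces, as in the definitions in this post; list_tcp_ports is a made-up helper and the demo file is a trimmed copy of the two services.

```shell
# list_tcp_ports FILE
# Prints "service_description port" for every service whose
# check_command has the form check_tcp!PORT.
list_tcp_ports() {
    awk '
        $1 == "service_description" { desc = $2 }
        $1 == "check_command" && $2 ~ /^check_tcp!/ {
            split($2, parts, "!")
            print desc, parts[2]
        }
    ' "$1"
}

# Demo on a temporary file that still contains the mistyped port.
demo_cfg=$(mktemp)
cat > "$demo_cfg" <<'EOF'
define service{
        host_name               node3.chinasoft.com
        service_description     sync_port
        check_command           check_tcp!837
}
define service{
        host_name               centossz008
        service_description     mysql_port
        check_command           check_tcp!3306
}
EOF
list_tcp_ports "$demo_cfg"
```

Comparing that list with the output of lsof -i on each client shows at a glance which definitions point at a port nothing is listening on.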
② Monitoring MySQL port 3306
vim /usr/local/nagios/etc/objects/services/blog.cfg
define service{
use generic-service
host_name centossz008
service_description mysql_port
check_command check_tcp!3306
check_period 24x7
max_check_attempts 3
normal_check_interval 2
retry_check_interval 1
contact_groups admins
notification_period 24x7
notification_options w,u,c,r
notification_interval 30
}
4. Monitoring web port 80 in passive mode (through NRPE)
On the monitored host, 192.168.8.40, add the command definition to nrpe.cfg:
command[check_port_80]=/usr/local/nagios/libexec/check_tcp -H 192.168.8.40 -p 80 -w 6 -c 10
Edit the file on the nagios server:
vim /usr/local/nagios/etc/objects/services/blog.cfg
define service{
use generic-service
host_name centossz008
service_description blog_port_80_beidong
check_command check_nrpe!check_port_80
check_period 24x7
max_check_attempts 3
normal_check_interval 2
retry_check_interval 1
contact_groups admins
notification_period 24x7
notification_options w,u,c,r
notification_interval 30
}
The check reports an error.
Troubleshooting:
Run on the server:
# /usr/local/nagios/libexec/check_tcp -H 192.168.8.40 -p 80 -w 6 -c 10
TCP OK - 0.002 second response time on port 80|time=0.001845s;6.000000;10.000000;0.000000;10.000000
# /usr/local/nagios/libexec/check_nrpe -H 192.168.8.40 -c check_port_80
NRPE: Command 'check_port_80' not defined
The definition had been made on the wrong side: the check_port_80 command has to be defined in nrpe.cfg on the client for the check to work.
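Before asking the server to run a command through NRPE, it helps to confirm on the client that nrpe.cfg really defines it. A minimal sketch of such a check follows; nrpe_has_command is a hypothetical helper, and the demo uses a temporary file in place of the real nrpe.cfg.

```shell
# nrpe_has_command FILE NAME
# Succeeds if FILE contains a line starting with command[NAME]=
nrpe_has_command() {
    grep -q "^command\[$2\]=" "$1"
}

# Demo on a temporary stand-in for nrpe.cfg.
demo_nrpe=$(mktemp)
cat > "$demo_nrpe" <<'EOF'
command[check_port_80]=/usr/local/nagios/libexec/check_tcp -H 192.168.8.40 -p 80 -w 6 -c 10
EOF

for name in check_port_80 check_port_3306; do
    if nrpe_has_command "$demo_nrpe" "$name"; then
        echo "$name: defined"
    else
        echo "$name: not defined"
    fi
done
```

The NRPE daemon only reads nrpe.cfg at startup, so it normally needs a restart after a command is added.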
cat /usr/local/nagios/etc/objects/services/servergroup.cfg
define servicegroup{
servicegroup_name Swap Useage
alias Linux Servers
members node3.chinasoft.com,Swap Useage,node4.chinasoft.com,Swap Useage,centossz008,Swap Useage
}
This produces an error:
Error: Could not find a service matching host name 'node3.chinasoft.com' and description 'Swap Useage' (config file '/usr/local/nagios/etc/objects/services/servergroup.cfg', starting on line 1)
The cause is that /usr/local/nagios/etc/objects/servie.cfg either does not define a check_swap service for that host or gives it a service_description that differs from 'Swap Useage'.
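The members directive of a servicegroup is a flat list of host,service_description pairs, and each pair must match an existing service exactly. To make the pairs easier to check one by one, they can be split out as in the sketch below; member_pairs is a made-up name for this illustration.

```shell
# member_pairs MEMBERS
# Prints one "host / service_description" line per pair in a
# servicegroup members value.
member_pairs() {
    printf '%s\n' "$1" | awk -F',' '{
        for (i = 1; i + 1 <= NF; i += 2)
            print $i " / " $(i + 1)
    }'
}

member_pairs 'node3.chinasoft.com,Swap Useage,node4.chinasoft.com,Swap Useage,centossz008,Swap Useage'
```

Each printed pair can then be compared against the host_name and service_description values in the service definitions.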
Note:
Errors can be tracked down with # tail /usr/local/nagios/var/nagios.log