Nagios Deployment
Nagios Server Deployment
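1. Configure the yum repository to use the Aliyun mirror
cd /etc/yum.repos.d/
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-6.repo
2. Set the locale environment variable
echo "export LC_ALL=C" >> /etc/profile
source /etc/profile
3. Stop iptables and disable SELinux
/etc/init.d/iptables stop
chkconfig iptables off
setenforce 0
setenforce 0 only affects the running system, so /etc/selinux/config should also contain the line shown here:
cat /etc/selinux/config
SELINUX=disabled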
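The persistent SELinux setting can also be changed with a small script instead of an editor. The following is a minimal sketch, not part of the original procedure: the function name disable_selinux_in is hypothetical, and the demonstration works on a scratch file rather than the real /etc/selinux/config.

```shell
#!/bin/sh
# Hypothetical helper: set SELINUX=disabled in an SELinux config file.
# The path is a parameter so the change can be tried on a copy first.
disable_selinux_in() {
    cfg="$1"
    tmp=$(mktemp)
    # Rewrite only the SELINUX= line; SELINUXTYPE= and comments are left alone.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$cfg" > "$tmp"
    # Copy the content back so the original file keeps its owner and mode.
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}

# Demonstration on a scratch file (assumed sample content).
demo_cfg=$(mktemp)
printf '%s\n' '# sample config' 'SELINUX=enforcing' 'SELINUXTYPE=targeted' > "$demo_cfg"
disable_selinux_in "$demo_cfg"
grep '^SELINUX' "$demo_cfg"
```

On a real host the call would be disable_selinux_in /etc/selinux/config, run as root.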
4. Set up a cron job for time synchronization
[root@nagios-srv nrpe-2.12]# crontab -l
#######################
*/5 * * * * /usr/sbin/ntpdate time.nist.gov >/dev/null 2>&1
5. Install the packages Nagios depends on
yum install gcc glibc glibc-common gd gd-devel httpd php php-gd mysql* -y
6. Add the nagios user and the nagcmd group
useradd -m nagios
#useradd apache   # not needed: installing httpd with yum already creates the apache user
groupadd nagcmd
usermod -a -G nagcmd nagios
usermod -a -G nagcmd apache
7. Download and install Nagios (the tarball is expected in /home/oldboy/tools/)
cd /home/oldboy/tools/
tar xf nagios-3.5.1.tar.gz
cd nagios
./configure --with-command-group=nagcmd
make all
make install
make install-init
make install-config
make install-commandmode
make install-webconf
8. Add web authentication
make install-webconf generates the Apache configuration file /etc/httpd/conf.d/nagios.conf, and that file names /usr/local/nagios/etc/htpasswd.users as the web authentication file, so a web user has to be created there. The relevant part of the configuration file looks like this:
[root@nagios-srv nrpe-2.12]# cat /etc/httpd/conf.d/nagios.conf
<Directory "/usr/local/nagios/share">
#  SSLRequireSSL
   Options None
   AllowOverride None
   Order allow,deny
   Allow from all
#  Order deny,allow
#  Deny from all
#  Allow from 127.0.0.1
   AuthName "Nagios Access"
   AuthType Basic
   AuthUserFile /usr/local/nagios/etc/htpasswd.users
   Require valid-user
</Directory>
So the web user is added as follows:
htpasswd -cb /usr/local/nagios/etc/htpasswd.users oldboy 123456
9. Install the Nagios plugins (nagios-plugins)
yum -y install perl-devel
tar xf nagios-plugins-1.4.16.tar.gz
cd nagios-plugins-1.4.16
./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-perl-modules
make && make install
After the plugins are installed, count how many there are: ls /usr/local/nagios/libexec/ | wc -l
[root@nagios-srv nrpe-2.12]# ll /usr/local//nagios/libexec/
total 5748
-rwxr-xr-x 1 nagios nagios 376556 Oct 12 22:33 check_apt
-rwxr-xr-x 1 nagios nagios   2245 Oct 12 22:33 check_breeze
-rwxr-xr-x 1 nagios nagios 128328 Oct 12 22:33 check_by_ssh
lrwxrwxrwx 1 root   root        9 Oct 12 22:33 check_clamd -> check_tcp
-rwxr-xr-x 1 nagios nagios  85726 Oct 12 22:33 check_cluster
-r-sr-xr-x 1 root   nagios 123643 Oct 12 22:33 check_dhcp
-rwxr-xr-x 1 nagios nagios 121650 Oct 12 22:33 check_dig
-rwxr-xr-x 1 nagios nagios 417927 Oct 12 22:33 check_disk
-rwxr-xr-x 1 nagios nagios   9148 Oct 12 22:33 check_disk_smb
-rwxr-xr-x 1 nagios nagios 129515 Oct 12 22:33 check_dns
-rwxr-xr-x 1 nagios nagios  80721 Oct 12 22:33 check_dummy
-rwxr-xr-x 1 nagios nagios   3056 Oct 12 22:33 check_file_age
-rwxr-xr-x 1 nagios nagios   6318 Oct 12 22:33 check_flexlm
lrwxrwxrwx 1 root   root        9 Oct 12 22:33 check_ftp -> check_tcp
-rwxr-xr-x 1 nagios nagios 520646 Oct 12 22:33 check_http
-r-sr-xr-x 1 root   nagios 133729 Oct 12 22:33 check_icmp
-rwxr-xr-x 1 nagios nagios  93440 Oct 12 22:33 check_ide_smart
-rwxr-xr-x 1 nagios nagios  15137 Oct 12 22:33 check_ifoperstatus
-rwxr-xr-x 1 nagios nagios  12601 Oct 12 22:33 check_ifstatus
lrwxrwxrwx 1 root   root        9 Oct 12 22:33 check_imap -> check_tcp
-rwxr-xr-x 1 nagios nagios   6890 Oct 12 22:33 check_ircd
lrwxrwxrwx 1 root   root        9 Oct 12 22:33 check_jabber -> check_tcp
-rwxr-xr-x 1 nagios nagios 106605 Oct 12 22:33 check_load
-rwxr-xr-x 1 nagios nagios   6020 Oct 12 22:33 check_log
-rwxr-xr-x 1 nagios nagios  20287 Oct 12 22:33 check_mailq
-rwxr-xr-x 1 nagios nagios  93174 Oct 12 22:33 check_mrtg
-rwxr-xr-x 1 nagios nagios  92511 Oct 12 22:33 check_mrtgtraf
-rwxr-xr-x 1 nagios nagios 129444 Oct 12 22:33 check_mysql
-rwxr-xr-x 1 nagios nagios 122426 Oct 12 22:33 check_mysql_query
-rwxr-xr-x 1 nagios nagios 105638 Oct 12 22:33 check_nagios
lrwxrwxrwx 1 root   root        9 Oct 12 22:33 check_nntp -> check_tcp
lrwxrwxrwx 1 root   root        9 Oct 12 22:33 check_nntps -> check_tcp
-rwxrwxr-x 1 nagios nagios  76752 Oct 12 22:40 check_nrpe
-rwxr-xr-x 1 nagios nagios 127711 Oct 12 22:33 check_nt
-rwxr-xr-x 1 nagios nagios 130102 Oct 12 22:33 check_ntp
-rwxr-xr-x 1 nagios nagios 119191 Oct 12 22:33 check_ntp_peer
-rwxr-xr-x 1 nagios nagios 117760 Oct 12 22:33 check_ntp_time
-rwxr-xr-x 1 nagios nagios 159404 Oct 12 22:33 check_nwstat
-rwxr-xr-x 1 nagios nagios   8324 Oct 12 22:33 check_oracle
-rwxr-xr-x 1 nagios nagios 108966 Oct 12 22:33 check_overcr
-rwxr-xr-x 1 nagios nagios 132723 Oct 12 22:33 check_ping
lrwxrwxrwx 1 root   root        9 Oct 12 22:33 check_pop -> check_tcp
-rwxr-xr-x 1 nagios nagios 396865 Oct 12 22:33 check_procs
-rwxr-xr-x 1 nagios nagios 106524 Oct 12 22:33 check_real
-rwxr-xr-x 1 nagios nagios   9584 Oct 12 22:33 check_rpc
-rwxr-xr-x 1 nagios nagios   1412 Oct 12 22:33 check_sensors
lrwxrwxrwx 1 root   root        9 Oct 12 22:33 check_simap -> check_tcp
-rwxr-xr-x 1 nagios nagios 446535 Oct 12 22:33 check_smtp
lrwxrwxrwx 1 root   root        9 Oct 12 22:33 check_spop -> check_tcp
-rwxr-xr-x 1 nagios nagios 103032 Oct 12 22:33 check_ssh
lrwxrwxrwx 1 root   root        9 Oct 12 22:33 check_ssmtp -> check_tcp
-rwxr-xr-x 1 nagios nagios 108265 Oct 12 22:33 check_swap
-rwxr-xr-x 1 nagios nagios 160418 Oct 12 22:33 check_tcp
-rwxr-xr-x 1 nagios nagios 105054 Oct 12 22:33 check_time
lrwxrwxrwx 1 root   root        9 Oct 12 22:33 check_udp -> check_tcp
-rwxr-xr-x 1 nagios nagios 117566 Oct 12 22:33 check_ups
-rwxr-xr-x 1 nagios nagios  83458 Oct 12 22:33 check_users
-rwxr-xr-x 1 nagios nagios   2939 Oct 12 22:33 check_wave
-rwxr-xr-x 1 nagios nagios 109747 Oct 12 22:33 negate
-rwxr-xr-x 1 nagios nagios 103274 Oct 12 22:33 urlize
-rwxr-xr-x 1 nagios nagios   1921 Oct 12 22:33 utils.pm
-rwxr-xr-x 1 nagios nagios   2728 Oct 12 22:33 utils.sh
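10. Install NRPE
The server only gets the check_nrpe plugin once NRPE is installed, and it needs that plugin to talk to the NRPE daemon on the clients. The server itself is also monitored, so NRPE must be installed here.
tar xf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config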
11. Start Nagios and Apache, then check that both services are running
/etc/init.d/nagios start
/etc/init.d/httpd start
[root@nagios-srv nrpe-2.12]# ps -ef|grep nagios
nagios    54358      1  0 22:43 ?        00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
[root@nagios-srv nrpe-2.12]# lsof -i :80
COMMAND   PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
httpd   54457   root    4u  IPv6 108352      0t0  TCP *:http (LISTEN)
httpd   54459 apache    4u  IPv6 108352      0t0  TCP *:http (LISTEN)
httpd   54460 apache    4u  IPv6 108352      0t0  TCP *:http (LISTEN)
httpd   54461 apache    4u  IPv6 108352      0t0  TCP *:http (LISTEN)
httpd   54462 apache    4u  IPv6 108352      0t0  TCP *:http (LISTEN)
httpd   54463 apache    4u  IPv6 108352      0t0  TCP *:http (LISTEN)
httpd   54464 apache    4u  IPv6 108352      0t0  TCP *:http (LISTEN)
httpd   54465 apache    4u  IPv6 108352      0t0  TCP *:http (LISTEN)
httpd   54466 apache    4u  IPv6 108352      0t0  TCP *:http (LISTEN)
httpd   54467 apache    4u  IPv6 108352      0t0  TCP *:http (LISTEN)
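12. Finally, open Nagios in a browser to confirm that the server was deployed successfully
Visit http://192.168.1.102/nagios/ and the browser prompts for the authentication user name and password.
If the Nagios web interface appears after you enter the correct user name and password, the server deployment is complete.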
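Nagios Client Deployment
The client needs neither the graphing tools nor a LAMP environment, so the server-side support packages (gd gd-devel httpd php php-gd mysql*) do not have to be installed on it.
SELinux and iptables still have to be turned off, the yum repository switched to the Aliyun mirror, the environment variable set, and the time synchronization cron job installed.
1. Create the nagios user
[root@lnmp01 tools]# useradd nagios -m -s /sbin/nologin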
2. Install the nagios-plugins package
yum install perl-devel -y
tar xf nagios-plugins-1.4.16.tar.gz
cd nagios-plugins-1.4.16
./configure --prefix=/usr/local/nagios --enable-perl-modules --enable-redhat-pthread-workaround
make && make install
Finally, count how many plugins were installed: ls /usr/local/nagios/libexec/ | wc -l
3. Install NRPE
tar xf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
4. Install the Perl modules needed by the iostat disk I/O monitoring plugin
Class-Accessor-0.31.tar.gz
Config-Tiny-2.12.tar.gz
Math-Calc-Units-1.07.tar.gz
Nagios-Plugin-0.34.tar.gz
Params-Validate-0.91.tar.gz
Regexp-Common-2010010201.tar.gz
Install them in this order:
tar xf Params-Validate-0.91.tar.gz
cd Params-Validate-0.91
perl Makefile.PL
make && make install
cd ..
tar xf Class-Accessor-0.31.tar.gz
cd Class-Accessor-0.31
perl Makefile.PL
make && make install
cd ..
tar xf Math-Calc-Units-1.07.tar.gz
cd Math-Calc-Units-1.07
perl Makefile.PL
make && make install
cd ..
tar xf Regexp-Common-2010010201.tar.gz
cd Regexp-Common-2010010201
perl Makefile.PL
make && make install
cd ..
tar xf Nagios-Plugin-0.34.tar.gz
cd Nagios-Plugin-0.34
perl Makefile.PL
make && make install
cd ..
tar xf Config-Tiny-2.12.tar.gz
cd Config-Tiny-2.12
perl Makefile.PL
make && make install
cd ..
5. Install sysstat
[root@lnmp01 tools]# yum install sysstat -y
6. Install the check_memory.pl and check_iostat plugins and fix their permissions and line endings
cp /home/oldboy/tools/check_memory.pl /usr/local/nagios/libexec/
cp /home/oldboy/tools/check_iostat /usr/local/nagios/libexec/
chmod 755 /usr/local/nagios/libexec/check_memory.pl
chmod 755 /usr/local/nagios/libexec/check_iostat
dos2unix /usr/local/nagios/libexec/check_memory.pl
dos2unix /usr/local/nagios/libexec/check_iostat
7. Edit the NRPE configuration file
In nrpe.cfg, allowed_hosts controls which hosts may monitor this machine; add the IP address of the Nagios server to it.
[root@lnmp01 tools]# vim /usr/local/nagios/etc/nrpe.cfg
allowed_hosts=127.0.0.1,192.168.1.102
Also delete the following five lines from the configuration file:
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
After deleting those five lines, append the new command definitions to nrpe.cfg with the commands below.
Plugins you write yourself later should be placed in the plugin directory these definitions point to, and registered in this configuration file in the same way.
echo "command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,6 -c 30,25,20">>/usr/local/nagios/etc/nrpe.cfg
echo "command[check_mem]=/usr/local/nagios/libexec/check_memory.pl -w 6% -c 3%">>/usr/local/nagios/etc/nrpe.cfg
echo "command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 8% -p /">>/usr/local/nagios/etc/nrpe.cfg
echo "command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%">>/usr/local/nagios/etc/nrpe.cfg
echo "command[check_iostat]=/usr/local/nagios/libexec/check_iostat -w 6 -c 10">>/usr/local/nagios/etc/nrpe.cfg
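Appending with echo adds a second definition every time the same line is run again. As a hedged alternative, the sketch below replaces an existing command[NAME] entry instead of duplicating it. The function name set_nrpe_command is hypothetical, and the demonstration uses a scratch file rather than the real nrpe.cfg.

```shell
#!/bin/sh
# Hypothetical helper: define command[NAME] in an nrpe.cfg-style file,
# replacing any earlier definition of the same name.
set_nrpe_command() {
    cfg="$1"; name="$2"; cmdline="$3"
    tmp=$(mktemp)
    # Keep every line except a previous definition of this command.
    # grep exits non-zero when nothing is left to print, which is fine here.
    grep -v "^command\[$name]=" "$cfg" > "$tmp" || true
    printf 'command[%s]=%s\n' "$name" "$cmdline" >> "$tmp"
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}

# Demonstration on a scratch copy (assumed sample content).
demo=$(mktemp)
printf '%s\n' 'allowed_hosts=127.0.0.1,192.168.1.102' \
    'command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20' > "$demo"
set_nrpe_command "$demo" check_load '/usr/local/nagios/libexec/check_load -w 15,10,6 -c 30,25,20'
set_nrpe_command "$demo" check_swap '/usr/local/nagios/libexec/check_swap -w 20% -c 10%'
# Running the same call twice still leaves a single check_swap entry.
set_nrpe_command "$demo" check_swap '/usr/local/nagios/libexec/check_swap -w 20% -c 10%'
cat "$demo"
```

NRPE reads its configuration at startup, so the daemon still has to be restarted after any change.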
8. Start the NRPE daemon and add the start command to rc.local so that it runs at boot.
[root@lnmp01 tools]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@lnmp01 tools]# echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >>/etc/rc.local
Check the NRPE process and its listening port:
[root@lnmp01 tools]# ps -ef |grep nrpe
nagios    61056      1  0 19:19 ?        00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@lnmp01 tools]# netstat -lntup|grep nrpe
tcp        0      0 0.0.0.0:5666        0.0.0.0:*        LISTEN      61056/nrpe
Nagios Server Configuration
Nagios has a main configuration file, nagios.cfg, plus a number of other configuration files that nagios.cfg includes. The layout can be seen with tree:
[root@nagios-srv etc]# tree /usr/local/nagios/etc/
/usr/local/nagios/etc/
|-- cgi.cfg
|-- htpasswd.users
|-- nagios.cfg
|-- nrpe.cfg
|-- objects
|   |-- commands.cfg
|   |-- contacts.cfg
|   |-- localhost.cfg
|   |-- printer.cfg
|   |-- switch.cfg
|   |-- templates.cfg
|   |-- timeperiods.cfg
|   `-- windows.cfg
`-- resource.cfg
In the main configuration file, cfg_file includes a single configuration file and cfg_dir includes every configuration file in a directory. Adjust the settings as follows:
[root@nagios-srv etc]# vim nagios.cfg
# You can specify individual object config files as shown below:
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
# localhost.cfg must be commented out here
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_dir=/usr/local/nagios/etc/services
#cfg_dir=/usr/local/nagios/etc/printers
#cfg_dir=/usr/local/nagios/etc/switches
#cfg_dir=/usr/local/nagios/etc/routers
The services.cfg and hosts.cfg files and the services directory referenced in nagios.cfg do not exist by default. They are our own additions, so they have to be created by hand:
[root@nagios-srv etc]# mkdir services
[root@nagios-srv etc]# chown -R nagios.nagios services
[root@nagios-srv etc]# head -51 objects/localhost.cfg>objects/hosts.cfg
[root@nagios-srv objects]# chown nagios.nagios hosts.cfg
[root@nagios-srv objects]# touch services.cfg
[root@nagios-srv objects]# chown nagios.nagios services.cfg
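Nagios Monitoring Modes and How to Choose Between Them
This article distinguishes two monitoring modes: active mode and passive mode.
In active mode, the Nagios server queries the client's status itself, using plugins that run on the server. No NRPE support is needed on the client.
In passive mode, the Nagios server uses the check_nrpe plugin to talk to the NRPE daemon on the client, which runs the client's local plugins to collect the data.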
Configuring passive mode:
Edit hosts.cfg, then fix the other files according to the errors the configuration check reports.
[root@nagios-srv ~]# cd /usr/local/nagios/etc/objects/
[root@nagios-srv objects]# vim hosts.cfg
# Define a host for the local machine
define host{
        use                     linux-server
        host_name               103.lnmp01
        alias                   103.lnmp01
        address                 192.168.1.103
        }
define host{
        use                     linux-server
        host_name               nagios_server
        alias                   nagios_server
        address                 192.168.1.102
        }
define hostgroup{
        hostgroup_name  linux-servers ; The name of the hostgroup
        alias           Linux Servers ; Long name of the group
        members         103.lnmp01,nagios_server
        }
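When more hosts are added, writing each define host block by hand becomes repetitive. The following is a minimal sketch that prints blocks in the same shape from a list of name and address pairs. The function name emit_host is hypothetical and the output goes to standard output, so it can be reviewed before being redirected into hosts.cfg.

```shell
#!/bin/sh
# Hypothetical helper: print one "define host" block in the style of hosts.cfg.
emit_host() {
    name="$1"; addr="$2"
    printf 'define host{\n'
    printf '        use        linux-server\n'
    printf '        host_name  %s\n' "$name"
    printf '        alias      %s\n' "$name"
    printf '        address    %s\n' "$addr"
    printf '        }\n'
}

# One "host_name address" pair per line, taken from the example above.
host_list='103.lnmp01 192.168.1.103
nagios_server 192.168.1.102'

printf '%s\n' "$host_list" | while read -r name addr; do
    emit_host "$name" "$addr"
done
```

The hostgroup definition still has to list the generated host names in its members line.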
After editing the configuration, check the syntax with /etc/init.d/nagios checkconfig. It reports that there is an error but not where, whereas /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg shows the location of the error.
This behavior comes from the /etc/init.d/nagios init script. In the line $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1; remove the redirection to /dev/null, so that the script reads as follows:
[root@nagios-srv objects]# vim /etc/init.d/nagios
checkconfig)
        printf "Running configuration check..."
        $NagiosBin -v $NagiosCfgFile;
        if [ $? -eq 0 ]; then
                echo " OK."
        else
                echo " CONFIG ERROR!  Check your Nagios configuration."
                exit 1
        fi
        ;;
Now /etc/init.d/nagios checkconfig reports where the error is:
[root@nagios-srv objects]# /etc/init.d/nagios checkconfig
Checking services...
Error: There are no services defined!
        Checked 0 services.
Total Warnings: 2
Total Errors:   1
The error says that no services are defined, so edit services.cfg next:
[root@nagios-srv objects]# vim services.cfg
define service{
        use                     generic-service
        host_name               103.lnmp01
        service_description     Disk Partition
        check_command           check_nrpe!check_disk
        }
define service{
        use                     generic-service
        host_name               103.lnmp01
        service_description     Memory information
        check_command           check_nrpe!check_mem
        }
Running /etc/init.d/nagios checkconfig again still reports an error:
[root@nagios-srv objects]# /etc/init.d/nagios checkconfig
Checking services...
Error: Service check command 'check_nrpe check_mem' specified in service 'Disk Partition' for host '103.lnmp01' not defined anywhere!
Total Warnings: 1
Total Errors:   1
This error means that the check_nrpe and check_mem commands have not been defined. Nagios commands are defined in commands.cfg, so edit that file next.
Add definitions for the check_nrpe and check_mem commands:
[root@nagios-srv objects]# vim commands.cfg
# 'check_nrpe' command definition
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
# 'check_memory' command definition
define command{
        command_name    check_mem
        command_line    $USER1$/check_memory.pl -H $HOSTADDRESS$ -c $ARG1$
        }
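To see how a service's check_command such as check_nrpe!check_disk turns into an actual command line, the sketch below mimics the macro substitution in plain shell. It is an illustration only: Nagios performs this expansion internally, the function name expand_check is hypothetical, the value of $USER1$ is assumed to be /usr/local/nagios/libexec, and only a single argument after the "!" is handled.

```shell
#!/bin/sh
# Illustration only: mimic how a command_line template is filled in
# from a service's check_command and the host's address.
expand_check() {
    command_line="$1"    # template from commands.cfg
    check_command="$2"   # value from services.cfg, e.g. check_nrpe!check_disk
    host_address="$3"    # address from hosts.cfg
    user1="/usr/local/nagios/libexec"   # assumed value of $USER1$
    arg1=${check_command#*!}            # text after the first "!" becomes $ARG1$
    printf '%s\n' "$command_line" |
        sed -e 's|\$USER1\$|'"$user1"'|g' \
            -e 's|\$HOSTADDRESS\$|'"$host_address"'|g' \
            -e 's|\$ARG1\$|'"$arg1"'|g'
}

expand_check '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' \
    'check_nrpe!check_disk' 192.168.1.103
```

The printed line has the same form as the manual check_nrpe test commands used in this article.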
The syntax check now passes. Use the check_nrpe command as defined in commands.cfg to test whether monitoring works:
[root@nagios-srv etc]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.103 -c check_disk
DISK OK - free space: / 5836 MB (66% inode=84%);| /=2936MB;7399;8509;0;9249
[root@nagios-srv objects]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.103 -c check_mem
CHECK_MEMORY OK - 359M free | free=376508416b;29873356.8:;14936678.4:
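Each output line above has two parts separated by "|": the human-readable status text and the performance data that is later used for graphing. The following is a small sketch of splitting the two in shell. The function name split_plugin_output is hypothetical, and only the first "|" is treated as the separator.

```shell
#!/bin/sh
# Hypothetical helper: split one line of plugin output into status text
# and performance data at the first "|".
split_plugin_output() {
    line="$1"
    status_text=${line%%"|"*}        # everything before the first "|"
    case "$line" in
        *"|"*) perfdata=${line#*"|"} ;;   # everything after the first "|"
        *)     perfdata="" ;;             # plugin printed no performance data
    esac
    perfdata=${perfdata# }           # drop one leading space, if any
    printf 'text: %s\n' "$status_text"
    printf 'perf: %s\n' "$perfdata"
}

split_plugin_output 'DISK OK - free space: / 5836 MB (66% inode=84%);| /=2936MB;7399;8509;0;9249'
```

Within the performance data, the fields after the label are separated by ";". In the check_disk line the trailing two values (0 and 9249) look like the minimum and maximum, which would make 7399 and 8509 the warning and critical thresholds.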
Opening http://192.168.1.102/nagios/ in a browser now shows a CGI error. This is a permission problem: by default Nagios authorizes the user nagiosadmin, but our monitoring user is oldboy, so replace nagiosadmin with oldboy throughout cgi.cfg.
[root@nagios-srv etc]# vim cgi.cfg
authorized_for_all_service_commands=oldboy
authorized_for_all_host_commands=oldboy
# In vim, :g/nagiosadmin/s//oldboy/g changes every nagiosadmin to oldboy
# Reload the Nagios service
/etc/init.d/nagios reload
Configuring active mode
In active mode the monitoring is initiated by the Nagios server, for example checks of a client's URLs or ports. Each plugin's --help option shows its command syntax.
Usage help for URL monitoring:
[root@nagios-srv objects]# /usr/local/nagios/libexec/check_http --help
Usage:
 check_http -H <vhost> | -I <IP-address> [-u <uri>] [-p <port>]
       [-w <warn time>] [-c <critical time>] [-t <timeout>] [-L] [-a auth]
       [-b proxy_auth] [-f <ok|warning|critcal|follow|sticky|stickyport>]
       [-e <expect>] [-s string] [-l] [-r <regex> | -R <case-insensitive regex>]
       [-P string] [-m <min_pg_size>:<max_pg_size>] [-4|-6] [-N] [-M <age>]
       [-A string] [-k string] [-S <version>] [--sni] [-C <warn_age>[,<crit_age>]]
       [-T <content-type>] [-j method]
Usage help for port monitoring:
[root@nagios-srv objects]# /usr/local/nagios/libexec/check_tcp --help
Usage:
 check_tcp -H host -p port [-w <warning time>] [-c <critical time>] [-s <send string>]
       [-e <expect string>] [-q <quit string>][-m <maximum bytes>] [-d <delay>]
       [-t <timeout seconds>] [-r <refuse state>] [-M <mismatch state>] [-v] [-4|-6] [-j]
       [-D <warn days cert expire>[,<crit days cert expire>]] [-S <use SSL>] [-E]
Because the main configuration file includes the directory cfg_dir=/usr/local/nagios/etc/services, we can put our own configuration files in that directory to define active checks of the Nagios clients.
1. For example, to monitor the main domain name, create a configuration file http-url.cfg
[root@nagios-srv services]# vim http-url.cfg
define service{
        use                     generic-service
        host_name               103.lnmp01
        service_description     blog_url monitor
        check_command           check_weburl!-H blog.etiantian.org
        max_check_attempts      3
        normal_check_interval   2
        retry_check_interval    1
        check_period            24x7
        notification_interval   30
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins
        }
Then define the custom command check_weburl, which http-url.cfg refers to, in commands.cfg:
[root@nagios-srv objects]# vim commands.cfg
# 'check_weburl' command definition
define command{
        command_name    check_weburl
        command_line    $USER1$/check_http $ARG1$ -w 10 -c 30
        }
Check the syntax and reload Nagios: /etc/init.d/nagios reload
Besides the main domain name itself, a specific URL under the domain can be monitored as well, for example:
[root@nagios-srv services]# vim http-url.cfg
define service{
        use                     generic-service
        host_name               103.lnmp01
        service_description     monitor url /upload/test.html
        check_command           check_weburl!-H blog.etiantian.org -u /upload/test.html
        max_check_attempts      3
        normal_check_interval   2
        retry_check_interval    1
        check_period            24x7
        notification_interval   30
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins
        }
If the URL path is complex, for example a dynamic URL with a query string, put the path after -u in quotes when defining the service. For example: check_command check_weburl!-H blog.etiantian.org -u "/dynamic/?article=1&u=2"
Then reload Nagios: /etc/init.d/nagios reload
How, then, can the individual nodes behind a clustered domain name be monitored? That requires using aliases to monitor the same URL on each node of the cluster.
2. Monitor a port on the client
Ports are monitored with the check_tcp plugin. Its command is already defined in commands.cfg, so it can be used directly:
# 'check_tcp' command definition
define command{
        command_name    check_tcp
        command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
        }
Following the format of the check_tcp command definition in commands.cfg, create a port monitoring configuration file in the services directory:
[root@nagios-srv services]# vim monitor-port.cfg
define service{
        use                     generic-service
        host_name               103.lnmp01
        service_description     Monitor Port-80
        check_command           check_tcp!80
        max_check_attempts      3
        normal_check_interval   2
        retry_check_interval    1
        check_period            24x7
        notification_interval   30
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins
        }
Reload Nagios and the check takes effect. Any other port can be monitored the same way, for example MySQL on port 3306 or SSH on port 22.
Active and passive checks are interchangeable, so next we convert the active check of port 80 into a passive one.
Edit nrpe.cfg on the client and add a check_port_80 command:
[root@lnmp01 upload]# vim /usr/local/nagios/etc/nrpe.cfg
command[check_port_80]=/usr/local/nagios/libexec/check_tcp -H 192.168.1.103 -p 80
Restart NRPE on the client.
Then test from the server whether port 80 on the client can be checked in passive mode: /usr/local/nagios/libexec/check_nrpe -H 192.168.1.103 -c check_port_80. If the test succeeds, configure the service on the server:
[root@nagios-srv services]# vim http-url.cfg
define service{
        use                     generic-service
        host_name               103.lnmp01
        service_description     beidong-monitor 103.lnmp01-port-80
        check_command           check_nrpe!check_port_80
        max_check_attempts      3
        normal_check_interval   2
        retry_check_interval    1
        check_period            24x7
        notification_interval   30
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins
        }
Reload Nagios on the server, and the server now monitors port 80 on the client in passive mode.
3. Defining Nagios templates
hosts.cfg, services.cfg, and most other object configuration files rely on templates. Templates make these files much easier to maintain: put the settings shared by many objects into a template, and have the individual definitions refer to it.
For example, to use a template for services.cfg, first define a custom template generic-goser-service in templates.cfg and move everything the service definitions have in common into it:
[root@nagios-srv objects]# vim templates.cfg
define service{
        name                            generic-goser-service
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 0
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              2
        normal_check_interval           2
        retry_check_interval            2
        contact_groups                  admins
        notification_options            w,u,c,r
        notification_interval           10
        notification_period             24x7
        register                        0
        }
The template can then be used in services.cfg:
[root@nagios-srv objects]# vim services.cfg
define service{
        use                     generic-goser-service
        host_name               103.lnmp01
        service_description     Disk Partition
        check_command           check_nrpe!check_disk
        }
define service{
        use                     generic-goser-service
        host_name               103.lnmp01
        service_description     Memory information
        check_command           check_nrpe!check_mem
        }
define service{
        use                     generic-goser-service
        host_name               103.lnmp01
        service_description     Load information
        check_command           check_nrpe!check_load
        }
define service{
        use                     generic-goser-service
        host_name               103.lnmp01
        service_description     Swap status
        check_command           check_nrpe!check_swap
        }
define service{
        use                     generic-goser-service
        host_name               103.lnmp01
        service_description     IO status
        check_command           check_nrpe!check_iostat
        }
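Once the shared settings live in a template, each service block only differs in its description and NRPE command, so the blocks can be generated. The following is a minimal sketch under that assumption. The function name emit_service is hypothetical and the pair list is a small illustrative subset of the checks used here.

```shell
#!/bin/sh
# Hypothetical helper: print one service block that uses the
# generic-goser-service template and an NRPE-based check.
emit_service() {
    host="$1"; desc="$2"; nrpe_cmd="$3"
    printf 'define service{\n'
    printf '        use                  generic-goser-service\n'
    printf '        host_name            %s\n' "$host"
    printf '        service_description  %s\n' "$desc"
    printf '        check_command        check_nrpe!%s\n' "$nrpe_cmd"
    printf '        }\n'
}

# "description:nrpe command" pairs, one per line.
checks='Disk Partition:check_disk
Swap status:check_swap
IO status:check_iostat'

printf '%s\n' "$checks" | while IFS=: read -r desc nrpe_cmd; do
    emit_service 103.lnmp01 "$desc" "$nrpe_cmd"
done
```

Redirecting the output into a file under the services directory would make Nagios pick the definitions up through cfg_dir at the next reload.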
In the same way, templates for hosts are defined in templates.cfg and then used in hosts.cfg.
Notification contacts are defined in contacts.cfg and then referenced from services.cfg and other configuration files. A custom contact setup in contacts.cfg looks like this:
define contact{
        contact_name    goser01
        use             generic-contact
        alias           Nagios Admin
        email           goser01@163.com
        }
define contact{
        contact_name    goser02
        use             generic-contact
        alias           Nagios Admin
        email           goser02@163.com
        }
define contactgroup{
        contactgroup_name       sas
        alias                   Nagios Administrators
        members                 goser01,goser02
        }
The custom contact group can then be added to the service template, so that notifications are sent to the contacts defined there.
[root@nagios-srv objects]# vim templates.cfg
        contact_groups                  admins,sas
Time periods can be customized as well and then used by services.cfg and other configuration files. The time period configuration file contains:
[root@nagios-srv objects]# less timeperiods.cfg
###############################################################################
# TIMEPERIODS.CFG - SAMPLE TIMEPERIOD DEFINITIONS
#
# Last Modified: 05-31-2007
#
# NOTES: This config file provides you with some example timeperiod definitions
#        that you can reference in host, service, contact, and dependency
#        definitions.
#
#        You don't need to keep timeperiods in a separate file from your other
#        object definitions.  This has been done just to make things easier to
#        understand.
#
###############################################################################
###############################################################################
###############################################################################
#
# TIME PERIODS
#
###############################################################################
###############################################################################
# This defines a timeperiod where all times are valid for checks,
# notifications, etc.  The classic "24x7" support nightmare. :-)
define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
        }
# 'workhours' timeperiod definition
define timeperiod{
        timeperiod_name workhours
        alias           Normal Work Hours
        monday          09:00-17:00
        tuesday         09:00-17:00
        wednesday       09:00-17:00
        thursday        09:00-17:00
        friday          09:00-17:00
        }
# 'none' timeperiod definition
define timeperiod{
        timeperiod_name none
        alias           No Time Is A Good Time
        }
# Some U.S. holidays
# Note: The timeranges for each holiday are meant to *exclude* the holidays from being
# treated as a valid time for notifications, etc.  You probably don't want your pager
# going off on New Year's.  Although you're employer might... :-)
define timeperiod{
        name                    us-holidays
        timeperiod_name         us-holidays
        alias                   U.S. Holidays
        january 1               00:00-00:00     ; New Years
        monday -1 may           00:00-00:00     ; Memorial Day (last Monday in May)
        july 4                  00:00-00:00     ; Independence Day
        monday 1 september      00:00-00:00     ; Labor Day (first Monday in September)
        thursday 4 november     00:00-00:00     ; Thanksgiving (4th Thursday in November)
        december 25             00:00-00:00     ; Christmas
        }
# This defines a modified "24x7" timeperiod that covers every day of the
# year, except for U.S. holidays (defined in the timeperiod above).
define timeperiod{
        timeperiod_name 24x7_sans_holidays
        alias           24x7 Sans Holidays
        use             us-holidays             ; Get holiday exceptions from other timeperiod
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
        }
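When the number of services grows, services can be grouped in the same way hosts were grouped above, for example:
[root@nagios-srv services]# vim servergroup.cfg
define servicegroup{
        servicegroup_name       Swap Useage
        alias                   Linux Servers ; Long name of the group
        members                 103.lnmp01,Swap Useage,105.mysql,Swap Useage
        }
The members line is a list of host,service pairs. Each service name in it (Swap Useage here) must exactly match the service_description of a service defined for that host; otherwise the configuration check reports an error.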
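Because members alternates host names and service descriptions, the line is easy to get wrong by hand. The following is a small sketch that builds it for one service description across several hosts. The function name servicegroup_members is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build a servicegroup "members" value, that is a
# comma-separated list of host,service pairs for one service description.
servicegroup_members() {
    desc="$1"; shift
    members=""
    for host in "$@"; do
        # Prefix a comma only when the list already has an entry.
        members="${members:+$members,}$host,$desc"
    done
    printf '%s\n' "$members"
}

servicegroup_members 'Swap Useage' 103.lnmp01 105.mysql
```

The printed value can be pasted after the members keyword of the servicegroup definition.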
Graphing and Managing Nagios Monitoring Data
1. Install the libraries needed for graphing
yum install cairo pango zlib zlib-devel freetype freetype-devel gd gd-devel -y
2. Install rrdtool, a round-robin database tool that can produce graphs per week, per month, and so on, together with its dependency libart_lgpl
libart_lgpl is compiled from source here:
cd /home/oldboy/tools
wget http://ftp.acc.umu.se/pub/gnome/sources/libart_lgpl/2.3/libart_lgpl-2.3.17.tar.gz
tar xf libart_lgpl-2.3.17.tar.gz
cd libart_lgpl-2.3.17
./configure
make && make install
cp -r /usr/local/include/libart-2.0 /usr/include
cd ..
Install rrdtool, the tool that actually draws the graphs:
wget https://oss.oetiker.ch/rrdtool/pub/rrdtool-1.2.14.tar.gz
tar xf rrdtool-1.2.14.tar.gz
cd rrdtool-1.2.14
./configure --prefix=/usr/local/rrdtool --disable-python --disable-tcl
make && make install
cd ..
3. Install pnp, the graphing front end: it takes the collected data, has rrdtool draw the graphs, and displays the result
wget https://sourceforge.net/projects/pnp4nagios/files/PNP/pnp-0.4.14/pnp-0.4.14.tar.gz
tar xf pnp-0.4.14.tar.gz
cd pnp-0.4.14
./configure --with-rrdtool=/usr/local/rrdtool/bin/rrdtool --with-perfdata-dir=/usr/local/nagios/share/perdata/
make all
make install
make install-config
make install-init
Check that the script that collects the data for graphing has been installed:
[root@nagios-srv pnp-0.4.14]# ll /usr/local/nagios/libexec/ |grep process
-rwxr-xr-x 1 nagios nagios  31826 Oct 14 16:53 process_perfdata.pl
4. Edit the main configuration file nagios.cfg: set process_performance_data=1 so that performance data is collected, and enable the host and service perfdata commands
[root@nagios-srv etc]# vim nagios.cfg
# Values: 1 = process performance data, 0 = do not process performance data
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
5. Edit commands.cfg: delete the default definitions of the host and service perfdata commands shown here, because we will redefine them
[root@nagios-srv objects]# vim commands.cfg
# 'process-host-perfdata' command definition
define command{
        command_name    process-host-perfdata
        command_line    /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios/var/host-perfdata.out
        }
# 'process-service-perfdata' command definition
define command{
        command_name    process-service-perfdata
        command_line    /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios/var/service-perfdata.out
        }
Redefine the deleted commands as follows:
# 'process-host-perfdata' command definition
define command{
        command_name    process-host-perfdata
        command_line    /usr/local/nagios/libexec/process_perfdata.pl
        }
# 'process-service-perfdata' command definition
define command{
        command_name    process-service-perfdata
        command_line    /usr/local/nagios/libexec/process_perfdata.pl
        }
Restart Nagios. Graphs should now be available at http://192.168.1.102/nagios/pnp/index.php
6. Configure graphs for hosts
The hosts use the linux-server template, so the setting is made once in the template for all hosts:
[root@nagios-srv objects]# cat hosts.cfg
define host{
        use                     linux-server
        host_name               103.lnmp01
        alias                   103.lnmp01
        address                 192.168.1.103
        }
[root@nagios-srv objects]# vim templates.cfg
define host{
        name                    linux-server
        # action_url links each host that uses this template to its graphs
        action_url              /nagios/pnp/index.php?host=$HOSTNAME$
        }
7. Configure graphs for services
Graphing can be configured for a single service, or once in the service template for all services.
Here the template is configured so that every service gets graphs: action_url /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
[root@nagios-srv objects]# vim templates.cfg
define service{
        name                    generic-goser-service
        action_url              /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
        }
[root@nagios-srv objects]# /etc/init.d/nagios checkconfig
[root@nagios-srv objects]# /etc/init.d/nagios reload
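Custom Plugins
Custom plugins can be written in C, C++, Python, Java, shell, and other languages. A Nagios plugin returns two things: its exit status code, and the first line of text it prints to standard output.
The status codes that the Nagios core recognizes are:
OK        # exit code 0: the service is working normally
WARNING   # exit code 1: the service is in a warning state
CRITICAL  # exit code 2: the service is in a critical state
UNKNOWN   # exit code 3: the service state is unknown
These states can be seen in the utils.sh plugin helper:
[root@nagios-srv ~]# cat /usr/local/nagios/libexec/utils.sh
#! /bin/sh
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4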
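The exit-code convention can be captured in a small reusable function. The sketch below is an illustration rather than an existing plugin: the function name check_threshold is hypothetical, and it assumes a non-negative integer value where higher is worse.

```shell
#!/bin/sh
# State values as listed in utils.sh.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# Hypothetical helper: compare a value with warning and critical thresholds,
# print one status line, and return the matching Nagios state.
check_threshold() {
    value="$1"; warn="$2"; crit="$3"
    case "$value" in
        ''|*[!0-9]*)
            echo "UNKNOWN - value '$value' is not a non-negative integer"
            return $STATE_UNKNOWN ;;
    esac
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"
        return $STATE_CRITICAL
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"
        return $STATE_WARNING
    fi
    echo "OK - value is $value"
    return $STATE_OK
}

# Example call: 7 is at or above the warning threshold 5 but below critical 10.
rc=0
check_threshold 7 5 10 || rc=$?
echo "exit code: $rc"
```

A real plugin would end with exit "$rc" so that Nagios or NRPE receives the state as the process exit code.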
Next we write a simple shell plugin that checks whether the password file has been modified.
On the Nagios client, generate an MD5 checksum file for /etc/passwd:
[root@lnmp01 ~]# md5sum /etc/passwd>/etc/goser.md5
# md5sum -c /etc/goser.md5 then verifies whether passwd has been modified
Write the shell script check_passwd in /usr/local/nagios/libexec/:
[root@lnmp01 ~]# vim /usr/local/nagios/libexec/check_passwd
#!/bin/sh
# Count the "OK" lines reported by md5sum; exactly one means passwd is unchanged.
char=$(md5sum -c /etc/goser.md5 2>/dev/null | grep "OK" | wc -l)
if [ "$char" -eq 1 ]; then
    echo "Passwd is ok"
    exit 0
else
    echo "Passwd is changed"
    exit 2
fi
Verify that the check_passwd script runs correctly:
[root@lnmp01 ~]# sh /usr/local/nagios/libexec/check_passwd
Passwd is ok
[root@lnmp01 ~]# useradd test02
[root@lnmp01 ~]# sh /usr/local/nagios/libexec/check_passwd
Passwd is changed
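The check_passwd script reports CRITICAL whenever md5sum does not print OK, which includes the case where the checksum file itself is missing. A slightly more defensive variant, sketched below, returns UNKNOWN in that case. The function name check_md5_file is hypothetical, the checksum file is passed as a parameter, and the demonstration uses scratch files instead of /etc/passwd.

```shell
#!/bin/sh
# Hypothetical helper: verify the files listed in a checksum file and map
# the result to a Nagios state (0 OK, 2 CRITICAL, 3 UNKNOWN).
check_md5_file() {
    sumfile="$1"
    if [ ! -r "$sumfile" ]; then
        echo "UNKNOWN - cannot read $sumfile"
        return 3
    fi
    # md5sum -c exits 0 only when every listed file matches its checksum.
    if md5sum -c "$sumfile" >/dev/null 2>&1; then
        echo "OK - file matches recorded checksum"
        return 0
    fi
    echo "CRITICAL - file differs from recorded checksum"
    return 2
}

# Demonstration with scratch files (assumed sample content).
work=$(mktemp -d)
echo "root:x:0:0:root:/root:/bin/bash" > "$work/passwd"
md5sum "$work/passwd" > "$work/passwd.md5"
rc=0; check_md5_file "$work/passwd.md5" || rc=$?
echo "exit code: $rc"
echo "test02:x:501:501::/home/test02:/bin/bash" >> "$work/passwd"
rc=0; check_md5_file "$work/passwd.md5" || rc=$?
echo "exit code: $rc"
```

As a plugin, the script would call check_md5_file /etc/goser.md5 and exit with the returned code.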
Whether the client's password file has changed can only be checked on the client itself, so this check has to use NRPE passive mode. Add the following to nrpe.cfg on the client:
[root@lnmp01 ~]# vim /usr/local/nagios/etc/nrpe.cfg
command[check_passwd]=/usr/local/nagios/libexec/check_passwd
Restart NRPE so that the change to nrpe.cfg takes effect.
On the Nagios server, configure the service as follows:
[root@nagios-srv objects]# vim services.cfg
define service{
        use                     generic-goser-service
        host_name               103.lnmp01
        service_description     Monitor /etc/passwd
        check_command           check_nrpe!check_passwd
        }
Verify from the server that Nagios detects changes to the client's passwd file:
# Before the client's passwd file is modified
[root@nagios-srv ~]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.103 -c check_passwd
Passwd is ok
# After the client's passwd file is modified, test again
#[root@lnmp01 ~]# useradd test03
[root@nagios-srv ~]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.103 -c check_passwd
Passwd is changed
Notification Configuration
1. Add contacts and a contact group in contacts.cfg
define contact{
        contact_name    goser01
        use             generic-contact
        alias           Nagios Admin
        email           goser01@163.com
        }
define contact{
        contact_name    goser02
        use             generic-contact
        alias           Nagios Admin
        email           goser02@163.com
        }
define contactgroup{
        contactgroup_name       sas
        alias                   Nagios Administrators
        members                 goser01,goser02
        }
2. Add the notification commands in commands.cfg