Installing Nagios on Linux 6
Nagios
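Nagios is an application for monitoring systems and networks. It watches the hosts and services you specify and sends alerts both when their state degrades and when it recovers.
Further features of Nagios include:
- Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.);
- Monitoring of host resources (processor load, disk usage, etc.);
- A simple plugin design that lets users easily develop their own service checks;
- Parallelized service checks;
- The ability to define a network hierarchy using "parent" hosts to express the relationships between hosts, which allows hosts that are down to be detected and distinguished from hosts that are unreachable;
- Notifications to contacts when service or host problems occur and when they are resolved (via email, SMS, or a user-defined method);
- The ability to define event handlers, which run when a host or service event occurs and can help pinpoint the problem;
- Automatic log file rotation;
- Support for redundant monitoring of hosts;
- An optional web interface for viewing the current network status, notification and problem history, log files, etc.
Many plugins are available for monitoring different devices and services, including:
- HTTP, POP3, IMAP, FTP, SSH, DHCP
- CPU load, disk usage, memory usage, current number of users
- Unix/Linux, Windows, and Netware servers
- Routers and switches
- And so on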
Server side
Create the nagios user
# useradd nagios
# passwd nagios
Changing password for user nagios.
New password:
BAD PASSWORD: it is WAY too short
BAD PASSWORD: is a palindrome
Retype new password:
passwd: all authentication tokens updated successfully.
#
# groupadd nagcmd
# usermod -G nagcmd nagios
#
# id nagios
uid=501(nagios) gid=501(nagios) groups=501(nagios),502(nagcmd)
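Install the prerequisites
Make sure the following packages are installed on the system before continuing.
- Apache
- GCC compiler
- GD library and its development library
These packages can be installed with yum: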
yum install httpd
yum install gcc
yum install glibc glibc-common
yum install gd gd-devel
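Before moving on, it can help to confirm that nothing in the list above was skipped. The following is a minimal sketch, not part of the original procedure: missing_packages is a hypothetical helper name, and it assumes the installed package names arrive on stdin, one per line.

```shell
# Print every required package name (given as arguments) that does not
# appear in the list of installed package names read from stdin.
missing_packages() {
    installed=$(cat)
    for pkg in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string,
        # so "gd" does not accidentally match "gd-devel".
        if ! printf '%s\n' "$installed" | grep -qxF -e "$pkg"; then
            printf '%s\n' "$pkg"
        fi
    done
}

# Possible usage on an RPM-based system (run manually):
#   rpm -qa --qf '%{NAME}\n' | missing_packages httpd gcc glibc glibc-common gd gd-devel
```

An empty result means every listed package is present; any name printed still needs a yum install.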
Download
# wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-4.0.7.tar.gz
# wget http://osdn.dl.sourceforge.net/sourceforge/nagios/nagios-3.0rc1.tar.gz
# wget http://nchc.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.14/nrpe-2.14.tar.gz
Compile and install Nagios
Unpack the Nagios source tarball
# tar -xvf nagios-4.0.7.tar.gz
...
# cd nagios-4.0.7
Run the Nagios configure script, passing the command group created earlier:
# ./configure --with-command-group=nagcmd
...
Review the options above for accuracy. If they look okay,
type 'make all' to compile the main program and CGIs.
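Compile the Nagios source code
# make all
Install the binaries, the init script, and the sample configuration files, and set the permissions on the external command directory
# make install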
# make install-init
# make install-commandmode
# make install-config
Configure the web interface
Install the Apache configuration file for Nagios
# make install-webconf
/usr/bin/install -c -m 644 sample-config/httpd.conf /etc/httpd/conf.d/nagios.conf
*** Nagios/Apache conf file installed ***
Web interface (copy the sample event handlers and set their ownership)
# cp -R contrib/eventhandlers/ /usr/local/nagios/libexec/
# chown -R nagios:nagios /usr/local/nagios/libexec/eventhandlers
Check the configuration files for errors
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.0.7
...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
#
Create a nagiosadmin user for logging in to the Nagios web interface
# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
Start the Apache service so that the settings take effect (use restart instead if it is already running)
# service httpd start
Starting httpd: httpd: apr_sockaddr_info_get() failed for nagios
httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
[ OK ]
#
Enable the services at boot
# chkconfig --add nagios
# chkconfig nagios on
# chkconfig httpd on
Start the Nagios service
# /etc/init.d/nagios start
Starting nagios: done.
#
# /etc/init.d/nagios status
nagios (pid 16066) is running...
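Log in to the web interface
Open http://localhost/nagios/ in a browser and, when prompted, enter the user name (nagiosadmin) and the password you just set.
Monitored client
Install the prerequisites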
yum install -y gcc glibc glibc-common gd gd-devel xinetd openssl-devel
Install the Nagios plugins and NRPE (both tarballs are assumed to be present on the client already)
# useradd nagios
# passwd nagios
#
# tar -xvf nagios-plugins-2.0.3.tar.gz
# cd nagios-plugins-2.0.3
# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
# make
# make install
# chown -R nagios:nagios /usr/local/nagios
# tar -xvf nrpe-2.14.tar.gz
# cd nrpe-2.14
# ./configure
# make
# make install
# make install-plugin
# make install-daemon    # per the installation guide, the NRPE daemon runs as a service under xinetd
# make install-daemon-config
# make install-xinetd
#
# chkconfig --add xinetd
# chkconfig xinetd on
In /etc/xinetd.d/nrpe, add the IP address of the monitoring host to the only_from line, the last setting in the file
# tail /etc/xinetd.d/nrpe
        port            = 5666
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nrpe
        server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
        log_on_failure  += USERID
        disable         = no
        only_from       = 127.0.0.1 192.168.10.19
}
#
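A quick way to confirm the edit is to test whether the monitoring host's address appears on the only_from line. This is only a sketch: only_from_allows is a hypothetical helper, it reads the xinetd service file from stdin, and it does an exact string match, so it does not understand network ranges or host names that xinetd itself may accept.

```shell
# Succeed (exit 0) if the address in $1 is listed literally on an
# only_from line of the xinetd service file read from stdin.
only_from_allows() {
    awk -v ip="$1" '
        $1 == "only_from" {
            # $2 is the "=" sign; the allowed addresses start at field 3
            for (i = 3; i <= NF; i++) if ($i == ip) found = 1
        }
        END { exit !found }
    '
}

# Possible usage (run manually):
#   only_from_allows 192.168.10.19 < /etc/xinetd.d/nrpe && echo "server is allowed"
```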
Edit /etc/services and register the NRPE service by appending one line at the end of the file
# tail /etc/services
3gpp-cbsp       48049/tcp               # 3GPP Cell Broadcast Service Protocol
isnetserv       48128/tcp               # Image Systems Network Services
isnetserv       48128/udp               # Image Systems Network Services
blp5            48129/tcp               # Bloomberg locator
blp5            48129/udp               # Bloomberg locator
com-bardac-dw   48556/tcp               # com-bardac-dw
com-bardac-dw   48556/udp               # com-bardac-dw
iqobject        48619/tcp               # iqobject
iqobject        48619/udp               # iqobject
nrpe            5666/tcp                # nrpe
#
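If this step is scripted, appending blindly would add a duplicate line on every run. The sketch below shows one way to make the append idempotent; ensure_nrpe_service is a hypothetical helper and takes the path of the services file as its argument, so it can be tried on a copy first.

```shell
# Append the nrpe entry to the services file given in $1,
# but only if an nrpe 5666/tcp line is not already there.
ensure_nrpe_service() {
    services_file=$1
    if ! grep -Eq '^nrpe[[:space:]]+5666/tcp' "$services_file"; then
        printf 'nrpe            5666/tcp                # nrpe\n' >> "$services_file"
    fi
}

# Possible usage as root (run manually):
#   ensure_nrpe_service /etc/services
```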
Restart the xinetd service, then check that NRPE has started and that the configured port is listening
# service xinetd restart
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]
# netstat -lantup | grep 5666
tcp        0      0 :::5666        :::*        LISTEN      46773/xinetd
#
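A plain grep for 5666 would also match a PID or a longer port number that happens to contain those digits. As a hedged alternative, the sketch below checks the local-address column of netstat output for an exact port in LISTEN state; port_listening is a hypothetical helper that reads the netstat output from stdin.

```shell
# Succeed (exit 0) if the netstat output on stdin contains a socket
# in LISTEN state whose local port equals $1.
port_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            # The local address is field 4; the port follows the last ":"
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}

# Possible usage (run manually):
#   netstat -lantup | port_listening 5666 && echo "NRPE port is listening"
```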
Test NRPE locally; if it started successfully, it returns its version number
# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.14
By default only local access is allowed, so add the IP of the Nagios server (192.168.10.19) to allowed_hosts
# cat /usr/local/nagios/etc/nrpe.cfg | grep -v "^$" | grep -v "^#"
log_facility=daemon
pid_file=/var/run/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,192.168.10.19
dont_blame_nrpe=0
allow_bash_command_substitution=0
debug=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
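Each command[...] entry in nrpe.cfg names a check that the server can request by that name. The sketch below, with the hypothetical helper name nrpe_commands, extracts those names from a configuration read on stdin, which is handy for seeing what the server may ask for.

```shell
# Print the name of every command[...] definition found in the
# nrpe.cfg content read from stdin, one name per line.
nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p'
}

# Possible usage on the client (run manually):
#   nrpe_commands < /usr/local/nagios/etc/nrpe.cfg
# A listed name can then be requested from the server with the -c option:
#   /usr/local/nagios/libexec/check_nrpe -H 192.168.10.18 -c check_load
```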
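Then test from the Nagios server, where 192.168.10.18 is the IP of the monitored machine. If the version number is returned, the monitored machine is configured correctly.
# /usr/local/nagios/libexec/check_nrpe -H 192.168.10.18
NRPE v2.14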
Add the monitored host on the Nagios server
On the monitoring server, edit /usr/local/nagios/etc/objects/localhost.cfg to add the IP of the monitored host and the services to monitor.
# cat /usr/local/nagios/etc/objects/localhost.cfg | grep -v "^#" | grep -v "^$"
define host{
        use                     linux-server            ; Name of host template to use
                                                        ; This host definition will inherit all variables that are defined
                                                        ; in (or inherited by) the linux-server host template definition.
        host_name               localhost
        alias                   localhost
        address                 127.0.0.1,192.168.10.18
        }
define hostgroup{
        hostgroup_name          linux-servers           ; The name of the hostgroup
        alias                   Linux Servers           ; Long name of the group
        members                 localhost               ; Comma separated list of hosts that belong to this group
        }
define service{
        use                     local-service           ; Name of service template to use
        host_name               localhost
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }
define service{
        use                     local-service           ; Name of service template to use
        host_name               localhost
        service_description     HTTP
        check_command           check_http
        notifications_enabled   0
        }
...
#
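The address directive normally holds a single IP address or host name, so an alternative to listing two addresses on localhost is to give the client its own host object and check it through NRPE. The following sketch generates such definitions; it is not taken from the original setup. The host name client18, the helper names, and the use of the generic-service template (assumed to exist in the sample templates.cfg) are all assumptions, and the check_nrpe command definition is assumed not to be present in commands.cfg yet.

```shell
# Print a host object plus one NRPE-based service for a monitored client.
# $1: host_name (hypothetical, e.g. client18), $2: IP address of the client
gen_nrpe_host() {
    cat <<EOF
define host{
        use                     linux-server
        host_name               $1
        alias                   $1
        address                 $2
        }
define service{
        use                     generic-service
        host_name               $1
        service_description     Current Load (NRPE)
        check_command           check_nrpe!check_load
        }
EOF
}

# Print a check_nrpe command definition; the quoted heredoc keeps the
# Nagios macros ($USER1$ and so on) from being expanded by the shell.
gen_check_nrpe_command() {
    cat <<'EOF'
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
EOF
}

# Possible usage (run manually, then review the output before adding it
# to a .cfg file that nagios.cfg includes):
#   gen_check_nrpe_command
#   gen_nrpe_host client18 192.168.10.18
```

The check_load argument matches a command name already defined in the client's nrpe.cfg; the pre-flight check should be rerun after any such change.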
Check the configuration files; if there are no errors or warnings, restart the Nagios service
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.0.7
...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
# service nagios restart
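The "check, then restart only if clean" rule can be automated by reading the Total Warnings and Total Errors lines from the pre-flight output. This is a minimal sketch under that assumption; preflight_clean is a hypothetical helper name and it expects the output of nagios -v on stdin.

```shell
# Succeed (exit 0) only if the pre-flight output on stdin reports
# both "Total Warnings: 0" and "Total Errors: 0".
preflight_clean() {
    awk '
        /^[ \t]*Total Warnings:/ { w = $3; seen_w = 1 }
        /^[ \t]*Total Errors:/   { e = $3; seen_e = 1 }
        END { exit !(seen_w && seen_e && w == 0 && e == 0) }
    '
}

# Possible usage (run manually):
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg \
#       | preflight_clean && service nagios restart
```

If either summary line is missing, the helper fails, so an unexpected output format does not trigger a restart.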
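Check in the browser that the host has been added.
For more monitoring settings, see the Chinese Nagios manual at the following URL; it is quite comprehensive: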
http://nagios-cn.sourceforge.net/nagios-cn/cgiconfig.html
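Architecture of the Nagios monitoring system
Nagios monitors remote services through NRPE
- Nagios runs the check_nrpe plugin installed on the server and tells check_nrpe which service to check.
- check_nrpe connects over SSL to the NRPE daemon on the remote machine.
- NRPE runs the local plugins to check local services and status.
- Finally, NRPE returns the check result to check_nrpe on the server, which then places the result in the Nagios status queue.
- Nagios reads the queued results in turn and displays them.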
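What travels back along this chain is the plugin's text output together with its exit status. By plugin convention, exit codes 0, 1, 2, and 3 mean OK, WARNING, CRITICAL, and UNKNOWN. The sketch below illustrates that mapping locally; nagios_state and run_check are hypothetical helper names, not part of Nagios or NRPE.

```shell
# Translate a plugin exit code into the conventional state name.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo OUT_OF_RANGE ;;   # not a defined plugin status
    esac
}

# Run a plugin command, print "STATE: output", and return its exit code.
run_check() {
    output=$("$@") && status=0 || status=$?
    printf '%s: %s\n' "$(nagios_state "$status")" "$output"
    return "$status"
}

# Possible usage on the client (run manually):
#   run_check /usr/local/nagios/libexec/check_users -w 5 -c 10
```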
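The server has the Nagios software installed; it processes the monitoring data and provides a web interface for viewing and management. It can also monitor its own local state.
The client has NRPE and the plugins installed; it runs checks at the request of the monitoring server and sends the results back to it.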