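Nagios Configuration
The previous post on installing Nagios only covered the basic components. The home page opens, but until the relevant configuration files are set up, many of the pages in the left-hand menu will not load, so the installation is little more than an empty shell. This post looks at Nagios configuration: the basic settings and the purpose of each configuration file.
Nagios configuration directory
The Nagios configuration files are located in the etc directory (/usr/local/nagios/etc), as shown in the figure below: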
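Overview of the configuration files
File name | Description |
cgi.cfg | Configuration file that controls CGI access |
nagios.cfg | Main configuration file: contains a series of settings that affect how the Nagios daemon runs |
resource.cfg | Resource file: stores user-defined macros. One of its main uses is to hold sensitive configuration information that the CGI programs must not be able to read |
objects | A directory containing the configuration files that define Nagios objects, such as commands.cfg, contacts.cfg and localhost.cfg |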
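Configuration files in the objects directory
File name | Description |
commands.cfg | Command definitions; the commands defined here can be referenced from other configuration files |
contacts.cfg | Defines contacts and contact groups |
localhost.cfg | Defines monitoring of the local host |
printer.cfg | Template for monitoring printers; not enabled by default |
switch.cfg | Template for monitoring switches and routers; not enabled by default |
templates.cfg | Host and service templates that can be referenced from other configuration files |
timeperiods.cfg | Defines the time periods during which Nagios monitors |
windows.cfg | Template for monitoring Windows hosts; not enabled by default |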
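Hands-on configuration steps
We now edit the configuration so that Nagios monitors resource usage on the local machine. Before changing anything, back up every configuration file so that you can roll back if something goes badly wrong (copy each file to xxxx.bak first). The files involved are listed below: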
[root@bogon etc]# cd /usr/local/nagios/etc/
[root@bogon etc]# ls
cgi.cfg htpasswd nagios.cfg objects resource.cfg
[root@bogon etc]# cd objects/
[root@bogon objects]# ls
commands.cfg contacts.cfg localhost.cfg printer.cfg switch.cfg templates.cfg timeperiods.cfg windows.cfg
[root@bogon objects]#
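The backup step described above can be scripted. The following is a minimal Python sketch, not part of the original procedure: the function name backup_configs is hypothetical, and it assumes that every file to protect ends in .cfg and that writing a .bak copy next to each one is acceptable.

```python
import shutil
from pathlib import Path


def backup_configs(etc_dir):
    """Copy every *.cfg file under etc_dir to <name>.bak and return the backup paths."""
    backups = []
    # rglob also descends into subdirectories such as objects/
    for cfg in sorted(Path(etc_dir).rglob("*.cfg")):
        bak = cfg.with_name(cfg.name + ".bak")
        shutil.copy2(cfg, bak)  # copy2 keeps timestamps and permission bits
        backups.append(bak)
    return backups
```

For the layout used in this post, the call would be backup_configs("/usr/local/nagios/etc"), run as a user that can write to that directory.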
1) Edit cgi.cfg first
In cgi.cfg, find the following parameters:
default_user_name=guest
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
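Change them as follows (if it is unclear where the user kerry comes from, see the previous post in this series, "Nagios Learning and Practice Series: Basic Installation"):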
default_user_name=kerry
authorized_for_system_information=nagiosadmin,kerry
authorized_for_configuration_information=nagiosadmin,kerry
authorized_for_system_commands=nagiosadmin,kerry
authorized_for_all_services=nagiosadmin,kerry
authorized_for_all_hosts=nagiosadmin,kerry
authorized_for_all_service_commands=nagiosadmin,kerry
authorized_for_all_host_commands=nagiosadmin,kerry
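The same edit can be expressed as a small text transformation. This is a sketch under assumptions rather than an official tool: grant_cgi_access is a hypothetical helper, it treats every key beginning with authorized_for_ as a comma-separated user list, and it does not preserve a trailing newline.

```python
def grant_cgi_access(cfg_text, user):
    """Append user to every authorized_for_* list and make it the default user."""
    out = []
    for line in cfg_text.splitlines():
        key, sep, value = line.partition("=")
        name = key.strip()
        if sep and name == "default_user_name":
            line = f"{key}={user}"
        elif sep and name.startswith("authorized_for_"):
            users = [u.strip() for u in value.split(",") if u.strip()]
            if user not in users:  # avoid adding the same user twice
                users.append(user)
            line = f"{key}={','.join(users)}"
        out.append(line)  # comment lines and other keys pass through unchanged
    return "\n".join(out)
```

Called with the text of cgi.cfg and "kerry", it turns authorized_for_all_hosts=nagiosadmin into authorized_for_all_hosts=nagiosadmin,kerry and leaves commented-out lines alone.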
2) Edit resource.cfg
As shown in the figure, find $USER1$=/usr/local/nagios//libexec and change it to $USER1$=/usr/local/nagios/libexec
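3) Edit nagios.cfg
Adjust the following parameters, removing the redundant / characters from the paths. The trailing # notes on two of the lines are annotations for this article, not part of the file: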
log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg #not configured for now
object_cache_file=/usr/local/nagios/var/objects.cache
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
status_file=/usr/local/nagios/var/status.dat
command_check_interval=1 #not configured for now
command_file=/usr/local/nagios/var/rw/nagios.cmd
lock_file=/usr/local/nagios/var/nagios.lock
temp_file=/usr/local/nagios/var/nagios.tmp
log_archive_path=/usr/local/nagios/var/archives
check_result_path=/usr/local/nagios/var/spool/checkresults
state_retention_file=/usr/local/nagios/var/retention.dat
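Removing the redundant slashes from resource.cfg and nagios.cfg by hand is error-prone, so here is a hedged Python sketch of the cleanup. The helper name collapse_slashes is hypothetical, and the code assumes that every value is a local filesystem path; a value containing a URL such as http://... would be damaged by it.

```python
import re


def collapse_slashes(line):
    """Collapse runs of '/' in the value part of a key=value line."""
    key, sep, value = line.partition("=")
    if not sep or key.lstrip().startswith("#"):
        return line  # not an assignment, or a comment: leave untouched
    # Assumption: the value is a local path, so '//' is never meaningful
    return key + sep + re.sub(r"/{2,}", "/", value)
```

Applied to $USER1$=/usr/local/nagios//libexec, it yields $USER1$=/usr/local/nagios/libexec, which is the change made in step 2.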
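4) Edit localhost.cfg
First run the hostname command to find the name of the machine being monitored; in this test environment it is bogon. Then open localhost.cfg and update the host_name and members entries accordingly.
The contents of localhost.cfg are as follows: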
###############################################################################
# LOCALHOST.CFG - SAMPLE OBJECT CONFIG FILE FOR MONITORING THIS MACHINE
#
# Last Modified: 05-31-2007
#
# NOTE: This config file is intended to serve as an *extremely* simple
#       example of how you can create configuration entries to monitor
#       the local (Linux) machine.
#
###############################################################################

###############################################################################
###############################################################################
#
# HOST DEFINITION
#
###############################################################################
###############################################################################

# Define a host for the local machine

define host{
        use                     linux-server            ; Name of host template to use
                                                        ; This host definition will inherit all variables that are defined
                                                        ; in (or inherited by) the linux-server host template definition.
        host_name               bogon
        alias                   bogon
        address                 127.0.0.1
        }

###############################################################################
###############################################################################
#
# HOST GROUP DEFINITION
#
###############################################################################
###############################################################################

# Define an optional hostgroup for Linux machines

define hostgroup{
        hostgroup_name  linux-servers ; The name of the hostgroup
        alias           Linux Servers ; Long name of the group
        members         bogon         ; Comma separated list of hosts that belong to this group
        }

###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################

# Define a service to "ping" the local machine

define service{
        use                             local-service         ; Name of service template to use
        host_name                       bogon
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        }

# Define a service to check the disk space of the root partition
# on the local machine.  Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
        use                             local-service         ; Name of service template to use
        host_name                       bogon
        service_description             Root Partition
        check_command                   check_local_disk!20%!10%!/
        }

# Define a service to check the number of currently logged in
# users on the local machine.  Warning if > 20 users, critical
# if > 50 users.

define service{
        use                             local-service         ; Name of service template to use
        host_name                       bogon
        service_description             Current Users
        check_command                   check_local_users!20!50
        }

# Define a service to check the number of currently running procs
# on the local machine.  Warning if > 250 processes, critical if
# > 400 users.

define service{
        use                             local-service         ; Name of service template to use
        host_name                       bogon
        service_description             Total Processes
        check_command                   check_local_procs!250!400!RSZDT
        }

# Define a service to check the load on the local machine.

define service{
        use                             local-service         ; Name of service template to use
        host_name                       bogon
        service_description             Current Load
        check_command                   check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
        }

# Define a service to check the swap usage the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free

define service{
        use                             local-service         ; Name of service template to use
        host_name                       bogon
        service_description             Swap Usage
        check_command                   check_local_swap!20!10
        }

# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.

define service{
        use                             local-service         ; Name of service template to use
        host_name                       bogon
        service_description             SSH
        check_command                   check_ssh
        notifications_enabled           0
        }

# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.

define service{
        use                             local-service         ; Name of service template to use
        host_name                       bogon
        service_description             HTTP
        check_command                   check_http
        notifications_enabled           0
        }
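Because the host name appears in many places in localhost.cfg, it is easy to miss one occurrence. The sketch below, which is an illustration and not a Nagios utility, collects every name used in host_name and members directives so the result can be compared with the real host name. The function name referenced_hosts is hypothetical, and the code assumes that members only appears in hostgroup definitions, as it does in this file.

```python
import re


def referenced_hosts(cfg_text):
    """Return the set of host names used in host_name and members directives."""
    hosts = set()
    for line in cfg_text.splitlines():
        line = line.split(";", 1)[0]  # drop trailing ';' comments
        match = re.match(r"\s*(host_name|members)\s+(\S.*)", line)
        if match:
            # members may hold a comma-separated list of hosts
            hosts.update(h.strip() for h in match.group(2).split(",") if h.strip())
    return hosts
```

If referenced_hosts(text) returns anything other than the single name printed by hostname (bogon in this post), some entry still needs to be updated.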
With the basic configuration complete, start the Apache and Nagios services.
Start the Apache service:
[root@bogon conf]# /usr/local/apache/bin/apachectl start
Start the Nagios service:
[root@bogon conf]# service nagios start
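As shown in the figure, Nagios can now monitor the current server's load, number of logged-in users, HTTP service, SSH service, and so on.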
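Configuration troubleshooting:
Configuring Nagios tends to surface all sorts of odd problems. I will gradually collect the ones I have run into here; problems I have not encountered myself are not included.
Problem 1: After configuring Nagios and starting the Apache and Nagios services, the Hosts, Services and similar pages display garbled output, as shown in the figure below: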
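This happens because Apache does not have CGI script support enabled. Go to the directory containing Apache's main configuration file, edit httpd.conf, uncomment the first two lines below, and restart the service. Whether the third line (mod_actions) is also needed has not been confirmed.
#LoadModule cgid_module modules/mod_cgid.so
#LoadModule alias_module modules/mod_alias.so
#LoadModule actions_module modules/mod_actions.so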
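Then add the following as the last line of the file to fix garbled Chinese characters. Keep the directive on a line of its own, since httpd.conf does not allow a comment on the same line as a directive:
AddDefaultCharset utf-8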
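The httpd.conf change for Problem 1 amounts to removing the leading # from specific LoadModule lines. A minimal Python sketch of that transformation follows; enable_modules is a hypothetical helper name, and the code assumes each module is declared on a single line of the form "#LoadModule name path".

```python
def enable_modules(conf_text, modules):
    """Uncomment 'LoadModule <name> ...' lines for the given module names."""
    out = []
    for line in conf_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            candidate = stripped.lstrip("#").lstrip()
            parts = candidate.split()
            # Only touch commented-out LoadModule lines for requested modules
            if len(parts) >= 3 and parts[0] == "LoadModule" and parts[1] in modules:
                line = candidate
        out.append(line)
    return "\n".join(out)
```

Calling enable_modules(text, {"cgid_module", "alias_module"}) enables the two modules named in the fix and leaves the unconfirmed mod_actions line commented out.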
Problem 2: Clicking the Map page produces the following error:
Not Found
The requested URL /nagios/cgi-bin/statusmap.cgi was not found on this server.
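This error occurs because the gd-devel package is not installed, so the fix is to install gd-devel. Since statusmap.cgi is only built when the GD library is available at compile time, Nagios will most likely also need to be reconfigured and recompiled afterwards.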
Problem 3: Error: Could not open command file '/usr/local/nagios/var/rw/nagios.cmd':
nagios.cfg contains the following section about this file:
# EXTERNAL COMMAND FILE
# This is the file that Nagios checks for external command requests.
# It is also where the command CGI will write commands that are submitted
# by users, so it must be writeable by the user that the web server
# is running as (usually 'nobody').  Permissions should be set at the
# directory level instead of on the file, as the file is deleted every
# time its contents are processed.
command_file=/usr/local/nagios/var/rw/nagios.cmd
The key point is that the user Apache runs as must have write permission on this file, and that the permission should be set on the directory, because the file is deleted every time its contents are processed.
First, check which user the Apache processes run as. On my machine it is daemon:
#ps -ef | grep http
root 50297 1 0 21:42 ? 00:00:00 /usr/local/apache//bin/httpd -k start
daemon 50298 50297 0 21:42 ? 00:00:00 /usr/local/apache//bin/httpd -k start
daemon 50299 50297 0 21:42 ? 00:00:00 /usr/local/apache//bin/httpd -k start
daemon 50300 50297 0 21:42 ? 00:00:00 /usr/local/apache//bin/httpd -k start
daemon 50301 50297 0 21:42 ? 00:00:00 /usr/local/apache//bin/httpd -k start
daemon 50425 50297 0 21:43 ? 00:00:00 /usr/local/apache//bin/httpd -k start
root 50909 3194 0 22:02 pts/1 00:00:00 grep http
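Note that this refers to the worker processes, not the initial parent process started by root.
What next? If the Nagios process runs as user nagios with group nagios, add daemon to the nagios group and set the setgid bit on the command directory. The -a option appends the group instead of replacing daemon's existing supplementary groups:
usermod -a -G nagios daemon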
chmod g+s /path/to/nagiosdir/var/rw
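To confirm that the directory ended up with the intended permissions, its mode bits can be inspected. The following Python sketch is only an illustration: command_dir_ready is a hypothetical name, and the check assumes that group write, group execute and setgid together are what the fix above is meant to achieve.

```python
import stat


def command_dir_ready(mode):
    """True if a directory mode has group write, group execute and setgid set."""
    needed = stat.S_IWGRP | stat.S_IXGRP | stat.S_ISGID
    # S_ISDIR rejects regular files; the mask test requires all three bits
    return stat.S_ISDIR(mode) and (mode & needed) == needed
```

A typical call would be command_dir_ready(os.stat("/usr/local/nagios/var/rw").st_mode) after importing os, using the command_file directory from nagios.cfg.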