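Installing and Deploying a Zabbix Monitoring System
A monitoring system is one of the key building blocks of 24/7 automated operations. A good monitoring system continuously tracks the running state of every host, including memory, CPU, network, and various service metrics. When a fault occurs it can trigger emergency measures right away, such as restarting a service through a script or command, and it can notify operations staff quickly by SMS, WeChat, or email so that the fault is repaired in time and production is not affected. This section introduces Zabbix, an open-source automated operations tool and one of the most capable monitoring systems. Beyond everything mentioned above, Zabbix provides a polished web monitoring interface and a rich set of templates that help operations engineers quickly discover hosts and monitor their state and faults. Zabbix has many other powerful features, which are covered one by one below.
Zabbix Basics
Installing and configuring Zabbix
Installing from source. Download: http://www.zabbix.com/download.php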
# tar -zxvf zabbix-2.0.0.tar.gz
# groupadd zabbix
# useradd -g zabbix zabbix
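Note: on a node that runs both the server and the agent, it is recommended that the two not run as the same user.
Create the database: the server and the proxy both depend on a database; the agent does not.
Using MySQL as an example: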
mysql> CREATE DATABASE zabbix CHARACTER SET utf8 COLLATE utf8_bin;
mysql> GRANT ALL ON zabbix.* TO zbuser@'%' IDENTIFIED BY 'zbpass';
# Change the user name and password as needed;
shell> mysql -u<username> -p<password> zabbix < database/mysql/schema.sql
# If the database is only for a proxy, importing schema.sql is enough; otherwise continue with the steps below;
shell> mysql -u<username> -p<password> zabbix < database/mysql/images.sql
shell> mysql -u<username> -p<password> zabbix < database/mysql/data.sql
Compiling and installing Zabbix:
To install both the server and the agent with MySQL as the data store, a configure command like the following can be used:
./configure --enable-server --enable-agent --with-mysql --enable-ipv6 --with-net-snmp --with-libcurl --with-ssh2
To install only the server with MySQL as the data store:
./configure --enable-server --with-mysql --with-net-snmp --with-libcurl
To install only the proxy with MySQL as the data store:
./configure --prefix=/usr --enable-proxy --with-net-snmp --with-mysql --with-ssh2
To install only the agent:
./configure --enable-agent
Then compile and install Zabbix:
# make
# make install
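Configuring Zabbix:
Zabbix program components:
zabbix_server: the server daemon;
zabbix_agentd: the agent daemon;
zabbix_proxy: the proxy server, an optional component;
zabbix_get: command-line tool for manually sending a data collection request to an agent, used for testing;
zabbix_sender: command-line tool that runs on the agent side and manually sends data to the server;
zabbix_java_gateway: the Java gateway;
zabbix_database: MySQL or PostgreSQL;
zabbix_web: Web GUI
Zabbix logical components:
Host group
Host
Item
key: the name of the command or script that retrieves data from the monitored target;
Application: a collection of items of the same kind;
Trigger: an expression; its states are PROBLEM and OK;
Event:
Action: made up of conditions and operations;
Media: the channel through which notifications are sent;
Notification:
Remote command:
Alert escalation:
Template: a preset collection of items used to quickly define what is monitored on a host;
Graph: an image that displays history or trend data;
Screen: composed of multiple graphs;
The server configuration file is zabbix_server.conf; at a minimum, configure the database-related settings in it;
The agent configuration file is zabbix_agentd.conf; at a minimum, specify the IP address of the server in it;
The proxy configuration file is zabbix_proxy.conf; at a minimum, specify the proxy host name, the server IP, and the database-related settings in it;
Starting Zabbix: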
- server: zabbix_server
- agent: zabbix_agentd
- proxy: zabbix_proxy
Installing the frontend:
# cp -a frontend/php/ /var/www/html/zabbix
After starting LAMP or LNMP, open http://<server_ip_or_name>/zabbix in a browser to run the installer.
If Zabbix was installed from RPM, the MySQL database scripts are located at:
# cd /usr/share/doc/zabbix-server-mysql-2.0.6/create/
Configuring and starting the Zabbix server:
Configuration file: /etc/zabbix/zabbix_server.conf
Configuration sections:
~]# grep "^#####" zabbix_server.conf
############ GENERAL PARAMETERS #################
############ ADVANCED PARAMETERS ################
####### LOADABLE MODULES #######
####### TLS-RELATED PARAMETERS #######
General parameters:
ListenPort=10051
SourceIP=
LogType=file
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
DebugLevel=3
DBHost=localhost
DBName=zabbix
DBUser=zabbix
DBPassword=
DBSocket=/tmp/mysql.sock
DBPort=3306
Configuring zabbix-web:
Set the PHP time zone in one of:
(1) /etc/php.ini
(2) /etc/httpd/conf.d/zabbix.conf
php_value date.timezone
Configuration file generated by the installer: /etc/zabbix/web/zabbix.conf.php
Login: Admin/zabbix
Agent configuration:
~]# yum install zabbix-agent-3.0.2-1.el7.x86_64.rpm zabbix-sender-3.0.2-1.el7.x86_64.rpm
Unit file: zabbix-agent.service
Configuration file: /etc/zabbix/zabbix_agentd.conf
############ GENERAL PARAMETERS #################
##### Passive checks related
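Settings for passive checks
##### Active checks related
Settings for active checks; the agent periodically sends data to the server on its own initiative;
############ ADVANCED PARAMETERS #################
####### USER-DEFINED MONITORED PARAMETERS #######
User-defined parameters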
####### LOADABLE MODULES #######
####### TLS-RELATED PARAMETERS #######
##### Passive checks related
Server=IP1, IP2, ...
ListenPort=10050
ListenIP=0.0.0.0
StartAgents=3
##### Active checks related
ServerActive=IP1[:port], IP2[:port], ...
Hostname=Unique_HOSTNAME
Must match the host name of the monitored host as configured on the server;
Start the service:
systemctl start zabbix-agent.service
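Below is the Zabbix interface after installation
A brief introduction to Zabbix
Zabbix functions and processing flow: data collection --> data storage --> data display and analysis --> alerting
Data collection methods: SNMP, agent, ICMP/SSH/IPMI
Data storage: cacti: rrd; nagios: , mysql; zabbix: mysql/pgsql/oracle
Alerting: mail (SMTP), chat message, SMS
Ways to determine what Zabbix monitors: manual addition and automatic discovery
Zabbix GUI configuration items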
- hosts, host group
- item, application
- item: key
- graph, screen
- trigger, event (discovery)
- action (notification, operation, condition)
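Zabbix only allows dependencies to be defined on triggers;
A template is a collection of configuration that can be deployed quickly to a monitored object and reused across many of them. A template can contain several types of entities: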
- items
- triggers
- graphs
- applications
- screens (since Zabbix 2.0)
- low-level discovery rules (since Zabbix 2.0)
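When a template is linked to a host, all entities it defines are added automatically. Templates are therefore typically used to bundle a set of entities for a monitored service or application and apply them to each host that runs that service. Another benefit is that when a template is modified, every host it is linked to is updated accordingly.
UserParameter:
UserParameter=<key>,<command>
Examples:
UserParameter=Nginx.active[*], /usr/bin/curl -s "http://$1:$2/status" | awk '/^Active/ {print $NF}'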
UserParameter=Nginx.reading[*], /usr/bin/curl -s "http://$1:$2/status" | grep 'Reading' | cut -d" " -f2
UserParameter=Nginx.writing[*], /usr/bin/curl -s "http://$1:$2/status" | grep 'Writing' | cut -d" " -f4
UserParameter=Nginx.waiting[*], /usr/bin/curl -s "http://$1:$2/status" | grep 'Waiting' | cut -d" " -f6
UserParameter=Nginx.accepted[*], /usr/bin/curl -s "http://$1:$2/status" | awk '/^[ \t]+[0-9]+[ \t]+[0-9]+[ \t]+[0-9]+/ {print $$1}'
UserParameter=Nginx.handled[*], /usr/bin/curl -s "http://$1:$2/status" | awk '/^[ \t]+[0-9]+[ \t]+[0-9]+[ \t]+[0-9]+/ {print $$2}'
UserParameter=Nginx.requests[*], /usr/bin/curl -s "http://$1:$2/status" | awk '/^[ \t]+[0-9]+[ \t]+[0-9]+[ \t]+[0-9]+/ {print $$3}'
UserParameter=nginx.access_countaccess, /usr/lib/zabbix/externalscripts/logcheck_nginx.accesslog totalaccess
UserParameter=nginx.access_count200, /usr/lib/zabbix/externalscripts/logcheck_nginx.accesslog 200access
UserParameter=nginx.access_count202, /usr/lib/zabbix/externalscripts/logcheck_nginx.accesslog 202access
UserParameter=nginx.access_count4xx, /usr/lib/zabbix/externalscripts/logcheck_nginx.accesslog 4xxaccess
UserParameter=nginx.access_count3xx, /usr/lib/zabbix/externalscripts/logcheck_nginx.accesslog 3xxaccess
UserParameter=nginx.access_count5xx, /usr/lib/zabbix/externalscripts/logcheck_nginx.accesslog 5xxaccess
UserParameter=varnish.stat[*], /usr/lib/zabbix/externalscripts/varnishstatus varnish_stat $1
UserParameter=varnish.count[*], /usr/lib/zabbix/externalscripts/varnishstatus varnish_count $1
UserParameter=varnish.hitrate, /usr/lib/zabbix/externalscripts/varnishstatus varnish_hitrate
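The Nginx UserParameter commands above each pull one number out of the status page with awk, grep, and cut. As a rough illustration of what they extract, the following Python sketch parses the same page in one pass. It assumes the page follows the usual stub_status layout; the function name `parse_stub_status` and the sample numbers are hypothetical and only serve the example.

```python
import re


def parse_stub_status(text):
    """Extract the counters that the Nginx.* UserParameter commands read."""
    stats = {}
    m = re.search(r"Active connections:[ \t]*(\d+)", text)
    if m:
        stats["active"] = int(m.group(1))
    # The third line holds three bare integers: accepts, handled, requests.
    m = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    if m:
        accepted, handled, requests = (int(g) for g in m.groups())
        stats.update(accepted=accepted, handled=handled, requests=requests)
    m = re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    if m:
        reading, writing, waiting = (int(g) for g in m.groups())
        stats.update(reading=reading, writing=writing, waiting=waiting)
    return stats


# Hypothetical sample of a stub_status page (values invented for the example).
SAMPLE = (
    "Active connections: 291\n"
    "server accepts handled requests\n"
    " 16630948 16630948 31070465\n"
    "Reading: 6 Writing: 179 Waiting: 106\n"
)

if __name__ == "__main__":
    print(parse_stub_status(SAMPLE))
```

A script like this could back a single UserParameter that takes the metric name as an argument, instead of one curl call per metric.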
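Delta (speed per second): stored as the result of (value-prev_value)/(time-prev_time), that is, the current value minus the previously collected value, divided by the current timestamp minus the timestamp of the previous value; if the current value is smaller than the previous one, it is discarded;
Delta (simple change): stored as the result of (value-prev_value);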
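The two Delta modes can be written out as a small Python sketch. The function names are hypothetical, and returning None for a discarded value (or for a non-positive time difference) is an assumption made here for illustration, not a statement about how Zabbix represents it internally.

```python
def delta_speed_per_second(value, prev_value, time, prev_time):
    """(value - prev_value) / (time - prev_time); None when the sample is discarded."""
    if value < prev_value:
        # Described above: a value smaller than the previous one is discarded.
        return None
    elapsed = time - prev_time
    if elapsed <= 0:
        # Assumption for this sketch: no rate can be computed without elapsed time.
        return None
    return (value - prev_value) / elapsed


def delta_simple_change(value, prev_value):
    """value - prev_value, with no time component."""
    return value - prev_value


if __name__ == "__main__":
    # A counter that went from 1000 to 1600 over 60 seconds.
    print(delta_speed_per_second(1600, 1000, 160, 100))
    print(delta_simple_change(1600, 1000))
```

Speed per second suits ever-growing counters such as total requests or bytes transferred; simple change suits values where only the difference between two samples matters.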
User-defined parameters:
/etc/zabbix
zabbix_agentd.conf, zabbix_agentd.d/*.conf
# This is a config file for Zabbix Agent (Unix)
# To get more information about Zabbix, visit http://www.zabbix.com
############ GENERAL PARAMETERS #################
### Option: PidFile
# Name of PID file.
#
# Mandatory: no
# Default:
PidFile=/var/log/zabbix/zabbix_agentd.pid
### Option: LogFile
# Name of log file.
# If not set, syslog is used.
#
# Mandatory: no
# Default:
# LogFile=
LogFile=/var/log/zabbix/zabbix_agentd.log
### Option: LogFileSize
# Maximum size of log file in MB.
# 0 - disable automatic log rotation.
#
# Mandatory: no
# Range: 0-1024
# Default:
# LogFileSize=1
LogFileSize=20
### Option: DebugLevel
# Specifies debug level
# 0 - no debug
# 1 - critical information
# 2 - error information
# 3 - warnings
# 4 - for debugging (produces lots of information)
#
# Mandatory: no
# Range: 0-4
# Default:
# DebugLevel=3
### Option: SourceIP
# Source IP address for outgoing connections.
#
# Mandatory: no
# Default:
# SourceIP=
### Option: EnableRemoteCommands
# Whether remote commands from Zabbix server are allowed.
# 0 - not allowed
# 1 - allowed
#
# Mandatory: no
# Default:
# EnableRemoteCommands=0
### Option: LogRemoteCommands
# Enable logging of executed shell commands as warnings.
# 0 - disabled
# 1 - enabled
#
# Mandatory: no
# Default:
# LogRemoteCommands=0
##### Passive checks related
### Option: Server
# List of comma delimited IP addresses (or hostnames) of Zabbix servers.
# Incoming connections will be accepted only from the hosts listed here.
# No spaces allowed.
# If IPv6 support is enabled then '127.0.0.1', '::127.0.0.1', '::ffff:127.0.0.1' are treated equally.
#
# Mandatory: no
# Default:
# Server=
Server=172.18.64.7,127.0.0.1
### Option: ListenPort
# Agent will listen on this port for connections from the server.
#
# Mandatory: no
# Range: 1024-32767
# Default:
# ListenPort=10050
### Option: ListenIP
# List of comma delimited IP addresses that the agent should listen on.
# First IP address is sent to Zabbix server if connecting to it to retrieve list of active checks.
#
# Mandatory: no
# Default:
# ListenIP=0.0.0.0
### Option: StartAgents
# Number of pre-forked instances of zabbix_agentd that process passive checks.
# If set to 0, disables passive checks and the agent will not listen on any TCP port.
#
# Mandatory: no
# Range: 0-100
# Default:
# StartAgents=3
StartAgents=2
##### Active checks related
### Option: ServerActive
# List of comma delimited IP:port (or hostname:port) pairs of Zabbix servers for active checks.
# If port is not specified, default port is used.
# IPv6 addresses must be enclosed in square brackets if port for that host is specified.
# If port is not specified, square brackets for IPv6 addresses are optional.
# If this parameter is not specified, active checks are disabled.
# Example: ServerActive=127.0.0.1:20051,zabbix.domain,[::1]:30051,::1,[12fc::1]
#
# Mandatory: no
# Default:
# ServerActive=
ServerActive=172.18.64.107
### Option: Hostname
# Unique, case sensitive hostname.
# Required for active checks and must match hostname as configured on the server.
# Value is acquired from HostnameItem if undefined.
#
# Mandatory: no
# Default:
# Hostname=
# Hostname=Zabbix server
### Option: HostnameItem
# Item used for generating Hostname if it is undefined.
# Ignored if Hostname is defined.
#
# Mandatory: no
# Default:
HostnameItem=system.hostname
### Option: RefreshActiveChecks
# How often list of active checks is refreshed, in seconds.
#
# Mandatory: no
# Range: 60-3600
# Default:
# RefreshActiveChecks=120
RefreshActiveChecks=60
### Option: BufferSend
# Do not keep data longer than N seconds in buffer.
#
# Mandatory: no
# Range: 1-3600
# Default:
# BufferSend=5
BufferSend=10
### Option: BufferSize
# Maximum number of values in a memory buffer. The agent will send
# all collected data to Zabbix Server or Proxy if the buffer is full.
#
# Mandatory: no
# Range: 2-65535
# Default:
# BufferSize=100
BufferSize=1000
### Option: MaxLinesPerSecond
# Maximum number of new lines the agent will send per second to Zabbix Server
# or Proxy processing 'log' and 'logrt' active checks.
# The provided value will be overridden by the parameter 'maxlines',
# provided in 'log' or 'logrt' item keys.
#
# Mandatory: no
# Range: 1-1000
# Default:
# MaxLinesPerSecond=100
MaxLinesPerSecond=200
### Option: AllowRoot
# Allow the agent to run as 'root'. If disabled and the agent is started by 'root', the agent
# will try to switch to user 'zabbix' instead. Has no effect if started under a regular user.
# 0 - do not allow
# 1 - allow
#
# Mandatory: no
# Default:
# AllowRoot=0
############ ADVANCED PARAMETERS #################
### Option: Alias
# Sets an alias for parameter. It can be useful to substitute long and complex parameter name with a smaller and simpler one.
#
# Mandatory: no
# Range:
# Default:
### Option: Timeout
# Spend no more than Timeout seconds on processing
#
# Mandatory: no
# Range: 1-30
# Default:
Timeout=20
### Option: Include
# You may include individual files or all files in a directory in the configuration file.
# Installing Zabbix will create include directory in /usr/local/etc, unless modified during the compile time.
#
# Mandatory: no
# Default:
# Include=
# Include=/usr/local/etc/zabbix_agentd.userparams.conf
Include=/usr/local/zabbix/etc/zabbix_agentd.conf.d/
####### USER-DEFINED MONITORED PARAMETERS #######
### Option: UnsafeUserParameters
# Allow all characters to be passed in arguments to user-defined parameters.
# 0 - do not allow
# 1 - allow
#
# Mandatory: no
# Range: 0-1
# Default:
# UnsafeUserParameters=0
### Option: UserParameter
# User-defined parameter to monitor. There can be several user-defined parameters.
# Format: UserParameter=<key>,<shell command>
# See 'zabbix_agentd' directory for examples.
#
# Mandatory: no
# Default:
# UserParameter=
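Most lines in the file above are comments, so it can be hard to see which settings are actually in effect. The following Python sketch lists the active key=value pairs and checks the minimum needed for active checks (ServerActive plus either Hostname or HostnameItem). The function names are hypothetical, and the parser is a simplification that ignores Include directives.

```python
def parse_agent_conf(text):
    """Return {option: [values]} for every uncommented key=value line."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        # Options such as UserParameter may repeat, so keep a list per key.
        settings.setdefault(key.strip(), []).append(value.strip())
    return settings


def active_checks_ready(settings):
    """True when ServerActive is set and the host name can be determined."""
    has_server = any(settings.get("ServerActive", []))
    has_name = any(settings.get("Hostname", [])) or any(settings.get("HostnameItem", []))
    return has_server and has_name


# A few lines taken from the configuration shown above.
SAMPLE = (
    "# Server=\n"
    "Server=172.18.64.7,127.0.0.1\n"
    "# StartAgents=3\n"
    "StartAgents=2\n"
    "ServerActive=172.18.64.107\n"
    "# Hostname=Zabbix server\n"
    "HostnameItem=system.hostname\n"
)

if __name__ == "__main__":
    conf = parse_agent_conf(SAMPLE)
    for key, values in conf.items():
        print(key, values)
    print("active checks ready:", active_checks_ready(conf))
```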
Advanced Zabbix
Python alert script example:
#!/usr/bin/python3
# coding: utf-8
import smtplib
import sys
from email.header import Header
from email.mime.text import MIMEText
from email.utils import parseaddr, formataddr


def format_addr(s):
    # Encode the display name so that non-ASCII names survive in mail headers.
    name, addr = parseaddr(s)
    return formataddr((Header(name, 'utf-8').encode(), addr))


def send_mail(to_list, subject, content):
    mail_host = 'smtp.exmail.qq.com'
    mail_user = 'USERNAME@DOMAIN.TLD'
    mail_pass = 'YOUR_PASSWORD'
    # Adjust the three values above to match your environment.
    msg = MIMEText(content, 'plain', 'utf-8')
    msg['Subject'] = Header(subject, 'utf-8').encode()
    msg['From'] = format_addr('zabbix监控 <%s>' % mail_user)
    msg['To'] = to_list
    try:
        s = smtplib.SMTP()
        s.connect(mail_host)
        s.login(mail_user, mail_pass)
        s.sendmail(mail_user, to_list, msg.as_string())
        s.close()
        return True
    except Exception as e:
        print(str(e))
        return False


if __name__ == "__main__":
    # Arguments: recipient, subject, message body.
    send_mail(sys.argv[1], sys.argv[2], sys.argv[3])
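Executing remote actions with remote commands
Purpose: run user-specified commands or scripts on the host where the agent runs, for example: restarting a service; rebooting a server through IPMI; any operation defined in a user-defined script;
Command types that can be executed: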
- IPMI
- ssh
- telnet
- Custom Script
- Global Script
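Prerequisites: permission-related configuration required on the agent:
(1) the zabbix user must have the required administrative privileges;
Edit /etc/sudoers and comment out the following line;
Defaults requiretty
Add the following line:
zabbix ALL=(ALL) NOPASSWD: ALL
(2) the agent process must allow remote commands;
Edit /etc/zabbix/zabbix_agentd.conf and set:
EnableRemoteCommands=1
Restart the service for the change to take effect;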
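Network discovery:
The Zabbix server scans hosts within a specified network range;
Discovery methods, within an IP address range:
- available services (ftp, ssh, http, ...)
- response from zabbix_agent;
- response from snmp_agent;
It works in two phases:
discovery: discovery events; (Service, Host) (UP/DOWN, DISCOVERED/LOST)
actions: take discovery events as preconditions;
Actions that can be taken: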
- send message, remote command
- add/remove host
- enable/disable host
- add host to group
- link template to host
- ...
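Active/passive checks:
Passive check: named from the agent's point of view; item type agent. The server requests the data for each configured item from the agent; the agent receives the request, collects the data, and responds to the server;
Active check: named from the agent's point of view; item type agent (active). The agent asks the server for the item configuration that concerns it, then sends the data for those items to the server on its own initiative;
Basic configuration required on the agent: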
- ServerActive=
- Hostname=
- HostnameItem=
Sending data with zabbix_sender:
On a host defined in the Zabbix server, create an item whose type is "Zabbix trapper"; the key can be anything that does not conflict with an existing key;
zabbix_sender
- -z zabbix_server_ip
- -p zabbix_server_port
- -s zabbix_agent_hostname
- -k key
- -o value
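To make the zabbix_sender options above more concrete, the following Python sketch builds the kind of packet that carries one host/key/value triple to a trapper item. It reflects my understanding of the published sender protocol (a "ZBXD" header, a flags byte, a little-endian length, then a JSON body with "request": "sender data"); treat the framing details as an assumption and check them against the protocol documentation for your Zabbix version. The function names and the host and key values are hypothetical. Nothing is sent over the network.

```python
import json
import struct

HEADER = b"ZBXD\x01"  # protocol signature plus flags byte (assumed layout)


def build_sender_packet(host, key, value):
    """Frame one value the way -s host -k key -o value would describe it."""
    body = json.dumps({
        "request": "sender data",
        "data": [{"host": host, "key": key, "value": str(value)}],
    }).encode("utf-8")
    # 8-byte little-endian body length follows the header.
    return HEADER + struct.pack("<Q", len(body)) + body


def parse_packet(packet):
    """Inverse of build_sender_packet, useful for inspecting a packet."""
    if packet[:5] != HEADER:
        raise ValueError("not a Zabbix packet")
    (length,) = struct.unpack("<Q", packet[5:13])
    return json.loads(packet[13:13 + length].decode("utf-8"))


if __name__ == "__main__":
    pkt = build_sender_packet("web01", "custom.metric", 42)
    print(parse_packet(pkt))
```

In practice the zabbix_sender binary handles this framing; the sketch is only meant to show what the -s, -k, and -o options end up describing.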
Monitoring over SNMP (Simple Network Management Protocol):
Enabling SNMP on Linux:
# yum install net-snmp net-snmp-utils
Configuration file: /etc/snmp/snmpd.conf
Define the ACL
.1.3.6.1.2.1.
- 1.1.0: system description, sysDescr
- 1.3.0: uptime, sysUpTime
- 1.5.0: host name, sysName
- 1.7.0: services offered by the host, sysServices
- 2.1.0: number of network interfaces
- 2.2.1.2: network interface descriptions
- 2.2.1.3: network interface types
- ...
view systemview included .1.3.6.1.2.1.1
view systemview included .1.3.6.1.2.1.2 # network interface data
view systemview included .1.3.6.1.4.1.2021 # system resource load: memory, disk io, cpu load
view systemview included .1.3.6.1.2.1.25.1.1
Start the service:
systemctl start snmpd.service
Test tools:
# snmpget -v 2c -c public HOST OID
# snmpwalk -v 2c -c public HOST OID
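The OID suffixes listed earlier are relative to the .1.3.6.1.2.1. prefix. The following Python sketch expands them to full OIDs and assembles the matching snmpget argument list, without running anything. The helper names and the host address 192.0.2.10 are hypothetical; the community string "public" is the one used in the test commands above.

```python
MIB2_PREFIX = ".1.3.6.1.2.1."

# Suffixes from the list above, keyed by their standard MIB-2 object names.
OID_SUFFIXES = {
    "sysDescr": "1.1.0",
    "sysUpTime": "1.3.0",
    "sysName": "1.5.0",
    "sysServices": "1.7.0",
    "ifNumber": "2.1.0",
}


def full_oid(name):
    """Expand a short object name to its full numeric OID."""
    return MIB2_PREFIX + OID_SUFFIXES[name]


def snmpget_command(host, name, community="public"):
    """Build the argument list for an SNMP v2c snmpget call."""
    return ["snmpget", "-v", "2c", "-c", community, host, full_oid(name)]


if __name__ == "__main__":
    for obj in OID_SUFFIXES:
        print(" ".join(snmpget_command("192.0.2.10", obj)))
```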
Key <Unique string to be used as reference to triggers> For example, “my_param”.
JMX:
Tomcat host settings:
To monitor Tomcat, add the following to /etc/sysconfig/tomcat
CATALINA_OPTS="-Djava.rmi.server.hostname=TOMCAT_SERVER_IP -Djavax.management.builder.initial= -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=12345 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
zabbix-java-gateway host settings:
Install the zabbix-java-gateway package and start the service;
zabbix-server settings (a service restart is required):
JavaGateway=172.16.0.70
JavaGatewayPort=10052
StartJavaPollers=5
Add an item:
jmx[object_name,attribute_name]
object name - the object name of the MBean
attribute name - an MBean attribute name, with optional composite data field names separated by dots
Example:
jmx["java.lang:type=Memory","HeapMemoryUsage.used"]
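When many JMX items have to be created, it can help to generate the keys rather than type them. The following Python sketch formats a key in the jmx[object_name,attribute_name] form shown above and splits a composite attribute at its first dot. Both helper names are hypothetical.

```python
def jmx_item_key(object_name, attribute_name):
    """Format an item key as jmx["object_name","attribute_name"]."""
    return 'jmx["{}","{}"]'.format(object_name, attribute_name)


def split_attribute(attribute_name):
    """Split 'Attribute.field' into (attribute, field); field is None if absent."""
    attribute, _, field = attribute_name.partition(".")
    return attribute, field or None


if __name__ == "__main__":
    print(jmx_item_key("java.lang:type=Memory", "HeapMemoryUsage.used"))
    print(split_attribute("HeapMemoryUsage.used"))
```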
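Zabbix Proxy configuration:
server-node-agent
server-proxy-agent
1. Configure the proxy host:
(1) Install the packages
zabbix-proxy-mysql zabbix-get
zabbix-agent zabbix-sender
(2) Prepare the database
Create the database, create and authorize the user, import schema.sql;
(3) Edit the configuration file
Server=
address of the Zabbix server host;
Hostname=
name of this proxy; this exact name must be used when adding the proxy on the server;
make sure beforehand that the server can resolve this name;
DBHost=
DBName=
DBUser=
DBPassword=
ConfigFrequency=10
DataSenderFrequency=1
2. Add this proxy on the server
Administration --> Proxies
3. On the server, configure the hosts to be monitored through this proxy;
Note: the Zabbix agent must allow the Zabbix proxy host to collect data:
Server=
Zabbix performance tuning: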
nvps: new values per second
1,000,000/min, 15,000/s
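A rough way to estimate nvps for a planned deployment is to sum, over all items, one new value per update interval. The following Python sketch does that arithmetic; the function name and the item counts and intervals are hypothetical numbers chosen for the example, not measurements.

```python
def estimate_nvps(item_groups):
    """Sum of item_count / update_interval_seconds over all groups of items."""
    return sum(count / interval for count, interval in item_groups)


if __name__ == "__main__":
    # Hypothetical load: 3000 items polled every 30 s and 600 items every 60 s.
    groups = [(3000, 30), (600, 60)]
    print(estimate_nvps(groups))
```

The result only counts collected values; triggers, history writes, and the web frontend add load on top of it.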
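Zabbix optimization
Database: do not keep history data for too long, and try to keep the data cached in the database server's memory;
Trigger expressions: use min(), max(), and avg() less; prefer last() and nodata();
Data collection: polling is slow (use SNMP/agentless/agent less); prefer trapping (agent (active));
Data types: text data is slow to process; collect as little text or string data as possible and prefer numeric types;
Zabbix server processes:
(1) Number of server components;
- alerter, discoverer, escalator, http poller, housekeeper, icmp pinger, ipmi poller, poller, trapper, configuration syncer, ...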
- StartPollers=60
- StartPingers=10
- ...
- StartDBSyncers=5
- ...
(2) Set reasonable cache sizes
- CacheSize=8M
- HistoryCacheSize=16M
- HistoryIndexCacheSize=4M
- TrendCacheSize=4M
- ValueCacheSize=4M
(3) Database optimization
- Table partitioning:
- history_*
- trends*
- events*