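A Monitoring System Based on Prometheus + Grafana + AlertManager
I. Prometheus
1.1 Introduction
Prometheus is an open-source toolkit that combines monitoring, alerting, and a time-series database. It monitors applications based on the metrics they expose.
1.2 Download & Installation
(1) Download: https://prometheus.io/download/
(2) Extract: tar zxvf prometheus-2.12.0.linux-amd64.tar.gz
(3) Edit prometheus.yml. It contains the global settings, the Alertmanager address, the alerting rule files, and the scrape job configuration, as shown below.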
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.88.69:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "test_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['192.168.88.69:9090']

  - job_name: 'monitor'
    scrape_interval: 5s
    metrics_path: '/actuator/prometheus'
    static_configs:
    - targets: ['192.168.88.69:8008']

  - job_name: 'node-exporter'
    static_configs:
    - targets: ['192.168.88.69:9100']
(4) Start: ./prometheus &
(5) Verify the installation by opening http://192.168.88.69:9090/targets
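Besides the /targets page, target health can also be checked programmatically. The sketch below is an illustration only and is not part of the original setup. It assumes Java 11+ and uses Prometheus's HTTP query endpoint /api/v1/query with the built-in `up` series (1 when the last scrape of a target succeeded, 0 otherwise). The class and method names are made up for this example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Prometheus: builds and sends instant queries.
final class PrometheusQuerySketch {

    /** Builds the instant-query URI for a PromQL expression, URL-encoding the expression. */
    static URI queryUri(String baseUrl, String promql) {
        String encoded = URLEncoder.encode(promql, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v1/query?query=" + encoded);
    }

    /** Sends the query and returns the raw JSON response body (needs a reachable server). */
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Address taken from the configuration above; adjust it for your environment.
        URI uri = queryUri("http://192.168.88.69:9090", "up == 0");
        System.out.println(uri); // the URI that fetch(uri) would request
    }
}
```

Calling `fetch(uri)` against a running server returns JSON. For the expression `up == 0`, an empty `result` array means that no scraped target is currently down.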
1.3 Integrating Spring Boot with Prometheus
(1) Add the dependencies to the pom file
<!-- monitoring -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
(2) Configure the application yml
server:
  port: 8008
spring:
  application:
    name: monitor
management:
  endpoints:
    web:
      exposure:
        include: '*'
  metrics:
    tags:
      application: ${spring.application.name}
(3) Add a configuration class
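import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MeterRegistryConfig {

    // Adds an "application" tag carrying the application name to every metric.
    @Bean
    MeterRegistryCustomizer<MeterRegistry> configurer(@Value("${spring.application.name}") String applicationName) {
        return (registry) -> registry.config().commonTags("application", applicationName);
    }
}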
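One way to confirm that the common tag took effect is to look at the text served at /actuator/prometheus and check that every sample carries application="monitor". The sketch below is a minimal, standard-library-only illustration of that check. The class name, method names, and the `demo_requests_total` sample lines are invented for the example and are not captured output. The parser handles only simple sample lines, not the full exposition format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker for Prometheus text-format scrape output.
final class ScrapeLabelCheck {

    // Matches one label pair such as application="monitor" inside the {...} part of a sample line.
    private static final Pattern LABEL =
            Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*)=\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Returns the labels of one sample line; empty for comments, blank lines, or unlabeled samples. */
    static Map<String, String> labelsOf(String line) {
        Map<String, String> labels = new LinkedHashMap<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return labels;
        }
        int open = trimmed.indexOf('{');
        int close = trimmed.lastIndexOf('}');
        if (open < 0 || close < open) {
            return labels;
        }
        Matcher m = LABEL.matcher(trimmed.substring(open + 1, close));
        while (m.find()) {
            labels.put(m.group(1), m.group(2));
        }
        return labels;
    }

    /** True if every sample line in the scrape body carries application=<expected>. */
    static boolean allTagged(String body, String expected) {
        for (String line : body.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip HELP/TYPE comments and blank lines
            }
            if (!expected.equals(labelsOf(trimmed).get("application"))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Invented sample text in the exposition format, for illustration only.
        String sample = String.join("\n",
                "# TYPE demo_requests_total counter",
                "demo_requests_total{application=\"monitor\",method=\"GET\"} 3.0",
                "demo_requests_total{application=\"monitor\",method=\"POST\"} 1.0");
        System.out.println(allTagged(sample, "monitor"));
    }
}
```

In practice the `body` argument would be the response of http://192.168.88.69:8008/actuator/prometheus, the address used for the 'monitor' job.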
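II. AlertManager
2.1 Introduction
Alertmanager processes the alerts it receives: it deduplicates them, reduces noise, groups them, and routes notifications according to the configured policy.
2.2 Configuration
Edit alertmanager.yml. The configuration below sends alerts by email; other channels such as WeCom (企业微信) and DingTalk are also supported. The content is as follows: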
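global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.mxhichina.com:25' # SMTP server address
  smtp_from: 'test@163.com' # sender address
  smtp_auth_username: 'test@163.com' # SMTP user
  smtp_auth_password: '123456' # SMTP password
route:
  group_by: ["instance"] # labels to group alerts by
  group_wait: 10s # after the first alert of a group arrives, wait 10s for more alerts and send them together
  group_interval: 10s # interval between notifications for the same group
  repeat_interval: 1h # interval before a still-firing alert is sent again
  receiver: mail # default receiver; required, and must match a receiver name below
receivers:
- name: 'mail' # receiver name
  email_configs:
  - to: 'receiver@163.com' # recipient address
    headers: {Subject: "Alert test email"}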
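To make the effect of group_by: ["instance"] concrete, the following sketch groups a few alerts by their instance label, so that each group would correspond to one notification. It is a simplified illustration and not Alertmanager's implementation: it ignores group_wait, group_interval, and repeat_interval, and the class name and the alert names (InstanceDown, HighCpu) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of grouping alerts by a single label.
final class AlertGroupingSketch {

    /** Groups alerts (each a map of labels) by the value of one label; a missing label maps to "". */
    static Map<String, List<Map<String, String>>> groupBy(List<Map<String, String>> alerts, String label) {
        Map<String, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (Map<String, String> alert : alerts) {
            String key = alert.getOrDefault(label, "");
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(alert);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Instances reuse the targets from prometheus.yml; alert names are made up.
        List<Map<String, String>> alerts = List.of(
                Map.of("alertname", "InstanceDown", "instance", "192.168.88.69:9100"),
                Map.of("alertname", "HighCpu", "instance", "192.168.88.69:9100"),
                Map.of("alertname", "InstanceDown", "instance", "192.168.88.69:8008"));
        groupBy(alerts, "instance").forEach((instance, group) ->
                System.out.println(instance + " -> " + group.size() + " alert(s)"));
    }
}
```

With this grouping, two alerts firing on the same host arrive in a single email instead of two, which is the noise reduction the route section is aiming for.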
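2.3 Startup
Command: ./alertmanager & (listens on port 9093)
III. Grafana
3.1 Introduction
Grafana is an open-source data visualization tool written in Go. It can be used for monitoring and statistics dashboards, and it includes alerting.
3.2 Configuration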
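1. Extract grafana-6.3.5.linux-amd64.tar.gz, start it with ./grafana-server &, and open http://192.168.88.69:3000
2. Configure Data Sources: add the Prometheus server (http://192.168.88.69:9090) as a data source.
3. Install an exporter. To monitor the state of the server itself, install node_exporter and start it (port 9100), add it as a target in the Prometheus configuration, and restart Prometheus.
4. Import a dashboard template. Templates can be found on the Grafana website: https://grafana.com/grafana/dashboards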