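Building an automated monitoring platform with Spring Boot + Prometheus + Grafana
What Prometheus does:
Prometheus is open-source software built specifically for system monitoring and alerting. It was the second project to join the CNCF, after Kubernetes. It collects metrics through many kinds of exporters, also accepts pushed metrics through the Pushgateway, and performs well enough to monitor clusters on the order of ten thousand machines.
Grafana is a cross-platform, open-source tool for metrics analysis and visualization. It queries the collected metrics and displays them as dashboards.
Metrics (Monitoring): Linux memory usage, CPU load, disk I/O, thread count
Tracing: business-level; follows a request that is handled by several systems
Logging: integrates with ELK so that logs are easy to browse
Monitoring architecture diagram for microservices
Spring Boot part: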
pom.xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <scope>runtime</scope>
</dependency>
application.yml
spring:
  application:
    name: springboot-prometheus
management:
  endpoint:
    prometheus:
      enabled: true
    health:
      show-details: always
  metrics:
    export:
      prometheus:
        enabled: true
  endpoints:
    web:
      exposure:
        include: "*"
Bean loaded automatically at startup:
@Bean
MeterRegistryCustomizer<MeterRegistry> configurer(
        @Value("${spring.application.name}") String applicationName) {
    // Tag every metric with the application name so dashboards can filter by service.
    return (registry) -> registry.config().commonTags("application", applicationName);
}
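With the customizer above in place, every sample served at /actuator/prometheus should carry an application label. The following is a minimal sketch, using only the JDK, of one way to check that on scraped text. The class name CommonTagCheck, the method samplesMissingLabel, and the sample lines in main are illustrative assumptions and not part of the original setup. The label matching is deliberately naive and does not handle escaped quotes inside label values.

```java
import java.util.ArrayList;
import java.util.List;

public class CommonTagCheck {

    /** Returns the sample lines (not comments, not blank) that lack label="value". */
    static List<String> samplesMissingLabel(String scrapeText, String label, String value) {
        String expected = label + "=\"" + value + "\"";
        List<String> missing = new ArrayList<>();
        for (String line : scrapeText.split("\n")) {
            String trimmed = line.trim();
            // Skip blank lines and "# HELP" / "# TYPE" comment lines.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int open = trimmed.indexOf('{');
            int close = trimmed.lastIndexOf('}');
            // A sample without braces has no labels at all.
            String labels = (open >= 0 && close > open) ? trimmed.substring(open + 1, close) : "";
            boolean present = labels.startsWith(expected) || labels.contains("," + expected);
            if (!present) {
                missing.add(trimmed);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up scrape output for illustration only.
        String scrape = String.join("\n",
                "# TYPE jvm_threads_live_threads gauge",
                "jvm_threads_live_threads{application=\"springboot-prometheus\"} 23.0",
                "process_uptime_seconds 12.5");
        System.out.println(samplesMissingLabel(scrape, "application", "springboot-prometheus"));
    }
}
```

In practice the text would come from an HTTP GET against the application's /actuator/prometheus endpoint. An empty result suggests the common tag is being applied to every meter.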
Server part
mkdir /etc/prometheus
cd /etc/prometheus
vi prometheus.yml
vi rule.yml #alerting rules
vi alertmanager.yml #alert notification settings (sends email)
vi docker-compose.yml
prometheus.yml
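# Global configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  # scrape_timeout is set to the global default (10s).

# Alerting configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['#{APP_HOST_IP}:9093']

# Load rules once and evaluate them periodically according to the global evaluation_interval.
# Uncomment the two lines below to load the rule.yml created above.
#rule_files:
#  - "/etc/prometheus/rule.yml"

# Controls which resources Prometheus monitors.
# The default configuration has a job named prometheus that scrapes the time series exposed by the Prometheus server itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to every time series scraped from this config.
  - job_name: 'springboot_prometheus'
    scrape_interval: 5s
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['#{APP_HOST_IP}:8081']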
rule.yml
groups:
  - name: example
    rules:
      # Alert for any instance that is unreachable for more than 1 minute.
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
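The interaction between for: 1m in this rule and evaluation_interval: 15s in prometheus.yml is easy to misread, so here is a small simulation of it in plain Java. It is a simplified model and not Prometheus code: the class ForClauseSketch, the State enum, and the replay method are names invented for this illustration. The assumption is that an alert becomes pending on the first evaluation where the expression matches, and switches to firing once it has matched on every evaluation for at least the for duration.

```java
import java.time.Duration;

public class ForClauseSketch {

    enum State { INACTIVE, PENDING, FIRING }

    /**
     * Replays one rule evaluation per interval and returns the state after the last one.
     * exprTrue[i] says whether the expression (for example up == 0) matched at evaluation i.
     */
    static State replay(boolean[] exprTrue, Duration evalInterval, Duration holdFor) {
        State state = State.INACTIVE;
        Duration activeSince = null;
        for (int i = 0; i < exprTrue.length; i++) {
            Duration now = evalInterval.multipliedBy(i);
            if (!exprTrue[i]) {
                // One non-matching evaluation resets the alert completely.
                state = State.INACTIVE;
                activeSince = null;
                continue;
            }
            if (activeSince == null) {
                activeSince = now;
                state = State.PENDING;
            }
            // Fire once the condition has held for at least the "for" duration.
            if (now.minus(activeSince).compareTo(holdFor) >= 0) {
                state = State.FIRING;
            }
        }
        return state;
    }

    public static void main(String[] args) {
        boolean[] downForFiveEvaluations = {true, true, true, true, true};
        System.out.println(replay(downForFiveEvaluations,
                Duration.ofSeconds(15), Duration.ofMinutes(1)));
    }
}
```

Under this model, a target has to be seen down on five consecutive evaluations (0s, 15s, 30s, 45s, 60s) before InstanceDown fires, and a single successful scrape in between starts the count again.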
alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'xxx@xxx:587'
  smtp_from: 'zhaoysz@xxx'
  smtp_auth_username: 'xxx@xxx'
  smtp_auth_password: 'xxxx'
  smtp_require_tls: true
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'test-mails'
receivers:
  - name: 'test-mails'
    email_configs:
      - to: 'scottcho@qq.com'
docker-compose.yml
version: '3.7'

networks:
  dispacher-network:
    name: dispacher-network
    external: true

services:
  prometheus:
    image: prom/prometheus
    volumes:
      - /etc/prometheus/:/etc/prometheus/
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
      - '--web.external-url=http://#{APP_HOST_IP}:9090/'
      - '--web.enable-lifecycle'
      # --storage.tsdb.retention is deprecated; the current flag is --storage.tsdb.retention.time
      - '--storage.tsdb.retention.time=15d'
    ports:
      - 9090:9090
    links:
      - alertmanager:alertmanager
    restart: always
    networks:
      - dispacher-network
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager_gpe
    hostname: alertmanager
    restart: always
    volumes:
      # The prom/alertmanager image reads /etc/alertmanager/alertmanager.yml by default.
      - /data/gpe/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    networks:
      - dispacher-network
  grafana:
    image: grafana/grafana
    ports:
      - 3000:3000
    volumes:
      - /etc/grafana/:/etc/grafana/provisioning/
      - grafana_data:/var/lib/grafana
    environment:
      - GF_INSTALL_PLUGINS=camptocamp-prometheus-alertmanager-datasource
    links:
      - prometheus:prometheus
      - alertmanager:alertmanager
    restart: always

volumes:
  prometheus_data: {}
  grafana_data: {}
  alertmanager_data: {}
Prometheus: http://#{SERVER_IP}:9090/targets
Grafana: http://14.18.43.72:3000/ (log in with admin/admin, then set a new password)
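Besides opening the pages in a browser, the three containers can be probed from code. The sketch below uses the JDK's java.net.http client against the health endpoints the services expose (/-/healthy for Prometheus and Alertmanager, /api/health for Grafana) on the ports published in docker-compose.yml. The class StackHealthCheck, its method names, and the localhost default are assumptions made for this example; pass the real server IP as the first argument.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

public class StackHealthCheck {

    /** Builds the health-check URL for each service, keyed by service name. */
    static Map<String, URI> healthUrls(String host) {
        Map<String, URI> urls = new LinkedHashMap<>();
        urls.put("prometheus", URI.create("http://" + host + ":9090/-/healthy"));
        urls.put("alertmanager", URI.create("http://" + host + ":9093/-/healthy"));
        urls.put("grafana", URI.create("http://" + host + ":3000/api/health"));
        return urls;
    }

    /** Returns true only when the endpoint answers with HTTP 200. */
    static boolean isHealthy(HttpClient client, URI uri) {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200;
        } catch (IOException e) {
            // Connection refused, timeout, and similar failures count as unhealthy.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(3))
                .build();
        healthUrls(host).forEach((name, uri) ->
                System.out.println(name + " " + uri + " -> "
                        + (isHealthy(client, uri) ? "UP" : "DOWN")));
    }
}
```

A DOWN result for a service usually means the container is not running or the port is blocked; the /targets page then shows whether Prometheus itself can reach the Spring Boot application.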