Monitoring RabbitMQ with Prometheus: rabbitmq_exporter
【1】Introduction
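A Prometheus exporter for RabbitMQ metrics; the data is scraped by Prometheus.
Note that this exporter is unofficial. There is also an official plugin from RabbitMQ.com (rabbitmq_prometheus).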
(1.0) Two ways to combine RabbitMQ with Prometheus
Option 1: use RabbitMQ's built-in Prometheus integration to expose metrics
(1.1) Enable the RabbitMQ Prometheus monitoring plugin
Official documentation: https://www.rabbitmq.com/prometheus.html
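RabbitMQ ships with a built-in Prometheus plugin, but it has to be enabled:
# can be enabled at runtime; note that the RabbitMQ service must be running
rabbitmq-plugins enable rabbitmq_prometheus
# to disable the plugin again
rabbitmq-plugins disable rabbitmq_prometheus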
Once the plugin is enabled, its listening port is shown in the RabbitMQ management UI; the default port is 15692.
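(1.2) Verification
Windows: open http://yourIP:15692/metrics in a browser
Linux: curl 127.0.0.1:15692/metrics (when running this from a machine other than the RabbitMQ server, replace 127.0.0.1 with that server's IP)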
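To run the same check from a script instead of curl, here is a minimal Python sketch using only the standard library. The helpers fetch_metrics and metric_names are hypothetical names introduced for illustration, and the parser is deliberately simplified: it only extracts metric names, assuming one sample per line as in the Prometheus text format.

```python
import urllib.request


def fetch_metrics(host: str = "127.0.0.1", port: int = 15692,
                  path: str = "/metrics", timeout: float = 5.0) -> str:
    """Fetch the plain-text metrics page, i.e. the same request as the curl command."""
    url = f"http://{host}:{port}{path}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_names(text: str) -> set[str]:
    """Collect distinct metric names from Prometheus text-format output.

    Simplified parser: skips blank lines and '#' lines (HELP/TYPE comments)
    and takes everything before the first '{' or space as the metric name.
    """
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Position of the first '{' (labels) or ' ' (value), whichever comes first.
        cut = [i for i in (line.find("{"), line.find(" ")) if i != -1]
        names.add(line[:min(cut)] if cut else line)
    return names
```

Usage: `metric_names(fetch_metrics("yourIP"))`. If urlopen raises urllib.error.URLError, or the returned set is empty, the plugin is not reachable or not exporting anything.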
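(1.3) Default configuration
The plugin supports the following options through the prometheus.* configuration keys:
prometheus.path defines the export endpoint; the default is "/metrics".
prometheus.tcp.* controls the HTTP listener settings, matching those used by the RabbitMQ HTTP API.
prometheus.ssl.* controls the TLS (HTTPS) listener settings, matching those used by the RabbitMQ HTTP API.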
Simple example:
# these values are defaults
prometheus.path = /metrics
prometheus.tcp.port = 15692
These settings can be changed in the RabbitMQ configuration file, whose default location is:
/etc/rabbitmq/rabbitmq.conf
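To find out which URL Prometheus should scrape for a given node, the prometheus.* keys can be read back from that file. This is a minimal Python sketch under stated assumptions: it assumes the simple `key = value` format with comments on their own lines, handles only the plain HTTP listener (not prometheus.ssl.*), and the function names are hypothetical.

```python
# Defaults of the rabbitmq_prometheus plugin, as listed above.
DEFAULTS = {"prometheus.path": "/metrics", "prometheus.tcp.port": "15692"}


def read_prometheus_settings(conf_text: str) -> dict:
    """Return prometheus.* settings from rabbitmq.conf text, on top of the defaults.

    Assumes 'key = value' lines and whole-line '#' comments; keys outside
    the prometheus.* namespace are ignored.
    """
    settings = dict(DEFAULTS)
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith("prometheus."):
            settings[key] = value
    return settings


def metrics_url(host: str, conf_text: str) -> str:
    """Build the plain-HTTP scrape URL implied by the settings (TLS not handled)."""
    s = read_prometheus_settings(conf_text)
    return f"http://{host}:{s['prometheus.tcp.port']}{s['prometheus.path']}"
```

For example, `metrics_url("127.0.0.1", open("/etc/rabbitmq/rabbitmq.conf").read())` returns `http://127.0.0.1:15692/metrics` when the file does not override the defaults.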
Plugin repository: https://github.com/rabbitmq/rabbitmq-prometheus
Metric descriptions: https://github.com/rabbitmq/rabbitmq-prometheus/blob/master/metrics.md
【2】Integrating Prometheus + Alertmanager + Grafana
(2.1) Prometheus integration
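Add a job like the following under scrape_configs in prometheus.yml, listing each RabbitMQ node once:
- job_name: 'rabbitmq'
  scrape_interval: 60s
  scrape_timeout: 60s
  static_configs:
    - targets: ['yourIP1:15692', 'yourIP2:15692']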
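Before reloading Prometheus, a scrape job can be sanity-checked for two common mistakes: scrape_timeout larger than scrape_interval (Prometheus rejects this; equal values such as 60s/60s are allowed) and the same target listed twice. The Python sketch below assumes the job has already been loaded into a dict (for example with a YAML library, not shown); parse_duration and check_scrape_job are hypothetical helpers, and for a full validation `promtool check config prometheus.yml` is the proper tool.

```python
import re

# Prometheus duration units, converted to seconds (1y = 365d).
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration(text: str) -> float:
    """Convert a duration such as '60s' or '1m30s' to seconds."""
    if not re.fullmatch(r"(?:\d+(?:ms|s|m|h|d|w|y))+", text):
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(n) * _UNITS[u] for n, u in _PART.findall(text))


def check_scrape_job(job: dict) -> list:
    """Return a list of problems found in one scrape job (empty list means OK).

    Missing values fall back to Prometheus' built-in global defaults (1m / 10s);
    a real config may override these in its global section.
    """
    problems = []
    interval = parse_duration(job.get("scrape_interval", "1m"))
    timeout = parse_duration(job.get("scrape_timeout", "10s"))
    if timeout > interval:
        problems.append("scrape_timeout must not exceed scrape_interval")
    for sc in job.get("static_configs", []):
        targets = sc.get("targets", [])
        dupes = sorted({t for t in targets if targets.count(t) > 1})
        if dupes:
            problems.append(f"duplicate targets: {dupes}")
    return problems
```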
(2.2) Grafana integration
Go to the Grafana website, open the dashboards section and search for rabbitmq; many dashboards are available, so pick whichever suits you.
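(2.3) Alerting rules
The RabbitMQ groups below use metric names exported by rabbitmq_exporter (rabbitmq_up, rabbitmq_node_disk_free, rabbitmq_node_mem_used, rabbitmq_queue_messages_unacknowledged_global) and select job='RabbitMQ'. Label matching is case-sensitive, so the job value must equal the job_name of the scrape job that collects these metrics. The built-in rabbitmq_prometheus plugin may expose differently named metrics, so check its metric list before reusing these rules with it.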
groups:
  - name: example
    rules:
      # Alert for any instance that is unreachable for >5 minutes.
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
      # Alert for any instance that has a median request latency >1s.
      - alert: APIHighRequestLatency
        expr: api_http_request_latencies_second{quantile="0.5"} > 1
        for: 10m
        annotations:
          summary: "High request latency on {{ $labels.instance }}"
          description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
  - name: Rabbitmq-status
    rules:
      - alert: Rabbitmq-down
        expr: rabbitmq_up{job='RabbitMQ'} != 1
        labels:
          status: High
          team: Rabbitmq_monitor
        annotations:
          description: "Instance: {{ $labels.instance }} is Down ! ! !"
          value: '{{ $value }}'
          summary: "The host node is down"
  - name: Rabbitmq disk free limit
    rules:
      # Fires when free disk space is within 200 MB of the configured free-disk limit.
      - alert: Rabbitmq disk free limit status
        expr: rabbitmq_node_disk_free{job='RabbitMQ'} / 1024 / 1024 <= rabbitmq_node_disk_free_limit{job='RabbitMQ'} / 1024 / 1024 + 200
        labels:
          status: High
          team: Rabbitmq_monitor
        annotations:
          description: "Instance: {{ $labels.instance }} the rmq free disk is too low ! ! !"
          value: '{{ $value }} MB'
          summary: "The rmq free disk is too low"
  - name: RabbitMQ-memory-usage>300MB
    rules:
      - alert: RabbitMQ-memory-usage>300MB status
        expr: rabbitmq_node_mem_used{job='RabbitMQ'} / 1024 / 1024 > 300
        labels:
          status: High
          team: Rabbitmq_monitor
        annotations:
          description: "Instance: {{ $labels.instance }} the rabbitmq memory usage is too high ! ! !"
          value: '{{ $value }} MB'
          summary: "The rabbitmq memory usage is too high"
  - name: RabbitMQ-unacked-messages>0
    rules:
      - alert: RabbitMQ-unack>0 status
        expr: rabbitmq_queue_messages_unacknowledged_global{job='RabbitMQ'} > 0
        labels:
          status: High
          team: Rabbitmq_monitor
        annotations:
          description: "Instance: {{ $labels.instance }} the rabbitmq_queue_messages_unacknowledged_global > 0 ! ! !"
          value: '{{ $value }}'
          summary: "the rabbitmq_queue_messages_unacknowledged_global > 0"
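The disk and memory expressions are plain unit conversions (bytes to MB by dividing by 1024 twice) followed by a comparison. The Python functions below mirror that arithmetic so the thresholds can be checked with concrete numbers; they are illustrative helpers with hypothetical names, not part of Prometheus or rabbitmq_exporter, and they do not model rule evaluation itself. Note that these RabbitMQ rules have no `for:` clause, so they fire on the first evaluation where the expression is true.

```python
BYTES_PER_MB = 1024 * 1024  # the rules divide by 1024 twice


def disk_alert_fires(disk_free_bytes: int, disk_free_limit_bytes: int,
                     margin_mb: int = 200) -> bool:
    """Mirror of: disk_free / 1024 / 1024 <= disk_free_limit / 1024 / 1024 + 200."""
    return disk_free_bytes / BYTES_PER_MB <= disk_free_limit_bytes / BYTES_PER_MB + margin_mb


def memory_alert_fires(mem_used_bytes: int, threshold_mb: int = 300) -> bool:
    """Mirror of: mem_used / 1024 / 1024 > 300."""
    return mem_used_bytes / BYTES_PER_MB > threshold_mb
```

For example, with a 50 MB free-disk limit the disk alert fires once free space drops to 250 MB or less, and the memory alert fires only when usage is strictly above 300 MB.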
【References】