Monitoring RabbitMQ with Prometheus, rabbitmq_exporter

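【1】Introduction

rabbitmq_exporter is a Prometheus exporter for RabbitMQ metrics; the data is scraped by Prometheus.

Note that it is an unofficial exporter. There is also an official plugin from RabbitMQ.com.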

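(1.0) Two ways to combine RabbitMQ with Prometheus

Option 1: RabbitMQ exposes metrics to Prometheus through a plugin running inside the broker

 
Before 3.8.0, RabbitMQ could use the separate plugin prometheus_rabbitmq_exporter to expose metrics to Prometheus; it had to be downloaded into the RabbitMQ installation directory and installed by hand.
 
Starting with 3.8.0, RabbitMQ ships with built-in Prometheus & Grafana support. The plugin is bundled, but it still has to be enabled.
     rabbitmq-prometheus: https://github.com/rabbitmq/rabbitmq-prometheus
 
Option 2: use a standalone program to collect metrics (RabbitMQ_exporter)
 
  Works with any RabbitMQ version, but the exporter process has to be started separately.
 
RabbitMQ's official site also has a monitoring introduction.
This article uses Option 1.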

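(1.1) Enabling the Prometheus monitoring plugin in RabbitMQ

Official documentation: https://www.rabbitmq.com/prometheus.html

       RabbitMQ ships with a built-in Prometheus plugin, which has to be enabled.

# The plugin can be enabled online; note that the RabbitMQ service must be running
rabbitmq-plugins enable rabbitmq_prometheus
# Disable the plugin
rabbitmq-plugins disable rabbitmq_prometheus

 

The listening port is shown in the RabbitMQ management UI; the default is 15692.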

  

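(1.2) Verification

  Windows: open http://yourIP:15692/metrics in a browser

  Linux: curl 127.0.0.1:15692/metrics    (when running from a machine other than the RabbitMQ server, replace 127.0.0.1 with the IP of the RabbitMQ server)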
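
The same check can be scripted. The following is a minimal sketch, not part of the plugin: the function names fetch_metrics, metric_names and looks_like_rabbitmq are made up for illustration, and it only assumes that the endpoint returns the standard Prometheus text format (comment lines starting with #, then one sample per line).

```python
from urllib.request import urlopen


def fetch_metrics(url, timeout=5.0):
    """Download the raw text exposition from a /metrics endpoint."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_names(exposition):
    """Return the set of metric names found in Prometheus text output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample looks like `name{labels} value` or `name value`,
        # so the name ends at the first '{' or at the first whitespace.
        names.add(line.split("{", 1)[0].split()[0])
    return names


def looks_like_rabbitmq(exposition):
    """True if at least one metric name starts with 'rabbitmq_'."""
    return any(name.startswith("rabbitmq_") for name in metric_names(exposition))
```

For example, looks_like_rabbitmq(fetch_metrics("http://127.0.0.1:15692/metrics")) should return True once the plugin is enabled; replace 127.0.0.1 with the RabbitMQ server IP when running from another machine.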

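(1.3) Default configuration


This exporter supports the following options via prometheus.* configuration keys:

prometheus.path defines the scrape endpoint path; the default is "/metrics".

prometheus.tcp.* controls HTTP listener settings, which match those used by the RabbitMQ HTTP API.

prometheus.ssl.* controls TLS (HTTPS) listener settings, which match those used by the RabbitMQ HTTP API.

Simple example:
# these values are defaults
prometheus.path = /metrics
prometheus.tcp.port = 15692

These settings can be changed in the RabbitMQ configuration file, whose default path is:

/etc/rabbitmq/rabbitmq.conf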

Plugin repository: https://github.com/rabbitmq/rabbitmq-prometheus

Metrics reference: https://github.com/rabbitmq/rabbitmq-prometheus/blob/master/metrics.md

【2】Integrating Prometheus + Alertmanager + Grafana

(2.1) Integrating Prometheus

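 - job_name: 'RabbitMQ'   # label matching is case-sensitive; this must equal the job='RabbitMQ' selector used by the alert rules in (2.3)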
   scrape_interval: 60s
   scrape_timeout: 60s
   static_configs:
     - targets: ['yourIP:15692','yourIP:15692']
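
Once Prometheus has reloaded this configuration, one way to confirm that the targets are being scraped is to query the `up` metric through the Prometheus HTTP API (`/api/v1/query`). The sketch below is illustrative only: build_query_url, down_instances and check_targets are hypothetical helper names, and the Prometheus base URL and job name are whatever your own setup uses.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(prometheus_base, job):
    """Build an instant-query URL for up{job="<job>"}."""
    query = f'up{{job="{job}"}}'
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def down_instances(api_response):
    """Return the sorted instances whose `up` value is 0 in a query response."""
    if api_response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    down = []
    for sample in api_response["data"]["result"]:
        # Each vector sample carries "value": [timestamp, "string value"].
        _timestamp, value = sample["value"]
        if float(value) == 0.0:
            down.append(sample["metric"].get("instance", "<unknown>"))
    return sorted(down)


def check_targets(prometheus_base, job, timeout=5.0):
    """Query Prometheus and list the targets of `job` that are down."""
    with urlopen(build_query_url(prometheus_base, job), timeout=timeout) as resp:
        return down_instances(json.load(resp))
```

Calling check_targets with your Prometheus address and the job name from the scrape config returns an empty list when every RabbitMQ target is up. An empty list is also returned when the job name matches nothing, so check the Targets page in the Prometheus UI as well.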

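(2.2) Integrating Grafana

Go to the official Grafana website, open the dashboards page and search for rabbitmq; many dashboards are available, so pick whichever suits you.

(2.3) Alert rules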

groups:
- name: example
  rules:
  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  # Alert for any instance that has a median request latency >1s.
  - alert: APIHighRequestLatency
    expr: api_http_request_latencies_second{quantile="0.5"} > 1
    for: 10m
    annotations:
      summary: "High request latency on {{ $labels.instance }}"
      description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"

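# Note: rabbitmq_up, rabbitmq_node_* and rabbitmq_queue_messages_unacknowledged_global appear to be
# metric names exposed by the standalone rabbitmq_exporter. With the built-in rabbitmq_prometheus
# plugin, check the metrics reference (metrics.md) for the corresponding names before using these rules.
- name: Rabbitmq-running-status
  rules:
  - alert: Rabbitmq-down
    expr: rabbitmq_up{job='RabbitMQ'} != 1
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} is Down ! ! !"
      value: '{{ $value }}'
      summary:  "The host node is down"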

- name: Rabbitmq disk free limit
  rules:
  # Fires when free disk space (in MB) is within 200 MB of the configured disk free limit.
  - alert: Rabbitmq disk free limit status
    expr: rabbitmq_node_disk_free{job='RabbitMQ'} / 1024 / 1024  <= rabbitmq_node_disk_free_limit{job='RabbitMQ'} / 1024 / 1024 + 200
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} the rmq free disk is too low ! ! !"
      value: '{{ $value }} MB'
      summary:  "The rmq free disk is too low"
      
- name: RabbitMQ-memory-usage>300MB
  rules:
  # Fires when the node uses more than 300 MB of memory.
  - alert: RabbitMQ-memory-usage>300MB status
    expr: rabbitmq_node_mem_used{job='RabbitMQ'} /1024 /1024 > 300
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} the rabbitmq memory usage is too high ! ! !"
      value: '{{ $value }} MB'
      summary:  "The rabbitmq memory usage is too high"
      
- name: RabbitMQ-unacknowledged-messages>0
  rules:
  # Fires when any delivered message is still waiting for an ACK.
  - alert: RabbitMQ-unack>0 status
    expr: rabbitmq_queue_messages_unacknowledged_global{job='RabbitMQ'}  > 0
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} the rabbitmq_queue_messages_unacknowledged_global > 0  ! ! !"
      value: '{{ $value }} '
      summary:  "the rabbitmq_queue_messages_unacknowledged_global > 0"
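
To make the threshold arithmetic in the disk and memory rules concrete, here is a small sketch that mirrors the two expressions in plain Python. It is only an illustration of the comparison logic: the function names and the byte values in the usage note are hypothetical, not values read from a real node.

```python
MIB = 1024 * 1024  # the rules divide by 1024 twice, i.e. bytes -> MB (MiB)


def disk_alert_fires(free_bytes, limit_bytes, margin_mb=200):
    """Mirror of: disk_free / 1024 / 1024 <= disk_free_limit / 1024 / 1024 + 200."""
    return free_bytes / MIB <= limit_bytes / MIB + margin_mb


def memory_alert_fires(used_bytes, threshold_mb=300):
    """Mirror of: mem_used / 1024 / 1024 > 300."""
    return used_bytes / MIB > threshold_mb
```

With a hypothetical disk free limit of 50 MB, the disk rule fires as soon as free space drops to 250 MB or less: disk_alert_fires(240 * MIB, 50 * MIB) is True, while disk_alert_fires(300 * MIB, 50 * MIB) is False. The memory rule uses a strict comparison, so exactly 300 MB does not fire.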

 

【References】

Real-time monitoring and alerting for RabbitMQ in production (对生产环境的rabbitMQ实时监控并告警)

 

posted @ 2023-06-07 11:19  郭大侠1