Real-time monitoring and alerting for RabbitMQ in production

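Option 1: RabbitMQ exposes metrics through its built-in Prometheus integration

Before 3.8.0, RabbitMQ could expose metrics to Prometheus through a separate plugin, prometheus_rabbitmq_exporter, which had to be downloaded into the RabbitMQ installation directory and installed by hand.

Starting with 3.8.0, RabbitMQ ships with built-in Prometheus and Grafana support. The plugin is bundled with the broker, but it still has to be enabled before it serves any metrics.
     rabbitmq-prometheus: https://github.com/rabbitmq/rabbitmq-prometheus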
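
For the built-in route on 3.8.0 or later, the step might look like the sketch below. The plugin is enabled under the name rabbitmq_prometheus and listens on port 15692 by default; the two helper function names and the localhost default are illustrative assumptions, not part of RabbitMQ.

```shell
#!/bin/sh
# Enable the Prometheus plugin bundled with RabbitMQ 3.8.0+.
# Run this on every broker node that should be scraped.
enable_builtin_metrics() {
    rabbitmq-plugins enable rabbitmq_prometheus
}

# Build the scrape URL for a node (hypothetical helper).
# 15692 is the plugin's default listener port.
builtin_metrics_url() {
    printf 'http://%s:15692/metrics\n' "${1:-localhost}"
}

# Usage on a broker node:
#   enable_builtin_metrics
#   curl -s "$(builtin_metrics_url localhost)" | head
```

The rest of this article does not depend on this plugin; it is shown only to make the comparison with the standalone exporter concrete.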
 
Option 2: collect metrics with a standalone program (rabbitmq_exporter)
 
  This works with any RabbitMQ version, but the exporter has to run as a separate process.
 
For background, see the official RabbitMQ monitoring documentation.
This article uses the second option.
 

I. Installing rabbitmq_exporter

 

1. Download the binary package and write the config file

wget https://github.com/kbudde/rabbitmq_exporter/releases/download/v1.0.0-RC13/rabbitmq_exporter_1.0.0-RC13_linux_amd64.tar.gz
mkdir -p /usr/local/rabbitmq_exporter
tar xvf rabbitmq_exporter_1.0.0-RC13_linux_amd64.tar.gz -C /usr/local/rabbitmq_exporter
cat > /usr/local/rabbitmq_exporter/config.example.json <<- EOF
{
    "rabbit_url": "http://172.18.215.10:15672",
    "rabbit_user": "monitor",
    "rabbit_pass": "passwd",
    "publish_port": "9419",
    "publish_addr": "",
    "output_format": "TTY",
    "ca_file": "ca.pem",
    "cert_file": "client-cert.pem",
    "key_file": "client-key.pem",
    "insecure_skip_verify": false,
    "exlude_metrics": [],
    "include_queues": ".*",
    "skip_queues": "^$",
    "skip_vhost": "^$",
    "include_vhost": ".*",
    "rabbit_capabilities": "no_sort,bert",
    "enabled_exporters": [
            "exchange",
            "node",
            "overview",
            "queue"
    ],
    "timeout": 30,
    "max_queues": 0
}
EOF
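
The config above logs in to the management API as monitor, so that account has to exist on the broker. A minimal sketch of creating it, assuming the default vhost / and read-only access are enough for your setup. The function name is hypothetical; the monitoring tag is what lets the account read the management API's node and overview data.

```shell
#!/bin/sh
# Create a read-only RabbitMQ account for the exporter (hypothetical helper).
#   $1 = user name, $2 = password, $3 = vhost (defaults to "/")
create_monitor_user() {
    user="$1"
    pass="$2"
    vhost="${3:-/}"
    rabbitmqctl add_user "$user" "$pass" &&
    # "monitoring" grants read access to the management API.
    rabbitmqctl set_user_tags "$user" monitoring &&
    # configure = none, write = none, read = everything in this vhost.
    rabbitmqctl set_permissions -p "$vhost" "$user" '^$' '^$' '.*'
}

# Usage on a broker node, matching rabbit_user / rabbit_pass above:
#   create_monitor_user monitor passwd
```

If queues live in other vhosts, the same permissions would presumably need to be granted for each of them by passing the vhost as the third argument.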

2. Create the service user

useradd -M -s /sbin/nologin prometheus

3. Create the systemd service

cat > /usr/lib/systemd/system/rabbitmq_exporter.service <<- EOF
[Unit]
Description=rabbitmq_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/rabbitmq_exporter/rabbitmq_exporter -config-file /usr/local/rabbitmq_exporter/config.example.json
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF

4. Start the service

systemctl enable rabbitmq_exporter && systemctl start rabbitmq_exporter
journalctl -fu rabbitmq_exporter.service

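5. Access the metrics endpoint

With the config above, the exporter serves its metrics at http://172.18.215.10:9419/metrics.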
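
As a quick sanity check once the service is running, the sketch below pulls the rabbitmq_up value out of the exporter's output on the publish_port configured earlier (9419). This is the same metric the dashboard and the alert rule in this article rely on; the helper name is an assumption.

```shell
#!/bin/sh
# Read Prometheus exposition text on stdin and print the value of the
# first rabbitmq_up sample (1 = broker reachable, 0 = not reachable).
rabbitmq_up_value() {
    # Match "rabbitmq_up 1" as well as "rabbitmq_up{...} 1";
    # "# HELP" / "# TYPE" lines and rabbitmq_uptime are skipped.
    awk '/^rabbitmq_up[{ ]/ { print $NF; exit }'
}

# Usage:
#   curl -s http://172.18.215.10:9419/metrics | rabbitmq_up_value
```

An empty result means the metric was not found at all, which usually points at the exporter rather than at the broker.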

6. Configure prometheus.yml

 

...
  - job_name: 'RabbitMQ'
    static_configs:
    - targets: ['172.18.215.10:9419']
      labels:
        instance: RabbitMQ-XH
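
After editing prometheus.yml it is worth validating the file before reloading. A sketch, assuming the config lives at /data/prometheus/prometheus.yml (guessed from the rules path used later in this article), Prometheus listens on localhost:9090, and it was started with --web.enable-lifecycle so that the /-/reload endpoint is available. The function name is hypothetical.

```shell
#!/bin/sh
# Validate the Prometheus config, then ask the server to reload it.
#   $1 = config path (assumed default), $2 = Prometheus base URL (assumed default)
reload_prometheus() {
    config="${1:-/data/prometheus/prometheus.yml}"
    base="${2:-http://localhost:9090}"
    # Abort without reloading if promtool reports a problem.
    promtool check config "$config" || return 1
    # Only works when Prometheus runs with --web.enable-lifecycle.
    curl -fsS -X POST "$base/-/reload"
}

# Usage:
#   reload_prometheus
```

Without the lifecycle flag, sending SIGHUP to the Prometheus process or restarting the service should have the same effect.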

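7. Import the dashboard template

Import the dashboard JSON below into Grafana (its gnetId is 4371) and adjust the panels to your needs.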
{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "description": "Basic rabbitmq host stats: Node Stats, Exchanges, Channels, Consumers,  Connections, Queues, Messages, Messages per Queue, Memory, File Descriptors, Sockets.",
  "editable": true,
  "gnetId": 4371,
  "graphTooltip": 0,
  "id": 30,
  "iteration": 1577172562108,
  "links": [],
  "panels": [
    {
      "cacheTimeout": null,
      "colorBackground": true,
      "colorValue": false,
      "colors": [
        "rgba(50, 172, 45, 0.97)",
        "rgba(237, 129, 40, 0.89)",
        "rgba(245, 54, 54, 0.9)"
      ],
      "datasource": "Prometheus",
      "format": "none",
      "gauge": {
        "maxValue": 100,
        "minValue": 0,
        "show": false,
        "thresholdLabels": false,
        "thresholdMarkers": true
      },
      "gridPos": {
        "h": 7,
        "w": 6,
        "x": 0,
        "y": 0
      },
      "id": 13,
      "interval": null,
      "links": [],
      "mappingType": 1,
      "mappingTypes": [
        {
          "name": "value to text",
          "value": 1
        },
        {
          "name": "range to text",
          "value": 2
        }
      ],
      "maxDataPoints": 100,
      "nullPointMode": "connected",
      "nullText": null,
      "options": {},
      "postfix": "",
      "postfixFontSize": "50%",
      "prefix": "",
      "prefixFontSize": "50%",
      "rangeMaps": [
        {
          "from": "null",
          "text": "N/A",
          "to": "null"
        }
      ],
      "sparkline": {
        "fillColor": "rgba(31, 118, 189, 0.18)",
        "full": false,
        "lineColor": "rgb(31, 120, 193)",
        "show": false
      },
      "tableColumn": "",
      "targets": [
        {
          "expr": "rabbitmq_up{instance=~\"$instance.*\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "metric": "rabbitmq_up",
          "refId": "A",
          "step": 2
        }
      ],
      "thresholds": "Up,Down",
      "timeFrom": "30s",
      "title": "RabbitMQ Server",
      "type": "singlestat",
      "valueFontSize": "80%",
      "valueMaps": [
        {
          "op": "=",
          "text": "N/A",
          "value": "null"
        },
        {
          "op": "=",
          "text": "Down",
          "value": "0"
        },
        {
          "op": "=",
          "text": "Up",
          "value": "1"
        }
      ],
      "valueName": "current"
    },
    {
      "alert": {
        "conditions": [
          {
            "evaluator": {
              "params": [
                1
              ],
              "type": "lt"
            },
            "operator": {
              "type": "and"
            },
            "query": {
              "params": [
                "A",
                "10s",
                "now"
              ]
            },
            "reducer": {
              "params": [],
              "type": "last"
            },
            "type": "query"
          },
          {
            "evaluator": {
              "params": [],
              "type": "no_value"
            },
            "operator": {
              "type": "and"
            },
            "query": {
              "params": [
                "A",
                "10s",
                "now"
              ]
            },
            "reducer": {
              "params": [],
              "type": "last"
            },
            "type": "query"
          }
        ],
        "executionErrorState": "alerting",
        "frequency": "60s",
        "handler": 1,
        "message": "One or more RabbitMQ nodes are down",
        "name": "Node Stats alert",
        "noDataState": "no_data",
        "notifications": []
      },
      "aliasColors": {},
      "bars": true,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "decimals": 0,
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 18,
        "x": 6,
        "y": 0
      },
      "hiddenSeries": false,
      "id": 12,
      "legend": {
        "alignAsTable": true,
        "avg": false,
        "current": true,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": false,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "rabbitmq_running{instance=~\"$instance.*\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{node}}",
          "metric": "rabbitmq_running",
          "refId": "A",
          "step": 2
        }
      ],
      "thresholds": [
        {
          "colorMode": "critical",
          "fill": true,
          "line": true,
          "op": "lt",
          "value": 1
        }
      ],
      "timeFrom": "30s",
      "timeRegions": [],
      "timeShift": null,
      "title": "Node up Stats",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "decimals": 0,
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 8,
        "x": 0,
        "y": 7
      },
      "hiddenSeries": false,
      "id": 6,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "rabbitmq_exchanges{instance=~\"$instance.*\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{instance}}:exchanges",
          "metric": "rabbitmq_exchangesTotal",
          "refId": "A",
          "step": 2
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Exchanges",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "decimals": 0,
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 8,
        "x": 8,
        "y": 7
      },
      "hiddenSeries": false,
      "id": 4,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "rabbitmq_channels{instance=~\"$instance.*\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{instance}}:channels",
          "metric": "rabbitmq_channelsTotal",
          "refId": "A",
          "step": 2
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Channels",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "decimals": 0,
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 8,
        "x": 16,
        "y": 7
      },
      "hiddenSeries": false,
      "id": 3,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "rabbitmq_consumers{instance=~\"$instance.*\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{instance}}:consumers",
          "metric": "rabbitmq_consumersTotal",
          "refId": "A",
          "step": 2
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Consumers",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "decimals": 0,
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 8,
        "x": 0,
        "y": 14
      },
      "hiddenSeries": false,
      "id": 5,
      "legend": {
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "rabbitmq_connections{instance=~\"$instance.*\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{instance}}:connections",
          "metric": "rabbitmq_connectionsTotal",
          "refId": "A",
          "step": 2
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Connections",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 8,
        "x": 8,
        "y": 14
      },
      "hiddenSeries": false,
      "id": 7,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "rabbitmq_queues{instance=~\"$instance.*\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{instance}}:queues",
          "metric": "rabbitmq_queuesTotal",
          "refId": "A",
          "step": 2
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Queues",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "decimals": 0,
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 8,
        "x": 16,
        "y": 14
      },
      "hiddenSeries": false,
      "id": 8,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "sum by (vhost)(rabbitmq_queue_messages_global {instance=~\"$instance.*\"})",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{vhost}}:total",
          "metric": "rabbitmq_queue_messages_global",
          "refId": "C",
          "step": 2
        },
        {
          "expr": "sum by (vhost)(rabbitmq_queue_messages_ready_global{instance=~\"$instance.*\"})",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{vhost}}:ready",
          "metric": "rabbitmq_queue_messages_ready_global",
          "refId": "A",
          "step": 2
        },
        {
          "expr": "sum by (vhost)(rabbitmq_queue_messages_unacknowledged_global{instance=~\"$instance.*\"})",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{vhost}}:unack",
          "metric": "rabbitmq_queue_messages_unacknowledged_global",
          "refId": "D",
          "step": 2
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Messages/host",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "decimals": 0,
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 12,
        "x": 0,
        "y": 21
      },
      "hiddenSeries": false,
      "id": 2,
      "legend": {
        "alignAsTable": true,
        "avg": false,
        "current": true,
        "max": false,
        "min": false,
        "rightSide": false,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "rabbitmq_queue_messages_global{instance=~\"$instance.*\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{queue}}:{{durable}}",
          "metric": "rabbitmq_queue_messages_global",
          "refId": "A",
          "step": 2
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Messages / Queue",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 12,
        "x": 12,
        "y": 21
      },
      "hiddenSeries": false,
      "id": 9,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "rabbitmq_node_mem_used{instance=~\"$instance.*\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{node}}:used",
          "metric": "rabbitmq_node_mem_used",
          "refId": "A",
          "step": 2
        },
        {
          "expr": "rabbitmq_node_mem_limit{instance=~\"$instance.*\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{node}}:limit",
          "metric": "node_mem",
          "refId": "B",
          "step": 2
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Memory",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "decbytes",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 12,
        "x": 0,
        "y": 28
      },
      "hiddenSeries": false,
      "id": 10,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "rabbitmq_fd_used{instance=~\"$instance.*\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{node}}:used",
          "metric": "",
          "refId": "A",
          "step": 2
        },
        {
          "expr": "rabbitmq_fd_total{instance=~\"$instance.*\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{node}}:total",
          "metric": "node_mem",
          "refId": "B",
          "step": 2
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "File descriptors",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 12,
        "x": 12,
        "y": 28
      },
      "hiddenSeries": false,
      "id": 11,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "rabbitmq_sockets_used{instance=~\"$instance.*\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{node}}:used",
          "metric": "",
          "refId": "A",
          "step": 2
        },
        {
          "expr": "rabbitmq_sockets_total{instance=~\"$instance.*\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "{{node}}:total",
          "metric": "",
          "refId": "B",
          "step": 2
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Sockets",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    }
  ],
  "refresh": "5s",
  "schemaVersion": 21,
  "style": "dark",
  "tags": [
    "rabbitmq",
    "prometheus"
  ],
  "templating": {
    "list": [
      {
        "current": {
          "text": "default",
          "value": "default"
        },
        "hide": 0,
        "includeAll": false,
        "label": null,
        "multi": false,
        "name": "datasource",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "allValue": ".*",
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "datasource": "$datasource",
        "definition": "",
        "hide": 0,
        "includeAll": true,
        "label": null,
        "multi": false,
        "name": "instance",
        "options": [],
        "query": "label_values(rabbitmq_up, instance)",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      }
    ]
  },
  "time": {
    "from": "now-5m",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ],
    "time_options": [
      "5m",
      "15m",
      "1h",
      "6h",
      "12h",
      "24h",
      "2d",
      "7d",
      "30d"
    ]
  },
  "timezone": "browser",
  "title": "RabbitMQ Metrics",
  "uid": "3xTRkqBWk",
  "version": 7
}
 

II. Configuring alerts

1. Create the alert rules

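# Quote the heredoc delimiter so the shell does not expand $labels and $value
# inside the {{ ... }} templates before the file is written.
cat > /data/prometheus/alert_rules.yml <<- 'EOF'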
groups:
- name: example
  rules:
  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  # Alert for any instance that has a median request latency >1s.
  - alert: APIHighRequestLatency
    expr: api_http_request_latencies_second{quantile="0.5"} > 1
    for: 10m
    annotations:
      summary: "High request latency on {{ $labels.instance }}"
      description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"

- name: Rabbitmq running status
  rules:
  - alert: Rabbitmq-down
    expr: rabbitmq_up{job='RabbitMQ'} != 1
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} is Down ! ! !"
      value: '{{ $value }}'
      summary:  "The host node is down"

- name: Rabbitmq disk free limit
  rules:
  - alert: Rabbitmq disk free limit   status
    expr: rabbitmq_node_disk_free{job='RabbitMQ'} / 1024 / 1024  <= rabbitmq_node_disk_free_limit{job='RabbitMQ'} / 1024 / 1024 + 200
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} RabbitMQ free disk space is too low ! ! !"
      value: '{{ $value }} MB'
      summary:  "RabbitMQ free disk space is too low"
      
- name: "RabbitMQ memory usage > 300MB"
  rules:
  - alert: "RabbitMQ memory usage > 300MB status"
    expr: rabbitmq_node_mem_used{job='RabbitMQ'} /1024 /1024 > 300
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} RabbitMQ memory usage is too high ! ! !"
      value: '{{ $value }} MB'
      summary:  "RabbitMQ memory usage is too high"
      
- name: "RabbitMQ unacknowledged messages > 0"
  rules:
  - alert: RabbitMQ-unack>0   status
    expr: rabbitmq_queue_messages_unacknowledged_global{job='RabbitMQ'}  > 0
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} the rabbitmq_queue_messages_unacknowledged_global > 0  ! ! !"
      value: '{{ $value }} '
      summary:  "the rabbitmq_queue_messages_unacknowledged_global > 0"
EOF
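
The rule templates contain $labels and $value, which the shell also treats as variable references inside a heredoc unless the delimiter is quoted. The following is a minimal sketch of the difference, using one line from the InstanceDown rule; the variable names unquoted and quoted are only for illustration.

```shell
# Sketch: compare an unquoted and a quoted heredoc delimiter.
# "labels" is set to an empty string to mimic a shell where it has no value.
labels=""

unquoted=$(cat <<EOF
summary: "Instance {{ $labels.instance }} down"
EOF
)

quoted=$(cat <<'EOF'
summary: "Instance {{ $labels.instance }} down"
EOF
)

echo "unquoted: $unquoted"   # the shell has already replaced $labels with nothing
echo "quoted:   $quoted"     # the template variable is left for Prometheus to expand
```

With an unquoted delimiter the file would end up containing {{ .instance }}, so the notification text would not show the instance name. If Docker is available, running promtool check rules against the file (promtool ships in the prom/prometheus image) is a quick way to catch syntax problems before restarting Prometheus.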

2. Edit prometheus.yml

...
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['172.18.156.179:59093']
rule_files:
  - "/etc/prometheus/alert_rules.yml"
...

  - job_name: 'alertmanagers'
    static_configs:
    - targets: ['172.18.156.179:59093']

alert_rules.yml has to be mounted into the Docker container. Remove the old container with docker rm -f prometheus, then recreate Prometheus:

docker run -d -p 59090:9090 \
-v /data/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
-v "/etc/localtime:/etc/localtime" \
-v /data/prometheus/alert_rules.yml:/etc/prometheus/alert_rules.yml \
--name prometheus \
prom/prometheus:v2.27.1

Open http://172.18.156.179:59090/alerts to check that the rules have been loaded.
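
Besides the web page, the loaded rules can also be read from Prometheus' HTTP API at /api/v1/rules. The sketch below assumes the response is piped in on stdin; has_alert is a hypothetical helper written for this article, not a Prometheus tool, and it only does a text match on the JSON.

```shell
# Sketch: succeed if an alert with the given name appears in the
# JSON returned by /api/v1/rules (read from stdin).
has_alert() {
    # Accepts both "name":"X" and "name": "X".
    grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Usage against the instance from this article:
#   curl -s http://172.18.156.179:59090/api/v1/rules | has_alert Rabbitmq-down && echo loaded
```

Rule groups use the same "name" key in the response, so the check is only meaningful for an alert name that differs from every group name.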

 

3. Configure the DingTalk webhook

https://hub.docker.com/r/timonwong/prometheus-webhook-dingtalk

# First, in DingTalk: create an alert group, add an alert robot to it, create a webhook, and save the automatically generated secret.

 

mkdir -p /usr/local/prometheus-webhook-dingtalk/
cat > /usr/local/prometheus-webhook-dingtalk/config.yml <<- EOF
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxx
    # secret for signature
    secret: xxxxxxxxxxxxxxxxxxx
EOF
docker run -d  --name webhook \
-p 58060:8060 \
-v /usr/local/prometheus-webhook-dingtalk/config.yml:/etc/prometheus-webhook-dingtalk/config.yml \
timonwong/prometheus-webhook-dingtalk:latest
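
Before wiring up Alertmanager, the webhook container can be tested on its own by posting a payload in the Alertmanager webhook format to it. The sketch below only prints such a payload; make_test_payload, the ManualTest alert name, and the timestamps are made up for illustration, and the field set is a minimal guess at what the bridge needs rather than a complete Alertmanager message.

```shell
# Sketch: print a minimal Alertmanager-style webhook payload for a manual test.
make_test_payload() {
    cat <<'EOF'
{
  "version": "4",
  "status": "firing",
  "receiver": "default",
  "groupLabels": {"alertname": "ManualTest"},
  "commonLabels": {"alertname": "ManualTest", "severity": "page"},
  "commonAnnotations": {"summary": "Manual test alert"},
  "externalURL": "http://172.18.156.179:59093",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "ManualTest", "severity": "page"},
      "annotations": {"summary": "Manual test alert"},
      "startsAt": "2022-04-22T06:00:00Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://172.18.156.179:59090/alerts"
    }
  ]
}
EOF
}

# Usage (this sends a real message to the DingTalk group):
#   make_test_payload | curl -s -H 'Content-Type: application/json' \
#       --data-binary @- http://172.18.156.179:58060/dingtalk/webhook1/send
```

If the message shows up in the group, the access token and secret in config.yml are working, and any later delivery problem is more likely on the Alertmanager side.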

 

4. Configure Alertmanager

cat > /data/prometheus/alertmanager.yml <<- EOF
global:
  resolve_timeout: 2m
route:
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 5m
  receiver: default # send through the default receiver
receivers:
  - name: 'default'
    webhook_configs:
    - url: http://172.18.156.179:58060/dingtalk/webhook1/send
      send_resolved: true # also send a notification when the alert is resolved
EOF
docker run -d --name Alertmanager \
-p 59093:9093 \
-v /data/prometheus/alertmanager.yml:/opt/bitnami/alertmanager/conf/config.yml \
bitnami/alertmanager:latest

Visit http://172.18.156.179:59093
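
When testing, it helps to know roughly how long to wait for the first DingTalk message. As a rough sketch, the delay is the rule's "for:" duration plus group_wait from alertmanager.yml; first_notify_delay is a hypothetical helper, and the estimate ignores the scrape and rule evaluation intervals, which add a little more.

```shell
# Sketch: approximate seconds between a condition becoming true
# and the first notification for a new alert group.
first_notify_delay() {
    for_seconds=$1          # the rule's "for:" duration (0 if the rule has none)
    group_wait_seconds=$2   # route.group_wait in alertmanager.yml
    echo $(( for_seconds + group_wait_seconds ))
}

first_notify_delay 300 30   # InstanceDown: for: 5m, group_wait: 30s
first_notify_delay 0 30     # Rabbitmq-down: no "for:", group_wait: 30s
```

With the values used here, an InstanceDown notification should arrive after about five and a half minutes, while Rabbitmq-down is sent about half a minute after the rule first evaluates to true.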

 

 

  

Alert triggered

 

 

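III. Integrating Prometheus with other modules

  Because Prometheus is flexible in how its interfaces are configured and how it collects data, it can easily be integrated with other modules to monitor many components in real time.

Commonly used modules include:

1.node_exporter

Used to monitor the metrics of a node (virtual machine).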

Download: https://github.com/prometheus/node_exporter/releases/download/v0.17.0-rc.0/node_exporter-0.17.0-rc.0.linux-386.tar.gz

2.jmx_exporter

Download: https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.1/jmx_prometheus_javaagent-0.3.1.jar

3.elasticsearch_exporter

Download: https://github.com/justwatchcom/elasticsearch_exporter/releases/download/v1.0.4rc1/elasticsearch_exporter-1.0.4rc1.linux-386.tar.gz

4.redis_exporter

Download: https://github.com/oliver006/redis_exporter/releases/download/v0.22.0/redis_exporter-v0.22.0.linux-386.tar.gz

5.mysqld_exporter

Download: https://github.com/prometheus/mysqld_exporter/releases/download/v0.11.0/mysqld_exporter-0.11.0.linux-386.tar.gz

6.postgres_exporter

Download: https://github.com/wrouesnel/postgres_exporter/releases/download/v0.4.7/postgres_exporter_v0.4.7_linux-amd64.tar.gz

7.mongodb_exporter

Download: https://github.com/dcu/mongodb_exporter/releases/download/v1.0.0/mongodb_exporter-linux-amd64

8.statsd_exporter

Download: https://github.com/prometheus/statsd_exporter/releases/download/v0.8.0/statsd_exporter-0.8.0.linux-amd64.tar.gz

9.mesos_exporter

Download: https://github.com/mesos/mesos_exporter/releases/download/v1.1.1/mesos_exporter-1.1.1.linux-arm64.tar.gz

10.apache_exporter

Download: https://github.com/Lusitaniae/apache_exporter/releases/download/v0.5.0/apache_exporter-0.5.0.linux-amd64.tar.gz

11.hadoop_exporter

Download: https://github.com/wyukawa/hadoop_exporter

12.logstash_exporter

Download: https://github.com/BonnierNews/logstash_exporter/archive/v0.1.2.tar.gz

13.sql_exporter

Download: https://github.com/justwatchcom/sql_exporter/releases/download/v0.2.0/sql_exporter-0.2.0.linux-amd64.tar.gz

14.oracle_exporter

Download: https://github.com/iamseth/oracledb_exporter/releases/download/0.0.8/oracledb_exporter.linux-amd64

15.zookeeper_exporter

Download 1: https://github.com/carlpett/zookeeper_exporter/releases/download/v1.0.1/zookeeper_exporter-1.0.1.linux-amd64.tar.gz

Download 2: https://github.com/carlpett/zookeeper_exporter/releases/download/v1.0.2/zookeeper_exporter

16.influxdb_exporter

Download: https://github.com/prometheus/influxdb_exporter/releases/download/v0.1.0/influxdb_exporter-0.1.0.linux-amd64.tar.gz

17.zabbix_exporter

Download: https://github.com/MyBook/zabbix-exporter/archive/1.0.2.tar.gz

18.opentsdb_exporter

Download: https://github.com/cloudflare/opentsdb_exporter/archive/0.0.3.tar.gz

19.grafana_exporter

Download: https://github.com/frodenas/grafana_exporter/releases/download/v0.1.0/grafana_exporter-0.1.0.linux-amd64.tar.gz

20.json_exporter

Download: https://github.com/sciffer/json_exporter

21.RocketMQ_exporter

Download: https://github.com/apache/rocketmq-exporter


Posted 2022-04-22 14:59 by 梦里花落知多少sl