Real-time monitoring and alerting for RabbitMQ in production
I. Installing rabbitmq_exporter
1. Download the binary package
wget https://github.com/kbudde/rabbitmq_exporter/releases/download/v1.0.0-RC13/rabbitmq_exporter_1.0.0-RC13_linux_amd64.tar.gz
mkdir -p /usr/local/rabbitmq_exporter
tar xvf rabbitmq_exporter_1.0.0-RC13_linux_amd64.tar.gz -C /usr/local/rabbitmq_exporter
cat > /usr/local/rabbitmq_exporter/config.example.json <<- EOF
{
    "rabbit_url": "http://172.18.215.10:15672",
    "rabbit_user": "monitor",
    "rabbit_pass": "passwd",
    "publish_port": "9419",
    "publish_addr": "",
    "output_format": "TTY",
    "ca_file": "ca.pem",
    "cert_file": "client-cert.pem",
    "key_file": "client-key.pem",
    "insecure_skip_verify": false,
    "exlude_metrics": [],
    "include_queues": ".*",
    "skip_queues": "^$",
    "skip_vhost": "^$",
    "include_vhost": ".*",
    "rabbit_capabilities": "no_sort,bert",
    "enabled_exporters": [
        "exchange",
        "node",
        "overview",
        "queue"
    ],
    "timeout": 30,
    "max_queues": 0
}
EOF
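The config above logs in to the RabbitMQ management API as monitor, but the steps do not show how that account is created. A minimal sketch, assuming the default vhost "/" and that rabbitmqctl is run on the broker host; the function name and the RABBITMQCTL variable are illustrative and not part of the original setup.

```shell
# Sketch: create the account referenced by rabbit_user / rabbit_pass.
# Assumptions: default vhost "/", rabbitmqctl available on the broker host.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

create_monitor_user() {
  # $1: user name, $2: password, $3: vhost (defaults to "/")
  local user="$1" pass="$2" vhost="${3:-/}"
  # Create the account and tag it "monitoring" (read-only management access).
  "$RABBITMQCTL" add_user "$user" "$pass" &&
  "$RABBITMQCTL" set_user_tags "$user" monitoring &&
  # Argument order is: configure, write, read. Only read is granted here.
  "$RABBITMQCTL" set_permissions -p "$vhost" "$user" '^$' '^$' '.*'
}

# Usage (on the RabbitMQ host):
#   create_monitor_user monitor passwd
```

If queues live in other vhosts, the same call can be repeated with the vhost name as the third argument.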
2. Create the user
useradd -M -s /sbin/nologin prometheus
3. Create the service
cat > /usr/lib/systemd/system/rabbitmq_exporter.service <<- EOF
[Unit]
Description=rabbitmq_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/rabbitmq_exporter/rabbitmq_exporter -config-file /usr/local/rabbitmq_exporter/config.example.json
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
4. Start the service
systemctl enable rabbitmq_exporter && systemctl start rabbitmq_exporter
journalctl -fu rabbitmq_exporter.service
5. Access the exporter
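The exporter serves its metrics over HTTP on the publish_port from the config (9419 here). A quick check from the command line could look like the sketch below. It assumes the default /metrics path and that rabbitmq_up reports 1 when the exporter can reach the management API; the function name is illustrative.

```shell
# Sketch: fetch the metrics page and confirm the exporter can reach RabbitMQ.
# Assumption: the exporter listens on the host and port used in this post.
EXPORTER_URL="${EXPORTER_URL:-http://172.18.215.10:9419/metrics}"

exporter_is_up() {
  # $1: metrics URL.
  # The optional {...} part allows for labels on the rabbitmq_up metric.
  curl -fsS "$1" | grep -Eq '^rabbitmq_up(\{[^}]*\})? 1$'
}

# Usage:
#   exporter_is_up "$EXPORTER_URL" && echo "exporter OK" || echo "exporter cannot reach RabbitMQ"
```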
6. Configure prometheus.yml
...
  - job_name: 'RabbitMQ'
    static_configs:
      - targets: ['172.18.215.10:9419']
        labels:
          instance: RabbitMQ-XH
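Once Prometheus has loaded this job, its HTTP query API can confirm that the target is being scraped. A minimal sketch, assuming Prometheus is reachable at the address used later in this post (172.18.156.179:59090); the function name is illustrative.

```shell
# Sketch: ask Prometheus whether the RabbitMQ scrape target is healthy.
# Assumption: PROM_URL points at the Prometheus server.
PROM_URL="${PROM_URL:-http://172.18.156.179:59090}"

query_prometheus() {
  # $1: a PromQL expression.
  # -G sends the url-encoded data as a GET query string.
  curl -fsS -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Usage: a sample value of "1" in the JSON result means the last scrape succeeded.
#   query_prometheus 'up{job="RabbitMQ"}'
```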
7. Import the Grafana dashboard template
{ "annotations": { "list": [ { "builtIn": 1, "datasource": "-- Grafana --", "enable": true, "hide": true, "iconColor": "rgba(0, 211, 255, 1)", "name": "Annotations & Alerts", "type": "dashboard" } ] }, "description": "Basic rabbitmq host stats: Node Stats, Exchanges, Channels, Consumers, Connections, Queues, Messages, Messages per Queue, Memory, File Descriptors, Sockets.", "editable": true, "gnetId": 4371, "graphTooltip": 0, "id": 30, "iteration": 1577172562108, "links": [], "panels": [ { "cacheTimeout": null, "colorBackground": true, "colorValue": false, "colors": [ "rgba(50, 172, 45, 0.97)", "rgba(237, 129, 40, 0.89)", "rgba(245, 54, 54, 0.9)" ], "datasource": "Prometheus", "format": "none", "gauge": { "maxValue": 100, "minValue": 0, "show": false, "thresholdLabels": false, "thresholdMarkers": true }, "gridPos": { "h": 7, "w": 6, "x": 0, "y": 0 }, "id": 13, "interval": null, "links": [], "mappingType": 1, "mappingTypes": [ { "name": "value to text", "value": 1 }, { "name": "range to text", "value": 2 } ], "maxDataPoints": 100, "nullPointMode": "connected", "nullText": null, "options": {}, "postfix": "", "postfixFontSize": "50%", "prefix": "", "prefixFontSize": "50%", "rangeMaps": [ { "from": "null", "text": "N/A", "to": "null" } ], "sparkline": { "fillColor": "rgba(31, 118, 189, 0.18)", "full": false, "lineColor": "rgb(31, 120, 193)", "show": false }, "tableColumn": "", "targets": [ { "expr": "rabbitmq_up{instance=~\"$instance.*\"}", "format": "time_series", "intervalFactor": 2, "metric": "rabbitmq_up", "refId": "A", "step": 2 } ], "thresholds": "Up,Down", "timeFrom": "30s", "title": "RabbitMQ Server", "type": "singlestat", "valueFontSize": "80%", "valueMaps": [ { "op": "=", "text": "N/A", "value": "null" }, { "op": "=", "text": "Down", "value": "0" }, { "op": "=", "text": "Up", "value": "1" } ], "valueName": "current" }, { "alert": { "conditions": [ { "evaluator": { "params": [ 1 ], "type": "lt" }, "operator": { "type": "and" }, "query": { "params": [ "A", 
"10s", "now" ] }, "reducer": { "params": [], "type": "last" }, "type": "query" }, { "evaluator": { "params": [], "type": "no_value" }, "operator": { "type": "and" }, "query": { "params": [ "A", "10s", "now" ] }, "reducer": { "params": [], "type": "last" }, "type": "query" } ], "executionErrorState": "alerting", "frequency": "60s", "handler": 1, "message": "Some of the RabbitMQ node is down", "name": "Node Stats alert", "noDataState": "no_data", "notifications": [] }, "aliasColors": {}, "bars": true, "dashLength": 10, "dashes": false, "datasource": "Prometheus", "decimals": 0, "fill": 1, "fillGradient": 0, "gridPos": { "h": 7, "w": 18, "x": 6, "y": 0 }, "hiddenSeries": false, "id": 12, "legend": { "alignAsTable": true, "avg": false, "current": true, "max": false, "min": false, "show": true, "total": false, "values": true }, "lines": false, "linewidth": 1, "links": [], "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 5, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "rabbitmq_running{instance=~\"$instance.*\"}", "format": "time_series", "intervalFactor": 2, "legendFormat": "{{node}}", "metric": "rabbitmq_running", "refId": "A", "step": 2 } ], "thresholds": [ { "colorMode": "critical", "fill": true, "line": true, "op": "lt", "value": 1 } ], "timeFrom": "30s", "timeRegions": [], "timeShift": null, "title": "Node up Stats", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "Prometheus", 
"decimals": 0, "fill": 1, "fillGradient": 0, "gridPos": { "h": 7, "w": 8, "x": 0, "y": 7 }, "hiddenSeries": false, "id": 6, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "links": [], "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 5, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "rabbitmq_exchanges{instance=~\"$instance.*\"}", "format": "time_series", "intervalFactor": 2, "legendFormat": "{{instance}}:exchanges", "metric": "rabbitmq_exchangesTotal", "refId": "A", "step": 2 } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Exchanges", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "Prometheus", "decimals": 0, "fill": 1, "fillGradient": 0, "gridPos": { "h": 7, "w": 8, "x": 8, "y": 7 }, "hiddenSeries": false, "id": 4, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "links": [], "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 5, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "rabbitmq_channels{instance=~\"$instance.*\"}", "format": "time_series", "intervalFactor": 2, 
"legendFormat": "{{instance}}:channels", "metric": "rabbitmq_channelsTotal", "refId": "A", "step": 2 } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Channels", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "Prometheus", "decimals": 0, "fill": 1, "fillGradient": 0, "gridPos": { "h": 7, "w": 8, "x": 16, "y": 7 }, "hiddenSeries": false, "id": 3, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "links": [], "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 5, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "rabbitmq_consumers{instance=~\"$instance.*\"}", "format": "time_series", "intervalFactor": 2, "legendFormat": "{{instance}}:consumers", "metric": "rabbitmq_consumersTotal", "refId": "A", "step": 2 } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Consumers", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { 
"aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "Prometheus", "decimals": 0, "fill": 1, "fillGradient": 0, "gridPos": { "h": 7, "w": 8, "x": 0, "y": 14 }, "hiddenSeries": false, "id": 5, "legend": { "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "links": [], "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 5, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "rabbitmq_connections{instance=~\"$instance.*\"}", "format": "time_series", "intervalFactor": 2, "legendFormat": "{{instance}}:connections", "metric": "rabbitmq_connectionsTotal", "refId": "A", "step": 2 } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Connections", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "Prometheus", "fill": 1, "fillGradient": 0, "gridPos": { "h": 7, "w": 8, "x": 8, "y": 14 }, "hiddenSeries": false, "id": 7, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "links": [], "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 5, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": 
"rabbitmq_queues{instance=~\"$instance.*\"}", "format": "time_series", "intervalFactor": 2, "legendFormat": "{{instance}}:queues", "metric": "rabbitmq_queuesTotal", "refId": "A", "step": 2 } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Queues", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "Prometheus", "decimals": 0, "fill": 1, "fillGradient": 0, "gridPos": { "h": 7, "w": 8, "x": 16, "y": 14 }, "hiddenSeries": false, "id": 8, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "links": [], "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 5, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "sum by (vhost)(rabbitmq_queue_messages_global {instance=~\"$instance.*\"})", "format": "time_series", "intervalFactor": 2, "legendFormat": "{{vhost}}:total", "metric": "rabbitmq_queue_messages_global", "refId": "C", "step": 2 }, { "expr": "sum by (vhost)(rabbitmq_queue_messages_ready_global{instance=~\"$instance.*\"})", "format": "time_series", "intervalFactor": 2, "legendFormat": "{{vhost}}:ready", "metric": "rabbitmq_queue_messages_ready_global", "refId": "A", "step": 2 }, { "expr": "sum by (vhost)(rabbitmq_queue_messages_unacknowledged_global{instance=~\"$instance.*\"})", "format": "time_series", "intervalFactor": 2, "legendFormat": 
"{{vhost}}:unack", "metric": "rabbitmq_queue_messages_unacknowledged_global", "refId": "D", "step": 2 } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Messages/host", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "Prometheus", "decimals": 0, "fill": 1, "fillGradient": 0, "gridPos": { "h": 7, "w": 12, "x": 0, "y": 21 }, "hiddenSeries": false, "id": 2, "legend": { "alignAsTable": true, "avg": false, "current": true, "max": false, "min": false, "rightSide": false, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "links": [], "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 5, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "rabbitmq_queue_messages_global{instance=~\"$instance.*\"}", "format": "time_series", "intervalFactor": 2, "legendFormat": "{{queue}}:{{durable}}", "metric": "rabbitmq_queue_messages_global", "refId": "A", "step": 2 } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Messages / Queue", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], 
"yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "Prometheus", "fill": 1, "fillGradient": 0, "gridPos": { "h": 7, "w": 12, "x": 12, "y": 21 }, "hiddenSeries": false, "id": 9, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "links": [], "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 5, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "rabbitmq_node_mem_used{instance=~\"$instance.*\"}", "format": "time_series", "intervalFactor": 2, "legendFormat": "{{node}}:used", "metric": "rabbitmq_node_mem_used", "refId": "A", "step": 2 }, { "expr": "rabbitmq_node_mem_limit{instance=~\"$instance.*\"}", "format": "time_series", "intervalFactor": 2, "legendFormat": "{{node}}:limit", "metric": "node_mem", "refId": "B", "step": 2 } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Memory", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "format": "decbytes", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "Prometheus", "fill": 1, "fillGradient": 0, "gridPos": { "h": 7, "w": 12, "x": 0, "y": 28 }, "hiddenSeries": false, "id": 10, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "links": [], "nullPointMode": 
"null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 5, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "rabbitmq_fd_used{instance=~\"$instance.*\"}", "format": "time_series", "intervalFactor": 2, "legendFormat": "{{node}}:used", "metric": "", "refId": "A", "step": 2 }, { "expr": "rabbitmq_fd_total{instance=~\"$instance.*\"}", "format": "time_series", "intervalFactor": 2, "legendFormat": "{{node}}:total", "metric": "node_mem", "refId": "B", "step": 2 } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "FIle descriptors", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "Prometheus", "fill": 1, "fillGradient": 0, "gridPos": { "h": 7, "w": 12, "x": 12, "y": 28 }, "hiddenSeries": false, "id": 11, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "links": [], "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 5, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "rabbitmq_sockets_used{instance=~\"$instance.*\"}", "format": "time_series", "intervalFactor": 2, "legendFormat": "{{node}}:used", "metric": "", "refId": "A", "step": 2 }, { "expr": "rabbitmq_sockets_total{instance=~\"$instance.*\"}", "format": "time_series", 
"intervalFactor": 2, "legendFormat": "{{node}}:total", "metric": "", "refId": "B", "step": 2 } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Sockets", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } } ], "refresh": "5s", "schemaVersion": 21, "style": "dark", "tags": [ "rabbitmq", "prometheus" ], "templating": { "list": [ { "current": { "text": "default", "value": "default" }, "hide": 0, "includeAll": false, "label": null, "multi": false, "name": "datasource", "options": [], "query": "prometheus", "refresh": 1, "regex": "", "skipUrlSync": false, "type": "datasource" }, { "allValue": ".*", "current": { "selected": false, "text": "All", "value": "$__all" }, "datasource": "$datasource", "definition": "", "hide": 0, "includeAll": true, "label": null, "multi": false, "name": "instance", "options": [], "query": "label_values(rabbitmq_up, instance)", "refresh": 1, "regex": "", "skipUrlSync": false, "sort": 1, "tagValuesQuery": "", "tags": [], "tagsQuery": "", "type": "query", "useTags": false } ] }, "time": { "from": "now-5m", "to": "now" }, "timepicker": { "refresh_intervals": [ "5s", "10s", "30s", "1m", "5m", "15m", "30m", "1h", "2h", "1d" ], "time_options": [ "5m", "15m", "1h", "6h", "12h", "24h", "2d", "7d", "30d" ] }, "timezone": "browser", "title": "RabbitMQ Metrics", "uid": "3xTRkqBWk", "version": 7 }
II. Configuring alerts
1. Create the alert rules
# The delimiter is quoted ('EOF') so that the shell does not expand $labels and $value inside the templates.
cat > /data/prometheus/alert_rules.yml <<- 'EOF'
groups:
- name: example
  rules:
  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  # Alert for any instance that has a median request latency >1s.
  - alert: APIHighRequestLatency
    expr: api_http_request_latencies_second{quantile="0.5"} > 1
    for: 10m
    annotations:
      summary: "High request latency on {{ $labels.instance }}"
      description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
# Group: RabbitMQ running status
- name: Rabbitmq-运行状态
  rules:
  - alert: Rabbitmq-down
    expr: rabbitmq_up{job='RabbitMQ'} != 1
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} is Down ! ! !"
      value: '{{ $value }}'
      summary: "The host node is down"
- name: Rabbitmq disk free limit
  rules:
  - alert: Rabbitmq disk free limit status
    expr: rabbitmq_node_disk_free{job='RabbitMQ'} / 1024 / 1024 <= rabbitmq_node_disk_free_limit{job='RabbitMQ'} / 1024 / 1024 + 200
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} the rmq free disk is too low ! ! !"
      value: '{{ $value }} MB'
      summary: "The rmq free disk too low"
# Group: RabbitMQ memory usage > 300MB
- name: RabbitMQ-内存使用>300MB
  rules:
  - alert: RabbitMQ-内存使用>300MB status
    expr: rabbitmq_node_mem_used{job='RabbitMQ'} /1024 /1024 > 300
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} the rabbitmq memory usage is too high ! ! !"
      value: '{{ $value }} MB'
      summary: "the rabbitmq memory usage is too high"
# Group: RabbitMQ unacknowledged messages > 0
- name: RabbitMQ-没有ACK应答队列>0
  rules:
  - alert: RabbitMQ-unack>0 status
    expr: rabbitmq_queue_messages_unacknowledged_global{job='RabbitMQ'} > 0
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} the rabbitmq_queue_messages_unacknowledged_global > 0 ! ! !"
      value: '{{ $value }} '
      summary: "the rabbitmq_queue_messages_unacknowledged_global > 0"
EOF
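Alert templates such as {{ $labels.instance }} contain a dollar sign, which the shell treats as a variable reference inside a heredoc whose delimiter is not quoted. The sketch below illustrates the difference on a single line of a rule file; the two function names exist only for this demonstration.

```shell
# Sketch: compare an unquoted and a quoted heredoc delimiter.
# "labels" is normally unset in the shell, so $labels expands to nothing.
unset labels

unquoted() {
cat <<EOF
summary: "Instance {{ $labels.instance }} down"
EOF
}

quoted() {
cat <<'EOF'
summary: "Instance {{ $labels.instance }} down"
EOF
}

unquoted   # prints: summary: "Instance {{ .instance }} down"
quoted     # prints: summary: "Instance {{ $labels.instance }} down"
```

After writing a rules file with a heredoc, a quick grep for '$labels' in the generated file shows whether the templates survived.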
2. Edit prometheus.yml
...
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['172.18.156.179:59093']
rule_files:
  - "/etc/prometheus/alert_rules.yml"
...
  - job_name: 'alertmanagers'
    static_configs:
      - targets: ['172.18.156.179:59093']
The alert_rules.yml file has to be mounted into the container, so remove the existing container with docker rm -f prometheus and then recreate Prometheus:
docker run -d -p 59090:9090 \
  -v /data/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v "/etc/localtime:/etc/localtime" \
  -v /data/prometheus/alert_rules.yml:/etc/prometheus/alert_rules.yml \
  --name prometheus \
  prom/prometheus:v2.27.1
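An invalid rules file can keep Prometheus from loading its configuration, so it may be worth validating alert_rules.yml whenever it changes. A minimal sketch, assuming the prom/prometheus image ships promtool on its PATH and that the file is readable by the container user; the function name and the /tmp mount point are illustrative.

```shell
# Sketch: validate the rules file with promtool from the same image.
# Assumptions: promtool is available in the image; paths match this post.
check_rules() {
  # $1: path to the rules file on the host
  local rules_file="${1:-/data/prometheus/alert_rules.yml}"
  docker run --rm \
    -v "$rules_file:/tmp/alert_rules.yml:ro" \
    --entrypoint promtool \
    prom/prometheus:v2.27.1 \
    check rules /tmp/alert_rules.yml
}

# Usage:
#   check_rules && echo "rules OK"
```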
Open http://172.18.156.179:59090/alerts
3. Configure the DingTalk webhook
https://hub.docker.com/r/timonwong/prometheus-webhook-dingtalk
# In DingTalk first: create an alert group -> add an alert robot -> create a webhook -> save the auto-generated secret
mkdir -p /usr/local/prometheus-webhook-dingtalk/
cat > /usr/local/prometheus-webhook-dingtalk/config.yml <<- EOF
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxx
    # secret for signature
    secret: xxxxxxxxxxxxxxxxxxx
EOF
docker run -d --name webhook \
  -p 58060:8060 \
  -v /usr/local/prometheus-webhook-dingtalk/config.yml:/etc/prometheus-webhook-dingtalk/config.yml \
  timonwong/prometheus-webhook-dingtalk:latest
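The webhook container can be exercised on its own, before Alertmanager is involved, by posting a payload shaped like an Alertmanager webhook notification. The sketch below assumes the published port 58060 and the webhook1 target from the config above; every field value in the payload is made up for the test, and the function names are illustrative.

```shell
# Sketch: post a hand-written Alertmanager-style payload to the DingTalk bridge.
# Assumption: the webhook container is reachable at WEBHOOK_URL.
WEBHOOK_URL="${WEBHOOK_URL:-http://172.18.156.179:58060/dingtalk/webhook1/send}"

sample_payload() {
cat <<'EOF'
{
  "version": "4",
  "status": "firing",
  "receiver": "default",
  "groupLabels": {"alertname": "WebhookTest"},
  "commonLabels": {"alertname": "WebhookTest"},
  "commonAnnotations": {"summary": "Test message from the setup steps"},
  "externalURL": "http://172.18.156.179:59093",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "WebhookTest", "instance": "RabbitMQ-XH"},
      "annotations": {"summary": "Test message from the setup steps"},
      "startsAt": "2021-01-01T00:00:00Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://172.18.156.179:59090"
    }
  ]
}
EOF
}

send_test_message() {
  # Read the JSON body from stdin (@-) and POST it to the bridge.
  sample_payload |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$WEBHOOK_URL"
}

# Usage:
#   send_test_message
```

If the robot is set up correctly, a test message should appear in the DingTalk group; otherwise docker logs webhook is the first place to look.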
4. Configure Alertmanager
cat > /data/prometheus/alertmanager.yml <<- EOF
global:
  resolve_timeout: 2m
route:
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 5m
  receiver: default # send through the default receiver first
receivers:
  - name: 'default'
    webhook_configs:
      - url: http://172.18.156.179:58060/dingtalk/webhook1/send
        send_resolved: true # also send a notification when the alert is resolved
EOF
docker run -d --name Alertmanager \
  -p 59093:9093 \
  -v /data/prometheus/alertmanager.yml:/opt/bitnami/alertmanager/conf/config.yml \
  bitnami/alertmanager:latest
Visit http://172.18.156.179:59093
Alert triggered
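To see the whole notification path fire without waiting for a real RabbitMQ problem, a test alert can be pushed straight into Alertmanager through its v2 HTTP API. A minimal sketch, assuming Alertmanager is reachable on the published port 59093; the alert name ManualTest and the severity label are hypothetical values chosen for the test.

```shell
# Sketch: push a manual test alert into Alertmanager.
# Assumption: ALERTMANAGER_URL points at the container started above.
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://172.18.156.179:59093}"

test_alert_json() {
  # $1: alert name (hypothetical, only used for this test)
  printf '[{"labels":{"alertname":"%s","severity":"test"},"annotations":{"summary":"manual test alert"}}]\n' "$1"
}

fire_test_alert() {
  # POST the JSON array of alerts to the v2 alerts endpoint.
  test_alert_json "${1:-ManualTest}" |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$ALERTMANAGER_URL/api/v2/alerts"
}

# Usage:
#   fire_test_alert
```

With group_wait set to 30s, the DingTalk message should arrive roughly half a minute after the call.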
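III. Integrating Prometheus with other modules
Because Prometheus is flexible in how its interfaces are configured and how it collects data, it integrates easily with other modules and can monitor many of them in real time.
Commonly used exporters include: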
1.node_exporter
Used to monitor metrics of node virtual machines.
2.jmx_exporter
3.elasticsearch_exporter
4.redis_exporter
5.mysqld_exporter
6.postgres_exporter
7.mongodb_exporter
Download: https://github.com/dcu/mongodb_exporter/releases/download/v1.0.0/mongodb_exporter-linux-amd64
8.statsd_exporter
9.mesos_exporter
10.apache_exporter
11.hadoop_exporter
Download: https://github.com/wyukawa/hadoop_exporter
12.logstash_exporter
Download: https://github.com/BonnierNews/logstash_exporter/archive/v0.1.2.tar.gz
13.sql_exporter
14.oracle_exporter
Download: https://github.com/iamseth/oracledb_exporter/releases/download/0.0.8/oracledb_exporter.linux-amd64
15.zookeeper_exporter
Download: https://github.com/carlpett/zookeeper_exporter/releases/download/v1.0.2/zookeeper_exporter
16.influxdb_exporter
17.zabbix_exporter
Download: https://github.com/MyBook/zabbix-exporter/archive/1.0.2.tar.gz
18.opentsdb_exporter
Download: https://github.com/cloudflare/opentsdb_exporter/archive/0.0.3.tar.gz
19.grafana_exporter
20.json_exporter
Download: https://github.com/sciffer/json_exporter
21.RocketMQ_exporter
Download: https://github.com/apache/rocketmq-exporter
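Each exporter in the list above is wired into Prometheus the same way as the RabbitMQ job: one more entry under scrape_configs pointing at the exporter's host and port. The helper below is a small sketch that prints such an entry; the function name and the output file in the usage line are illustrative, and the host is only an example.

```shell
# Sketch: print a scrape job stanza for an additional exporter.
scrape_job() {
  # $1: job name, $2: host:port of the exporter
  cat <<EOF
  - job_name: '$1'
    static_configs:
      - targets: ['$2']
EOF
}

# Usage (9100 is node_exporter's default port; the host is an assumption):
#   scrape_job node 172.18.215.10:9100 >> scrape_jobs.yml
```

The printed stanza is meant to be pasted under scrape_configs in prometheus.yml with the same indentation as the existing jobs.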