杨梅冲

1. Monitoring Redis

1.1 Installing redis_exporter

1.1.1 Binary installation

Follow the same procedure as the nginx binary installation.

redis_exporter download page: https://github.com/oliver006/redis_exporter/releases

systemd service:
cat > /etc/systemd/system/redis_exporter.service <<"EOF"
[Unit]
Description=Prometheus Redis Exporter
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/opt/prometheus/redis_exporter/redis_exporter \
-redis.addr localhost:6379 \
-redis.password 123456

[Install]
WantedBy=multi-user.target
EOF
1.1.2 Docker installation
# Run directly with docker
docker run -d --restart=always --name redis_exporter -p 9121:9121 oliver006/redis_exporter --redis.addr redis://192.168.10.100:6379 --redis.password '123456'

# Using docker-compose
cat >docker-compose.yaml <<EOF
version: '3.3'
services:
  redis_exporter:
    image: oliver006/redis_exporter
    container_name: redis_exporter
    restart: always
    environment:
      REDIS_ADDR: "192.168.10.100:6379"
      REDIS_PASSWORD: 123456
    ports:
      - "9121:9121"
EOF

# Start the container
docker-compose up -d

# Metrics endpoint
http://192.168.10.100:9121/metrics
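
Before wiring the exporter into Prometheus, it can help to confirm that it actually reports Redis as reachable. The helper below is a minimal sketch, not part of redis_exporter: `check_up` is a hypothetical function name, and it assumes the sample is exposed without labels (as `redis_up` normally is).

```shell
# check_up METRIC
# Reads Prometheus text-format metrics on stdin and succeeds only if
# the unlabelled sample METRIC is present with value 1.
check_up() {
  awk -v m="$1" '$1 == m { found = 1; ok = ($2 == 1) } END { exit !(found && ok) }'
}

# Offline demonstration with a canned sample line
if printf 'redis_up 1\n' | check_up redis_up; then
  echo "redis_up OK"
fi

# Against the live exporter (address taken from this article):
#   curl -s http://192.168.10.100:9121/metrics | check_up redis_up
```

The same function should work for the other exporters in this article by passing `rabbitmq_up` or `mongodb_up` instead.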

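1.2 Prometheus configuration

# Configure Prometheus to scrape (pull) metrics from redis_exporter

cd /data/docker-prometheus

# Add the following job under scrape_configs (appending with >> only works if scrape_configs is the last section of the file):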

cat >> prometheus/prometheus.yml << "EOF"
  - job_name: 'redis_exporter'
    static_configs:
    - targets: ['192.168.10.100:9121']
      labels:
        instance: test-server
EOF

# Reload the configuration (requires Prometheus to be started with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload

1.3 Grafana dashboard

https://grafana.com/grafana/dashboards/11835-redis-dashboard-for-prometheus-redis-exporter-helm-stable-redis-ha/

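1.5 Common metrics

redis_up # whether the server is up
redis_uptime_in_seconds # uptime, in seconds
rate(redis_cpu_sys_seconds_total[1m]) + rate(redis_cpu_user_seconds_total[1m]) # number of CPU cores in use
redis_memory_used_bytes # memory in use
redis_memory_max_bytes # configured memory limit, 0 if unlimited
increase(redis_net_input_bytes_total[1m]) # bytes received over the network in the last minute
increase(redis_net_output_bytes_total[1m]) # bytes sent over the network in the last minute


redis_connected_clients # number of client connections
redis_connected_clients / redis_config_maxclients # connection usage ratio
redis_rejected_connections_total # number of rejected client connections
redis_connected_slaves # number of connected replicas (slaves)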

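1.6 Alerting rule configuration

Split the alerting rules into one file per service so that no single rule file grows too long.

cd /data/docker-prometheus
mkdir -p prometheus/rules
vim prometheus/prometheus.yml
# Alerting rule files
rule_files:
  - "alert.yml"
  - "rules/*.yml"

Redis alerting rules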

cat > prometheus/rules/redis.yml <<"EOF"
groups:
- name: redis
  rules:
  - alert: RedisDown
    expr: redis_up == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: 'Redis down, instance: {{ $labels.instance }}'
      description: "The Redis instance is down"
  - alert: RedisMissingBackup
    expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Redis backup missing, instance: {{ $labels.instance }}"
      description: "Redis has not been backed up for 24 hours"

  - alert: RedisOutOfConfiguredMaxmemory
    # The "and" clause skips instances without a maxmemory limit (redis_memory_max_bytes == 0)
    expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 90 and on(instance) redis_memory_max_bytes > 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Redis is running out of configured maxmemory, instance: {{ $labels.instance }}"
      description: "Redis memory usage is above 90% of the configured maxmemory"
  - alert: RedisTooManyConnections
    expr: redis_connected_clients > 100
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Redis has too many connections, instance: {{ $labels.instance }}"
      description: "Current number of Redis connections: {{ $value }}"
  - alert: RedisNotEnoughConnections
    expr: redis_connected_clients < 1
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Redis does not have enough connections, instance: {{ $labels.instance }}"
      description: "Current number of Redis connections: {{ $value }}"
  - alert: RedisRejectedConnections
    expr: increase(redis_rejected_connections_total[1m]) > 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Redis rejected connections, instance: {{ $labels.instance }}"
      description: "Some connections to Redis were rejected: {{ $value }}"
EOF

Check the configuration, then reload:

docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml

curl -X POST http://localhost:9090/-/reload

Verify in the web UI:

http://192.168.10.14:9090/rules#redis

http://192.168.10.14:9090/alerts?search=
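
Besides checking the web UI, alerting rules can be unit-tested offline with `promtool test rules`. The following is a rough sketch under stated assumptions: the file names `redis_demo.yml` and `redis_demo_test.yml` and the label value `test-server` are made up for illustration, and the rule is a trimmed copy of RedisDown without annotations so that the expected alert stays short.

```shell
# Scratch directory for the demo files (illustrative names, not from the article)
workdir=$(mktemp -d)

# Minimal copy of the RedisDown rule, annotations left out for brevity
cat > "$workdir/redis_demo.yml" <<"EOF"
groups:
- name: redis
  rules:
  - alert: RedisDown
    expr: redis_up == 0
    for: 0m
    labels:
      severity: critical
EOF

# Test case: redis_up stays at 0, so the alert should be firing at the 1m mark
cat > "$workdir/redis_demo_test.yml" <<"EOF"
rule_files:
  - redis_demo.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'redis_up{instance="test-server"}'
        values: '0 0 0'
    alert_rule_test:
      - eval_time: 1m
        alertname: RedisDown
        exp_alerts:
          - exp_labels:
              severity: critical
              instance: test-server
EOF

# Run the test only if promtool is available on this machine
if command -v promtool >/dev/null 2>&1; then
  promtool test rules "$workdir/redis_demo_test.yml"
else
  echo "promtool not found; test files written to $workdir"
fi
```

In the Docker setup used in this article, promtool lives inside the prometheus container, so the two files would first have to be placed in a directory that is mounted into it.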

2. Monitoring RabbitMQ

2.1 Installation

Binary installation of rabbitmq_exporter

rabbitmq_exporter download page: https://github.com/kbudde/rabbitmq_exporter/releases

systemd service:
cat > /etc/systemd/system/rabbitmq_exporter.service <<"EOF"
[Unit]
Description=prometheus rabbitmq exporter
After=network.target

[Service]
Environment=RABBIT_USER=guest
Environment=RABBIT_PASSWORD=guest
Environment=RABBIT_URL=http://localhost:15672
Environment=OUTPUT_FORMAT=JSON
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/opt/prometheus/rabbitmq_exporter/rabbitmq_exporter

[Install]
WantedBy=multi-user.target
EOF

Installing RabbitMQ with docker-compose

cd /data/rabbitmq
cat > docker-compose.yaml <<"EOF"
version: '3'
services:
  rabbitmq:
    image: rabbitmq:3.7.15-management
    container_name: rabbitmq
    restart: always
    volumes:
      - /data/rabbitmq/data:/var/lib/rabbitmq
      - /data/rabbitmq/log:/var/log/rabbitmq
    ports:
      - 5672:5672
      - 15672:15672
EOF

# Start the container
docker-compose up -d

2.2 Installing rabbitmq_exporter

Installing rabbitmq_exporter with Docker

# Run directly with docker
docker run -d --restart=always -p 9419:9419  --name rabbitmq_exporter -e RABBIT_URL=http://192.168.10.100:15672 -e RABBIT_USER=guest -e RABBIT_PASSWORD=guest kbudde/rabbitmq-exporter

# Using docker-compose
cat >docker-compose.yaml <<EOF
version: '3.3'
services:
  rabbitmq_exporter:
    image: kbudde/rabbitmq-exporter
    container_name: rabbitmq_exporter
    restart: always
    environment:
      RABBIT_URL: "http://192.168.10.100:15672"
      RABBIT_USER: "guest"
      RABBIT_PASSWORD: "guest"
      PUBLISH_PORT: "9419"
      OUTPUT_FORMAT: "JSON"
    ports:
      - "9419:9419"
EOF

# Start the container
docker-compose up -d

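# Environment variables

| Environment variable | Default | Description |
|---|---|---|
| RABBIT_URL | http://127.0.0.1:15672 | URL of the RabbitMQ management plugin (must start with http:// or https://) |
| RABBIT_USER | guest | Username for the RabbitMQ management plugin |
| RABBIT_PASSWORD | guest | Password for the RabbitMQ management plugin |
| OUTPUT_FORMAT | JSON | Output format |
| PUBLISH_PORT | 9419 | Port the exporter listens on |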

Metrics endpoint: http://192.168.10.100:9419/metrics

2.3 Prometheus configuration

Configure Prometheus to scrape (pull) metrics from rabbitmq_exporter.

cd /data/docker-prometheus

# Add the following job under scrape_configs:

cat >> prometheus/prometheus.yml << "EOF"
  - job_name: 'rabbitmq_exporter'
    static_configs:
    - targets: ['192.168.10.100:9419']
      labels:
        instance: test-server
EOF

# Reload the configuration
curl -X POST http://localhost:9090/-/reload
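
The scrape job stanza has the same shape for every exporter in this article, so it can be generated instead of retyped. The function below is only a sketch: `scrape_job` is a hypothetical helper name, and `test-server` is an example value for the instance label.

```shell
# scrape_job JOB TARGET INSTANCE
# Prints a scrape_configs entry, indented to sit under "scrape_configs:".
#   JOB      = job name
#   TARGET   = host:port of the exporter
#   INSTANCE = value for the instance label
scrape_job() {
  cat <<EOF
  - job_name: '$1'
    static_configs:
    - targets: ['$2']
      labels:
        instance: $3
EOF
}

# Print the stanza for rabbitmq_exporter so it can be reviewed first
scrape_job rabbitmq_exporter 192.168.10.100:9419 test-server

# Once it looks right, append it to the Prometheus configuration:
#   scrape_job rabbitmq_exporter 192.168.10.100:9419 test-server >> prometheus/prometheus.yml
```

Printing to stdout first makes it easier to check the indentation before the file is modified, since a misaligned stanza would make the whole configuration invalid.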

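2.4 Common metrics

rabbitmq_queue_messages_unacknowledged_global  total number of messages delivered to consumers but not yet acknowledged

rabbitmq_node_disk_free_limit  free disk space threshold below which the disk alarm is raised
rabbitmq_node_disk_free        free disk space

rabbitmq_node_mem_used        memory in use
rabbitmq_node_mem_limit       memory limit (high watermark)

rabbitmq_sockets_used         number of sockets in use
rabbitmq_sockets_available    total number of sockets available

rabbitmq_fd_used              number of file descriptors in use
rabbitmq_fd_available         total number of file descriptors available

2.5 RabbitMQ alerting rules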

cat > prometheus/rules/rabbitmq.yml <<"EOF"
groups:
- name: Rabbitmq
  rules:
  - alert: RabbitMQDown
    expr: rabbitmq_up != 1
    labels:
      severity: critical
    annotations:
      summary: "RabbitMQ down, instance: {{ $labels.instance }}"
      description: "rabbitmq_exporter cannot connect to RabbitMQ"
  - alert: RabbitMQUnacknowledgedMessages
    expr: rabbitmq_queue_messages_unacknowledged_global  > 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "RabbitMQ has unacknowledged messages, instance: {{ $labels.instance }}"
      description: 'RabbitMQ unacknowledged messages > 0, current value: {{ $value }}'
  - alert: RabbitMQDiskSpaceLow
    expr: rabbitmq_node_disk_free_alarm != 0
    #expr: rabbitmq_node_disk_free_limit / rabbitmq_node_disk_free *100 > 90
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "RabbitMQ is low on free disk space, instance: {{ $labels.instance }}"
      description: "RabbitMQ is low on free disk space, please check"
  - alert: RabbitMQMemoryLow
    expr: rabbitmq_node_mem_alarm != 0
    #expr: rabbitmq_node_mem_used / rabbitmq_node_mem_limit * 100 > 90
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "RabbitMQ is low on available memory, instance: {{ $labels.instance }}"
      description: "RabbitMQ is low on available memory, please check"
  - alert: RabbitMQSocketUsageHigh
    expr: rabbitmq_sockets_used / rabbitmq_sockets_available * 100 > 60
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "RabbitMQ socket usage is too high, instance: {{ $labels.instance }}"
      description: 'RabbitMQ socket usage > 60%, current value: {{ $value }}'
  - alert: RabbitMQFileDescriptorUsageHigh
    expr: rabbitmq_fd_used / rabbitmq_fd_available * 100 > 60
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "RabbitMQ file descriptor usage is too high, instance: {{ $labels.instance }}"
      description: 'RabbitMQ file descriptor usage > 60%, current value: {{ $value }}'
EOF

# Check the configuration
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml

# Reload the configuration
curl -X POST http://localhost:9090/-/reload

# View the rules and alerts
http://192.168.10.14:9090/rules
http://192.168.10.14:9090/alerts?search=

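2.6 Grafana dashboard

Grafana displays the data that Prometheus collects from rabbitmq_exporter.

Dashboard ID: 4279

3. Monitoring MongoDB

Install MongoDB whichever way suits you.

3.1 Installing MongoDB with docker-compose

cd /data/mongodb

cat > docker-compose.yaml <<"EOF"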
version: '3'
services:
  mongo:
    image: mongo:4.2.5
    container_name: mongo
    restart: always
    volumes:
      - /data/mongo/db:/data/db
    ports:
      - 27017:27017
    command: [--auth]
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: 123456
EOF

docker-compose up -d

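3.2 Monitoring MongoDB

3.2.1 Creating a monitoring user

Log in to MongoDB and create a monitoring user with the readAnyDatabase role. In a cluster environment the user also needs the clusterMonitor role.

docker exec -it mongo mongo admin

Create the monitoring user:

> db.auth('root','123456')
1
> db.createUser({ user:'exporter',pwd:'password',roles:[ { role:'readAnyDatabase', db: 'admin'},{ role: "clusterMonitor", db: "admin" }]});
# Test: authenticate as the user created above.
> db.auth('exporter', 'password')
1
# A return value of 1 means authentication succeeded.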
> exit
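
For scripted setups, the same user can be created without an interactive session. This is a sketch only: `exporter_user_js` is a hypothetical helper that prints the createUser statement shown above, and it assumes the user name and password contain no single quotes.

```shell
# exporter_user_js USER PASSWORD
# Prints the db.createUser(...) statement for a monitoring user with the
# readAnyDatabase and clusterMonitor roles on the admin database.
exporter_user_js() {
  printf "db.createUser({user:'%s',pwd:'%s',roles:[{role:'readAnyDatabase',db:'admin'},{role:'clusterMonitor',db:'admin'}]});\n" "$1" "$2"
}

# Show the statement that would be executed
exporter_user_js exporter password

# To run it inside the container from this article (legacy mongo shell, root credentials as above):
#   docker exec -i mongo mongo admin -u root -p 123456 --authenticationDatabase admin \
#     --eval "$(exporter_user_js exporter password)"
```

Printing the statement separately keeps the credentials of the monitoring user in one place and lets the command be reviewed before it touches the database.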
3.2.2 Installing mongodb_exporter
3.2.2.1 Choose an installation method for mongodb_exporter

Binary installation:

mongodb_exporter download page: https://github.com/percona/mongodb_exporter/releases

systemd service:

cat <<EOF >/usr/lib/systemd/system/mongodb_exporter.service
[Unit]
Description=mongodb_exporter
Documentation=https://github.com/percona/mongodb_exporter
After=network.target

[Service]
Type=simple
User=prometheus
Environment="MONGODB_URI=mongodb://exporter:password@localhost:27017/admin"
ExecStart=/opt/prometheus/mongodb_exporter/mongodb_exporter --log.level=error --collect-all --compatible-mode
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

Docker installation:

docker run -d --restart=always -p 9216:9216 -p 17001:17001 --name=mongodb-exporter bitnami/mongodb-exporter:latest --collect-all --compatible-mode --mongodb.uri='mongodb://exporter:password@192.168.10.100:27017/admin?ssl=false'

docker-compose installation:

cat >docker-compose.yaml <<EOF
version: '3.3'
services:
  mongodb_exporter:
    image: bitnami/mongodb-exporter:latest
    container_name: mongodb_exporter
    restart: always
    environment:
      MONGODB_URI: "mongodb://exporter:password@192.168.10.100:27017/admin?ssl=false"
    command:
      - '--collect-all'
      - '--compatible-mode'
    ports:
      - "9216:9216"
EOF

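Flag reference

| Flag | Description | Example |
|---|---|---|
| -h, --help | Show context-sensitive help | |
| --[no-]compatible-mode | Enable metrics compatible with the old mongodb-exporter | |
| --[no-]discovering-mode | Enable automatic discovery of collections | |
| --mongodb.collstats-colls | Comma-separated list of databases.collections to get $collStats for | --mongodb.collstats-colls=db1,db2.col2 |
| --mongodb.indexstats-colls | Comma-separated list of databases.collections to get $indexStats for | --mongodb.indexstats-colls=db1.col1,db2.col2 |
| --[no-]mongodb.direct-connect | Whether to make a direct connection. A direct connection is not valid if multiple hosts are specified or an SRV URI is used | |
| --[no-]mongodb.global-conn-pool | Use a global connection pool instead of creating a new pool for each HTTP request | |
| --mongodb.uri | MongoDB connection URI ($MONGODB_URI) | --mongodb.uri=mongodb://user:pass@127.0.0.1:27017/admin?ssl=true |
| --web.listen-address | Address to listen on for the web interface and telemetry | --web.listen-address=":9216" |
| --web.telemetry-path | Path under which metrics are exposed | --web.telemetry-path="/metrics" |
| --web.config | Path to a file with the Prometheus TLS configuration for basic authentication | --web.config=STRING |
| --log.level | Only log messages with the given severity or above. Valid levels: [debug, info, warn, error, fatal] | --log.level="error" |
| --collector.diagnosticdata | Enable collecting metrics from getDiagnosticData | |
| --collector.replicasetstatus | Enable collecting metrics from replSetGetStatus | |
| --collector.dbstats | Enable collecting metrics from dbStats | |
| --collector.topmetrics | Enable collecting metrics from the top admin command | |
| --collector.indexstats | Enable collecting metrics from $indexStats | |
| --collector.collstats | Enable collecting metrics from $collStats | |
| --collect-all | Enable all collectors. Same as specifying all --collector.* flags | |
| --collector.collstats-limit=0 | Disable the collstats, dbstats, topmetrics and indexstats collectors if there are more than N collections. 0 = no limit | |
| --metrics.overridedescendingindex | Enable descending index name override to replace -1 with _DESC | |
| --version | Show version and exit | |

Metrics endpoint: http://192.168.10.100:9216/metrics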

3.3 Prometheus configuration

Configure Prometheus to scrape (pull) metrics from mongodb_exporter.

cd /data/docker-prometheus

# Add the following job under scrape_configs:

cat >> prometheus/prometheus.yml << "EOF"
  - job_name: 'mongodb_exporter'
    static_configs:
    - targets: ['192.168.10.100:9216']
      labels:
        instance: test-server
EOF

Verify that the new target is up on the Prometheus targets page (Status > Targets).

3.4 Common metrics

mongodb_ss_connections{conn_type="available"}           # number of connections still available

mongodb_ss_mem_virtual
mongodb_ss_mem_resident



# Server status
mongodb_up                                              # whether the server is up
mongodb_ss_ok{cl_id="", cl_role="mongod", rs_state="0"} # whether the server is running normally, 1 or 0; the labels carry cluster and replica set information
mongodb_ss_uptime                                       # server uptime, in seconds
mongodb_ss_connections{conn_type="current"}             # number of client connections

# Host
mongodb_sys_cpu_num_cpus                                # number of CPU cores on the host

# Collections
mongodb_collstats_storageStats_count{database="xx", collection="xx"}  # number of documents in the collection
mongodb_collstats_storageStats_size                     # total size of the documents in the collection, in bytes
mongodb_collstats_storageStats_storageSize              # disk space used by the collection's documents (compressed by default)
increase(mongodb_collstats_latencyStats_reads_ops[1m])     # number of read operations on the collection per minute
increase(mongodb_collstats_latencyStats_reads_latency[1m]) # read latency accumulated on the collection per minute, in microseconds
mongodb_collstats_latencyStats_writes_ops
mongodb_collstats_latencyStats_writes_latency

# Indexes
mongodb_collstats_storageStats_nindexes                 # number of indexes on the collection
mongodb_collstats_storageStats_totalIndexSize           # disk space used by the collection's indexes
increase(mongodb_indexstats_accesses_ops[1m])           # number of times the index was accessed per minute

# Operations
increase(mongodb_ss_opcounters[1m])                     # number of operations executed, by type
increase(mongodb_ss_opLatencies_latency[1m])            # latency of the operations executed, in microseconds
increase(mongodb_ss_metrics_document[1m])               # number of documents changed, by type of change

# Locks
increase(mongodb_ss_locks_acquireCount{lock_mode="w"}[1m]) # number of locks acquired. R = shared lock, W = exclusive lock, r = intent shared lock, w = intent exclusive lock
mongodb_ss_globalLock_currentQueue{count_type="total"}  # number of operations queued waiting for a lock
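
As a quick sanity check outside Prometheus, connection usage can be computed directly from the exporter output as current / (current + available). The function below is a sketch: `conn_usage` is a hypothetical name, and it assumes the exporter exposes `mongodb_ss_connections` samples with `conn_type="current"` and `conn_type="available"` as listed above.

```shell
# conn_usage
# Reads Prometheus text-format metrics on stdin and prints the share of
# connections in use, as a percentage with one decimal place.
conn_usage() {
  awk '
    index($0, "mongodb_ss_connections{") == 1 {
      if ($0 ~ /conn_type="current"/)   cur = $NF
      if ($0 ~ /conn_type="available"/) avail = $NF
    }
    END {
      # Fail instead of dividing by zero when no samples were found
      if (cur + avail == 0) exit 1
      printf "%.1f\n", cur / (cur + avail) * 100
    }'
}

# Offline demonstration with two canned sample lines (prints 20.0)
printf '%s\n' \
  'mongodb_ss_connections{conn_type="current"} 20' \
  'mongodb_ss_connections{conn_type="available"} 80' | conn_usage

# Against the live exporter (address taken from this article):
#   curl -s http://192.168.10.100:9216/metrics | conn_usage
```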

3.5 MongoDB alerting rules

cat > prometheus/rules/mongodb.yml <<"EOF"
groups:
- name: PerconaMongodbExporter
  rules:
    - alert: MongodbDown
      expr: 'mongodb_up == 0'
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: "MongoDB down, instance: {{ $labels.instance }}"
        description: "The MongoDB instance is down, current value: {{ $value }}"

    - alert: MongodbNumberCursorsOpen
      expr: 'mongodb_ss_metrics_cursor_open{csr_type="total"} > 10 * 1000'
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "MongoDB has too many open cursors, instance: {{ $labels.instance }}"
        description: "MongoDB has too many cursors opened for clients (> 10k), current value: {{ $value }}"

    - alert: MongodbCursorsTimeouts
      expr: 'increase(mongodb_ss_metrics_cursor_timedOut[1m]) > 100'
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "MongoDB cursor timeouts, instance: {{ $labels.instance }}"
        description: "Too many cursors are timing out, current value: {{ $value }}"

    - alert: MongodbTooManyConnections
      # Connections in use as a share of current + available
      expr: 'sum by (instance) (mongodb_ss_connections{conn_type="current"}) / sum by (instance) (mongodb_ss_connections{conn_type=~"current|available"}) * 100 > 80'
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "MongoDB has too many connections, instance: {{ $labels.instance }}"
        description: "MongoDB connection usage > 80%, current value: {{ $value }}"

    - alert: MongodbVirtualMemoryUsage
      expr: '(sum(mongodb_ss_mem_virtual) BY (instance) / sum(mongodb_ss_mem_resident) BY (instance)) > 3'
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "MongoDB virtual memory usage, instance: {{ $labels.instance }}"
        description: "Virtual memory usage is too high, current value: {{ $value }}"
EOF

Check the configuration, then reload:

docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml

curl -X POST http://localhost:9090/-/reload

Verify in the web UI:

http://192.168.10.14:9090/alerts?search=

http://192.168.10.14:9090/rules

3.6 Grafana dashboard

Grafana displays the data that Prometheus collects from mongodb_exporter.

https://github.com/percona/grafana-dashboards/tree/main/dashboards/MongoDB

Posted on 2024-04-24 18:44 by 杨梅冲