Prometheus监控Docker主机

cAdvisor(Container Advisor)用于收集正在运行的容器资源使用和性能信息

一.Docker运行cAdvisor

sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=8899:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest

二、访问cAdvisor

http://10.210.13.20:8899

三、Prometheus对接cAdvisor

- job_name: 'dev_docker'
  scrape_interval: 5s
  static_configs:
  - targets: ['10.210.13.20:8899']

四、重启Prometheus容器

docker restart prometheus

五、Grafana添加模板

但是这个模板,无法选择根据主机选择。推荐另外一个模板,它是可以选择主机的。

https://grafana.com/grafana/dashboards/10566

六、配置告警:我们新创建一个规则文件:alert_rules.yml,把它和prometheus.yml放在一起,官方有一个模板 Templating,直接copy过来

  groups:
  - name: example
    rules:
    # Alert for any instance that is unreachable for >5 minutes.
    - alert: InstanceDown
      expr: up == 0
      for: 5m
      labels:
        severity: page
      annotations:
        summary: "Instance {{ $labels.instance }} down"
        description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
    # Alert for any instance that has a median request latency >1s.
    - alert: APIHighRequestLatency
      expr: api_http_request_latencies_second{quantile="0.5"} > 1
      for: 10m
      annotations:
        summary: "High request latency on {{ $labels.instance }}"
        description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"

上面规则文件大意:就是创建了2条alert规则 alert: InstanceDown 和 alert: APIHighRequestLatency :
InstanceDown 就是实例宕机(up==0)触发告警,5分钟后告警(for: 5m);
APIHighRequestLatency 表示有一半的 API 请求延迟大于 1s 时(api_http_request_latencies_second{quantile="0.5"} > 1)触发告警

七、重启Prometheus容器

docker restart prometheus
posted @ 2021-12-10 13:57  記憶や空白  阅读(113)  评论(0编辑  收藏  举报