Prometheus (6): Monitoring a Docker Swarm Cluster
All components run as containers. Some of the stack files are adapted from prometheus for swarm.
Deploy Prometheus

Write the stack file

    $ mkdir -p /opt/k8s/prometheus/conf
    $ cd /opt/k8s/prometheus/
    $ cat > prome-stack.yml<<EOF
    version: "3"
    services:
      prometheus:
        image: prom/prometheus:v2.16.0
        ports:
          - "9090:9090"
        volumes:
          - ./conf/:/etc/prometheus/
          - prometheus_data:/prometheus
        command:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.path=/prometheus'
          - '--web.console.libraries=/usr/share/prometheus/console_libraries'
          - '--web.console.templates=/usr/share/prometheus/consoles'
        networks:
          - mcsas-network
        deploy:
          replicas: 1
          restart_policy:
            condition: on-failure
          placement:
            constraints:
              - node.role == manager
    networks:
      mcsas-network:
        external: true
    volumes:
      prometheus_data: {}
    EOF
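The stack files in this article declare `mcsas-network` as `external: true`, so `docker stack deploy` expects that network to exist already and will not create it. The following is a minimal sketch of creating it on a manager node, assuming it should be a plain overlay network with default settings. `ensure_overlay_network` is a hypothetical helper name, not a Docker command.

```shell
# Sketch: create the overlay network only if it does not exist yet.
# ensure_overlay_network is a hypothetical helper, not part of Docker.
ensure_overlay_network() {
  local name="$1"
  if docker network inspect "$name" >/dev/null 2>&1; then
    # The network is already known to this node; nothing to do.
    echo "network $name already exists"
  else
    # Overlay networks span all Swarm nodes; run this on a manager.
    docker network create --driver overlay "$name"
  fi
}

# Usage (run once before the first "docker stack deploy"):
#   ensure_overlay_network mcsas-network
```

Because the helper checks before creating, it can be run again safely when more stacks are added later.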
Edit the configuration file

    $ cd /opt/k8s/prometheus/conf/
    $ cat > prometheus.yml<<EOF
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'springboot'
        metrics_path: /actuator/prometheus
        file_sd_configs:
          - files:
              - /etc/prometheus/service.yaml
      - job_name: 'node-exporter'
        scrape_interval: 5s
        dns_sd_configs:
          - names:
              - 'tasks.node-exporter'
            type: 'A'
            port: 9100
      - job_name: 'cadvisor'
        scrape_interval: 5s
        dns_sd_configs:
          - names:
              - 'tasks.cadvisor'
            type: 'A'
            port: 8080
    EOF
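- node-exporter and cadvisor are discovered through DNS service discovery (dns_sd_configs).
- Application services use file_sd_configs: the targets listed in conf/service.yaml tell Prometheus which of our own services to scrape.
- The Prometheus version used here (v2.16.0) has no service discovery mechanism dedicated to Swarm, so targets have to be appended by hand to the file referenced by file_sd_configs. The Prometheus project points to one alternative; see prometheus-swarm-discovery for details.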
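The article refers to conf/service.yaml without showing its contents. As a hedged sketch of the manual step, the helper below appends one target group in the file_sd YAML format (a list of entries with `targets` and optional `labels`). The function name `add_file_sd_target`, the `app` label, and the example addresses are assumptions made for illustration, not values from the original setup.

```shell
# Sketch: append one file_sd target group to a YAML target file.
# add_file_sd_target, the "app" label and the sample addresses are hypothetical.
add_file_sd_target() {
  local file="$1" app="$2" target="$3"
  # Each entry is one list item with its targets and labels.
  cat >> "$file" <<EOF
- targets:
    - '$target'
  labels:
    app: '$app'
EOF
}

# Usage (host:port must be reachable from the Prometheus container):
#   add_file_sd_target /opt/k8s/prometheus/conf/service.yaml demo-app demo-app:8080
```

Prometheus re-reads file_sd files when they change, so appending an entry should not require restarting the service.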
Start Prometheus

    $ cd /opt/k8s/prometheus
    $ docker stack deploy -c prome-stack.yml prom
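After the deploy command returns, the service may still be pulling its image or starting up. One way to confirm that Prometheus is serving requests is to poll its `/-/ready` endpoint on the published port 9090. This is a rough sketch: `wait_for_prometheus` is a hypothetical helper, and the retry count and two-second delay are arbitrary choices.

```shell
# Sketch: poll the Prometheus readiness endpoint until it answers.
# wait_for_prometheus is a hypothetical helper; the defaults are arbitrary.
wait_for_prometheus() {
  local url="${1:-http://localhost:9090/-/ready}" tries="${2:-30}"
  local i
  for i in $(seq 1 "$tries"); do
    # -f makes curl fail on HTTP errors, -sS keeps it quiet except for errors.
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    sleep 2
  done
  echo "not ready" >&2
  return 1
}

# Usage:
#   docker stack services prom   # shows replica counts for the stack
#   wait_for_prometheus
```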
Deploy node-exporter
Node-Exporter is not designed for macOS and will not collect system metrics correctly there. Do not deploy this component if the host platform is a Mac.

    $ cd /opt/k8s/prometheus
    $ cat > node-exporter-stack.yml<<'EOF'
    version: "3"
    services:
      node-exporter:
        image: quay.azk8s.cn/prometheus/node-exporter:v0.18.1
        volumes:
          - /proc:/host/proc:ro
          - /sys:/host/sys:ro
          - /:/rootfs:ro
        command:
          - '--path.procfs=/host/proc'
          - '--path.sysfs=/host/sys'
          - --collector.filesystem.ignored-mount-points
          # $$ is the Compose escape for a literal $; the quoted 'EOF' above keeps the shell from expanding it
          - "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
        #ports:
        #  - 9100:9100
        networks:
          - mcsas-network
        deploy:
          mode: global
          restart_policy:
            condition: on-failure
    networks:
      mcsas-network:
        external: true
    EOF
Start node-exporter

    $ cd /opt/k8s/prometheus
    $ docker stack deploy -c node-exporter-stack.yml node
Deploy cadvisor

    $ cd /opt/k8s/prometheus
    $ cat > cadvisor-stack.yml<<EOF
    version: "3"
    services:
      cadvisor:
        image: gcr.azk8s.cn/google_containers/cadvisor:v0.35.0
        volumes:
          - /:/rootfs:ro
          - /var/run:/var/run:rw
          - /sys:/sys:ro
          - /var/lib/docker/:/var/lib/docker:ro
        #ports:
        #  - 8080:8080
        networks:
          - mcsas-network
        deploy:
          mode: global
          restart_policy:
            condition: on-failure
    networks:
      mcsas-network:
        external: true
    EOF
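- About the image: google/cadvisor is deprecated and no longer receives updates, and gcr.io/google-containers/cadvisor should be used instead. Because gcr.io cannot be reached from mainland China, the image is pulled from the gcr.azk8s.cn mirror here.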
Start cadvisor

    $ docker stack deploy -c cadvisor-stack.yml cadvisor
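With both exporters deployed in global mode, Prometheus should discover one target per Swarm node for each job through the `tasks.*` DNS names. A quick, admittedly crude check is to query the `up` metric through the Prometheus HTTP API and count the series whose value is 1. The helper name `count_up_targets` is hypothetical, and the grep-based parsing is a shortcut that assumes the compact JSON Prometheus normally returns; a JSON parser would be more robust.

```shell
# Sketch: count targets of a job that Prometheus currently reports as up.
# count_up_targets is a hypothetical helper; the JSON is matched with grep, not parsed.
count_up_targets() {
  local job="$1" base="${2:-http://localhost:9090}"
  # -G sends the url-encoded query as GET parameters to the instant-query API.
  curl -fsS -G "$base/api/v1/query" \
    --data-urlencode "query=up{job=\"$job\"}" |
    grep -o '"value":\[[0-9.]*,"1"]' | wc -l | tr -d ' '
}

# Usage: each count should match the number of Swarm nodes.
#   count_up_targets node-exporter
#   count_up_targets cadvisor
```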
Deploy grafana

    $ cd /opt/k8s/prometheus
    $ cat > grafana-stack.yml<<EOF
    version: "3"
    services:
      grafana:
        image: grafana/grafana:6.6.2
        volumes:
          - grafana-data:/var/lib/grafana
        deploy:
          replicas: 1
          restart_policy:
            condition: on-failure
          resources:
            limits:
              cpus: "0.2"
              memory: 200M
        ports:
          - 3000:3000
        networks:
          - mcsas-network
    volumes:
      grafana-data: {}
    networks:
      mcsas-network:
        external: true
    EOF
Start grafana

    $ docker stack deploy -c grafana-stack.yml grafana
All components are now deployed. For the detailed steps to configure dashboards in Grafana and monitor the metrics, see [Installing Grafana for Prometheus](Prometheus grafana安装.md).