Fork me on GitHub

thanos 实现 prometheus 高可用 数据持久化2

Prometheus 官方的高可用有几种方案:

    1. HA:即两套 Prometheus 采集完全一样的数据,外边挂负载均衡
    2. HA + 远程存储:除了基础的多副本 Prometheus,还通过 Remote write 写入到远程存储,解决存储持久化问题
    3. 联邦集群:即 Federation,按照功能进行分区,不同的 Shard 采集不同的数据,由 Global 节点来统一存放,解决监控数据规模的问题.

Thanos 的默认模式:sidecar 方式.  or Receiver方式 ;

 

 

 

  • Thanos Query. 主要是对从Promethues Pod采集来的数据进行merge,提供查询接口给客户端;
  • Thanos SideCar. 将Promethues container的数据进行封装,以提供接口给Thanos Query;
  • Prometheus Container. 采集数据,通过Remote Read API提供接口给Thanos SideCar
  • Thanos store Gateway: 将对象存储的数据暴露给 Thanos Query 去查询。
  • thanos compact: 将对象存储中的数据进行压缩和降低采样率,加速大时间区间监控数据查询的速度。
  • Thanos Ruler: 对监控数据进行评估和告警,还可以计算出新的监控数据,将这些新数据提供给 Thanos Query 查询并且/或者上传到对象存储,以供长期存储。

thanos 主:

thanos compact --data-dir ./thanos/comp --http-address 0.0.0.0:19192 --objstore.config-file ./bucket_config.yaml
thanos store --data-dir ./thanos/store --objstore.config-file ./bucket_config.yaml --http-address 0.0.0.0:19191 --grpc-address 0.0.0.0:19090
thanos query --http-address 0.0.0.0:8080 --grpc-address 0.0.0.0:8081 --query.replica-label slave --store 172.16.10.11:10901 --store 172.16.10.10:10901 --store 127.0.0.1:19090
thanos sidecar --tsdb.path /data/ --prometheus.url http://localhost:9090 --objstore.config-file ./bucket_config.yaml --shipper.upload-compacted

thanos 备:

thanos sidecar --tsdb.path /data/ --prometheus.url http://localhost:9090 --objstore.config-file ./bucket_config.yaml --shipper.upload-compacted
thanos query --http-address 0.0.0.0:8080 --grpc-address 0.0.0.0:8081 --query.replica-label slave --store 172.16.10.10:10901 --store 172.16.10.11:10901 --store 172.16.10.10:19090

 

注: bucket_config.yaml为云存储;

 

promtheus 配置:

global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
  external_labels:
    slave: 02

# Alertmanager configuration
#alerting:
#  alertmanagers:
#  - static_configs:
#     - targets: ['127.0.0.1:9093']
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - "172.16.10.10:9093"

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/data/alert/etc/*.rule"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'Node_Exporter'
    consul_sd_configs:
      - server: '172.16.10.10:8500'
    relabel_configs:
      - source_labels: ["__meta_consul_service_address"]
        regex: "(.*)"
        replacement: $1
        action: replace
        target_label: "address"
      - source_labels: ["__meta_consul_service"]
        regex: "(.*)"
        replacement: $1
        action: replace
        target_label: "hostname"
      - source_labels: ["__meta_consul_service_address"]
        regex: "10.2.*"
        action: drop
      - source_labels: ["__meta_consul_service_address"]
        regex: "10.3.*"
        action: drop
      - source_labels: ["__meta_consul_service_address"]
        regex: "10.7.*"
        action: drop
      - source_labels: ["__meta_consul_tags"]
        regex: ".*测试环境.*"
        action: drop
      - source_labels: ["__meta_consul_tags"]
        regex: ".*安全组.*"
        action: drop
      - source_labels: ["__meta_consul_tags"]
        regex: ",(.*),(.*),(.*),(.*),(.*),(.*),"
        replacement: $1
        action: replace
        target_label: "department"
      - source_labels: ["__meta_consul_tags"]
        regex: ",(.*),(.*),(.*),(.*),(.*),(.*),"
        replacement: $2
        action: replace
        target_label: "group"
      - source_labels: ["__meta_consul_tags"]
        regex: ",(.*),(prod|dev|pre|test),"
        replacement: $2
        action: replace
        target_label: "env"
      - source_labels: ["__meta_consul_tags"]
        regex: ",(.*),(.*),(.*),(.*),(.*),(.*),"
        replacement: $3
        action: replace
        target_label: "application"
      - source_labels: ["__meta_consul_tags"]
        regex: ",(.*),(.*),(.*),(.*),(.*),(.*),"
        replacement: $4
        action: replace
        target_label: "type"
      - source_labels: ["__meta_consul_tags"]
        regex: ",(.*),(.*),(.*),(.*),(.*),(.*),"
        replacement: $5
        action: replace
        target_label: "dc"
      - source_labels: ["__meta_consul_tags"]
        regex: ",(.*),(.*),(.*),(.*),(.*),(.*),"
        replacement: $6
        action: replace
        target_label: "appCode"
      - source_labels: ["__meta_consul_service_id"]
        regex: "(.*)"
        replacement: $1
        action: replace
        target_label: "id"


  - job_name: kubetake

    kubernetes_sd_configs:
      - role: node
        api_server: 'https://172.16.10.11:6443'
        tls_config:
          ca_file: /data/alert/etc/ca.crt
        bearer_token_file: /data/alert/etc/token
    relabel_configs:
      - action: labelmap
        regex: (.+)
      - target_label: __address__
        source_labels: [__meta_kubernetes_node_address_InternalIP]
        regex: (.+)
        replacement: ${1}:9100

 

posted @ 2020-06-01 11:38  patrick-py  阅读(727)  评论(0编辑  收藏  举报