Kubernetes Prometheus

Introduction


A quick plug for the Grafana dashboards I built myself: I won't claim they beat the vendor-made ones, but they are definitely among the best-looking on Grafana's official site.

https://grafana.com/hyomin

Prometheus is a Cloud Native Computing Foundation (CNCF) project. It is fully open source and is used to monitor systems, services, containers, databases, and more: it scrapes the metrics defined by each client target configuration, evaluates them against alerting expressions, and provides both data visualization and threshold-based alerting.

Compared with other monitoring systems, it has the following characteristics:

  1.   A multi-dimensional data model (time series identified by metric name and key/value labels)
  2.   A flexible query language (PromQL)
  3.   No dependency on external distributed storage; server nodes store data autonomously
  4.   Time series collection over an HTTP pull model
  5.   Targets discovered via service discovery and configuration files
  6.   Support for multiple modes of dashboarding
Time series data can also be pushed to the Prometheus server by way of a push gateway.

Collection modes

Because scraped samples can occasionally be lost, Prometheus is not suited to use cases that require 100% collection accuracy. For recording time series data, however, Prometheus offers major query advantages, and it fits microservice architectures well.

Pull mode

Prometheus collects data with a pull model: it scrapes metrics over HTTP, so any application that can expose an HTTP endpoint can be brought into the monitoring system. Compared with proprietary or binary protocols, this keeps client-side development simple, as the sketch below shows.
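
As a sketch of pull mode in practice, a minimal scrape job that pulls an application's HTTP /metrics endpoint could look like this (the job name and target address are made up for illustration):

    # prometheus.yml fragment: a minimal pull-mode scrape job
    scrape_configs:
      - job_name: 'my-app'                   # label attached to every scraped series
        metrics_path: /metrics               # the default path, shown for clarity
        scrape_interval: 15s
        static_configs:
          - targets: ['10.1.17.100:8080']    # the app only needs to serve plain HTTP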

Push mode

For short-lived workloads such as scheduled jobs, pull mode may miss the metrics entirely: the job can finish before Prometheus gets a chance to scrape it. In that case an intermediate layer can be added: the client pushes its data to a Push Gateway, which buffers it, and Prometheus then pulls the metrics from the gateway. (This requires deploying a Push Gateway and adding an extra job to scrape it; a sketch follows.)
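
Prometheus still pulls in this setup, so the gateway has to be registered as an extra scrape job. A sketch, assuming the in-cluster service name produced by the Helm install later in this guide:

    # Extra scrape job that pulls buffered batch-job metrics from the Pushgateway.
    # honor_labels keeps the job/instance labels the clients pushed with,
    # instead of overwriting them with the gateway's own labels.
    - job_name: 'pushgateway'
      honor_labels: true
      static_configs:
        - targets: ['prometheus-pushgateway.default.svc.cluster.local:9091']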

Component architecture


Prometheus components

  • Prometheus Server: the core time series database; handles data processing, storage, querying, alerting rules, and chart display
  • alertmanager: receives and manages alert delivery; provides alert template configuration, alert sending, and alert routing
  • Pushgateway: a relay for metric collection; buffers and manages pushed data
  • exporters: collect and expose metrics from target systems
  • Client library: instruments application code so it can expose its own metrics

Architecture diagram

Deployment


Deploying with Helm

Reference documentation:

Official guide to configuring service discovery for monitoring Kubernetes: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config

Kubernetes service discovery exposes the following roles for Prometheus to scrape (Prometheus pulls these target resources from the Kubernetes REST API and keeps their state synchronized); a discovery sketch follows the list:

  • node
  • endpoints
  • service
  • pod
  • ingress
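
As a sketch, a discovery job using the pod role that keeps only pods carrying the conventional prometheus.io/scrape annotation might look like this (the same annotation convention is used by the scrape configs shipped in the chart below):

    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        # keep only pods annotated with prometheus.io/scrape: "true"
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true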

Installation

  1. Install Prometheus outside the cluster; this requires configuring the cluster's CA certificate in prometheus.yml (see the sketch below)
  2. Install Prometheus inside the cluster; this requires creating RBAC authorization policies. For concrete steps see the official Helm installation docs; this guide installs with Helm.
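
For option 1, both discovery and scraping need explicit credentials because Prometheus runs outside the cluster. A minimal sketch; the API server address and the local file paths are assumptions for illustration:

    # prometheus.yml fragment for a Prometheus server running outside the cluster
    - job_name: 'kubernetes-nodes-external'
      scheme: https
      tls_config:
        ca_file: /etc/prometheus/k8s-ca.crt       # cluster CA copied to the Prometheus host
      bearer_token_file: /etc/prometheus/k8s-token
      kubernetes_sd_configs:
        - role: node
          api_server: https://10.1.0.1:6443       # hypothetical API server endpoint
          tls_config:
            ca_file: /etc/prometheus/k8s-ca.crt
          bearer_token_file: /etc/prometheus/k8s-token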

There are two chart sources for the Helm install: the official helm/charts repository, which is now deprecated and whose listings also point to Artifact Hub, and Artifact Hub itself, the dedicated hub for Kubernetes packages and integrated CNCF project deployments.

  1. https://github.com/helm/charts
  2. https://artifacthub.io

Prometheus install page on artifacthub.io:

https://artifacthub.io/packages/helm/prometheus-community/prometheus

The link above explains in detail how to install Prometheus with Helm, along with the installation prerequisites.

Why kube-state-metrics is a required dependency

kube-state-metrics listens to the Kubernetes API server and generates metrics about the state of cluster objects, such as the replica status of Deployments and ReplicaSets; see the example rule below.
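
For instance, an alerting rule built on kube-state-metrics series can catch Deployments whose available replicas fall short of spec (the metric names are the ones kube-state-metrics exposes; the group name and thresholds here are illustrative):

    # alerting_rules.yml sketch using kube-state-metrics series
    groups:
      - name: workload-health
        rules:
          - alert: DeploymentReplicasMismatch
            expr: kube_deployment_spec_replicas != kube_deployment_status_replicas_available
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: 'Deployment {{ $labels.deployment }} has unavailable replicas'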

  1. Add the repos, as follows:
    <root@PROD-K8S-CP1 ~># helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    "prometheus-community" has been added to your repositories
    <root@PROD-K8S-CP1 ~># helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
    "kube-state-metrics" has been added to your repositories
    <root@PROD-K8S-CP1 ~># helm repo update
    Hang tight while we grab the latest from your chart repositories...
    ...Successfully got an update from the "kube-state-metrics" chart repository
    ...Successfully got an update from the "cilium" chart repository
    ...Successfully got an update from the "prometheus-community" chart repository
    Update Complete. ⎈ Happy Helming!⎈ 
  2. Install Prometheus
    <root@PROD-K8S-CP1 ~>#  helm install prometheus prometheus-community/prometheus --version 14.1.0
    NAME: prometheus
    LAST DEPLOYED: Thu Sep  2 15:57:19 2021
    NAMESPACE: default
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
    prometheus-server.default.svc.cluster.local
    
    
    Get the Prometheus server URL by running these commands in the same shell:
      export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
      kubectl --namespace default port-forward $POD_NAME 9090
    
    
    The Prometheus alertmanager can be accessed via port 80 on the following DNS name from within your cluster:
    prometheus-alertmanager.default.svc.cluster.local
    
    
    Get the Alertmanager URL by running these commands in the same shell:
      export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=alertmanager" -o jsonpath="{.items[0].metadata.name}")
      kubectl --namespace default port-forward $POD_NAME 9093
    #################################################################################
    ######   WARNING: Pod Security Policy has been moved to a global property.  #####
    ######            use .Values.podSecurityPolicy.enabled with pod-based      #####
    ######            annotations                                               #####
    ######            (e.g. .Values.nodeExporter.podSecurityPolicy.annotations) #####
    #################################################################################
    
    
    The Prometheus PushGateway can be accessed via port 9091 on the following DNS name from within your cluster:
    prometheus-pushgateway.default.svc.cluster.local
    
    
    Get the PushGateway URL by running these commands in the same shell:
      export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=pushgateway" -o jsonpath="{.items[0].metadata.name}")
      kubectl --namespace default port-forward $POD_NAME 9091
    
    For more information on running Prometheus, visit:
    https://prometheus.io/
  3. Check the deployment status
    <root@PROD-K8S-CP1 ~># kubectl get pods --all-namespaces  -o wide| grep node
    default         prometheus-node-exporter-9rm22                   1/1     Running   0          51m     10.1.17.237     prod-be-k8s-wn7     <none>           <none>
    default         prometheus-node-exporter-qfxgz                   1/1     Running   0          51m     10.1.17.238     prod-be-k8s-wn8     <none>           <none>
    default         prometheus-node-exporter-z5znx                   1/1     Running   0          51m     10.1.17.236     prod-be-k8s-wn6     <none>           <none>
    <root@PROD-K8S-CP1 ~># kubectl get pods --all-namespaces  -o wide| grep prometheus
    default         prometheus-alertmanager-755d84cf4f-rfz4n         0/2     Pending   0          51m     <none>          <none>              <none>           <none>
    default         prometheus-kube-state-metrics-86dc6bb59f-wlpcl   1/1     Running   0          51m     172.21.12.167   prod-be-k8s-wn8     <none>           <none>
    default         prometheus-node-exporter-9rm22                   1/1     Running   0          51m     10.1.17.237     prod-be-k8s-wn7     <none>           <none>
    default         prometheus-node-exporter-qfxgz                   1/1     Running   0          51m     10.1.17.238     prod-be-k8s-wn8     <none>           <none>
    default         prometheus-node-exporter-z5znx                   1/1     Running   0          51m     10.1.17.236     prod-be-k8s-wn6     <none>           <none>
    default         prometheus-pushgateway-745d67dd5f-7ckvv          1/1     Running   0          51m     172.21.12.2     prod-be-k8s-wn7     <none>           <none>
    default         prometheus-server-867f854484-lcrq6 

    # The default deployment enables persistent storage but does not specify a concrete PVC, so the persistence section must be adjusted after installation. Alternatively, persistence can be skipped entirely, or a local volume can be mapped into the Pod; see the override sketch below.
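    # A values override along these lines can then be applied with `helm upgrade -f`
    # (the PVC name below is hypothetical):
    #
    #   server:
    #     persistentVolume:
    #       enabled: true
    #       existingClaim: "prometheus-server-pvc"   # pre-created PVC to bind
    #
    # or, to disable persistence entirely (data then lives in an emptyDir):
    #
    #   server:
    #     persistentVolume:
    #       enabled: false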
  4. Custom installation
    <root@PROD-K8S-CP1 ~># helm show values prometheus-community/prometheus --version 14.1.0 > prometheus-values.yaml
    # edit prometheus-values.yaml as needed, then install with the override file:
    helm install prometheus prometheus-community/prometheus -f prometheus-values.yaml
    
    Revised values file:
    rbac:
      create: true
    
    podSecurityPolicy:
      enabled: false
    
    imagePullSecrets:
    # - name: "image-pull-secret"
    
    ## Define serviceAccount names for components. Defaults to component's fully qualified name.
    ##
    serviceAccounts:
      alertmanager:
        create: true
        name:
        annotations: {}
      nodeExporter:
        create: true
        name:
        annotations: {}
      pushgateway:
        create: true
        name:
        annotations: {}
      server:
        create: true
        name:
        annotations: {}
    
    alertmanager:
      ## If false, alertmanager will not be installed
      ##
      enabled: true
    
      ## Use a ClusterRole (and ClusterRoleBinding)
      ## - If set to false - we define a Role and RoleBinding in the defined namespaces ONLY
      ## This makes alertmanager work - for users who do not have ClusterAdmin privs, but wants alertmanager to operate on their own namespaces, instead of clusterwide.
      useClusterRole: true
    
      ## Set to a rolename to use existing role - skipping role creating - but still doing serviceaccount and rolebinding to the rolename set here.
      useExistingRole: false
    
      ## alertmanager container name
      ##
      name: alertmanager
    
      ## alertmanager container image
      ##
      image:
        repository: quay.io/prometheus/alertmanager
        tag: v0.21.0
        pullPolicy: IfNotPresent
    
      ## alertmanager priorityClassName
      ##
      priorityClassName: ""
    
      ## Additional alertmanager container arguments
      ##
      extraArgs: {}
    
      ## Additional InitContainers to initialize the pod
      ##
      extraInitContainers: []
    
      ## The URL prefix at which the container can be accessed. Useful in the case the '-web.external-url' includes a slug
      ## so that the various internal URLs are still able to access as they are in the default case.
      ## (Optional)
      prefixURL: ""
    
      ## External URL which can access alertmanager
      baseURL: "http://localhost:9093"
    
      ## Additional alertmanager container environment variable
      ## For instance to add a http_proxy
      ##
      extraEnv: {}
    
      ## Additional alertmanager Secret mounts
      # Defines additional mounts with secrets. Secrets must be manually created in the namespace.
      extraSecretMounts: []
        # - name: secret-files
        #   mountPath: /etc/secrets
        #   subPath: ""
        #   secretName: alertmanager-secret-files
        #   readOnly: true
    
      ## ConfigMap override where fullname is {{.Release.Name}}-{{.Values.alertmanager.configMapOverrideName}}
      ## Defining configMapOverrideName will cause templates/alertmanager-configmap.yaml
      ## to NOT generate a ConfigMap resource
      ##
      configMapOverrideName: ""
    
      ## The name of a secret in the same kubernetes namespace which contains the Alertmanager config
      ## Defining configFromSecret will cause templates/alertmanager-configmap.yaml
      ## to NOT generate a ConfigMap resource
      ##
      configFromSecret: ""
    
      ## The configuration file name to be loaded to alertmanager
      ## Must match the key within configuration loaded from ConfigMap/Secret
      ##
      configFileName: alertmanager.yml
    
      ingress:
        ## If true, alertmanager Ingress will be created
        ##
        enabled: false
    
        ## alertmanager Ingress annotations
        ##
        annotations: {}
        #   kubernetes.io/ingress.class: nginx
        #   kubernetes.io/tls-acme: 'true'
    
        ## alertmanager Ingress additional labels
        ##
        extraLabels: {}
    
        ## alertmanager Ingress hostnames with optional path
        ## Must be provided if Ingress is enabled
        ##
        hosts: []
        #   - alertmanager.domain.com
        #   - domain.com/alertmanager
    
        ## Extra paths to prepend to every host configuration. This is useful when working with annotation based services.
        extraPaths: []
        # - path: /*
        #   backend:
        #     serviceName: ssl-redirect
        #     servicePort: use-annotation
    
        ## alertmanager Ingress TLS configuration
        ## Secrets must be manually created in the namespace
        ##
        tls: []
        #   - secretName: prometheus-alerts-tls
        #     hosts:
        #       - alertmanager.domain.com
    
      ## Alertmanager Deployment Strategy type
      # strategy:
      #   type: Recreate
    
      ## Node tolerations for alertmanager scheduling to nodes with taints
      ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
      ##
      ##  Tolerations so alertmanager can schedule onto tainted nodes
      tolerations:
        - key: resource
        #   operator: "Equal|Exists"
          value: base
        #   effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
          effect: NoExecute
    
      ## Node labels for alertmanager pod assignment
      ## Ref: https://kubernetes.io/docs/user-guide/node-selection/
      ##
      nodeSelector:
        kubernetes.io/hostname: prod-sys-k8s-wn3
    
      ## Pod affinity
      ##
      ## Node affinity for alertmanager
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/resource
                    operator: In
                    values:
                      - base
    
      ## PodDisruptionBudget settings
      ## ref: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
      ##
      podDisruptionBudget:
        enabled: false
        maxUnavailable: 1
    
      ## Use an alternate scheduler, e.g. "stork".
      ## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
      ##
      # schedulerName:
    
      persistentVolume:
        ## If true, alertmanager will create/use a Persistent Volume Claim
        ## If false, use emptyDir
        ##
        enabled: true
    
        ## alertmanager data Persistent Volume access modes
        ## Must match those of existing PV or dynamic provisioner
        ## Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
        ##
        accessModes:
          - ReadWriteOnce
    
        ## alertmanager data Persistent Volume Claim annotations
        ##
        annotations: {}
    
        ## alertmanager data Persistent Volume existing claim name
        ## Requires alertmanager.persistentVolume.enabled: true
        ## If defined, PVC must be created manually before volume will be bound
        existingClaim: ""
    
        ## alertmanager data Persistent Volume mount root path
        ##
        mountPath: /data
    
        ## alertmanager data Persistent Volume size
        ##
        size: 20Gi
    
        ## alertmanager data Persistent Volume Storage Class
        ## If defined, storageClassName: <storageClass>
        ## If set to "-", storageClassName: "", which disables dynamic provisioning
        ## If undefined (the default) or set to null, no storageClassName spec is
        ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
        ##   GKE, AWS & OpenStack)
        ##
        # storageClass: "-"
        ## StorageClass for alertmanager persistent storage
        storageClass: "alicloud-disk-essd"
    
        ## alertmanager data Persistent Volume Binding Mode
        ## If defined, volumeBindingMode: <volumeBindingMode>
        ## If undefined (the default) or set to null, no volumeBindingMode spec is
        ##   set, choosing the default mode.
        ##
        # volumeBindingMode: ""
    
        ## Subdirectory of alertmanager data Persistent Volume to mount
        ## Useful if the volume's root directory is not empty
        ##
        subPath: ""
    
      emptyDir:
        ## alertmanager emptyDir volume size limit
        ##
        sizeLimit: ""
    
      ## Annotations to be added to alertmanager pods
      ##
      podAnnotations: {}
        ## Tell prometheus to use a specific set of alertmanager pods
        ## instead of all alertmanager pods found in the same namespace
        ## Useful if you deploy multiple releases within the same namespace
        ##
        ## prometheus.io/probe: alertmanager-teamA
    
      ## Labels to be added to Prometheus AlertManager pods
      ##
      podLabels: {}
    
      ## Specify if a Pod Security Policy for node-exporter must be created
      ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
      ##
      podSecurityPolicy:
        annotations: {}
          ## Specify pod annotations
          ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#apparmor
          ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#seccomp
          ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#sysctl
          ##
          # seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
          # seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
          # apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
    
      ## Use a StatefulSet if replicaCount needs to be greater than 1 (see below)
      ##
      replicaCount: 1
    
      ## Annotations to be added to deployment
      ##
      deploymentAnnotations: {}
    
      statefulSet:
        ## If true, use a statefulset instead of a deployment for pod management.
        ## This allows to scale replicas to more than 1 pod
        ##
        enabled: false
    
        annotations: {}
        labels: {}
        podManagementPolicy: OrderedReady
    
        ## Alertmanager headless service to use for the statefulset
        ##
        headless:
          annotations: {}
          labels: {}
    
          ## Enabling peer mesh service end points for enabling the HA alert manager
          ## Ref: https://github.com/prometheus/alertmanager/blob/master/README.md
          enableMeshPeer: false
    
          servicePort: 80
    
      ## alertmanager resource requests and limits
      ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/
      ##
      ## Resource limits for alertmanager
      resources:
        limits:
          cpu: 2
          memory: 2Gi
        requests:
          cpu: 10m
          memory: 32Mi
    
      # Custom DNS configuration to be added to alertmanager pods
      dnsConfig: {}
        # nameservers:
        #   - 1.2.3.4
        # searches:
        #   - ns1.svc.cluster-domain.example
        #   - my.dns.search.suffix
        # options:
        #   - name: ndots
        #     value: "2"
      #   - name: edns0
    
      ## Host network mode
      hostNetwork: false
    
      ## Security context to be added to alertmanager pods
      ##
      securityContext:
        runAsUser: 65534
        runAsNonRoot: true
        runAsGroup: 65534
        fsGroup: 65534
    
      service:
        annotations: {}
        labels: {}
        clusterIP: ""
    
        ## Enabling peer mesh service end points for enabling the HA alert manager
        ## Ref: https://github.com/prometheus/alertmanager/blob/master/README.md
        # enableMeshPeer : true
    
        ## List of IP addresses at which the alertmanager service is available
        ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
        ##
        ## External IPs for the alertmanager service
        externalIPs:
          - 10.1.0.11
        loadBalancerIP: ""
        loadBalancerSourceRanges: []
        servicePort: 9093
        # nodePort: 30000
        sessionAffinity: None
        type: ClusterIP
    
    ## Monitors ConfigMap changes and POSTs to a URL
    ## Ref: https://github.com/jimmidyson/configmap-reload
    ##
    configmapReload:
      prometheus:
        ## If false, the configmap-reload container will not be deployed
        ##
        enabled: true
    
        ## configmap-reload container name
        ##
        name: configmap-reload
    
        ## configmap-reload container image
        ##
        image:
          repository: jimmidyson/configmap-reload
          tag: v0.5.0
          pullPolicy: IfNotPresent
    
        ## Additional configmap-reload container arguments
        ##
        extraArgs: {}
        ## Additional configmap-reload volume directories
        ##
        extraVolumeDirs: []
    
    
        ## Additional configmap-reload mounts
        ##
        extraConfigmapMounts: []
          # - name: prometheus-alerts
          #   mountPath: /etc/alerts.d
          #   subPath: ""
          #   configMap: prometheus-alerts
          #   readOnly: true
    
    
        ## configmap-reload resource requests and limits
        ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/
        ##
        resources: {}
      alertmanager:
        ## If false, the configmap-reload container will not be deployed
        ##
        enabled: true
    
        ## configmap-reload container name
        ##
        name: configmap-reload
    
        ## configmap-reload container image
        ##
        image:
          repository: jimmidyson/configmap-reload
          tag: v0.5.0
          pullPolicy: IfNotPresent
    
        ## Additional configmap-reload container arguments
        ##
        extraArgs: {}
        ## Additional configmap-reload volume directories
        ##
        extraVolumeDirs: []
    
    
        ## Additional configmap-reload mounts
        ##
        extraConfigmapMounts: []
          # - name: prometheus-alerts
          #   mountPath: /etc/alerts.d
          #   subPath: ""
          #   configMap: prometheus-alerts
          #   readOnly: true
    
    
        ## configmap-reload resource requests and limits
        ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/
        ##
        resources: {}
    
    kubeStateMetrics:
      ## If false, kube-state-metrics sub-chart will not be installed
      ##
      enabled: true
      ## Node selector and affinity for kube-state-metrics
      nodeSelector:
        kubernetes.io/hostname: prod-sys-k8s-wn3
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/resource
                    operator: In
                    values:
                      - base
      priorityClassName: "monitor-service"
      resources:
        limits:
          cpu: 1
          memory: 128Mi
        requests:
          cpu: 100m
          memory: 30Mi
      tolerations:
        - key: resource
        #   operator: "Equal|Exists"
          value: base
          effect: NoExecute
    
    ## kube-state-metrics sub-chart configurable values
    ## Please see https://github.com/kubernetes/kube-state-metrics/tree/master/charts/kube-state-metrics
    ##
    # kube-state-metrics:
    
    nodeExporter:
      ## If false, node-exporter will not be installed
      ##
      enabled: true
    
      ## If true, node-exporter pods share the host network namespace
      ##
      hostNetwork: true
    
      ## If true, node-exporter pods share the host PID namespace
      ##
      hostPID: true
    
      ## If true, node-exporter pods mounts host / at /host/root
      ##
      hostRootfs: true
    
      ## node-exporter container name
      ##
      name: node-exporter
    
      ## node-exporter container image
      ##
      image:
        repository: quay.io/prometheus/node-exporter
        tag: v1.1.2
        pullPolicy: IfNotPresent
    
    
      ## Specify if a Pod Security Policy for node-exporter must be created
      ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
      ##
      podSecurityPolicy:
        annotations: {}
          ## Specify pod annotations
          ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#apparmor
          ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#seccomp
          ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#sysctl
          ##
          # seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
          # seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
          # apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
    
      ## node-exporter priorityClassName
      ## Priority class for node-exporter
      priorityClassName: "monitor-service"
    
      ## Custom Update Strategy
      ## Rolling update strategy for node-exporter
      updateStrategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 1
    
      ## Additional node-exporter container arguments
      ##
      extraArgs: {}
    
      ## Additional InitContainers to initialize the pod
      ##
      extraInitContainers: []
    
      ## Additional node-exporter hostPath mounts
      ##
      extraHostPathMounts: []
        # - name: textfile-dir
        #   mountPath: /srv/txt_collector
        #   hostPath: /var/lib/node-exporter
        #   readOnly: true
        #   mountPropagation: HostToContainer
    
      extraConfigmapMounts: []
        # - name: certs-configmap
        #   mountPath: /prometheus
        #   configMap: certs-configmap
        #   readOnly: true
    
      ## Node tolerations for node-exporter scheduling to nodes with taints
      ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
      ##
      ## Tolerations for node-exporter
      tolerations:
        # - key: "key"
        #   operator: "Equal|Exists"
        - operator: Exists
        #   value: "value"
        #   effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
    
      ## Node labels for node-exporter pod assignment
      ## Ref: https://kubernetes.io/docs/user-guide/node-selection/
      ##
      nodeSelector: {}
    
      ## Annotations to be added to node-exporter pods
      ##
      podAnnotations: {}
    
      ## Labels to be added to node-exporter pods
      ##
      pod:
        labels: {}
    
      ## PodDisruptionBudget settings
      ## ref: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
      ##
      podDisruptionBudget:
        enabled: false
        maxUnavailable: 1
    
      ## node-exporter resource limits & requests
      ## Ref: https://kubernetes.io/docs/user-guide/compute-resources/
      ##
      ## Resource limits for node-exporter
      resources:
        limits:
          cpu: 1
          memory: 128Mi
        requests:
          cpu: 100m
          memory: 30Mi
    
      # Custom DNS configuration to be added to node-exporter pods
      dnsConfig: {}
        # nameservers:
        #   - 1.2.3.4
        # searches:
        #   - ns1.svc.cluster-domain.example
        #   - my.dns.search.suffix
        # options:
        #   - name: ndots
        #     value: "2"
      #   - name: edns0
    
      ## Security context to be added to node-exporter pods
      ##
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
    
      service:
        annotations:
          prometheus.io/scrape: "true"
        labels: {}
    
        # Exposed as a headless service:
        # https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
        clusterIP: None
    
        ## List of IP addresses at which the node-exporter service is available
        ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
        ##
        externalIPs: []
    
        hostPort: 9100
        loadBalancerIP: ""
        loadBalancerSourceRanges: []
        servicePort: 9100
        type: ClusterIP
    
    server:
      ## Prometheus server container name
      ##
      enabled: true
    
      ## Use a ClusterRole (and ClusterRoleBinding)
      ## - If set to false - we define a RoleBinding in the defined namespaces ONLY
      ##
      ## NB: because we need a Role with nonResourceURL's ("/metrics") - you must get someone with Cluster-admin privileges to define this role for you, before running with this setting enabled.
      ##     This makes prometheus work - for users who do not have ClusterAdmin privs, but wants prometheus to operate on their own namespaces, instead of clusterwide.
      ##
      ## You MUST also set namespaces to the ones you have access to and want monitored by Prometheus.
      ##
      # useExistingClusterRoleName: nameofclusterrole
    
      ## namespaces to monitor (instead of monitoring all - clusterwide). Needed if you want to run without Cluster-admin privileges.
      # namespaces:
      #   - yournamespace
    
      ## Prometheus server container name
      name: server
    
      # sidecarContainers - add more containers to prometheus server
      # Key/Value where Key is the sidecar `- name: <Key>`
      # Example:
      #   sidecarContainers:
      #      webserver:
      #        image: nginx
      sidecarContainers: {}
    
      ## Prometheus server container image
      ##
      image:
        repository: quay.io/prometheus/prometheus
        tag: v2.26.0
        pullPolicy: IfNotPresent
    
      ## prometheus server priorityClassName
      ##
      priorityClassName: "monitor-service"
    
      ## EnableServiceLinks indicates whether information about services should be injected
      ## into pod's environment variables, matching the syntax of Docker links.
      ## WARNING: the field is unsupported and will be skipped in K8s prior to v1.13.0.
      ##
      enableServiceLinks: true
    
      ## The URL prefix at which the container can be accessed. Useful in the case the '-web.external-url' includes a slug
      ## so that the various internal URLs are still able to access as they are in the default case.
      ## (Optional)
      prefixURL: ""
    
      ## External URL which can access prometheus
      ## Maybe same with Ingress host name
      baseURL: ""
    
      ## Additional server container environment variables
      ##
      ## You specify this manually like you would a raw deployment manifest.
      ## This means you can bind in environment variables from secrets.
      ##
      ## e.g. static environment variable:
      ##  - name: DEMO_GREETING
      ##    value: "Hello from the environment"
      ##
      ## e.g. secret environment variable:
      ## - name: USERNAME
      ##   valueFrom:
      ##     secretKeyRef:
      ##       name: mysecret
      ##       key: username
      env: []
    
      extraFlags:
        - web.enable-lifecycle
        ## web.enable-admin-api flag controls access to the administrative HTTP API which includes functionality such as
        ## deleting time series. This is disabled by default.
        - web.enable-admin-api
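        ## With these two flags enabled, the running server can, for example, be
        ## reloaded or have series deleted over HTTP (a sketch; the in-cluster
        ## service URL matches the default install above):
        ##   curl -X POST http://prometheus-server.default.svc.cluster.local/-/reload
        ##   curl -X POST 'http://prometheus-server.default.svc.cluster.local/api/v1/admin/tsdb/delete_series?match[]=up'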
        ##
        ## storage.tsdb.no-lockfile flag controls DB locking
        # - storage.tsdb.no-lockfile
        ##
        ## storage.tsdb.wal-compression flag enables compression of the write-ahead log (WAL)
        # - storage.tsdb.wal-compression
    
      ## Path to a configuration file on prometheus server container FS
      configPath: /etc/config/prometheus.yml
    
      ### The data directory used by prometheus to set --storage.tsdb.path
      ### When empty server.persistentVolume.mountPath is used instead
      storagePath: ""
    
      ## Customizing the Prometheus global configuration
      global:
        ## How frequently to scrape targets by default
        ##
        scrape_interval: 15s
        ## How long until a scrape request times out
        ##
        scrape_timeout: 10s
        ## How frequently to evaluate rules
        ##
        evaluation_interval: 15s
      ## https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write
      ##
      remoteWrite: []
      ## https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_read
      ##
      remoteRead: []
    
      ## Additional Prometheus server container arguments
      ##
      extraArgs: {}
    
      ## Additional InitContainers to initialize the pod
      ##
      extraInitContainers: []
    
      ## Additional Prometheus server Volume mounts
      ##
      extraVolumeMounts: []
    
      ## Additional Prometheus server Volumes
      ##
      extraVolumes: []
    
      ## Additional Prometheus server hostPath mounts
      ##
      extraHostPathMounts: []
        # - name: certs-dir
        #   mountPath: /etc/kubernetes/certs
        #   subPath: ""
        #   hostPath: /etc/kubernetes/certs
        #   readOnly: true
    
      extraConfigmapMounts: []
        # - name: certs-configmap
        #   mountPath: /prometheus
        #   subPath: ""
        #   configMap: certs-configmap
        #   readOnly: true
    
      ## Additional Prometheus server Secret mounts
      # Defines additional mounts with secrets. Secrets must be manually created in the namespace.
      extraSecretMounts: []
        # - name: secret-files
        #   mountPath: /etc/secrets
        #   subPath: ""
        #   secretName: prom-secret-files
        #   readOnly: true
    
      ## ConfigMap override where fullname is {{.Release.Name}}-{{.Values.server.configMapOverrideName}}
      ## Defining configMapOverrideName will cause templates/server-configmap.yaml
      ## to NOT generate a ConfigMap resource
      ##
      configMapOverrideName: ""
    
      ingress:
        ## If true, Prometheus server Ingress will be created
        ##
        enabled: false
    
        ## Prometheus server Ingress annotations
        ##
        annotations: {}
        #   kubernetes.io/ingress.class: nginx
        #   kubernetes.io/tls-acme: 'true'
    
        ## Prometheus server Ingress additional labels
        ##
        extraLabels: {}
    
        ## Prometheus server Ingress hostnames with optional path
        ## Must be provided if Ingress is enabled
        ##
        hosts: []
        #   - prometheus.domain.com
        #   - domain.com/prometheus
    
        ## Extra paths to prepend to every host configuration. This is useful when working with annotation based services.
        extraPaths: []
        # - path: /*
        #   backend:
        #     serviceName: ssl-redirect
        #     servicePort: use-annotation
    
        ## Prometheus server Ingress TLS configuration
        ## Secrets must be manually created in the namespace
        ##
        tls: []
        #   - secretName: prometheus-server-tls
        #     hosts:
        #       - prometheus.domain.com
    
      ## Server Deployment Strategy type
      # strategy:
      #   type: Recreate
    
      ## hostAliases allows adding entries to /etc/hosts inside the containers
      hostAliases: []
      #   - ip: "127.0.0.1"
      #     hostnames:
      #       - "example.com"
    
      ## Node tolerations for server scheduling to nodes with taints
      ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
      ##
      ## Tolerations for the Prometheus server
      tolerations:
        - key: resource
        #   operator: "Equal|Exists"
          value: base
        #   effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
          effect: NoExecute
    
      ## Node labels for Prometheus server pod assignment
      ## Ref: https://kubernetes.io/docs/user-guide/node-selection/
      ##
      nodeSelector:
        kubernetes.io/hostname: prod-sys-k8s-wn4
    
      ## Pod affinity
      ##
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/resource
                    operator: In
                    values:
                      - base
    
      ## PodDisruptionBudget settings
      ## ref: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
      ##
      podDisruptionBudget:
        enabled: false
        maxUnavailable: 1
    
      ## Use an alternate scheduler, e.g. "stork".
      ## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
      ##
      # schedulerName:
    
      persistentVolume:
        ## If true, Prometheus server will create/use a Persistent Volume Claim
        ## If false, use emptyDir
        ##
        enabled: true
    
        ## Prometheus server data Persistent Volume access modes
        ## Must match those of existing PV or dynamic provisioner
        ## Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
        ##
        accessModes:
          - ReadWriteOnce
    
        ## Prometheus server data Persistent Volume annotations
        ##
        annotations: {}
    
        ## Prometheus server data Persistent Volume existing claim name
        ## Requires server.persistentVolume.enabled: true
        ## If defined, PVC must be created manually before volume will be bound
        existingClaim: ""
    
        ## Prometheus server data Persistent Volume mount root path
        ##
        mountPath: /data
    
        ## Prometheus server data Persistent Volume size
        ##
        size: 500Gi
    
        ## Prometheus server data Persistent Volume Storage Class
        ## If defined, storageClassName: <storageClass>
        ## If set to "-", storageClassName: "", which disables dynamic provisioning
        ## If undefined (the default) or set to null, no storageClassName spec is
        ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
        ##   GKE, AWS & OpenStack)
        ##
        storageClass: "alicloud-disk-essd"
    
        ## Prometheus server data Persistent Volume Binding Mode
        ## If defined, volumeBindingMode: <volumeBindingMode>
        ## If undefined (the default) or set to null, no volumeBindingMode spec is
        ##   set, choosing the default mode.
        ##
        # volumeBindingMode: ""
    
        ## Subdirectory of Prometheus server data Persistent Volume to mount
        ## Useful if the volume's root directory is not empty
        ##
        subPath: ""
    
      emptyDir:
        ## Prometheus server emptyDir volume size limit
        ##
        sizeLimit: ""
    
      ## Annotations to be added to Prometheus server pods
      ##
      podAnnotations: {}
        # iam.amazonaws.com/role: prometheus
    
      ## Labels to be added to Prometheus server pods
      ##
      podLabels: {}
    
      ## Prometheus AlertManager configuration
      ##
      alertmanagers: []
    
      ## Specify if a Pod Security Policy for node-exporter must be created
      ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
      ##
      podSecurityPolicy:
        annotations: {}
          ## Specify pod annotations
          ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#apparmor
          ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#seccomp
          ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#sysctl
          ##
          # seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
          # seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
          # apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
    
      ## Use a StatefulSet if replicaCount needs to be greater than 1 (see below)
      ##
      replicaCount: 1
    
      ## Annotations to be added to deployment
      ##
      deploymentAnnotations: {}
    
      statefulSet:
        ## If true, use a statefulset instead of a deployment for pod management.
        ## This allows to scale replicas to more than 1 pod
        ##
        enabled: false
    
        annotations: {}
        labels: {}
        podManagementPolicy: OrderedReady
    
        ## Alertmanager headless service to use for the statefulset
        ##
        headless:
          annotations: {}
          labels: {}
          servicePort: 80
          ## Enable gRPC port on service to allow auto discovery with thanos-querier
          gRPC:
            enabled: false
            servicePort: 10901
            # nodePort: 10901
    
      ## Prometheus server readiness and liveness probe initial delay and timeout
      ## Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
      ##
      readinessProbeInitialDelay: 30
      readinessProbePeriodSeconds: 5
      readinessProbeTimeout: 4
      readinessProbeFailureThreshold: 3
      readinessProbeSuccessThreshold: 1
      livenessProbeInitialDelay: 30
      livenessProbePeriodSeconds: 15
      livenessProbeTimeout: 10
      livenessProbeFailureThreshold: 3
      livenessProbeSuccessThreshold: 1
    
      ## Prometheus server resource requests and limits
      ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/
      ##
      ## Resource limits for the Prometheus server
      resources:
        limits:
          cpu: 4
          memory: 30Gi
        requests:
          cpu: 500m
          memory: 5Gi
    
      # Required for use in managed kubernetes clusters (such as AWS EKS) with custom CNI (such as calico),
      # because control-plane managed by AWS cannot communicate with pods' IP CIDR and admission webhooks are not working
      ##
      hostNetwork: false
    
      # When hostNetwork is enabled, you probably want to set this to ClusterFirstWithHostNet
      dnsPolicy: ClusterFirst
    
      ## Vertical Pod Autoscaler config
      ## Ref: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler
      verticalAutoscaler:
        ## If true a VPA object will be created for the controller (either StatefulSet or Deployment, based on above configs)
        enabled: false
        # updateMode: "Auto"
        # containerPolicies:
        # - containerName: 'prometheus-server'
    
      # Custom DNS configuration to be added to prometheus server pods
      dnsConfig: {}
        # nameservers:
        #   - 1.2.3.4
        # searches:
        #   - ns1.svc.cluster-domain.example
        #   - my.dns.search.suffix
        # options:
        #   - name: ndots
        #     value: "2"
      #   - name: edns0
      ## Security context to be added to server pods
      ##
      securityContext:
        runAsUser: 65534
        runAsNonRoot: true
        runAsGroup: 65534
        fsGroup: 65534
    
      service:
        annotations: {}
        labels: {}
        clusterIP: ""
    
        ## List of IP addresses at which the Prometheus server service is available
        ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
        ##
        ## External IP access for the Prometheus server service
        externalIPs:
          - 10.1.0.10
    
        loadBalancerIP: ""
        loadBalancerSourceRanges: []
        servicePort: 9090
        sessionAffinity: None
        type: ClusterIP
    
        ## Enable gRPC port on service to allow auto discovery with thanos-querier
        gRPC:
          enabled: false
          servicePort: 10901
          # nodePort: 10901
    
        ## If using a statefulSet (statefulSet.enabled=true), configure the
        ## service to connect to a specific replica to have a consistent view
        ## of the data.
        statefulsetReplica:
          enabled: false
          replica: 0
    
      ## Prometheus server pod termination grace period
      ##
      terminationGracePeriodSeconds: 300
    
      ## Prometheus data retention period (default if not specified is 15 days)
      ##
      retention: "30d"
    
    pushgateway:
      ## If false, pushgateway will not be installed
      ##
      enabled: true
    
      ## Use an alternate scheduler, e.g. "stork".
      ## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
      ##
      # schedulerName:
    
      ## pushgateway container name
      ##
      name: pushgateway
    
      ## pushgateway container image
      ##
      image:
        repository: prom/pushgateway
        tag: v1.3.1
        pullPolicy: IfNotPresent
    
      ## pushgateway priorityClassName
      ##
      priorityClassName: "monitor-service"
    
      ## Additional pushgateway container arguments
      ##
      ## for example: persistence.file: /data/pushgateway.data
      extraArgs: {}
    
      ## Additional InitContainers to initialize the pod
      ##
      extraInitContainers: []
    
      ingress:
        ## If true, pushgateway Ingress will be created
        ##
        enabled: false
    
        ## pushgateway Ingress annotations
        ##
        annotations: {}
        #   kubernetes.io/ingress.class: nginx
        #   kubernetes.io/tls-acme: 'true'
    
        ## pushgateway Ingress hostnames with optional path
        ## Must be provided if Ingress is enabled
        ##
        hosts: []
        #   - pushgateway.domain.com
        #   - domain.com/pushgateway
    
        ## Extra paths to prepend to every host configuration. This is useful when working with annotation based services.
        extraPaths: []
        # - path: /*
        #   backend:
        #     serviceName: ssl-redirect
        #     servicePort: use-annotation
    
        ## pushgateway Ingress TLS configuration
        ## Secrets must be manually created in the namespace
        ##
        tls: []
        #   - secretName: prometheus-alerts-tls
        #     hosts:
        #       - pushgateway.domain.com
    
      ## Node tolerations for pushgateway scheduling to nodes with taints
      ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
      ##
      ## Tolerations for pushgateway
      tolerations:
        - key: resource
        #   operator: "Equal|Exists"
          value: base
        #   effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
          effect: NoExecute
    
      ## Node labels for pushgateway pod assignment
      ## Ref: https://kubernetes.io/docs/user-guide/node-selection/
      ##
      nodeSelector: {}
    
      ## Node affinity for pushgateway
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/resource
                    operator: In
                    values:
                      - base
    
      ## Annotations to be added to pushgateway pods
      ##
      podAnnotations: {}
    
      ## Labels to be added to pushgateway pods
      ##
      podLabels: {}
    
      ## Specify if a Pod Security Policy for node-exporter must be created
      ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
      ##
      podSecurityPolicy:
        annotations: {}
          ## Specify pod annotations
          ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#apparmor
          ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#seccomp
          ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#sysctl
          ##
          # seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
          # seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
          # apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
    
      replicaCount: 1
    
      ## Annotations to be added to deployment
      ##
      deploymentAnnotations: {}
    
      ## PodDisruptionBudget settings
      ## ref: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
      ##
      podDisruptionBudget:
        enabled: false
        maxUnavailable: 1
    
      ## pushgateway resource requests and limits
      ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/
      ##
      ## Resource limits for pushgateway
      resources:
        limits:
          cpu: 10m
          memory: 32Mi
        requests:
          cpu: 10m
          memory: 32Mi
    
      # Custom DNS configuration to be added to push-gateway pods
      dnsConfig: {}
        # nameservers:
        #   - 1.2.3.4
        # searches:
        #   - ns1.svc.cluster-domain.example
        #   - my.dns.search.suffix
        # options:
        #   - name: ndots
        #     value: "2"
      #   - name: edns0
    
      ## Security context to be added to push-gateway pods
      ##
      securityContext:
        runAsUser: 65534
        runAsNonRoot: true
    
      service:
        annotations:
          prometheus.io/probe: pushgateway
        labels: {}
        clusterIP: ""
    
        ## List of IP addresses at which the pushgateway service is available
        ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
        ##
        externalIPs: []
    
        loadBalancerIP: ""
        loadBalancerSourceRanges: []
        servicePort: 9091
        type: ClusterIP
    
      ## pushgateway Deployment Strategy type
      # strategy:
      #   type: Recreate
    
      persistentVolume:
        ## If true, pushgateway will create/use a Persistent Volume Claim
        ##
        enabled: false
    
        ## pushgateway data Persistent Volume access modes
        ## Must match those of existing PV or dynamic provisioner
        ## Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
        ##
        accessModes:
          - ReadWriteOnce
    
        ## pushgateway data Persistent Volume Claim annotations
        ##
        annotations: {}
    
        ## pushgateway data Persistent Volume existing claim name
        ## Requires pushgateway.persistentVolume.enabled: true
        ## If defined, PVC must be created manually before volume will be bound
        existingClaim: ""
    
        ## pushgateway data Persistent Volume mount root path
        ##
        mountPath: /data
    
        ## pushgateway data Persistent Volume size
        ##
        size: 2Gi
    
        ## pushgateway data Persistent Volume Storage Class
        ## If defined, storageClassName: <storageClass>
        ## If set to "-", storageClassName: "", which disables dynamic provisioning
        ## If undefined (the default) or set to null, no storageClassName spec is
        ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
        ##   GKE, AWS & OpenStack)
        ##
        # storageClass: "-"
    
        ## pushgateway data Persistent Volume Binding Mode
        ## If defined, volumeBindingMode: <volumeBindingMode>
        ## If undefined (the default) or set to null, no volumeBindingMode spec is
        ##   set, choosing the default mode.
        ##
        # volumeBindingMode: ""
    
        ## Subdirectory of pushgateway data Persistent Volume to mount
        ## Useful if the volume's root directory is not empty
        ##
        subPath: ""
    
    
    ## alertmanager ConfigMap entries
    ##
    alertmanagerFiles:
      alertmanager.yml:
        global: {}
          # slack_api_url: ''
    
        receivers:
          - name: default-receiver
            # slack_configs:
            #  - channel: '@you'
            #    send_resolved: true
    
        route:
          group_wait: 10s
          group_interval: 5m
          receiver: default-receiver
          repeat_interval: 1h
    
    ## Prometheus server ConfigMap entries
    ##
    serverFiles:
    
      ## Alerts configuration
      ## Ref: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
      alerting_rules.yml: {}
      # groups:
      #   - name: Instances
      #     rules:
      #       - alert: InstanceDown
      #         expr: up == 0
      #         for: 5m
      #         labels:
      #           severity: page
      #         annotations:
      #           description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
      #           summary: 'Instance {{ $labels.instance }} down'
      ## DEPRECATED DEFAULT VALUE, unless explicitly naming your files, please use alerting_rules.yml
      alerts: {}
    
      ## Records configuration
      ## Ref: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/
      recording_rules.yml: {}
      ## DEPRECATED DEFAULT VALUE, unless explicitly naming your files, please use recording_rules.yml
      rules: {}
    
      prometheus.yml:
        rule_files:
          - /etc/config/recording_rules.yml
          - /etc/config/alerting_rules.yml
        ## Below two files are DEPRECATED and will be removed from this default values file
          - /etc/config/rules
          - /etc/config/alerts
    
        scrape_configs:
          - job_name: prometheus
            static_configs:
              - targets:
                - localhost:9090
    
          # A scrape configuration for running Prometheus on a Kubernetes cluster.
          # This uses separate scrape configs for cluster components (i.e. API server, node)
          # and services to allow each to use different authentication configs.
          #
          # Kubernetes labels will be added as Prometheus labels on metrics via the
          # `labelmap` relabeling action.
    
          # Scrape config for API servers.
          #
          # Kubernetes exposes API servers as endpoints to the default/kubernetes
          # service so this uses `endpoints` role and uses relabelling to only keep
          # the endpoints associated with the default/kubernetes service using the
          # default named port `https`. This works for single API server deployments as
          # well as HA API server deployments.
          - job_name: 'kubernetes-apiservers'
    
            kubernetes_sd_configs:
              - role: endpoints
    
            # Default to scraping over https. If required, just disable this or change to
            # `http`.
            scheme: https
    
            # This TLS & bearer token file config is used to connect to the actual scrape
            # endpoints for cluster components. This is separate to discovery auth
            # configuration because discovery & scraping are two separate concerns in
            # Prometheus. The discovery auth config is automatic if Prometheus runs inside
            # the cluster. Otherwise, more config options have to be provided within the
            # <kubernetes_sd_config>.
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              # If your node certificates are self-signed or use a different CA to the
              # master CA, then disable certificate verification below. Note that
              # certificate verification is an integral part of a secure infrastructure
              # so this should only be disabled in a controlled environment. You can
              # disable certificate verification by uncommenting the line below.
              #
              insecure_skip_verify: true
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    
            # Keep only the default/kubernetes service endpoints for the https port. This
            # will add targets for each API server which Kubernetes adds an endpoint to
            # the default/kubernetes service.
            relabel_configs:
              - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
                action: keep
                regex: default;kubernetes;https
    
          - job_name: 'kubernetes-nodes'
    
            # Default to scraping over https. If required, just disable this or change to
            # `http`.
            scheme: https
    
            # This TLS & bearer token file config is used to connect to the actual scrape
            # endpoints for cluster components. This is separate to discovery auth
            # configuration because discovery & scraping are two separate concerns in
            # Prometheus. The discovery auth config is automatic if Prometheus runs inside
            # the cluster. Otherwise, more config options have to be provided within the
            # <kubernetes_sd_config>.
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              # If your node certificates are self-signed or use a different CA to the
              # master CA, then disable certificate verification below. Note that
              # certificate verification is an integral part of a secure infrastructure
              # so this should only be disabled in a controlled environment. You can
              # disable certificate verification by uncommenting the line below.
              #
              insecure_skip_verify: true
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    
            kubernetes_sd_configs:
              - role: node
    
            relabel_configs:
              - action: labelmap
                regex: __meta_kubernetes_node_label_(.+)
              - target_label: __address__
                replacement: kubernetes.default.svc:443
              - source_labels: [__meta_kubernetes_node_name]
                regex: (.+)
                target_label: __metrics_path__
                replacement: /api/v1/nodes/$1/proxy/metrics
    
    
          - job_name: 'kubernetes-nodes-cadvisor'
    
            # Default to scraping over https. If required, just disable this or change to
            # `http`.
            scheme: https
    
            # This TLS & bearer token file config is used to connect to the actual scrape
            # endpoints for cluster components. This is separate to discovery auth
            # configuration because discovery & scraping are two separate concerns in
            # Prometheus. The discovery auth config is automatic if Prometheus runs inside
            # the cluster. Otherwise, more config options have to be provided within the
            # <kubernetes_sd_config>.
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              # If your node certificates are self-signed or use a different CA to the
              # master CA, then disable certificate verification below. Note that
              # certificate verification is an integral part of a secure infrastructure
              # so this should only be disabled in a controlled environment. You can
              # disable certificate verification with the setting below (left enabled here).
              #
              insecure_skip_verify: true
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    
            kubernetes_sd_configs:
              - role: node
    
            # This configuration works only on kubelet 1.7.3+, as the scrape
            # endpoints for cAdvisor changed. If you are using an older version,
            # change the replacement to
            # replacement: /api/v1/nodes/$1:4194/proxy/metrics
            # more info here https://github.com/coreos/prometheus-operator/issues/633
            relabel_configs:
              - action: labelmap
                regex: __meta_kubernetes_node_label_(.+)
              - target_label: __address__
                replacement: kubernetes.default.svc:443
              - source_labels: [__meta_kubernetes_node_name]
                regex: (.+)
                target_label: __metrics_path__
                replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
    
          # Scrape config for service endpoints.
          #
          # The relabeling allows the actual service scrape endpoint to be configured
          # via the following annotations:
          #
          # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
          # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
          # to set this to `https` & most likely set the `tls_config` of the scrape config.
          # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
          # * `prometheus.io/port`: If the metrics are exposed on a different port to the
          # service then set this appropriately.
          - job_name: 'kubernetes-service-endpoints'
    
            kubernetes_sd_configs:
              - role: endpoints
    
            relabel_configs:
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
                action: keep
                regex: true
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
                action: replace
                target_label: __scheme__
                regex: (https?)
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
                action: replace
                target_label: __metrics_path__
                regex: (.+)
              - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
                action: replace
                target_label: __address__
                regex: ([^:]+)(?::\d+)?;(\d+)
                replacement: $1:$2
              - action: labelmap
                regex: __meta_kubernetes_service_label_(.+)
              - source_labels: [__meta_kubernetes_namespace]
                action: replace
                target_label: kubernetes_namespace
              - source_labels: [__meta_kubernetes_service_name]
                action: replace
                target_label: kubernetes_name
              - source_labels: [__meta_kubernetes_pod_node_name]
                action: replace
                target_label: kubernetes_node
    
          # Scrape config for slow service endpoints; same as above, but with a larger
          # timeout and a larger interval
          #
          # The relabeling allows the actual service scrape endpoint to be configured
          # via the following annotations:
          #
          # * `prometheus.io/scrape-slow`: Only scrape services that have a value of `true`
          # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
          # to set this to `https` & most likely set the `tls_config` of the scrape config.
          # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
          # * `prometheus.io/port`: If the metrics are exposed on a different port to the
          # service then set this appropriately.
          - job_name: 'kubernetes-service-endpoints-slow'
    
            scrape_interval: 5m
            scrape_timeout: 30s
    
            kubernetes_sd_configs:
              - role: endpoints
    
            relabel_configs:
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
                action: keep
                regex: true
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
                action: replace
                target_label: __scheme__
                regex: (https?)
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
                action: replace
                target_label: __metrics_path__
                regex: (.+)
              - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
                action: replace
                target_label: __address__
                regex: ([^:]+)(?::\d+)?;(\d+)
                replacement: $1:$2
              - action: labelmap
                regex: __meta_kubernetes_service_label_(.+)
              - source_labels: [__meta_kubernetes_namespace]
                action: replace
                target_label: kubernetes_namespace
              - source_labels: [__meta_kubernetes_service_name]
                action: replace
                target_label: kubernetes_name
              - source_labels: [__meta_kubernetes_pod_node_name]
                action: replace
                target_label: kubernetes_node
    
          - job_name: 'prometheus-pushgateway'
            honor_labels: true
    
            kubernetes_sd_configs:
              - role: service
    
            relabel_configs:
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
                action: keep
                regex: pushgateway
    
          # Example scrape config for probing services via the Blackbox Exporter.
          #
          # The relabeling allows the actual service scrape endpoint to be configured
          # via the following annotations:
          #
          # * `prometheus.io/probe`: Only probe services that have a value of `true`
          - job_name: 'kubernetes-services'
    
            metrics_path: /probe
            params:
              module: [http_2xx]
    
            kubernetes_sd_configs:
              - role: service
    
            relabel_configs:
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
                action: keep
                regex: true
              - source_labels: [__address__]
                target_label: __param_target
              - target_label: __address__
                replacement: blackbox
              - source_labels: [__param_target]
                target_label: instance
              - action: labelmap
                regex: __meta_kubernetes_service_label_(.+)
              - source_labels: [__meta_kubernetes_namespace]
                target_label: kubernetes_namespace
              - source_labels: [__meta_kubernetes_service_name]
                target_label: kubernetes_name
    
          # Example scrape config for pods
          #
          # The relabeling allows the actual pod scrape endpoint to be configured via the
          # following annotations:
          #
          # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
          # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
          # to set this to `https` & most likely set the `tls_config` of the scrape config.
          # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
          # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the default of `9102`.
          - job_name: 'kubernetes-pods'
    
            kubernetes_sd_configs:
              - role: pod
    
            relabel_configs:
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
                action: keep
                regex: true
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
                action: replace
                regex: (https?)
                target_label: __scheme__
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
                action: replace
                target_label: __metrics_path__
                regex: (.+)
              - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
                action: replace
                regex: ([^:]+)(?::\d+)?;(\d+)
                replacement: $1:$2
                target_label: __address__
              - action: labelmap
                regex: __meta_kubernetes_pod_label_(.+)
              - source_labels: [__meta_kubernetes_namespace]
                action: replace
                target_label: kubernetes_namespace
              - source_labels: [__meta_kubernetes_pod_name]
                action: replace
                target_label: kubernetes_pod_name
              - source_labels: [__meta_kubernetes_pod_phase]
                regex: Pending|Succeeded|Failed
                action: drop
    
          # Example scrape config for pods which should be scraped slower. A useful
          # example would be stackdriver-exporter, which queries an API on every scrape of the pod
          #
          # The relabeling allows the actual pod scrape endpoint to be configured via the
          # following annotations:
          #
          # * `prometheus.io/scrape-slow`: Only scrape pods that have a value of `true`
          # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
          # to set this to `https` & most likely set the `tls_config` of the scrape config.
          # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
          # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the default of `9102`.
          - job_name: 'kubernetes-pods-slow'
    
            scrape_interval: 5m
            scrape_timeout: 30s
    
            kubernetes_sd_configs:
              - role: pod
    
            relabel_configs:
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape_slow]
                action: keep
                regex: true
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
                action: replace
                regex: (https?)
                target_label: __scheme__
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
                action: replace
                target_label: __metrics_path__
                regex: (.+)
              - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
                action: replace
                regex: ([^:]+)(?::\d+)?;(\d+)
                replacement: $1:$2
                target_label: __address__
              - action: labelmap
                regex: __meta_kubernetes_pod_label_(.+)
              - source_labels: [__meta_kubernetes_namespace]
                action: replace
                target_label: kubernetes_namespace
              - source_labels: [__meta_kubernetes_pod_name]
                action: replace
                target_label: kubernetes_pod_name
              - source_labels: [__meta_kubernetes_pod_phase]
                regex: Pending|Succeeded|Failed
                action: drop
    
    # adds additional scrape configs to prometheus.yml
    # must be a string so you have to add a | after extraScrapeConfigs:
    # example adds prometheus-blackbox-exporter scrape config
    extraScrapeConfigs:
      # - job_name: 'prometheus-blackbox-exporter'
      #   metrics_path: /probe
      #   params:
      #     module: [http_2xx]
      #   static_configs:
      #     - targets:
      #       - https://example.com
      #   relabel_configs:
      #     - source_labels: [__address__]
      #       target_label: __param_target
      #     - source_labels: [__param_target]
      #       target_label: instance
      #     - target_label: __address__
      #       replacement: prometheus-blackbox-exporter:9115
    
    # Adds option to add alert_relabel_configs to avoid duplicate alerts in alertmanager
    # useful in H/A prometheus with different external labels but the same alerts
    alertRelabelConfigs:
      # alert_relabel_configs:
      # - source_labels: [dc]
      #   regex: (.+)\d+
      #   target_label: dc
    
    networkPolicy:
      ## Enable creation of NetworkPolicy resources.
      ##
      enabled: false
    
    # Force namespace of namespaced resources
    forceNamespace: null
  5. Access the Prometheus & Alertmanager web UIs
    <root@PROD-K8S-CP1 ~># kubectl describe svc prometheus-prometheus-server 
    Name:              prometheus-prometheus-server
    Namespace:         default
    Labels:            app=prometheus
                       app.kubernetes.io/managed-by=Helm
                       chart=prometheus-14.6.0
                       component=prometheus-server
                       heritage=Helm
                       release=prometheus
    Annotations:       meta.helm.sh/release-name: prometheus
                       meta.helm.sh/release-namespace: default
    Selector:          app=prometheus,component=prometheus-server,release=prometheus
    Type:              ClusterIP
    IP:                10.12.0.20
    External IPs:      10.1.0.10
    Port:              http  9090/TCP
    TargetPort:        9090/TCP
    Endpoints:         172.21.3.157:9090
    Session Affinity:  None
    Events:            <none>
    <root@PROD-K8S-CP1 ~># kubectl describe svc prometheus-alertmanager 
    Name:              prometheus-alertmanager
    Namespace:         default
    Labels:            app=prometheus
                       app.kubernetes.io/managed-by=Helm
                       chart=prometheus-14.6.0
                       component=alertmanager
                       heritage=Helm
                       release=prometheus
    Annotations:       meta.helm.sh/release-name: prometheus
                       meta.helm.sh/release-namespace: default
    Selector:          app=prometheus,component=alertmanager,release=prometheus
    Type:              ClusterIP
    IP:                10.12.0.13
    External IPs:      10.1.0.11
    Port:              http  9093/TCP
    TargetPort:        9093/TCP
    Endpoints:         172.21.3.251:9093
    Session Affinity:  None
    Events:            <none>
  6. Access: http://10.1.0.10:9090 (the Service's external IP, on the port shown in the describe output above)
  7. Fix the container time zone (edit directly in Lens) by mounting the host's /etc/localtime:
                - name: localtime
                  mountPath: /etc/localtime
    
    ----------------------------------------------
    
            - name: localtime
              hostPath:
                path: /etc/localtime
                type: ''
  8. Be sure to adjust the Prometheus server's rolling-update strategy and set maxUnavailable to 1, so the old pod releases the TSDB lock before the new one starts; otherwise you will hit a DB lock error
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 1
  9. If you check the metrics a node exposes manually:
    <root@PRE-K8S-CP1 ~># curl -ik https://10.1.0.233:6443/metrics
    HTTP/1.1 403 Forbidden
    Cache-Control: no-cache, private
    Content-Type: application/json
    X-Content-Type-Options: nosniff
    Date: Wed, 02 Jun 2021 03:40:55 GMT
    Content-Length: 240
    
    {
      "kind": "Status",
      "apiVersion": "v1",
      "metadata": {},
      "status": "Failure",
      "message": "forbidden: User \"system:anonymous\" cannot get path \"/metrics\"",
      "reason": "Forbidden",
      "details": {},
      "code": 403
    }
    
    
    The fix is below. By default, Kubernetes rejects unauthenticated access to cluster resources, so access has to be granted explicitly (a narrower alternative follows the command):
    
    kubectl create clusterrolebinding prometheus-admin --clusterrole cluster-admin --user system:anonymous
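
    Note that binding cluster-admin to system:anonymous hands full cluster control
    to unauthenticated users. A narrower sketch that only opens the /metrics path
    addressed by the 403 above (metrics-reader and metrics-anonymous are
    placeholder names):

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: metrics-reader            # placeholder name
    rules:
    - nonResourceURLs:                # /metrics is a non-resource URL
      - /metrics
      verbs:
      - get
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: metrics-anonymous         # placeholder name
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: metrics-reader
    subjects:
    - apiGroup: rbac.authorization.k8s.io
      kind: User
      name: system:anonymous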

Prometheus Configuration File


relabel_configs action values explained

  • replace: the default. Matches the joined source_labels values against regex and sets the target label from replacement, which may reference regex capture groups
  • keep: drops targets whose joined source_labels values do NOT match regex
  • drop: drops targets whose joined source_labels values DO match regex
  • labelmap: matches all label names against regex, then copies each matching label's value to a new label named by the replacement group references (${1}, ${2}, …)
  • labeldrop: removes labels whose names match regex
  • labelkeep: removes labels whose names do NOT match regex
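
A minimal sketch combining keep and labelmap (the label values in the comments are illustrative, not taken from a live cluster):

  relabel_configs:
  # The joined value here is "default;kubernetes"; any target whose joined
  # source label values do not match the regex is dropped.
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
    separator: ;
    regex: default;kubernetes
    action: keep
  # Copies e.g. __meta_kubernetes_node_label_zone="cn-east-1" to zone="cn-east-1".
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)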

Default configuration file

job_name generally identifies what kind of metrics a job collects. The job_name entries below collect the following:

  1. API server performance metrics; the configuration is explained below
    - job_name: kubernetes-apiservers
      honor_timestamps: true
      scrape_interval: 1m
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: https
      authorization:
        type: Bearer
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      follow_redirects: true
      relabel_configs:   ## label rewriting rules
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]  ## the source labels whose values are used as the match condition
        separator: ; ## the separator used to join the source label values
        regex: default;kubernetes;https ## matched against the joined source label values, e.g. __meta_kubernetes_service_name="kubernetes", __meta_kubernetes_namespace="default", __meta_kubernetes_endpoint_port_name="https"; only the endpoint whose port is named "https" in the default/kubernetes service matches. An "endpoint" here is a scrape target, not to be confused with a Kubernetes Endpoints object
        replacement: $1 ## the default; unused by keep
        action: keep ## drop targets whose source labels do not match regex, and keep those that do
      kubernetes_sd_configs:
      - role: endpoints
        follow_redirects: true
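
    A quick way to see what this job scrapes: the API server serves its own
    metrics at /metrics, and kubectl can fetch them with your kubeconfig
    credentials (pipe through head to trim the output):

    kubectl get --raw /metrics | head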
  2. Node performance metrics, job_name kubernetes-nodes
    - job_name: kubernetes-nodes
      honor_timestamps: true
      scrape_interval: 1m
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: https
      authorization:
        type: Bearer
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      follow_redirects: true
      relabel_configs:
      - separator: ;
        regex: __meta_kubernetes_node_label_(.+)
        replacement: $1
        action: labelmap ## keep the ".+" capture of __meta_kubernetes_node_label_(.+) as a new label name
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: kubernetes.default.svc:443
        action: replace ## add the __address__ label and set its value to replacement
      - source_labels: [__meta_kubernetes_node_name]
        separator: ;
        regex: (.+) ## matches any source label value (in practice, the node name)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics
        action: replace ## the $1 above expands to the node name captured by regex
      kubernetes_sd_configs:
      - role: node
        follow_redirects: true
    
    Taken together, this block rewrites the scrape target: the first rule replaces the node IP in __address__ with the in-cluster name kubernetes.default.svc:443, and the second rewrites the default /metrics path to /api/v1/nodes/$1/proxy/metrics. The final scrape URL is therefore https://kubernetes.default.svc/api/v1/nodes/pre-k8s-cp3/proxy/metrics, which the check below reproduces.
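
    To reproduce that URL by hand, let the API server proxy the request (the
    node name is taken from the example above):

    kubectl get --raw /api/v1/nodes/pre-k8s-cp3/proxy/metrics | head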
  3. Node container (cAdvisor) metrics, job_name kubernetes-nodes-cadvisor
    - job_name: kubernetes-nodes-cadvisor
      honor_timestamps: true
      scrape_interval: 10s
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: https
      authorization:
        type: Bearer
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      follow_redirects: true
      relabel_configs:
      - separator: ;
        regex: __meta_kubernetes_node_label_(.+)
        replacement: $1
        action: labelmap
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: kubernetes.default.svc:443
        action: replace
      - source_labels: [__meta_kubernetes_node_name]
        separator: ;
        regex: (.+)
        target_label: __metrics_path__
        # the only difference from the node metrics job above: the URI path
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
        action: replace
      kubernetes_sd_configs:
      - role: node
        follow_redirects: true
  4. Monitoring a Service's endpoints (configured on the Service itself via annotations)
    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus-kube-state-metrics
      namespace: default
      selfLink: /api/v1/namespaces/default/services/prometheus-kube-state-metrics
      uid: 161fff0c-fffe-4561-90ba-0bec0608fbe4
      resourceVersion: '23976299'
      creationTimestamp: '2021-06-01T06:51:21Z'
      labels:
        app.kubernetes.io/instance: prometheus
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: kube-state-metrics
        helm.sh/chart: kube-state-metrics-3.1.0
      annotations:
        meta.helm.sh/release-name: prometheus
        meta.helm.sh/release-namespace: default
        prometheus.io/scrape: 'true'

     

    - job_name: kubernetes-service-endpoints
      honor_timestamps: true
      scrape_interval: 1m
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: http
      follow_redirects: true
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        separator: ;
        regex: "true"
        replacement: $1
        action: keep ## this rule matters most: it decides which endpoints the job scrapes. It keeps endpoints whose __meta_kubernetes_service_annotation_prometheus_io_scrape value is "true" and drops every endpoint without that annotation value
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        separator: ;
        regex: (https?)
        target_label: __scheme__
        replacement: $1
        action: replace  ## replace the value of __scheme__ with the regex capture
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        separator: ;
        regex: (.+)
        target_label: __metrics_path__
        replacement: $1
        action: replace ## assign the value of the source label __meta_kubernetes_service_annotation_prometheus_io_path to __metrics_path__
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        separator: ;
        regex: ([^:]+)(?::\d+)?;(\d+)
        target_label: __address__
        replacement: $1:$2
        action: replace ## rebuild __address__ as host:port from the matched __address__ and __meta_kubernetes_service_annotation_prometheus_io_port labels, per the regex captures
      - separator: ;
        regex: __meta_kubernetes_service_label_(.+)
        replacement: $1
        action: labelmap ## keep the ".+" capture as a newly added label
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: kubernetes_namespace
        replacement: $1
        action: replace ## copy the label value, as above
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: kubernetes_name
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_node_name]
        separator: ;
        regex: (.*)
        target_label: kubernetes_node
        replacement: $1
        action: replace
      kubernetes_sd_configs:
      - role: endpoints
        follow_redirects: true
  5. Service probing configuration via the blackbox http_2xx module (a sketch of the module definition follows this job)
    - job_name: kubernetes-services
      honor_timestamps: true
      params:
        module:
        - http_2xx ## the blackbox module to use
      scrape_interval: 1m
      scrape_timeout: 10s
      metrics_path: /probe
      scheme: http
      follow_redirects: true
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        separator: ;
        regex: "true"
        replacement: $1
        action: keep ## keep endpoints that match __meta_kubernetes_service_annotation_prometheus_io_probe and drop endpoints without that annotation
      - source_labels: [__address__]
        separator: ;
        regex: (.*)
        target_label: __param_target
        replacement: $1
        action: replace ## assign the value captured by regex to __param_target
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: blackbox
        action: replace
      - source_labels: [__param_target]
        separator: ;
        regex: (.*)
        target_label: instance
        replacement: $1
        action: replace ## after the relabeling above, the instance label ends up holding the original __address__ value
      - separator: ;
        regex: __meta_kubernetes_service_label_(.+)
        replacement: $1
        action: labelmap
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: kubernetes_namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: kubernetes_name
        replacement: $1
        action: replace
      kubernetes_sd_configs:
      - role: service
        follow_redirects: true
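
    For reference, a minimal http_2xx module as it appears in the Blackbox
    Exporter's example blackbox.yml (an empty valid_status_codes list defaults
    to accepting any 2xx response):

    modules:
      http_2xx:
        prober: http
        timeout: 5s
        http:
          valid_status_codes: []   # defaults to 2xx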
  6. Pod monitoring configuration
    - job_name: kubernetes-pods
      honor_timestamps: true
      scrape_interval: 1m
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: http
      follow_redirects: true
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        separator: ;
        regex: "true"
        replacement: $1
        action: keep # keep only pods whose prometheus.io/scrape annotation is "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        separator: ;
        regex: (https?)
        target_label: __scheme__
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        separator: ;
        regex: (.+)
        target_label: __metrics_path__
        replacement: $1
        action: replace
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        separator: ;
        regex: ([^:]+)(?::\d+)?;(\d+)
        target_label: __address__
        replacement: $1:$2
        action: replace
      - separator: ;
        regex: __meta_kubernetes_pod_label_(.+)
        replacement: $1
        action: labelmap
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: kubernetes_namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_name]
        separator: ;
        regex: (.*)
        target_label: kubernetes_pod_name
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_phase]
        separator: ;
        regex: Pending|Succeeded|Failed
        replacement: $1
        action: drop
      kubernetes_sd_configs:
      - role: pod
        follow_redirects: true

 JVM and RabbitMQ Monitoring


  1. JVM monitoring (the prometheus.io annotations must be set in deployment.yaml), as follows; a quick way to verify the endpoint appears after the job definition

    spec:
      replicas: 1
      selector:
        matchLabels:
          app: pre-common-gateway
          component: spring
          part-of: pre
          tier: backend
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: pre-common-gateway
            component: spring
            part-of: pre
            tier: backend
          annotations: # the added monitoring annotations
            prometheus.io/port: '8888'
            prometheus.io/scrape: 'true'

     

      #############  JVM monitoring   ##############
      - job_name: kubernetes-pods-jvm
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - action: keep
            regex: true
            source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape
          - action: replace
            regex: (https?)
            source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scheme
            target_label: __scheme__
          - action: replace
            regex: (.+)
            source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
            target_label: __metrics_path__
          - action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - action: replace
            source_labels: 
            - __meta_kubernetes_pod_container_name
            target_label: kubernetes_container_name
          - action: replace
            source_labels:
              - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_name
            target_label: kubernetes_pod_name
          - action: drop
            regex: Pending|Succeeded|Failed
            source_labels:
              - __meta_kubernetes_pod_phase
        scrape_interval: 10s
        scrape_timeout: 10s
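
    To confirm a pod actually serves what this job expects, port-forward and
    curl it. /metrics is assumed because no prometheus.io/path annotation is
    set, and the deployment name and namespace are inferred from the labels
    above:

    kubectl -n pre port-forward deploy/pre-common-gateway 8888:8888
    # ...then, in a second shell:
    curl -s http://127.0.0.1:8888/metrics | head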
  2. RabbitMQ monitoring
    • rabbitmq-exporter (a third-party Prometheus exporter)
      https://github.com/kbudde/rabbitmq_exporter
      Focuses on RabbitMQ service-level state: queues, channels, exchanges, connections, per-queue memory consumption, and node memory/disk usage
    • rabbitmq-prometheus (ships with RabbitMQ)
      https://www.rabbitmq.com/prometheus.html
      Focuses on RabbitMQ's infrastructure: metadata, Erlang process resource usage, memory configuration, CPU usage, open-file allocation; in short, lower-level monitoring
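
      The built-in plugin ships with RabbitMQ 3.8+ and serves metrics on port
      15692 once enabled, which is why the Services in this section expose that
      port:

      rabbitmq-plugins enable rabbitmq_prometheus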

    The configuration below is for reference only; this is its state after creation
    apiVersion: v1
    kind: Service
    metadata:
      name: pre-rabbitmq-monitor
      namespace: pre
      annotations:
        prometheus.io/scrape: rabbitmq
    spec:
      ports:
        - name: rabbitmq-exporter
          protocol: TCP
          port: 9419
          targetPort: 9419
        - name: rabbitmq-prometheus-port
          protocol: TCP
          port: 15692
          targetPort: 15692
      selector:
        app: pre-rabbitmq
      clusterIP: 10.11.0.90
      type: ClusterIP
      sessionAffinity: None

     

      #############  RabbitMQ monitoring   ##############
      - job_name: kubernetes-service-rabbitmq
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          # keep only the custom monitoring annotation value "rabbitmq"
          - action: keep
            regex: rabbitmq
            source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scrape
          - action: replace
            regex: (https?)
            source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scheme
            target_label: __scheme__
          - action: replace
            regex: (.+)
            source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_path
            target_label: __metrics_path__
          # drop the ports that should not be monitored
          - action: drop
            regex: (5672|15672)
            source_labels:
              - __meta_kubernetes_pod_container_port_number
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - action: labelmap
            regex: __meta_kubernetes_pod_label_statefulset_kubernetes_io_(.+)
          - action: replace
            source_labels:
              - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels:
              - __meta_kubernetes_service_name
            target_label: kubernetes_name
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_node_name
            target_label: kubernetes_node
  3. Reference rabbitmq-exporter.yaml configuration (edit directly in Lens)
            - name: rabbitmq-exporter
              image: hub.qiangyun.com/rabbitmq-exporter
              ports:
                - name: mq-monitor
                  containerPort: 9419
                  protocol: TCP
              env:
                - name: RABBIT_USER
                  value: guest
                - name: RABBIT_PASSWORD
                  value: guest
                - name: RABBIT_CAPABILITIES
                  value: bert
              resources:
                limits:
                  cpu: 500m
                  memory: 1Gi
              livenessProbe:
                httpGet:
                  path: /metrics
                  port: 9419
                  scheme: HTTP
                initialDelaySeconds: 60
                timeoutSeconds: 15
                periodSeconds: 60
                successThreshold: 1
                failureThreshold: 3
              readinessProbe:
                httpGet:
                  path: /metrics
                  port: 9419
                  scheme: HTTP
                initialDelaySeconds: 20
                timeoutSeconds: 10
                periodSeconds: 60
                successThreshold: 1
                failureThreshold: 3
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              imagePullPolicy: IfNotPresent
  4. rabbitmq-monitor-service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        # note: this annotation must correspond to the Prometheus job configuration below
        prometheus.io/scrape: rabbitmq
      name: rabbitmq-monitor
      namespace: prod
    spec:
      ports:
        - name: rabbitmq-exporter
          port: 9419
          protocol: TCP
          targetPort: 9419
        - name: rabbitmq-prometheus-port
          port: 15692
          protocol: TCP
          targetPort: 15692
      selector:
        app: rabbitmq
      type: ClusterIP 

Final Reference Configuration


  • The complete Prometheus configuration (validated with promtool after the listing)

    global:
      evaluation_interval: 15s
      scrape_interval: 15s
      scrape_timeout: 10s
    rule_files:
    - /etc/config/recording_rules.yml
    - /etc/config/alerting_rules.yml
    - /etc/config/rules
    - /etc/config/alerts
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090

    ##############  Kubernetes apiserver monitoring  ##############
    - job_name: kubernetes-apiservers
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true

    ##############  Kubernetes node metrics monitoring  ##############
    - job_name: kubernetes-nodes
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - replacement: kubernetes.default.svc:443
        target_label: __address__
      - regex: (.+)
        replacement: /api/v1/nodes/$1/proxy/metrics
        source_labels:
        - __meta_kubernetes_node_name
        target_label: __metrics_path__
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true

    ##############  Kubernetes per-node container (cAdvisor) metrics monitoring  ##############
    - job_name: kubernetes-nodes-cadvisor
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - replacement: kubernetes.default.svc:443
        target_label: __address__
      - regex: (.+)
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
        source_labels:
        - __meta_kubernetes_node_name
        target_label: __metrics_path__
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
    
    ##############  Kubernetes service endpoints monitoring  ##############
    - job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_node_name
        target_label: kubernetes_node
    
    ##############  Kubernetes service endpoints RabbitMQ monitoring  ##############
    - job_name: kubernetes-service-rabbitmq
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: rabbitmq
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: keep
        regex: (15692|9419)
        source_labels:
        - __meta_kubernetes_pod_container_port_number
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_node_name
        target_label: kubernetes_node
    
    ##############  Slow service endpoints monitoring (longer scrape interval/timeout)  ##############
    - job_name: kubernetes-service-endpoints-slow
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_node_name
        target_label: kubernetes_node
      scrape_interval: 5m
      scrape_timeout: 30s

    ##############  Prometheus pushgateway monitoring  ##############
    - job_name: prometheus-pushgateway
      honor_labels: true
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      - action: keep
        regex: pushgateway
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
    
    ##############  Kubernetes service http_2xx probe monitoring  ##############
    - job_name: kubernetes-services
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module:
        - http_2xx
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
      - source_labels:
        - __address__
        target_label: __param_target
      - replacement: blackbox
        target_label: __address__
      - source_labels:
        - __param_target
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name
    
    ##############  Custom monitoring for user application Pods  ##############
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
      - action: drop
        regex: Pending|Succeeded|Failed
        source_labels:
        - __meta_kubernetes_pod_phase
    
    ##############  JVM monitoring for user application Pods  ##############
    - job_name: kubernetes-pods-jvm
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_jvm_scrape
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_jvm_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
      - action: drop
        regex: Pending|Succeeded|Failed
        source_labels:
        - __meta_kubernetes_pod_phase
    
    ##############  Slow pod monitoring (longer scrape interval/timeout)  ##############
    - job_name: kubernetes-pods-slow
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
      - action: drop
        regex: Pending|Succeeded|Failed
        source_labels:
        - __meta_kubernetes_pod_phase
      scrape_interval: 5m
      scrape_timeout: 30s
    alerting:
      alertmanagers:
      - kubernetes_sd_configs:
          - role: pod
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace]
          regex: default
          action: keep
        - source_labels: [__meta_kubernetes_pod_label_app]
          regex: prometheus
          action: keep
        - source_labels: [__meta_kubernetes_pod_label_component]
          regex: alertmanager
          action: keep
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_probe]
          regex: .*
          action: keep
        - source_labels: [__meta_kubernetes_pod_container_port_number]
          regex: "9093"
          action: keep
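
    With the full file assembled, promtool (bundled in the official Prometheus
    image) can validate it before a reload; the path matches the rule_files
    entries above, and the pod/container names may differ in your release:

    kubectl exec -it <prometheus-server-pod> -c prometheus-server -- \
      promtool check config /etc/config/prometheus.yml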
  • The same configuration in ConfigMap form
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-server
      namespace: default
      labels:
        app: prometheus
        app.kubernetes.io/managed-by: Helm
        chart: prometheus-14.6.0
        component: server
        heritage: Helm
        release: prometheus
    data:
      alerting_rules.yml: |
        {}
      alerts: |
        {}
      prometheus.yml: |
        global:
          evaluation_interval: 15s
          scrape_interval: 15s
          scrape_timeout: 10s
        rule_files:
        - /etc/config/recording_rules.yml
        - /etc/config/alerting_rules.yml
        - /etc/config/rules
        - /etc/config/alerts
        scrape_configs:
        - job_name: prometheus
          static_configs:
          - targets:
            - localhost:9090

        ##############  Kubernetes apiserver monitoring  ##############
        - job_name: kubernetes-apiservers
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - action: keep
            regex: default;kubernetes;https
            source_labels:
            - __meta_kubernetes_namespace
            - __meta_kubernetes_service_name
            - __meta_kubernetes_endpoint_port_name
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true

        ##############  Kubernetes node metrics monitoring  ##############
        - job_name: kubernetes-nodes
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - replacement: kubernetes.default.svc:443
            target_label: __address__
          - regex: (.+)
            replacement: /api/v1/nodes/$1/proxy/metrics
            source_labels:
            - __meta_kubernetes_node_name
            target_label: __metrics_path__
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true

        ##############  Kubernetes per-node container (cAdvisor) metrics monitoring  ##############
        - job_name: kubernetes-nodes-cadvisor
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - replacement: kubernetes.default.svc:443
            target_label: __address__
          - regex: (.+)
            replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
            source_labels:
            - __meta_kubernetes_node_name
            target_label: __metrics_path__
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
    
        ##############  Kubernetes service endpoints monitoring  ##############
        - job_name: kubernetes-service-endpoints
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - action: keep
            regex: true
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_scrape
          - action: replace
            regex: (https?)
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_scheme
            target_label: __scheme__
          - action: replace
            regex: (.+)
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_path
            target_label: __metrics_path__
          - action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            source_labels:
            - __address__
            - __meta_kubernetes_service_annotation_prometheus_io_port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_service_name
            target_label: kubernetes_name
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_node_name
            target_label: kubernetes_node
    
        ##############  Kubernetes service endpoints RabbitMQ monitoring  ##############
        - job_name: kubernetes-service-rabbitmq
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - action: keep
            regex: rabbitmq
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_scrape
          - action: keep
            regex: (15692|9419)
            source_labels:
            - __meta_kubernetes_pod_container_port_number
          - action: replace
            regex: (https?)
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_scheme
            target_label: __scheme__
          - action: replace
            regex: (.+)
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_path
            target_label: __metrics_path__
          - action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            source_labels:
            - __address__
            - __meta_kubernetes_service_annotation_prometheus_io_port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_service_name
            target_label: kubernetes_name
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_node_name
            target_label: kubernetes_node
    
        ##############  Slow service endpoints monitoring (longer scrape interval/timeout)  ##############
        - job_name: kubernetes-service-endpoints-slow
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - action: keep
            regex: true
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
          - action: replace
            regex: (https?)
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_scheme
            target_label: __scheme__
          - action: replace
            regex: (.+)
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_path
            target_label: __metrics_path__
          - action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            source_labels:
            - __address__
            - __meta_kubernetes_service_annotation_prometheus_io_port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_service_name
            target_label: kubernetes_name
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_node_name
            target_label: kubernetes_node
          scrape_interval: 5m
          scrape_timeout: 30s

        ##############  Prometheus pushgateway monitoring  ##############
        - job_name: prometheus-pushgateway
          honor_labels: true
          kubernetes_sd_configs:
          - role: service
          relabel_configs:
          - action: keep
            regex: pushgateway
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_probe
    
        ##############  Kubernetes service http_2xx probe monitoring  ##############
        - job_name: kubernetes-services
          kubernetes_sd_configs:
          - role: service
          metrics_path: /probe
          params:
            module:
            - http_2xx
          relabel_configs:
          - action: keep
            regex: true
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_probe
          - source_labels:
            - __address__
            target_label: __param_target
          - replacement: blackbox
            target_label: __address__
          - source_labels:
            - __param_target
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels:
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - source_labels:
            - __meta_kubernetes_service_name
            target_label: kubernetes_name
    
        ##############  Custom monitoring for user application Pods  ##############
        - job_name: kubernetes-pods
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - action: keep
            regex: true
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_scrape
          - action: replace
            regex: (https?)
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_scheme
            target_label: __scheme__
          - action: replace
            regex: (.+)
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_path
            target_label: __metrics_path__
          - action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            source_labels:
            - __address__
            - __meta_kubernetes_pod_annotation_prometheus_io_port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_name
            target_label: kubernetes_pod_name
          - action: drop
            regex: Pending|Succeeded|Failed
            source_labels:
            - __meta_kubernetes_pod_phase
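        # Usage sketch: a Pod opts in to this job via annotations; Prometheus maps
        # annotation prometheus.io/<key> to the __meta_..._annotation_prometheus_io_<key>
        # labels matched above, e.g.
        #   metadata:
        #     annotations:
        #       prometheus.io/scrape: "true"
        #       prometheus.io/scheme: "http"
        #       prometheus.io/path: "/metrics"
        #       prometheus.io/port: "8080"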
    
        ##############  Kubernetes user application Pod JVM monitoring config ##############
        - job_name: kubernetes-pods-jvm
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - action: keep
            regex: true
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_jvm_scrape
          - action: replace
            regex: (https?)
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_scheme
            target_label: __scheme__
          - action: replace
            regex: (.+)
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_path
            target_label: __metrics_path__
          - action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            source_labels:
            - __address__
            - __meta_kubernetes_pod_annotation_prometheus_io_jvm_port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - action: replace
            source_labels: 
            - __meta_kubernetes_pod_container_name
            target_label: kubernetes_container_name
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_name
            target_label: kubernetes_pod_name
          - action: drop
            regex: Pending|Succeeded|Failed
            source_labels:
            - __meta_kubernetes_pod_phase
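        # Usage sketch (assumption: the JVM metrics are exposed by something like a
        # JMX exporter sidecar; port 9404 is only an example). Prometheus normalizes
        # non-alphanumeric characters in annotation names to "_", so
        # prometheus.io/jvm-scrape becomes the ..._jvm_scrape meta label above, e.g.
        #   metadata:
        #     annotations:
        #       prometheus.io/jvm-scrape: "true"
        #       prometheus.io/jvm-port: "9404"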
    
        ##############  Kubernetes Pod slow-scrape monitoring config (5m interval) ##############
        - job_name: kubernetes-pods-slow
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - action: keep
            regex: true
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
          - action: replace
            regex: (https?)
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_scheme
            target_label: __scheme__
          - action: replace
            regex: (.+)
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_path
            target_label: __metrics_path__
          - action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            source_labels:
            - __address__
            - __meta_kubernetes_pod_annotation_prometheus_io_port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_name
            target_label: kubernetes_pod_name
          - action: drop
            regex: Pending|Succeeded|Failed
            source_labels:
            - __meta_kubernetes_pod_phase
          scrape_interval: 5m
          scrape_timeout: 30s
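        # Usage sketch: this job is the same pipeline as kubernetes-pods but scraped
        # only every 5 minutes, for pods whose metrics are expensive to collect;
        # pods opt in via e.g.
        #   metadata:
        #     annotations:
        #       prometheus.io/scrape-slow: "true"
        #       prometheus.io/port: "8080"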
        alerting:
          alertmanagers:
          - kubernetes_sd_configs:
              - role: pod
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            relabel_configs:
            - source_labels: [__meta_kubernetes_namespace]
              regex: default
              action: keep
            - source_labels: [__meta_kubernetes_pod_label_app]
              regex: prometheus
              action: keep
            - source_labels: [__meta_kubernetes_pod_label_component]
              regex: alertmanager
              action: keep
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_probe]
              regex: .*
              action: keep
            - source_labels: [__meta_kubernetes_pod_container_port_number]
              regex: "9093"
              action: keep
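        # Verification sketch (assumes the chart's default pod labels): list the
        # Alertmanager pods this block should discover with
        #   kubectl get pods -n default -l app=prometheus,component=alertmanager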
      recording_rules.yml: |
        {}
      rules: |
        {}
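
recording_rules.yml and rules ship empty ({}). A minimal sketch of what could go under rules (the group name, alert name, expression, and threshold below are hypothetical examples, not part of the chart):

    rules: |
      groups:
      - name: node.rules
        rules:
        - alert: NodeDown
          expr: up{job="kubernetes-nodes"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} unreachable for 5 minutes"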