k8s prometheus api

rancher api:

https://10.10.10.90/k8s/clusters/c-x985z/api/v1/namespaces/cattle-prometheus/services/expose-kubernetes-metrics:8080/proxy/

https://10.11.30.119/k8s/clusters/c-k8598/api/v1/namespaces/ilsuat/pods/http:ils-aiservices-6bfbcbff85-s64hx:5000/proxy/ai/
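Both URLs follow the Kubernetes API server proxy paths, reached through Rancher's `/k8s/clusters/<cluster-id>` prefix. A service is addressed as `services/<name>:<port>/proxy/<path>`, and a pod as `pods/<scheme>:<name>:<port>/proxy/<path>`. The small sketch below builds such URLs. The helper names are hypothetical and only reproduce the two examples above.

```python
# Sketch: helper names are hypothetical, not part of the Rancher or Kubernetes API.

def rancher_k8s_base(rancher_host: str, cluster_id: str) -> str:
    """Kubernetes API base URL as proxied by Rancher (cluster_id such as "c-x985z")."""
    return f"https://{rancher_host}/k8s/clusters/{cluster_id}"


def service_proxy_url(rancher_host, cluster_id, namespace, service, port, path=""):
    # .../api/v1/namespaces/<ns>/services/<name>:<port>/proxy/<path>
    return (f"{rancher_k8s_base(rancher_host, cluster_id)}/api/v1/namespaces/"
            f"{namespace}/services/{service}:{port}/proxy/{path}")


def pod_proxy_url(rancher_host, cluster_id, namespace, pod, port, path="", scheme="http"):
    # .../api/v1/namespaces/<ns>/pods/<scheme>:<pod>:<port>/proxy/<path>
    return (f"{rancher_k8s_base(rancher_host, cluster_id)}/api/v1/namespaces/"
            f"{namespace}/pods/{scheme}:{pod}:{port}/proxy/{path}")


if __name__ == "__main__":
    print(service_proxy_url("10.10.10.90", "c-x985z", "cattle-prometheus",
                            "expose-kubernetes-metrics", 8080))
```

Requests to these URLs still need credentials, for example an `Authorization: Bearer <rancher-token>` header like the one in the curl commands below.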

 

Prometheus config, example 1 (scrape cAdvisor metrics from all nodes):

global:
  scrape_interval:     15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
  - static_configs:
    - targets:

rule_files:

scrape_configs:
- job_name: alexcadvisor
  static_configs:
    - targets: ["10.10.10.68:10250","10.10.10.95:10250","10.10.10.96:10250","10.10.10.211:10250","10.10.10.212:10250","10.10.10.217:10250","10.10.10.216:10250","10.10.10.18:10250","10.10.10.19:10250","10.10.10.20:10250"]
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  bearer_token_file: /root/prometheus-2.22.0.linux-amd64/token
  tls_config:
    insecure_skip_verify: true
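The job above scrapes each kubelet's `/metrics/cadvisor` endpoint on port 10250 over HTTPS. It authenticates with a bearer token and skips certificate verification. The sketch below is a rough standard-library equivalent of one such scrape, useful for checking that the token works before handing it to Prometheus. The token path is taken from the config. The function names are hypothetical.

```python
# Sketch: hypothetical helpers mimicking one scrape of the "alexcadvisor" job.
import ssl
import urllib.request

TOKEN_FILE = "/root/prometheus-2.22.0.linux-amd64/token"  # same file as bearer_token_file


def build_cadvisor_request(node: str, token: str) -> urllib.request.Request:
    """GET https://<node>/metrics/cadvisor with a bearer token; node is "ip:10250"."""
    return urllib.request.Request(
        f"https://{node}/metrics/cadvisor",
        headers={"Authorization": f"Bearer {token}"},
    )


def insecure_context() -> ssl.SSLContext:
    # Equivalent of tls_config.insecure_skip_verify: true (no CA or hostname check)
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def scrape(node: str, timeout: float = 10.0) -> str:
    # timeout mirrors scrape_timeout: 10s
    with open(TOKEN_FILE) as f:
        token = f.read().strip()
    req = build_cadvisor_request(node, token)
    with urllib.request.urlopen(req, context=insecure_context(), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    text = scrape("10.10.10.68:10250")
    wanted = [l for l in text.splitlines() if l.startswith("container_memory_working_set_bytes")]
    print("\n".join(wanted[:5]))
```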

 

 

Example 2: scrape node_exporter on all nodes (a job to add under scrape_configs):

- job_name: alexnodeexporter 
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  bearer_token_file: /root/prometheus-2.22.0.linux-amd64/token
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    api_server: 'https://10.10.10.68:6443'
    namespaces:
      names:
      - cattle-prometheus
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /root/prometheus-2.22.0.linux-amd64/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: exporter-node
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_chart]
    separator: ;
    regex: exporter-node-0.0.1
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_monitoring_coreos_com]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_release]
    separator: ;
    regex: cluster-monitoring
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: metrics
    action: replace
  - source_labels: [__meta_kubernetes_pod_host_ip]
    separator: ;
    regex: (.+)
    target_label: host_ip
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_node_name]
    separator: ;
    regex: (.+)
    target_label: node
    replacement: $1
    action: replace
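The relabel_configs above work in two stages. First they keep only the node-exporter endpoints of the cluster-monitoring release. Then they copy Kubernetes metadata into `node`, `namespace`, `pod` and similar labels. The sketch below is a simplified model of the two actions used, `keep` and `replace`, and it is not Prometheus's own code. Prometheus joins the `source_labels` values with `separator` and matches `regex` anchored at both ends. On a match, `replace` writes the expanded `replacement` into `target_label`, and an empty result deletes that label. Other actions and Go/Python regex differences are ignored, and the helper names are hypothetical.

```python
# Sketch: hypothetical model of Prometheus relabeling (keep/replace only).
import re
from typing import Optional


def _expand(match: re.Match, replacement: str) -> str:
    # Translate Go-style $1 / ${1} references into Python's \g<1> syntax
    py = re.sub(r"\$\{(\w+)\}|\$(\w+)",
                lambda m: r"\g<" + (m.group(1) or m.group(2)) + ">", replacement)
    return match.expand(py)


def relabel(labels: dict, rules: list) -> Optional[dict]:
    """Apply rules in order; return the new label set, or None if the target is dropped."""
    labels = dict(labels)
    for rule in rules:
        sep = rule.get("separator", ";")
        value = sep.join(labels.get(n, "") for n in rule.get("source_labels", []))
        m = re.fullmatch(rule.get("regex", "(.*)"), value)  # anchored like Prometheus
        action = rule.get("action", "replace")
        if action == "keep":
            if m is None:
                return None
        elif action == "replace":
            if m is not None:
                new = _expand(m, rule.get("replacement", "$1"))
                if new:
                    labels[rule["target_label"]] = new
                else:
                    labels.pop(rule["target_label"], None)
        else:
            raise ValueError(f"action not modelled here: {action}")
    return labels


# A subset of the rules from the alexnodeexporter job
RULES = [
    {"source_labels": ["__meta_kubernetes_service_label_app"], "regex": "exporter-node", "action": "keep"},
    {"source_labels": ["__meta_kubernetes_endpoint_port_name"], "regex": "metrics", "action": "keep"},
    {"source_labels": ["__meta_kubernetes_endpoint_address_target_kind",
                       "__meta_kubernetes_endpoint_address_target_name"],
     "regex": "Node;(.*)", "target_label": "node", "replacement": "${1}", "action": "replace"},
    {"source_labels": ["__meta_kubernetes_namespace"], "regex": "(.*)",
     "target_label": "namespace", "replacement": "$1", "action": "replace"},
    {"regex": "(.*)", "target_label": "endpoint", "replacement": "metrics", "action": "replace"},
]
```

Feeding it the discovered labels of one endpoint shows which targets survive and which labels they end up with.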

 

 

curl -k https://10.10.10.68:6443/api/v1/nodes -H "Authorization: Bearer token-mj6c7:8m2zlxp5qhr25hh8dtzlrl8cn472wws94m9ntbkggqt8x9sfg7q4w4"

curl -k https://10.10.10.68:6443/api/v1/nodes --cacert kube-ca.pem --cert alex.pem --key alexkey.pem

curl -k https://10.10.10.68:6443/api/v1/nodes --cacert kube-ca.pem --cert kube-node.pem --key kube-node-key.pem

curl -k https://10.10.10.68:6443/api/v1/nodes --cacert kube-ca.pem --cert kube-controller-manager.pem --key kube-controller-manager-key.pem

curl -k https://10.10.10.68:6443/api/v1/nodes --cacert kube-ca.pem --cert kube-scheduler.pem --key kube-scheduler-key.pem

curl -k https://10.10.10.96:10250/metrics --cacert kube-ca.pem --cert kube-service-account-token.pem --key kube-service-account-token-key.pem

 

curl -k https://10.10.10.68:10250/metrics/cadvisor --cacert kube-ca.pem -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJjYXR0bGUtcHJvbWV0aGV1cyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJjbHVzdGVyLW1vbml0b3JpbmctdG9rZW4tbjlmbjIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiY2x1c3Rlci1tb25pdG9yaW5nIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiOTcxNjY1NmQtNWRmMS0xMWVhLTk1YzktMDAxNTVkMGEzNjAxIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50OmNhdHRsZS1wcm9tZXRoZXVzOmNsdXN0ZXItbW9uaXRvcmluZyJ9.KcO1v8qfhCeXRT3zSG2lckl3bqzofFFhM2pZEum02u3PS7m4anQw6ldP806ncme21JH0Hq0SrjscFxvrkDaKnOPR3eX2dqoQxyXyN-t7jJ9B1YHAAOanLVYfUiXUm7EJMekgsAVac9aueAIwzfFtkERK-kvHYsHvSC0nOIBUxSjZs4YfhZbf3ys-tyZB5sspM5_P_P54NQQJD2B-sn-3VJuFWTE2Wy_pa3D6kdjywG_9_T5yBHFXAQ2dneLOcqfUUoox2q-4gRWslv0Dziy1DwwAQiZA6uMZYkIKN_ngueynoxKg4d2OIVYGiHqzzBFllAKysvKIZ7uVPs4RLkqPqA"
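The bearer token in the last command is a Kubernetes ServiceAccount token, which is a JWT. Its middle segment is base64url-encoded JSON. Decoding it without verifying the signature shows which ServiceAccount the token belongs to, which helps when a scrape returns 401 or 403. For the token above, the `sub` claim appears to be `system:serviceaccount:cattle-prometheus:cluster-monitoring`. The function name in this minimal sketch is hypothetical.

```python
# Sketch: hypothetical helper to inspect (not verify) a JWT bearer token.
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload (second dot-separated segment) of a JWT without verification."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64 padding JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))


if __name__ == "__main__":
    import sys
    claims = jwt_claims(sys.argv[1])
    # ServiceAccount tokens identify themselves as system:serviceaccount:<namespace>:<name>
    print(claims.get("sub"))
```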

 

 

container_memory_cache{namespace="local-ils",image!~".+pause.+",container_name=~"ils-system"}

 

Sorted by memory usage, descending:

sort_desc(container_memory_usage_bytes{namespace="local-boss",image!~".+pause.+",namespace=~".+boss.*" ,image=~".+",container!~"filebeat"})
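These PromQL expressions can also be run outside Grafana through Prometheus's HTTP API (`GET /api/v1/query?query=...`). The sketch below builds the query URL. For an instant-vector response, it sorts the series by value, largest first, as `sort_desc` does. It assumes Prometheus listens on `localhost:9090` (the default port). The helper names are hypothetical.

```python
# Sketch: hypothetical helpers around the Prometheus HTTP API instant query.
import json
import urllib.parse
import urllib.request

PROM = "http://localhost:9090"  # assumption: default Prometheus listen address


def query_url(expr: str, base: str = PROM) -> str:
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def sorted_vector(resp: dict) -> list:
    """(labels, value) pairs from an instant-vector response, largest value first."""
    if resp.get("status") != "success" or resp["data"]["resultType"] != "vector":
        raise ValueError("expected a successful instant-vector result")
    rows = [(s["metric"], float(s["value"][1])) for s in resp["data"]["result"]]
    return sorted(rows, key=lambda r: r[1], reverse=True)


def run(expr: str) -> list:
    with urllib.request.urlopen(query_url(expr), timeout=10) as r:
        return sorted_vector(json.load(r))


if __name__ == "__main__":
    expr = ('container_memory_usage_bytes{namespace="local-boss",image!~".+pause.+",'
            'namespace=~".+boss.*",image=~".+",container!~"filebeat"}')
    for labels, value in run(expr)[:10]:
        # older cAdvisor versions label the pod as pod_name instead of pod
        print(labels.get("pod", labels.get("pod_name")), f"{value / 2**20:.1f} MiB")
```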

 

Memory used by each pod:

container_memory_working_set_bytes{container!='POD',namespace=~'.*local.*|.*rc.*'}
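`container_memory_working_set_bytes` is usually preferred over `container_memory_usage_bytes` for "how much memory a pod uses", because the kubelet's memory eviction signal is based on the working set. cAdvisor derives it as total usage minus inactive file cache, floored at zero. The sketch below shows that arithmetic. The function names are hypothetical.

```python
# Sketch: hypothetical helpers showing how the working set relates to raw usage.

def working_set_bytes(usage: int, inactive_file: int) -> int:
    """cAdvisor-style working set: usage minus inactive page cache, never negative."""
    return max(usage - inactive_file, 0)


def to_mib(n: int) -> float:
    return n / (1024 * 1024)


if __name__ == "__main__":
    # 500 MiB used, of which 200 MiB is reclaimable inactive file cache
    print(to_mib(working_set_bytes(500 * 2**20, 200 * 2**20)))
```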

 

Pod network traffic:

Receive:

sum (rate (container_network_receive_bytes_total{image!="",name=~"^k8s_.*",node=~"^$Node$"}[5m])) by (pod_name)

Transmit (the leading minus negates the values so Grafana draws them below the x-axis):

- sum (rate (container_network_transmit_bytes_total{image!="",name=~"^k8s_.*",node=~"^$Node$"}[5m])) by (pod_name)
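Both network queries rely on `rate(...[5m])`, the per-second increase of a counter over the window. The sketch below is a simplified version. Like Prometheus, it treats a drop in value as a counter reset, for example after a container restart. Unlike Prometheus's `rate()`, it does not extrapolate to the window edges, so its numbers can differ slightly. The function name is hypothetical.

```python
# Sketch: hypothetical simplified rate() without Prometheus's edge extrapolation.

def simple_rate(samples: list) -> float:
    """Per-second increase from (timestamp_seconds, counter_value) samples, oldest first."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # a decrease means the counter reset; the new value is the increase since the reset
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


if __name__ == "__main__":
    # received bytes sampled every 60s
    print(simple_rate([(0, 0), (60, 6000), (120, 12000)]), "bytes/s")
```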

 

Number of running pods:

sum(kube_pod_status_phase{namespace=~".*", phase="Running"})

 

申通 (Shentong), valid for one year until 202112130:

token-4xncr:lkpddtfmmkpskqm52mpwn94cg68vxmrtfd7s8sllr84d5z87c5q97n

 

 

Grafana configuration:

Final result (screenshot):

Two variables:

Memory usage: ram

Real-time CPU load:

Grafana CPU utilization:

100 - (avg by (instance) (rate(node_cpu_seconds_total{ mode="idle"}[5m])) * 100)
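The expression works per instance. It averages the per-second idle time over all CPUs, which gives a fraction between 0 and 1 because each CPU accumulates at most one idle second per second. It converts that average to a percentage and subtracts it from 100. The sketch below applies the same arithmetic to two snapshots of the `node_cpu_seconds_total{mode="idle"}` counters. The function name and snapshot layout are hypothetical.

```python
# Sketch: hypothetical helper reproducing 100 - avg(idle rate) * 100 for one instance.

def cpu_utilization(idle_t0: dict, idle_t1: dict, seconds: float) -> float:
    """Busy percentage from two idle-counter snapshots taken `seconds` apart.

    idle_t0 / idle_t1 map a cpu label ("0", "1", ...) to node_cpu_seconds_total{mode="idle"}.
    """
    idle_rates = [(idle_t1[cpu] - idle_t0[cpu]) / seconds for cpu in idle_t0]
    return 100 - (sum(idle_rates) / len(idle_rates)) * 100


if __name__ == "__main__":
    # over 5 minutes, cpu0 was idle 270s and cpu1 was idle 240s
    print(cpu_utilization({"0": 1000.0, "1": 2000.0}, {"0": 1270.0, "1": 2240.0}, 300))
```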

 

Pod CPU usage:

sum (rate (container_cpu_usage_seconds_total{image!="",name=~"^k8s_.*",namespace=~"$stNs"}[5m])) by (pod_name)

 

 

alertmanager.yml

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
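With this configuration, Alertmanager POSTs a JSON notification to `http://127.0.0.1:5001/`, so something must listen on that address. The sketch below is a minimal standard-library receiver that prints one line per alert. It reads the `status`, `alerts[].labels` and `alerts[].annotations` fields of Alertmanager's webhook payload. The handler and helper names are hypothetical, and a real receiver would forward the alerts somewhere useful.

```python
# Sketch: hypothetical minimal receiver for Alertmanager's webhook_configs.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(payload: dict) -> list:
    """One human-readable line per alert in a webhook notification."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        desc = alert.get("annotations", {}).get("description", "")
        status = alert.get("status", payload.get("status"))
        line = f'[{status}] {labels.get("alertname")} {labels.get("instance", "")} {desc}'
        lines.append(line.strip())
    return lines


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for line in summarize(payload):
            print(line, flush=True)
        self.send_response(200)  # a 2xx tells Alertmanager the delivery succeeded
        self.end_headers()


if __name__ == "__main__":
    # Same address as webhook_configs.url in alertmanager.yml
    HTTPServer(("127.0.0.1", 5001), WebhookHandler).serve_forever()
```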

 

prometheus.yml

global:
  scrape_interval:     15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

rule_files:
  - alexrules.yml 

scrape_configs:
- job_name: alexcadvisor
  static_configs:
    - targets: ["10.10.10.68:10250","10.10.10.95:10250","10.10.10.96:10250","10.10.10.211:10250","10.10.10.212:10250","10.10.10.217:10250","10.10.10.216:10250","10.10.10.18:10250","10.10.10.19:10250","10.10.10.20:10250"]
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  bearer_token_file: /root/prometheus-2.22.0.linux-amd64/token
  tls_config:
    insecure_skip_verify: true
- job_name: alexnodeexporter 
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  bearer_token_file: /root/prometheus-2.22.0.linux-amd64/token
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    api_server: 'https://10.10.10.68:6443'
    namespaces:
      names:
      - cattle-prometheus
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /root/prometheus-2.22.0.linux-amd64/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: exporter-node
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_chart]
    separator: ;
    regex: exporter-node-0.0.1
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_monitoring_coreos_com]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_release]
    separator: ;
    regex: cluster-monitoring
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: metrics
    action: replace
  - source_labels: [__meta_kubernetes_pod_host_ip]
    separator: ;
    regex: (.+)
    target_label: host_ip
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_node_name]
    separator: ;
    regex: (.+)
    target_label: node
    replacement: $1
    action: replace

 

alexrules.yml

groups:
- name: alexexample
  rules:
  - alert: node cpu higher than 12%
    expr: 100 - (avg by (instance, job) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 12
    for: 10s
    labels:
      severity: critical
    annotations:
      description: "CPU usage of the {{$labels.job}} component on {{$labels.instance}} exceeds 12%"
      value: "{{ $value }}%"
      threshold: "12%"
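`for: 10s` means the expression must stay true for at least 10s before the alert moves from pending to firing. Rules are evaluated every `evaluation_interval` (15s here). In practice, this alert therefore fires on the second consecutive evaluation that returns a result. The sketch below is a simplified model of that state machine, which ignores resolved-alert handling. The function name is hypothetical.

```python
# Sketch: hypothetical model of the pending -> firing transition driven by `for:`.

def alert_states(results: list, interval: float, hold: float) -> list:
    """State after each evaluation: "inactive", "pending" or "firing".

    results[i] is True if the rule expression returned a sample at evaluation i;
    evaluations are `interval` seconds apart and `hold` is the rule's `for:` duration.
    """
    states, active_at = [], None
    for i, ok in enumerate(results):
        now = i * interval
        if not ok:
            active_at = None  # condition cleared: the pending timer restarts
            states.append("inactive")
            continue
        if active_at is None:
            active_at = now
        states.append("firing" if now - active_at >= hold else "pending")
    return states


if __name__ == "__main__":
    # evaluation_interval: 15s, for: 10s
    print(alert_states([True, True, False, True], interval=15, hold=10))
```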

 

posted @ 2020-12-24 09:25 alexhe