k8s prometheus api
rancher api:
https://10.10.10.90/k8s/clusters/c-x985z/api/v1/namespaces/cattle-prometheus/services/expose-kubernetes-metrics:8080/proxy/
https://10.11.30.119/k8s/clusters/c-k8598/api/v1/namespaces/ilsuat/pods/http:ils-aiservices-6bfbcbff85-s64hx:5000/proxy/ai/
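Both URLs above follow the same pattern: the Rancher host, a cluster ID, then the regular Kubernetes proxy path for a service or pod. A minimal sketch of that pattern, assuming it holds for other clusters as well (the function name and parameters are hypothetical, chosen only for illustration):

```python
def rancher_proxy_url(host, cluster_id, namespace, kind, name, port,
                      path="", scheme=""):
    """Build a Kubernetes proxy URL routed through Rancher.

    kind   -- "services" or "pods"
    scheme -- optional "http"/"https" prefix placed before the target name
    """
    target = f"{name}:{port}"
    if scheme:
        target = f"{scheme}:{target}"
    return (
        f"https://{host}/k8s/clusters/{cluster_id}/api/v1/namespaces/"
        f"{namespace}/{kind}/{target}/proxy/{path.lstrip('/')}"
    )


# Rebuilds the two URLs listed above from their parts.
metrics_url = rancher_proxy_url(
    "10.10.10.90", "c-x985z", "cattle-prometheus",
    "services", "expose-kubernetes-metrics", 8080)
ai_url = rancher_proxy_url(
    "10.11.30.119", "c-k8598", "ilsuat",
    "pods", "ils-aiservices-6bfbcbff85-s64hx", 5000,
    path="ai/", scheme="http")
```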
Prometheus config example 1: (scrape cAdvisor on all nodes)
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - static_configs:
    - targets:
rule_files:
scrape_configs:
- job_name: alexcadvisor
  static_configs:
  - targets: ["10.10.10.68:10250","10.10.10.95:10250","10.10.10.96:10250","10.10.10.211:10250","10.10.10.212:10250","10.10.10.217:10250","10.10.10.216:10250","10.10.10.18:10250","10.10.10.19:10250","10.10.10.20:10250"]
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  bearer_token_file: /root/prometheus-2.22.0.linux-amd64/token
  tls_config:
    insecure_skip_verify: true
Example 2: scrape node_exporter on all nodes
- job_name: alexnodeexporter
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  bearer_token_file: /root/prometheus-2.22.0.linux-amd64/token
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    api_server: 'https://10.10.10.68:6443'
    namespaces:
      names:
      - cattle-prometheus
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /root/prometheus-2.22.0.linux-amd64/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: exporter-node
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_chart]
    separator: ;
    regex: exporter-node-0.0.1
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_monitoring_coreos_com]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_release]
    separator: ;
    regex: cluster-monitoring
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: metrics
    action: replace
  - source_labels: [__meta_kubernetes_pod_host_ip]
    separator: ;
    regex: (.+)
    target_label: host_ip
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_node_name]
    separator: ;
    regex: (.+)
    target_label: node
    replacement: $1
    action: replace
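To see what the keep and replace rules above do to a discovered target, here is a rough simulation. It is a simplified sketch, not Prometheus's implementation: it handles only the `keep` and `replace` actions, uses Python's `re` rather than Go's RE2, and the target's label values are made up for illustration. It does mirror two behaviours of Prometheus relabeling: the regex is anchored on both ends, and the source label values are joined with the separator before matching.

```python
import re


def apply_relabel(labels, rules):
    """Apply keep/replace relabel rules to a label dict.

    Returns the new label dict, or None if a keep rule drops the target.
    """
    labels = dict(labels)
    for rule in rules:
        sep = rule.get("separator", ";")
        value = sep.join(labels.get(n, "") for n in rule.get("source_labels", []))
        # Prometheus anchors the regex, so a full match is required.
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        action = rule.get("action", "replace")
        if action == "keep":
            if match is None:
                return None
        elif action == "replace" and match is not None:
            # Translate $1 / ${1} into Python's \g<1> group reference.
            template = re.sub(r"\$\{?(\w+)\}?", r"\\g<\1>",
                              rule.get("replacement", "$1"))
            new_value = match.expand(template)
            if new_value:
                labels[rule["target_label"]] = new_value
            else:
                labels.pop(rule["target_label"], None)
    return labels


KIND = "__meta_kubernetes_endpoint_address_target_kind"
NAME = "__meta_kubernetes_endpoint_address_target_name"

# Hypothetical discovered target; these values are invented for the example.
target = {
    "__meta_kubernetes_service_label_app": "exporter-node",
    KIND: "Pod",
    NAME: "exporter-node-abc12",
}
rules = [
    {"source_labels": ["__meta_kubernetes_service_label_app"],
     "regex": "exporter-node", "action": "keep"},
    {"source_labels": [KIND, NAME], "regex": "Node;(.*)",
     "target_label": "node", "replacement": "${1}"},
    {"source_labels": [KIND, NAME], "regex": "Pod;(.*)",
     "target_label": "pod", "replacement": "${1}"},
    {"regex": "(.*)", "target_label": "endpoint", "replacement": "metrics"},
]
result = apply_relabel(target, rules)
```

With this input the `Node;(.*)` rule does not match, so only the `pod` label is set, and the rule without source labels always writes `endpoint="metrics"`.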
curl -k https://10.10.10.68:6443/api/v1/nodes -H "Authorization: Bearer token-mj6c7:8m2zlxp5qhr25hh8dtzlrl8cn472wws94m9ntbkggqt8x9sfg7q4w4"
curl -k https://10.10.10.68:6443/api/v1/nodes --cacert kube-ca.pem --cert alex.pem --key alexkey.pem
curl -k https://10.10.10.68:6443/api/v1/nodes --cacert kube-ca.pem --cert kube-node.pem --key kube-node-key.pem
curl -k https://10.10.10.68:6443/api/v1/nodes --cacert kube-ca.pem --cert kube-controller-manager.pem --key kube-controller-manager-key.pem
curl -k https://10.10.10.68:6443/api/v1/nodes --cacert kube-ca.pem --cert kube-scheduler.pem --key kube-scheduler-key.pem
curl -k https://10.10.10.96:10250/metrics --cacert kube-ca.pem --cert kube-service-account-token.pem --key kube-service-account-token-key.pem
curl -k https://10.10.10.68:10250/metrics/cadvisor --cacert kube-ca.pem -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJjYXR0bGUtcHJvbWV0aGV1cyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJjbHVzdGVyLW1vbml0b3JpbmctdG9rZW4tbjlmbjIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiY2x1c3Rlci1tb25pdG9yaW5nIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiOTcxNjY1NmQtNWRmMS0xMWVhLTk1YzktMDAxNTVkMGEzNjAxIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50OmNhdHRsZS1wcm9tZXRoZXVzOmNsdXN0ZXItbW9uaXRvcmluZyJ9.KcO1v8qfhCeXRT3zSG2lckl3bqzofFFhM2pZEum02u3PS7m4anQw6ldP806ncme21JH0Hq0SrjscFxvrkDaKnOPR3eX2dqoQxyXyN-t7jJ9B1YHAAOanLVYfUiXUm7EJMekgsAVac9aueAIwzfFtkERK-kvHYsHvSC0nOIBUxSjZs4YfhZbf3ys-tyZB5sspM5_P_P54NQQJD2B-sn-3VJuFWTE2Wy_pa3D6kdjywG_9_T5yBHFXAQ2dneLOcqfUUoox2q-4gRWslv0Dziy1DwwAQiZA6uMZYkIKN_ngueynoxKg4d2OIVYGiHqzzBFllAKysvKIZ7uVPs4RLkqPqA"
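The bearer token in the last command has the three-part shape of a JWT, so its claims (issuer, namespace, service account) can be inspected locally when a request is rejected. A minimal sketch, assuming the token is a standard JWT; it decodes the payload only and does not verify the signature, and the demo token below is built on the spot rather than taken from these notes:

```python
import base64
import json


def decode_jwt_payload(token):
    """Return the unverified claims of a JWT as a dict."""
    payload_b64 = token.split(".")[1]
    # base64url data in JWTs omits padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def _b64url(obj):
    """Encode a dict the way JWT segments are encoded (helper for the demo)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Synthetic token for demonstration; the claim values are hypothetical.
demo_claims = {
    "iss": "kubernetes/serviceaccount",
    "sub": "system:serviceaccount:demo-ns:demo-sa",
}
demo_token = ".".join([_b64url({"alg": "none"}), _b64url(demo_claims), "sig"])
claims = decode_jwt_payload(demo_token)
```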
container_memory_cache{namespace="local-ils",image!~".+pause.+",container_name=~"ils-system"}
Sort by memory usage, descending:
sort_desc(container_memory_usage_bytes{namespace="local-boss",image!~".+pause.+",namespace=~".+boss.*" ,image=~".+",container!~"filebeat"})
Memory used per pod:
container_memory_working_set_bytes{container!='POD',namespace=~'.*local.*|.*rc.*'}
Pod network traffic:
sum (rate (container_network_receive_bytes_total{image!="",name=~"^k8s_.*",node=~"^$Node$"}[5m])) by (pod_name)  # received
- sum (rate (container_network_transmit_bytes_total{image!="",name=~"^k8s_.*",node=~"^$Node$"}[5m])) by (pod_name)  # sent
Number of running pods:
sum(kube_pod_status_phase{namespace=~".*", phase="Running"})
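Queries like the ones above can also be run outside the web UI through the Prometheus HTTP API's instant-query endpoint, `/api/v1/query`. The sketch below only builds the request URL so the encoding of braces, quotes, and spaces is visible. The base URL `http://localhost:9090` is an assumption (the default Prometheus port), not something stated in these notes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def instant_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


promql = 'sum(kube_pod_status_phase{namespace=~".*", phase="Running"})'
# Assumed address: a Prometheus server listening on its default port.
url = instant_query_url("http://localhost:9090", promql)

# Round trip: decoding the query string gives back the original expression.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

The resulting URL can be passed to `curl` or `urllib.request.urlopen`; the response is JSON with the samples under `data.result`.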
申通 (Shentong), one year, valid until 202112130:
token-4xncr:lkpddtfmmkpskqm52mpwn94cg68vxmrtfd7s8sllr84d5z87c5q97n
Grafana configuration:
Final dashboard:
Two variables:
Memory usage: ram
Real-time CPU load:
Grafana CPU utilization:
100 - (avg by (instance) (rate(node_cpu_seconds_total{ mode="idle"}[5m])) * 100)
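The expression above reads as: take the per-core rate of idle CPU seconds (a value between 0 and 1 for each core), average it per instance, and subtract the result from 100 as a percentage. A worked example of that arithmetic, with made-up idle rates for a four-core node:

```python
def cpu_usage_percent(idle_rates):
    """Mirror 100 - avg(rate(idle)) * 100 for one instance.

    idle_rates -- per-core idle rate, in idle seconds per second (0..1)
    """
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100


# Hypothetical node: two cores fully idle, two cores half busy.
usage = cpu_usage_percent([1.0, 0.5, 0.5, 1.0])  # average idle is 0.75
```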
Pod CPU usage:
sum (rate (container_cpu_usage_seconds_total{image!="",name=~"^k8s_.*",namespace=~"$stNs"}[5m])) by (pod_name)
alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
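The `web.hook` receiver posts alerts as JSON to `http://127.0.0.1:5001/`, so something has to listen there. A minimal receiver sketch using only the standard library, assuming the usual Alertmanager webhook payload with a top-level `alerts` list whose entries carry `status` and `labels`. The handler and function names are hypothetical, and the sample payload is invented for the demonstration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(payload):
    """Return one summary line per alert in a webhook payload."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        lines.append(
            f"[{alert.get('status', 'unknown')}] "
            f"{labels.get('alertname', '?')} on {labels.get('instance', '?')}"
        )
    return lines


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body sent by Alertmanager and print a summary.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for line in summarize(payload):
            print(line)
        self.send_response(200)
        self.end_headers()


def serve(host="127.0.0.1", port=5001):
    """Listen on the address configured in webhook_configs (blocks forever)."""
    HTTPServer((host, port), HookHandler).serve_forever()


# Invented sample payload, trimmed to the fields used above.
sample = {"alerts": [{"status": "firing",
                      "labels": {"alertname": "demo", "instance": "10.10.10.68:9100"}}]}
summary = summarize(sample)
```

Calling `serve()` starts the listener. It is left out of the module body so that importing the file does not block.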
prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093
rule_files:
- alexrules.yml
scrape_configs:
- job_name: alexcadvisor
  static_configs:
  - targets: ["10.10.10.68:10250","10.10.10.95:10250","10.10.10.96:10250","10.10.10.211:10250","10.10.10.212:10250","10.10.10.217:10250","10.10.10.216:10250","10.10.10.18:10250","10.10.10.19:10250","10.10.10.20:10250"]
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  bearer_token_file: /root/prometheus-2.22.0.linux-amd64/token
  tls_config:
    insecure_skip_verify: true
- job_name: alexnodeexporter
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  bearer_token_file: /root/prometheus-2.22.0.linux-amd64/token
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    api_server: 'https://10.10.10.68:6443'
    namespaces:
      names:
      - cattle-prometheus
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /root/prometheus-2.22.0.linux-amd64/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: exporter-node
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_chart]
    separator: ;
    regex: exporter-node-0.0.1
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_monitoring_coreos_com]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_release]
    separator: ;
    regex: cluster-monitoring
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: metrics
    action: replace
  - source_labels: [__meta_kubernetes_pod_host_ip]
    separator: ;
    regex: (.+)
    target_label: host_ip
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_node_name]
    separator: ;
    regex: (.+)
    target_label: node
    replacement: $1
    action: replace
alexrules.yml
groups:
- name: alexexample
  rules:
  - alert: node cpu higher than 12%
    expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{ mode="idle"}[5m])) * 100) > 12
    for: 10s
    labels:
      severity: critical
    annotations:
      description: "CPU usage of the {{$labels.job}} component on {{$labels.instance}} exceeds 12%"
      value: "{{ $value }}%"
      threshold: "12%"