Ops Anti-Burnout Series - Cloud-Native Monitoring Platform - 07. Prometheus Monitoring Practice on Kubernetes
kubernetes_sd_config

First, a look at Prometheus's built-in Kubernetes service discovery, `kubernetes_sd_config`. It retrieves scrape targets from the Kubernetes REST API and always stays synchronized with the cluster state.

- One of the following `role` types can be configured to discover targets:

node
The node role discovers one target per cluster node, with the address defaulting to the Kubelet's HTTP port. The target address defaults to the first existing address of the Kubernetes node object, in the address type order `NodeInternalIP`, `NodeExternalIP`, `NodeLegacyHostIP`, and `NodeHostName`.
- Available meta labels (the `instance` label of a discovered node is set to the node name as retrieved from the API server):
  - `__meta_kubernetes_node_name`: the name of the node object.
  - `__meta_kubernetes_node_provider_id`: the cloud provider's ID for the node object.
  - `__meta_kubernetes_node_label_<labelname>`: each label from the node object, with any unsupported characters converted to underscores.
  - `__meta_kubernetes_node_labelpresent_<labelname>`: `true` for each label from the node object, with any unsupported characters converted to underscores.
  - `__meta_kubernetes_node_annotation_<annotationname>`: each annotation from the node object.
  - `__meta_kubernetes_node_annotationpresent_<annotationname>`: `true` for each annotation from the node object.
  - `__meta_kubernetes_node_address_<address_type>`: the first address of each node address type, if it exists.
Scraping the kubelet
```yaml
- job_name: 'kubernetes-kubelet'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
```
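The relabeling above routes every scrape through the API server proxy (`kubernetes.default.svc:443`). If you prefer to scrape each kubelet directly on its own address instead, a minimal sketch looks like the job below; it assumes the kubelets serve HTTPS on their default port and that the Prometheus service account token is accepted by the kubelet (otherwise the TLS and RBAC settings need adjusting):

```yaml
# Hedged alternative sketch: scrape kubelets directly instead of via the API server proxy.
- job_name: 'kubernetes-kubelet-direct'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # The kubelet's serving certificate is often not signed by the cluster CA;
    # skipping verification is a common (if less secure) workaround.
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
```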
service

The service role discovers a target for each service port of each service. This is generally useful for blackbox monitoring of a service. The address is set to the Kubernetes DNS name of the service and the respective service port.
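Because the discovered address is already the service's DNS name plus port, a common pattern is to hand it to a blackbox exporter as a probe target. A minimal sketch, assuming a blackbox exporter is reachable in-cluster at `blackbox-exporter.monitor:9115` (both that address and probing every service unconditionally are assumptions to adapt); the meta labels it uses are listed right below:

```yaml
# Hedged sketch: probe every service's cluster DNS name through a blackbox exporter.
- job_name: 'kubernetes-services-probe'
  metrics_path: /probe
  params:
    module: [http_2xx]
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  # The discovered address (service DNS name + port) becomes the probe target.
  - source_labels: [__address__]
    target_label: __param_target
  # The scrape itself goes to the blackbox exporter (address is an assumption).
  - target_label: __address__
    replacement: blackbox-exporter.monitor:9115
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: service
```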
- Available meta labels:
  - `__meta_kubernetes_namespace`: the namespace of the service object.
  - `__meta_kubernetes_service_annotation_<annotationname>`: each annotation from the service object.
  - `__meta_kubernetes_service_annotationpresent_<annotationname>`: `true` for each annotation of the service object.
  - `__meta_kubernetes_service_cluster_ip`: the cluster IP address of the service (does not apply to services of type ExternalName).
  - `__meta_kubernetes_service_loadbalancer_ip`: the IP address of the load balancer (applies to services of type LoadBalancer).
  - `__meta_kubernetes_service_external_name`: the DNS name of the service (applies to services of type ExternalName).
  - `__meta_kubernetes_service_label_<labelname>`: each label from the service object, with any unsupported characters converted to underscores.
  - `__meta_kubernetes_service_labelpresent_<labelname>`: `true` for each label of the service object, with any unsupported characters converted to underscores.
  - `__meta_kubernetes_service_name`: the name of the service object.
  - `__meta_kubernetes_service_port_name`: the name of the service port for the target.
  - `__meta_kubernetes_service_port_number`: the number of the service port for the target.
  - `__meta_kubernetes_service_port_protocol`: the protocol of the service port for the target.
  - `__meta_kubernetes_service_type`: the type of the service.
pod

The pod role discovers all pods and exposes their containers as targets. For each declared port of a container, a single target is generated. If a container has no declared ports, a port-free target is created per container so that a port can be added manually via relabeling.
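In practice the pod role is most often used for annotation-driven scraping: pods opt in with a `prometheus.io/scrape: "true"` annotation and can override the port and path. A minimal sketch of such a job follows; the `prometheus.io/*` names are just a widely used convention, not something Prometheus enforces, and the kube-state-metrics Deployment later in this article sets exactly these annotations:

```yaml
# Hedged sketch: scrape any pod annotated with prometheus.io/scrape: "true".
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only pods that opt in via the annotation.
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # Allow the metrics path to be overridden with prometheus.io/path.
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    regex: (.+)
    target_label: __metrics_path__
  # Allow the scrape port to be overridden with prometheus.io/port.
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
```

The meta labels referenced above are listed next.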
- Available meta labels:
  - `__meta_kubernetes_namespace`: the namespace of the pod object.
  - `__meta_kubernetes_pod_name`: the name of the pod object.
  - `__meta_kubernetes_pod_ip`: the pod IP of the pod object.
  - `__meta_kubernetes_pod_label_<labelname>`: each label from the pod object, with any unsupported characters converted to underscores.
  - `__meta_kubernetes_pod_labelpresent_<labelname>`: `true` for each label from the pod object, with any unsupported characters converted to underscores.
  - `__meta_kubernetes_pod_annotation_<annotationname>`: each annotation from the pod object.
  - `__meta_kubernetes_pod_annotationpresent_<annotationname>`: `true` for each annotation from the pod object.
  - `__meta_kubernetes_pod_container_init`: `true` if the container is an init container.
  - `__meta_kubernetes_pod_container_name`: the name of the container the target address points to.
  - `__meta_kubernetes_pod_container_id`: the ID of the container the target address points to. The ID is in the form `<type>://<container_id>`.
  - `__meta_kubernetes_pod_container_image`: the image the container is using.
  - `__meta_kubernetes_pod_container_port_name`: the name of the container port.
  - `__meta_kubernetes_pod_container_port_number`: the number of the container port.
  - `__meta_kubernetes_pod_container_port_protocol`: the protocol of the container port.
  - `__meta_kubernetes_pod_ready`: set to `true` or `false` for the pod's ready state.
  - `__meta_kubernetes_pod_phase`: set to Pending, Running, Succeeded, Failed, or Unknown in the pod lifecycle.
  - `__meta_kubernetes_pod_node_name`: the name of the node the pod is scheduled onto.
  - `__meta_kubernetes_pod_host_ip`: the current host IP of the pod object.
  - `__meta_kubernetes_pod_uid`: the UID of the pod object.
  - `__meta_kubernetes_pod_controller_kind`: the object kind of the pod's controller.
  - `__meta_kubernetes_pod_controller_name`: the name of the pod's controller.
endpoints

The endpoints role discovers targets from the listed endpoints of a service. For each endpoint address, one target is discovered per port. If the endpoint is backed by a pod, all additional container ports of that pod that are not bound to an endpoint port are also discovered as targets.
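The endpoints role is also the usual way to scrape the Kubernetes API server itself, by keeping only the `default/kubernetes` endpoints on their `https` port. A minimal sketch:

```yaml
# Hedged sketch: scrape the API server through the default/kubernetes endpoints.
- job_name: 'kubernetes-apiservers'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # Keep only the endpoints backing the "kubernetes" service in the default namespace.
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
```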
- Available meta labels:
  - `__meta_kubernetes_namespace`: the namespace of the endpoints object.
  - `__meta_kubernetes_endpoints_name`: the name of the endpoints object.
  - `__meta_kubernetes_endpoints_label_<labelname>`: each label from the endpoints object, with any unsupported characters converted to underscores.
  - `__meta_kubernetes_endpoints_labelpresent_<labelname>`: `true` for each label from the endpoints object, with any unsupported characters converted to underscores.
  - `__meta_kubernetes_endpoints_annotation_<annotationname>`: each annotation from the endpoints object.
  - `__meta_kubernetes_endpoints_annotationpresent_<annotationname>`: `true` for each annotation from the endpoints object.
- For all targets discovered directly from the endpoints list (those not additionally inferred from underlying pods), the following labels are attached:
  - `__meta_kubernetes_endpoint_hostname`: the hostname of the endpoint.
  - `__meta_kubernetes_endpoint_node_name`: the name of the node hosting the endpoint.
  - `__meta_kubernetes_endpoint_ready`: set to `true` or `false` for the endpoint's ready state.
  - `__meta_kubernetes_endpoint_port_name`: the name of the endpoint port.
  - `__meta_kubernetes_endpoint_port_protocol`: the protocol of the endpoint port.
  - `__meta_kubernetes_endpoint_address_target_kind`: the kind of the endpoint address target.
  - `__meta_kubernetes_endpoint_address_target_name`: the name of the endpoint address target.
- If the endpoints belong to a service, all labels of the `role: service` discovery are attached.
- For all targets backed by a pod, all labels of the `role: pod` discovery are attached.
ingress

The ingress role discovers a target for each path of each ingress. This is generally useful for blackbox monitoring of an ingress. The address is set to the host specified in the ingress spec.
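As with the service role, the discovered ingress targets are typically fed to a blackbox exporter. A minimal sketch, reusing the assumed `blackbox-exporter.monitor:9115` address from the service example above:

```yaml
# Hedged sketch: probe every ingress host/path through a blackbox exporter.
- job_name: 'kubernetes-ingresses-probe'
  metrics_path: /probe
  params:
    module: [http_2xx]
  kubernetes_sd_configs:
  - role: ingress
  relabel_configs:
  # Rebuild the full URL (scheme + host + path) as the probe target.
  - source_labels: [__meta_kubernetes_ingress_scheme, __address__, __meta_kubernetes_ingress_path]
    regex: (.+);(.+);(.+)
    replacement: ${1}://${2}${3}
    target_label: __param_target
  # The scrape itself goes to the blackbox exporter (address is an assumption).
  - target_label: __address__
    replacement: blackbox-exporter.monitor:9115
  - source_labels: [__meta_kubernetes_ingress_name]
    target_label: ingress
```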
- Available meta labels:
  - `__meta_kubernetes_namespace`: the namespace of the ingress object.
  - `__meta_kubernetes_ingress_name`: the name of the ingress object.
  - `__meta_kubernetes_ingress_label_<labelname>`: each label from the ingress object, with any unsupported characters converted to underscores.
  - `__meta_kubernetes_ingress_labelpresent_<labelname>`: `true` for each label from the ingress object, with any unsupported characters converted to underscores.
  - `__meta_kubernetes_ingress_annotation_<annotationname>`: each annotation from the ingress object.
  - `__meta_kubernetes_ingress_annotationpresent_<annotationname>`: `true` for each annotation from the ingress object.
  - `__meta_kubernetes_ingress_class_name`: the class name from the ingress spec, if present.
  - `__meta_kubernetes_ingress_scheme`: the protocol scheme of the ingress; `https` if a TLS config is set, `http` otherwise.
  - `__meta_kubernetes_ingress_path`: the path from the ingress spec; defaults to `/`.
kube-state-metrics
- kube-state-metrics (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects. It is not focused on the health of individual Kubernetes components, but rather on the health of the various objects inside the cluster, such as deployments, nodes and pods.
- kube-state-metrics exposes two ports:
  - `8080`: metrics about the current state of the Kubernetes cluster
  - `8081`: kube-state-metrics' own (telemetry) metrics
- If you only need port 8080, the annotation-driven pod discovery shown earlier is enough; if you need both ports, simply add two static configs to the Prometheus configuration and scrape them side by side (see the sketch after the version table below).
- kube-state-metrics and metrics-server serve different purposes: metrics-server is not a source of Prometheus object-state metrics; it implements the resource Metrics API that Kubernetes autoscaling consumes and that backs the `kubectl top` command for viewing cluster resource usage.
- Different kube-state-metrics releases target different Kubernetes versions. The latest compatibility matrix is below; by extrapolating from it you can usually find the kube-state-metrics version that matches your own cluster.
| kube-state-metrics | Kubernetes client-go Version |
|---|---|
| v2.8.2 | v1.26 |
| v2.9.2 | v1.26 |
| v2.10.1 | v1.27 |
| v2.11.0 | v1.28 |
| v2.12.0 | v1.29 |
| main | v1.30 |
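If you do want both ports, the two static scrape jobs mentioned above can look like the following sketch; it assumes kube-state-metrics is deployed into kube-system and addressed through its Service DNS name, as in the manifests below:

```yaml
# Hedged sketch: scrape both kube-state-metrics ports with static configs.
- job_name: 'kube-state-metrics'
  static_configs:
  - targets:
    - 'kube-state-metrics.kube-system.svc:8080'   # object state metrics
- job_name: 'kube-state-metrics-telemetry'
  static_configs:
  - targets:
    - 'kube-state-metrics.kube-system.svc:8081'   # KSM's own metrics
```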
Create the ServiceAccount
```yaml
---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app: kube-state-metrics
  name: kube-state-metrics
  namespace: kube-system
```
Create the ClusterRole
```yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app: kube-state-metrics
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - serviceaccounts
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingressclasses
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  - clusterroles
  - rolebindings
  - roles
  verbs:
  - list
  - watch
```
Create the ClusterRoleBinding
```yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app: kube-state-metrics
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
```
Create the Service
```yaml
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: kube-state-metrics
  name: kube-state-metrics
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app: kube-state-metrics
```
Create the Deployment
```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: kube-state-metrics
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
        prometheus.io/type: kube-state-metrics
      labels:
        app: kube-state-metrics
    spec:
      automountServiceAccountToken: true
      containers:
      - env:
        - name: TZ
          value: Asia/Shanghai
        image: docker.io/bitnami/kube-state-metrics:2.9.2
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
        resources:
          requests:
            memory: 256Mi
            cpu: 100m
          limits:
            memory: 256Mi
            cpu: 100m
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
      serviceAccountName: kube-state-metrics
```
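Once these manifests are applied and the pod is scraped (via the annotation-driven job shown earlier or via the static configs), the `kube_*` series can drive object-level alerting. Two hedged examples; the thresholds and message wording are illustrative only, not recommendations from kube-state-metrics itself:

```yaml
# Hedged examples of alert rules built on kube-state-metrics series.
groups:
- name: kube-state-metrics-examples
  rules:
  - alert: KubePodCrashLooping
    expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
    for: 10m
    labels:
      severity: warning
    annotations:
      message: 'Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting frequently.'
  - alert: KubeDeploymentReplicasMismatch
    expr: kube_deployment_spec_replicas != kube_deployment_status_replicas_available
    for: 15m
    labels:
      severity: warning
    annotations:
      message: 'Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has unavailable replicas.'
```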
cAdvisor
cAdvisor is an open-source tool from Google and is already built into the kubelet. It collects container resource usage, which is different from (and complementary to) what kube-state-metrics provides; the two components are typically used together.

- Scrape the cAdvisor metrics in the Prometheus configuration with the same node-role service discovery:
```yaml
- job_name: 'kubernetes-cadvisor'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc.cluster.local:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```
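cAdvisor exposes per-container series such as `container_cpu_usage_seconds_total` and `container_memory_working_set_bytes`, which are the raw material for resource dashboards. A hedged sketch of recording rules that pre-aggregate them per pod; the rule names follow the common `level:metric:operations` convention but are otherwise arbitrary:

```yaml
# Hedged sketch: recording rules that aggregate cAdvisor series per pod.
groups:
- name: cadvisor-examples
  rules:
  - record: namespace_pod:container_cpu_usage_seconds:rate5m
    # image!="" drops the pause container and the per-pod aggregate rows.
    expr: sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{image!=""}[5m]))
  - record: namespace_pod:container_memory_working_set_bytes:sum
    expr: sum by (namespace, pod) (container_memory_working_set_bytes{image!=""})
```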
Monitoring etcd

My cluster was deployed from binaries, so etcd runs outside the Kubernetes cluster; below I scrape it with a static configuration.
Create the Secret

The etcd certificates are mounted into the Prometheus container through a Secret, so first create the Secret from the certificate files:
- `certs_dir` is the directory where the certificates are stored.
- `monitor` is the namespace my Prometheus runs in.
- Replace the certificate file names with your own.
```bash
certs_dir=/etc/kubernetes/ssl/
kubectl create secret generic etcd-ssl -n monitor \
  --from-file=ca=${certs_dir}/ca.pem \
  --from-file=cert=${certs_dir}/etcd.pem \
  --from-file=key=${certs_dir}/etcd-key.pem
```
Mount the Secret into Prometheus

For convenience, here is my full Prometheus Deployment:
```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: prometheus
  name: prometheus
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccount: prometheus
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: app
                operator: In
                values:
                - prometheus
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - prometheus
            topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 0
      containers:
      - name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/config/prometheus.yml"
        - "--storage.tsdb.path=/etc/prometheus/data"
        - "--storage.tsdb.retention.time=30d"
        - "--web.enable-lifecycle"
        image: prom/prometheus:v2.45.5
        env:
        - name: LANG
          value: en_US.UTF-8
        - name: TZ
          value: Asia/Shanghai
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9090
          name: http
        livenessProbe:
          failureThreshold: 60
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: http
          timeoutSeconds: 1
        readinessProbe:
          failureThreshold: 60
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: http
          timeoutSeconds: 1
        volumeMounts:
        - name: prometheus-home
          mountPath: /etc/prometheus
        # The key addition: mount the etcd-ssl secret (see the matching entry under volumes below)
        - name: etcd-ssl
          mountPath: /etc/prometheus/etcd-ssl
        resources:
          requests:
            memory: 256Mi
            cpu: 100m
          limits:
            memory: 2048Mi
            cpu: 1000m
      volumes:
      - hostPath:
          path: /data/k8s-data/prometheus
          type: DirectoryOrCreate
        name: prometheus-home
      # The other addition: expose the secret as a volume so it can be mounted above
      - name: etcd-ssl
        secret:
          secretName: etcd-ssl
```
Add an etcd scrape job to Prometheus
- job_name: "kubernetes-etcd"
scheme: https
tls_config:
ca_file: /etc/prometheus/etcd-ssl/ca
cert_file: /etc/prometheus/etcd-ssl/cert
key_file: /etc/prometheus/etcd-ssl/key
insecure_skip_verify: false
metrics_path: '/metrics'
static_configs:
- targets:
- "192.168.11.167:2379"
- "192.168.11.168:2379"
- "192.168.11.169:2379"
etcd alerting rules

The alerting rules below are taken from the upstream etcd repository on GitHub:
```yaml
# these rules synced manually from https://github.com/etcd-io/etcd/blob/master/Documentation/etcd-mixin/mixin.libsonnet
groups:
- name: etcd
  rules:
  - alert: etcdInsufficientMembers
    annotations:
      message: 'etcd cluster "{{ $labels.job }}": insufficient members ({{ $value }}).'
    expr: |
      sum(up{job=~".*etcd.*"} == bool 1) by (job) < ((count(up{job=~".*etcd.*"}) by (job) + 1) / 2)
    for: 3m
    labels:
      severity: critical
  - alert: etcdNoLeader
    annotations:
      message: 'etcd cluster "{{ $labels.job }}": member {{ $labels.instance }} has no leader.'
    expr: |
      etcd_server_has_leader{job=~".*etcd.*"} == 0
    for: 1m
    labels:
      severity: critical
  - alert: etcdHighNumberOfLeaderChanges
    annotations:
      message: 'etcd cluster "{{ $labels.job }}": instance {{ $labels.instance }} has seen {{ $value }} leader changes within the last hour.'
    expr: |
      rate(etcd_server_leader_changes_seen_total{job=~".*etcd.*"}[15m]) > 3
    for: 15m
    labels:
      severity: warning
  - alert: etcdHighNumberOfFailedGRPCRequests
    annotations:
      message: 'etcd cluster "{{ $labels.job }}": {{ $value }}% of requests for {{ $labels.grpc_method }} failed on etcd instance {{ $labels.instance }}.'
    expr: |
      100 * sum(rate(grpc_server_handled_total{job=~".*etcd.*", grpc_code!="OK"}[5m])) BY (job, instance, grpc_service, grpc_method)
        /
      sum(rate(grpc_server_handled_total{job=~".*etcd.*"}[5m])) BY (job, instance, grpc_service, grpc_method)
        > 1
    for: 10m
    labels:
      severity: warning
  - alert: etcdHighNumberOfFailedGRPCRequests
    annotations:
      message: 'etcd cluster "{{ $labels.job }}": {{ $value }}% of requests for {{ $labels.grpc_method }} failed on etcd instance {{ $labels.instance }}.'
    expr: |
      100 * sum(rate(grpc_server_handled_total{job=~".*etcd.*", grpc_code!="OK"}[5m])) BY (job, instance, grpc_service, grpc_method)
        /
      sum(rate(grpc_server_handled_total{job=~".*etcd.*"}[5m])) BY (job, instance, grpc_service, grpc_method)
        > 5
    for: 5m
    labels:
      severity: critical
  - alert: etcdGRPCRequestsSlow
    annotations:
      message: 'etcd cluster "{{ $labels.job }}": gRPC requests to {{ $labels.grpc_method }} are taking {{ $value }}s on etcd instance {{ $labels.instance }}.'
    expr: |
      histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{job=~".*etcd.*", grpc_type="unary"}[5m])) by (job, instance, grpc_service, grpc_method, le))
      > 0.15
    for: 10m
    labels:
      severity: critical
  - alert: etcdMemberCommunicationSlow
    annotations:
      message: 'etcd cluster "{{ $labels.job }}": member communication with {{ $labels.To }} is taking {{ $value }}s on etcd instance {{ $labels.instance }}.'
    expr: |
      histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket{job=~".*etcd.*"}[5m]))
      > 0.15
    for: 10m
    labels:
      severity: warning
  - alert: etcdHighNumberOfFailedProposals
    annotations:
      message: 'etcd cluster "{{ $labels.job }}": {{ $value }} proposal failures within the last hour on etcd instance {{ $labels.instance }}.'
    expr: |
      rate(etcd_server_proposals_failed_total{job=~".*etcd.*"}[15m]) > 5
    for: 15m
    labels:
      severity: warning
  - alert: etcdHighFsyncDurations
    annotations:
      message: 'etcd cluster "{{ $labels.job }}": 99th percentile fsync durations are {{ $value }}s on etcd instance {{ $labels.instance }}.'
    expr: |
      histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket{job=~".*etcd.*"}[5m]))
      > 0.5
    for: 10m
    labels:
      severity: warning
  - alert: etcdHighCommitDurations
    annotations:
      message: 'etcd cluster "{{ $labels.job }}": 99th percentile commit durations {{ $value }}s on etcd instance {{ $labels.instance }}.'
    expr: |
      histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket{job=~".*etcd.*"}[5m]))
      > 0.25
    for: 10m
    labels:
      severity: warning
  - alert: etcdHighNumberOfFailedHTTPRequests
    annotations:
      message: '{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}'
    expr: |
      sum(rate(etcd_http_failed_total{job=~".*etcd.*", code!="404"}[5m])) BY (method)
        / sum(rate(etcd_http_received_total{job=~".*etcd.*"}[5m])) BY (method) > 0.01
    for: 10m
    labels:
      severity: warning
  - alert: etcdHighNumberOfFailedHTTPRequests
    annotations:
      message: '{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}.'
    expr: |
      sum(rate(etcd_http_failed_total{job=~".*etcd.*", code!="404"}[5m])) BY (method)
        / sum(rate(etcd_http_received_total{job=~".*etcd.*"}[5m])) BY (method) > 0.05
    for: 10m
    labels:
      severity: critical
  - alert: etcdHTTPRequestsSlow
    annotations:
      message: etcd instance {{ $labels.instance }} HTTP requests to {{ $labels.method }} are slow.
    expr: |
      histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[5m]))
      > 0.15
    for: 10m
    labels:
      severity: warning
```
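To have Prometheus load these rules, save them to a file under the mounted configuration directory and point `rule_files` at it in `prometheus.yml`, then reload as described above. A minimal sketch; the exact path is an assumption based on the Deployment's config mount:

```yaml
# Hedged sketch: reference the rules file from prometheus.yml (path is an assumption).
rule_files:
  - /etc/prometheus/config/rules/etcd.yml
```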