Using Prometheus in Kubernetes
Usage
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
# First deploy the kube-prometheus CRDs and create the monitoring namespace
kubectl apply -f manifests/setup/
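# A plain apply at this step may fail with an error like:
# The CustomResourceDefinition "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
# Cause: client-side apply stores the whole manifest in an annotation, which is too large for this CRD.
# In that case, delete the resources and recreate them with create
# (kubectl apply --server-side is another way to avoid the oversized annotation)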
kubectl delete -f manifests/setup/
kubectl create -f manifests/setup/
# Finally, deploy Prometheus, Grafana, and the remaining components
kubectl apply -f manifests/
How it works
To simplify managing Prometheus monitoring in Kubernetes, Prometheus Operator (a Kubernetes Operator) provides the ServiceMonitor custom resource. A ServiceMonitor declares how Prometheus should automatically discover Services and scrape their metrics, so scrape targets are configured declaratively rather than by hand.
A ServiceMonitor selects Services, and each Service in turn selects Pods: ServiceMonitor --> Service --> Pod.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-ovn-controller
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 15s
    port: metrics
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: kube-ovn-controller
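ServiceMonitor fields:
- interval: how often metrics are scraped
- port: the name of the Service port that exposes the metrics
- namespaceSelector: the namespaces in which to look for target Services
- selector: the labels that target Services must carry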
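To make the selection chain concrete, here is a minimal sketch of a Service that the ServiceMonitor above would match. The Service name, the Pod label in spec.selector, and the port number 10660 are assumptions for illustration only. What matters is that the Service lives in kube-system, carries the label app: kube-ovn-controller, and exposes a port named metrics.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-ovn-controller      # assumed name
  namespace: kube-system         # must be listed in namespaceSelector.matchNames
  labels:
    app: kube-ovn-controller     # matched by spec.selector.matchLabels of the ServiceMonitor
spec:
  selector:
    app: kube-ovn-controller     # assumed Pod label; picks the Pods behind the Service
  ports:
  - name: metrics                # referenced by endpoints[].port in the ServiceMonitor
    port: 10660                  # assumed port number
    targetPort: 10660
```

Note that the ServiceMonitor matches the labels on the Service object itself, not the Pod labels; the Service's own spec.selector then decides which Pods end up as scrape targets.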
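Viewing the scraped data
Open the Prometheus web UI (it can be exposed through a NodePort) and go to Status --> Targets to see every scrape target, including the Pods being scraped and whether each one is up.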
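One possible way to expose the UI is an extra NodePort Service in front of the Prometheus Pods. The sketch below is an assumption-laden example: the Service name and the nodePort value 30090 are hypothetical, and the selector labels and port 9090 assume the default kube-prometheus deployment, so check them against the labels on your prometheus-k8s Pods before applying.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-k8s-nodeport          # hypothetical name
  namespace: monitoring
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: prometheus   # assumed Pod labels from the default deployment
    app.kubernetes.io/instance: k8s
  ports:
  - name: web
    port: 9090                           # assumed Prometheus web port
    targetPort: 9090
    nodePort: 30090                      # hypothetical; must fall in the cluster's NodePort range
```

With this in place the UI would be reachable at http://<node-ip>:30090. If the release you deployed also ships NetworkPolicy objects in manifests/, they may need adjusting before traffic from outside the monitoring namespace gets through.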
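Pitfalls
1. Metrics are no longer scraped after a Pod moves to a custom namespace
Background:
The Pod originally ran in the kube-system namespace and its metrics were scraped normally. After it was moved to a custom namespace, the metrics disappeared. The prometheus-k8s Pod logs showed errors that pointed to missing permissions in the custom namespace.
Solution:
By default, kube-prometheus grants the prometheus-k8s ServiceAccount read access to Services, Endpoints and Pods only in the default, kube-system and monitoring namespaces. Create a Role and a RoleBinding in the custom namespace and bind the Role to the ServiceAccount monitoring/prometheus-k8s.
For reference, see: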
- https://github.com/prometheus-operator/kube-prometheus/blob/b9d1ff5a8848bcedb465d60dd61775debb881534/manifests/prometheus-roleBindingSpecificNamespaces.yaml
- https://github.com/prometheus-operator/kube-prometheus/blob/b9d1ff5a8848bcedb465d60dd61775debb881534/manifests/prometheus-roleSpecificNamespaces.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.47.2
  name: prometheus-k8s
  namespace: dwc # change this to your custom namespace
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.47.2
  name: prometheus-k8s
  namespace: dwc # change this to your custom namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring # note: this stays monitoring, where the ServiceAccount lives
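The RBAC objects above only allow Prometheus to discover targets in the new namespace; a ServiceMonitor still has to select that namespace. As a minimal sketch, assuming the Service moved to dwc together with the Pod, the ServiceMonitor could look roughly like this. The name my-app, the label app: my-app, and the port name metrics are hypothetical placeholders, not values from the original setup.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app               # hypothetical name
  namespace: monitoring
spec:
  endpoints:
  - interval: 15s
    port: metrics            # assumed name of the Service port exposing metrics
  namespaceSelector:
    matchNames:
    - dwc                    # the custom namespace that now holds the Service
  selector:
    matchLabels:
      app: my-app            # hypothetical label on the Service
```

If an existing ServiceMonitor still lists only kube-system under namespaceSelector.matchNames, the target would likely stay missing even after the Role and RoleBinding are created.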