Kubernetes Monitoring Deployment Based on Kube-Prometheus v0.13.0
The Kubernetes versions supported by each Kube-Prometheus release are as follows:

kube-prometheus stack | Kubernetes 1.22 | Kubernetes 1.23 | Kubernetes 1.24 | Kubernetes 1.25 | Kubernetes 1.26 | Kubernetes 1.27 | Kubernetes 1.28 |
---|---|---|---|---|---|---|---|
release-0.10 | ✔ | ✔ | ✗ | ✗ | ✗ | ✗ | ✗ |
release-0.11 | ✗ | ✔ | ✔ | ✗ | ✗ | ✗ | ✗ |
release-0.12 | ✗ | ✗ | ✔ | ✔ | ✗ | ✗ | ✗ |
release-0.13 | ✗ | ✗ | ✗ | ✗ | ✔ | ✔ | ✔ |
main | ✗ | ✗ | ✗ | ✗ | ✗ | ✔ | ✔ |
The author's cluster runs Kubernetes 1.28.3 on CentOS 7.9 and was deployed with sealos; for deployment instructions, see the official documentation: Sealos Official Documents. Without further ado, let's walk through deploying Kube-Prometheus:
1. Download and extract the package
wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.13.0.tar.gz
tar -xf v0.13.0.tar.gz
2. Update the image registry addresses
Some of the image registries are unreachable from mainland China, so the image addresses need to be rewritten. If the mirror used below stops working, substitute another one.
cd kube-prometheus-0.13.0/manifests
find ./ -type f |xargs sed -ri 's+registry.k8s.io/+k8s.mirror.nju.edu.cn/+g'
find ./ -type f |xargs sed -ri 's+quay.io/+k8s.mirror.nju.edu.cn/+g'
Alternatively, leave the image addresses unchanged, download the pre-pulled images from here, and import them into the k8s cluster: kube-prometheus-0.13.0 image bundle
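If you rewrote the addresses with sed, a quick spot-check from the manifests directory confirms nothing still points at the original registries (no output means every reference was rewritten):
grep -rn -E 'registry.k8s.io|quay.io' ./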
3. Change the Service port type
The Services default to type ClusterIP. For easier access they are changed to NodePort here; if you plan to expose them through an Ingress instead, you can skip this step.
(1) prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.46.0
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort # added
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 32501 # added
  - name: reloader-web
    port: 8080
    targetPort: reloader-web
  selector:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
  sessionAffinity: ClientIP
(2) grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 9.5.3
  name: grafana
  namespace: monitoring
spec:
  type: NodePort # added
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 32500 # added
  selector:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
(3) alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.26.0
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort # added
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 32503 # added
  - name: reloader-web
    port: 8080
    targetPort: reloader-web
  selector:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
  sessionAffinity: ClientIP
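The nodePort values 32500-32503 are just the ones chosen for this walkthrough; they must fall inside the cluster's NodePort range (30000-32767 by default) and must not collide with existing Services. A quick way to list the NodePort services already in use:
kubectl get svc -A | grep NodePort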
4. Install
kubectl apply --server-side -f manifests/setup
kubectl wait \
  --for condition=Established \
  --all CustomResourceDefinition \
  --namespace=monitoring
kubectl apply -f manifests/
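Once the manifests are applied, wait for everything in the monitoring namespace to come up before moving on:
kubectl -n monitoring get pods
kubectl -n monitoring get svc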
5. Access the web UIs
#prometheus
http://IP:32501
#alertmanager
http://IP:32503
#grafana
http://IP:32500
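IP here is the address of any cluster node; you can list them with the command below. Grafana's default credentials are admin/admin, and you will be prompted to change the password on first login.
kubectl get nodes -o wide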
6. Data persistence
6.1 Prometheus data persistence
NFS is used for data persistence here, which requires installing the NFS CSI provisioner (csi-driver-nfs) in advance and creating a StorageClass.
(1) Install the NFS CSI provisioner
# local install
git clone https://github.com/kubernetes-csi/csi-driver-nfs.git
cd csi-driver-nfs
./deploy/install-driver.sh v4.7.0 local
# check pod status
kubectl -n kube-system get pod -o wide -l app=csi-nfs-controller
kubectl -n kube-system get pod -o wide -l app=csi-nfs-node
The installation pulls several images that may be unreachable; if so, edit the image addresses yourself. They appear mainly in these two files under csi-driver-nfs/deploy:
csi-nfs-controller.yaml
csi-nfs-node.yaml
Alternatively, you can download an archive of the images and import it into the k8s cluster:
(2) Create the StorageClass
vim prometheus-data-db-sc.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-data-db
provisioner: nfs.csi.k8s.io
parameters:
  server: 192.168.3.119 # NFS server address
  share: /app/nfsdata/prometheus/ # NFS export path; best created in advance
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
kubectl apply -f prometheus-data-db-sc.yaml
(3) Edit prometheus-prometheus.yaml
# append the following under spec at the end of the file:
  retention: 30d # data retention period
  storage: # storage configuration
    volumeClaimTemplate:
      spec:
        storageClassName: prometheus-data-db
        resources:
          requests:
            storage: 50Gi
(4) Apply prometheus-prometheus.yaml
kubectl apply -f prometheus-prometheus.yaml
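The Operator creates one PVC per Prometheus replica from the volumeClaimTemplate; check that the StorageClass exists and the claims bind successfully:
kubectl get sc prometheus-data-db
kubectl -n monitoring get pvc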
6.2 Grafana data persistence
(1) Edit grafana-deployment.yaml
# locate the volumes section
      volumes:
      # comment out these two lines:
      #- emptyDir: {}
      #  name: grafana-storage
      # and add the following. This mounts the NFS share directly;
      # to use a PVC instead, create the StorageClass and PVC first (see below).
      - name: grafana-storage
        nfs:
          server: 192.168.3.119
          path: /app/nfsdata/grafana
      # PVC variant
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-pvc
The StorageClass (SC) and PersistentVolumeClaim (PVC) are created as follows:
vim grafana-pvc.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: grafana-pvc
  namespace: monitoring # the namespace must be monitoring
spec:
  storageClassName: grafana-sc # the StorageClass to use; omit this line if you have none and create the PV manually
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
vim grafana-sc.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: grafana-sc
provisioner: nfs.csi.k8s.io
parameters:
  server: 192.168.3.119 # NFS server address
  share: /app/nfsdata/grafana/ # NFS export path; best created in advance
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
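Apply both manifests (the StorageClass first, then the claim) before pointing the Grafana Deployment at the PVC:
kubectl apply -f grafana-sc.yaml
kubectl apply -f grafana-pvc.yaml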
(2) Apply grafana-deployment.yaml
kubectl apply -f grafana-deployment.yaml
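If you chose the PVC variant, confirm the claim bound and the Grafana pod restarted cleanly:
kubectl -n monitoring get pvc grafana-pvc
kubectl -n monitoring get pod -l app.kubernetes.io/name=grafana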
7. Fixing the missing kube-controller-manager and kube-scheduler targets
7.1 kube-controller-manager monitoring
(1) Edit /etc/kubernetes/manifests/kube-controller-manager.yaml and change --bind-address=127.0.0.1 to --bind-address=0.0.0.0
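Since this is a static pod manifest, the kubelet restarts kube-controller-manager automatically after the edit. Confirm the metrics port is now listening on all interfaces:
ss -tlnp | grep 10257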
(2) Create the prometheus-kubeControllerManagerService.yaml file
vim prometheus-kubeControllerManagerService.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    app.kubernetes.io/name: kube-controller-manager # must match spec.selector.matchLabels in kubernetesControlPlane-serviceMonitorKubeControllerManager.yaml
spec:
  selector:
    component: kube-controller-manager # note: this is the label on the kube-controller-manager pods
  ports:
  - name: https-metrics # must match the spec.endpoints.port value in kubernetesControlPlane-serviceMonitorKubeControllerManager.yaml
    port: 10257 # make sure the port number is correct
    targetPort: 10257 # make sure the port number is correct
    protocol: TCP
(3) Apply prometheus-kubeControllerManagerService.yaml
kubectl apply -f prometheus-kubeControllerManagerService.yaml
7.2 kube-scheduler monitoring
(1) Edit /etc/kubernetes/manifests/kube-scheduler.yaml and change --bind-address=127.0.0.1 to --bind-address=0.0.0.0
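As with the controller manager, the kubelet picks up the change automatically; verify the scheduler's metrics port:
ss -tlnp | grep 10259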
(2) Create the prometheus-kubeSchedulerService.yaml file
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    app.kubernetes.io/name: kube-scheduler # must match spec.selector.matchLabels in kubernetesControlPlane-serviceMonitorKubeScheduler.yaml
spec:
  selector:
    component: kube-scheduler # note: this is the label on the kube-scheduler pods
  ports:
  - name: https-metrics # must match the spec.endpoints.port value in kubernetesControlPlane-serviceMonitorKubeScheduler.yaml
    port: 10259 # make sure the port number is correct
    targetPort: 10259 # make sure the port number is correct
    protocol: TCP
(3) Apply prometheus-kubeSchedulerService.yaml
kubectl apply -f prometheus-kubeSchedulerService.yaml
8. Adding etcd monitoring
(1) Create a secret with the etcd certificates
First locate the etcd certificate paths. If etcd was deployed with systemd, check the service file /etc/systemd/system/etcd.service; if it runs as a static pod, check /etc/kubernetes/manifests/etcd.yaml. Find the paths of these three flags:
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--key-file=/etc/kubernetes/pki/etcd/server.key
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
Then create the secret:
kubectl create secret generic etcd-ssl --from-file=/etc/kubernetes/pki/etcd/ca.crt --from-file=/etc/kubernetes/pki/etcd/server.crt --from-file=/etc/kubernetes/pki/etcd/server.key -n monitoring
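Verify that all three files landed in the secret:
kubectl -n monitoring describe secret etcd-ssl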
(2) Edit prometheus-prometheus.yaml
...
  replicas: 2 # locate this line and add the two lines below
  secrets:
  - etcd-ssl
...
# apply after editing
kubectl apply -f prometheus-prometheus.yaml
(3) Create a ServiceMonitor object for etcd
vim kubernetesControlPlane-serviceMonitorEtcd.yaml
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: etcd
    app.kubernetes.io/part-of: kube-prometheus
  name: etcd
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      # these certificate paths are inside the prometheus pod
      caFile: /etc/prometheus/secrets/etcd-ssl/ca.crt
      certFile: /etc/prometheus/secrets/etcd-ssl/server.crt
      keyFile: /etc/prometheus/secrets/etcd-ssl/server.key
      insecureSkipVerify: true
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: etcd
# apply after creating
kubectl apply -f kubernetesControlPlane-serviceMonitorEtcd.yaml
(4) Create a Service object matching the etcd pods
vim prometheus-etcdService.yaml
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: etcd
  labels:
    app.kubernetes.io/name: etcd
spec:
  selector:
    component: etcd
  ports:
  - name: https-metrics
    port: 2379
    targetPort: 2379
    protocol: TCP
# apply after creating
kubectl apply -f prometheus-etcdService.yaml
(5) After a short wait, the etcd targets appear on the Prometheus web UI's Targets page.
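If the etcd target stays down, you can rule out etcd itself by querying its metrics endpoint directly from a control-plane node using the same certificates:
curl --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  https://127.0.0.1:2379/metrics | head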
9. Adding custom monitoring
Sometimes we need to monitor services outside the cluster, such as MySQL, Redis, or Kafka. We can create a prometheus-additional.yaml file and add scrape_configs entries for those extra components.
(1) Create the prometheus-additional.yaml file; suppose we want to monitor MySQL and Redis
- job_name: 'mysql-exporter'
  static_configs:
  - targets:
    - 192.168.3.8:9104
- job_name: 'redis-exporter'
  static_configs:
  - targets:
    - 192.168.3.9:9121
(2) Store this scrape configuration in the k8s cluster as a secret
kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml -n monitoring
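When the scrape configuration changes later, regenerate the secret in place with the usual dry-run/apply pattern:
kubectl create secret generic additional-scrape-configs \
  --from-file=prometheus-additional.yaml --dry-run=client -o yaml \
  | kubectl -n monitoring apply -f -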
(3) Edit prometheus-prometheus.yaml
...
  image: quay.io/prometheus/prometheus:v2.46.0 # locate this line and add the following below it
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: prometheus-additional.yaml
...
(4) Apply prometheus-prometheus.yaml
kubectl apply -f prometheus-prometheus.yaml
(5) After a short wait, check the new targets on the Prometheus web UI's Targets page.