
Deploying Kubernetes Monitoring with Kube-Prometheus v0.13.0

The Kubernetes versions supported by each Kube-Prometheus release are as follows:

kube-prometheus stack   Kubernetes 1.22   Kubernetes 1.23   Kubernetes 1.24   Kubernetes 1.25   Kubernetes 1.26   Kubernetes 1.27   Kubernetes 1.28
release-0.10            ✔                 ✔                 ✗                 ✗                 x                 x                 x
release-0.11            ✗                 ✔                 ✔                 ✗                 x                 x                 x
release-0.12            ✗                 ✗                 ✔                 ✔                 x                 x                 x
release-0.13            ✗                 ✗                 ✗                 x                 ✔                 ✔                 ✔
main                    ✗                 ✗                 ✗                 x                 x                 ✔                 ✔

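The author's cluster runs Kubernetes 1.28.3 on CentOS 7.9 and was deployed with sealos; for the deployment method, see the official documentation: Sealos Official Documents. With that out of the way, here is how to deploy Kube-Prometheus:

1. Download and extract the release archive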

wget -O kube-prometheus-0.13.0.tar.gz https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.13.0.tar.gz

tar -xf kube-prometheus-0.13.0.tar.gz

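2. Change the image registries

Some image registries are unreachable from mainland China, so the image addresses need to be changed. If the mirror used here stops working, substitute another one.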

cd kube-prometheus-0.13.0/manifests
find ./ -type f | xargs sed -ri 's+registry\.k8s\.io/+k8s.mirror.nju.edu.cn/+g'
find ./ -type f | xargs sed -ri 's+quay\.io/+quay.nju.edu.cn/+g'
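Before and after the rewrite it can help to see which registries the manifests actually pull from. The following is a minimal sketch rather than part of the original steps: the function name `list_image_registries` is made up for illustration, and it assumes image references appear on lines of the form `image: registry/path:tag`. Images passed as command-line arguments to a container are not covered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct first path component of every
# "image:" value found under a directory (default: current directory).
# For fully qualified images this component is the registry host.
list_image_registries() {
  local dir="${1:-.}"
  grep -rhoE 'image:[[:space:]]*[^[:space:]]+' "$dir" \
    | sed -E 's/^image:[[:space:]]*//; s/^"//' \
    | cut -d/ -f1 \
    | sort -u
}
```

Run `list_image_registries .` inside the manifests directory; after the rewrite, `registry.k8s.io` and `quay.io` should no longer be listed.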

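Alternatively, leave the image addresses unchanged, download the image bundle from here, and import it into the k8s cluster: kube-prometheus-0.13.0 image bundle

3. Change the Service type

The Services default to type ClusterIP. They are changed to NodePort here to make the UIs easy to reach; if you plan to expose them through an Ingress, this change can be skipped.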

(1)prometheus-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.46.0
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort  # added
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 32501  # added
  - name: reloader-web
    port: 8080
    targetPort: reloader-web
  selector:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
  sessionAffinity: ClientIP

(2)grafana-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 9.5.3
  name: grafana
  namespace: monitoring
spec:
  type: NodePort  # added
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 32500  # added
  selector:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus

(3) alertmanager-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.26.0
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort  # added
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 32503  # added
  - name: reloader-web
    port: 8080
    targetPort: reloader-web
  selector:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
  sessionAffinity: ClientIP
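The nodePort values chosen above must fall inside the cluster's NodePort range, which is 30000-32767 unless the API server's --service-node-port-range flag was changed. A minimal pre-check sketch, assuming the default range; `valid_nodeport` is an illustrative name, not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the argument is an integer inside the
# default NodePort range (30000-32767). Adjust the bounds if your API
# server uses a custom --service-node-port-range.
valid_nodeport() {
  local port="$1"
  case "$port" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$port" -ge 30000 ] && [ "$port" -le 32767 ]
}

# Check the three ports used in this section.
for p in 32500 32501 32503; do
  if valid_nodeport "$p"; then
    echo "$p ok"
  else
    echo "$p out of range"
  fi
done
```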

4. Install

# run from the kube-prometheus-0.13.0 directory (step 2 changed into manifests/)
cd ..
kubectl apply --server-side -f manifests/setup
kubectl wait \
	--for condition=Established \
	--all CustomResourceDefinition \
	--namespace=monitoring
kubectl apply -f manifests/
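Once the manifests are applied, the pods take a while to pull images and start. As a sketch of how to wait for them instead of polling by hand, the helper below wraps `kubectl wait`; the function name and the 300s default are illustrative choices, not something the original article prescribes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until every pod in the monitoring namespace
# reports Ready, or give up after the timeout (default 300s).
wait_for_monitoring() {
  local timeout="${1:-300s}"
  kubectl -n monitoring wait --for=condition=Ready pods --all --timeout="$timeout"
}
```

For example, `wait_for_monitoring 600s && kubectl -n monitoring get pods,svc` lists the pods and Services only after everything is up.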

5. Access the web UIs

#prometheus
http://IP:32501
#alertmanager
http://IP:32503
#grafana
http://IP:32500
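The three UIs also expose health endpoints (`/-/healthy` on Prometheus and Alertmanager, `/api/health` on Grafana), which allows a quick check from the command line. This is a minimal sketch assuming the NodePorts configured above; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the health-check URL of each UI for a node IP,
# using the NodePorts configured earlier in this article.
health_urls() {
  local ip="$1"
  echo "http://${ip}:32501/-/healthy"   # Prometheus
  echo "http://${ip}:32503/-/healthy"   # Alertmanager
  echo "http://${ip}:32500/api/health"  # Grafana
}

# Hypothetical helper: probe each URL and report whether it answered with
# a successful HTTP status within 5 seconds.
check_health() {
  local url
  for url in $(health_urls "$1"); do
    if curl -fsS -m 5 -o /dev/null "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
    fi
  done
}
```

Call it as `check_health <node-ip>`, where the argument is the address of any cluster node.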

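6. Data persistence

6.1 Prometheus data persistence

NFS is used for persistence here, which requires installing nfs-csi-provisioner and creating a StorageClass beforehand.

(1) Install nfs-csi-provisioner

# local install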
git clone https://github.com/kubernetes-csi/csi-driver-nfs.git
cd csi-driver-nfs
./deploy/install-driver.sh v4.7.0 local

# check pod status
kubectl -n kube-system get pod -o wide -l app=csi-nfs-controller
kubectl -n kube-system get pod -o wide -l app=csi-nfs-node

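The installation pulls images that may be unreachable; if so, change the image addresses yourself. The following two files under csi-driver-nfs/deploy are involved:

csi-nfs-controller.yaml

csi-nfs-node.yaml

Alternatively, download the image bundle from the following location and import it into the k8s cluster:

nfs-csi-provisioner v4.7.0 image bundle

(2) Create the StorageClass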

vim prometheus-data-db-sc.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-data-db
provisioner: nfs.csi.k8s.io
parameters:
  server: 192.168.3.119            # NFS server address
  share: /app/nfsdata/prometheus/  # NFS export path; best created in advance
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1

kubectl apply -f prometheus-data-db-sc.yaml
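Before pointing Prometheus at the new StorageClass, it may be worth confirming that dynamic provisioning works at all. The sketch below prints a small throwaway PVC manifest; the function name and the PVC name `sc-smoke-test` are made up for illustration and are not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a throwaway PVC manifest that requests a tiny
# volume from the given StorageClass (namespace defaults to "default").
smoke_test_pvc() {
  local sc="$1" ns="${2:-default}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sc-smoke-test
  namespace: ${ns}
spec:
  storageClassName: ${sc}
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Mi
EOF
}
```

Usage: `smoke_test_pvc prometheus-data-db | kubectl apply -f -`, then `kubectl get pvc sc-smoke-test`. Because the StorageClass uses Immediate binding, the claim should become Bound shortly if the NFS server and driver are working; remove it afterwards with `kubectl delete pvc sc-smoke-test`.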

(3) Edit prometheus-prometheus.yaml

# append the following at the end of the file, as fields under spec:
  retention: 30d  # how long to keep data
  storage:        # storage configuration
    volumeClaimTemplate:
      spec:
        storageClassName: prometheus-data-db
        resources:
          requests:
            storage: 50Gi

(4) Apply prometheus-prometheus.yaml

kubectl apply -f prometheus-prometheus.yaml

6.2 Grafana data persistence

(1) Edit grafana-deployment.yaml

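# find this section
      volumes:
      # comment out the following two lines
      #- emptyDir: {}
      #  name: grafana-storage
      # Option A: mount the NFS export directly
      - name: grafana-storage
        nfs:
          server: 192.168.3.119
          path: /app/nfsdata/grafana
      # Option B: use a PVC instead (create the StorageClass and PVC first).
      # Keep only one grafana-storage entry: uncomment this one and remove Option A.
      #- name: grafana-storage
      #  persistentVolumeClaim:
      #    claimName: grafana-pvc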

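For the PVC option, create the StorageClass (SC) and PersistentVolumeClaim (PVC) from the two manifests below, and apply them with kubectl apply -f grafana-sc.yaml -f grafana-pvc.yaml before applying the Deployment.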

vim grafana-pvc.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: grafana-pvc
  namespace: monitoring  # the PVC must live in the monitoring namespace
spec:
  storageClassName: grafana-sc  # StorageClass to use; omit this line if you have none and create the PV by hand
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

vim grafana-sc.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: grafana-sc
provisioner: nfs.csi.k8s.io
parameters:
  server: 192.168.3.119         # NFS server address
  share: /app/nfsdata/grafana/  # NFS export path; best created in advance
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1

(2) Apply grafana-deployment.yaml

kubectl apply -f grafana-deployment.yaml

7. Fix Controller Manager and Scheduler targets that Prometheus cannot scrape

7.1 Controller Manager monitoring

(1) Edit /etc/kubernetes/manifests/kube-controller-manager.yaml and change --bind-address=127.0.0.1 to --bind-address=0.0.0.0
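To see what a control-plane component is currently bound to, before or after the edit, the flag can be read straight from the static pod manifest. A minimal sketch; `bind_address` is an illustrative name, and the same helper works for the kube-scheduler manifest in section 7.2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the --bind-address value found in a static pod
# manifest, or "unset" when the flag is absent.
bind_address() {
  local file="$1" value
  value=$(grep -oE -- '--bind-address=[^[:space:]"]+' "$file" | head -n 1 | cut -d= -f2 || true)
  echo "${value:-unset}"
}
```

For example, `bind_address /etc/kubernetes/manifests/kube-controller-manager.yaml` should print 0.0.0.0 after the change. The kubelet recreates a static pod on its own when its manifest file changes, so no manual restart should be needed.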

(2) Create prometheus-kubeControllerManagerService.yaml

vim prometheus-kubeControllerManagerService.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    app.kubernetes.io/name: kube-controller-manager  # must match spec.selector.matchLabels in kubernetesControlPlane-serviceMonitorKubeControllerManager.yaml
spec:
  selector:
    component: kube-controller-manager  # the label carried by the kube-controller-manager pod
  ports:
  - name: https-metrics  # must match spec.endpoints.port in kubernetesControlPlane-serviceMonitorKubeControllerManager.yaml
    port: 10257          # make sure the port number is correct
    targetPort: 10257    # make sure the port number is correct
    protocol: TCP

(3) Apply prometheus-kubeControllerManagerService.yaml

kubectl apply -f  prometheus-kubeControllerManagerService.yaml

7.2 Scheduler monitoring

(1) Edit /etc/kubernetes/manifests/kube-scheduler.yaml and change --bind-address=127.0.0.1 to --bind-address=0.0.0.0

(2) Create prometheus-kubeSchedulerService.yaml

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    app.kubernetes.io/name: kube-scheduler  # must match spec.selector.matchLabels in kubernetesControlPlane-serviceMonitorKubeScheduler.yaml
spec:
  selector:
    component: kube-scheduler  # the label carried by the kube-scheduler pod
  ports:
  - name: https-metrics  # must match spec.endpoints.port in kubernetesControlPlane-serviceMonitorKubeScheduler.yaml
    port: 10259          # make sure the port number is correct
    targetPort: 10259    # make sure the port number is correct
    protocol: TCP

(3) Apply prometheus-kubeSchedulerService.yaml

kubectl apply -f prometheus-kubeSchedulerService.yaml

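8. Add etcd monitoring

(1) Create a Secret holding the etcd certificates

First find the etcd certificate paths. If etcd runs under systemd, check the service file /etc/systemd/system/etcd.service; if it runs as a static pod, check /etc/kubernetes/manifests/etcd.yaml. Locate the paths given by these three flags: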

--cert-file=/etc/kubernetes/pki/etcd/server.crt
--key-file=/etc/kubernetes/pki/etcd/server.key
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

Then create the Secret:

kubectl create secret generic etcd-ssl --from-file=/etc/kubernetes/pki/etcd/ca.crt --from-file=/etc/kubernetes/pki/etcd/server.crt --from-file=/etc/kubernetes/pki/etcd/server.key -n monitoring
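`kubectl create secret` fails if any of the `--from-file` paths is missing or unreadable, so a quick pre-check on the control-plane node can save a round trip. A minimal sketch; `missing_files` is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every path argument that is not a readable file;
# prints nothing when all of them are present.
missing_files() {
  local f
  for f in "$@"; do
    [ -r "$f" ] || echo "$f"
  done
}
```

Example: `missing_files /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/etcd/server.crt /etc/kubernetes/pki/etcd/server.key` should print nothing. After creating the Secret, `kubectl -n monitoring describe secret etcd-ssl` should list the three keys ca.crt, server.crt and server.key.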

(2) Edit prometheus-prometheus.yaml

...
  replicas: 2  # find this line and add the following two lines
  secrets:
  - etcd-ssl
...

# apply after editing
kubectl apply -f prometheus-prometheus.yaml

(3) Create a ServiceMonitor to scrape etcd

vim kubernetesControlPlane-serviceMonitorEtcd.yaml
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: etcd
    app.kubernetes.io/part-of: kube-prometheus
  name: etcd
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      # certificate paths are paths inside the prometheus pod
      caFile: /etc/prometheus/secrets/etcd-ssl/ca.crt
      certFile: /etc/prometheus/secrets/etcd-ssl/server.crt
      keyFile: /etc/prometheus/secrets/etcd-ssl/server.key
      insecureSkipVerify: true
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: etcd
      
# apply after creating
kubectl apply -f kubernetesControlPlane-serviceMonitorEtcd.yaml

(4) Create a Service that selects the etcd pods

vim prometheus-etcdService.yaml
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: etcd
  labels:
    app.kubernetes.io/name: etcd
spec:
  selector:
    component: etcd
  ports:
  - name: https-metrics
    port: 2379
    targetPort: 2379
    protocol: TCP
    
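# apply after creating
kubectl apply -f prometheus-etcdService.yaml

(5) After a short wait, etcd appears on the Targets page of the Prometheus web UI

9. Add custom scrape targets

Sometimes you need to monitor components outside the cluster, such as MySQL, Redis, or Kafka. To do so, create a prometheus-additional.yaml file containing scrape_configs entries for the extra targets.

(1) Create prometheus-additional.yaml; suppose we want to monitor MySQL and Redis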

- job_name: 'mysql-exporter'
  static_configs:
    - targets:
      - 192.168.3.8:9104

- job_name: 'redis-exporter'
  static_configs:
  - targets:
    - 192.168.3.9:9121
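A malformed target (for instance one with the port left off) only shows up later as a failing or missing target, so a rough format check before creating the Secret can help. This is a sketch under stated assumptions: `bad_targets` is a made-up name, and it assumes each target is an unquoted entry on its own `- host:port` line, as in the file above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print target entries in a scrape-config file that are
# not of the form host:port; prints nothing when every entry looks valid.
bad_targets() {
  local file="$1"
  # 1) keep list items made of a single token, 2) drop the "- targets:" keys,
  # 3) strip the list marker, 4) print whatever is not host:port.
  grep -E '^[[:space:]]*-[[:space:]]+[^[:space:]]+[[:space:]]*$' "$file" \
    | grep -vE 'targets:|job_name:' \
    | sed -E 's/^[[:space:]]*-[[:space:]]+//; s/[[:space:]]*$//' \
    | grep -vE '^[A-Za-z0-9._-]+:[0-9]{1,5}$' || true
}
```

Running `bad_targets prometheus-additional.yaml` on the file above should print nothing. It is only a format check and does not verify that the exporters are reachable.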

(2) Store this scrape configuration in the k8s cluster as a Secret

kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml -n monitoring

(3) Edit prometheus-prometheus.yaml

...
  image: quay.io/prometheus/prometheus:v2.46.0  # find this line and add the following
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: prometheus-additional.yaml
...

(4) Apply prometheus-prometheus.yaml

kubectl apply -f prometheus-prometheus.yaml

(5) After a short wait, check the new targets on the Targets page of the Prometheus web UI

posted @ 2024-07-05 14:00  hovin