A Little K8S Every Day: Monitoring a K8S Cluster with Prometheus

Monitoring a K8S cluster with Prometheus

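1. Monitoring task overview

Based on the Prometheus architecture diagram, monitoring a K8S cluster with Prometheus covers three main areas:
  1. Cluster node status: run an exporter (node-exporter) on every node in the cluster.
  2. Pods on each node: the pods on every node are managed by the kubelet, and the cAdvisor built into the kubelet reports each pod's running state and resource usage.
  3. Core K8S components: API server, etcd, kubelet, controller-manager, kube-scheduler, kube-proxy, and so on.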

2. Creating node-exporter to monitor node resources

# YAML file
Link: https://pan.baidu.com/s/12kwzyK5f3FjDNK4HV-ADJA  Extraction code: cjmo

# Create the namespace
[ master-worker-node-1 root ~] # kubectl  create namespace monitor-sa
namespace/monitor-sa created

# Create the DaemonSet
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # kubectl  apply -f node-exporter.yaml 
daemonset.apps/node-exporter created

[ master-worker-node-1 root ~/k8s/yaml/prometheus] # kubectl  get daemonset -n monitor-sa 
NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
node-exporter   3         3         3       3            3           <none>          3m42s


[ master-worker-node-1 root ~/k8s/yaml/prometheus] # kubectl  get pods  -n monitor-sa -o wide 
NAME                  READY   STATUS    RESTARTS   AGE   IP               NODE                   NOMINATED NODE   READINESS GATES
node-exporter-95p4m   1/1     Running   0          40s   192.168.100.92   master-worker-node-2   <none>           <none>
node-exporter-hp5zb   1/1     Running   0          39s   192.168.100.93   master-worker-node-3   <none>           <none>
node-exporter-hvf47   1/1     Running   0          40s   192.168.100.91   master-worker-node-1   <none>           <none>

# Check the metrics endpoint on port 9100
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # curl http://192.168.100.91:9100/metrics | grep node_cpu_seconds |  head -5
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k    0   97k    0     0  3906k      0 --:--:-- --:--:-- --:--:-- 3906k
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter   # metric type: counter
node_cpu_seconds_total{cpu="0",mode="idle"} 694084.85
node_cpu_seconds_total{cpu="0",mode="iowait"} 2498.56
node_cpu_seconds_total{cpu="0",mode="irq"} 14901.21
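Because node_cpu_seconds_total is a counter reported per CPU and per mode, a quick sanity check is to total one mode across all CPUs. The following is a minimal sketch, assuming the text exposition format shown above; the function name sum_cpu_seconds is hypothetical and not part of node-exporter.

```shell
#!/usr/bin/env bash
# Sum node_cpu_seconds_total across all CPUs for a single mode.
# Reads Prometheus text exposition format on stdin.
# sum_cpu_seconds is a hypothetical helper name.
sum_cpu_seconds() {
  local mode="$1"
  awk -v mode="$mode" '
    # Match sample lines only (HELP/TYPE lines start with "#").
    index($0, "node_cpu_seconds_total{") == 1 &&
    index($0, "mode=\"" mode "\"") > 0 { total += $NF }
    END { printf "%.2f\n", total }
  '
}

# Usage (not run here; -s hides the curl progress meter):
#   curl -s http://192.168.100.91:9100/metrics | sum_cpu_seconds idle
```

In Prometheus itself the same idea is normally expressed over time with rate() rather than by reading the raw counter once.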

3. Configuring the Prometheus server

3.1 Create a service account for the Prometheus server and grant it RBAC permissions
# Create the service account
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # kubectl create serviceaccount monitor -n monitor-sa 
serviceaccount/monitor created

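# Create the ClusterRoleBinding. Note: a ClusterRoleBinding is cluster-scoped, so --namespace has no effect here, and cluster-admin grants far more than Prometheus needs.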
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # kubectl create clusterrolebinding monitor --serviceaccount monitor-sa:monitor --clusterrole cluster-admin --namespace monitor-sa 
clusterrolebinding.rbac.authorization.k8s.io/monitor created
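For anything beyond a lab cluster, a read-only role is usually enough for Prometheus. The sketch below writes such a ClusterRole manifest to a file. The role name prometheus-read and the exact rule set are assumptions based on the common pattern for Kubernetes service discovery, not the author's configuration; nodes/proxy is included only in case the cAdvisor job scrapes through the API server proxy.

```shell
#!/usr/bin/env bash
# Write a read-only ClusterRole manifest for Prometheus to the given file.
# write_prometheus_clusterrole and the role name "prometheus-read" are
# hypothetical; adjust the rules to what prometheus-cfg.yaml actually scrapes.
write_prometheus_clusterrole() {
  local out_file="$1"
  cat > "$out_file" <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-read
rules:
  # Objects that service discovery lists and watches.
  - apiGroups: [""]
    resources: ["nodes", "nodes/metrics", "nodes/proxy", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  # Raw metrics paths, e.g. on the API server.
  - nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
    verbs: ["get"]
EOF
}

# Usage (not run here):
#   write_prometheus_clusterrole prometheus-clusterrole.yaml
#   kubectl apply -f prometheus-clusterrole.yaml
#   kubectl create clusterrolebinding monitor \
#     --clusterrole=prometheus-read --serviceaccount=monitor-sa:monitor
```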
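
3.2 Create the local data directory for Prometheus
# NFS is mounted on the nodes first; all three nodes have the same directory, so the Prometheus server pod can move between nodes without affecting operation.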
[ master-worker-node-1 root ~] # df -Th  |  grep -E "Type|192.168.100.94"
Filesystem                        Type      Size  Used Avail Use% Mounted on
192.168.100.94:/prometheus/node-1 nfs4       51G  2.5G   49G   5% /prometheus

# Create the directory and set its permissions
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # mkdir /prometheus/data

[ master-worker-node-1 root ~/k8s/yaml/prometheus] # ls -ld /prometheus/data 
drwxr-xr-x. 2 nobody nobody 6 Feb 19 12:54 /prometheus/data
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # chmod 777 /prometheus/data

3.3 Add the Prometheus server configuration file
# The Prometheus server configuration file is added as a ConfigMap.
# Link: https://pan.baidu.com/s/1RuyLOoH_FM90mG8CT1hKtQ  Extraction code: lpvg

[ master-worker-node-1 root ~/k8s/yaml/prometheus] # kubectl  apply -f prometheus-cfg.yaml  
configmap/prometheus-config created
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # kubectl  get configmap -n monitor-sa 
NAME                DATA   AGE
kube-root-ca.crt    1      15h
prometheus-config   1      7s

# The scrape section of the Prometheus server configuration defines four jobs, so the web UI shows four target groups, one per job.
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # cat prometheus-cfg.yaml |  grep job 
    - job_name: 'kubernetes-node'  
    - job_name: 'kubernetes-node-cadvisor'
    - job_name: 'kubernetes-apiserver'
    - job_name: 'kubernetes-service-endpoints'

1. kubernetes-node: nodes are discovered through port 10250, which the kubelet serves by default
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # ss -ntupal |  grep 10250
tcp   LISTEN 0      128                 *:10250            *:*    users:(("kubelet",pid=9468,fd=27))  

2. kubernetes-node-cadvisor: scraped from the kubelet's /metrics/cadvisor endpoint

3. kubernetes-apiserver: scraped from the API server on port 6443
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # ss -naltup | grep 6443 -w
tcp   LISTEN 0      128                 *:6443             *:*    users:(("kube-apiserver",pid=7520,fd=7))   

4. kubernetes-service-endpoints: endpoints are selected through the annotations on their Service.
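The annotation-based selection in the fourth job is typically done with relabel rules. The actual prometheus-cfg.yaml is only available through the download link, so the fragment below is a sketch of the widely used pattern rather than the author's file; the function name is hypothetical, and the fragment is written to a file so it can be compared against the real configuration.

```shell
#!/usr/bin/env bash
# Write an example kubernetes-service-endpoints scrape job to a file.
# write_service_endpoints_job is a hypothetical helper; the relabel rules
# are an assumed, commonly used pattern, not the author's exact config.
write_service_endpoints_job() {
  local out_file="$1"
  # Quoted 'EOF' keeps $1 and $2 literal for Prometheus to interpret.
  cat > "$out_file" <<'EOF'
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
    - role: endpoints
  relabel_configs:
    # Keep only endpoints whose Service has prometheus.io/scrape: "true".
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    # Rewrite the scrape address to use the port from prometheus.io/port.
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
EOF
}

# Usage (not run here):
#   write_service_endpoints_job /tmp/service-endpoints-job.yaml
```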

3.4 Deploy the Prometheus server with a Deployment
# Create the Prometheus pod and mount the ConfigMap as a volume
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # cat prometheus-deployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
      component: server
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
        component: server
    spec:
      serviceAccountName: monitor
      containers:
        - name: prometheus-server
          image: bitnami/prometheus:2.42.0
          imagePullPolicy: IfNotPresent
          command:
            - prometheus
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
            - --storage.tsdb.retention.time=720h
            - --web.enable-lifecycle
          ports:
              - containerPort: 9090
                protocol: TCP
          volumeMounts:
              - mountPath: /etc/prometheus
                name: prometheus-config
              - mountPath: /prometheus/
                name: prometheus-storage-volume
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
      - name: prometheus-storage-volume
        hostPath:
          path: /prometheus/data
          type: Directory    
          
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # kubectl apply -f prometheus-deployment.yaml 
deployment.apps/prometheus created

# The pod starts normally
[ master-worker-node-1 root /prometheus] # kubectl  get deployments.apps -o wide -n monitor-sa prometheus 
NAME         READY   UP-TO-DATE   AVAILABLE   AGE    CONTAINERS          IMAGES                      SELECTOR
prometheus   1/1     1            1           4m1s   prometheus-server   bitnami/prometheus:2.42.0   app=prometheus,component=server

[ master-worker-node-1 root /prometheus] # kubectl get pods -o wide -n monitor-sa prometheus-666897fc49-2jpnx 
NAME                          READY   STATUS    RESTARTS   AGE     IP               NODE                   NOMINATED NODE   READINESS GATES
prometheus-666897fc49-2jpnx   1/1     Running   0          4m12s   10.244.132.197   master-worker-node-2   <none>           <none>

3.5 Expose the Prometheus server pod's port
[ master-worker-node-1 root ~] # kubectl  expose deployment -n monitor-sa prometheus --type NodePort --name prometheus --port 9090 --target-port 9090 --protocol TCP 
service/prometheus exposed

[ master-worker-node-1 root ~] # kubectl  get service -o wide -n monitor-sa 
NAME         TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE   SELECTOR
prometheus   NodePort   10.103.206.76   <none>        9090:31346/TCP   26s   app=prometheus,component=server
# The web UI is now reachable through the NodePort (31346 here); the Targets tab shows the corresponding information.
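The same information is available from the command line through the Prometheus HTTP API at /api/v1/targets. A minimal sketch, assuming the compact JSON that Prometheus returns and no jq on the host; count_targets_by_job is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count active scrape targets per job via the Prometheus HTTP API.
# Uses grep/cut instead of a JSON parser, which assumes compact JSON output;
# jq would be more robust where it is installed.
count_targets_by_job() {
  local base_url="$1"
  curl -s "${base_url}/api/v1/targets?state=active" \
    | grep -o '"scrapePool":"[^"]*"' \
    | cut -d'"' -f4 \
    | sort | uniq -c \
    | awk '{ print $2, $1 }'
}

# Usage (not run here; node IP and NodePort taken from the output above):
#   count_targets_by_job http://192.168.100.91:31346
```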

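3.6 Manually add a Service to Prometheus
# At this point the service-endpoints targets contain only the two endpoints of the kube-dns Service.

# They are picked up as targets because the Service's annotations are matched by the kubernetes-service-endpoints job.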
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # kubectl  describe service -n kube-system kube-dns 
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=CoreDNS
Annotations:       prometheus.io/port: 9153
                   prometheus.io/scrape: true

# Create a new nginx Deployment
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # kubectl  get deployments.apps  nginx -owide 
NAME    READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES   SELECTOR
nginx   1/1     1            1           95s   nginx        nginx    app=nginx

# Create a Service for the nginx Deployment, with the corresponding annotations set on the Service
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # kubectl get service -o wide nginx-deployment-service 
NAME                       TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)           AGE   SELECTOR
nginx-deployment-service   NodePort   10.99.22.192   <none>        30080:31942/TCP   11s   app=nginx

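# The scrape fails at this point mainly because nginx does not expose a /metrics endpoint by default; an exporter has to provide one before Prometheus can scrape it. (A ServiceMonitor only tells a Prometheus Operator setup what to scrape; it does not create the endpoint.) A separate walkthrough will follow later.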
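The annotations themselves can be added to an existing Service with kubectl annotate. A minimal sketch, assuming the same prometheus.io/scrape and prometheus.io/port annotation names seen on kube-dns above; annotate_for_scrape is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Mark a Service for scraping by the kubernetes-service-endpoints job.
# Arguments: namespace, service name, metrics port.
annotate_for_scrape() {
  local namespace="$1" service="$2" port="$3"
  # --overwrite replaces existing values instead of failing on them.
  kubectl annotate service "$service" -n "$namespace" --overwrite \
    "prometheus.io/scrape=true" \
    "prometheus.io/port=${port}"
}

# Usage (not run here; values match the kube-dns Service shown above):
#   annotate_for_scrape kube-system kube-dns 9153
```

The port passed here must be one on which the pods actually serve metrics; the annotation only redirects the scrape, it does not create an endpoint.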

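4. Prometheus hot reload

# When the Prometheus configuration file changes, deleting the Prometheus pod to apply it can lose data. If Prometheus was started with the hot-reload flag, the configuration can instead be reloaded with a command.

# The hot-reload flag was added when the Prometheus container was started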
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # cat prometheus-deployment.yaml  |  grep lifecycle
            - --web.enable-lifecycle
            
            
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # kubectl get pods -n monitor-sa prometheus-6bb56888bc-k2t6h -o wide 
NAME                          READY   STATUS    RESTARTS   AGE     IP               NODE                   NOMINATED NODE   READINESS GATES
prometheus-6bb56888bc-k2t6h   1/1     Running   0          7h51m   10.244.123.136   master-worker-node-1   <none>           <none>

# Trigger the hot reload
[ master-worker-node-1 root ~/k8s/yaml/prometheus] # curl -X POST 10.244.123.136:9090/-/reload
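The bare curl above prints nothing on success, so it is easy to miss a failed reload. A minimal sketch that checks the HTTP status code instead; reload_prometheus is a hypothetical helper name, and the endpoint only responds when --web.enable-lifecycle is set.

```shell
#!/usr/bin/env bash
# POST to the Prometheus reload endpoint and report the result.
# Argument: base URL of the Prometheus server, e.g. http://10.244.123.136:9090
reload_prometheus() {
  local base_url="$1"
  local code
  # -o /dev/null drops the body; -w prints only the HTTP status code.
  code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "${base_url}/-/reload")
  if [ "$code" = "200" ]; then
    echo "reload ok"
  else
    echo "reload failed: HTTP ${code}" >&2
    return 1
  fi
}

# Usage (not run here):
#   reload_prometheus http://10.244.123.136:9090
```

After the ConfigMap is updated, the kubelet refreshes the mounted file with some delay, so the reload only picks up the change once the file inside the pod has been updated.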

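5. Summary

1. Monitoring a K8S cluster with Prometheus mainly covers node monitoring, monitoring of pod resources on each node, and monitoring of the core K8S services.

2. Scrape jobs rely on a good deal of dedicated syntax for their configuration.

3. How to expose a metrics endpoint, and how to build one when a workload has none of its own, is the key part of configuring Prometheus.

4. If the hot-reload flag (--web.enable-lifecycle) was set when Prometheus was created, a running Prometheus can be reloaded in place, which avoids losing data during a restart.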

posted @ 2023-02-21 21:38  woshinidaye