65、K8S - Deploying Prometheus and Grafana on K8S [Usage]
1、Checking the Running Status
After the installation, the first step is to verify that everything is running properly.
1.1、Pod status
]# kubectl -n monitoring get pods -o wide
NAME                                  READY   STATUS    RESTARTS   AGE   IP              NODE      NOMINATED NODE   READINESS GATES
alertmanager-main-0                   2/2     Running   0          96m   10.244.3.7      node1     <none>           <none>
alertmanager-main-1                   2/2     Running   0          96m   10.244.4.8      node2     <none>           <none>
alertmanager-main-2                   2/2     Running   0          96m   10.244.4.9      node2     <none>           <none>
blackbox-exporter-84bb6f6bd9-2tr2q    3/3     Running   0          95m   10.244.3.9      node1     <none>           <none>
grafana-7bdbdbcb4b-67qsj              1/1     Running   0          74m   10.244.3.13     node1     <none>           <none>
kube-state-metrics-c7c57885f-scxdh    3/3     Running   0          94m   10.244.3.10     node1     <none>           <none>
node-exporter-27bgj                   2/2     Running   0          93m   192.168.10.27   master2   <none>           <none>
node-exporter-cnzhw                   2/2     Running   0          93m   192.168.10.30   node2     <none>           <none>
node-exporter-knqgv                   2/2     Running   0          93m   192.168.10.29   node1     <none>           <none>
node-exporter-qwbb6                   2/2     Running   0          93m   192.168.10.26   master1   <none>           <none>
prometheus-adapter-67d7695cb7-7wf9j   1/1     Running   0          95m   10.244.4.10     node2     <none>           <none>
prometheus-adapter-67d7695cb7-vbdkr   1/1     Running   0          95m   10.244.3.8      node1     <none>           <none>
prometheus-k8s-0                      2/2     Running   0          93m   10.244.3.12     node1     <none>           <none>
prometheus-k8s-1                      2/2     Running   0          93m   10.244.4.11     node2     <none>           <none>
prometheus-operator-ffcc9958-2dbgn    2/2     Running   0          94m   10.244.3.11     node1     <none>           <none>
1.2、Service status
1.2.1、Service list
]# kubectl -n monitoring get svc
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
alertmanager-main       NodePort    10.100.113.107   <none>        9093:30093/TCP,8080:30081/TCP   97m
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP      97m
blackbox-exporter       ClusterIP   10.105.55.97     <none>        9115/TCP,19115/TCP              96m
grafana                 NodePort    10.102.101.236   <none>        3000:30030/TCP                  106m
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP               95m
node-exporter           ClusterIP   None             <none>        9100/TCP                        94m
prometheus-adapter      ClusterIP   10.110.224.24    <none>        443/TCP                         96m
prometheus-k8s          NodePort    10.104.132.49    <none>        9090:30090/TCP,8080:30080/TCP   93m
prometheus-operated     ClusterIP   None             <none>        9090/TCP                        93m
prometheus-operator     ClusterIP   None             <none>        8443/TCP                        95m
Note: CLUSTER-IP = None indicates a headless service: no virtual IP is allocated, and the service DNS name resolves directly to the pod IPs.
1.2.2、Analysis of the exposed service ports
alertmanager-main    type: NodePort    9093:30093/TCP,8080:30081/TCP
    Alertmanager web UI:    pod/service port 9093 → NodePort 30093
    Alertmanager metrics:   pod/service port 8080 → NodePort 30081
--------------------------------
grafana              type: NodePort    3000:30030/TCP
    Grafana web UI:         pod/service port 3000 → NodePort 30030
--------------------------------
prometheus-k8s       type: NodePort    9090:30090/TCP,8080:30080/TCP
    Prometheus web UI:      pod/service port 9090 → NodePort 30090
    Prometheus metrics:     pod/service port 8080 → NodePort 30080
1.3、Endpoints
# Mainly to check how each svc maps to its endpoints
]# kubectl -n monitoring get endpoints
NAME                    ENDPOINTS                                                              AGE
alertmanager-main       10.244.3.7:8080,10.244.4.8:8080,10.244.4.9:8080 + 3 more...            112m
alertmanager-operated   10.244.3.7:9094,10.244.4.8:9094,10.244.4.9:9094 + 6 more...            112m
blackbox-exporter       10.244.3.9:9115,10.244.3.9:19115                                       110m
grafana                 10.244.3.13:3000                                                       120m
kube-state-metrics      10.244.3.10:8443,10.244.3.10:9443                                      110m
node-exporter           192.168.10.26:9100,192.168.10.27:9100,192.168.10.29:9100 + 1 more...   109m
prometheus-adapter      10.244.3.8:6443,10.244.4.10:6443                                       111m
prometheus-k8s          10.244.3.12:8080,10.244.4.11:8080,10.244.3.12:9090 + 1 more...         108m
prometheus-operated     10.244.3.12:9090,10.244.4.11:9090                                      108m
prometheus-operator     10.244.3.11:8443                                                       110m
1.4、Prometheus custom resource status
]# kubectl -n monitoring get prometheus
NAME   VERSION   DESIRED   READY   RECONCILED   AVAILABLE   AGE
k8s    2.41.0    2         2       True         True        109m
2、Prometheus Web UI
2.1、Targets page
2.2、Graph page
2.2.1、Querying the collected data
For example, to query the CPU usage of every pod in the K8S cluster, use the following expression:
Tip: the metric name is container_cpu_usage_seconds_total

sum(rate(container_cpu_usage_seconds_total{image!="", pod!=""}[1m])) by (pod)
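The same expression can also be run outside the UI through the Prometheus HTTP API. A minimal sketch, assuming the NodePort 30090 shown in section 1.2; the node IP is one of this cluster's masters, so substitute your own:

```shell
# Query the Prometheus HTTP API with the same PromQL expression.
# PROM_URL is an assumption based on the NodePort (30090) from section 1.2.
PROM_URL="${PROM_URL:-http://192.168.10.26:30090}"
QUERY='sum(rate(container_cpu_usage_seconds_total{image!="", pod!=""}[1m])) by (pod)'
# --data-urlencode percent-encodes the braces, quotes and spaces in the expression
curl --connect-timeout 5 -sG "${PROM_URL}/api/v1/query" \
  --data-urlencode "query=${QUERY}" || echo "query failed: is ${PROM_URL} reachable?"
```

The response is JSON with one sample per pod, so it is easy to post-process with jq or a script.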
2.3、Rules page
kube-prometheus preloads a large set of alerting and recording rules here. They are worth studying later, and make a good reference if you are unsure how to write rules yourself.
2.4、How to modify the Prometheus configuration
2.4.1、Extracting the configuration from the secret
# Decompress prometheus.yaml
]# kubectl -n monitoring get secrets prometheus-k8s -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gzip -d

# Alertmanager as another example (stored without gzip)
]# kubectl -n monitoring get secrets alertmanager-main -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
"global":
  "resolve_timeout": "5m"
"inhibit_rules":
- "equal":
  - "namespace"
  - "alertname"
  "source_matchers":
  - "severity = critical"
  "target_matchers":
  - "severity =~ warning|info"
- "equal":
  - "namespace"
  - "alertname"
  "source_matchers":
  - "severity = warning"
  "target_matchers":
  - "severity = info"
- "equal":
  - "namespace"
  "source_matchers":
  - "alertname = InfoInhibitor"
  "target_matchers":
  - "severity = info"
"receivers":
- "name": "Default"
- "name": "Watchdog"
- "name": "Critical"
- "name": "null"
"route":
  "group_by":
  - "namespace"
  "group_interval": "5m"
  "group_wait": "30s"
  "receiver": "Default"
  "repeat_interval": "12h"
  "routes":
  - "matchers":
    - "alertname = Watchdog"
    "receiver": "Watchdog"
  - "matchers":
    - "alertname = InfoInhibitor"
    "receiver": "null"
  - "matchers":
    - "severity = critical"
    "receiver": "Critical"
2.4.2、Writing it to a file
]# kubectl -n monitoring get secrets prometheus-k8s -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gzip -d >prometheus.yaml.v1
2.4.3、After editing, gzip and then base64-encode it again
gzip -c prometheus.yaml.v1 | base64 -w0   # -w0 keeps the base64 on one line for pasting back
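Before writing anything back, the encode/decode round trip can be sanity-checked locally without touching the cluster. A minimal sketch (the sample file content is made up purely for illustration):

```shell
# Create a stand-in for the extracted configuration (content is illustrative only)
printf 'global:\n  scrape_interval: 30s\n' > prometheus.yaml.v1

# Re-encode the way the secret stores it: gzip first, then base64 (-w0 = no line wrapping)
gzip -c prometheus.yaml.v1 | base64 -w0 > prometheus.yaml.gz.b64

# Decode back and confirm the round trip is lossless
base64 -d prometheus.yaml.gz.b64 | gzip -d > prometheus.yaml.check
diff prometheus.yaml.v1 prometheus.yaml.check && echo "round trip OK"
```

If diff reports no difference, the one-line base64 value is safe to paste into the secret.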
2.4.4、Writing it back with kubectl edit
Paste the new base64 value back into the secret with kubectl -n monitoring edit secret prometheus-k8s. Note that this secret is managed by the Prometheus Operator, so manual edits may be reverted the next time it reconciles.
3、Grafana Web UI
3.1、Login credentials
admin/admin (Grafana prompts you to change the password on first login)
3.2、A data source is configured by default
3.2.1、Checking the data source configuration
3.2.2、Why the data source is configured automatically
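The provisioned data sources can also be listed through the Grafana HTTP API rather than the UI. A hedged sketch, assuming the NodePort 30030 from section 1.2 and the default admin/admin credentials; the node IP is a placeholder for your environment:

```shell
# List Grafana data sources via its HTTP API (GET /api/datasources).
# GRAFANA_URL and the admin/admin credentials are assumptions; adjust as needed.
GRAFANA_URL="${GRAFANA_URL:-http://192.168.10.26:30030}"
curl --connect-timeout 5 -su admin:admin \
  "${GRAFANA_URL}/api/datasources" || echo "grafana not reachable at ${GRAFANA_URL}"
```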
The in-cluster Prometheus address is baked into the kube-prometheus manifests. (Strictly speaking, Grafana's own data source is provisioned from the grafana-dashboardDatasources.yaml secret in the same manifests tree; the prometheus-adapter below is configured with the same service URL.)

]# vi kube-prometheus-0.12.0/manifests/prom_adapter/prometheusAdapter-deployment.yaml
spec:
  automountServiceAccountToken: true
  containers:
  - args:
    - --cert-dir=/var/run/serving-cert
    - --config=/etc/adapter/config.yaml
    - --logtostderr=true
    - --metrics-relist-interval=1m
    - --prometheus-url=http://prometheus-k8s.monitoring.svc:9090/   # already configured here
    - --secure-port=6443
3.2.3、Adding dashboards
This is not repeated here; see: https://www.cnblogs.com/ygbh/p/17299339.html#_label3_2_1_2
4、Alertmanager Web UI
5、Custom ingress-nginx for Prometheus, Grafana and Alertmanager
5.1、Creating the Ingress resource
5.1.1、Defining the resource manifest
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-prometheus
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
    prometheus.io/http_probe: "true"
spec:
  rules:
  - host: alert.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: alertmanager-main
            port:
              number: 9093
  - host: grafana.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
  - host: prom.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
EOF
5.1.2、Checking the Ingress status
]# kubectl get ingress -n monitoring
NAME                 CLASS    HOSTS                                                          ADDRESS   PORTS   AGE
ingress-prometheus   <none>   alert.localprom.com,grafana.localprom.com,prom.localprom.com             80      13s
5.2、Configuring hosts entries
192.168.10.222 prom.localprom.com
192.168.10.222 grafana.localprom.com
192.168.10.222 alert.localprom.com
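The routing can also be spot-checked without editing /etc/hosts at all, using curl's --resolve option to pin the hostname to the ingress address for a single request. A sketch using the address above:

```shell
# Pin prom.localprom.com to the ingress-nginx address for this one request only.
# INGRESS_IP is the address from the hosts entries above; adjust to your setup.
INGRESS_IP="${INGRESS_IP:-192.168.10.222}"
curl --connect-timeout 5 -s -o /dev/null -w '%{http_code}\n' \
  --resolve "prom.localprom.com:80:${INGRESS_IP}" \
  http://prom.localprom.com/ || echo "ingress not reachable"
```

A 200 (or a 302 redirect for Grafana) confirms the Ingress rule matches; 000 means the ingress controller itself is unreachable.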
5.3、Access test
5.3.1、prometheus
5.3.2、grafana
5.3.3、alertmanager
6、Adding controller-manager and scheduler monitoring to Prometheus
6.1、Requirement
By default, Prometheus has no targets for the controller-manager and scheduler, so no data is collected for them.
6.1.1、Targets screenshot
6.1.2、Configuration workflow
For Prometheus to monitor these K8S components, two points matter:
1、Kubernetes must expose the controller-manager and scheduler listen addresses, and dedicated Endpoints and Service objects must be created for them.
2、Prometheus discovers them via the kubernetes-serviceMonitorKubeScheduler.yaml and kubernetes-serviceMonitorKubeControllerManager.yaml files, so the labels on the custom Services must match the label selectors in those kube-prometheus ServiceMonitors.
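For reference, the selector inside kube-prometheus's kube-scheduler ServiceMonitor looks roughly like the abridged fragment below (verify against the file in your own manifests directory, since layouts differ between releases). This is why the Services created later in this section must carry the app.kubernetes.io/name labels:

```yaml
# Abridged sketch of the ServiceMonitor selector (check your own manifests)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-scheduler
  namespace: monitoring
spec:
  jobLabel: app.kubernetes.io/name
  endpoints:
  - port: https-metrics        # must match the port name in the custom Service
    scheme: https
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-scheduler   # must match the Service labels
```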
6.2、Exposing the listen addresses [must be changed on every master node]
Once the static pod manifests are edited, kubelet reloads them automatically; no service restart is needed.
]# kubectl -n kube-system get pods -o wide | grep -E 'schedu|control'
calico-kube-controllers-74846594dd-76m7g   1/1   Running   0   9d      10.244.1.2      master2   <none>   <none>
kube-controller-manager-master1            1/1   Running   0   48s     192.168.10.26   master1   <none>   <none>
kube-controller-manager-master2            1/1   Running   0   2m2s    192.168.10.27   master2   <none>   <none>
kube-scheduler-master1                     1/1   Running   0   49s     192.168.10.26   master1   <none>   <none>
kube-scheduler-master2                     1/1   Running   0   2m42s   192.168.10.27   master2   <none>   <none>
6.2.1、controller-manager changes
]# vi /etc/kubernetes/manifests/kube-controller-manager.yaml
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=0.0.0.0        # changed from the kubeadm default 127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-cidr=10.244.0.0/16
6.2.2、scheduler changes
]# vi /etc/kubernetes/manifests/kube-scheduler.yaml
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=0.0.0.0        # changed from the kubeadm default 127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
6.3、Collecting controller-manager metrics
6.3.1、Creating the resource manifest
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-controller-manager
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10257
    targetPort: 10257
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-controller-manager
subsets:
- addresses:
  - ip: 192.168.10.26
  - ip: 192.168.10.27
  ports:
  - name: https-metrics
    port: 10257
    protocol: TCP
EOF

# Note: the addresses are the master node IPs; list every master node here.
6.3.2、Checking status
]# kubectl -n kube-system get svc
NAME                      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
kube-controller-manager   ClusterIP   None         <none>        10257/TCP   114s
]# kubectl -n kube-system get endpoints
NAME                      ENDPOINTS                                 AGE
kube-controller-manager   192.168.10.26:10257,192.168.10.27:10257   2m1s
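A quick sanity check that the opened-up port actually answers: the metrics endpoint requires authentication, so an HTTP 401/403 from the sketch below still proves the port is now bound on 0.0.0.0 (the IP is one of this cluster's masters; substitute your own):

```shell
# An HTTP status of 401/403 means the port answers; 000 means unreachable.
MASTER_IP="${MASTER_IP:-192.168.10.26}"
curl -k --connect-timeout 5 -s -o /dev/null -w '%{http_code}\n' \
  "https://${MASTER_IP}:10257/metrics" || echo "controller-manager port not reachable"
```

The same check with port 10259 applies to the scheduler after section 6.4.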
6.4、Collecting scheduler metrics
6.4.1、Creating the resource manifest
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-scheduler
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10259
    targetPort: 10259
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-scheduler
subsets:
- addresses:
  - ip: 192.168.10.26
  - ip: 192.168.10.27
  ports:
  - name: https-metrics
    port: 10259
    protocol: TCP
EOF

# Note: the addresses are the master node IPs; list every master node here.
6.4.2、Checking status
]# kubectl -n kube-system get endpoints
NAME             ENDPOINTS                                 AGE
kube-scheduler   192.168.10.26:10259,192.168.10.27:10259   53s
]# kubectl -n kube-system get svc
NAME             TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
kube-scheduler   ClusterIP   None         <none>        10259/TCP   57s
6.5、Verifying that the new Prometheus targets appear
At this point the controller-manager and scheduler targets are present.