Deploying Prometheus + Grafana Monitoring on Kubernetes
Environment: k8s-master 192.168.1.130
             k8s-node   192.168.1.131
Goal: be able to view the monitoring data of every node in the cluster.
node-exporter is used to collect the node-level metrics.
Pull the image
[root@k8s-node ~]# docker search node-exporter
[root@k8s-node ~]# docker pull prom/node-exporter
[root@k8s-node ~]# docker images
REPOSITORY           TAG       IMAGE ID       CREATED       SIZE
prom/node-exporter   latest    1dbe0e931976   4 weeks ago   20.9MB
If you already have a local image archive, load it instead:
[root@k8s-node ~]# docker load -i node-exporter
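The DaemonSet below runs a pod on every node, so the image must be pullable (or already loaded) on each of them. If only one machine has the image, a simple way to copy it over is docker save / scp / docker load; the archive name node-exporter.tar below is just an example, not from the original post:

# On the machine that already has the image (k8s-node in this setup)
docker save -o node-exporter.tar prom/node-exporter:latest
scp node-exporter.tar root@192.168.1.130:/root/

# On the other node
docker load -i /root/node-exporter.tar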
The node-exporter.yaml file:
[root@k8s-master ~]# cat node-exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:latest
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '^/(sys|proc|dev|host|etc)($|/)'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
# Set image: prom/node-exporter:latest to match the image you have; here the latest tag pulled above is used.
[root@k8s-master ~]# kubectl apply -f node-exporter.yaml
daemonset.apps/node-exporter created
Check the pods
[root@k8s-master ~]# kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-6949477b58-vslgj   1/1     Running   4          27h
calico-node-j2tzw                          1/1     Running   7          27h
calico-node-xp5tr                          1/1     Running   7          27h
coredns-7f89b7bc75-bdz9n                   1/1     Running   2          27h
coredns-7f89b7bc75-tt9rr                   1/1     Running   2          27h
etcd-k8s-master                            1/1     Running   2          27h
kube-apiserver-k8s-master                  1/1     Running   2          27h
kube-controller-manager-k8s-master         1/1     Running   4          27h
kube-proxy-2bh6r                           1/1     Running   2          27h
kube-proxy-7xsjb                           1/1     Running   2          27h
kube-scheduler-k8s-master                  1/1     Running   4          27h
node-exporter-b2dt2                        1/1     Running   0          43s
node-exporter-k8vtw                        1/1     Running   0          43s
kubectl get pods -n kube-system -o wide
[root@k8s-master ~]# kubectl get pods -n kube-system -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
calico-kube-controllers-6949477b58-vslgj   1/1     Running   4          27h   10.122.235.207   k8s-master   <none>           <none>
calico-node-j2tzw                          1/1     Running   7          27h   192.168.1.131    k8s-node     <none>           <none>
calico-node-xp5tr                          1/1     Running   7          27h   192.168.1.130    k8s-master   <none>           <none>
coredns-7f89b7bc75-bdz9n                   1/1     Running   2          27h   10.122.235.208   k8s-master   <none>           <none>
coredns-7f89b7bc75-tt9rr                   1/1     Running   2          27h   10.122.235.206   k8s-master   <none>           <none>
etcd-k8s-master                            1/1     Running   2          27h   192.168.1.130    k8s-master   <none>           <none>
kube-apiserver-k8s-master                  1/1     Running   2          27h   192.168.1.130    k8s-master   <none>           <none>
kube-controller-manager-k8s-master         1/1     Running   4          27h   192.168.1.130    k8s-master   <none>           <none>
kube-proxy-2bh6r                           1/1     Running   2          27h   192.168.1.130    k8s-master   <none>           <none>
kube-proxy-7xsjb                           1/1     Running   2          27h   192.168.1.131    k8s-node     <none>           <none>
kube-scheduler-k8s-master                  1/1     Running   4          27h   192.168.1.130    k8s-master   <none>           <none>
node-exporter-b2dt2                        1/1     Running   0          85s   192.168.1.130    k8s-master   <none>           <none>
node-exporter-k8vtw                        1/1     Running   0          85s   192.168.1.131    k8s-node     <none>           <none>
Check the exporter at <node IP>:9100
curl 192.168.1.131:9100/metrics
# or open http://192.168.1.130:9100/metrics in a browser
Since the output is large, you can redirect it to a file for inspection:
[root@k8s-master ~]# curl 192.168.1.131:9100/metrics > node-exporter.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 70951    0 70951    0     0   637k      0 --:--:-- --:--:-- --:--:--  653k
Or filter the output directly:
[root@k8s-master ~]# curl 192.168.1.130:9100/metrics | grep node_load
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 2.4
# HELP node_load15 15m load average.
# TYPE node_load15 gauge
node_load15 2.4
# HELP node_load5 5m load average.
# TYPE node_load5 gauge
node_load5 2.67
100 86253 0 86253 0 0 301k 0 --:--:-- --:--:-- --:--:-- 307k
Deploy the Prometheus component
[root@k8s-node ~]# docker pull prom/prometheus:v2.0.0
# pull the Prometheus image
The rbac-setup.yaml file:
[root@k8s-master prometheus]# cat rbac-setup.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
[root@k8s-master ~]# kubectl create -f rbac-setup.yaml
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
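As an optional sanity check (not part of the original walkthrough), you can impersonate the new service account to confirm the ClusterRoleBinding grants the permissions Prometheus needs:

# Both commands should print "yes"
kubectl auth can-i list nodes --as=system:serviceaccount:kube-system:prometheus
kubectl auth can-i list pods  --as=system:serviceaccount:kube-system:prometheus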
The Prometheus configuration file is managed as a ConfigMap:
[root@k8s-master ~]# kubectl create -f configmap.yaml
configmap/prometheus-config created
The configmap.yaml file:
[root@k8s-master prometheus]# cat configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:

    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

    - job_name: 'kubernetes-services'
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module: [http_2xx]
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name

    - job_name: 'kubernetes-ingresses'
      kubernetes_sd_configs:
      - role: ingress
      relabel_configs:
      - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
        regex: (.+);(.+);(.+)
        replacement: ${1}://${2}${3}
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_ingress_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_ingress_name]
        target_label: kubernetes_name

    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
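With this configuration, the kubernetes-service-endpoints and kubernetes-pods jobs only scrape objects that carry prometheus.io/* annotations. A minimal sketch of such a Service (the name, namespace and port here are made up purely for illustration):

# Hypothetical Service: the annotations are what the keep/replace rules above act on
apiVersion: v1
kind: Service
metadata:
  name: my-app                       # example name, not part of this deployment
  namespace: default
  annotations:
    prometheus.io/scrape: "true"     # matched by the keep rule
    prometheus.io/port: "8080"       # overrides the scrape port
    prometheus.io/path: "/metrics"   # overrides the metrics path
spec:
  ports:
  - port: 8080
  selector:
    app: my-app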
[root@k8s-master ~]# kubectl create -f prometheus.deploy.yml
deployment.apps/prometheus created
The prometheus.deploy.yml file:
[root@k8s-master prometheus]# cat prometheus.deploy.yml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: prometheus-deployment
  name: prometheus
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - image: prom/prometheus:v2.0.0
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=24h"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/prometheus"
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 2500Mi
      serviceAccountName: prometheus
      volumes:
      - name: data
        emptyDir: {}
      - name: config-volume
        configMap:
          name: prometheus-config
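Note that with this setup Prometheus reads prometheus.yml only at startup. If you later edit the ConfigMap, one simple way (an assumption of this write-up, not stated in the original) to pick up the change is to restart the Deployment so the pod re-reads the mounted config:

# After editing the prometheus-config ConfigMap
kubectl -n kube-system rollout restart deployment/prometheus
# (alternatively, delete the prometheus pod and let the Deployment recreate it)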
kubectl get pods -n kube-system
[root@k8s-master ~]# kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-6949477b58-vslgj   1/1     Running   4          28h
calico-node-j2tzw                          1/1     Running   12         28h
calico-node-xp5tr                          1/1     Running   9          28h
coredns-7f89b7bc75-bdz9n                   1/1     Running   2          28h
coredns-7f89b7bc75-tt9rr                   1/1     Running   2          28h
etcd-k8s-master                            1/1     Running   2          28h
kube-apiserver-k8s-master                  1/1     Running   2          28h
kube-controller-manager-k8s-master         1/1     Running   4          28h
kube-proxy-2bh6r                           1/1     Running   2          28h
kube-proxy-7xsjb                           1/1     Running   2          28h
kube-scheduler-k8s-master                  1/1     Running   4          28h
node-exporter-b2dt2                        1/1     Running   0          74m
node-exporter-k8vtw                        1/1     Running   0          74m
prometheus-68546b8d9-nvr2v                 1/1     Running   0          2m47s
[root@k8s-master ~]# kubectl create -f prometheus.svc.yml
service/prometheus created
The prometheus.svc.yml file:
[root@k8s-master prometheus]# cat prometheus.svc.yml
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - port: 9090
    targetPort: 9090
    nodePort: 30003
  selector:
    app: prometheus
kubectl get svc -n kube-system
[root@k8s-master ~]# kubectl get svc -n kube-system
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE
kube-dns     ClusterIP   10.10.0.10     <none>        53/UDP,53/TCP,9153/TCP   28h
prometheus   NodePort    10.10.10.193   <none>        9090:30003/TCP           68s
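Before opening the UI you can confirm that Prometheus has discovered its scrape targets through the HTTP API (a quick sanity check added here, not in the original; the node IP and NodePort come from the output above, and jq is optional and only prettifies the JSON):

curl -s http://192.168.1.130:30003/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'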
Access Prometheus at http://<node IP>:30003 (e.g. http://192.168.1.130:30003).
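In the Prometheus web UI, the node-exporter metrics collected earlier can be queried with PromQL. A few illustrative queries (standard node-exporter metric names, not taken from the original post):

# 1-minute load average per instance
node_load1

# CPU usage percentage per node, derived from the idle counter
100 - avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100

# Available memory in MiB
node_memory_MemAvailable_bytes / 1024 / 1024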
Deploy the Grafana component
[root@k8s-node ~]# docker pull grafana/grafana:4.2.0    # pull the Grafana image
[root@k8s-master ~]# kubectl create -f grafana-deploy.yaml
deployment.apps/grafana-core created
The grafana-deploy.yaml file:
[root@k8s-master ~]# cat grafana-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-core
  namespace: kube-system
  labels:
    app: grafana
    component: core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
        component: core
    spec:
      containers:
      - image: grafana/grafana:4.2.0
        name: grafana-core
        imagePullPolicy: IfNotPresent
        resources:
          # keep request = limit to keep this container in guaranteed class
          limits:
            cpu: 100m
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 500Mi
        env:
          # The following env variables set up basic auth with the default admin user and admin password.
          - name: GF_AUTH_BASIC_ENABLED
            value: "true"
          - name: GF_AUTH_ANONYMOUS_ENABLED
            value: "false"
          # - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          #   value: Admin
          # does not really work, because of template variables in exported dashboards:
          # - name: GF_DASHBOARDS_JSON_ENABLED
          #   value: "true"
        readinessProbe:
          httpGet:
            path: /login
            port: 3000
          # initialDelaySeconds: 30
          # timeoutSeconds: 1
        volumeMounts:
        - name: grafana-persistent-storage
          mountPath: /var
      volumes:
      - name: grafana-persistent-storage
        emptyDir: {}
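Note that grafana-persistent-storage is an emptyDir, so dashboards and users are lost whenever the pod is rescheduled. If that matters, one possible alternative (a sketch, not part of the original manifest; the claim name is hypothetical and the PVC must exist beforehand) is to mount Grafana's data directory from a PersistentVolumeClaim:

        volumeMounts:
        - name: grafana-persistent-storage
          mountPath: /var/lib/grafana        # Grafana's data directory
      volumes:
      - name: grafana-persistent-storage
        persistentVolumeClaim:
          claimName: grafana-data            # hypothetical PVC name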
kubectl create -f grafana-svc.yaml
[root@k8s-master ~]# kubectl create -f grafana-svc.yaml
service/grafana created
The grafana-service.yaml file:
[root@k8s-master ~]# cat grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
    component: core
spec:
  type: NodePort
  ports:
  - port: 3000
  selector:
    app: grafana
    component: core
Grafana Ingress configuration file (optional); its contents are not shown here, but a minimal sketch follows below:
kubectl create -f grafana-ing.yaml
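A minimal sketch of what grafana-ing.yaml could look like (this is an assumption, since the original post does not show the file; the hostname is a placeholder and the networking.k8s.io/v1 API assumes Kubernetes 1.19+):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: kube-system
spec:
  rules:
  - host: grafana.example.com        # placeholder hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana            # the Service created above
            port:
              number: 3000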
[root@k8s-master ~]# kubectl get svc -n kube-system
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE
grafana      NodePort    10.10.40.199   <none>        3000:32374/TCP           5m56s
kube-dns     ClusterIP   10.10.0.10     <none>        53/UDP,53/TCP,9153/TCP   29h
prometheus   NodePort    10.10.10.193   <none>        9090:30003/TCP           16m
kubectl get all -n kube-system
[root@k8s-master ~]# kubectl get all -n kube-system
NAME                                           READY   STATUS             RESTARTS   AGE
pod/calico-kube-controllers-6949477b58-vslgj   1/1     Running            4          29h
pod/calico-node-j2tzw                          0/1     CrashLoopBackOff   30         29h
pod/calico-node-xp5tr                          0/1     Running            15         29h
pod/coredns-7f89b7bc75-bdz9n                   1/1     Running            2          30h
pod/coredns-7f89b7bc75-tt9rr                   1/1     Running            2          30h
pod/etcd-k8s-master                            1/1     Running            2          30h
pod/grafana-core-6895c7468b-wwn42              1/1     Running            0          73m
pod/kube-apiserver-k8s-master                  1/1     Running            2          30h
pod/kube-controller-manager-k8s-master         1/1     Running            4          30h
pod/kube-proxy-2bh6r                           1/1     Running            2          30h
pod/kube-proxy-7xsjb                           1/1     Running            2          29h
pod/kube-scheduler-k8s-master                  1/1     Running            4          30h
pod/node-exporter-b2dt2                        1/1     Running            0          158m
pod/node-exporter-k8vtw                        1/1     Running            0          158m
pod/prometheus-68546b8d9-nvr2v                 1/1     Running            0          86m

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE
service/grafana      NodePort    10.10.40.199   <none>        3000:32374/TCP           71m
service/kube-dns     ClusterIP   10.10.0.10     <none>        53/UDP,53/TCP,9153/TCP   30h
service/prometheus   NodePort    10.10.10.193   <none>        9090:30003/TCP           82m

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/calico-node     2         2         0       2            0           kubernetes.io/os=linux   29h
daemonset.apps/kube-proxy      2         2         2       2            2           kubernetes.io/os=linux   30h
daemonset.apps/node-exporter   2         2         2       2            2           <none>                   158m

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/calico-kube-controllers   1/1     1            1           29h
deployment.apps/coredns                   2/2     2            2           30h
deployment.apps/grafana-core              1/1     1            1           73m
deployment.apps/prometheus                1/1     1            1           86m

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/calico-kube-controllers-6949477b58   1         1         1       29h
replicaset.apps/calico-kube-controllers-6dc8c99cbc   0         0         0       29h
replicaset.apps/coredns-7f89b7bc75                   2         2         2       30h
replicaset.apps/grafana-core-6895c7468b              1         1         1       73m
replicaset.apps/prometheus-68546b8d9                 1         1         1       86m
Access Grafana at http://<node IP>:32374; the default username and password are both admin.
Add the data source
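When adding the Prometheus data source in Grafana, the in-cluster Service address can be used (a sketch; the exact form fields depend on the Grafana version, and the URL simply points at the prometheus Service created earlier in kube-system):

Name:   prometheus
Type:   Prometheus
URL:    http://prometheus.kube-system.svc:9090   (or http://prometheus:9090, since Grafana runs in the same namespace)
Access: proxy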
Import a dashboard
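Ready-made node-exporter dashboards can be imported from grafana.com by ID; for example, dashboard 315 ("Kubernetes cluster monitoring (via Prometheus)") is a common choice for this node-exporter + cAdvisor setup (the ID refers to grafana.com and is not from the original post):

Dashboards -> Import
  Grafana.com Dashboard ID: 315
  Data source: the Prometheus data source added above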
View the node monitoring dashboards.