Cloud-Native Monitoring with Prometheus: Exporter (Linux Host Monitoring)
Exporter (Linux Host Monitoring)
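Linux does not natively expose its metrics in a format Prometheus can scrape, so the Prometheus project provides Node exporter, written in Go, to collect monitoring data from Linux hosts. It exposes nearly all standard system metrics, such as CPU, memory, disk space, disk I/O, system load, and network bandwidth, plus a large number of additional metrics published by the kernel, ranging from load averages to motherboard temperatures.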
Before installing, find the latest Node exporter release on the official download page, https://github.com/prometheus/node_exporter/releases, and download the binary for your platform.
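If you do take the binary route, the file to download depends on the version, OS and CPU architecture. The sketch below assumes the asset naming used on the releases page for recent versions, `node_exporter-<version>.<os>-<arch>.tar.gz` under `releases/download/v<version>/`; the helper `node_exporter_asset_url` and its architecture table are hypothetical, not official tooling.

```python
import platform

# Assumed mapping from platform.machine() values to the Go-style
# architecture names used in the release asset file names.
ARCH_MAP = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
    "i386": "386",
    "i686": "386",
}


def node_exporter_asset_url(version: str, os_name: str = "linux", machine: str = "") -> str:
    """Return the (assumed) download URL of a Node exporter release tarball.

    version: release number without the leading "v", e.g. "1.3.1".
    os_name: Go-style OS name, e.g. "linux".
    machine: raw machine type; defaults to this host's platform.machine().
    """
    machine = machine or platform.machine()
    arch = ARCH_MAP.get(machine.lower())
    if arch is None:
        raise ValueError(f"unsupported machine type: {machine!r}")
    filename = f"node_exporter-{version}.{os_name}-{arch}.tar.gz"
    return (
        "https://github.com/prometheus/node_exporter/releases/download/"
        f"v{version}/{filename}"
    )


if __name__ == "__main__":
    print(node_exporter_asset_url("1.3.1", machine="x86_64"))
```

Always compare the result with the asset list on the releases page before relying on it.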
I. Deploying Node exporter
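My environment runs entirely on Kubernetes, so I will skip the binary deployment. Below is the YAML manifest for running Node exporter as a DaemonSet (Node exporter version: node-exporter:v1.3.1). In this manifest the node-exporter container listens only on 127.0.0.1:9100 in the host network namespace (hostNetwork: true), and the kube-rbac-proxy sidecar publishes it on port 9100 of each node's IP over HTTPS, forwarding only authorized requests to the local exporter: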
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: node-exporter
  namespace: kubesphere-monitoring-system
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: node-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 1.3.1
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: node-exporter
      app.kubernetes.io/part-of: kube-prometheus
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: node-exporter
        app.kubernetes.io/part-of: kube-prometheus
        app.kubernetes.io/version: 1.3.1
    spec:
      volumes:
        - name: proc
          hostPath:
            path: /proc
            type: ''
        - name: sys
          hostPath:
            path: /sys
            type: ''
        - name: root
          hostPath:
            path: /
            type: ''
      containers:
        - name: node-exporter
          image: 'registry.cn-beijing.aliyuncs.com/kubesphereio/node-exporter:v1.3.1'
          args:
            - '--web.listen-address=127.0.0.1:9100'
            - '--path.procfs=/host/proc'
            - '--path.sysfs=/host/sys'
            - '--path.rootfs=/host/root'
            - '--no-collector.wifi'
            - '--no-collector.hwmon'
            - >-
              --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
            - >-
              --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
          resources:
            limits:
              cpu: '1'
              memory: 500Mi
            requests:
              cpu: 102m
              memory: 180Mi
          volumeMounts:
            - name: proc
              readOnly: true
              mountPath: /host/proc
            - name: sys
              readOnly: true
              mountPath: /host/sys
            - name: root
              readOnly: true
              mountPath: /host/root
              mountPropagation: HostToContainer
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
        - name: kube-rbac-proxy
          image: >-
            registry.cn-beijing.aliyuncs.com/kubesphereio/kube-rbac-proxy:v0.11.0
          args:
            - '--logtostderr'
            - '--secure-listen-address=[$(IP)]:9100'
            - >-
              --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
            - '--upstream=http://127.0.0.1:9100/'
          ports:
            - name: https
              hostPort: 9100
              containerPort: 9100
              protocol: TCP
          env:
            - name: IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
          resources:
            limits:
              cpu: '1'
              memory: 100Mi
            requests:
              cpu: 10m
              memory: 20Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            runAsUser: 65532
            runAsGroup: 65532
            runAsNonRoot: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: node-exporter
      serviceAccount: node-exporter
      hostNetwork: true
      hostPID: true
      securityContext:
        runAsUser: 65534
        runAsNonRoot: true
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/edge
                    operator: DoesNotExist
      schedulerName: default-scheduler
      tolerations:
        - operator: Exists
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0
  revisionHistoryLimit: 10
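The two --collector.filesystem.ignored-* arguments in the manifest are regular expressions: a mount is left out of the filesystem metrics if its mount point matches the first pattern or its filesystem type matches the second. A small sketch to check which mounts those patterns would exclude; `is_reported` is a hypothetical helper, and it assumes the patterns behave the same in Python's `re` as in Go's `regexp`, which holds for the constructs used here (anchors, alternation, `.+`).

```python
import re

# Patterns copied from the DaemonSet args. Both are anchored with ^, so
# Go's MatchString (a search) and Python's re.search agree on them.
IGNORED_MOUNT_POINTS = re.compile(r"^/(dev|proc|sys|var/lib/docker/.+)($|/)")
IGNORED_FS_TYPES = re.compile(
    r"^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|"
    r"hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|"
    r"sysfs|tracefs)$"
)


def is_reported(mount_point: str, fs_type: str) -> bool:
    """Return True if a mount passes both exclusion filters."""
    if IGNORED_MOUNT_POINTS.search(mount_point):
        return False
    if IGNORED_FS_TYPES.search(fs_type):
        return False
    return True


if __name__ == "__main__":
    samples = [
        ("/", "ext4"),
        ("/dev/shm", "tmpfs"),
        ("/devices", "ext4"),
        ("/var/lib/docker/overlay2/abc/merged", "overlay"),
        ("/run", "tmpfs"),
    ]
    for mount_point, fs_type in samples:
        print(f"{mount_point:40} {fs_type:8} reported={is_reported(mount_point, fs_type)}")
```

Two details are easy to miss: /var/lib/docker itself is not excluded, because the pattern needs at least one character after docker/, and tmpfs mounts are kept, because only devtmpfs appears in the anchored type list.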
II. Integrating with Prometheus
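After Node exporter and Prometheus have started, they are still not connected: Prometheus only scrapes the targets listed in its configuration file, so until that file is updated the two run as independent applications.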
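The next step is to add the deployed node_exporter to the Prometheus server. On the Prometheus host, locate the main configuration file and use its static_configs mechanism to scrape the data that node_exporter exposes. The default file looks like this: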
~ # cat /etc/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]
~ #
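Starting from the default file, edit /etc/prometheus/prometheus.yml and add a job that links Prometheus to node_exporter. A reference configuration follows. Note that the new job scrapes its target over plain HTTP; if 192.168.2.121 runs the DaemonSet above, port 9100 there is served by kube-rbac-proxy over HTTPS and answers only authorized clients, so the job would also need scheme: https plus suitable TLS and authorization settings.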
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node_exporter"
    static_configs:
      - targets: ["192.168.2.121:9100"]
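Prometheus uses a pull model: for the node_exporter job it sends an HTTP GET to http://192.168.2.121:9100/metrics every scrape_interval and attaches the target labels `job` and `instance` to every sample it stores. The sketch below imitates only that labeling step on a hand-written, made-up excerpt of the text exposition format. The parser is deliberately simplified (no escaped characters in label values, no timestamps), and the `exported_` renaming mirrors Prometheus's default `honor_labels: false` conflict handling; all function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]

# Simplified patterns: metric name, optional {label="value",...}, value.
# Escaped quotes inside label values and trailing timestamps are not handled.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

# Made-up excerpt in the text exposition format.
EXAMPLE_BODY = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
"""


def parse_exposition(body: str) -> List[Sample]:
    """Turn exposition text into (metric name, labels, value) tuples."""
    samples: List[Sample] = []
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def attach_target_labels(samples: List[Sample], job: str, instance: str) -> List[Sample]:
    """Add job/instance target labels as Prometheus does by default.

    With honor_labels: false, a scraped label that collides with a target
    label is kept as exported_<name> and the target value wins.
    """
    labeled: List[Sample] = []
    for name, labels, value in samples:
        merged = dict(labels)
        for key, target_value in (("job", job), ("instance", instance)):
            if key in merged:
                merged["exported_" + key] = merged[key]
            merged[key] = target_value
        labeled.append((name, merged, value))
    return labeled


if __name__ == "__main__":
    for sample in attach_target_labels(
        parse_exposition(EXAMPLE_BODY), "node_exporter", "192.168.2.121:9100"
    ):
        print(sample)
```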
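After editing the configuration, restart Prometheus or hot-reload it so that the modified configuration takes effect. A hot reload is triggered by sending SIGHUP to the Prometheus process or, if Prometheus was started with --web.enable-lifecycle, by sending an HTTP POST request to /-/reload.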
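The HTTP variant of the hot reload is a single POST request. A minimal sketch using only the standard library, assuming Prometheus listens on localhost:9090 and was started with --web.enable-lifecycle (without that flag the endpoint refuses the request); the function names and the default address are illustrative assumptions.

```python
import urllib.request

DEFAULT_BASE_URL = "http://localhost:9090"  # assumption: adjust to your Prometheus address


def build_reload_request(base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build a POST request for Prometheus' /-/reload lifecycle endpoint."""
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", data=b"", method="POST")


def reload_prometheus(base_url: str = DEFAULT_BASE_URL, timeout: float = 5.0) -> int:
    """Ask Prometheus to reload its configuration and return the HTTP status.

    urlopen raises urllib.error.HTTPError for error responses (for example
    when the lifecycle API is disabled) and URLError if the server is unreachable.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as response:
        return response.status
```

Usage: call reload_prometheus("http://<prometheus-host>:9090") after saving prometheus.yml. Separating request construction from sending keeps the URL logic testable without a running server.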
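- 1. First, on the Prometheus UI home page, open "Status" and choose "Targets".
- 2. On the Targets page, the node_exporter job just configured is listed with the state "UP". This means Prometheus's most recent scrape of Node exporter succeeded, i.e. the exporter on the monitored host was reachable and returned metrics.
- 3. You can also search for these metrics in the query box on the Graph page of the Prometheus UI (for example, up{job="node_exporter"} returns 1 while the target is UP); I will not go into detail here.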
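- 4. Viewing the metrics:
  - CPU data collection: the data comes from /proc/stat.
  - Memory data collection: the data comes from /proc/meminfo.
  - Disk data collection: the data comes from /proc/diskstats.
  - Filesystem data collection.
  - Network data collection: the data comes from /proc/net/dev.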
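The target state shown in steps 1 and 2 is also available from Prometheus's HTTP API at GET /api/v1/targets, so the UP check can be scripted. The sketch parses a trimmed, made-up response body (real responses carry more fields, such as scrapeUrl and lastScrape) and returns the health of each target of one job; `target_health` is a hypothetical helper.

```python
import json
from typing import List, Tuple

# Trimmed, made-up response body of GET /api/v1/targets.
EXAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"labels": {"instance": "localhost:9090", "job": "prometheus"},
       "health": "up", "lastError": ""},
      {"labels": {"instance": "192.168.2.121:9100", "job": "node_exporter"},
       "health": "up", "lastError": ""}
    ],
    "droppedTargets": []
  }
}
"""


def target_health(response_body: str, job: str) -> List[Tuple[str, str, str]]:
    """Return (instance, health, lastError) for each active target of `job`.

    health is "up", "down" or "unknown", the same state the Targets page shows.
    """
    body = json.loads(response_body)
    if body.get("status") != "success":
        raise RuntimeError(f"API error: {body.get('error')}")
    results = []
    for target in body["data"]["activeTargets"]:
        labels = target.get("labels", {})
        if labels.get("job") == job:
            results.append(
                (labels.get("instance", ""), target.get("health", "unknown"), target.get("lastError", ""))
            )
    return results


if __name__ == "__main__":
    print(target_health(EXAMPLE_RESPONSE, "node_exporter"))
```

Against a live server, the body can be fetched with urllib.request.urlopen("http://<prometheus-host>:9090/api/v1/targets").read().decode().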
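For the CPU item: node_exporter's cpu collector reads cumulative per-CPU time counters from /proc/stat and exposes them as node_cpu_seconds_total{cpu, mode}. Because these are counters, utilization must be computed from the difference between two readings, which is what PromQL's rate() does. The sketch below does the same by hand on two made-up "cpu" lines. It uses the first eight columns (user, nice, system, idle, iowait, irq, softirq, steal), skips the guest columns that the kernel already counts in user and nice, and treats only idle as not busy, in the spirit of the common expression 1 - rate(node_cpu_seconds_total{mode="idle"}[5m]). The function names are hypothetical.

```python
from typing import Dict

# First eight value columns of a /proc/stat "cpu" line, in kernel order.
MODES = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse one 'cpu' or 'cpuN' line of /proc/stat into ticks per mode."""
    fields = line.split()
    if not fields or not fields[0].startswith("cpu"):
        raise ValueError(f"not a cpu line: {line!r}")
    return dict(zip(MODES, (int(v) for v in fields[1:1 + len(MODES)])))


def busy_ratio(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Fraction of elapsed CPU time not spent idle between two readings."""
    deltas = {mode: after[mode] - before[mode] for mode in MODES}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0
    return (total - deltas["idle"]) / total


# Two made-up readings of the aggregate "cpu" line, taken some seconds apart.
BEFORE = "cpu  1000 0 500 8000 100 0 0 0 0 0"
AFTER = "cpu  1200 0 600 8700 100 0 0 0 0 0"

if __name__ == "__main__":
    print(busy_ratio(parse_cpu_line(BEFORE), parse_cpu_line(AFTER)))
```

The ratio is unit-free, so it does not matter that /proc/stat counts clock ticks while node_exporter converts them to seconds.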
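For the memory item: /proc/meminfo holds one "Key: value [kB]" entry per line, which node_exporter turns into node_memory_<Key>_bytes metrics. The sketch approximates that conversion as we understand it: values with a kB suffix are multiplied by 1024, and a key such as Active(anon) becomes Active_anon. The sample content is made up; `memory_used_ratio` computes the common 1 - MemAvailable / MemTotal utilization figure.

```python
import re
from typing import Dict

# Made-up excerpt of /proc/meminfo.
EXAMPLE_MEMINFO = """\
MemTotal:        8048532 kB
MemFree:          512000 kB
MemAvailable:    4024266 kB
Active(anon):    1000000 kB
HugePages_Total:       0
"""


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo into {key: bytes}, renaming Active(anon) to Active_anon."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024  # the file reports kibibytes; convert to bytes
        values[re.sub(r"\((.*)\)", r"_\1", key.strip())] = value
    return values


def memory_used_ratio(meminfo: Dict[str, int]) -> float:
    """Share of memory not available to new workloads: 1 - MemAvailable / MemTotal."""
    return 1 - meminfo["MemAvailable"] / meminfo["MemTotal"]


if __name__ == "__main__":
    info = parse_meminfo(EXAMPLE_MEMINFO)
    print(info["MemTotal"], memory_used_ratio(info))
```

MemAvailable is preferred over MemFree here because it includes reclaimable page cache, so it better reflects how much memory new processes can actually get.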
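For the disk item: each /proc/diskstats line holds the major and minor device numbers, the device name, and a series of counters described in the kernel's iostats documentation. The sketch extracts the fields behind node_exporter's node_disk_reads_completed_total, node_disk_read_bytes_total, node_disk_writes_completed_total, node_disk_written_bytes_total and node_disk_io_time_seconds_total. Sector counts in this file are always 512-byte units. The sample line is made up, and unlike node_exporter the sketch applies no device-exclusion filter.

```python
from typing import Dict

SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors

# Made-up line in the classic 14-column format (newer kernels append
# discard and flush counters, which this sketch ignores).
EXAMPLE_DISKSTATS = "   8       0 sda 12000 300 480000 5000 8000 200 320000 7000 0 9000 12000\n"


def parse_diskstats(text: str) -> Dict[str, Dict[str, float]]:
    """Map device name to a few counters, named after node_exporter's metrics."""
    devices: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # not a complete diskstats line
        name, counters = parts[2], [int(v) for v in parts[3:]]
        devices[name] = {
            "reads_completed_total": counters[0],
            "read_bytes_total": counters[2] * SECTOR_BYTES,
            "writes_completed_total": counters[4],
            "written_bytes_total": counters[6] * SECTOR_BYTES,
            "io_time_seconds_total": counters[9] / 1000,  # kernel reports milliseconds
        }
    return devices


if __name__ == "__main__":
    print(parse_diskstats(EXAMPLE_DISKSTATS))
```

As with CPU time, these are cumulative counters: throughput and utilization come from the difference between two readings, e.g. rate(node_disk_read_bytes_total[5m]) in PromQL.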
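For the network item: in the node_exporter version used here, the netdev collector reads per-interface counters from /proc/net/dev (to our understanding) and exposes them as node_network_receive_bytes_total, node_network_transmit_bytes_total and related metrics. The file starts with two header lines, followed by one line per interface holding 8 receive counters and then 8 transmit counters. The sample content below is made up, and `parse_net_dev` is a hypothetical helper.

```python
from typing import Dict

# Made-up /proc/net/dev content: two header lines, then one line per interface
# with 8 receive counters followed by 8 transmit counters.
EXAMPLE_NET_DEV = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:  123456    1000    0    0    0     0          0         0   123456    1000    0    0    0     0       0          0
  eth0: 9876543   20000    0    0    0     0          0         0  1234567   15000    0    0    0     0       0          0
"""


def parse_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Return per-interface byte and packet counters from /proc/net/dev."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        iface, sep, data = line.partition(":")
        if not sep:
            continue
        fields = [int(v) for v in data.split()]
        stats[iface.strip()] = {
            "receive_bytes_total": fields[0],
            "receive_packets_total": fields[1],
            "transmit_bytes_total": fields[8],
            "transmit_packets_total": fields[9],
        }
    return stats


if __name__ == "__main__":
    for iface, counters in parse_net_dev(EXAMPLE_NET_DEV).items():
        print(iface, counters)
```

Because the DaemonSet runs with hostNetwork: true and mounts the host's /proc, the counters node_exporter reports are those of the node's own interfaces rather than a pod network namespace.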