Lao Yang on Monitoring: Deploying Prometheus on k8s

1. Building a Full-Coverage Monitoring Platform with Prometheus

1.1 Preface

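Official site: https://prometheus.io/docs/prometheus/latest/getting_started/

A flexible time series database;
Supports a wide variety of custom monitoring rules;
An active community of developers and users;
An independent open source project that does not depend on any single company;
The second project to join the CNCF, after Kubernetes;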

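1.2 Prometheus architecture

Prometheus works in five main steps:

1. Data collection (Exporters): Prometheus periodically pulls data from its targets over HTTP. A target can be an application, a system, a service, or any other resource that exposes metrics.
2. Data storage (Storage): Prometheus stores the collected data in its local storage engine as time series. Each time series is identified by a metric name and a set of key-value pairs (labels).
3. Data aggregation (PromQL): Prometheus aggregates data through query expressions. PromQL is Prometheus's query language; it lets users retrieve specific information about metrics from the storage engine.
4. Alert handling (Alertmanager): Prometheus evaluates user-defined rules against the data. When a metric crosses the configured threshold, Prometheus sends an alert to Alertmanager. Alertmanager groups, silences, and routes alerts, and delivers them to receivers such as email, WeCom, or DingTalk.
5. Dashboards (Grafana): Grafana presents Prometheus data visually, including dashboards, charts, logs, and alerts.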
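The collect, store, and query steps above can be exercised from a terminal through Prometheus's HTTP API. This is a minimal sketch, assuming Prometheus is reachable at the address in PROM_URL; the localhost default and the helper name prom_query are illustrative and not part of the original setup.

```shell
#!/usr/bin/env bash
# Assumption: Prometheus is reachable at this address (for example via a port-forward).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant PromQL query through the /api/v1/query endpoint.
# curl -G sends the URL-encoded expression as a GET query parameter.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Example calls (commented out because they need a running server):
# prom_query 'up'
# prom_query 'sum by (job) (up)'
```

The response is JSON; the up metric is 1 for every target whose last scrape succeeded and 0 otherwise.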

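1.3 Prometheus time series data

1.3.1 What is time series data?

Time series data: data that records changes in the state of systems and devices in chronological order.

1.3.2 Characteristics of time series data

Good performance: relational databases perform poorly on large-scale data. NoSQL stores handle large-scale data better, but still fall short of a purpose-built time series database.
Low storage cost: efficient compression algorithms save storage space and reduce I/O.

Official figures: Prometheus stores time series data very efficiently. Each sample takes only about 3.5 bytes; with over a million time series, a 30-second interval, and 60 days of retention, the total comes to a little over 200 GB.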
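The quoted figures can be cross-checked with a rough sizing formula: samples = series × retention seconds / scrape interval, and bytes = samples × bytes per sample. The sketch below is a back-of-the-envelope estimate only; the function name is illustrative, and the 1.2 bytes per sample value is an assumed compression ratio that does not come from the source.

```shell
#!/usr/bin/env bash
# Estimate TSDB disk usage in bytes.
# bytes_per_sample is passed in tenths (35 means 3.5) to stay in integer math.
estimate_bytes() {
  local series=$1 interval_s=$2 retention_days=$3 bytes_x10=$4
  local samples=$(( series * retention_days * 86400 / interval_s ))
  echo $(( samples * bytes_x10 / 10 ))
}

# One million series, 30 s interval, 60 days of retention.
echo "3.5 B/sample: $(( $(estimate_bytes 1000000 30 60 35) / 1000000000 )) GB"  # 604 GB
echo "1.2 B/sample: $(( $(estimate_bytes 1000000 30 60 12) / 1000000000 )) GB"  # 207 GB
```

At 3.5 bytes per sample this workload comes to roughly 600 GB, so a total of about 200 GB implies closer to 1.2 bytes per sample. That is within the 1 to 2 bytes per sample that the Prometheus storage documentation gives as typical.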

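1.3.3 Where Prometheus fits

Prometheus is well suited to recording any purely numeric time series. It fits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures.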

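2. Deployment and Configuration

The monitoring system involves a fairly large technology stack that covers nearly every scenario found in real enterprise environments. The main components are:

Prometheus: the main monitoring service
node-exporter: data collector (host-level metrics)
kube-state-metrics: data collector (state of Kubernetes objects)
metrics-server: data collector (container and node resource usage)
Consul: automatic service discovery
blackbox: black-box probing
Alertmanager: alerting service
Grafana: data visualization service
prometheusAlert: alert message forwarding service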

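2.1 Deploying Prometheus

To deploy a Prometheus instance that is reachable from outside the cluster:

First, create the namespace for Prometheus;
Then create the RBAC rules that Prometheus uses;
Create a ConfigMap to hold the Prometheus configuration file;
Create a Service to expose Prometheus;
Create a Deployment to run the Prometheus containers;
Finally, create an Ingress so that Prometheus can be reached through an external domain name.

2.1.1 Create the namespace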

 kubectl create namespace monitor

2.1.2 Create the RBAC rules

The RBAC rules consist of three resources, ServiceAccount, ClusterRole, and ClusterRoleBinding, defined in the following YAML.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes","nodes/proxy","services","endpoints","pods"]
  verbs: ["get", "list", "watch"] 
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef: 
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitor

Verify:

kubectl get sa prometheus -n monitor
kubectl get clusterrole prometheus
kubectl get clusterrolebinding prometheus
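Listing the objects shows that they exist, but not what the ServiceAccount is actually allowed to do. One way to check effective permissions is kubectl auth can-i with impersonation. This is a sketch; the helper name sa_can_i is illustrative, and it assumes your own user is allowed to impersonate ServiceAccounts.

```shell
#!/usr/bin/env bash
# Ask the API server whether the prometheus ServiceAccount may perform
# a verb on a resource, by impersonating it with --as.
sa_can_i() {
  kubectl auth can-i "$1" "$2" \
    --as="system:serviceaccount:monitor:prometheus"
}

# Example calls (commented out because they need cluster access):
# sa_can_i list pods
# sa_can_i delete pods   # "no" when only the read-only prometheus ClusterRole is bound
```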

2.1.3 Create the Prometheus configuration file as a ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval:     15s
      evaluation_interval: 15s
      external_labels:
        cluster: "kubernetes"
        
    ############ scrape jobs ###################
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets: ['127.0.0.1:9090']
        labels:
          instance: prometheus
    ############ location of the alerting rule files ###################
    rule_files:
    - /etc/prometheus/rules/*.rules

Verify:

kubectl get cm prometheus-config -n monitor
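Besides checking that the ConfigMap exists, its content can be validated before Prometheus ever loads it. The sketch below assumes promtool (shipped with the Prometheus release tarball) is installed locally; the helper name and the default output path are illustrative.

```shell
#!/usr/bin/env bash
# Dump prometheus.yml from the ConfigMap and validate it with promtool.
# The dot in the key name must be escaped in the jsonpath expression.
check_prom_config() {
  local out="${1:-/tmp/prometheus.yml}"
  kubectl get cm prometheus-config -n monitor \
    -o jsonpath='{.data.prometheus\.yml}' > "$out" &&
    promtool check config "$out"
}

# Example call (commented out because it needs cluster access):
# check_prom_config
```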

2.1.4 Create the Prometheus rules files as a ConfigMap

Create the Prometheus rules files using a ConfigMap.

The ConfigMap holds two rule files, general.rules and node.rules. The following manifest creates both:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: monitor
data:
  general.rules: |
    groups:
    - name: general.rules
      rules:
      - alert: InstanceDown
        expr: |
          up{job=~"k8s-nodes|prometheus"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} (hostname: {{ $labels.hostname }}) has been down for more than 1 minute."
  node.rules: |
    groups:
    - name: node.rules
      rules:
      - alert: NodeFilesystemUsage
        expr: |
          100 - (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }}: high usage on partition {{ $labels.mountpoint }}"
          description: "{{ $labels.instance }} (hostname: {{ $labels.hostname }}): partition {{ $labels.mountpoint }} usage is above 85% (current value: {{ $value }})"

Verify:

kubectl get cm -n monitor prometheus-rules
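The rule files themselves can be linted in the same spirit. This is a sketch under the same assumption that promtool is available locally; the helper name check_rules and the default directory are illustrative.

```shell
#!/usr/bin/env bash
# Extract both rule files from the prometheus-rules ConfigMap and lint them.
# The dot in each key is escaped so jsonpath treats it as part of the key name.
check_rules() {
  local dir="${1:-/tmp}"
  kubectl get cm prometheus-rules -n monitor \
    -o jsonpath='{.data.general\.rules}' > "$dir/general.rules" &&
  kubectl get cm prometheus-rules -n monitor \
    -o jsonpath='{.data.node\.rules}' > "$dir/node.rules" &&
  promtool check rules "$dir/general.rules" "$dir/node.rules"
}

# Example call (commented out because it needs cluster access):
# check_rules
```

promtool check rules reports syntax errors in the YAML and in the PromQL expressions, which is cheaper to catch here than after a failed reload.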

2.1.5 Create the Prometheus Service

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor
  labels:
    k8s-app: prometheus
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 9090
    targetPort: 9090
  selector:
    k8s-app: prometheus

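2.1.6 Create the Prometheus Deployment

Prometheus needs to persist its data so that historical data survives a restart. Here we use the NFS storage deployed in an earlier lesson for persistence.

The PersistentVolumeClaim below requests storage from the StorageClass backed by NFS.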

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data-pvc
  namespace: monitor
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: "nfs-storage"
  resources:
    requests:
      storage: 10Gi

Prometheus Deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    k8s-app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - name: prometheus
        image: prom/prometheus:v2.36.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 9090
        securityContext:
          runAsUser: 65534
          privileged: true
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--web.enable-lifecycle"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention.time=10d"
        - "--web.console.libraries=/etc/prometheus/console_libraries"
        - "--web.console.templates=/etc/prometheus/consoles"
        resources:
          limits:
            cpu: 2000m
            memory: 2048Mi
          requests:
            cpu: 1000m
            memory: 512Mi
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 5
          timeoutSeconds: 10
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        volumeMounts:
        - name: data
          mountPath: /prometheus
          subPath: prometheus
        - name: config
          mountPath: /etc/prometheus
        - name: prometheus-rules
          mountPath: /etc/prometheus/rules
      - name: configmap-reload
        image: jimmidyson/configmap-reload:v0.5.0
        imagePullPolicy: IfNotPresent
        args:
        - "--volume-dir=/etc/config"
        - "--webhook-url=http://localhost:9090/-/reload"
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 10Mi
        volumeMounts:
        - name: config
          mountPath: /etc/config
          readOnly: true
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: prometheus-data-pvc
      - name: prometheus-rules
        configMap:
          name: prometheus-rules
      - name: config
        configMap:
          name: prometheus-config

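The containers section of the Deployment manifest defines two containers:

prometheus: the main container, which runs the Prometheus process.
configmap-reload: watches the mounted ConfigMap files for changes and, when the content changes, sends a request to the configured webhook URL. Because Prometheus can reload its configuration through an HTTP endpoint, this container calls Prometheus's /-/reload endpoint whenever the Prometheus ConfigMap changes, so the new configuration takes effect without a restart. In this manifest it watches only the prometheus-config volume mounted at /etc/config.

Prometheus flags used in the manifest above:

--web.enable-lifecycle: enables the HTTP lifecycle endpoints, including the /-/reload endpoint used to reload the configuration
--config.file: path to the Prometheus configuration file, inside the container
--storage.tsdb.path: path to the Prometheus data directory, inside the container
--storage.tsdb.retention.time: how long data is kept before old data is deleted; defaults to 15d
--web.console.libraries: path to the directory of console libraries
--web.console.templates: path to the directory of console templates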
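The reload that configmap-reload performs can also be triggered by hand, which is useful when testing a configuration change. A minimal sketch, assuming Prometheus is reachable at the given base URL; the helper name and the localhost default are illustrative.

```shell
#!/usr/bin/env bash
# Ask Prometheus to reload its configuration. This works only when
# Prometheus was started with --web.enable-lifecycle.
# -f makes curl exit non-zero on an HTTP error, so failures are visible.
prom_reload() {
  local base="${1:-http://localhost:9090}"
  curl -fsS -X POST "${base}/-/reload"
}

# Example call (commented out because it needs a running server):
# prom_reload && echo "reload accepted"
```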

Verify:

 kubectl get deploy  -n monitor
 kubectl get pods -n monitor
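Rather than polling the pod list by hand, the rollout can be waited on. This is a sketch; the helper name and the 120s default timeout are illustrative choices.

```shell
#!/usr/bin/env bash
# Wait for the Deployment to finish rolling out, then list the pods
# selected by the k8s-app=prometheus label.
wait_for_prometheus() {
  kubectl rollout status deployment/prometheus -n monitor --timeout="${1:-120s}" &&
    kubectl get pods -n monitor -l k8s-app=prometheus
}

# Example call (commented out because it needs cluster access):
# wait_for_prometheus 60s
```

Once the pod is ready, kubectl port-forward svc/prometheus 9090:9090 -n monitor exposes the UI on localhost for a quick check before the Ingress exists.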

2.1.7 Create the Prometheus Ingress for access through an external domain name

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: monitor
  name: prometheus-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: prometheus.kubernets.cn
    http:
      paths:
        - pathType: Prefix
          backend:
            service:
              name: prometheus
              port:
                number: 9090
          path: /

Access check:

# curl prometheus.kubernets.cn
<a href="/graph">Found</a>.
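If DNS for the domain is not set up yet, the Ingress rule can still be tested by sending the Host header directly to the ingress controller. A sketch, assuming the controller's address is known; the helper name and the example IP are hypothetical.

```shell
#!/usr/bin/env bash
# Send a request to the ingress controller with the Host header that
# the Ingress rule matches, bypassing DNS for the domain.
check_ingress() {
  local ingress_ip="$1"   # address of the ingress controller (assumed known)
  curl -s -H "Host: prometheus.kubernets.cn" "http://${ingress_ip}/"
}

# Example call (commented out; 10.0.0.1 is a placeholder address):
# check_ingress 10.0.0.1
```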

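3. A First Look at the Prometheus Monitoring Platform

The Prometheus web UI:

Graph: runs PromQL expressions and plots the results. You can choose the time range, metrics, and labels, and add multiple panels to compare them.
Alerts: shows the alerting rules that are loaded and their current state. The rules themselves are defined in the rule files (the prometheus-rules ConfigMap here); once a rule's condition is met, an alert fires and notifications go out through Alertmanager.
Explore: queries and browses metric data by expression or label filter. In the Prometheus 2.x UI this is done from the Graph page.
Status: shows the state of Prometheus, including the current targets, rules, and configuration.
Config: shows the loaded configuration (under Status > Configuration). The page is read-only; to add, modify, or remove settings, edit the prometheus-config ConfigMap.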
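The information behind these pages is also available as JSON from the HTTP API, which is handy for scripting. This is a sketch, assuming Prometheus is reachable at PROM_URL; the helper name and the localhost default are illustrative.

```shell
#!/usr/bin/env bash
# Assumption: Prometheus is reachable at this address.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Fetch the JSON behind a status page: targets, rules, or alerts.
prom_status() {
  case "$1" in
    targets|rules|alerts) curl -s "${PROM_URL}/api/v1/$1" ;;
    *) echo "usage: prom_status targets|rules|alerts" >&2; return 1 ;;
  esac
}

# Example calls (commented out because they need a running server):
# prom_status targets
# prom_status alerts
```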

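4. Summary

Broad monitoring coverage: Prometheus can monitor many kinds of targets, such as servers and containers. It works on metric data; log data is outside its scope and needs a separate tool.
Dynamic service discovery: Prometheus can automatically discover and monitor running services, which avoids manual configuration. (Covered in a later lesson.)
Flexible alerting: Prometheus supports configurable alerting rules that raise different alerts for different conditions, and notifications can be delivered to other services through APIs. (Covered in a later lesson.)
Multi-dimensional data model: the Prometheus data model is multi-dimensional, and data can be analyzed and displayed with the PromQL query language.
Efficient storage: Prometheus stores data in its own time series database, which is organized around time and handles large volumes of data efficiently.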

Posted 2024-06-03 11:29 by 不会跳舞的胖子
