Monitoring Applications with Prometheus

I. Prometheus PromQL Syntax

  PromQL (Prometheus Query Language) is Prometheus's own query language (a DSL). Its concise, close-to-natural-language syntax supports analysis and computation over time-series data. PromQL is expressive: it provides conditional queries and operators, and ships with a large set of built-in functions, letting clients query monitoring data along many dimensions.

1. Data types

  A PromQL expression evaluates to one of the following types:

Instant vector: a set of time series, each carrying a single sample value

Range vector: a set of time series, each carrying multiple sample values over a span of time

Scalar: a single floating-point value

String: a string value (currently unused)

1) Instant vector selectors

  An instant vector selector picks the sample value of a set of time series at a single point in time.
In the simplest case, you specify only a metric name, which selects the current sample of every time series belonging to that metric. For example:
  apiserver_request_total

  The result can be narrowed by appending a set of label matchers, enclosed in curly braces, to filter the time series.

For example, the following expression selects the series where job is kubernetes-apiserver, resource is pods, and scope is cluster:

  apiserver_request_total{job="kubernetes-apiserver",resource="pods",scope="cluster"}

  Label values can be matched by exact equality or by regular expression. The available match operators are:

=: exactly equal

!=: not equal

=~: matches the regular expression

!~: does not match the regular expression

For example, the following expression selects the series whose container is kube-scheduler, kube-proxy, or kube-apiserver:
  container_processes{container=~"kube-scheduler|kube-proxy|kube-apiserver"}

2) Range vector selectors

  A range vector selector is like an instant vector selector, except that it selects the sample values from a past time range. Append a duration enclosed in [] to an instant vector selector to obtain a range vector selector.

For example, the following expression selects the samples from the last 1 minute for every apiserver_request_total series whose resource is pods and scope is cluster:

  apiserver_request_total{job="kubernetes-apiserver",resource="pods",scope="cluster"}[1m]

  Note: a range vector cannot be rendered in the Graph tab; switch to Console to see the collected samples.

  Selecting Graph returns an error instead.

  Note: the duration can be given in any of the following units:
s: seconds
m: minutes
h: hours
d: days
w: weeks
y: years
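
Range vectors are mainly consumed by functions such as rate(). As a sketch (same metric and labels as above), a per-second request rate computed over the last 5 minutes:

```promql
rate(apiserver_request_total{job="kubernetes-apiserver",resource="pods",scope="cluster"}[5m])
```

Unlike the raw range vector, this expression returns an instant vector, so it can be graphed.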

3) The offset modifier

  The selectors above use the current time as their reference point by default. The offset modifier shifts that reference point into the past. It is placed immediately after a selector, with the amount to shift given after the keyword offset.

For example, the following expression selects the sample values of all apiserver_request_total series as of 5 minutes ago:
  apiserver_request_total{job="kubernetes-apiserver",resource="pods"} offset 5m

The following expression selects 5 minutes' worth of apiserver_request_total samples ending 5 hours ago:
  apiserver_request_total{job="kubernetes-apiserver",resource="pods"} [5m] offset 5h
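
offset is handy for comparing a series with its own past. As a sketch, the growth of the counter over the last hour can be written as the difference between its current value and its value one hour ago:

```promql
  apiserver_request_total{job="kubernetes-apiserver",resource="pods"}
- apiserver_request_total{job="kubernetes-apiserver",resource="pods"} offset 1h
```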

4) Aggregation operators

  PromQL aggregation operators reduce the elements of a vector to fewer elements. The full set is:

sum: sum over dimensions
min: minimum
max: maximum
avg: average
stddev: standard deviation
stdvar: variance
count: number of elements
count_values: number of elements with the same value
bottomk: smallest k elements
topk: largest k elements
quantile: quantile

For example, total memory (in GiB) used by all containers on the k8s-node1 node:

sum(container_memory_usage_bytes{instance=~"k8s-node1"})/1024/1024/1024

      CPU usage of all containers on k8s-node1 over the last 1m:

sum(rate(container_cpu_usage_seconds_total{instance=~"k8s-node1"}[1m])) / sum(machine_cpu_cores{instance=~"k8s-node1"}) * 100

     CPU usage of every container over the last 1m:

sum (rate (container_cpu_usage_seconds_total{id!="/"}[1m])) by (id)
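
Aggregations can also be combined with topk to surface the heaviest consumers. A sketch that returns the three containers with the highest CPU rate over the last minute, built from the expression above:

```promql
topk(3, sum(rate(container_cpu_usage_seconds_total{id!="/"}[1m])) by (id))
```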

5) Functions

  Prometheus ships with built-in functions to help with calculations. Some typical ones:
abs(): absolute value
sqrt(): square root
exp(): exponential
ln(): natural logarithm
ceil(): round up to the nearest integer
floor(): round down to the nearest integer
round(): round to the nearest integer
delta(): difference between the first and last value of each time series in a range vector
sort(): sort ascending
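
These functions chain onto other expressions. As a sketch, rounding the per-node container memory sum (converted to GiB) and sorting the result ascending:

```promql
sort(round(sum(container_memory_usage_bytes) by (instance) / 1024 / 1024 / 1024))
```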

2. Valid PromQL expressions

  Every PromQL expression must contain at least a metric name (e.g. http_request_total) or at least one label matcher that cannot match the empty string (e.g. {code="200"}). All of the following are therefore valid expressions:
http_request_total # valid

http_request_total{} # valid

{method="get"} # valid

  Besides the <metric name>{label=value} form, the built-in __name__ label can also be used to specify the metric name:

{__name__=~"http_request_total"} # valid

{__name__=~"node_disk_bytes_read|node_disk_bytes_written"} # valid

II. Monitoring Applications with Prometheus

1. Collecting Tomcat metrics with Prometheus

  tomcat_exporter: https://github.com/nlighten/tomcat_exporter

1) Build the Tomcat image

[root@k8s-master1 ~]# mkdir /root/tomcat_image
[root@k8s-master1 ~]# cd tomcat_image/
[root@k8s-master1 tomcat_image]# cat >>Dockerfile <<_EOF_
> FROM tomcat:8.5-jdk8-corretto
> ADD metrics.war /usr/local/tomcat/webapps/
> ADD simpleclient-0.8.0.jar  /usr/local/tomcat/lib/
> ADD simpleclient_common-0.8.0.jar /usr/local/tomcat/lib/
> ADD simpleclient_hotspot-0.8.0.jar /usr/local/tomcat/lib/
> ADD simpleclient_servlet-0.8.0.jar /usr/local/tomcat/lib/
> ADD tomcat_exporter_client-0.0.12.jar /usr/local/tomcat/lib/
>
> _EOF_
[root@k8s-master1 tomcat_image]# cat Dockerfile
FROM tomcat:8.5-jdk8-corretto
ADD metrics.war /usr/local/tomcat/webapps/
ADD simpleclient-0.8.0.jar  /usr/local/tomcat/lib/
ADD simpleclient_common-0.8.0.jar /usr/local/tomcat/lib/
ADD simpleclient_hotspot-0.8.0.jar /usr/local/tomcat/lib/
ADD simpleclient_servlet-0.8.0.jar /usr/local/tomcat/lib/
ADD tomcat_exporter_client-0.0.12.jar /usr/local/tomcat/lib/
[root@k8s-master1 tomcat_image]# docker build -t='tomcat_prometheus:v1' .
Sending build context to Docker daemon  130.6kB
Step 1/7 : FROM tomcat:8.5-jdk8-corretto
 ---> ff29e39b049e
Step 2/7 : ADD metrics.war /usr/local/tomcat/webapps/
 ---> 835dedcabb25
Step 3/7 : ADD simpleclient-0.8.0.jar  /usr/local/tomcat/lib/
 ---> 16d967e2b311
Step 4/7 : ADD simpleclient_common-0.8.0.jar /usr/local/tomcat/lib/
 ---> 9e71e96ffd2d
Step 5/7 : ADD simpleclient_hotspot-0.8.0.jar /usr/local/tomcat/lib/
 ---> 8a13cfd15e70
Step 6/7 : ADD simpleclient_servlet-0.8.0.jar /usr/local/tomcat/lib/
 ---> dc5ca2616b77
Step 7/7 : ADD tomcat_exporter_client-0.0.12.jar /usr/local/tomcat/lib/
 ---> 129787d128e3
Successfully built 129787d128e3
Successfully tagged tomcat_prometheus:v1
[root@k8s-master1 tomcat_image]# docker save -o tomcat_prometheus_v1.tar tomcat_prometheus:v1
You have new mail in /var/spool/mail/root
[root@k8s-master1 tomcat_image]# scp tomcat_prometheus_v1.tar 10.0.0.132:/data/software/
tomcat_prometheus_v1.tar                                                                                                  100%  368MB  19.3MB/s   00:19
You have new mail in /var/spool/mail/root
[root@k8s-master1 tomcat_image]# scp tomcat_prometheus_v1.tar 10.0.0.133:/data/software/
tomcat_prometheus_v1.tar                                                                                                  100%  368MB  31.4MB/s   00:11
# log in to the k8s-node1 node
[root@k8s-node1 ~]# cd /data/software/
You have new mail in /var/spool/mail/root
[root@k8s-node1 software]# docker load -i tomcat_prometheus_v1.tar
07d3193ef6f4: Loading layer [==================================================>]  7.168kB/7.168kB
9cd8df93bfa8: Loading layer [==================================================>]  63.49kB/63.49kB
b82f66b159ef: Loading layer [==================================================>]  9.728kB/9.728kB
80951bbcff57: Loading layer [==================================================>]   25.6kB/25.6kB
96b8fa864f38: Loading layer [==================================================>]  10.75kB/10.75kB
8e3b1565e006: Loading layer [==================================================>]  23.55kB/23.55kB
Loaded image: tomcat_prometheus:v1
# log in to the k8s-node2 node
[root@k8s-node2 ~]# cd /data/software/
You have new mail in /var/spool/mail/root
[root@k8s-node2 software]# docker load -i tomcat_prometheus_v1.tar
07d3193ef6f4: Loading layer [==================================================>]  7.168kB/7.168kB
9cd8df93bfa8: Loading layer [==================================================>]  63.49kB/63.49kB
b82f66b159ef: Loading layer [==================================================>]  9.728kB/9.728kB
80951bbcff57: Loading layer [==================================================>]   25.6kB/25.6kB
96b8fa864f38: Loading layer [==================================================>]  10.75kB/10.75kB
8e3b1565e006: Loading layer [==================================================>]  23.55kB/23.55kB
Loaded image: tomcat_prometheus:v1

2) Create a Tomcat instance from the image above

[root@k8s-master1 tomcat_image]# vim tomcat-deploy.yaml
[root@k8s-master1 tomcat_image]# cat tomcat-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomcat-deploy
  namespace: default
spec:
  selector:
    matchLabels:
     app: tomcat
  replicas: 2 # tells deployment to run 2 pods matching the template
  template: # create pods using pod definition in this template
    metadata:
      labels:
        app: tomcat
      annotations:
        prometheus.io/scrape: 'true'
    spec:
      containers:
      - name: tomcat
        image: tomcat_prometheus:v1
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
        securityContext:
          privileged: true
[root@k8s-master1 tomcat_image]# kubectl apply -f tomcat-deploy.yaml
deployment.apps/tomcat-deploy created
[root@k8s-master1 tomcat_image]# kubectl get deployment tomcat-deploy -o wide
NAME            READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                 SELECTOR
tomcat-deploy   2/2     2            2           34s   tomcat       tomcat_prometheus:v1   app=tomcat
[root@k8s-master1 tomcat_image]# kubectl get pods -o wide -l app=tomcat
NAME                            READY   STATUS    RESTARTS   AGE    IP              NODE        NOMINATED NODE   READINESS GATES
tomcat-deploy-bd6d757c6-h5b6k   1/1     Running   0          110s   10.244.36.114   k8s-node1   <none>           <none>
tomcat-deploy-bd6d757c6-k6dwl   1/1     Running   0          110s   10.244.36.124   k8s-node1   <none>           <none>
[root@k8s-master1 tomcat_image]#

3) Deploy the Tomcat Service

[root@k8s-master1 tomcat_image]# vim tomcat-service.yaml
You have new mail in /var/spool/mail/root
[root@k8s-master1 tomcat_image]# cat tomcat-service.yaml
kind: Service  # resource type: Service
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: tomcat-service
spec:
  selector:
    app: tomcat
  ports:
  - nodePort: 31360
    port: 80
    protocol: TCP
    targetPort: 8080
  type: NodePort
[root@k8s-master1 tomcat_image]# kubectl apply -f tomcat-service.yaml
service/tomcat-service created
[root@k8s-master1 tomcat_image]# kubectl get svc tomcat-service
NAME             TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
tomcat-service   NodePort   10.105.237.65   <none>        80:31360/TCP   18s

4) View the metrics

  In the Prometheus web UI, check the metrics collected for the two Tomcat pods and for the Service.

2. Collecting Redis metrics with Prometheus

1) Deploy a Redis exporter

The redis Pod contains two containers: the redis application itself and a redis_exporter sidecar.
      Because the Redis metrics endpoint is served by redis-exporter on port 9121, the annotation prometheus.io/port: "9121" is added so that Prometheus discovers redis automatically.

[root@k8s-master1 ~]# mkdir  redis
[root@k8s-master1 ~]# cd redis/
[root@k8s-master1 redis]# vim redis.yaml
You have new mail in /var/spool/mail/root
[root@k8s-master1 redis]# cat redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:latest
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6379
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 9121
---
kind: Service
apiVersion: v1
metadata:
  name: redis
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9121"
spec:
  selector:
    app: redis
  ports:
  - name: redis
    port: 6379
    targetPort: 6379
  - name: prom
    port: 9121
    targetPort: 9121
[root@k8s-master1 redis]# kubectl apply -f redis.yaml
deployment.apps/redis created
service/redis created
You have new mail in /var/spool/mail/root
[root@k8s-master1 redis]# kubectl get pods -o wide -l app=redis
NAME                     READY   STATUS    RESTARTS   AGE   IP              NODE        NOMINATED NODE   READINESS GATES
redis-55c57445b4-8fd72   2/2     Running   0          24s   10.244.36.113   k8s-node1   <none>           <none>
[root@k8s-master1 redis]# kubectl get svc redis -o wide
NAME    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE   SELECTOR
redis   ClusterIP   10.109.195.22   <none>        6379/TCP,9121/TCP   49s   app=redis

2) View the metrics

3) Import a Redis dashboard into Grafana

  See https://grafana.com/grafana/dashboards/?search=redis and download the JSON file you need.

3. Monitoring MySQL with Prometheus

1) Install MariaDB

[root@k8s-master1 ~]# mkdir mysql
[root@k8s-master1 ~]# cd mysql/
[root@k8s-master1 mysql]# yum install mariadb mariadb-server -y
[root@k8s-master1 ~]# systemctl start mariadb
[root@k8s-master1 ~]# systemctl status mariadb
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2022-11-20 22:05:37 CST; 38s ago
  Process: 120748 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)
  Process: 120611 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)
 Main PID: 120747 (mysqld_safe)
   Memory: 100.8M
   CGroup: /system.slice/mariadb.service
           ├─120747 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
           └─120935 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mari...

Nov 20 22:05:35 k8s-master1 mariadb-prepare-db-dir[120611]: MySQL manual for more instructions.
Nov 20 22:05:35 k8s-master1 mariadb-prepare-db-dir[120611]: Please report any problems at http://mariadb.org/jira
Nov 20 22:05:35 k8s-master1 mariadb-prepare-db-dir[120611]: The latest information about MariaDB is available at http://mariadb.org/.
Nov 20 22:05:35 k8s-master1 mariadb-prepare-db-dir[120611]: You can find additional information about the MySQL part at:
Nov 20 22:05:35 k8s-master1 mariadb-prepare-db-dir[120611]: http://dev.mysql.com
Nov 20 22:05:35 k8s-master1 mariadb-prepare-db-dir[120611]: Consider joining MariaDB's strong and vibrant community:
Nov 20 22:05:35 k8s-master1 mariadb-prepare-db-dir[120611]: https://mariadb.org/get-involved/
Nov 20 22:05:35 k8s-master1 mysqld_safe[120747]: 221120 22:05:35 mysqld_safe Logging to '/var/log/mariadb/mariadb.log'.
Nov 20 22:05:35 k8s-master1 mysqld_safe[120747]: 221120 22:05:35 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Nov 20 22:05:37 k8s-master1 systemd[1]: Started MariaDB database server.
# initialize (secure) the database
[root@k8s-master1 mysql]# mysql_secure_installation

NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB
      SERVERS IN PRODUCTION USE!  PLEASE READ EACH STEP CAREFULLY!

In order to log into MariaDB to secure it, we'll need the current
password for the root user.  If you've just installed MariaDB, and
you haven't set the root password yet, the password will be blank,
so you should just press enter here.

Enter current password for root (enter for none):
OK, successfully used password, moving on...

Setting the root password ensures that nobody can log into the MariaDB
root user without the proper authorisation.

Set root password? [Y/n]
New password:
Re-enter new password:
Password updated successfully!
Reloading privilege tables..
 ... Success!


By default, a MariaDB installation has an anonymous user, allowing anyone
to log into MariaDB without having to have a user account created for
them.  This is intended only for testing, and to make the installation
go a bit smoother.  You should remove them before moving into a
production environment.

Remove anonymous users? [Y/n]
 ... Success!

Normally, root should only be allowed to connect from 'localhost'.  This
ensures that someone cannot guess at the root password from the network.

Disallow root login remotely? [Y/n]
 ... Success!

By default, MariaDB comes with a database named 'test' that anyone can
access.  This is also intended only for testing, and should be removed
before moving into a production environment.

Remove test database and access to it? [Y/n]
 - Dropping test database...
 ... Success!
 - Removing privileges on test database...
 ... Success!

Reloading the privilege tables will ensure that all changes made so far
will take effect immediately.

Reload privilege tables now? [Y/n]
 ... Success!

Cleaning up...

All done!  If you've completed all of the above steps, your MariaDB
installation should now be secure.

Thanks for using MariaDB!

2) Run the mysqld_exporter application

# upload mysqld_exporter-0.10.0.linux-amd64.tar.gz to the server in advance
[root@k8s-master1 mysql]# ll
total 3320
-rw-r--r-- 1 root root 3397781 Nov 20 21:55 mysqld_exporter-0.10.0.linux-amd64.tar.gz
You have new mail in /var/spool/mail/root
[root@k8s-master1 mysql]# tar -zxvf mysqld_exporter-0.10.0.linux-amd64.tar.gz
mysqld_exporter-0.10.0.linux-amd64/
mysqld_exporter-0.10.0.linux-amd64/LICENSE
mysqld_exporter-0.10.0.linux-amd64/NOTICE
mysqld_exporter-0.10.0.linux-amd64/mysqld_exporter
[root@k8s-master1 mysql]# cd mysqld_exporter-0.10.0.linux-amd64
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# ll
total 10192
-rw-rw-r-- 1 1000 1000    11325 Apr 25  2017 LICENSE
-rwxr-xr-x 1 1000 1000 10419174 Apr 25  2017 mysqld_exporter
-rw-rw-r-- 1 1000 1000       65 Apr 25  2017 NOTICE
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# cp -ar mysqld_exporter /usr/local/bin/
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# chmod +x /usr/local/bin/mysqld_exporter
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# which mysqld_exporter
/usr/local/bin/mysqld_exporter

3) Log in to MySQL, create an account for mysql_exporter, and grant privileges

[root@k8s-master1 mysql]# mysql -u root -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 10
Server version: 5.5.68-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> CREATE USER 'mysql_exporter'@'localhost' IDENTIFIED BY 'Mysql@123';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysql_exporter'@'localhost';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> exit
Bye

4) Configure passwordless database access

[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# cat >my.cnf <<_EOF_
> [client]
> user=mysql_exporter
> password=Mysql@123
>
> _EOF_
You have new mail in /var/spool/mail/root
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# cat my.cnf
[client]
user=mysql_exporter
password=Mysql@123

5) Start the mysqld_exporter client

[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# nohup mysqld_exporter --config.my-cnf=./my.cnf &
[1] 410
You have new mail in /var/spool/mail/root
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# nohup: ignoring input and appending output to ‘nohup.out’

[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# ss -lntup |grep 9104
tcp    LISTEN     0      128      :::9104                 :::*                   users:(("mysqld_exporter",pid=410,fd=3))
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]#

6) Modify the prometheus-alertmanager-cfg.yaml file

[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# cd /root/prometheus/
[root@k8s-master1 prometheus]# vim prometheus-alertmanager-cfg.yaml

  Add a job that scrapes the mysql metrics.
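
The job itself is not shown in the capture; a minimal sketch, assuming mysqld_exporter listens on port 9104 on the master node (the IP 10.0.0.131 is an assumption; substitute the host where the exporter runs):

```yaml
- job_name: 'mysql'
  scrape_interval: 5s
  static_configs:
  - targets: ['10.0.0.131:9104']   # assumed mysqld_exporter address
```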

  Update the configuration:

[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-cfg.yaml
configmap "prometheus-config" deleted
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-cfg.yaml
configmap/prometheus-config created
[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-deploy.yaml
deployment.apps "prometheus-server" deleted
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-deploy.yaml
deployment.apps/prometheus-server created
[root@k8s-master1 prometheus]# kubectl get pods -n monitor-sa
NAME                                 READY   STATUS    RESTARTS   AGE
node-exporter-k4wsq                  1/1     Running   5          7d12h
node-exporter-x84r5                  1/1     Running   5          7d12h
node-exporter-zrwvh                  1/1     Running   5          7d12h
prometheus-server-646bf944c6-pbdbv   2/2     Running   0          95s

7) View the metrics

8) Import a MySQL dashboard into Grafana

  The Buffer Pool Size of Total RAM panel shows No data. Its formula, (mysql_global_variables_innodb_buffer_pool_size{instance="$host"} * 100) / on (instance) node_memory_MemTotal_bytes{instance="$host"}, joins mysql_global_variables_innodb_buffer_pool_size with node_memory_MemTotal_bytes on the instance label, but by default the two metrics carry different instance values: the former has the host name with a port number, while the latter has only the host name. A label therefore needs to be added so that the two instance values match.
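
One way to make the two instance values match is a relabel rule on the mysql job that strips the port from the scrape target's address. This is a sketch against an assumed scrape config (the target IP 10.0.0.131 and job name are assumptions):

```yaml
- job_name: 'mysql'
  scrape_interval: 5s
  static_configs:
  - targets: ['10.0.0.131:9104']   # assumed mysqld_exporter address
  relabel_configs:
  # Set instance to the host part of the target address, dropping ":9104"
  - source_labels: [__address__]
    regex: '(.*):\d+'
    target_label: instance
    replacement: '$1'
```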

  Re-apply the manifests:

[root@k8s-master1 prometheus]# vim prometheus-alertmanager-cfg.yaml
You have new mail in /var/spool/mail/root
[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-cfg.yaml
configmap "prometheus-config" deleted
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-cfg.yaml
configmap/prometheus-config created
[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-deploy.yaml
deployment.apps "prometheus-server" deleted
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-deploy.yaml
deployment.apps/prometheus-server created

  Edit the expression in the Grafana panel so that it displays correctly.

4. Monitoring Nginx with Prometheus

  Install nginx on the k8s-node1 node. The vts module for nginx is a very useful monitoring module that gives a clear view of the server's current state.

  Monitoring Nginx involves three components:

(1) nginx-module-vts: the Nginx virtual host traffic status module, which exposes monitoring data in JSON format.

(2) nginx-vts-exporter: a simple server that scrapes Nginx vts stats and exports them via HTTP for Prometheus consumption. It collects the Nginx monitoring data and serves a scrape endpoint for Prometheus, listening on port 9913 by default.

(3) Prometheus: scrapes the data exposed by nginx-vts-exporter, stores it in its time-series database, and makes it available for querying and aggregation with PromQL.

1) Use the nginx-module-vts module

[root@k8s-node1 ~]# mkdir nginx
[root@k8s-node1 ~]# cd nginx/
[root@k8s-node1 nginx]# ll
total 4596
-rw-r--r-- 1 root root 1026732 Nov 21 22:02 nginx-1.15.7.tar.gz
-rw-r--r-- 1 root root  407765 Nov 21 22:02 nginx-module-vts-master.zip
-rw-r--r-- 1 root root 3264895 Nov 21 22:02 nginx-vts-exporter-0.5.zip
You have new mail in /var/spool/mail/root
[root@k8s-node1 nginx]# unzip nginx-module-vts-master.zip
[root@k8s-node1 nginx]# ll
total 4596
-rw-r--r-- 1 root root 1026732 Nov 21 22:02 nginx-1.15.7.tar.gz
drwxr-xr-x 6 root root     112 Jul 12  2018 nginx-module-vts-master
-rw-r--r-- 1 root root  407765 Nov 21 22:02 nginx-module-vts-master.zip
-rw-r--r-- 1 root root 3264895 Nov 21 22:02 nginx-vts-exporter-0.5.zip
[root@k8s-node1 nginx]# mv nginx-module-vts-master /usr/local/

2) Install nginx

(1) Install the dependencies
[root@k8s-node1 nginx]# yum -y install gcc gcc-c++ pcre pcre-devel zlib zlib-devel openssl openssl-devel
(2) Compile and install
[root@k8s-node1 nginx]# tar -zxvf nginx-1.15.7.tar.gz
[root@k8s-node1 nginx]# cd nginx-1.15.7
[root@k8s-node1 nginx-1.15.7]# ./configure  --prefix=/usr/local/nginx --with-http_gzip_static_module --with-http_stub_status_module --with-http_ssl_module --with-pcre --with-file-aio --with-http_realip_module --add-module=/usr/local/nginx-module-vts-master
........
# verify that the previous command succeeded
[root@k8s-node1 nginx-1.15.7]# echo $?
0
[root@k8s-node1 nginx-1.15.7]# make && make install
[root@k8s-node1 nginx-1.15.7]# echo $?
0 

3) Modify the nginx configuration file

[root@k8s-node1 nginx-1.15.7]# cp /usr/local/nginx/conf/nginx.conf /usr/local/nginx/conf/nginx.conf.bak
[root@k8s-node1 nginx-1.15.7]# vim /usr/local/nginx/conf/nginx.conf
# add the following inside the server block:
location /status {
        vhost_traffic_status_display;
        vhost_traffic_status_display_format html;
        }
# add the following inside the http block:
vhost_traffic_status_zone;

  Verify that the configuration is correct:

[root@k8s-node1 nginx-1.15.7]# /usr/local/nginx/sbin/nginx -t
nginx: the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
nginx: configuration file /usr/local/nginx/conf/nginx.conf test is successful
[root@k8s-node1 nginx-1.15.7]#

4) Start the nginx service

[root@k8s-node1 nginx-1.15.7]# /usr/local/nginx/sbin/nginx
You have new mail in /var/spool/mail/root

5) View the nginx monitoring data

  Visit http://10.0.0.132/status in a browser.

The fields shown on the status page:

  Server main

Host: host name

Version: nginx version

Uptime: server uptime

Connections — active: current client connections; reading: total connections reading from clients; writing: total connections writing to clients

Requests — accepted: total client connections accepted; handled: total client connections handled; Total: total requests; Req/s: requests per second

Shared memory — name: the shared memory zone name from the configuration; maxSize: the configured shared memory limit; usedSize: current shared memory size; usedNode: number of nodes currently in use in shared memory

  Server zones

zone: the current zone

Requests — Total: total requests; Req/s: requests per second; time: timestamp

Responses — 1xx, 2xx, 3xx, 4xx, 5xx: response counts per status class; Total: total response status codes

Traffic — Sent: bytes sent; Rcvd: bytes received; Sent/s: bytes sent per second; Rcvd/s: bytes received per second

Cache — Miss: cache misses; Bypass: bypassed requests; Expired: expired entries; Stale: stale entries served; Updating: entries being updated; Revalidated: revalidated entries; Hit: cache hits; Scarce: requests that did not meet caching requirements; Total: total

6) Install nginx-vts-exporter

[root@k8s-node1 nginx-1.15.7]# cd ..
[root@k8s-node1 nginx]# ll
total 4596
drwxr-xr-x 9 1001 1001     186 Nov 21 22:15 nginx-1.15.7
-rw-r--r-- 1 root root 1026732 Nov 21 22:02 nginx-1.15.7.tar.gz
-rw-r--r-- 1 root root  407765 Nov 21 22:02 nginx-module-vts-master.zip
-rw-r--r-- 1 root root 3264895 Nov 21 22:02 nginx-vts-exporter-0.5.zip
[root@k8s-node1 nginx]# unzip nginx-vts-exporter-0.5.zip
....
[root@k8s-node1 nginx]# mv nginx-vts-exporter-0.5  /usr/local/
[root@k8s-node1 nginx]# chmod +x /usr/local/nginx-vts-exporter-0.5/bin/nginx-vts-exporter
[root@k8s-node1 nginx]# cd /usr/local/nginx-vts-exporter-0.5/bin
[root@k8s-node1 bin]# nohup ./nginx-vts-exporter  -nginx.scrape_uri http://10.0.0.132/status/format/json &
[1] 129188
You have new mail in /var/spool/mail/root
[root@k8s-node1 bin]# nohup: ignoring input and appending output to ‘nohup.out’

[root@k8s-node1 bin]# ss -lntup |grep nginx-vts-expor
tcp    LISTEN     0      128      :::9913                 :::*                   users:(("nginx-vts-expor",pid=129188,fd=3))
[root@k8s-node1 bin]#

7) Modify the prometheus-alertmanager-cfg.yaml configuration file

  Add the following job:

[root@k8s-master1 prometheus]# vim prometheus-alertmanager-cfg.yaml
# add the following job
  - job_name: 'nginx'
    scrape_interval: 5s
    static_configs:
    - targets: ['10.0.0.132:9913']

  Update the manifests:

[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-cfg.yaml
configmap "prometheus-config" deleted
You have new mail in /var/spool/mail/root
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-cfg.yaml
configmap/prometheus-config created
[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-deploy.yaml
deployment.apps "prometheus-server" deleted
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-deploy.yaml
deployment.apps/prometheus-server created
[root@k8s-master1 prometheus]# kubectl get pods -n monitor-sa -o wide |grep prometheus
prometheus-server-646bf944c6-s5klp   2/2     Running   0          33s   10.244.169.188   k8s-node2     <none>           <none>

8) View the metrics in the Prometheus web UI

9) Import an Nginx dashboard into Grafana

  

III. The Prometheus Pushgateway Component

1. Pushgateway overview

  Pushgateway is a Prometheus component. The Prometheus server normally fetches data from exporters itself (the default pull model); with Pushgateway, data is instead pushed to it. You can write custom monitoring scripts that send the data you care about to Pushgateway, and Pushgateway then exposes that data for the Prometheus server to scrape.

2. Pros and cons of Pushgateway

Pros: Prometheus pulls target data on a schedule by default, but if a target sits in a different subnet or behind a firewall, Prometheus cannot reach it. In that case each target can push its data to Pushgateway, and Prometheus pulls from Pushgateway on its usual schedule.

     When monitoring business metrics, data from different sources often needs to be aggregated; the aggregated data can be collected centrally by Pushgateway and then scraped by Prometheus in one place.

Cons: 1) Prometheus's scrape health status reflects only Pushgateway itself, not each individual node;

   2) if Pushgateway has a problem, all data collected through it is affected;

   3) when a monitored target goes away, Prometheus keeps scraping its old metrics from Pushgateway; unwanted data has to be cleaned out of Pushgateway manually.

3. Install Pushgateway

  Install pushgateway on the k8s-node1 node:

[root@k8s-node1 ~]# docker run -d --name pushgateway -p 9091:9091 prom/pushgateway:latest
7e1e72b18a1ec1e447bda4a5e0f5e28b44dd673b4e234a68b5f1c947f3501057
You have new mail in /var/spool/mail/root
[root@k8s-node1 ~]# docker ps -a |grep pushgateway
7e1e72b18a1e   prom/pushgateway:latest                             "/bin/pushgateway"       13 seconds ago   Up 10 seconds               0.0.0.0:9091->9091/tcp, :::9091->9091/tcp   pushgateway

4. The Pushgateway web UI

  Open http://10.0.0.132:9091 in a browser and click the Status button to see the UI.

5. Monitoring Pushgateway with Prometheus

  Modify the prometheus-alertmanager-cfg.yaml file and add the following:

- job_name: 'pushgateway'
  scrape_interval: 5s
  honor_labels: true
  static_configs:
  - targets: ['10.0.0.132:9091']

  Update the configuration:

[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-cfg.yaml
configmap "prometheus-config" deleted
You have new mail in /var/spool/mail/root
[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-deploy.yaml
deployment.apps "prometheus-server" deleted
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-cfg.yaml
configmap/prometheus-config created
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-deploy.yaml
deployment.apps/prometheus-server created
[root@k8s-master1 prometheus]# kubectl get pods -n monitor-sa
NAME                                 READY   STATUS    RESTARTS   AGE
node-exporter-k4wsq                  1/1     Running   8          14d
node-exporter-x84r5                  1/1     Running   8          14d
node-exporter-zrwvh                  1/1     Running   9          14d
prometheus-server-646bf944c6-k8mcz   2/2     Running   0          33s

  Check pushgateway in the Prometheus web UI.

6. Pushing data in the expected format to Pushgateway

1) Push a single sample

  Push a single sample into the group {job="test_job"}:

[root@k8s-master1 prometheus]# echo " metric 3.6" | curl --data-binary @- http://10.0.0.132:9091/metrics/job/test_job
You have new mail in /var/spool/mail/root
[root@k8s-master1 prometheus]#

Note: --data-binary sends the request body exactly as given (binary-safe); curl issues the request with the POST method.

  Check the Pushgateway web UI for the pushed data.

  Then check the pushed metric in the Prometheus web UI.

2) Push multiple samples

[root@k8s-master1 prometheus]# cat <<EOF | curl --data-binary @- http://10.0.0.132:9091/metrics/job/test_job/instance/test_instance
> # TYPE node_memory_usage gauge
> node_memory_usage 36
> # TYPE node_memory_total gauge
> node_memory_total 36000
> EOF

  Check the Pushgateway web UI for the pushed data.

  Then check the pushgateway data in the Prometheus web UI.

3) Delete all data for one instance within a Pushgateway group

[root@k8s-master1 prometheus]# curl -X DELETE http://10.0.0.132:9091/metrics/job/test_job/instance/test_instance
You have new mail in /var/spool/mail/root
[root@k8s-master1 prometheus]#

  Check the Pushgateway web UI: the instance=test_instance data has been deleted.

 

4) Delete all data for a Pushgateway group

[root@k8s-master1 prometheus]# curl -X DELETE http://10.0.0.132:9091/metrics/job/test_job
You have new mail in /var/spool/mail/root
[root@k8s-master1 prometheus]#

  Check the Pushgateway web UI: all the data has been deleted and the gateway is back to its initial state.

5) Report metrics to Pushgateway from a script

  Configure the reporting on the machine being monitored. To push the memory usage of the 10.0.0.132 machine to Pushgateway, run the following steps on 10.0.0.132:

[root@k8s-node1 ~]# mkdir monitor-data
You have new mail in /var/spool/mail/root
[root@k8s-node1 ~]# cd monitor-data/
[root@k8s-node1 monitor-data]# vim push.sh
You have new mail in /var/spool/mail/root
[root@k8s-node1 monitor-data]# cat push.sh
node_memory_usages=$(free -m | grep Mem | awk '{print $3/$2*100}')
job_name="memory"
instance_name="k8s-node1"
cat <<EOF | curl --data-binary @- http://10.0.0.132:9091/metrics/job/$job_name/instance/$instance_name
# TYPE node_memory_usages gauge
node_memory_usages $node_memory_usages
EOF
[root@k8s-node1 monitor-data]# sh push.sh
You have new mail in /var/spool/mail/root
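
The awk expression in push.sh computes used/total*100 from the free output. A quick standalone check of the same arithmetic, with the two values hard-coded for illustration (2048 MiB used out of 8192 MiB):

```shell
# Same used/total*100 arithmetic as push.sh, on fixed inputs.
usage=$(echo "2048 8192" | awk '{printf "%.1f", $1/$2*100}')
echo "$usage"   # prints 25.0
```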

  Open the Pushgateway web UI to see the pushed data.

  Open the Prometheus UI to see the node_memory_usages metric.

If the metrics need to be reported on a schedule, set up a cron job.
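
For example, a crontab entry that pushes a fresh reading every minute (the script path is the one created above; adjust as needed):

```
* * * * * /bin/sh /root/monitor-data/push.sh >/dev/null 2>&1
```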

  Note: as the configuration above shows, the data pushed to Pushgateway carries its own job and instance labels, while the pushgateway job in the Prometheus configuration also produces job and instance labels that refer to the Pushgateway instance itself. Adding the honor_labels: true parameter makes Prometheus keep the job and instance labels of the pushed data instead of overwriting them with the scrape target's own job and instance, avoiding the conflict.

posted @ 2022-11-27 22:32  出水芙蓉·薇薇