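Prometheus Application Monitoring
I. Prometheus PromQL Syntax
PromQL (Prometheus Query Language) is Prometheus's own query language, a domain-specific language (DSL). Its concise syntax, close to natural language, provides analysis and computation over time-series data. PromQL is expressive: it supports filtering by conditions, operators, and a large set of built-in functions, so clients can query monitoring data along any of its dimensions.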
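1. Data types
A PromQL expression evaluates to one of the following types:
Instant vector: a set of time series, each with a single sample value
Range vector: a set of time series, each with multiple sample values over a time span
Scalar: a single floating-point number
String: a string value; currently unused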
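1) Instant vector selectors
An instant vector selector selects the sample values of a set of time series at a single point in time.
In the simplest case you specify only a metric name, which selects the current sample of every time series of that metric. For example, the following expression returns all time series of apiserver_request_total:
apiserver_request_total
The result shows that the metric consists of many time series. To filter them, append a set of label key/value pairs enclosed in curly braces.
For example, the following expression selects the time series whose job is kubernetes-apiserver, resource is pods and scope is cluster:
apiserver_request_total{job="kubernetes-apiserver",resource="pods",scope="cluster"}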
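A label value can be matched exactly or with a regular expression. There are four matching operators:
=: exactly equal
!=: not equal
=~: matches the regular expression
!~: does not match the regular expression
Regular-expression matchers are fully anchored, so the pattern must match the entire label value.
For example, the following expression selects the time series whose container is kube-scheduler, kube-proxy or kube-apiserver:
container_processes{container=~"kube-scheduler|kube-proxy|kube-apiserver"}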
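To make the matcher semantics concrete, here is a small bash sketch (label_matches is a hypothetical helper, not part of Prometheus). It assumes bash's POSIX extended regular expressions are an adequate stand-in for the RE2 syntax Prometheus uses, which holds for simple alternations like the one above, and it anchors the pattern at both ends just as PromQL does:

```bash
#!/usr/bin/env bash
# label_matches OP PATTERN VALUE
# Exit status 0 if a label whose value is VALUE satisfies the matcher OP "PATTERN".
# Regex matchers are anchored at both ends, as in PromQL.
# Assumption: bash ERE approximates Prometheus's RE2 syntax for simple patterns.
label_matches() {
  local op=$1 pattern=$2 value=$3
  local re="^(${pattern})\$"
  case "$op" in
    '=')  [[ $value == "$pattern" ]] ;;
    '!=') [[ $value != "$pattern" ]] ;;
    '=~') [[ $value =~ $re ]] ;;
    '!~') ! [[ $value =~ $re ]] ;;
    *)    echo "unknown matcher: $op" >&2; return 2 ;;
  esac
}

# Usage: which container names the example selector would pick up.
for c in kube-proxy kube-proxy-canary coredns; do
  if label_matches '=~' 'kube-scheduler|kube-proxy|kube-apiserver' "$c"; then
    echo "$c: selected"
  else
    echo "$c: not selected"
  fi
done
```

Because of the anchoring, kube-proxy-canary is not selected even though it starts with kube-proxy; to match a prefix, the pattern has to say so explicitly, e.g. kube-proxy.*.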
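2) Range vector selectors
A range vector selector works like an instant vector selector, except that it selects the samples from a past time span. Append a duration in square brackets [] to an instant vector selector to turn it into a range vector selector.
For example, the following expression selects the samples from the last 1 minute of every apiserver_request_total time series whose job is kubernetes-apiserver, resource is pods and scope is cluster:
apiserver_request_total{job="kubernetes-apiserver",resource="pods",scope="cluster"}[1m]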
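Note: this expression cannot be displayed in the Graph tab; select the Console tab (named Table in newer versions of the web UI) to see the collected samples.
The Graph tab evaluates the expression as a range query, which accepts only expressions that return an instant vector or a scalar, so selecting Graph produces an error, as shown in the error screenshot.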
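The Console/Graph difference mirrors Prometheus's HTTP API: the Console (Table) tab evaluates the expression once through /api/v1/query, which returns a range vector with "resultType":"matrix", while the Graph tab uses /api/v1/query_range. A minimal curl sketch of the instant query, assuming the server is reachable at a hypothetical PROM_URL (defaulting to http://localhost:9090 here):

```bash
#!/usr/bin/env bash
# prom_query EXPR: evaluate EXPR once (instant query), like the Console/Table tab.
# PROM_URL is a hypothetical setting; point it at your Prometheus server.
prom_query() {
  local base=${PROM_URL:-http://localhost:9090}
  # -G sends the --data-urlencode parameters as a URL query string (GET);
  # --data-urlencode escapes the braces, brackets and quotes in the expression.
  # -f makes HTTP errors return a nonzero exit status.
  curl -sfG --max-time 5 "${base%/}/api/v1/query" --data-urlencode "query=$1"
}

# Usage (not executed here):
#   prom_query 'apiserver_request_total{job="kubernetes-apiserver",resource="pods",scope="cluster"}[1m]'
# For a range vector selector the JSON response should contain "resultType":"matrix".
```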
Note: the duration unit can be one of the following:
s: seconds
m: minutes
h: hours
d: days
w: weeks
y: years
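As a quick reference for the units above, a bash sketch that converts a single-unit duration into seconds. promql_duration_seconds is a hypothetical helper; compound durations such as 1h30m (accepted by newer Prometheus versions) and milliseconds are deliberately left out:

```bash
#!/usr/bin/env bash
# promql_duration_seconds DURATION: prints the duration in seconds.
# Handles one number followed by one unit, e.g. 30s, 5m, 1h, 7d, 2w, 1y.
promql_duration_seconds() {
  local re='^([0-9]+)([smhdwy])$' mult
  if [[ ! $1 =~ $re ]]; then
    echo "unsupported duration: $1" >&2
    return 1
  fi
  case "${BASH_REMATCH[2]}" in
    s) mult=1 ;;
    m) mult=60 ;;
    h) mult=3600 ;;
    d) mult=86400 ;;
    w) mult=604800 ;;     # 7 days
    y) mult=31536000 ;;   # Prometheus counts a year as 365 days
  esac
  # 10# forces base 10 so inputs like "08m" are not read as octal.
  echo $(( 10#${BASH_REMATCH[1]} * mult ))
}

# Usage:
for d in 1m 5m 5h 1w; do
  echo "$d = $(promql_duration_seconds "$d") s"
done
```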
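3) Offset modifier
The selectors above all use the current time as their reference (evaluation) time. The offset modifier shifts that reference time into the past. It follows the selector directly and uses the offset keyword followed by the amount to shift.
For example, the following expression selects the value that every apiserver_request_total time series with job kubernetes-apiserver and resource pods had 5 minutes ago:
apiserver_request_total{job="kubernetes-apiserver",resource="pods"} offset 5m
The following expression selects the 5 minutes of apiserver_request_total samples that end at the point 5 hours ago: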
apiserver_request_total{job="kubernetes-apiserver",resource="pods"}[5m] offset 5h
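The offset simply moves the evaluation time back before the range is applied. The sketch below (offset_window is a hypothetical helper that takes the range and the offset already converted to seconds) prints the time window that [5m] offset 5h covers; it ignores whether the start boundary is inclusive, a detail that has differed between Prometheus versions:

```bash
#!/usr/bin/env bash
# offset_window EVAL_TS RANGE_SECONDS OFFSET_SECONDS
# Prints "START END" (Unix seconds) for sel[RANGE] offset OFFSET evaluated at
# EVAL_TS: the range window ends at EVAL_TS - OFFSET instead of at EVAL_TS.
offset_window() {
  local eval_ts=$1 range=$2 offset=$3
  local end=$(( eval_ts - offset ))
  echo "$(( end - range )) ${end}"
}

# Usage for [5m] offset 5h (300 s and 18000 s), evaluated now:
read -r start end < <(offset_window "$(date +%s)" 300 18000)
echo "from $(date -d "@$start") to $(date -d "@$end")"   # date -d is GNU date syntax
```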
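4) Aggregation operators
PromQL aggregation operators aggregate the elements of a single instant vector into a new vector with fewer elements. The available aggregation operators are:
sum: sum
min: minimum
max: maximum
avg: average
stddev: standard deviation
stdvar: variance
count: number of elements
count_values: number of elements sharing each distinct value
bottomk: the k smallest elements by sample value
topk: the k largest elements by sample value
quantile: φ-quantile (0 ≤ φ ≤ 1)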
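Conceptually, an aggregation groups the series of an instant vector by the labels in by (...) and reduces each group to one value. The awk-based sketch below imitates that for sum, avg, min, max and count; aggregate_by and its two-column input format ("<group label value> <sample value>") are assumptions made for illustration, not something Prometheus consumes:

```bash
#!/usr/bin/env bash
# aggregate_by OP  (OP = sum | avg | min | max | count)
# Reads "<group> <value>" lines on stdin, one line per series, and prints one
# "<group> <result>" line per group in first-seen order, like OP by (<label>) (...).
aggregate_by() {
  case "$1" in
    sum|avg|min|max|count) ;;
    *) echo "unsupported operator: $1" >&2; return 1 ;;
  esac
  awk -v op="$1" '
    {
      k = $1; v = $2 + 0
      if (!(k in n)) { order[++groups] = k; lo[k] = v; hi[k] = v }
      n[k]++; s[k] += v
      if (v < lo[k]) lo[k] = v
      if (v > hi[k]) hi[k] = v
    }
    END {
      for (i = 1; i <= groups; i++) {
        k = order[i]
        if (op == "sum") r = s[k]
        else if (op == "avg") r = s[k] / n[k]
        else if (op == "min") r = lo[k]
        else if (op == "max") r = hi[k]
        else r = n[k]
        printf "%s %.15g\n", k, r
      }
    }'
}

# Usage: memory of three containers (bytes) on two nodes, summed per node.
printf 'k8s-node1 1073741824\nk8s-node1 536870912\nk8s-node2 268435456\n' | aggregate_by sum
```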
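Example: total memory used by all containers on node k8s-node1, in GiB:
sum(container_memory_usage_bytes{instance=~"k8s-node1"})/1024/1024/1024
Example: CPU usage of all containers on node k8s-node1 over the last 1 minute, as a percentage of the node's CPU cores:
sum(rate(container_cpu_usage_seconds_total{instance=~"k8s-node1"}[1m])) / sum(machine_cpu_cores{instance=~"k8s-node1"}) * 100
Example: CPU usage over the last 1 minute of every container cgroup, grouped by id (the root cgroup "/" is excluded), in CPU cores:
sum(rate(container_cpu_usage_seconds_total{id!="/"}[1m])) by (id)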
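rate() appears in both CPU queries above: it returns the per-second average increase of a counter over the range, treating any drop in value as a counter reset. A simplified bash/awk sketch of that idea (simple_rate is hypothetical; unlike the real rate(), it does not extrapolate to the edges of the range window, so results can differ slightly):

```bash
#!/usr/bin/env bash
# simple_rate: reads "<unix_ts> <counter_value>" lines (oldest first) and prints
# the per-second increase between the first and last sample.
simple_rate() {
  awk '
    NR == 1 { first_ts = $1; prev = $2 + 0; inc = 0; next }
    {
      v = $2 + 0
      # After a reset the counter restarts from 0, so the new value is all increase.
      if (v < prev) inc += v
      else inc += v - prev
      prev = v; last_ts = $1
    }
    END {
      if (NR < 2 || last_ts == first_ts) {
        print "need at least two samples" > "/dev/stderr"
        exit 1
      }
      printf "%.6g\n", inc / (last_ts - first_ts)
    }'
}

# Usage: a CPU-seconds counter scraped every 15 s for 1 minute.
printf '0 100\n15 107.5\n30 115\n45 122.5\n60 130\n' | simple_rate
```

With those samples the container used 0.5 CPU cores on average, which is the unit of the per-id query above; the node-level query divides by machine_cpu_cores and multiplies by 100 to get a percentage.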
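5) Functions
Prometheus provides built-in functions to help with calculations. Some typical ones:
abs(): absolute value
sqrt(): square root
exp(): exponential function
ln(): natural logarithm
ceil(): round up to the nearest integer
floor(): round down to the nearest integer
round(): round to the nearest integer
delta(): difference between the first and last value of each time series in a range vector (extrapolated to the boundaries of the range); intended for gauges
sort(): sort vector elements in ascending order of sample value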
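2. Valid PromQL expressions
Every PromQL expression must contain at least one metric name (e.g. http_request_total) or at least one label matcher that does not match the empty string (e.g. {code="200"}). The following expressions are therefore all valid:
http_request_total # valid
http_request_total{} # valid
{method="get"} # valid
whereas a selector whose only matcher also matches the empty string is rejected:
{job=~".*"} # invalid
Besides the <metric name>{label=value} form, the metric name can also be specified with the built-in __name__ label:
{__name__=~"http_request_total"} # valid
{__name__=~"node_disk_bytes_read|node_disk_bytes_written"} # valid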
II. Monitoring Applications with Prometheus
1. Collecting Tomcat metrics with Prometheus
tomcat_exporter repository: https://github.com/nlighten/tomcat_exporter
1) Build a Tomcat image
[root@k8s-master1 ~]# mkdir /root/tomcat_image
[root@k8s-master1 ~]# cd tomcat_image/
[root@k8s-master1 tomcat_image]# cat >>Dockerfile <<_EOF_
> FROM tomcat:8.5-jdk8-corretto
> ADD metrics.war /usr/local/tomcat/webapps/
> ADD simpleclient-0.8.0.jar /usr/local/tomcat/lib/
> ADD simpleclient_common-0.8.0.jar /usr/local/tomcat/lib/
> ADD simpleclient_hotspot-0.8.0.jar /usr/local/tomcat/lib/
> ADD simpleclient_servlet-0.8.0.jar /usr/local/tomcat/lib/
> ADD tomcat_exporter_client-0.0.12.jar /usr/local/tomcat/lib/
>
> _EOF_
[root@k8s-master1 tomcat_image]# cat Dockerfile
FROM tomcat:8.5-jdk8-corretto
ADD metrics.war /usr/local/tomcat/webapps/
ADD simpleclient-0.8.0.jar /usr/local/tomcat/lib/
ADD simpleclient_common-0.8.0.jar /usr/local/tomcat/lib/
ADD simpleclient_hotspot-0.8.0.jar /usr/local/tomcat/lib/
ADD simpleclient_servlet-0.8.0.jar /usr/local/tomcat/lib/
ADD tomcat_exporter_client-0.0.12.jar /usr/local/tomcat/lib/
[root@k8s-master1 tomcat_image]# docker build -t='tomcat_prometheus:v1' .
Sending build context to Docker daemon 130.6kB
Step 1/7 : FROM tomcat:8.5-jdk8-corretto
 ---> ff29e39b049e
Step 2/7 : ADD metrics.war /usr/local/tomcat/webapps/
 ---> 835dedcabb25
Step 3/7 : ADD simpleclient-0.8.0.jar /usr/local/tomcat/lib/
 ---> 16d967e2b311
Step 4/7 : ADD simpleclient_common-0.8.0.jar /usr/local/tomcat/lib/
 ---> 9e71e96ffd2d
Step 5/7 : ADD simpleclient_hotspot-0.8.0.jar /usr/local/tomcat/lib/
 ---> 8a13cfd15e70
Step 6/7 : ADD simpleclient_servlet-0.8.0.jar /usr/local/tomcat/lib/
 ---> dc5ca2616b77
Step 7/7 : ADD tomcat_exporter_client-0.0.12.jar /usr/local/tomcat/lib/
 ---> 129787d128e3
Successfully built 129787d128e3
Successfully tagged tomcat_prometheus:v1
[root@k8s-master1 tomcat_image]# docker save -o tomcat_prometheus_v1.tar tomcat_prometheus:v1
[root@k8s-master1 tomcat_image]# scp tomcat_prometheus_v1.tar 10.0.0.132:/data/software/
tomcat_prometheus_v1.tar 100% 368MB 19.3MB/s 00:19
[root@k8s-master1 tomcat_image]# scp tomcat_prometheus_v1.tar 10.0.0.133:/data/software/
tomcat_prometheus_v1.tar 100% 368MB 31.4MB/s 00:11
# Log in to node k8s-node1
[root@k8s-node1 ~]# cd /data/software/
[root@k8s-node1 software]# docker load -i tomcat_prometheus_v1.tar
07d3193ef6f4: Loading layer [==================================================>] 7.168kB/7.168kB
9cd8df93bfa8: Loading layer [==================================================>] 63.49kB/63.49kB
b82f66b159ef: Loading layer [==================================================>] 9.728kB/9.728kB
80951bbcff57: Loading layer [==================================================>] 25.6kB/25.6kB
96b8fa864f38: Loading layer [==================================================>] 10.75kB/10.75kB
8e3b1565e006: Loading layer [==================================================>] 23.55kB/23.55kB
Loaded image: tomcat_prometheus:v1
# Log in to node k8s-node2
[root@k8s-node2 ~]# cd /data/software/
[root@k8s-node2 software]# docker load -i tomcat_prometheus_v1.tar
07d3193ef6f4: Loading layer [==================================================>] 7.168kB/7.168kB
9cd8df93bfa8: Loading layer [==================================================>] 63.49kB/63.49kB
b82f66b159ef: Loading layer [==================================================>] 9.728kB/9.728kB
80951bbcff57: Loading layer [==================================================>] 25.6kB/25.6kB
96b8fa864f38: Loading layer [==================================================>] 10.75kB/10.75kB
8e3b1565e006: Loading layer [==================================================>] 23.55kB/23.55kB
Loaded image: tomcat_prometheus:v1
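docker build can only ADD files that are present in the build context, so metrics.war and the five jar files have to be in /root/tomcat_image before the build runs. A hypothetical pre-build check in bash, assuming simple ADD/COPY lines with one local source and no flags or wildcards:

```bash
#!/usr/bin/env bash
# check_add_sources [DOCKERFILE]
# Reports every local source of an ADD/COPY line that is missing from the
# Dockerfile's directory (the build context in this walkthrough).
# Assumption: one source per line, no --chown/--from flags, no wildcards.
check_add_sources() {
  local dockerfile=${1:-Dockerfile}
  local dir missing=0 instr src
  dir=$(dirname "$dockerfile")
  while read -r instr src _; do
    case "${instr^^}" in
      ADD|COPY) ;;
      *) continue ;;
    esac
    # Remote URLs are fetched by docker itself, so they are not checked here.
    [[ $src == http://* || $src == https://* ]] && continue
    if [[ ! -e "$dir/$src" ]]; then
      echo "missing: $src" >&2
      missing=1
    fi
  done < "$dockerfile"
  return "$missing"
}

# Usage, inside /root/tomcat_image:
#   check_add_sources Dockerfile && docker build -t='tomcat_prometheus:v1' .
```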
2) Create Tomcat instances from the image above
[root@k8s-master1 tomcat_image]# vim tomcat-deploy.yaml
[root@k8s-master1 tomcat_image]# cat tomcat-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomcat-deploy
  namespace: default
spec:
  selector:
    matchLabels:
      app: tomcat
  replicas: 2 # tells deployment to run 2 pods matching the template
  template: # create pods using pod definition in this template
    metadata:
      labels:
        app: tomcat
      annotations:
        prometheus.io/scrape: 'true'
    spec:
      containers:
      - name: tomcat
        image: tomcat_prometheus:v1
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
        securityContext:
          privileged: true
[root@k8s-master1 tomcat_image]# kubectl apply -f tomcat-deploy.yaml
deployment.apps/tomcat-deploy created
[root@k8s-master1 tomcat_image]# kubectl get deployment tomcat-deploy -o wide
NAME            READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                 SELECTOR
tomcat-deploy   2/2     2            2           34s   tomcat       tomcat_prometheus:v1   app=tomcat
[root@k8s-master1 tomcat_image]# kubectl get pods -o wide -l app=tomcat
NAME                            READY   STATUS    RESTARTS   AGE    IP              NODE        NOMINATED NODE   READINESS GATES
tomcat-deploy-bd6d757c6-h5b6k   1/1     Running   0          110s   10.244.36.114   k8s-node1   <none>           <none>
tomcat-deploy-bd6d757c6-k6dwl   1/1     Running   0          110s   10.244.36.124   k8s-node1   <none>           <none>
3) Deploy the Tomcat Service
[root@k8s-master1 tomcat_image]# vim tomcat-service.yaml
[root@k8s-master1 tomcat_image]# cat tomcat-service.yaml
kind: Service # resource type: Service
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: tomcat-service
spec:
  selector:
    app: tomcat
  ports:
  - nodePort: 31360
    port: 80
    protocol: TCP
    targetPort: 8080
  type: NodePort
[root@k8s-master1 tomcat_image]# kubectl apply -f tomcat-service.yaml
service/tomcat-service created
[root@k8s-master1 tomcat_image]# kubectl get svc tomcat-service
NAME             TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
tomcat-service   NodePort   10.105.237.65   <none>        80:31360/TCP   18s
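4) View the monitoring data
In the Prometheus web UI, check the data scraped from the two Tomcat pods and from the Service:
2. Collecting Redis metrics with Prometheus
1) Configure a Redis exporter
The redis Pod contains two containers: the main Redis application and a redis_exporter container.
Because the Redis metrics endpoint is served by redis-exporter on port 9121, the Service carries the annotation prometheus.io/port: "9121" (next to prometheus.io/scrape: "true"). With these annotations, Prometheus discovers and scrapes Redis automatically, provided its configuration contains a service-discovery job that honors them.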
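Whether these annotations take effect depends on relabel rules in the Prometheus configuration, which this section does not show. A rule commonly used for prometheus.io/port joins __address__ and __meta_kubernetes_service_annotation_prometheus_io_port with ";", matches the result against ([^:]+)(?::\d+)?;(\d+) and writes $1:$2 back to __address__. Assuming your configuration contains such a rule, the bash sketch below (rewrite_scrape_address is a hypothetical name) reproduces its effect on the Redis endpoint address:

```bash
#!/usr/bin/env bash
# rewrite_scrape_address ADDRESS PORT_ANNOTATION
# Mimics the relabel rule described above. Relabel regexes are fully anchored;
# if the regex does not match (e.g. no annotation), the address is left as is.
rewrite_scrape_address() {
  local joined="$1;$2"
  # Same pattern as ([^:]+)(?::\d+)?;(\d+), written in bash ERE syntax.
  local re='^([^:]+)(:[0-9]+)?;([0-9]+)$'
  if [[ $joined =~ $re ]]; then
    echo "${BASH_REMATCH[1]}:${BASH_REMATCH[3]}"
  else
    echo "$1"
  fi
}

# Usage: the redis pod endpoint on 6379 is redirected to the exporter port.
rewrite_scrape_address 10.244.36.113:6379 9121   # prints 10.244.36.113:9121
```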
[root@k8s-master1 ~]# mkdir redis
[root@k8s-master1 ~]# cd redis/
[root@k8s-master1 redis]# vim redis.yaml
[root@k8s-master1 redis]# cat redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:latest
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6379
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 9121
---
kind: Service
apiVersion: v1
metadata:
  name: redis
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9121"
spec:
  selector:
    app: redis
  ports:
  - name: redis
    port: 6379
    targetPort: 6379
  - name: prom
    port: 9121
    targetPort: 9121
[root@k8s-master1 redis]# kubectl apply -f redis.yaml
deployment.apps/redis created
service/redis created
[root@k8s-master1 redis]# kubectl get pods -o wide -l app=redis
NAME                     READY   STATUS    RESTARTS   AGE   IP              NODE        NOMINATED NODE   READINESS GATES
redis-55c57445b4-8fd72   2/2     Running   0          24s   10.244.36.113   k8s-node1   <none>           <none>
[root@k8s-master1 redis]# kubectl get svc redis -o wide
NAME    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE   SELECTOR
redis   ClusterIP   10.109.195.22   <none>        6379/TCP,9121/TCP   49s   app=redis
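2) View the monitoring data
3) Import Redis dashboards into Grafana
Browse https://grafana.com/grafana/dashboards/?search=redis and download the JSON file of the dashboard you need.
3. Monitoring MySQL with Prometheus
1) Install MariaDB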
[root@k8s-master1 ~]# mkdir mysql
[root@k8s-master1 ~]# cd mysql/
[root@k8s-master1 mysql]# yum install mariadb mariadb-server -y
[root@k8s-master1 ~]# systemctl start mariadb
[root@k8s-master1 ~]# systemctl status mariadb
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2022-11-20 22:05:37 CST; 38s ago
  Process: 120748 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)
  Process: 120611 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)
 Main PID: 120747 (mysqld_safe)
   Memory: 100.8M
   CGroup: /system.slice/mariadb.service
           ├─120747 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
           └─120935 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mari...

Nov 20 22:05:35 k8s-master1 mariadb-prepare-db-dir[120611]: MySQL manual for more instructions.
Nov 20 22:05:35 k8s-master1 mariadb-prepare-db-dir[120611]: Please report any problems at http://mariadb.org/jira
Nov 20 22:05:35 k8s-master1 mariadb-prepare-db-dir[120611]: The latest information about MariaDB is available at http://mariadb.org/.
Nov 20 22:05:35 k8s-master1 mariadb-prepare-db-dir[120611]: You can find additional information about the MySQL part at:
Nov 20 22:05:35 k8s-master1 mariadb-prepare-db-dir[120611]: http://dev.mysql.com
Nov 20 22:05:35 k8s-master1 mariadb-prepare-db-dir[120611]: Consider joining MariaDB's strong and vibrant community:
Nov 20 22:05:35 k8s-master1 mariadb-prepare-db-dir[120611]: https://mariadb.org/get-involved/
Nov 20 22:05:35 k8s-master1 mysqld_safe[120747]: 221120 22:05:35 mysqld_safe Logging to '/var/log/mariadb/mariadb.log'.
Nov 20 22:05:35 k8s-master1 mysqld_safe[120747]: 221120 22:05:35 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Nov 20 22:05:37 k8s-master1 systemd[1]: Started MariaDB database server.
# Initialize (secure) the database
[root@k8s-master1 mysql]# mysql_secure_installation

NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB
SERVERS IN PRODUCTION USE! PLEASE READ EACH STEP CAREFULLY!

In order to log into MariaDB to secure it, we'll need the current
password for the root user. If you've just installed MariaDB, and
you haven't set the root password yet, the password will be blank,
so you should just press enter here.

Enter current password for root (enter for none):
OK, successfully used password, moving on...

Setting the root password ensures that nobody can log into the MariaDB
root user without the proper authorisation.

Set root password? [Y/n]
New password:
Re-enter new password:
Password updated successfully!
Reloading privilege tables..
 ... Success!

By default, a MariaDB installation has an anonymous user, allowing anyone
to log into MariaDB without having to have a user account created for
them. This is intended only for testing, and to make the installation
go a bit smoother. You should remove them before moving into a
production environment.

Remove anonymous users? [Y/n]
 ... Success!

Normally, root should only be allowed to connect from 'localhost'. This
ensures that someone cannot guess at the root password from the network.

Disallow root login remotely? [Y/n]
 ... Success!

By default, MariaDB comes with a database named 'test' that anyone can
access. This is also intended only for testing, and should be removed
before moving into a production environment.

Remove test database and access to it? [Y/n]
 - Dropping test database...
 ... Success!
 - Removing privileges on test database...
 ... Success!

Reloading the privilege tables will ensure that all changes made so far
will take effect immediately.

Reload privilege tables now? [Y/n]
 ... Success!

Cleaning up...

All done! If you've completed all of the above steps, your MariaDB
installation should now be secure.

Thanks for using MariaDB!
2) Install the mysqld_exporter application
# Upload mysqld_exporter-0.10.0.linux-amd64.tar.gz in advance
[root@k8s-master1 mysql]# ll
total 3320
-rw-r--r-- 1 root root 3397781 Nov 20 21:55 mysqld_exporter-0.10.0.linux-amd64.tar.gz
[root@k8s-master1 mysql]# tar -zxvf mysqld_exporter-0.10.0.linux-amd64.tar.gz
mysqld_exporter-0.10.0.linux-amd64/
mysqld_exporter-0.10.0.linux-amd64/LICENSE
mysqld_exporter-0.10.0.linux-amd64/NOTICE
mysqld_exporter-0.10.0.linux-amd64/mysqld_exporter
[root@k8s-master1 mysql]# cd mysqld_exporter-0.10.0.linux-amd64
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# ll
total 10192
-rw-rw-r-- 1 1000 1000    11325 Apr 25  2017 LICENSE
-rwxr-xr-x 1 1000 1000 10419174 Apr 25  2017 mysqld_exporter
-rw-rw-r-- 1 1000 1000       65 Apr 25  2017 NOTICE
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# cp -ar mysqld_exporter /usr/local/bin/
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# chmod +x /usr/local/bin/mysqld_exporter
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# which mysqld_exporter
/usr/local/bin/mysqld_exporter
3) Log in to MySQL, create an account for mysqld_exporter and grant it privileges
[root@k8s-master1 mysql]# mysql -u root -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 10
Server version: 5.5.68-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> CREATE USER 'mysql_exporter'@'localhost' IDENTIFIED BY 'Mysql@123';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysql_exporter'@'localhost';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> exit
Bye
4) Store the credentials so the exporter can connect without a password prompt
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# cat >my.cnf <<_EOF_
> [client]
> user=mysql_exporter
> password=Mysql@123
>
> _EOF_
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# cat my.cnf
[client]
user=mysql_exporter
password=Mysql@123
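The file above keeps the exporter's password in plain text. A sketch for creating it so that only its owner can read it (write_exporter_cnf is a hypothetical helper; the user name and password are the ones from this walkthrough):

```bash
#!/usr/bin/env bash
# write_exporter_cnf PATH USER PASSWORD
# Writes a [client] credentials file readable only by its owner (mode 600).
write_exporter_cnf() {
  local path=$1 user=$2 password=$3
  # umask 077 in a subshell: the file is never created world-readable,
  # and the caller's umask is left untouched.
  (
    umask 077
    printf '[client]\nuser=%s\npassword=%s\n' "$user" "$password" > "$path"
  )
  chmod 600 "$path"   # also covers a file that already existed with looser permissions
}

# Usage:
#   write_exporter_cnf ./my.cnf mysql_exporter 'Mysql@123'
```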
5) Start mysqld_exporter
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# nohup mysqld_exporter --config.my-cnf=./my.cnf &
[1] 410
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# nohup: ignoring input and appending output to ‘nohup.out’

[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# ss -lntup |grep 9104
tcp    LISTEN     0      128    :::9104    :::*    users:(("mysqld_exporter",pid=410,fd=3))
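To check the exporter from the command line rather than the browser, the helper below reads the text returned by /metrics and prints the value of one series. It is a hypothetical sketch that assumes the series has no labels; mysqld_exporter reports mysql_up, which is 1 when the exporter can connect to the database:

```bash
#!/usr/bin/env bash
# metric_value NAME: reads Prometheus text-format output on stdin and prints the
# value of the unlabeled series NAME; exit status 1 if it is absent.
metric_value() {
  # Comment lines start with "#", so their first field never equals NAME.
  awk -v name="$1" '$1 == name { print $2; found = 1 } END { exit !found }'
}

# Usage against the exporter started above:
#   curl -s http://localhost:9104/metrics | metric_value mysql_up
```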
6) Modify prometheus-alertmanager-cfg.yaml
[root@k8s-master1 mysqld_exporter-0.10.0.linux-amd64]# cd /root/prometheus/
[root@k8s-master1 prometheus]# vim prometheus-alertmanager-cfg.yaml
Add a job that monitors "mysql".
Apply the updated configuration:
[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-cfg.yaml
configmap "prometheus-config" deleted
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-cfg.yaml
configmap/prometheus-config created
[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-deploy.yaml
deployment.apps "prometheus-server" deleted
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-deploy.yaml
deployment.apps/prometheus-server created
[root@k8s-master1 prometheus]# kubectl get pods -n monitor-sa
NAME                                 READY   STATUS    RESTARTS   AGE
node-exporter-k4wsq                  1/1     Running   5          7d12h
node-exporter-x84r5                  1/1     Running   5          7d12h
node-exporter-zrwvh                  1/1     Running   5          7d12h
prometheus-server-646bf944c6-pbdbv   2/2     Running   0          95s
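7) View the monitoring data
8) Import a MySQL dashboard into Grafana
The "Buffer Pool Size of Total RAM" panel shows No data. Its expression, (mysql_global_variables_innodb_buffer_pool_size{instance="$host"} * 100) / on (instance) node_memory_MemTotal_bytes{instance="$host"}, combines two metrics whose default instance labels differ: for mysql_global_variables_innodb_buffer_pool_size the instance is the host name plus a port, whereas for node_memory_MemTotal_bytes it is the host name only. Because on (instance) then finds no matching pairs, the result is empty. A label therefore has to be added so that both metrics carry the same instance value.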
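The exact relabeling used here isn't reproduced in this section. One common approach, shown as an assumption, is a relabel rule in the mysql job that sets instance from __address__ with regex (.*):\d+ and replacement $1, so the port is dropped. The bash sketch below (instance_without_port is hypothetical, and k8s-master1:9104 an illustrative target) shows the value transformation such a rule performs:

```bash
#!/usr/bin/env bash
# instance_without_port VALUE: drop a trailing ":<port>", mirroring a relabel
# rule with regex (.*):\d+ and replacement $1.
instance_without_port() {
  local re='^(.*):[0-9]+$'
  if [[ $1 =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "$1"    # no port suffix: the regex does not match, value unchanged
  fi
}

# Usage: after the rewrite both sides of on (instance) carry the same value.
instance_without_port k8s-master1:9104   # prints k8s-master1
instance_without_port k8s-master1        # prints k8s-master1
```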
Update the configuration manifest again:
[root@k8s-master1 prometheus]# vim prometheus-alertmanager-cfg.yaml
[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-cfg.yaml
configmap "prometheus-config" deleted
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-cfg.yaml
configmap/prometheus-config created
[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-deploy.yaml
deployment.apps "prometheus-server" deleted
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-deploy.yaml
deployment.apps/prometheus-server created
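Edit the expression in the Grafana panel to the correct form so that the panel displays data normally.
4. Monitoring Nginx with Prometheus
Install nginx on node k8s-node1. The nginx vts module is a very useful monitoring module that gives a clear view of the server's current state.
Monitoring nginx relies mainly on the following three components:
(1) nginx-module-vts: Nginx virtual host traffic status module, an nginx monitoring module that can output its data in JSON format.
(2) nginx-vts-exporter: "Simple server that scrapes Nginx vts stats and exports them via HTTP for Prometheus consumption". It collects the nginx monitoring data and exposes a scrape endpoint for Prometheus, on port 9913 by default.
(3) Prometheus: scrapes the nginx data exposed by nginx-vts-exporter, stores it in its time-series database, and lets you query and aggregate it with PromQL.
1) Prepare the nginx-module-vts module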
[root@k8s-node1 ~]# mkdir nginx
[root@k8s-node1 ~]# cd nginx/
[root@k8s-node1 nginx]# ll
total 4596
-rw-r--r-- 1 root root 1026732 Nov 21 22:02 nginx-1.15.7.tar.gz
-rw-r--r-- 1 root root  407765 Nov 21 22:02 nginx-module-vts-master.zip
-rw-r--r-- 1 root root 3264895 Nov 21 22:02 nginx-vts-exporter-0.5.zip
[root@k8s-node1 nginx]# unzip nginx-module-vts-master.zip
[root@k8s-node1 nginx]# ll
total 4596
-rw-r--r-- 1 root root 1026732 Nov 21 22:02 nginx-1.15.7.tar.gz
drwxr-xr-x 6 root root     112 Jul 12  2018 nginx-module-vts-master
-rw-r--r-- 1 root root  407765 Nov 21 22:02 nginx-module-vts-master.zip
-rw-r--r-- 1 root root 3264895 Nov 21 22:02 nginx-vts-exporter-0.5.zip
[root@k8s-node1 nginx]# mv nginx-module-vts-master /usr/local/
2) Install nginx
(1) Install the dependencies
[root@k8s-node1 nginx]# yum -y install gcc gcc-c++ pcre pcre-devel zlib zlib-devel openssl openssl-devel
(2) Configure, compile and install
[root@k8s-node1 nginx]# tar -zxvf nginx-1.15.7.tar.gz
[root@k8s-node1 nginx]# cd nginx-1.15.7
[root@k8s-node1 nginx-1.15.7]# ./configure --prefix=/usr/local/nginx --with-http_gzip_static_module --with-http_stub_status_module --with-http_ssl_module --with-pcre --with-file-aio --with-http_realip_module --add-module=/usr/local/nginx-module-vts-master
........
# Check that the previous command succeeded (exit status 0)
[root@k8s-node1 nginx-1.15.7]# echo $?
0
[root@k8s-node1 nginx-1.15.7]# make && make install
[root@k8s-node1 nginx-1.15.7]# echo $?
0
3) Modify the nginx configuration file
[root@k8s-node1 nginx-1.15.7]# cp /usr/local/nginx/conf/nginx.conf /usr/local/nginx/conf/nginx.conf.bak
[root@k8s-node1 nginx-1.15.7]# vim /usr/local/nginx/conf/nginx.conf
# Add the following inside the server block:
location /status {
    vhost_traffic_status_display;
    vhost_traffic_status_display_format html;
}
# Add the following inside the http block:
vhost_traffic_status_zone;
Check that the modified configuration file is valid:
[root@k8s-node1 nginx-1.15.7]# /usr/local/nginx/sbin/nginx -t
nginx: the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
nginx: configuration file /usr/local/nginx/conf/nginx.conf test is successful
4) Start the nginx service
[root@k8s-node1 nginx-1.15.7]# /usr/local/nginx/sbin/nginx
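5) View the nginx monitoring data
Open http://10.0.0.132/status in a browser; the result is as follows:
The items on the status page are:
Server main: the main server
  Host: host name
  Version: nginx version
  Uptime: how long the server has been running
  Connections:
    active: current number of active client connections
    reading: connections on which nginx is currently reading the request header
    writing: connections on which nginx is currently writing the response back to the client
  Requests:
    accepted: total number of accepted client connections
    handled: total number of handled client connections
    Total: total number of requests
    Req/s: requests per second
  Shared memory:
    name: name of the shared memory zone specified in the configuration
    maxSize: maximum size of the shared memory zone specified in the configuration
    usedSize: current size of the shared memory in use
    usedNode: number of nodes currently in use in the shared memory
Server zones: per-zone (virtual host) statistics
  zone: the current zone
  Requests:
    Total: total number of requests
    Req/s: requests per second
    Time: request processing time
  Responses: number of responses by status code
    1xx, 2xx, 3xx, 4xx, 5xx: number of responses in each status-code class
    Total: total number of responses
  Traffic:
    Sent: traffic sent
    Rcvd: traffic received
    Sent/s: traffic sent per second
    Rcvd/s: traffic received per second
  Cache:
    Miss: number of cache misses
    Bypass: number of requests that bypassed the cache
    Expired: number of expired cache entries
    Stale: number of stale cache responses
    Updating: number of cache updates
    Revalidated: number of revalidated cache entries
    Hit: number of cache hits
    Scarce: number of requests that did not meet the caching requirements
    Total: total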
6) Install nginx-vts-exporter
[root@k8s-node1 nginx-1.15.7]# cd ..
[root@k8s-node1 nginx]# ll
total 4596
drwxr-xr-x 9 1001 1001     186 Nov 21 22:15 nginx-1.15.7
-rw-r--r-- 1 root root 1026732 Nov 21 22:02 nginx-1.15.7.tar.gz
-rw-r--r-- 1 root root  407765 Nov 21 22:02 nginx-module-vts-master.zip
-rw-r--r-- 1 root root 3264895 Nov 21 22:02 nginx-vts-exporter-0.5.zip
[root@k8s-node1 nginx]# unzip nginx-vts-exporter-0.5.zip
....
[root@k8s-node1 nginx]# mv nginx-vts-exporter-0.5 /usr/local/
[root@k8s-node1 nginx]# chmod +x /usr/local/nginx-vts-exporter-0.5/bin/nginx-vts-exporter
[root@k8s-node1 nginx]# cd /usr/local/nginx-vts-exporter-0.5/bin
[root@k8s-node1 bin]# nohup ./nginx-vts-exporter -nginx.scrape_uri http://10.0.0.132/status/format/json &
[1] 129188
[root@k8s-node1 bin]# nohup: ignoring input and appending output to ‘nohup.out’

[root@k8s-node1 bin]# ss -lntup |grep nginx-vts-expor
tcp    LISTEN     0      128    :::9913    :::*    users:(("nginx-vts-expor",pid=129188,fd=3))
7) Modify the prometheus-alertmanager-cfg.yaml configuration file
Add the following job:
[root@k8s-master1 prometheus]# vim prometheus-alertmanager-cfg.yaml
# Add the following job
- job_name: 'nginx'
  scrape_interval: 5s
  static_configs:
  - targets: ['10.0.0.132:9913']
Update the resource manifests:
[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-cfg.yaml
configmap "prometheus-config" deleted
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-cfg.yaml
configmap/prometheus-config created
[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-deploy.yaml
deployment.apps "prometheus-server" deleted
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-deploy.yaml
deployment.apps/prometheus-server created
[root@k8s-master1 prometheus]# kubectl get pods -n monitor-sa -o wide |grep prometheus
prometheus-server-646bf944c6-s5klp   2/2     Running   0          33s   10.244.169.188   k8s-node2   <none>           <none>
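8) View the monitoring data in the Prometheus web UI
9) Import an nginx dashboard into Grafana
III. The Prometheus Pushgateway Component
1. Introduction to Pushgateway
Pushgateway is a Prometheus component. By default the Prometheus server collects data by actively scraping exporters (the pull model); the Pushgateway instead receives data that is pushed to it. Users can write custom monitoring scripts that send the data to be monitored to the Pushgateway, and the Prometheus server then scrapes that data from the Pushgateway.
2. Pros and cons of Pushgateway
Pros: by default Prometheus pulls data from its targets on a schedule. If a target is on a different subnet or behind a firewall, Prometheus cannot reach it. In that case each target can push its data to the Pushgateway, and Prometheus scrapes the Pushgateway on its regular schedule.
When monitoring business data that must be aggregated from several sources, the aggregated data can be collected centrally by the Pushgateway and then scraped by Prometheus in one place.
Cons: 1) Prometheus's scrape status (for example the up metric) applies only to the Pushgateway, not to each individual node that pushes to it;
2) The Pushgateway is a single point of failure: if it has problems, all data collected through it is affected;
3) When a monitored target goes offline, Prometheus keeps scraping its last pushed data, so data that is no longer needed must be deleted from the Pushgateway manually.
3. Install Pushgateway
Install the Pushgateway on node k8s-node1: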
[root@k8s-node1 ~]# docker run -d --name pushgateway -p 9091:9091 prom/pushgateway:latest
7e1e72b18a1ec1e447bda4a5e0f5e28b44dd673b4e234a68b5f1c947f3501057
[root@k8s-node1 ~]# docker ps -a |grep pushgateway
7e1e72b18a1e   prom/pushgateway:latest   "/bin/pushgateway"   13 seconds ago   Up 10 seconds   0.0.0.0:9091->9091/tcp, :::9091->9091/tcp   pushgateway
4. The Pushgateway web UI
Open http://10.0.0.132:9091 in a browser and click the Status button; the following page is displayed:
5. Scraping the Pushgateway with Prometheus
Modify prometheus-alertmanager-cfg.yaml and add the following:
- job_name: 'pushgateway'
  scrape_interval: 5s
  static_configs:
  - targets: ['10.0.0.132:9091']
  honor_labels: true
Apply the updated configuration:
[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-cfg.yaml
configmap "prometheus-config" deleted
[root@k8s-master1 prometheus]# kubectl delete -f prometheus-alertmanager-deploy.yaml
deployment.apps "prometheus-server" deleted
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-cfg.yaml
configmap/prometheus-config created
[root@k8s-master1 prometheus]# kubectl apply -f prometheus-alertmanager-deploy.yaml
deployment.apps/prometheus-server created
[root@k8s-master1 prometheus]# kubectl get pods -n monitor-sa
NAME                                 READY   STATUS    RESTARTS   AGE
node-exporter-k4wsq                  1/1     Running   8          14d
node-exporter-x84r5                  1/1     Running   8          14d
node-exporter-zrwvh                  1/1     Running   9          14d
prometheus-server-646bf944c6-k8mcz   2/2     Running   0          33s
Check the pushgateway target in the Prometheus web UI:
6. Pushing data in the expected format to the Pushgateway
1) Push a single sample
Push a single sample to the group {job="test_job"}:
[root@k8s-master1 prometheus]# echo "metric 3.6" | curl --data-binary @- http://10.0.0.132:9091/metrics/job/test_job
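Here --data-binary sends the data exactly as given (with -d, curl would strip the newlines when reading from a file or stdin). Note that the data is sent with POST: on the Pushgateway, POST replaces only the metrics in the group that have the same names as the newly pushed ones, whereas PUT replaces the whole group.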
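Pushgateway URLs encode the grouping key as path segments: /metrics/job/<job> followed by optional <label>/<value> pairs, as in the examples in this section. A bash sketch that assembles such URLs (pgw_url is a hypothetical helper; label values containing "/" would need the Pushgateway's base64 encoding, which it does not implement):

```bash
#!/usr/bin/env bash
# pgw_url BASE JOB [LABEL VALUE]...
# Builds a Pushgateway grouping-key URL: BASE/metrics/job/JOB/LABEL/VALUE/...
pgw_url() {
  local base=$1 job=$2
  shift 2
  if (( $# % 2 != 0 )); then
    echo "labels must be given as name/value pairs" >&2
    return 1
  fi
  local url="${base%/}/metrics/job/${job}"
  while (( $# > 0 )); do
    url+="/$1/$2"
    shift 2
  done
  echo "$url"
}

# Usage: the single-sample push from above.
#   echo "metric 3.6" | curl --data-binary @- "$(pgw_url http://10.0.0.132:9091 test_job)"
```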
Check whether the data appears in the Pushgateway web UI:
View the pushed metric in the Prometheus web UI:
2) Push more complex data
[root@k8s-master1 prometheus]# cat <<EOF | curl --data-binary @- http://10.0.0.132:9091/metrics/job/test_job/instance/test_instance
> # TYPE node_memory_usage gauge
> node_memory_usage 36
> # TYPE node_memory_total gauge
> node_memory_total 36000
> EOF
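In the text exposition format a TYPE line reads "# TYPE <metric_name> <type>" and has to name the same metric as the samples that follow; a TYPE line for a different name does not apply to them, so they end up untyped. A small bash helper (gauge_payload is hypothetical) that always emits a matching pair and rejects invalid metric names:

```bash
#!/usr/bin/env bash
# gauge_payload NAME VALUE: prints one gauge in the Prometheus text format,
# with a "# TYPE" line whose name matches the sample line.
gauge_payload() {
  local re='^[a-zA-Z_:][a-zA-Z0-9_:]*$'   # valid metric name characters
  if [[ ! $1 =~ $re ]]; then
    echo "invalid metric name: $1" >&2
    return 1
  fi
  printf '# TYPE %s gauge\n%s %s\n' "$1" "$1" "$2"
}

# Usage: the same two gauges as above, pushed in one request.
#   { gauge_payload node_memory_usage 36; gauge_payload node_memory_total 36000; } |
#     curl --data-binary @- http://10.0.0.132:9091/metrics/job/test_job/instance/test_instance
```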
Check whether the data appears in the Pushgateway web UI:
View the pushed data in the Prometheus web UI:
3) Delete all data of one instance within a group
[root@k8s-master1 prometheus]# curl -X DELETE http://10.0.0.132:9091/metrics/job/test_job/instance/test_instance
Check in the Pushgateway web UI whether the data was deleted: the data with instance=test_instance is gone.
4) Delete all data in the group {job="test_job"}
[root@k8s-master1 prometheus]# curl -X DELETE http://10.0.0.132:9091/metrics/job/test_job
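Check in the Pushgateway web UI whether the data was deleted: all data is gone and the page is back to its initial state. (This DELETE removes only the group whose grouping key is exactly {job="test_job"}; groups with additional labels, such as the instance group above, have to be deleted separately.)
5) Report monitoring data to the Pushgateway with a script
Configure the reporting on the machine where the monitored service runs. To report the memory usage of 10.0.0.132 to the Pushgateway, perform the following steps on 10.0.0.132: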
[root@k8s-node1 ~]# mkdir monitor-data
[root@k8s-node1 ~]# cd monitor-data/
[root@k8s-node1 monitor-data]# vim push.sh
[root@k8s-node1 monitor-data]# cat push.sh
node_memory_usages=$(free -m | grep Mem | awk '{print $3/$2*100}')
job_name="memory"
instance_name="k8s-node1"
cat <<EOF | curl --data-binary @- http://10.0.0.132:9091/metrics/job/$job_name/instance/$instance_name
# TYPE node_memory_usages gauge
node_memory_usages $node_memory_usages
EOF
[root@k8s-node1 monitor-data]# sh push.sh
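The same push.sh logic can be split so that the parsing step is testable without a Pushgateway, and so that a failure in free or curl is reported instead of silently pushing an empty value. A sketch that keeps the script's used/total formula; the function names are hypothetical, and the URL, job and instance are the ones from push.sh:

```bash
#!/usr/bin/env bash
# mem_usage_percent: reads `free -m` output on stdin and prints used/total*100
# for the "Mem:" row with two decimals; exit status 1 if there is no such row.
mem_usage_percent() {
  awk '/^Mem:/ { printf "%.2f\n", $3 / $2 * 100; found = 1 } END { exit !found }'
}

# push_memory_usage: push.sh as a function that fails loudly.
push_memory_usage() {
  local usage
  usage=$(free -m | mem_usage_percent) || { echo "could not read memory usage" >&2; return 1; }
  # -f: return a nonzero status if the Pushgateway answers with an HTTP error.
  printf '# TYPE node_memory_usages gauge\nnode_memory_usages %s\n' "$usage" |
    curl -sf --data-binary @- "http://10.0.0.132:9091/metrics/job/memory/instance/k8s-node1"
}

# Usage (not executed here):
#   push_memory_usage || echo "push failed" >&2
```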
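Open the Pushgateway web UI; it shows the following:
Open the Prometheus web UI; the node_memory_usages metric is shown as follows:
To report the monitoring data periodically, set up a scheduled task (cron job).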
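A common way to run push.sh every minute is a crontab entry such as `* * * * * /bin/sh /root/monitor-data/push.sh`. The helper below (with_cron_entry is a hypothetical name) appends that entry to the existing crontab only if it is not already there, so re-running the setup does not create duplicates:

```bash
#!/usr/bin/env bash
# with_cron_entry ENTRY: reads a crontab on stdin and prints it with ENTRY
# appended, unless an identical line is already present.
with_cron_entry() {
  local entry=$1 current
  current=$(cat)
  [[ -n $current ]] && printf '%s\n' "$current"
  grep -qxF -- "$entry" <<<"$current" || printf '%s\n' "$entry"
}

# Usage (modifies root's crontab, so not executed here):
#   crontab -l 2>/dev/null |
#     with_cron_entry '* * * * * /bin/sh /root/monitor-data/push.sh' | crontab -
```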
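Note: as the configuration above shows, the data pushed to the Pushgateway carries its own job and instance labels, while the Prometheus scrape job for the Pushgateway also has job and instance labels, which identify the Pushgateway instance itself. Adding honor_labels: true prevents the job and instance of the pushgateway job_name in Prometheus's targets list from conflicting with the job and instance of the data pushed to the Pushgateway: the pushed labels are kept instead of being overwritten.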
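The conflict rule that honor_labels switches can be sketched as follows. resolve_label is a hypothetical helper that only prints the outcome for one conflicting label; it does not talk to Prometheus. With the default honor_labels: false, Prometheus keeps the target's value and renames the scraped one to exported_<label>:

```bash
#!/usr/bin/env bash
# resolve_label HONOR LABEL PUSHED_VALUE TARGET_VALUE
# Prints the labels a scraped series ends up with when the pushed data and the
# scrape target both define LABEL with different values.
resolve_label() {
  local honor=$1 label=$2 pushed=$3 target=$4
  if [[ $honor == true ]]; then
    # honor_labels: true -> the label from the scraped (pushed) data is kept.
    echo "${label}=${pushed}"
  else
    # honor_labels: false (default) -> the target label wins; the scraped
    # value is kept under exported_<label>.
    echo "exported_${label}=${pushed} ${label}=${target}"
  fi
}

# Usage with the push.sh data (job "memory") scraped by the "pushgateway" job:
resolve_label true job memory pushgateway    # prints job=memory
resolve_label false job memory pushgateway   # prints exported_job=memory job=pushgateway
```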