Prometheus

A Detailed Guide to Using Prometheus

Reference: https://blog.csdn.net/weixin_43865008/article/details/118946362

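How Prometheus monitoring works

1. Prometheus: although it is called a monitoring platform, it is essentially a time series database plus a scheduler that scrapes data at regular intervals.
2. mysql_exporter (the official project is named mysqld_exporter): a program that runs on the server being monitored and exposes MySQL metrics.
3. node_exporter: collects host performance data such as CPU, memory, disk, and network usage, and exposes it over HTTP. Prometheus scrapes that endpoint and stores the samples, much like writing rows into a database.
4. Prometheus focuses on storing and querying data. Its built-in UI is only a basic expression browser, so dashboards are handled by Grafana.
5. Grafana: displays the data and can re-query it on a fixed refresh interval.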
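
To make the pull model in the list above concrete, here is a minimal sketch of what an exporter does: it serves plain-text metrics on /metrics and a scraper fetches them over HTTP. This is not how node_exporter is implemented. The metric names (`demo_cpu_load`, `demo_memory_used_bytes`) and the helper names are made up for illustration, and only Python's standard library is used.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(samples):
    """Render {name: value} as one 'name value' line per metric."""
    return "".join(f"{name} {value}\n" for name, value in sorted(samples.items()))


class MetricsHandler(BaseHTTPRequestHandler):
    # Hypothetical sample values; a real exporter would read them from the host.
    samples = {"demo_cpu_load": 0.42, "demo_memory_used_bytes": 1048576}

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(self.samples).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def scrape(url):
    """Fetch a metrics page the way a scraper would (no proxy, 5 s timeout)."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.read().decode("utf-8")


def demo():
    """Start the toy exporter on a free port, scrape it once, then stop it."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return scrape(f"http://127.0.0.1:{server.server_port}/metrics")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the direction of the data flow: the exporter only answers requests, and the scraper decides when to collect.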

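Prometheus server

In this part you download, install, and run Prometheus. You also download and install an exporter, a tool that exposes time series data about hosts and services. The first exporter is Prometheus itself, which provides a variety of host-level metrics about memory usage, garbage collection, and more.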

Download

Official guide: https://prometheus.io/docs/introduction/first_steps/

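Configuration

The example configuration file has three main blocks: global, rule_files, and scrape_configs (the file shown here also carries an alerting block for Alertmanager).

The rule_files block specifies the location of any rules we want the Prometheus server to load. For now there are no rules. The last block, scrape_configs, controls which resources Prometheus monitors. Since Prometheus also exposes data about itself as an HTTP endpoint, it can scrape and monitor its own health. In the default configuration there is a single job, called prometheus, which scrapes the time series data exposed by the Prometheus server. The job contains a single, statically configured target: localhost on port 9090. Prometheus expects metrics to be available on targets at the path /metrics, so this default job scrapes the URL http://localhost:9090/metrics.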
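
As a small illustration of how a scrape URL is put together from a job's settings, the sketch below combines the scheme, the target, and the metrics path. The function name `scrape_url` is hypothetical; the defaults (`http` and `/metrics`) and the three targets are the ones used in this post's configuration.

```python
def scrape_url(target, scheme="http", metrics_path="/metrics"):
    """Build the URL that is scraped for one static target."""
    if not metrics_path.startswith("/"):
        metrics_path = "/" + metrics_path
    return f"{scheme}://{target}{metrics_path}"


if __name__ == "__main__":
    # Targets taken from the jobs configured in this post.
    for job, target in [("prometheus", "localhost:9090"),
                        ("node", "localhost:9091"),
                        ("gpu", "localhost:9445")]:
        print(job, scrape_url(target))
```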

Configuration file

# my global config
# The global block controls the Prometheus server's global configuration. Two options are set here.
global:
  scrape_interval: 15s # How often Prometheus scrapes targets (every 15 seconds here). Can be overridden per scrape job.
  evaluation_interval: 15s # How often Prometheus evaluates rules. Rules are used to create new time series and to generate alerts.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

  # node_exporter
  - job_name: "node"
    static_configs:
      - targets: ['localhost:9091']

  # GPU node
  - job_name: "gpu"
    static_configs:
      - targets: ['localhost:9445']

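To change how long data is kept (15 days by default), pass the --storage.tsdb.retention.time flag when starting Prometheus (older releases used --storage.tsdb.retention).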
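
For a rough sense of scale, the 15s scrape interval and the default 15-day retention can be combined into a samples-per-series estimate. The sketch below is illustrative only: `parse_duration` and `samples_per_series` are hypothetical helper names, and the parser handles just the simple unit suffixes (ms, s, m, h, d, w, y), not everything Prometheus itself accepts.

```python
import re

# Seconds per unit; a year is treated as 365 days.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600,
                 "d": 86400, "w": 604800, "y": 31536000}
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration(text):
    """Convert a duration such as '15s', '15d' or '1h30m' to seconds."""
    if not text:
        raise ValueError("empty duration")
    total = 0.0
    position = 0
    while position < len(text):
        match = _PART.match(text, position)
        if match is None:
            raise ValueError(f"cannot parse duration: {text!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    return total


def samples_per_series(retention, scrape_interval):
    """Estimate how many samples one series accumulates over the retention window."""
    return int(round(parse_duration(retention) / parse_duration(scrape_interval)))


if __name__ == "__main__":
    print(samples_per_series("15d", "15s"))  # 86400
```

With the values from this post, each series holds about 86,400 samples before old data is dropped.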

Run

./prometheus --config.file=prometheus.yml

Prometheus now serves its built-in web UI on port 9090. Later in this post the display is moved to Grafana.

Querying

promhttp_metric_handler_requests_total
promhttp_metric_handler_requests_total{code="200"}
count(promhttp_metric_handler_requests_total)
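
The same expressions can also be sent to the Prometheus HTTP API at /api/v1/query instead of being typed into the UI. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and `http://localhost:9090` is assumed to be the server started above.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, expr):
    """Build an instant-query URL for the /api/v1/query endpoint."""
    return (base_url.rstrip("/") + "/api/v1/query?"
            + urllib.parse.urlencode({"query": expr}))


def extract_values(payload):
    """Return [(labels, value)] from an instant-vector API response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    return [(item["metric"], float(item["value"][1]))
            for item in payload["data"]["result"]]


def instant_query(base_url, expr):
    """Run one query against a live server (needs Prometheus to be running)."""
    with urllib.request.urlopen(build_query_url(base_url, expr), timeout=5) as resp:
        return extract_values(json.load(resp))


if __name__ == "__main__":
    expr = 'promhttp_metric_handler_requests_total{code="200"}'
    print(build_query_url("http://localhost:9090", expr))
```

Calling `instant_query("http://localhost:9090", expr)` against a running server should return one entry per matching series.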

Query language expressions:

https://prometheus.io/docs/prometheus/latest/querying/basics/

Graph mode

rate(promhttp_metric_handler_requests_total{code="200"}[1m])
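
To show roughly what rate() computes, the sketch below derives a per-second increase from raw counter samples inside a window. It is a deliberate simplification: the real rate() also extrapolates to the window boundaries, which is left out here, and `simple_rate` is a hypothetical name.

```python
def simple_rate(samples):
    """Per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        # A drop means the counter was reset (for example after a restart),
        # so the new value is counted as growth from zero.
        increase += current - previous if current >= previous else current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


if __name__ == "__main__":
    # One sample every 15 s over a 1-minute window, growing by 30 each time.
    window = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]
    print(simple_rate(window))  # 2.0
```

Here the counter grows by 120 over 60 seconds, so the result is 2 requests per second.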

Monitoring other targets

node_exporter: monitoring a node

Purpose

Collects metrics from the server it runs on.

Install

nohup ./node_exporter --web.listen-address=":9091" & # the default port is 9100

Test

curl http://localhost:9091/metrics
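
The curl output is plain text with one sample per line, plus comment lines starting with #. As a rough sketch of how that text can be read programmatically, the parser below handles the common `name{labels} value` form only; it is not a complete implementation of the exposition format, and `parse_metrics` is a hypothetical name.

```python
import re

# name, optional {labels}, value, optional integer timestamp
_SAMPLE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\d+)?$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore anything this simple parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    page = (
        "# TYPE node_cpu_seconds_total counter\n"
        'node_cpu_seconds_total{cpu="0",mode="system"} 12.5\n'
        "node_load1 0.25\n"
    )
    for sample in parse_metrics(page):
        print(sample)
```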

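Connect to Prometheus

Edit the Prometheus configuration file

vim prometheus.yml
# add the node job
global:
  scrape_interval: 15s

scrape_configs:
  # add this job
  - job_name: "node"
    static_configs:
      - targets: ['localhost:9091']

Restart Prometheus, then open the web UI again and check that the new target shows up.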
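
Besides looking at the UI, target status can be read from the /api/v1/targets endpoint of the HTTP API. The sketch below summarizes that response per job. The helper names are hypothetical and `http://localhost:9090` is assumed to be the server from this post.

```python
import json
import urllib.request


def target_health(payload):
    """Map each job name to a list of (scrape URL, health) pairs."""
    summary = {}
    for target in payload["data"]["activeTargets"]:
        job = target["labels"].get("job", "")
        summary.setdefault(job, []).append((target["scrapeUrl"], target["health"]))
    return summary


def fetch_target_health(base_url="http://localhost:9090"):
    """Query a live server (needs Prometheus to be running)."""
    url = base_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return target_health(json.load(resp))


if __name__ == "__main__":
    # Hand-written sample shaped like the API response, for illustration.
    sample = {"status": "success", "data": {"activeTargets": [
        {"labels": {"job": "node", "instance": "localhost:9091"},
         "scrapeUrl": "http://localhost:9091/metrics", "health": "up"},
    ]}}
    print(target_health(sample))
```

A health value of "up" for the node job means the last scrape of localhost:9091 succeeded.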

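Exploring Node Exporter metrics through the Prometheus expression browser

Now that Prometheus is scraping metrics from a running Node Exporter instance, you can explore those metrics using the Prometheus UI (also known as the expression browser). Navigate to localhost:9090/graph in your browser and use the main expression bar at the top of the page to enter expressions.

Metrics specific to the Node Exporter are prefixed with node_ and include metrics such as node_cpu_seconds_total and node_exporter_build_info.

Some example metrics:

Metric	Meaning
`rate(node_cpu_seconds_total{mode="system"}[1m])`	The average amount of CPU time spent in system mode, per second, over the last minute (in seconds)
`node_filesystem_avail_bytes`	The filesystem space available to non-root users (in bytes)
`rate(node_network_receive_bytes_total[1m])`	The average network traffic received, per second, over the last minute (in bytes)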

GPU monitoring node

nvidia_gpu_prometheus_exporter

Install and build (requires a Go toolchain)

go get github.com/mindprince/nvidia_gpu_prometheus_exporter

If Go is not installed yet, install it first using one of the following:

sudo snap install go         # version 1.17.8, or
sudo apt  install golang-go
sudo apt  install gccgo-go

The built binary usually ends up in the ~/go/bin/ directory.

Run

nohup ./nvidia_gpu_prometheus_exporter &

Configuration

Add the following to the Prometheus configuration file:

  # GPU node
  - job_name: "gpu"
    static_configs:
      - targets: ['localhost:9445']

Grafana dashboards

Install

https://grafana.com/grafana/download

sudo apt-get install -y adduser libfontconfig1
wget https://dl.grafana.com/enterprise/release/grafana-enterprise_8.4.4_amd64.deb
sudo dpkg -i grafana-enterprise_8.4.4_amd64.deb

Output when the installation finishes:

Adding new user `grafana' (UID 111) with group `grafana' ...
Not creating home directory `/usr/share/grafana'.
### NOT starting on installation, please execute the following statements to configure grafana to start automatically using systemd
 sudo /bin/systemctl daemon-reload
 sudo /bin/systemctl enable grafana-server
### You can start grafana-server by executing
 sudo /bin/systemctl start grafana-server
Processing triggers for systemd (237-3ubuntu10.49) ...
Processing triggers for ureadahead (0.100.0-21) ...

Configuration

Configuration file: /etc/grafana/grafana.ini

vim /etc/grafana/grafana.ini

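Grafana listens on port 3000 by default. This can be changed with the http_port setting in the [server] section.

Run

Start Grafana with systemctl the first time.

Introduction to systemctl:

https://blog.csdn.net/skh2015java/article/details/94012643

Commands: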
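
As a quick way to see which port a given grafana.ini would use, the sketch below reads `http_port` from the `[server]` section and falls back to 3000 when the setting is absent or commented out. It is an illustrative helper (`grafana_http_port` is a hypothetical name) built on Python's configparser, not a tool shipped with Grafana.

```python
import configparser


def grafana_http_port(ini_text, default=3000):
    """Return the configured http_port, or the default when it is not set."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    # A dummy section header lets configparser accept keys that appear
    # before the first real section.
    parser.read_string("[__top__]\n" + ini_text)
    return parser.getint("server", "http_port", fallback=default)


if __name__ == "__main__":
    # Requires read access to the file on the machine where Grafana is installed.
    with open("/etc/grafana/grafana.ini", encoding="utf-8") as handle:
        print(grafana_http_port(handle.read()))
```

Lines starting with `;` are treated as comments, so a commented-out `;http_port = 3000` still yields the default.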

sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable grafana-server
sudo /bin/systemctl start grafana-server

Output:

Synchronizing state of grafana-server.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable grafana-server
Created symlink /etc/systemd/system/multi-user.target.wants/grafana-server.service → /usr/lib/systemd/system/grafana-server.service.

Check the port

Connect to Prometheus

Reference: https://prometheus.io/docs/visualization/grafana/#installing

Import a ready-made dashboard
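
One simple way to confirm that Grafana is listening is to try a TCP connection to its port. The following is a minimal sketch using the standard library; `is_port_open` is a hypothetical helper, and port 3000 is the default mentioned above.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("grafana listening:", is_port_open("127.0.0.1", 3000))
```

The same check works for the other ports used in this post, such as 9090, 9091, and 9445.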

Link: https://pan.baidu.com/s/1-QRttUDDly7XfZNc9dPfOQ
Extraction code: 0e2u

That completes the dashboard setup.

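Alerting

Alerting with Prometheus is separated into two parts. Alerting rules in the Prometheus server send alerts to an Alertmanager. The Alertmanager then manages those alerts, including silencing, inhibition, aggregation, and sending out notifications via methods such as email, on-call notification systems, and chat platforms.

The main steps to set up alerting and notifications are:

1. Set up and configure the Alertmanager
2. Configure Prometheus to talk to the Alertmanager
3. Create alerting rules in Prometheus

Alertmanager

https://prometheus.io/docs/alerting/latest/alertmanager/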
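
To give a feel for the aggregation step mentioned above, the sketch below groups alerts that share the same values for a chosen set of labels, so that each group could be sent as one notification. This is a simplified illustration, not Alertmanager's implementation; the alert dictionaries, label names, and `group_alerts` are assumptions made for the example.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Group alerts by the values of the labels listed in group_by."""
    groups = defaultdict(list)
    for alert in alerts:
        # Alerts missing a label are grouped under an empty string for it.
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical alerts for the two exporters used in this post.
    alerts = [
        {"labels": {"alertname": "InstanceDown", "job": "node"}},
        {"labels": {"alertname": "InstanceDown", "job": "gpu"}},
        {"labels": {"alertname": "HighLoad", "job": "node"}},
    ]
    for key, members in group_alerts(alerts, ["alertname"]).items():
        print(key, len(members))
```

Grouping by `alertname` here yields two groups, so two down instances produce one combined notification instead of two.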

Posted 2022-04-11 15:10 by 貌似大家
