Grafana Study Notes (1): What is Prometheus?

Observability focuses on understanding the internal state of your systems based on the data they produce, which helps determine if your infrastructure is healthy. Prometheus is a core technology for monitoring and observability of systems, but the term “Prometheus” can be confusing because it is used in different contexts. Understanding Prometheus basics, why it’s valuable for system observability, and how users use it in practice will both help you better understand it and help you use Grafana.

Prometheus began in 2012 at SoundCloud because existing technologies were insufficient for their observability needs. Prometheus offers both a robust data model and a query language. Prometheus is also simple and scalable. In 2018, Prometheus graduated from Cloud Native Computing Foundation (CNCF) incubation, and today has a thriving community.

The following panel in a Grafana dashboard shows how much disk bandwidth on a Mac laptop is being used. The green line represents disk reads, and the yellow line represents writes.
Data like these form time series. The X-axis is a moment in time and the Y-axis is a number or measurement; for example, 5 megabytes per second. This type of time series data appears everywhere in systems monitoring, as well as in places such as seasonal temperature charts and stock prices. This data is simply some measurement (such as a company stock price or Disk I/O) through a series of time instants.

Prometheus is a technology that collects and stores time series data. Time series are fundamental to Prometheus; its data model is arranged into:

  • metrics that consist of a timestamp and a sample, which is the numeric value, such as how many disk bytes have been read or a stock price
  • a set of labels called dimensions, for example, job and device

You can store time series data in any relational database; however, these systems are not designed to store and query large volumes of time series data. Prometheus and similar software provide tools to compact and optimize time series data.
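
As a rough illustration of this data model, the following Python sketch represents one time series as a metric name, a set of labels, and a list of (timestamp, value) samples. The class name, its methods, the device value "disk0", and the sample numbers are all hypothetical; this is not how Prometheus stores data internally.

```python
from dataclasses import dataclass, field


@dataclass
class TimeSeries:
    """One series: the metric name plus its labels identify it; samples hold the data."""
    name: str
    labels: dict[str, str]
    samples: list[tuple[float, float]] = field(default_factory=list)

    def append(self, timestamp: float, value: float) -> None:
        # Each sample pairs a Unix timestamp (seconds) with a numeric value.
        self.samples.append((timestamp, value))

    def identity(self) -> str:
        # Render the series identity in the name{label="value"} form.
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(self.labels.items()))
        return f"{self.name}{{{pairs}}}"


series = TimeSeries(
    name="node_disk_written_bytes_total",
    labels={"job": "integrations/macos-node", "device": "disk0"},  # "disk0" is made up
)
series.append(1700000000.0, 1_048_576.0)  # made-up values for illustration
series.append(1700000015.0, 1_310_720.0)
print(series.identity())
```

Two series with the same metric name but different label values (for example, a different device) would be separate `TimeSeries` objects, which is what makes labels useful as dimensions.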

Simple dashboard using PromQL
The following Grafana dashboard image shows a Disk I/O graph of raw data from Prometheus derived from a laptop.

The Metrics browser field contains the following query:
node_disk_written_bytes_total{job="integrations/macos-node", device!=""}

In this example, the Y-axis shows the total number of bytes written, and the X-axis shows dates and times. As the laptop runs, the number of bytes written increases over time. Below Metrics browser is a counter that counts the number of bytes written over time.

The query is a simple example of PromQL, the Prometheus Query Language. The query identifies the metric of interest (node_disk_written_bytes_total) and provides matchers for two labels (job and device). The label matchers inside the braces filter the metrics: job="integrations/macos-node" reduces the scope to metrics coming from the macOS integration job, and device!="" specifies that the device label cannot be empty. The result of this query is the raw stream of numbers that the graph displays.
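
To make the filtering concrete, here is a minimal Python sketch of how equality (=) and inequality (!=) label matchers narrow down a set of series. The function `matches` and the three label sets are hypothetical; the sketch only mimics the selection logic and does not call Prometheus.

```python
def matches(labels: dict[str, str], equal: dict[str, str], not_equal: dict[str, str]) -> bool:
    """Return True if a label set passes every = and != matcher.

    As in PromQL, a label that is absent is treated as the empty string.
    """
    for key, wanted in equal.items():
        if labels.get(key, "") != wanted:
            return False
    for key, unwanted in not_equal.items():
        if labels.get(key, "") == unwanted:
            return False
    return True


# Hypothetical label sets for three series of the same metric.
series_labels = [
    {"job": "integrations/macos-node", "device": "disk0"},
    {"job": "integrations/macos-node"},  # no device label
    {"job": "integrations/linux-node", "device": "sda"},
]

selected = [
    labels
    for labels in series_labels
    if matches(labels, equal={"job": "integrations/macos-node"}, not_equal={"device": ""})
]
print(selected)
```

Only the first label set survives: the second has no device label, and the third comes from a different job.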

Although this view provides some insight into the performance of the system, it doesn’t provide the full story. A clearer picture of system performance requires understanding the rate of change, which shows how fast the amount of data written is growing. To properly monitor disk performance, you also need to see spikes in activity that illustrate if and when the system is under load, and whether disk performance is at risk. PromQL includes a rate() function that shows the per-second average rate of increase over a time window, such as 5m (5-minute) intervals. This view provides a much clearer picture of what’s happening with the system.
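
In PromQL, a rate is requested by wrapping a range selector in the function, for example rate(node_disk_written_bytes_total{job="integrations/macos-node", device!=""}[5m]). The Python sketch below shows the basic idea behind a per-second average increase over a window of counter samples. It is a simplification under stated assumptions: the name `simple_rate` and the sample values are hypothetical, and the real rate() function also extrapolates to the window boundaries, which this sketch does not.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second average increase of a counter across the given samples.

    samples: (unix_timestamp_seconds, counter_value) pairs in time order,
    already limited to the window of interest (for example, the last 5 minutes).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive amount of time")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter was reset (for example, after a restart),
        # so the new value counts as growth from zero.
        increase += (curr - prev) if curr >= prev else curr
    return increase / elapsed


# Made-up counter readings taken once a minute over five minutes,
# with one reset between the third and fourth reading.
window = [
    (0.0, 1000.0),
    (60.0, 4000.0),
    (120.0, 4600.0),
    (180.0, 100.0),
    (240.0, 700.0),
    (300.0, 1300.0),
]
print(simple_rate(window))  # bytes per second, averaged over the window
```

Because the result is an average over the whole window, a longer window smooths out short spikes, while a shorter window makes them more visible.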

A counter metric is just one type of metric; it is a number (such as total bytes written) that only increases. Prometheus supports several others, such as the metric type gauge, which can increase or decrease. 

The following gauge visualization displays the total RAM usage on a computer. 

The third metric type is called a histogram, which counts observations and organizes them into configurable groups. The following example displays floating-point numbers grouped into ranges that display how frequently each occurred.
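
Prometheus histogram buckets are cumulative: each bucket counts every observation less than or equal to its upper bound, and a final +Inf bucket counts all observations. The following Python sketch illustrates that grouping. The function name `observe_all`, the bucket bounds, and the observed values are hypothetical choices for illustration.

```python
import math


def observe_all(values: list[float], upper_bounds: list[float]) -> dict[float, int]:
    """Count observations into cumulative buckets keyed by upper bound.

    Each bucket counts every value less than or equal to its bound, and a
    final +Inf bucket counts everything.
    """
    bounds = sorted(upper_bounds) + [math.inf]
    return {bound: sum(1 for v in values if v <= bound) for bound in bounds}


# Hypothetical observations (for example, durations in seconds) and bucket bounds.
durations = [0.05, 0.2, 0.2, 0.7, 1.5, 3.0]
buckets = observe_all(durations, upper_bounds=[0.1, 0.5, 1.0, 2.0])
print(buckets)
```

The number of observations that fall inside a single range, such as between 0.1 and 0.5, is the difference between two neighboring cumulative counts.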

These core concepts of time series, metrics, labels, and aggregation functions are foundational to Grafana and observability.

Software and systems are a difficult business. Sometimes things go wrong. Observability helps you understand a system’s state so that issues can be quickly identified and proactively addressed. And when problems do occur, you can be alerted to them to diagnose and solve them within your Service Level Objectives (SLOs).
The three pillars of observability are metrics, logs, and traces. Prometheus supports the metrics pillar. When software on a computer runs slowly, observability can help you identify whether the CPU is saturated, the system is out of memory, or the disk is writing at maximum speed so you can proactively respond.

Prometheus isn’t just a data format; it is also considered an open source systems monitoring and alerting toolkit. That’s because Prometheus is software, not just data.

Prometheus can scrape metric data from software and infrastructure and store it. Scraping means that Prometheus software periodically revisits the same endpoint to check for new data. Prometheus scrapes data from a piece of software instrumented with a client library.

For example, a Node.js application can configure the prom-client library to expose metrics at an endpoint, and Prometheus can regularly scrape that endpoint. Prometheus includes a number of other tools within the toolkit to instrument your applications.

The first section of this document introduced the Prometheus as Data concept and how the Prometheus data model and metrics are organized. The second section introduced the concept of Prometheus as Software that is used to collect, process, and store metrics. This section describes how Prometheus as Data and Prometheus as Software come together.

Consider the following example. Suppose a ‘MyApp’ application uses a Prometheus client to expose metrics. One approach to collecting metrics data is to use a URL in the application that points to an endpoint http://localhost:3000/metrics that produces Prometheus metrics data.

The following image shows the two metrics associated with the endpoint. The HELP text explains what the metric means, and the TYPE text indicates what kind of metric it is (in this case, a gauge). MyAppnodejs_active_requests_total indicates the number of requests (in this case, 1). MyAppnodejs_heap_size_total_bytes indicates the heap size reported in bytes. There are only two numbers because this data shows the value at the moment the data was fetched.
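
The HELP and TYPE lines described above belong to the Prometheus text exposition format. As a hedged sketch of what such an endpoint returns, the Python function below renders label-free metrics in that format. The function name `render_metric`, the HELP wording, and the heap size value are illustrative assumptions, not output captured from 'MyApp'.

```python
def render_metric(name: str, help_text: str, metric_type: str, value: float) -> str:
    """Render one label-free metric in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {metric_type}\n"
        f"{name} {value}\n"
    )


# The HELP strings and the heap size below are illustrative, not real output.
page = render_metric(
    "MyAppnodejs_active_requests_total",
    "Total number of active requests.",
    "gauge",
    1,
) + render_metric(
    "MyAppnodejs_heap_size_total_bytes",
    "Process heap size from Node.js in bytes.",
    "gauge",
    27545600,
)
print(page)
```

Each metric contributes exactly one value line, which is why the page shows only two numbers: it is a snapshot, and the time series only appears once the page is scraped repeatedly.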

The ‘MyApp’ metrics are available in an HTTP endpoint, but how do they get to Grafana, and subsequently, into a dashboard? The process of recording and transmitting the readings of an application or piece of infrastructure is known as telemetry. Telemetry is critical to observability because it helps you understand exactly what’s going on in your infrastructure. The metrics introduced previously, for example, MyAppnodejs_active_requests_total, are telemetry data.

To get metrics into Grafana, you can use either the Prometheus software or Grafana Agent to scrape metrics. Grafana Agent collects and forwards the telemetry data to open-source deployments of the Grafana Stack, Grafana Cloud, or Grafana Enterprise, where your data can be analyzed. For example, you can configure Grafana Agent to pull the data from ‘MyApp’ every five seconds and send the results to Grafana Cloud.
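
The scraping step can be pictured as a loop that fetches the metrics page on a fixed interval and records each value with the time it was read. The Python sketch below is only an illustration of that idea and is not how Prometheus or Grafana Agent is implemented or configured. The names `parse_samples`, `scrape`, and `fake_fetch` are hypothetical, the parser handles only label-free lines, and the fetch function is a stand-in so the example runs without a server.

```python
import time
from typing import Callable


def parse_samples(text: str) -> dict[str, float]:
    """Parse label-free `name value` lines, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples


def scrape(
    fetch: Callable[[], str],
    rounds: int,
    interval_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> list[tuple[float, dict[str, float]]]:
    """Fetch the metrics page `rounds` times, pausing between fetches."""
    results = []
    for i in range(rounds):
        # Stamp each scrape with the time it was taken to build a time series.
        results.append((time.time(), parse_samples(fetch())))
        if i < rounds - 1:
            sleep(interval_seconds)
    return results


def fake_fetch() -> str:
    # Stand-in for an HTTP GET of http://localhost:3000/metrics.
    return (
        "# TYPE MyAppnodejs_active_requests_total gauge\n"
        "MyAppnodejs_active_requests_total 1\n"
    )


# Pass a no-op sleep so the example finishes immediately.
history = scrape(fake_fetch, rounds=3, interval_seconds=5, sleep=lambda seconds: None)
print(history[-1][1])
```

Against a running application, `fetch` could be replaced with a function that reads the endpoint, for example with `urllib.request.urlopen`, and the default `time.sleep` would provide the five-second pause.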

Metrics data is only one type of telemetry data; the other kinds are logs and traces. Using Grafana Agent can be a great option to send telemetry data because as you scale your observability practices to include logs and traces, which Grafana Agent also supports, you’ve got a solution already in place.

The following image illustrates how Grafana Agent works as an intermediary between ‘MyApp’ and Grafana Cloud.

Bringing it together
The combination of Prometheus and Grafana Agent gives you control over the metrics you want to report, where they come from, and where they’re going. Once the data is in Grafana, it can be stored in a Grafana Mimir database. Grafana dashboards consist of visualizations populated by data queried from the Prometheus data source. The PromQL query filters and aggregates the data to provide you with the insight you need. With those steps, we’ve gone from raw numbers, generated by software, into Prometheus, delivered to Grafana, queried by PromQL, and visualized by Grafana.

One great next step is to build a dashboard in Grafana and start turning that raw Prometheus telemetry data into insights about what’s going on with your services and infrastructure.
Another great step is to learn about Grafana Mimir, which is essentially a database for Prometheus data. If you’re wondering how to make this work for large volumes of metrics with a lot of data and fast querying, check out Grafana Mimir.
If you’re interested in working with Prometheus data in Grafana directly, check out the Prometheus data source documentation, or check out PromQL basics.

posted @ 钱塘江畔