Prometheus Basics
1. Introduction
1. An open-source system monitoring and alerting toolkit
Prometheus is an open-source systems monitoring and alerting toolkit
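Official site: https://prometheus.io/docs/introduction/overview/
2. Prometheus architecture diagram
In short, the Prometheus architecture mainly consists of: Prometheus server, exporters, Pushgateway, Alertmanager, and Grafana
The Prometheus server itself mainly consists of:
1. retrieval: scrapes metrics from the active targets;
2. TSDB: stores the scraped samples;
3. HTTP server: the module that serves queries;
3. Main features
1. A multi-dimensional data model: time series identified by a metric name and key/value pairs;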
a multi-dimensional data model with time series data identified by metric name and key/value pairs
2. PromQL, a flexible query language;
PromQL, a flexible query language to leverage this dimensionality
3. Runs on local storage alone, with no dependency on distributed storage;
no reliance on distributed storage; single server nodes are autonomous
4. Collects time series by pulling them over HTTP;
time series collection happens via a pull model over HTTP
5. Pushed time series are supported through the Pushgateway;
pushing time series is supported via an intermediary gateway
6. Targets are found through service discovery or static configuration;
targets are discovered via service discovery or static configuration
7. Multiple graphing and dashboarding options;
multiple modes of graphing and dashboarding support
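To make the pull model concrete, the following is a minimal sketch of a target that exposes one counter at /metrics in the text exposition format, plus a single HTTP pull of the kind a Prometheus server performs on each scrape. It uses only the Python standard library. The metric name demo_requests_total and the helper names render_metrics and scrape are hypothetical, chosen for illustration; a real exporter would normally be built on an official client library.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counter; a real target would count real events.
REQUESTS_TOTAL = 0
LOCK = threading.Lock()


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    return (
        "# HELP demo_requests_total Total scrapes served by this demo target.\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {REQUESTS_TOTAL}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global REQUESTS_TOTAL
        if self.path != "/metrics":
            self.send_error(404)
            return
        with LOCK:
            REQUESTS_TOTAL += 1
            body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def scrape(url: str) -> str:
    """Pull the metrics page once over HTTP."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Port 0 asks the OS for any free port, so the demo does not collide
    # with anything already listening.
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(scrape(f"http://127.0.0.1:{server.server_port}/metrics"))
    server.shutdown()
    server.server_close()
```

The target only answers requests; it never decides when data is sent. Scrape timing stays entirely on the server side, which is the point of the pull model.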
4. Basic concepts
4.1 Data model
# Everything is stored as time series; all values in one series share the same metric name and the same set of labels
Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions.
# So: metric name + labels identify a time series, and each sample in that series is a timestamp plus a value
# Notation
<metric name>{<label name>=<label value>, ...}
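As a rough illustration of the notation, the sketch below builds a series identifier from a metric name and a label set and uses it as the key for a list of (timestamp, value) samples. The function names series_id and escape_label_value and the sample numbers are hypothetical; sorting the label names is a choice made here so that the same label set always produces the same string.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def series_id(metric: str, labels: dict[str, str]) -> str:
    """Format <metric name>{<label name>="<label value>",...}."""
    if not labels:
        return metric
    # Sort label names so identical label sets map to identical strings.
    pairs = ",".join(
        f'{name}="{escape_label_value(value)}"'
        for name, value in sorted(labels.items())
    )
    return f"{metric}{{{pairs}}}"


if __name__ == "__main__":
    sid = series_id(
        "api_http_requests_total", {"method": "POST", "handler": "/messages"}
    )
    # One series holds many samples; each sample is (timestamp, value).
    # The numbers are made up for illustration.
    samples = {sid: [(1700000000.0, 1027.0), (1700000015.0, 1031.0)]}
    for timestamp, value in samples[sid]:
        print(sid, value, timestamp)
```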
4.2 Metric types
# counter
# A cumulative value that can only increase or be reset to zero; commonly used for totals such as requests served or tasks completed;
A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.
# gauge
# A value that can go up or down; commonly used for temperature, memory, or CPU usage;
A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.
# histogram
# Samples observations and counts them in configurable buckets
A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets.
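# summary
# Similar to a histogram: it also provides a total count and a sum of the observations, and additionally calculates configurable quantiles over a sliding time window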
Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.
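The behaviour of the first three types can be sketched in a few lines of plain Python. This is a toy model for building intuition, not the API of any Prometheus client library; the class and method names are assumptions. The summary type is left out because its sliding-window quantiles need more machinery than fits in a short example.

```python
import bisect


class Counter:
    """Cumulative value: it may only go up (or restart from zero)."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount: float = 1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Single value that can go up and down arbitrarily."""

    def __init__(self):
        self.value = 0.0

    def set(self, value: float):
        self.value = value

    def inc(self, amount: float = 1.0):
        self.value += amount

    def dec(self, amount: float = 1.0):
        self.value -= amount


class Histogram:
    """Counts observations in buckets defined by upper bounds."""

    def __init__(self, upper_bounds):
        self.bounds = sorted(upper_bounds)
        # One extra slot at the end acts as the +Inf bucket.
        self.counts = [0] * (len(self.bounds) + 1)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float):
        # bisect_left finds the first bound that is >= value, so an
        # observation equal to a bound lands in that bound's bucket.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1

    def cumulative(self):
        """Return (upper bound, cumulative count) pairs, ending with +Inf."""
        total, result = 0, []
        for bound, n in zip(self.bounds + [float("inf")], self.counts):
            total += n
            result.append((bound, total))
        return result


if __name__ == "__main__":
    latency = Histogram([0.1, 0.5, 1.0])
    for seconds in (0.05, 0.5, 0.7, 2.0):
        latency.observe(seconds)
    print(latency.cumulative(), latency.sum, latency.count)
```

Reporting the buckets cumulatively mirrors how histogram buckets are exposed: each bucket counts every observation less than or equal to its upper bound.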
4.3 Jobs and instances
# instance
# In Prometheus, any endpoint that can be scraped is called an instance.
# A collection of instances that serve the same purpose is called a job.
In Prometheus terms, an endpoint you can scrape is called an instance, usually corresponding to a single process. A collection of instances with the same purpose, a process replicated for scalability or reliability for example, is called a job.
For example, an API server job with four replicated instances:
job: api-server
instance 1: 1.2.3.4:5670
instance 2: 1.2.3.4:5671
instance 3: 5.6.7.8:5670
instance 4: 5.6.7.8:5671
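The example above can be written down as data. The sketch below groups the four instances under their job and shows the job and instance labels that Prometheus attaches to every series scraped from a target. The names TARGETS, group_by_job and target_labels are hypothetical.

```python
from collections import defaultdict

# The example from the text: one job with four replicated instances.
TARGETS = [
    ("api-server", "1.2.3.4:5670"),
    ("api-server", "1.2.3.4:5671"),
    ("api-server", "5.6.7.8:5670"),
    ("api-server", "5.6.7.8:5671"),
]


def group_by_job(targets):
    """Map each job name to the list of its instances (host:port)."""
    jobs = defaultdict(list)
    for job, instance in targets:
        jobs[job].append(instance)
    return dict(jobs)


def target_labels(job: str, instance: str) -> dict:
    """Labels added automatically to every series scraped from a target."""
    return {"job": job, "instance": instance}


if __name__ == "__main__":
    for job, instances in group_by_job(TARGETS).items():
        for instance in instances:
            print(target_labels(job, instance))
```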
5. The Prometheus configuration file
For the configuration file format, see the official documentation: https://prometheus.io/docs/prometheus/latest/configuration/configuration/
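As a rough idea of what a minimal configuration looks like, the sketch below renders a small YAML document with a global scrape_interval and one statically configured job per entry. The field names global, scrape_interval, scrape_configs, job_name, static_configs and targets come from the official configuration format; the function render_config, the 15s interval and the target addresses are assumptions for illustration. The rendering is deliberately naive and assumes job names that need no YAML quoting.

```python
def render_config(scrape_interval: str, jobs: dict) -> str:
    """Render a minimal prometheus.yml with static targets per job."""
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "scrape_configs:",
    ]
    for job_name, targets in jobs.items():
        # Quote each host:port so YAML always reads it as a string.
        quoted = ", ".join(f"'{target}'" for target in targets)
        lines.append(f"  - job_name: {job_name}")
        lines.append("    static_configs:")
        lines.append(f"      - targets: [{quoted}]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_config("15s", {"api-server": ["1.2.3.4:5670", "1.2.3.4:5671"]}))
```

Generated text like this should be checked against the official documentation before use, since real deployments usually need more than static targets.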