Custom monitoring metrics with Prometheus on Kubernetes
prometheus-adapter
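Prometheus is not an aggregated API server for Kubernetes, so its PromQL interface cannot serve directly as a source of custom metrics. An intermediate layer is needed to convert the results of PromQL queries into metrics that conform to the format of the Kubernetes aggregated API. These custom metrics are then served through the custom.metrics.k8s.io or external.metrics.k8s.io API to clients such as HPA v2. The most widely used intermediate layer is the prometheus-adapter project hosted on GitHub; kube-metrics-adapter is another option.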
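To make the client side concrete, the following is a minimal sketch of how an HPA v2 object could consume a per-Pod custom metric served by the adapter. It only builds the manifest as JSON. The object name frontend-hpa, the Deployment name frontend-server, the namespace production, the replica bounds, and the target value 500m are all assumptions chosen for illustration, not values from a real cluster.

```python
import json


def build_hpa(name, namespace, deployment, metric_name, target_average,
              min_replicas=1, max_replicas=5):
    """Build an autoscaling/v2 HPA manifest that scales on a per-Pod custom metric."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                # "Pods" metrics are read from custom.metrics.k8s.io
                "type": "Pods",
                "pods": {
                    "metric": {"name": metric_name},
                    "target": {"type": "AverageValue",
                               "averageValue": target_average},
                },
            }],
        },
    }


# All names and numbers below are hypothetical.
hpa = build_hpa("frontend-hpa", "production", "frontend-server",
                "http_requests_per_second", "500m")
print(json.dumps(hpa, indent=2))
```

Because kubectl accepts JSON manifests, the printed output could be piped to kubectl apply -f - once the adapter actually serves the named metric.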
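Configuring the adapter
Each adapter rule handles a metric in four steps: discovery, association, naming, and querying.
Discovery
Discovery defines how the adapter finds, for the current rule, the metrics in Prometheus that should be exposed. seriesQuery specifies the query passed to Prometheus, and seriesFilters can narrow the set of metrics further. The condition below selects the http_requests_total metric for every Pod in every namespace. kubernetes_namespace and kubernetes_pod_name are labels on the Prometheus series that carry the namespace name and the Pod name. They are not built into the adapter; they are typically produced by the Prometheus relabeling configuration, so use whatever label names your series actually carry.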
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
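The selection performed by this seriesQuery can be sketched in a few lines of Python. This is only an illustration of the matching semantics, not the adapter's code: the series list is made up, and the discover function is a hypothetical helper.

```python
# Hypothetical series, as (metric name, labels) pairs, standing in for
# what Prometheus would return.
all_series = [
    ("http_requests_total",
     {"kubernetes_namespace": "production",
      "kubernetes_pod_name": "frontend-server-abcd-0123"}),
    ("http_requests_total", {"job": "blackbox"}),  # no Pod or namespace labels
    ("process_cpu_seconds_total",
     {"kubernetes_namespace": "production",
      "kubernetes_pod_name": "frontend-server-abcd-0123"}),
]


def discover(series, name, non_empty_labels):
    """Keep series with the given name whose listed labels are all non-empty."""
    selected = []
    for metric, labels in series:
        if metric != name:
            continue
        # In PromQL a missing label behaves like an empty one,
        # so label!="" requires the label to be present.
        if all(labels.get(label, "") != "" for label in non_empty_labels):
            selected.append((metric, labels))
    return selected


found = discover(all_series, "http_requests_total",
                 ["kubernetes_namespace", "kubernetes_pod_name"])
for metric, labels in found:
    print(metric, labels)
```

Only the first series survives: the second lacks the two labels and the third has a different metric name.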
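Association
Association defines which Kubernetes resources the discovered metrics can be attached to, that is, for which resources the metrics are exposed. It is configured in the resources field, which supports two forms. The template form uses a Go template to derive the label name from the resource, with Group standing for the API group and Resource for the resource type. The overrides form maps specific labels to Kubernetes resource types.
The example below maps the namespace-name label to the fixed resource type namespace (namespaces also works) and the Pod-name label to the resource type pod (pods also works). Both belong to the core group, so no group name needs to be given.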
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
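As a rough illustration of the two forms, the sketch below maps the labels of one series to Kubernetes objects through an overrides table, and derives a label name from a resource through a template. The functions associate and label_from_template are hypothetical helpers and the label values are invented; the adapter's real implementation also handles API groups and plural forms.

```python
# Same mapping as the overrides block: label name -> resource type.
overrides = {
    "kubernetes_namespace": "namespace",
    "kubernetes_pod_name": "pod",
}


def associate(labels, overrides):
    """Return {resource: object name} for every label mapped by an override."""
    return {overrides[label]: value
            for label, value in labels.items()
            if label in overrides}


def label_from_template(template, resource):
    """Derive the label name for a resource from a template such as "<<.Resource>>"."""
    return template.replace("<<.Resource>>", resource)


# Hypothetical labels of a single http_requests_total series.
labels = {
    "kubernetes_namespace": "production",
    "kubernetes_pod_name": "frontend-server-abcd-0123",
    "method": "GET",
}

objects = associate(labels, overrides)
print(objects)  # {'namespace': 'production', 'pod': 'frontend-server-abcd-0123'}
print(label_from_template("<<.Resource>>", "pod"))  # pod
```

Labels without an override, such as method here, are simply not associated with any resource.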
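Naming
Naming defines how a Prometheus metric name is converted into the custom metric name. It is configured in the name field: the nested matches field selects the metric names to convert (default ".*"), and the as field gives the resulting name, which may reference regular-expression capture groups such as $0 or ${0}. The example below rewrites the _total suffix of every matching metric name to _per_second.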
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
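The renaming can be tried out with Python's re module. This is a sketch under the assumption that only the ${N} form of group reference is used; Python writes the same reference as \g<N>, so the helper translates between the two. convert_name is a hypothetical function, not part of the adapter.

```python
import re


def convert_name(metric, matches=r"^(.*)_total", as_="${1}_per_second"):
    """Return the custom metric name, or None when the name does not match."""
    match = re.search(matches, metric)
    if match is None:
        return None
    # Translate ${N} group references into Python's \g<N> syntax.
    py_template = re.sub(r"\$\{(\d+)\}", r"\\g<\1>", as_)
    return match.expand(py_template)


print(convert_name("http_requests_total"))  # http_requests_per_second
print(convert_name("up"))                   # None
```

A name that does not match the expression is not exposed by the rule, which is why matches also acts as a filter.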
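Querying
Querying defines the actual PromQL statement sent to Prometheus. It is written in the metricsQuery field as a Go template, which is rendered with information about the target objects at execution time. In the template, Series refers to the metric name found by discovery; LabelMatchers refers to the list of label matchers, which by default match the resource and its namespace, so cluster-scoped resources get no namespace matcher; GroupBy refers to the list of labels to group by, which by default is the resource label. The statement below takes the two-minute rate of the selected series for the matching objects and sums it per group: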
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
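To see what the rendered PromQL might look like, the sketch below substitutes the three template values with plain string replacement. It is a simplification: the adapter uses Go's template engine and, when several objects are requested at once, emits regular-expression matchers rather than the equality matchers shown here. render_query and the object names are assumptions for illustration.

```python
def render_query(template, series, matchers, group_by):
    """Fill <<.Series>>, <<.LabelMatchers>> and <<.GroupBy>> in a metricsQuery template."""
    label_matchers = ",".join(f'{label}="{value}"'
                              for label, value in matchers.items())
    return (template
            .replace("<<.Series>>", series)
            .replace("<<.LabelMatchers>>", label_matchers)
            .replace("<<.GroupBy>>", ",".join(group_by)))


template = 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'

# Hypothetical request: one Pod in the production namespace.
query = render_query(
    template,
    "http_requests_total",
    {"kubernetes_namespace": "production",
     "kubernetes_pod_name": "frontend-server-abcd-0123"},
    ["kubernetes_pod_name"],
)
print(query)
```

Running it prints sum(rate(http_requests_total{kubernetes_namespace="production",kubernetes_pod_name="frontend-server-abcd-0123"}[2m])) by (kubernetes_pod_name), which can be pasted into the Prometheus expression browser to check that the rule would return data.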
Sample configuration
rules:
# Each rule represents some naming and discovery logic.
# Each rule is executed independently of the others, so
# take care to avoid overlap. As an optimization, rules
# with the same `seriesQuery` but different
# `name` or `seriesFilters` will use only one query to
# Prometheus for discovery.

# some of these rules are taken from the "default" configuration, which
# can be found in pkg/config/default.go

# this rule matches cumulative cAdvisor metrics measured in seconds
- seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
  resources:
    # skip specifying generic resource<->label mappings, and just
    # attach only pod and namespace resources by mapping label names to group-resources
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  # specify that the `container_` prefix and `_seconds_total` suffix should be removed.
  # this also introduces an implicit filter on metric family names
  name:
    # we use the value of the capture group implicitly as the API name
    # we could also explicitly write `as: "$1"`
    matches: "^container_(.*)_seconds_total$"
  # specify how to construct a query to fetch samples for a given series
  # This is a Go template where the `.Series` and `.LabelMatchers` string values
  # are available, and the delimiters are `<<` and `>>` to avoid conflicts with
  # the prometheus query language
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[2m])) by (<<.GroupBy>>)'

# this rule matches cumulative cAdvisor metrics not measured in seconds
- seriesQuery: '{__name__=~"^container_.*_total",container!="POD",namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  seriesFilters:
  # since this is a superset of the query above, we introduce an additional filter here
  - isNot: "^container_.*_seconds_total$"
  name: {matches: "^container_(.*)_total$"}
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[2m])) by (<<.GroupBy>>)'

# this rule matches cumulative non-cAdvisor metrics
- seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
  name: {matches: "^(.*)_total$"}
  resources:
    # specify a generic mapping between resources and labels. This
    # is a template, like the `metricsQuery` template, except with the `.Group`
    # and `.Resource` strings available. It will also be used to match labels,
    # so avoid using template functions which truncate the group or resource.
    # Group will be converted to a form acceptable for use as a label automatically.
    template: "<<.Resource>>"
    # if we wanted to, we could also specify overrides here
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[2m])) by (<<.GroupBy>>)'

# this rule matches only a single metric, explicitly naming it something else
# Its series query *must* return only a single metric family
- seriesQuery: 'cheddar{sharp="true"}'
  # this metric will appear as "cheesy_goodness" in the custom metrics API
  name: {as: "cheesy_goodness"}
  resources:
    overrides:
      # this should still resolve in our cluster
      brand: {group: "cheese.io", resource: "brand"}
  metricsQuery: 'count(cheddar{sharp="true"})'

# external rules are not tied to a Kubernetes resource and can reference any metric
# https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-metrics-not-related-to-kubernetes-objects
externalRules:
- seriesQuery: '{__name__="queue_consumer_lag",name!=""}'
  metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (name)
- seriesQuery: '{__name__="queue_depth",topic!=""}'
  metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (name)
  # Kubernetes metric queries include a namespace in the query by default
  # but you can explicitly disable namespaces if needed with "namespaced: false"
  # this is useful if you have an HPA with an external metric in namespace A
  # but want to query for metrics from namespace B
  resources:
    namespaced: false

# TODO: should we be able to map to a constant instance of a resource
# (e.g. `resources: {constant: [{resource: "namespace", name: "kube-system"}}]`)?
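The sample relies on seriesFilters to keep the second cAdvisor rule from overlapping with the first. The sketch below shows the effect of an isNot filter on a handful of metric names. apply_filters is a hypothetical helper that treats each filter as an unanchored regular-expression search, and the name list is only an example.

```python
import re


def apply_filters(names, series_filters):
    """Keep names that satisfy every `is` filter and no `isNot` filter."""
    kept = []
    for name in names:
        ok = True
        for flt in series_filters:
            if "is" in flt and not re.search(flt["is"], name):
                ok = False
            if "isNot" in flt and re.search(flt["isNot"], name):
                ok = False
        if ok:
            kept.append(name)
    return kept


# Example names that the second rule's seriesQuery would discover.
names = [
    "container_cpu_usage_seconds_total",
    "container_network_receive_bytes_total",
    "container_fs_reads_total",
]

second_rule = apply_filters(names, [{"isNot": "^container_.*_seconds_total$"}])
print(second_rule)
# ['container_network_receive_bytes_total', 'container_fs_reads_total']
```

The metric measured in seconds is left to the first rule, so each series is exposed under exactly one API name.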
Custom rules
prometheus-adapter-values.yaml
# The Prometheus address must match your actual environment
prometheus:
  url: http://prom-prometheus-server.monitoring.svc.cluster.local
  port: 80
  path: ""
replicas: 1
metricsRelistInterval: 1m
listenPort: 6443
service:
  annotations: {}
  port: 443
  type: ClusterIP
rules:
  default: true # whether to load the default rules
  custom:
  # - seriesQuery: '{__name__=~"^http_requests_.*",kubernetes_namespace!="",kubernetes_pod_name!=""}'
  #   resources:
  #     overrides:
  #       kubernetes_namespace: {resource: "namespace"}
  #       kubernetes_pod_name: {resource: "pod"}
  #   metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'
  - seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
    resources:
      overrides:
        kubernetes_namespace: {resource: "namespace"}
        kubernetes_pod_name: {resource: "pod"}
    name:
      matches: "^(.*)_total"
      as: "${1}_per_second"
    metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'
  existing:
  external: []
tls:
  enable: false
  ca: |-
    # Public CA file that signed the APIService
  key: |-
    # Private key of the APIService
  certificate: |-
    # Public key of the APIService
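Applying the custom metrics
The prometheus-community Helm repository provides a prometheus-adapter chart for deploying the adapter. Usually the only values that need customizing at install time are those related to the backend Prometheus service.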
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install prometheus-adapter -f prometheus-adapter-values.yaml prometheus-community/prometheus-adapter
Listing the available metrics
$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/http_requests_total",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": ["get"]
    },
    {
      "name": "namespaces/http_requests_total",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": ["get"]
    }
  ]
}
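On a real cluster this listing can be long, so it helps to group the entries by resource. The sketch below does that for a trimmed copy of the response shown above, embedded as a string; in practice the JSON would come from the kubectl command. metrics_by_resource is a hypothetical helper.

```python
import json
from collections import defaultdict

# Trimmed copy of the APIResourceList shown above.
raw = """
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {"name": "pods/http_requests_total", "namespaced": true,
     "kind": "MetricValueList", "verbs": ["get"]},
    {"name": "namespaces/http_requests_total", "namespaced": false,
     "kind": "MetricValueList", "verbs": ["get"]}
  ]
}
"""


def metrics_by_resource(doc):
    """Group entries named "<resource>/<metric>" into {resource: [metric, ...]}."""
    grouped = defaultdict(list)
    for entry in doc["resources"]:
        resource, metric = entry["name"].split("/", 1)
        grouped[resource].append(metric)
    return dict(grouped)


grouped = metrics_by_resource(json.loads(raw))
print(grouped)
# {'pods': ['http_requests_total'], 'namespaces': ['http_requests_total']}
```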
Querying a custom metric
$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "name": "frontend-server-abcd-0123",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests_per_second",
      "timestamp": "2018-08-07T17:45:22Z",
      "value": "16m"
    },
    {
      "describedObject": {
        "kind": "Pod",
        "name": "frontend-server-abcd-4567",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests_per_second",
      "timestamp": "2018-08-07T17:45:22Z",
      "value": "22m"
    }
  ]
}
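The values in this response are Kubernetes quantities, where the suffix m means thousandths, so "16m" is 0.016 requests per second. The sketch below converts a few decimal suffixes to plain numbers. parse_quantity is a hypothetical helper that covers only the suffixes listed in its table; binary suffixes such as Ki or Mi and exponent notation are deliberately left out.

```python
from decimal import Decimal

# Decimal suffixes handled by this sketch (a subset of the Kubernetes quantity format).
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(1000),
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
}


def parse_quantity(text):
    """Convert a quantity string such as "16m" or "2k" to a Decimal."""
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[:-len(suffix)]) * factor
    return Decimal(text)  # no suffix: a plain number


for value in ("16m", "22m", "2k", "5"):
    print(value, "->", parse_quantity(value))
```

The same notation applies to HPA targets, so a target of 500m means 0.5 requests per second per Pod.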
References
https://github.com/kubernetes-sigs/prometheus-adapter