Service Discovery
Service discovery:
Static configuration is enough for small environments; large environments should use service discovery
Official docs: https://prometheus.io/docs/prometheus/latest/configuration/configuration/
File-based service discovery:
Only one step above static configuration
Prometheus periodically reloads target information from files, which can be JSON or YAML; each file must contain a target list and may carry optional labels
The files can be generated by other systems, such as Puppet, Ansible, or SaltStack
Configuration:
1) Edit the main Prometheus configuration file
global:
scrape_interval: 15s
evaluation_interval: 15s
alerting:
alertmanagers:
- static_configs:
- targets:
rule_files:
scrape_configs:
- job_name: 'prometheus'
file_sd_configs:
- files:
- targets/prometheus-*.yml
refresh_interval: 2m
- job_name: 'nodes'
file_sd_configs:
- files:
- targets/nodes-*.yml
refresh_interval: 2m
2) Create the service discovery target files
mkdir targets
vim targets/prometheus-server.yml    # the name must match the prometheus-*.yml glob above
- targets:
- 2.2.2.43:9090
labels:
app: prometheus
job: prometheus
vim targets/nodes-linux.yml
- targets:
- 2.2.2.43:9100
- 2.2.2.53:9100
- 2.2.2.63:9100
labels:
app: node-exporter
job: node
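Since these target files are usually generated by another system, here is a minimal sketch of a generator. The inventory dict and the targets/nodes-generated.yml file name are hypothetical; the hosts and labels mirror the examples above.

```python
import json
import os

# Hypothetical inventory; in practice this would come from a CMDB,
# Ansible facts, etc. Hosts match the examples above.
inventory = {
    "node": {"app": "node-exporter",
             "hosts": ["2.2.2.43:9100", "2.2.2.53:9100", "2.2.2.63:9100"]},
}

def render_file_sd(inventory):
    # file_sd format: a list of target groups, each with a
    # "targets" list and an optional "labels" map
    return [{"targets": info["hosts"],
             "labels": {"app": info["app"], "job": job}}
            for job, info in inventory.items()]

os.makedirs("targets", exist_ok=True)
# JSON is a subset of YAML, so the generated file should still be
# readable under the targets/nodes-*.yml glob configured above
with open("targets/nodes-generated.yml", "w") as f:
    json.dump(render_file_sd(inventory), f, indent=2)
```

Prometheus picks up changes on the next refresh_interval, so no reload is needed after regenerating the file.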
Consul-based service discovery:
1) Start Consul
For production setups see the Consul chapter; the following just starts a single test instance
# Quick test startup
consul agent -server -ui \
-bootstrap-expect 1 \
-auto-reload-config \
-node node3 \
-bind '{{ GetInterfaceIP "eth0" }}' \
-advertise '{{ GetInterfaceIP "eth0" }}' \
-client 0.0.0.0 \
-retry-join 2.2.2.35 \
-rejoin \
-log-level info
2) Register services
cat > node_export.json <<eof
{
"services": [
{
"id": "node_exporter-node01",
"name": "node01",
"address": "2.2.2.43",
"port": 9100,
"tags": ["nodes"],
"checks": [{
"http": "http://2.2.2.43:9100/-/healthy",
"interval": "5s"
}]
},
{
"id": "node_exporter-node02",
"name": "node02",
"address": "2.2.2.53",
"port": 9100,
"tags": ["nodes"],
"checks": [{
"http": "http://2.2.2.53:9100/-/healthy",
"interval": "5s"
}]
},
{
"id": "node_exporter-node03",
"name": "node03",
"address": "2.2.2.63",
"port": 9100,
"tags": ["nodes"],
"checks": [{
"http": "http://2.2.2.63:9100/-/healthy",
"interval": "5s"
}]
}]
}
eof
cat > prom.json <<eof
{
"id": "exporter-01",
"name": "node01",
"address": "2.2.2.15",
"port": 9090,
"tags": ["nodes", "linux"],
"checks": [{
"http": "http://2.2.2.15:9090/-/healthy",
"interval": "5s"
}]
}
eof
# Register via the HTTP API. /v1/agent/service/register accepts a single
# service definition per request, so split the "services" array with jq
for i in 0 1 2; do
  jq ".services[$i]" node_export.json | curl -X PUT -d @- 2.2.2.35:8500/v1/agent/service/register
done
curl -X PUT -d @prom.json 2.2.2.35:8500/v1/agent/service/register
# Deregister a service (the re-registration below corrects its name)
curl -X PUT 2.2.2.35:8500/v1/agent/service/deregister/exporter-01
curl -X PUT -d '{
"id": "exporter-01",
"name": "exporter-node1",
"address": "2.2.2.15",
"port": 9090,
"tags": ["nodes", "linux"],
"checks": [{
"http": "http://2.2.2.15:9090/-/healthy",
"interval": "5s"
}]
}' 2.2.2.35:8500/v1/agent/service/register
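The registration calls above can also be scripted. A minimal Python sketch, assuming the same Consul address, node IPs, and /-/healthy check path used in the examples; the function names are hypothetical:

```python
import json
from urllib import request

CONSUL = "http://2.2.2.35:8500"  # Consul server from the examples above

def node_exporter_service(name, ip, port=9100, tags=("nodes",)):
    """Build one agent-API service definition (one service per request)."""
    return {
        "id": f"node_exporter-{name}",
        "name": name,
        "address": ip,
        "port": port,
        "tags": list(tags),
        "checks": [{"http": f"http://{ip}:{port}/-/healthy",
                    "interval": "5s"}],
    }

def register(service):
    """PUT the definition to /v1/agent/service/register (network call)."""
    req = request.Request(
        f"{CONSUL}/v1/agent/service/register",
        data=json.dumps(service).encode(),
        method="PUT",
    )
    return request.urlopen(req)

# Build definitions for the three nodes used throughout this section
services = [node_exporter_service(n, ip)
            for n, ip in [("node01", "2.2.2.43"),
                          ("node02", "2.2.2.53"),
                          ("node03", "2.2.2.63")]]
# for s in services: register(s)   # uncomment on a host that can reach Consul
```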
3) Edit the main Prometheus configuration file
global:
scrape_interval: 15s
evaluation_interval: 15s
alerting:
alertmanagers:
- static_configs:
- targets:
rule_files:
scrape_configs:
#scrape Consul's own metrics
- job_name: 'consul'
metrics_path: '/v1/agent/metrics'
params:
format:
- 'prometheus'
static_configs:
- targets:
- '2.2.2.35:8500'
- '2.2.2.45:8500'
#scrape targets tagged "nodes"
- job_name: 'nodes'
consul_sd_configs:
- server: "2.2.2.45:8500"
tags:
- "nodes"
refresh_interval: 2m
4) Start Prometheus
prometheus --config.file=/opt/prometheus/prometheus.yml --web.enable-lifecycle --web.enable-admin-api
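The tags filter in consul_sd_configs keeps only services that carry every listed tag. A rough illustration of that selection logic on sample catalog data (this is not Prometheus's actual implementation; the service names mirror the registrations above):

```python
# Sample data shaped like Consul's /v1/catalog/services response:
# service name -> list of tags
catalog = {
    "node01": ["nodes"],
    "node02": ["nodes"],
    "node03": ["nodes"],
    "consul": [],
    "exporter-node1": ["nodes", "linux"],
}

def select(catalog, required_tags):
    """Keep services that carry every required tag, as the consul_sd
    'tags' filter does."""
    return sorted(name for name, tags in catalog.items()
                  if all(t in tags for t in required_tags))

print(select(catalog, ["nodes"]))
# → ['exporter-node1', 'node01', 'node02', 'node03']
```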
Kubernetes API service discovery:
Note: this Kubernetes configuration still has problems; the author has not managed to get it working yet and will complete it when time allows
Resources in the API server (node, svc, pod, ep, ingress, etc.) can be discovered and monitored as targets
Each resource type has its own discovery mechanism, selected through Prometheus's role setting
node-exporter can also be deployed with a DaemonSet controller to monitor the nodes
By convention (enforced through relabel rules like the one below), a monitored resource must carry two annotations:
prometheus.io/port: "9153" #port used for Prometheus service discovery
prometheus.io/scrape: "true" #allow Prometheus to scrape this resource
node resource discovery:
The node role turns each cluster node into one target, listening on the kubelet's port
It checks the NodeInternalIP, NodeExternalIP, NodeLegacyHostIP and NodeHostName fields in order and uses the first address found as the target address (__address__)
Available meta labels:
__meta_kubernetes_node_name: #the node name
__meta_kubernetes_node_label_<labelname>: #each label on the node
__meta_kubernetes_node_labelpresent_<labelname>: #true for each label on the node
__meta_kubernetes_node_annotation_<annotationname>: #each annotation on the node
__meta_kubernetes_node_annotationpresent_<annotationname>: #true for each annotation on the node
__meta_kubernetes_node_address_<address_type>: #the first address for each node address type
pod resource discovery:
The pod role discovers every Pod in the cluster and exposes its containers as targets
Each port declared on the Pod is treated as one target;
for containers that declare no port, a port-free target is created so that a port can be added manually through relabeling
Some of the available metadata labels:
__meta_kubernetes_namespace: The namespace of the pod object.
__meta_kubernetes_pod_name: The name of the pod object.
__meta_kubernetes_pod_ip: The pod IP of the pod object.
__meta_kubernetes_pod_label_<labelname>: Each label from the pod object.
__meta_kubernetes_pod_annotation_<annotationname>: Each annotation from the pod object.
__meta_kubernetes_pod_ready: Set to true or false for the pod's ready state.
__meta_kubernetes_pod_phase: Set to Pending, Running, Succeeded, Failed or Unknown in the lifecycle.
__meta_kubernetes_pod_node_name: The name of the node the pod is scheduled onto.
__meta_kubernetes_pod_host_ip: The current host IP of the pod object.
__meta_kubernetes_pod_uid: The UID of the pod object.
#keep only resources that allow scraping, i.e. that carry the two annotations described above
relabel_configs:
- source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_scrape
regex: true
action: keep
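The keep action above drops every target whose source label does not fully match the regex (Prometheus anchors relabel regexes). A simplified illustration of that behavior, not Prometheus's actual relabel engine; the sample target label sets are hypothetical:

```python
import re

def keep(targets, source_label, regex):
    """Mimic relabel 'action: keep': drop targets whose source label
    value does not fully match the (anchored) regex."""
    pat = re.compile(regex)
    return [t for t in targets if pat.fullmatch(t.get(source_label, ""))]

targets = [
    {"__address__": "10.0.0.1:9153",
     "__meta_kubernetes_service_annotation_prometheus_io_scrape": "true"},
    {"__address__": "10.0.0.2:443",
     "__meta_kubernetes_service_annotation_prometheus_io_scrape": "false"},
    {"__address__": "10.0.0.3:80"},  # no annotation at all
]

kept = keep(targets,
            "__meta_kubernetes_service_annotation_prometheus_io_scrape",
            "true")
print([t["__address__"] for t in kept])  # → ['10.0.0.1:9153']
```

Only the service annotated with prometheus.io/scrape: "true" survives; unannotated services are dropped as well, since a missing label relabels as an empty string.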
1) Create a ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
name: prom-sa
namespace: monitor
---
apiVersion: v1
kind: Secret
metadata:
name: prom-secret
namespace: monitor
annotations:
kubernetes.io/service-account.name: prom-sa
type: kubernetes.io/service-account-token
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prom-cr
rules:
- apiGroups:
- ""
resources:
- nodes
- services
- namespaces
- endpoints
- pods
- nodes/proxy
verbs:
- get
- list
- watch
- apiGroups:
- extensions
resources:
- ingresses
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- configmaps
- nodes/metrics
verbs:
- get
- nonResourceURLs:
- /metrics
verbs:
- get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prom-crb
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prom-cr
subjects:
- kind: ServiceAccount
name: prom-sa
namespace: monitor
2) Get the token
token=`kubectl describe secrets -n monitor prom-secret |awk '/token:/{print $2}'`
echo $token >/opt/prometheus/k8s.token
3) Prometheus configuration