exporters
exporter:
metric exposer for prometheus; runs on each monitored node, collects data, and waits for the prometheus server to pull it
official exporters: https://prometheus.io/download/
other exporters: https://prometheus.io/docs/instrumenting/exporters/
black-box monitoring: probe service state from the outside over http, tcp, etc., so failures are handled as soon as they occur
white-box monitoring: inspect metrics from inside the service, so potential problems can be anticipated
blackbox_exporter:
official black-box exporter
collects data from monitored targets over http, https, dns, tcp, icmp and grpc
- http api availability checks
- tcp port listening checks
- icmp host liveness checks
- dns name resolution
download: https://prometheus.io/download/#blackbox_exporter
port: 9115
install:
1) download
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.24.0/blackbox_exporter-0.24.0.linux-amd64.tar.gz
tar xf blackbox_exporter-0.24.0.linux-amd64.tar.gz
mv blackbox_exporter-0.24.0.linux-amd64/blackbox_exporter /bin/
ln -s $(pwd)/blackbox_exporter-0.24.0.linux-amd64 /opt/blackbox_exporter
2) configure the service
cat > /etc/systemd/system/blackbox_exporter.service <<-eof
[Unit]
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/bin/blackbox_exporter \
--config.file=/opt/blackbox_exporter/blackbox.yml \
--web.listen-address=:9115
ExecReload=/bin/kill -1 \$MAINPID
TimeoutStopSec=10s
Restart=on-failure
[Install]
WantedBy=multi-user.target
eof
systemctl daemon-reload
systemctl enable --now blackbox_exporter
config syntax:
modules:
http_2xx: #module (rule) name
prober: http #probe protocol
http:
preferred_ip_protocol: "ip4" #force IPv4; by default the HTTP probe prefers IPv6
method: POST #request method
valid_http_versions: #accepted http protocol versions
- "HTTP/1.1"
valid_status_codes: [] #accepted response status codes; defaults to 2xx
headers:
Content-Type: application/json
body: '{}'
basic_auth:
username: "username"
password: "mysecret"
tls_config:
insecure_skip_verify: false
ca_file: "/certs/my_cert.crt"
cert_file:
key_file:
server_name:
min_version: <version> #minimum tls protocol version: TLS10 (TLS 1.0), TLS11 (TLS 1.1), TLS12 (TLS 1.2), TLS13 (TLS 1.3); default TLS 1.2
body_size_limit: 100M #byte-size limit on the response body; ignored when a Content-Length response header is present; default 0 = unlimited
follow_redirects: true #follow redirects; default true
fail_if_ssl: true #if true, the probe fails when the target uses ssl
fail_if_not_ssl: true #if true, the probe fails when the target does not use ssl
fail_if_ssl_expired: true #the probe fails when the ssl certificate has expired
fail_if_body_matches_regexp: <regexp> #the probe fails if the response body matches
fail_if_body_not_matches_regexp: <regexp> #the probe fails if the response body does not match
fail_if_header_matches:
fail_if_header_not_matches:
proxy_url: <proxy url>
no_proxy: <string> #hosts excluded from proxying
proxy_from_environment: false #take proxy settings from environment variables; default false
enable_http2: true #enable http2; default true
ip_protocol_fallback: true #protocol fallback: if the protocol set in preferred_ip_protocol is unavailable, use the other one; false = use only the one protocol; default true
body: <body> #request body given inline
body_file: <file> #request body read from a file
http_header_match_spec: #shape of the entries taken by fail_if_header_matches / fail_if_header_not_matches above
- header: <header name>
regexp: <pattern>
tcp_connect:
prober: tcp
tcp:
source_ip_address: <source ip>
preferred_ip_protocol
ip_protocol_fallback
query_response: #send data over the tcp connection and match the responses; the probe fails on a mismatch
- send: <payload 1>
- send: <payload 2>
- expect: <regexp> #after sending payloads 1 and 2, match the response; on a match, send payload 3
send: <payload 3>
- expect: <regexp> #match the response to payload 3
starttls: false #upgrade the connection to tls; default false
tls: true
tls_config:
dns_test:
preferred_ip_protocol
ip_protocol_fallback
source_ip_address
transport_protocol: <protocol> #transport protocol for the DNS query: udp or tcp; default udp
dns_over_tls: false #enable tls; only valid with tcp; default false
tls_config:
query_name: <domain> #domain to query; usually not set here but passed in dynamically from the prometheus config
query_type: <type> #query type: "A", "AAAA", "CNAME", "MX", etc.; default A
query_class: "IN" #query class; IN = Internet (the default); CH is used for server version/status queries, HS for MIT's Hesiod system
recursion_desired: true #recursive query; default true
validate_answer_rrs: #validation rules for DNS answer resource records
fail_if_matches_regexp
fail_if_all_match_regexp
fail_if_not_matches_regexp
fail_if_none_matches_regexp
validate_authority_rrs: #validation rules for DNS authority resource records; same options as above
validate_additional_rrs: #validation rules for DNS additional resource records; same options as above
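Putting the dns options above together, a sketch of a complete dns module (the domain is a made-up placeholder; only options listed above are used):

```yaml
modules:
  dns_www_a:
    prober: dns
    dns:
      query_name: "www.example.com"   # placeholder target domain
      query_type: "A"                 # look up an A record
      recursion_desired: true
```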
grpc_plain:
prober: grpc
grpc:
service: "service1" #name of the service whose health is queried
preferred_ip_protocol
ip_protocol_fallback
tls: false
tls_config:
icmp_test:
preferred_ip_protocol
ip_protocol_fallback
source_ip_address
dont_fragment: #set the DF bit in the ip header; ipv4 only; needs a raw socket (root or CAP_NET_RAW on Linux)
payload_size: <size>
ttl: <ttl> #ttl value, 0-255; useful for detecting changes in the network path
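Combining the syntax above, a minimal working blackbox.yml sketch with one http and one icmp module (module names are arbitrary):

```yaml
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      preferred_ip_protocol: "ip4"   # probe over IPv4
  icmp:
    prober: icmp
    timeout: 5s
```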
prometheus-side config:
scrape_configs:
- job_name: "prometheus"
static_configs:
- targets: ["localhost:9090"]
- job_name: blackbox_all
metrics_path: /probe
params:
module: [http_2xx]
static_configs:
- targets:
- 'http://www.qq.com'
labels:
group: web
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: url
- target_label: __address__
replacement: 127.0.0.1:9115
- source_labels: [__meta_dns_name]
target_label: __param_hostname
- source_labels: [__meta_dns_name]
target_label: vhost
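The relabeling above rewrites each static target into a /probe request against the exporter; the resulting scrape URL can be emulated locally (values taken from the job above):

```shell
address='http://www.qq.com'   # original __address__ (the probe target)
module='http_2xx'             # params.module
exporter='127.0.0.1:9115'     # replacement __address__ (blackbox_exporter)
url="http://${exporter}/probe?module=${module}&target=${address}"
echo "$url"
```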
example modules:
ssh_banner:
prober: tcp
tcp:
query_response:
- expect: "^SSH-2.0-"
- send: "SSH-2.0-blackbox-ssh-check"
irc_banner:
prober: tcp
tcp:
query_response:
- send: "NICK prober"
- send: "USER prober prober prober :prober"
- expect: "PING :([^ ]+)"
send: "PONG ${1}"
- expect: "^:[^ ]+ 001"
icmp_ttl5:
prober: icmp
timeout: 5s
icmp:
ttl: 5
node_exporter:
collects host (physical machine) metrics
download: https://github.com/prometheus/node_exporter
port: 9100
grafana dashboard: 8919
install:
binary install:
1) download
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xf node_exporter-1.3.1.linux-amd64.tar.gz
cd node_exporter-1.3.1.linux-amd64/
mv node_exporter /bin
ln -s `pwd` /opt/node_exporter
2) write the service file
cat > /etc/systemd/system/node_exporter.service <<-eof
[Unit]
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/bin/node_exporter \
--web.listen-address=":9100" \
--collector.filesystem.mount-points-exclude=/boot \
--collector.filesystem.fs-types-exclude='devtmpfs|overlay' \
--collector.ntp \
--collector.mountstats \
--collector.systemd \
--collector.tcpstat \
--collector.ethtool \
--collector.logind \
--collector.supervisord \
--collector.zoneinfo \
--collector.processes
ExecReload=/bin/kill -HUP \$MAINPID
TimeoutStopSec=10s
Restart=on-failure
[Install]
WantedBy=multi-user.target
eof
systemctl daemon-reload
systemctl enable --now node_exporter
3) configure prometheus to scrape
vim prometheus.yml
...
- job_name: 'nodes'
static_configs:
- targets:
- 2.2.2.43:9100
- 2.2.2.53:9100
- 2.2.2.63:9100
systemctl restart prometheus
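Once prometheus restarts, a quick PromQL check in its UI confirms the targets are being scraped; `job="nodes"` matches the job name configured above:

```promql
# targets of the nodes job that are currently unreachable (empty result = all up)
up{job="nodes"} == 0
```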
k8s deployment:
cat > node-exporter-ds.yml <<eof
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-exporter
namespace: monitor
spec:
selector:
matchLabels:
app: node-exporter
template:
metadata:
labels:
app: node-exporter
spec:
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
#to expose the exporter directly on the host's ip and port, enable hostNetwork, but make sure port 9100 is free on the host
#hostNetwork: true
hostPID: true
containers:
- name: node-exporter
image: prom/node-exporter
args:
- --path.sysfs=/host/sys
- --path.procfs=/host/proc
- --collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($|/)
#when using the host network directly, set the listen port by hand to avoid clashing with ports already used on the host
- --web.listen-address=":9100"
ports:
- name: metrics
containerPort: 9100
volumeMounts:
- name: proc
mountPath: /host/proc
- name: sys
mountPath: /host/sys
volumes:
- name: proc
hostPath:
path: /proc
- name: sys
hostPath:
path: /sys
eof
common metrics:
node_memory_MemTotal_bytes
node_memory_MemFree_bytes
node_memory_Buffers_bytes
node_memory_Cached_bytes
node_cpu_seconds_total
cpu saturation:
tracking the load average gives a view of a host's cpu saturation
the load average is the average run-queue length over a period, interpreted relative to the number of cpus
a load average below the cpu count is normal; staying above it for a long time means the cpus are saturated
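That rule can be sketched in PromQL using the metrics above plus the standard `node_load5` gauge (not listed in this doc, but exported by node_exporter by default):

```promql
# 5m load average per host divided by its cpu count; sustained values > 1 indicate saturation
node_load5 / count without (cpu, mode) (node_cpu_seconds_total{mode="idle"})
```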
memory:
available: free + buffers + cache
used: total - available
usage: used / total
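A worked example of those formulas with made-up numbers (pure shell arithmetic, no exporter needed):

```shell
# hypothetical values in bytes: 8 GiB total, 2 GiB free, 0.5 GiB buffers, 1.5 GiB cached
total=$((8*1024*1024*1024))
free=$((2*1024*1024*1024))
buffers=$((512*1024*1024))
cached=$((1536*1024*1024))
avail=$((free + buffers + cached))   # available = free + buffers + cache
used=$((total - avail))              # used = total - available
pct=$((100 * used / total))          # usage = used / total
echo "used: ${pct}%"                 # → used: 50%
```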
cadvisor:
exporter-style collector for pod/container metrics, with native docker support
github: https://github.com/google/cadvisor/
port: 8080
grafana dashboard: 8588
install:
docker:
VERSION=v0.36.0
docker run -d \
-v /:/rootfs:ro \
-v /var/run:/var/run:ro \
-v /sys:/sys:ro \
-v /var/lib/docker/:/var/lib/docker:ro \
-v /dev/disk/:/dev/disk:ro \
-p 8081:8080 \
--name cadvisor \
--privileged \
--device=/dev/kmsg \
spcodes/cadvisor:$VERSION
#image sources
google/cadvisor #docker hub (reachable inside China)
spcodes/cadvisor #docker hub (reachable inside China)
gcr.io/cadvisor/cadvisor:$VERSION #official image
k8s:
before kubelet 1.7.3, cadvisor metrics were embedded in the kubelet's own metrics endpoint
from 1.7.3 onward, cadvisor metrics are served separately from the kubelet metrics
the cadvisor built into k8s v1.28 is 0.47.2
#get the token of the k8s admin account
token=$(kubectl describe secrets -n `kubectl get secrets -A |awk '/admin-user/{print $1,$2}'` |awk '/^token/{print $2}')
#query the port exposed by the kubelet directly
curl -kH "Authorization: Bearer $token" https://127.0.0.1:10250/metrics/cadvisor
#access via the api-server
curl -kH "Authorization: Bearer $token" https://2.2.2.15:6443/api/v1/nodes/2.2.2.15/proxy/metrics/cadvisor
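Once those endpoints are scraped, per-pod CPU usage can be derived from the standard cadvisor counter `container_cpu_usage_seconds_total` (metric name assumed from cadvisor's defaults, not listed above):

```promql
# cpu cores consumed per pod, averaged over 5m; empty-image (pause) series excluded
sum by (pod) (rate(container_cpu_usage_seconds_total{image!=""}[5m]))
```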
to run it standalone anyway, a DaemonSet controller works:
cat > cadvisor-ds.yaml <<eof
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: cadvisor-ds
namespace: monitor
spec:
selector:
matchLabels:
app: cadvisor
template:
metadata:
labels:
app: cadvisor
spec:
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
#to expose cadvisor directly on the host's ip and port, enable hostNetwork, but make sure port 8080 is free on the host
hostNetwork: true
restartPolicy: Always
containers:
- name: cadvisor
image: zcube/cadvisor:v0.45.0
imagePullPolicy: IfNotPresent
#when using the host network directly, set the listen port by hand to avoid clashing with ports already used on the host
args:
- -port=8081
ports:
- containerPort: 8081
resources:
requests:
cpu: "0.5"
memory: "500Mi"
limits:
cpu: "2"
memory: "1Gi"
securityContext:
runAsUser: 0
privileged: true
volumeMounts:
- name: root
mountPath: /rootfs
- name: run
mountPath: /var/run
- name: sys
mountPath: /sys
- name: containerd
mountPath: /var/lib/containerd
- name: kmsg
mountPath: /dev/kmsg
volumes:
- name: root
hostPath:
path: /
- name: run
hostPath:
path: /var/run
- name: sys
hostPath:
path: /sys
- name: containerd
hostPath:
path: /var/lib/containerd
- name: kmsg
hostPath:
path: /dev/kmsg
eof
tomcat_exporter:
third-party exporter; metrics are served under /metrics
github: https://github.com/nlighten/tomcat_exporter
install:
run in a container
1) generate the script that downloads the dependency libraries
tee > install.sh <<EOF
#!/bin/bash
set -e
cat > sort.sh <<eof
#!/bin/bash
while IFS= read -r line
do
echo "\$line" |awk -F. '{ printf("%03d%03d%03d\n", \$1,\$2,\$3); }'
done |sort |awk '{ printf("%d.%d.%d\\n", substr(\$0,0,3), substr(\$0,4,3), substr(\$0,7,3)); }' |awk 'NR>1'
eof
cd /opt/
files=(
simpleclient
simpleclient_common
simpleclient_hotspot
simpleclient_servlet
simpleclient_servlet_common
tomcat_exporter_client
tomcat_exporter_servlet
)
lib_dir=/usr/local/tomcat/lib
web_dir=/usr/local/tomcat/webapps
simpleclient_url='https://repo1.maven.org/maven2/io/prometheus'
simpleclient_vs=\`curl -sL \$simpleclient_url/\$files |sed -nr 's#^<a.*title="([0-9.]+)/".*#\1#p' |bash sort.sh |tail -n1\`
exporter_url='https://repo1.maven.org/maven2/nl/nlighten'
exporter_vs=\`curl -sL \$exporter_url/\${files[-1]} |sed -nr 's#^<a.*title="([0-9.]+)/".*#\1#p' |bash sort.sh |tail -n1\`
for name in \${files[*]} ;do
if [ \$name = 'tomcat_exporter_client' ] ;then
url=\$exporter_url/\$name/\$exporter_vs/\$name-\${exporter_vs}.jar
curl -sL \$url -o \$lib_dir/\$name-\${exporter_vs}.jar && echo \$url
elif [ \$name = 'tomcat_exporter_servlet' ] ;then
url=\$exporter_url/\$name/\$exporter_vs/\$name-\${exporter_vs}.war
curl -sL \$url -o \$web_dir/metrics.war && echo \$url
else
url=\$simpleclient_url/\$name/\$simpleclient_vs/\$name-\${simpleclient_vs}.jar
curl -sL \$url -o \$lib_dir/\$name-\${simpleclient_vs}.jar && echo \$url
fi
done
EOF
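On systems with GNU coreutils, the zero-pad-and-sort trick in sort.sh can be replaced by sort's built-in version ordering (-V); a quick check of the idea:

```shell
# pick the highest dotted version from a list; 1.10.0 must beat 1.9.3
latest=$(printf '%s\n' 1.10.0 1.2.0 1.9.3 | sort -V | tail -n1)
echo "$latest"   # → 1.10.0
```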
2) install docker
apt-get -y install apt-transport-https ca-certificates curl software-properties-common && \
curl -fsSL mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/gpg | sudo apt-key add - && \
sudo add-apt-repository -y "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu $(lsb_release -cs) stable" && \
sudo apt-get -y update && \
sudo apt-get -y install docker-ce
3) generate the dockerfile and build the image
cat > Dockerfile <<eof
FROM tomcat:9.0-jdk17-openjdk-slim
RUN sed -i -e 's/deb.debian.org/mirrors.ustc.edu.cn/g' -e 's|security.debian.org/debian-security|mirrors.ustc.edu.cn/debian-security|g' /etc/apt/sources.list && \
apt-get update && apt-get install -y curl vim-tiny
ADD install.sh /opt/
RUN bash /opt/install.sh
EXPOSE 8080 8443 8009
CMD ["/usr/local/tomcat/bin/catalina.sh","run"]
eof
docker build -t tomcat-exporter:9.0-jdk17-openjdk-slim .
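A hedged Prometheus scrape-job sketch for the resulting container, assuming it runs on this host with port 8080 published and the metrics war deployed at /metrics:

```yaml
- job_name: tomcat
  metrics_path: /metrics
  static_configs:
    - targets: ['127.0.0.1:8080']   # assumed container address
```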
redis_exporter:
github: https://github.com/oliver006/redis_exporter
port: 9121
grafana dashboards: 13106, 11323
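A matching scrape-job sketch for redis_exporter (exporter address assumed, using the default port 9121 noted above):

```yaml
- job_name: redis
  static_configs:
    - targets: ['127.0.0.1:9121']   # assumed exporter address
```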