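Monitoring: Kubernetes Event Monitoring with kube-eventer
1. Background
Monitoring is an essential part of keeping a system stable. The open-source Kubernetes ecosystem offers a wide range of tools for monitoring resources and components: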
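- cAdvisor: built into the kubelet; monitors container resources such as container CPU and memory.
- kube-state-metrics: watches the API Server and generates metrics about the state of resource objects. It focuses on metadata such as Deployment, Pod, and replica status.
- metrics-server: a cluster-wide aggregator of resource usage data and the replacement for Heapster. The Kubernetes HPA reads its data from metrics-server.
- node-exporter and the many official and unofficial exporters, with Prometheus scraping the data and then storing, alerting on, and visualizing it. Even so, this is far from enough.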
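Insufficient timeliness and accuracy
- Most resource monitoring exports data in a push or pull model, so data is usually collected once per interval. If a spike or anomaly occurs within an interval and has already recovered by the next collection point, most collection systems swallow it. Periodic sampling effectively clips such spikes, which reduces accuracy.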
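The effect of interval-based collection can be illustrated with a small simulation. This is only a sketch: the per-second usage values and the 15-second scrape interval are made-up numbers, not measurements from a real cluster.

```python
def sample(series, interval):
    """Return the values a collector sees when it reads every `interval` seconds."""
    return series[::interval]


# Hypothetical per-second memory usage (MiB) over one minute,
# with a 5-second spike that falls between two scrapes.
usage = [100] * 60
usage[31:36] = [900] * 5

scraped = sample(usage, 15)  # reads at t = 0, 15, 30, 45

print("true peak:   ", max(usage))    # 900
print("scraped peak:", max(scraped))  # 100, the spike is never observed
```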
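Insufficient scenario coverage
- Some scenarios cannot be expressed through resource metrics. A Pod starting or stopping, for example, cannot be measured simply by resource utilization: when usage is 0, we cannot tell what actually caused that state.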
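How does Kubernetes address these two problems?
Kubernetes monitoring can currently be divided into resource monitoring, performance monitoring, security and health monitoring, and so on. Expressing the state of a resource object and its series of state transitions, however, requires event monitoring. Alibaba has open-sourced kube-eventer, a Kubernetes event monitoring project. It distinguishes two types of events: a Warning event means the state transition that produced it happened between unexpected states; a Normal event means the desired state matches the current state.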
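kube-eventer can collect events from resource objects such as Pods, Nodes, and the kubelet, as well as events from custom resource objects. It then aggregates and processes them and sends them to the configured receivers, as shown in the architecture diagram below.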
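2. Event Monitoring
Kubernetes has two types of events. A Warning event means the state transition that produced it happened between unexpected states. A Normal event means the desired state matches the current state. Take the lifecycle of a Pod as an example. When a Pod is created, it first enters the Pending state and waits for its image to be pulled. Once the image has been pulled and the health checks pass, the Pod becomes Running, and Normal events are generated. If the Pod then dies at runtime because of an OOM or some other cause and enters the Failed state, that state is unexpected, so Kubernetes generates a Warning event. By monitoring events, we can therefore spot problems promptly that resource monitoring tends to miss.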
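A standard Kubernetes event has the following important attributes, which make diagnosis and alerting easier.
- Namespace: the namespace of the object that produced the event.
- Kind: the type of the object the event is bound to, for example Node, Pod, Namespace, or Component.
- Timestamp: the time the event occurred.
- Reason: the reason the event was produced.
- Message: a detailed description of the event.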
# kubectl get event --all-namespaces
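As a rough illustration of how these attributes can be used, the sketch below counts Warning events per namespace, kind, and reason. It assumes the input has the shape of `kubectl get events -o json` output (an `items` list whose entries carry `type`, `reason`, and `involvedObject`); the sample data and the function name `summarize_warnings` are hypothetical.

```python
from collections import Counter


def summarize_warnings(event_list):
    """Count Warning events per (namespace, kind, reason)."""
    counts = Counter()
    for ev in event_list.get("items", []):
        if ev.get("type") != "Warning":
            continue  # Normal events describe expected transitions
        obj = ev.get("involvedObject", {})
        key = (obj.get("namespace", ""), obj.get("kind", ""), ev.get("reason", ""))
        counts[key] += 1
    return counts


# Hypothetical sample shaped like `kubectl get events -o json` output.
sample = {"items": [
    {"type": "Normal", "reason": "Started",
     "involvedObject": {"kind": "Pod", "namespace": "default", "name": "app-1"}},
    {"type": "Warning", "reason": "BackOff",
     "involvedObject": {"kind": "Pod", "namespace": "default", "name": "app-2"}},
    {"type": "Warning", "reason": "BackOff",
     "involvedObject": {"kind": "Pod", "namespace": "default", "name": "app-2"}},
    {"type": "Warning", "reason": "NodeNotReady",
     "involvedObject": {"kind": "Node", "name": "node1"}},
]}

counts = summarize_warnings(sample)
for (namespace, kind, reason), n in counts.most_common():
    print(f"{namespace or '-'}\t{kind}\t{reason}\t{n}")
```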
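3. Introduction to kube-eventer
- kube-eventer is an open-source component in the Kubernetes community for event monitoring, alerting, and ChatOps scenarios. Early versions of kube-eventer already supported robots for instant messaging tools such as DingTalk, WeChat, and Slack. Each robot evolves at its own pace and supports different features, however, so developers do not get a consistent experience across them. To solve this, the latest version of kube-eventer introduces a generic Webhook Sink. By customizing the request URL, authentication method, request body structure, and so on, developers can export events to any webhook-style channel.
- For event monitoring, the Kubernetes community used to provide a simple event export capability in Heapster. When Heapster was deprecated, that capability was archived along with it. To fill the gap, Alibaba Cloud Container Service released and open-sourced kube-eventer, a Kubernetes event export tool. It can export Kubernetes events to DingTalk robots, SLS (Log Service), Kafka, InfluxDB, and more.
- Official repository: https://github.com/AliyunContainerService/kube-eventer
- Supported notification methods: see the sink list in the repository README.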
4. Deploying kube-eventer
[root@node1 monitoring]# cat kube-eventer.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: kube-eventer
  name: kube-eventer
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-eventer
  template:
    metadata:
      labels:
        app: kube-eventer
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      # Schedule onto nodes carrying this label
      nodeSelector:
        service_role: monitoring
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccount: kube-eventer
      containers:
        - image: registry.aliyuncs.com/acs/kube-eventer-amd64:v1.2.0-484d9cd-aliyun
          name: kube-eventer
          command:
            - "/kube-eventer"
            - "--source=kubernetes:https://kubernetes.default"
            ## e.g., dingtalk sink demo
            #- --sink=dingtalk:[your_webhook_url]&label=[your_cluster_id]&level=[Normal or Warning(default)]
            #- --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=07055f32-a04e-4ad7-9cb1-d22352769e1c&level=Warning&label=oa-k8s
            - --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=07055f32-a04e-4ad7-9cb1-d223&level=Warning&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body&custom_body_configmap_namespace=kube-system&method=POST
          env:
            # If TZ is assigned, set the TZ value as the time zone
            - name: TZ
              value: "Asia/Shanghai"
          volumeMounts:
            - name: localtime
              mountPath: /etc/localtime
              readOnly: true
            - name: zoneinfo
              mountPath: /usr/share/zoneinfo
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 500m
              memory: 250Mi
      volumes:
        - name: localtime
          hostPath:
            path: /etc/localtime
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-eventer
rules:
  - apiGroups:
      - ""
    resources:
      - events
      - configmaps
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-eventer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-eventer
subjects:
  - kind: ServiceAccount
    name: kube-eventer
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-eventer
  namespace: kube-system
---
apiVersion: v1
data:
  content: >-
    {"msgtype": "text","text": {"content": "EventType:{{ .Type }}\nEventNamespace:{{ .InvolvedObject.Namespace }}\nEventKind:{{ .InvolvedObject.Kind }}\nEventObject:{{ .InvolvedObject.Name }}\nEventReason:{{ .Reason }}\nEventTime:{{ .LastTimestamp }}\nEventMessage:{{ .Message }}"}}
kind: ConfigMap
metadata:
  name: custom-webhook-body
  namespace: kube-system
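- The Webhook Sink can filter events by Kind, Reason, Level, and Namespace, and its generic design lets it export data to a custom webhook system, DingTalk, WeChat, BearyChat, Slack, and so on. How is the generic webhook used to achieve this? First, look at the parameters and definition of the Webhook Sink.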
# --sink=webhook:<WEBHOOK_URL>&level=<Normal or Warning, Warning default>
--sink=webhook:<WEBHOOK_URL>?level=Warning&namespaces=ns1,ns2&kinds=Node,Pod&method=POST&header=customHeaderKey=customHeaderValue
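- level: event level (optional. Default: Warning. Options: Warning and Normal)
- namespaces: namespaces to filter on (optional. Default: all namespaces. Separate multiple namespaces with commas; regular expressions are supported)
- kinds: kinds to filter on (optional. Default: all kinds. Separate multiple kinds with commas. Options: Node, Pod, and so on)
- reason: reason to filter on (optional. Default: empty. Regular expressions are supported). The reason field may appear multiple times in the query.
- method: HTTP method used to send the request (optional. Default: GET)
- header: header to send with the request (optional. Default: empty). The header field may appear multiple times in the query.
- custom_body_configmap: name of the ConfigMap that holds the request body template. The template lets you customize the request body. (optional)
- custom_body_configmap_namespace: namespace of the ConfigMap that holds the request body template. (optional)
- Of these, level, namespaces, kinds, and reason are filters. reason supports standard regular expressions for more powerful filtering, and it can be given more than once in the parameters, for example reason=(a|b)&reason=(c|d). By default the webhook body is: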
{
  "EventType": "{{ .Type }}",
  "EventKind": "{{ .InvolvedObject.Kind }}",
  "EventReason": "{{ .Reason }}",
  "EventTime": "{{ .EventTime }}",
  "EventMessage": "{{ .Message }}"
}
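The sink parameters described above can also be assembled programmatically, which helps avoid typos in long argument strings. The helper below is a minimal sketch and not part of kube-eventer: the function name `build_webhook_sink` and the `example.invalid` URLs are hypothetical, and the choice of `?` versus `&` as the first separator simply follows the examples in this article (`&` when the webhook URL already has a query string, `?` otherwise). Values are not URL-encoded, matching those examples.

```python
def build_webhook_sink(url, level=None, namespaces=(), kinds=(), reasons=(),
                       method=None, headers=None, body_configmap=None,
                       body_configmap_namespace=None):
    """Build a --sink=webhook:... argument from the documented parameters."""
    params = []
    if level:
        params.append(("level", level))
    if namespaces:
        params.append(("namespaces", ",".join(namespaces)))
    if kinds:
        params.append(("kinds", ",".join(kinds)))
    for reason in reasons:  # reason may be repeated
        params.append(("reason", reason))
    if method:
        params.append(("method", method))
    for key, value in (headers or {}).items():  # header may be repeated
        params.append(("header", f"{key}={value}"))
    if body_configmap:
        params.append(("custom_body_configmap", body_configmap))
    if body_configmap_namespace:
        params.append(("custom_body_configmap_namespace", body_configmap_namespace))
    if not params:
        return f"--sink=webhook:{url}"
    separator = "&" if "?" in url else "?"
    query = "&".join(f"{key}={value}" for key, value in params)
    return f"--sink=webhook:{url}{separator}{query}"


print(build_webhook_sink(
    "https://example.invalid/hook",
    level="Warning",
    kinds=["Node", "Pod"],
    method="POST",
    headers={"Content-Type": "application/json"},
))
```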
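- Developers can parse this JSON body to read the event content. They can also customize the body through the custom_body_configmap and custom_body_configmap_namespace fields. The ConfigMap has the structure shown below; by default kube-eventer reads the body from the ConfigMap's content field.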
apiVersion: v1
data:
  content: >-
    {"EventType": "{{ .Type }}","EventKind": "{{ .InvolvedObject.Kind }}","EventReason": "{{
    .Reason }}","EventTime": "{{ .EventTime }}","EventMessage": "{{ .Message
    }}"}
kind: ConfigMap
metadata:
  name: custom-webhook-body
  namespace: kube-system
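A body template is easy to break with a missing comma or brace, so it can be worth checking before it goes into the ConfigMap. The sketch below is not a kube-eventer feature: it replaces every `{{ ... }}` placeholder with a dummy value and checks that the result parses as JSON. The function name `check_body_template` is hypothetical.

```python
import json
import re

# Matches a template placeholder such as {{ .Type }}, even if it spans lines.
PLACEHOLDER = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def check_body_template(template):
    """Return (ok, detail) after substituting placeholders with dummy text."""
    rendered = PLACEHOLDER.sub("x", template)
    try:
        json.loads(rendered)
    except json.JSONDecodeError as exc:
        return False, str(exc)
    return True, "ok"


good = r'{"EventType": "{{ .Type }}","EventKind": "{{ .InvolvedObject.Kind }}"}'
bad = r'{"EventKind": "{{ .InvolvedObject.Kind }}""EventReason": "{{ .Reason }}"}'

print(check_body_template(good))
print(check_body_template(bad))
```

This only validates the template itself. An event whose message contains characters such as double quotes may still produce a body that is not valid JSON once real values are substituted.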
5. Generic Webhook Configuration Examples
5.1 DingTalk
- Example parameters
--sink=webhook:https://oapi.dingtalk.com/robot/send?access_token=token&level=Normal&kinds=Pod&header=Content-Type=application/json&custom_body_configmap=custom-body&custom_body_configmap_namespace=kube-system&method=POST
- ConfigMap content
{"msgtype": "text","text": {"content":"EventType:{{ .Type }}\nEventKind:{{ .InvolvedObject.Kind }}\nEventReason:{{ .Reason }}\nEventTime:{{ .EventTime }}\nEventMessage:{{ .Message }}"},"markdown": {"title":"","text":""}}
5.2 WeChat
- Example parameters
--sink=webhook:http://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=633a31f6-7f9c-4bc4-97a0-0ec1eefa5898&level=Normal&kinds=Pod&header=Content-Type=application/json&custom_body_configmap=custom-body&custom_body_configmap_namespace=kube-system&method=POST
- ConfigMap content
{"msgtype": "text","text": {"content": "EventType:{{ .Type }}\nEventKind:{{ .InvolvedObject.Kind }}\nEventReason:{{ .Reason }}\nEventTime:{{ .EventTime }}\nEventMessage:{{ .Message }}"}}
5.3 Slack
- Example parameters
--sink=webhook:https://hooks.slack.com/services/d/B00000000/XXX?&level=Normal&kinds=Pod&header=Content-Type=application/json&custom_body_configmap=custom-body&custom_body_configmap_namespace=kube-system&method=POST
- ConfigMap content
{"channel": "testing","username": "Eventer","text":"EventType:{{ .Type }}\nEventKind:{{ .InvolvedObject.Kind }}\nEventReason:{{ .Reason }}\nEventTime:{{ .EventTime }}\nEventMessage:{{ .Message }}"}
5.4 BearyChat
- Example parameters
--sink=webhook:https://hook.bearychat.com/=bwIsS/incoming/xxxxxx?kinds=Pod&header=Content-Type=application/json&custom_body_configmap=custom-body&custom_body_configmap_namespace=kube-system&method=POST
- ConfigMap content
{"text":"EventType:{{ .Type }}\nEventKind:{{ .InvolvedObject.Kind }}\nEventReason:{{ .Reason }}\nEventTime:{{ .EventTime }}\nEventMessage:{{ .Message }}"}
5.5 Worktile Webhook
- Example parameters
- --sink=webhook:http://10.2.3.235:10010/eventer/?level=Warning&namespaces=rc,kubewatch&kinds=Pod&custom_body_configmap=custom-webhook-body&custom_body_configmap_namespace=kube-system&method=POST
- ConfigMap content
---
apiVersion: v1
data:
  content: >-
    {"EventType": "{{ .Type }}\n","EventKind": "{{ .InvolvedObject.Kind }}\n","EventObject": "{{ .InvolvedObject.Name }}\n","EventReason": "{{
    .Reason }}\n","EventTime": "{{ .LastTimestamp }}\n","EventMessage": "{{ .Message}}"}
kind: ConfigMap
metadata:
  name: custom-webhook-body
  namespace: kube-system
- hook proxy server
import json
import re

import requests
from flask import Flask, abort, jsonify, request

# Worktile incoming webhook
wt_webhook_url = 'https://hook.worktile.com/incoming/xxxxxxxxxxxxxxxxx'


def worktile_webhook(hookurl=wt_webhook_url, string=None):
    """Forward one formatted event to the Worktile webhook."""
    print("string", string)
    print('send message to worktile webhook ')
    headers = {
        "Content-Type": "application/json; charset=UTF-8",
    }
    data = {
        "attachment": {
            "fallback": "Kubernetes Event Driven",
            "color": "#51bcb6",
            "title": "Kubernetes Event Driven",
            # The template emits literal "\n" sequences; turn them into newlines.
            "text": re.sub(r'\\n', '\n', string)
        }
    }
    return requests.post(hookurl, data=json.dumps(data), headers=headers).text


# Field formatting
class Json_eventer(object):
    @staticmethod
    def js(payload):
        """Relay one event to Worktile. Example event body:

        {
            "EventType": "Warning",
            "EventKind": "Pod",
            "EventReason": "Failed",
            "EventTime": "0001-01-01 00:00:00 +0000 UTC",
            "EventMessage": "Error: ImagePullBackOff"
        }
        """
        print("js", payload)
        worktile_webhook(hookurl=wt_webhook_url, string=payload)


app = Flask(__name__)


# NOTE: this path must match the path used in the --sink=webhook URL.
# The parameter example above posts to /eventer/, so adjust one of the two.
@app.route('/kubewatch/', methods=['GET', 'POST'])
def webhook():
    if request.method == 'GET':
        return jsonify({'status': 'success'}), 200
    elif request.method == 'POST':
        data = request.get_data()
        payload = bytes.decode(data, 'UTF-8')
        # TODO: replace this regex stripping with proper parsing
        payload = re.sub(r'[{}]', '', payload)
        payload = re.sub(r'"', '', payload)
        payload = re.sub(r',', '', payload)
        Json_eventer.js(payload=payload)
        return jsonify({'status': 'success'}), 200
    else:
        abort(400)


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=10010)
- eventer alert notification
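The proxy strips braces, quotes, and commas with regular expressions, which the code itself marks as something to improve. One possible alternative, shown here only as a sketch, is to parse the body as JSON first and fall back to the regex approach when the body is not valid JSON (for example when an event message contains unescaped quotes). The function name `payload_to_text` is hypothetical.

```python
import json
import re


def payload_to_text(raw):
    """Turn the POST body sent by kube-eventer into 'Key: value' lines."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    try:
        event = json.loads(text)
    except json.JSONDecodeError:
        # Fallback that mirrors the regex stripping used by the proxy.
        stripped = re.sub(r'[{}",]', '', text)
        return stripped.replace('\\n', '\n')
    # json.loads already turned "\n" escapes into real newlines; trim them.
    return "\n".join(f"{key}: {str(value).strip()}" for key, value in event.items())


body = b'{"EventType": "Warning\\n","EventKind": "Pod\\n","EventMessage": "Error: ImagePullBackOff"}'
print(payload_to_text(body))
```

Unlike the regex approach, the JSON path keeps commas that appear inside an event message.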
6. Kubernetes Event Daily Statistics
import json
import re

import requests
from flask import Flask, abort, jsonify, request
from prettytable import PrettyTable

# Worktile incoming webhook
wt_webhook_url = 'https://hook.worktile.com/incoming/xxxxxxxxxxxxxxxxx'


def worktile_webhook(hookurl=wt_webhook_url, string=None):
    """Forward one formatted event to the Worktile webhook."""
    print("string", string)
    print('send message to worktile webhook ')
    headers = {
        "Content-Type": "application/json; charset=UTF-8",
    }
    data = {
        "attachment": {
            "fallback": "Kubernetes Event Driven",
            "color": "#51bcb6",
            "title": "Kubernetes Event Driven",
            # The template emits literal "\n" sequences; turn them into newlines.
            "text": re.sub(r'\\n', '\n', string)
        }
    }
    return requests.post(hookurl, data=json.dumps(data), headers=headers).text


# In-memory statistics: {name: [failure times, name, failure count]}
services = {}


def webhook_eventer_statistical(hookurl=wt_webhook_url, string=None, gets=None):
    """Record one event (string given) or send the summary (gets='get')."""
    if string:
        string = re.sub(r'\\n', '\n', string)
        lines = string.split('\n')
        # Line 2 is "EventObject: <name>". Dropping the last 17 characters
        # presumably strips the "-<hash>-<suffix>" tail of a Pod name.
        sv = lines[2].split(':')[1][:-17]
        # Line 4 is "EventTime: <timestamp>". The slice presumably keeps HH:MM:SS.
        tm = lines[4][22:-10]
        if sv not in services:
            services[sv] = [tm, sv, 1]
        else:
            services[sv][0] = services[sv][0] + '\n' + tm
            services[sv][2] += 1
    if gets == 'get':
        table = PrettyTable([' FailureTime ', ' ResourcesName ', ' FailuresNumber '])
        for row in services.values():
            table.add_row(row)
        table.border = False
        table = str(table)
        with open('./eventer.txt', 'w', encoding='utf-8') as fp:
            fp.write(table)
        headers = {
            "Content-Type": "application/json; charset=UTF-8",
        }
        data = {
            "attachment": {
                "fallback": "Kubernetes Event Daily statistical ",
                "color": "#DC143C",
                "title": "Kubernetes Event Daily statistical",
                "text": table
            }
        }
        print(data, 'worktile')
        requests.post(hookurl, data=json.dumps(data), headers=headers)
        # Start a fresh statistics window after each report.
        services.clear()


# Field formatting
class Json_eventer(object):
    @staticmethod
    def js(payload):
        """Relay one event to Worktile and record it. Example event body:

        {
            "EventType": "Warning",
            "EventKind": "Pod",
            "EventReason": "Failed",
            "EventTime": "0001-01-01 00:00:00 +0000 UTC",
            "EventMessage": "Error: ImagePullBackOff"
        }
        """
        print("js", payload)
        worktile_webhook(hookurl=wt_webhook_url, string=payload)
        webhook_eventer_statistical(wt_webhook_url, string=payload)


app = Flask(__name__)


# NOTE: this path must match the path used in the --sink=webhook URL.
@app.route('/kubewatch/', methods=['GET', 'POST'])
def webhook():
    if request.method == 'GET':
        return jsonify({'status': 'success'}), 200
    elif request.method == 'POST':
        data = request.get_data()
        payload = bytes.decode(data, 'UTF-8')
        # TODO: replace this regex stripping with proper parsing
        payload = re.sub(r'[{}]', '', payload)
        payload = re.sub(r'"', '', payload)
        payload = re.sub(r',', '', payload)
        Json_eventer.js(payload=payload)
        return jsonify({'status': 'success'}), 200
    else:
        abort(400)


# Call this endpoint (for example from a daily scheduled job) to send the summary.
@app.route('/eventer/statistical', methods=['GET'])
def eventer_statistical_webhook():
    if request.method == 'GET':
        webhook_eventer_statistical(hookurl=wt_webhook_url, gets='get')
        return jsonify({'status': 'success'}), 200
    else:
        abort(400)


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=10010)
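The statistics code relies on fixed character offsets to extract the resource name and the time, which breaks as soon as a name or timestamp has a different length. The sketch below shows one possible alternative for the name part. It assumes Pods created by a Deployment are named `<workload>-<replicaset-hash>-<5 characters>`; the pattern, the function names, and the sample Pod names are all illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumption: Deployment-managed Pod names end in "-<hash>-<5 chars>".
POD_SUFFIX = re.compile(r"-[a-z0-9]{5,10}-[a-z0-9]{5}$")


def workload_name(pod_name):
    """Strip the generated suffix; names without it are returned unchanged."""
    return POD_SUFFIX.sub("", pod_name)


def aggregate(events):
    """Group failure times by workload. `events` yields (pod_name, time) pairs."""
    stats = defaultdict(list)
    for pod_name, when in events:
        stats[workload_name(pod_name)].append(when)
    return dict(stats)


# Hypothetical sample events.
failures = [
    ("web-5d8f7c9b6d-abcde", "12:00:00"),
    ("web-5d8f7c9b6d-fghij", "12:05:00"),
    ("api-server-7c9f8d6b5-x2k4p", "13:30:00"),
]
for name, times in aggregate(failures).items():
    print(name, len(times), times)
```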