Pod Log Collection with a DaemonSet (ELFK Stack)
Collection Approach
Filebeat is deployed as a DaemonSet. Logs are grouped by the Kubernetes cluster's namespaces during collection, and a separate Kafka topic is created for each namespace name.
Kubernetes Log File Layout
Normally, when a container writes its logs to standard output (stdout), Docker stores them as *-json.log files under /var/lib/docker/containers. If the Docker data directory has been changed, they live under the new data directory instead, for example:
Here you can see a file of the form /data/docker/containers/<container id>/<container id>-json.log. By default, Kubernetes then creates symlinks to these log files under /var/log/containers and /var/log/pods, as shown below:
cattle-node-agent-tvhlq_cattle-system_agent-8accba2d42cbc907a412be9ea3a628a90624fb8ef0b9aa2bc6ff10eab21cf702.log
etcd-k8s-master01_kube-system_etcd-248e250c64d89ee6b03e4ca28ba364385a443cc220af2863014b923e7f982800.log
This directory contains the logs of every container running on this host, with file names of the form:
[podName]_[nameSpace]_[deploymentName]-[containerId].log
The pattern above is the one used for Deployments; other controllers such as DaemonSets and StatefulSets differ slightly, but they all share one common trait:
*_[nameSpace]_*.log
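Because the namespace is embedded in the file name, a single glob is enough to select all logs belonging to one namespace, which is exactly what the Filebeat inputs below rely on. A quick way to check this on a node (kube-system is just an example namespace):
ls -l /var/log/containers/*_kube-system_*.log
# resolve a few of the symlinks back to the underlying *-json.log files
readlink -f /var/log/containers/*_kube-system_*.log | head -n 3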
Kafka Deployment
Operator deployment
Strimzi is currently the most popular operator option. If the cluster's data volume is small, NFS shared storage is fine; for larger data volumes, use local PV storage.
Operator download
https://strimzi.io/downloads/
Find the version you need
Choose the .tgz download
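If you prefer not to fetch the archive from the downloads page by hand, the same chart can also be pulled from Strimzi's published Helm repository; a minimal sketch (version pinned to match the one used below):
helm repo add strimzi https://strimzi.io/charts/
helm repo update
helm pull strimzi/strimzi-kafka-operator --version 0.31.1 --untar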
Installation
[root@master01 kafka]# ll
total 80
drwxr-xr-x 4 root root 122 Oct 25 16:09 strimzi-kafka-operator
-rw-r--r-- 1 root root 81283 Oct 25 16:06 strimzi-kafka-operator-helm-3-chart-0.31.1.tgz
[root@master01 kafka]# helm install strimzi ./strimzi-kafka-operator
NAME: strimzi
LAST DEPLOYED: Sun Oct 8 21:16:31 2023
NAMESPACE: kafka
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing strimzi-kafka-operator-0.31.1
Resource manifest download
Download the YAML manifests for the corresponding version
https://github.com/strimzi/strimzi-kafka-operator/releases
wget https://github.com/strimzi/strimzi-kafka-operator/releases/download/0.31.1/strimzi-0.31.1.tar.gz
Extract
tar -xf strimzi-0.31.1.tar.gz
YAML manifest descriptions
- kafka-persistent.yaml: deploys a persistent cluster with three ZooKeeper and three Kafka nodes. (Recommended)
- kafka-jbod.yaml: deploys a persistent cluster with three ZooKeeper and three Kafka nodes (each Kafka node using multiple persistent volumes).
- kafka-persistent-single.yaml: deploys a persistent cluster with a single ZooKeeper node and a single Kafka node.
- kafka-ephemeral.yaml: deploys an ephemeral cluster with three ZooKeeper and three Kafka nodes.
- kafka-ephemeral-single.yaml: deploys an ephemeral cluster with a single ZooKeeper node and a single Kafka node.
[root@master01 kafka20231025]# cd strimzi-0.31.1/examples/kafka/
[root@master01 kafka]# ll
total 20
-rw-r--r-- 1 redis docker 713 Sep 21 2022 kafka-ephemeral-single.yaml
-rw-r--r-- 1 redis docker 713 Sep 21 2022 kafka-ephemeral.yaml
-rw-r--r-- 1 redis docker 957 Sep 21 2022 kafka-jbod.yaml
-rw-r--r-- 1 redis docker 865 Sep 21 2022 kafka-persistent-single.yaml
-rw-r--r-- 1 redis docker 865 Sep 21 2022 kafka-persistent.yaml
Create the PVCs
Here NFS storage is used as an example. The PVC resources are created in advance, to be used for persistent data storage by the three ZooKeeper nodes and the three Kafka nodes (with the nfs-client StorageClass, the backing PVs are provisioned dynamically).
- Create the PVCs:
[root@master01 kafka]# cat kafka-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: data-my-cluster-zookeeper-0
namespace: kafka
spec:
storageClassName: nfs-client
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: data-my-cluster-zookeeper-1
namespace: kafka
spec:
storageClassName: nfs-client
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: data-my-cluster-zookeeper-2
namespace: kafka
spec:
storageClassName: nfs-client
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: data-0-my-cluster-kafka-0
namespace: kafka
spec:
storageClassName: nfs-client
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: data-0-my-cluster-kafka-1
namespace: kafka
spec:
storageClassName: nfs-client
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: data-0-my-cluster-kafka-2
namespace: kafka
spec:
storageClassName: nfs-client
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
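With the PVC manifest ready, apply it and then create the Kafka cluster itself from the example manifest; the cluster name my-cluster in kafka-persistent.yaml matches the PVC names above. A minimal sketch, assuming the kafka namespace already exists and the example files are in the current directory:
kubectl apply -f kafka-pvc.yaml
kubectl apply -f kafka-persistent.yaml -n kafka
# wait until the operator reports the cluster as Ready
kubectl -n kafka wait kafka/my-cluster --for=condition=Ready --timeout=600s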
Verify the installation
[root@master01 kafka]# kubectl get pod -n kafka
NAME READY STATUS RESTARTS AGE
my-cluster-entity-operator-7c68d4b9d9-tg56j 3/3 Running 0 2m15s
my-cluster-kafka-0 1/1 Running 0 2m54s
my-cluster-kafka-1 1/1 Running 0 2m54s
my-cluster-kafka-2 1/1 Running 0 2m54s
my-cluster-zookeeper-0 1/1 Running 0 3m19s
my-cluster-zookeeper-1 1/1 Running 0 3m19s
my-cluster-zookeeper-2 1/1 Running 0 3m19s
strimzi-cluster-operator-56fdbb99cb-gznkw 1/1 Running 0 97m
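The operator also creates the broker Services that Filebeat and Logstash point at later; the names follow Strimzi's <cluster-name>-kafka-* convention:
kubectl get svc -n kafka
# expect my-cluster-kafka-bootstrap and my-cluster-kafka-brokers exposing port 9092, among others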
Kafka UI Deployment
docker run -d \
-p 9096:8080 \
-v /data/kafka-client:/usr/local/kafka-map/data \
-e DEFAULT_USERNAME=admin \
-e DEFAULT_PASSWORD=admin \
--name kafka-map \
--restart always dushixiang/kafka-map:latest
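Once the container is up, the UI should be reachable at http://<host>:9096 and log in with the DEFAULT_USERNAME/DEFAULT_PASSWORD values set above; a cluster is then registered in kafka-map by entering a broker address that is reachable from the container.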
Filebeat Deployment
Filebeat is deployed as a DaemonSet. There is not much to say here; just follow the official documentation.
filebeat-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: filebeat
subjects:
- kind: ServiceAccount
name: filebeat
namespace: kube-system
roleRef:
kind: ClusterRole
name: filebeat
apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: filebeat
labels:
k8s-app: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
resources:
- namespaces
- pods
- nodes
verbs:
- get
- watch
- list
- apiGroups: ["apps"]
resources:
- replicasets
verbs: ["get", "list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: filebeat
namespace: kube-system
labels:
k8s-app: filebeat
filebeat-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: filebeat-config
namespace: kube-system
data:
filebeat.yml: |-
filebeat.inputs:
- type: container
enabled: true
paths:
- /var/log/containers/*_pi6000_*log
fields:
log_topic: pi6000
env: dev
multiline.pattern: '(^\[[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\])|(^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3})|(^[0-9]{2}:[0-9]{2}:[0-9]{2})'
multiline.negate: true
multiline.match: after
multiline.max_lines: 100
- type: container
enabled: true
paths:
- /var/log/containers/*_default_*log
fields:
log_topic: default
env: dev
multiline.pattern: '(^\[[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\])|(^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3})|(^[0-9]{2}:[0-9]{2}:[0-9]{2})'
multiline.negate: true
multiline.match: after
multiline.max_lines: 100
filebeat.config.modules:
path: ${path.config}/modules.d/*.yml
reload.enabled: false
processors:
      # add Kubernetes metadata to each event
- add_kubernetes_metadata:
host: ${NODE_NAME}
matchers:
- logs_path:
logs_path: "/var/log/containers/"
      # remove redundant fields
- drop_fields:
fields:
- host
- ecs
- log
- agent
- input
- stream
- container
- kubernetes.pod.uid
- kubernetes.namespace_uid
- kubernetes.namespace_labels
- kubernetes.node.uid
- kubernetes.node.labels
- kubernetes.replicaset
- kubernetes.labels
- kubernetes.node.name
ignore_missing: true
- script:
lang: javascript
id: format_time
tag: enable
source: |
function process(event) {
var str = event.Get("message");
                // extract the timestamp wrapped in square brackets
var regex = /^\[(.*?)\]/;
var match = str.match(regex);
if (match && match.length > 1) {
                var time = match[1]; // the extracted timestamp, without the brackets
event.Put("time", time);
}
                // extract a timestamp that is not wrapped in brackets
var regex2 = /^\d{2}:\d{2}:\d{2}/;
var match2 = str.match(regex2);
if (match2) {
time = match2[0]; // Extracted timestamp
event.Put("time", time);
}
}
      # flatten the Kubernetes metadata into a simpler structure
- script:
lang: javascript
id: format_k8s
tag: enable
source: |
function process(event) {
var k8s = event.Get("kubernetes");
var newK8s = {
podName: k8s.pod.name,
nameSpace: k8s.namespace,
imageAddr: k8s.container.name,
hostName: k8s.node.hostname
};
event.Put("k8s", newK8s);
}
      # set the event timestamp; this could also be handled in Logstash
- timestamp:
field: time
timezone: Asia/Shanghai
layouts:
- '2006-01-02 15:04:05'
- '2006-01-02 15:04:05.999'
test:
- '2019-06-22 16:33:51'
output.kafka:
hosts: ["my-cluster-kafka-brokers.kafka.svc:9092"]
topic: '%{[fields.log_topic]}'
partition.round_robin:
reachable_only: true
required_acks: -1
compression: gzip
filebeat-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: filebeat
namespace: kube-system
labels:
k8s-app: filebeat
spec:
selector:
matchLabels:
k8s-app: filebeat
template:
metadata:
labels:
k8s-app: filebeat
spec:
serviceAccountName: filebeat
terminationGracePeriodSeconds: 30
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
containers:
- name: filebeat
image: registry.us-east-1.aliyuncs.com/oll/filebeat:7.12.0
args: [
"-c", "/etc/filebeat.yml",
"-e",
]
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
securityContext:
runAsUser: 0
# If using Red Hat OpenShift uncomment this:
#privileged: true
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 100Mi
volumeMounts:
- name: config
mountPath: /etc/filebeat.yml
readOnly: true
subPath: filebeat.yml
- name: data
mountPath: /usr/share/filebeat/data
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
- name: varlog
mountPath: /var/log
readOnly: true
volumes:
- name: config
configMap:
defaultMode: 0640
name: filebeat-config
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: varlog
hostPath:
path: /var/log
# data folder stores a registry of read status for all files, so we don't send everything again on a Filebeat pod restart
- name: data
hostPath:
# When filebeat runs as non-root user, this directory needs to be writable by group (g+w).
path: /var/lib/filebeat-data
type: DirectoryOrCreate
Deploy
[root@master01 ds-filebeat-7.12]# kubectl apply -f .
[root@master01 ds-filebeat-7.12]# kubectl -n kube-system get pods | grep filebeat
filebeat-5pvvq 1/1 Running 0 74m
filebeat-74rbc 1/1 Running 0 74m
filebeat-md8k4 1/1 Running 0 74m
filebeat-ssg6g 1/1 Running 0 74m
filebeat-stlxt 1/1 Running 0 74m
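The mounted configuration can be sanity-checked from inside any of the running pods with Filebeat's built-in config test (a sketch, assuming the binary location used by the official image; pick any pod name from the listing above):
kubectl -n kube-system exec filebeat-5pvvq -- /usr/share/filebeat/filebeat test config -c /etc/filebeat.yml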
Verifying the data in Kafka
Looking at the Kafka topic list shows that a topic named pi6000 has been created.
You can see that Filebeat has flattened the log structure and trimmed the fields, making the data easier to analyze.
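If you prefer the command line over the UI, the topic and a sample message can also be checked directly from one of the broker pods (a sketch, assuming the Strimzi image keeps its Kafka installation under /opt/kafka):
kubectl -n kafka exec -it my-cluster-kafka-0 -- /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
kubectl -n kafka exec -it my-cluster-kafka-0 -- /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic pi6000 --from-beginning --max-messages 1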
Logstash Deployment
For Elasticsearch cluster deployment, see: https://www.cnblogs.com/Unstoppable9527/p/18329622
Installation
[root@node03 ~]# wget https://artifacts.elastic.co/downloads/logstash/logstash-8.8.2-x86_64.rpm
[root@node03 ~]# rpm -ivh logstash-8.8.2-x86_64.rpm
[root@node03 ~]# systemctl enable logstash
Created symlink /etc/systemd/system/multi-user.target.wants/logstash.service → /usr/lib/systemd/system/logstash.service.
Add Logstash to the PATH
[root@node03 ~]# vim /etc/profile
export PATH=$PATH:/usr/share/logstash/bin
[root@node03 ~]# source /etc/profile
[root@node03 ~]# logstash -V
Using bundled JDK: /usr/share/logstash/jdk
logstash 8.8.2
Create the pipeline file
[root@node03 ~]# cat > /etc/logstash/conf.d/log-to-es.conf << EOF
input {
kafka {
bootstrap_servers => "my-cluster-kafka-brokers.kafka.svc:9092"
    auto_offset_reset => "latest" # start consuming from the latest offset
    decorate_events => true # also attaches the current topic, offset, group, partition, etc. to the event
topics => ["pi6000"]
group_id => "test"
    # 3 consumer threads to match the 3 partitions and consume in parallel
consumer_threads => 3
codec => "json"
}
}
filter {
  # match several log formats
grok {
match => { "message" => [
"%{TIMESTAMP_ISO8601:log_time} +%{LOGLEVEL:log_level} +%{JAVACLASS:log_class} +%{INT:log_line} - %{GREEDYDATA:log_message}",
"%{TIMESTAMP_ISO8601:log_time} +%{LOGLEVEL:log_level}\s*-%{GREEDYDATA:log_message}",
"%{TIMESTAMP_ISO8601:log_time} +%{LOGLEVEL:log_level} +%{JAVACLASS:log_class} - %{GREEDYDATA:log_message}",
"\[%{TIMESTAMP_ISO8601:log_time}\] +%{LOGLEVEL:log_level} +%{JAVACLASS:log_class} +%{INT:log_line} - %{GREEDYDATA:log_message}",
"%{TIME:log_time} \[%{DATA:thread}\] \[%{DATA:empty_field}\] %{LOGLEVEL:log_level}\s*%{JAVACLASS:log_class} - %{GREEDYDATA:log_message}",
"%{TIME:log_time} %{LOGLEVEL:log_level}\s* - %{GREEDYDATA:log_message}"
]}
}
  # drop messages that failed to parse; not recommended in production
if "_grokparsefailure" in [tags] {
drop {}
}
  # remove unneeded fields
mutate {
#remove_field => [ "message","ecs","agent","log","host","input" ]
rename => { "[kubernetes][container][image]" => "[kubernetes][container][image_name]" }
remove_field => [ "ecs","agent","log","host","input" ]
}
date {
match => ["log_time", "yyyy-MM-dd HH:mm:ss.SSS"]
target => "@timestamp"
    timezone => "Asia/Shanghai" # adjust to your own timezone
}
}
output {
stdout { codec => rubydebug }
elasticsearch {
hosts => ["https://10.0.0.14:9200", "https://10.0.0.15:9200", "https://10.0.0.16:9200"]
index => "pi6000-ds-%{+yyyy.MM.dd}"
template_overwrite => true
user => "elastic"
password => "123456"
ssl => true
ssl_certificate_verification => false
}
}
EOF
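Before starting the service, the pipeline syntax can be validated with Logstash's built-in config test; a minimal sketch:
logstash --path.settings /etc/logstash -f /etc/logstash/conf.d/log-to-es.conf --config.test_and_exit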
Start Logstash
[root@node03 ~]# systemctl start logstash
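Because the pipeline also writes to stdout with the rubydebug codec, the easiest way to confirm that events are flowing is to tail the service journal:
journalctl -u logstash -f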
Analyzing the data in Kibana
For Kibana deployment, see https://www.cnblogs.com/Unstoppable9527/p/18329632
- Services can be distinguished by Deployment
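In Discover this typically means filtering on the kubernetes.deployment.name field attached by add_kubernetes_metadata (field name assumed from the processor's defaults), or on the flattened k8s.podName field produced by the script processor above.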
Summary
In my view, doing some preprocessing with Filebeat at the first layer of log collection shortens the end-to-end processing time, because the bottleneck usually lies with Elasticsearch and Logstash. So push time-consuming work into the Filebeat stage whenever possible, and fall back to Logstash only for what Filebeat cannot handle. Another important and easily overlooked point is trimming the log content, which significantly reduces log volume: in my tests, the same number of log entries came to 20 GB unoptimized versus under 10 GB after trimming, which makes a very noticeable difference to the performance of the whole Elasticsearch cluster. Finally, keeping the Filebeat configuration files under version control makes maintenance easier by providing a record of changes.
This article is from cnblogs (博客园), author: &UnstopPable. Please include a link to the original when republishing: https://www.cnblogs.com/Unstoppable9527/p/18334767