Building a Production-Grade EFK Logging Stack on Kubernetes
A complete guide to production-grade EFK logging in Kubernetes
Log management is a core requirement of any production Kubernetes environment. This article walks through building unified, efficient, and reliable log management with EFK (Elasticsearch + Fluentd + Kibana); all of the configuration below targets production standards.
1. Architecture Design: What Makes EFK Production-Grade
Core component roles
- Fluentd: runs as a DaemonSet on every node, collecting container logs, node logs, and Kubernetes events in real time
- Elasticsearch: deployed as a cluster; handles log storage and search
- Kibana: provides the visualization and query UI
- Production hardening: plan for cluster high availability, security and authentication, performance tuning, and log lifecycle management
Data flow
Container logs -> Fluentd collection -> Elasticsearch storage -> Kibana visualization
2. Production-Grade Elasticsearch Deployment
Deploying with Helm (recommended)

```shell
helm repo add elastic https://helm.elastic.co
helm install elasticsearch elastic/elasticsearch \
  --version 7.17.3 \
  -f values-prod.yaml
```
Key settings in values-prod.yaml

```yaml
# Cluster node roles
nodeGroups:
  - name: master
    roles: ["master"]
    replicas: 3
    persistence:
      enabled: true
      size: 100Gi
  - name: data
    roles: ["data"]
    replicas: 3
    persistence:
      enabled: true
      size: 500Gi
  - name: ingest
    roles: ["ingest"]
    replicas: 2

# Resource limits
resources:
  requests:
    memory: "4Gi"
    cpu: "1"
  limits:
    memory: "8Gi"

# Security context
securityContext:
  fsGroup: 1000
  runAsUser: 1000

# TLS certificates
secretMounts:
  - name: elastic-certificates
    secretName: elastic-certificates
    path: /usr/share/elasticsearch/config/certs
```
Production checklist
- Separate master / data / ingest node roles
- Use local SSDs or high-performance cloud disks for persistent storage
- Size the JVM heap at no more than 50% of the node's physical memory
- Enable the X-Pack security module (TLS encryption, RBAC)
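As a concrete example of the heap rule above, the chart's `esJavaOpts` value can pin the heap explicitly. The sizes below are illustrative, matching the 8Gi container limit from the earlier values file:

```yaml
# values-prod.yaml (fragment) - illustrative sizing
# Fixed heap at 50% of the 8Gi container limit; -Xms equal to -Xmx avoids heap resizing
esJavaOpts: "-Xms4g -Xmx4g"
```

Keeping `-Xms` and `-Xmx` identical prevents resize pauses; leave the remaining memory to the OS page cache, which Lucene relies on heavily.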
3. Fluentd Tuning
DaemonSet example (the original snippet omitted the required `selector`/labels and the hostPath mount Fluentd needs to read container logs; both are added here):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1.16.1-debian-elasticsearch8-1.1
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch.logging.svc.cluster.local"
            - name: FLUENT_ELASTICSEARCH_SCHEME
              value: "https"
            - name: FLUENT_ELASTICSEARCH_SSL_VERIFY
              value: "true"
            - name: FLUENT_ELASTICSEARCH_SSL_VERSION
              value: "TLSv1_2"
          resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 200Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
```
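The Kubernetes metadata enrichment performed by this DaemonSet image needs read access to pod and namespace objects. A minimal RBAC sketch (resource names are illustrative; reference the ServiceAccount via `serviceAccountName: fluentd` in the DaemonSet pod spec):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
rules:
  # Read-only access so Fluentd can tag logs with pod/namespace metadata
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentd
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: logging
```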
Key tuning snippets
- Multiline log handling (e.g. Java stack traces)

```
<filter kubernetes.**>
  @type concat
  key log
  multiline_start_regexp /^\d{4}-\d{2}-\d{2}/
</filter>
```

- Log buffering

```
<buffer>
  @type file
  path /var/log/fluentd-buffer
  chunk_limit_size 32MB
  total_limit_size 8GB
  flush_interval 5s
  retry_max_times 5
</buffer>
```

- Sensitive data filtering

```
<filter **>
  @type grep
  <exclude>
    key message
    pattern /password|token|secret/
  </exclude>
</filter>
```
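The `grep` exclude above drops the entire event whenever a credential keyword appears, which can discard otherwise useful logs. An alternative sketch (assuming the field is named `message`) masks just the value in place with `record_transformer`:

```
<filter **>
  @type record_transformer
  enable_ruby true
  <record>
    # Replace "password=..." style values with a redaction marker instead of dropping the event
    message ${record["message"].to_s.gsub(/(password|token|secret)[=:]\S+/, '\1=[REDACTED]')}
  </record>
</filter>
```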
4. Secure Kibana Deployment
Ingress example

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kibana
  annotations:
    nginx.ingress.kubernetes.io/auth-url: "https://auth.example.com/oauth2/auth"
    nginx.ingress.kubernetes.io/auth-signin: "https://auth.example.com/oauth2/start?rd=$escaped_request_uri"
spec:
  tls:
    - hosts:
        - kibana.example.com
      secretName: tls-cert
  rules:
    - host: kibana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kibana
                port:
                  number: 5601
```
Production recommendations
- Enforce HTTPS access
- Integrate enterprise SSO (OAuth2/OIDC)
- Enable audit logging
- Back up dashboards and visualization configs regularly
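One way to script the dashboard backup mentioned above is Kibana's saved-objects export API, which dumps dashboards as NDJSON. The host and credentials below are placeholders:

```
curl -s -X POST "https://kibana.example.com/api/saved_objects/_export" \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -u elastic:<password> \
  -d '{"type": ["dashboard"], "includeReferences": true}' \
  -o kibana-dashboards-$(date +%F).ndjson
```

The resulting file can be restored later through the corresponding `_import` endpoint.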
5. Log Collection Strategies
Collecting from stdout (recommended)
Have the application write its logs to stdout/stderr and let the container runtime capture them. Note that an exec-form CMD does not go through a shell, so redirections like `>> /proc/1/fd/1` would be passed to the program as literal arguments:

```dockerfile
# Dockerfile: log to stdout/stderr; the container runtime captures it
CMD ["java", "-jar", "app.jar"]
```
Sidecar pattern for file-based logs

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: file-log-app
spec:
  containers:
    - name: app
      image: my-app:latest
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
    - name: log-collector
      image: fluent/fluentd:v1.16-1
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
      env:
        - name: FLUENTD_CONF
          value: file-log.conf
  volumes:
    - name: app-logs
      emptyDir: {}
```
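The sidecar above references a `file-log.conf` that isn't shown. A minimal sketch (paths, tag, and output settings are illustrative) tails the shared volume and forwards to Elasticsearch:

```
# file-log.conf - illustrative sidecar config
<source>
  @type tail
  path /var/log/app/*.log
  pos_file /var/log/app/fluentd.pos
  tag app.file
  <parse>
    @type none
  </parse>
</source>

<match app.file>
  @type elasticsearch
  host elasticsearch.logging.svc.cluster.local
  port 9200
  scheme https
  logstash_format true
</match>
```

Note that the stock `fluent/fluentd` image does not bundle the Elasticsearch output; the `fluent-plugin-elasticsearch` gem must be installed in the sidecar image.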
6. Production Validation
Log completeness test

```shell
# Generate test logs
kubectl run log-test --image=busybox -- /bin/sh -c "while true; do echo 'Test log message'; sleep 5; done"
```

Then query in Kibana: `kubernetes.pod_name:"log-test" AND message:"Test log message"`
Load-test targets

| Metric | Target |
| --- | --- |
| Log collection latency | < 5 s |
| ES indexing rate | > 5000 docs/s |
| Fluentd CPU usage | < 50% per node |
7. Advanced Operations
Log lifecycle (ILM) policy

```
PUT _ilm/policy/log_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "7d"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```
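An ILM policy only takes effect once an index template references it. A sketch of the wiring, where the template name, index pattern, and rollover alias are illustrative:

```
PUT _index_template/app-logs
{
  "index_patterns": ["logstash-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "log_policy",
      "index.lifecycle.rollover_alias": "logstash"
    }
  }
}
```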
Monitoring and alerting
- Elasticsearch: cluster health, shard allocation, disk usage
- Fluentd: buffer queue length, output error rate
- Kibana: HTTP request latency, active connections
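The first set of Elasticsearch metrics above can be spot-checked by hand with the cluster APIs (host and credentials are placeholders):

```
# Cluster health: status should be "green"
curl -s -u elastic:<password> "https://elasticsearch.logging.svc.cluster.local:9200/_cluster/health?pretty"
# Per-node disk usage and shard allocation
curl -s -u elastic:<password> "https://elasticsearch.logging.svc.cluster.local:9200/_cat/allocation?v"
```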
Common Troubleshooting
Logs don't appear in Kibana
- Check the Fluentd logs: `kubectl logs -f fluentd-xxxxx`
- Verify that the Elasticsearch index template matches the incoming indices
- Check that network policies allow ports 9200/9300
High collection latency
- Tune Fluentd's flush_interval and chunk_limit_size
- Add Elasticsearch ingest nodes
- Check node I/O performance
With the configuration above, you will have a log management system with enterprise-grade reliability. Run disaster-recovery drills regularly and keep monitoring the system's health.