OOMKilled
Problem: an application instance (pod) restarts frequently.
Inspecting the pod with describe shows the following:
kubectl -n <yournamespace> describe pod <yourapplicationpodid>
Command:
  java
Args:
  -Denv=PRO
  -XX:+UnlockExperimentalVMOptions
  -XX:+UseCGroupMemoryLimitForHeap
  -XX:+UseConcMarkSweepGC
  -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=/data/applog/MICRO-MTK
  -Xms3816M
  -Xmx3816M
  -jar
  /usr/local/app.jar
  --server.port=8080
State: Running
  Started: Wed, 01 Feb 2023 17:10:14 +0800
Last State: Terminated
  Reason: OOMKilled
  Exit Code: 137
  Started: Wed, 01 Feb 2023 15:38:35 +0800
  Finished: Wed, 01 Feb 2023 17:10:13 +0800
Ready: True
Restart Count: 1
Limits:
  cpu: 2
  memory: 4Gi
Requests:
  cpu: 1
  memory: 4Gi
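The instance's logs contained no OOM-related entries, and a review of the code found no logic likely to cause a memory overflow.
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/data/applog/MICRO-MTK
Neither option was triggered: no heap dump was written. This suggests the JVM itself never raised an OutOfMemoryError.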
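The missing heap dump fits the Exit Code: 137 shown in the describe output. By the usual shell convention, an exit code above 128 means the process was ended by signal (code - 128), and signal 9 on Linux is SIGKILL, which the JVM cannot catch. A minimal sketch of that arithmetic follows; the class and method names are hypothetical and only serve as illustration.

```java
public class ExitCodeSignal {
    /**
     * Returns the signal number if the exit code follows the
     * "128 + signal" convention, or -1 if it does not.
     */
    public static int signalFromExitCode(int exitCode) {
        return (exitCode > 128 && exitCode < 256) ? exitCode - 128 : -1;
    }

    public static void main(String[] args) {
        // 137 is the exit code reported under "Last State" in the describe output.
        int signal = signalFromExitCode(137);
        System.out.println("exit code 137 -> signal " + signal);
    }
}
```

A process killed with SIGKILL gets no chance to run the heap dump hook, so an empty dump directory is what one would expect in this case.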
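Suspected cause:
The container memory limit is 4Gi, while the JVM heap is capped at about 3.7 GB (-Xmx3816M). The heap is only part of the JVM's footprint: metaspace, thread stacks, and direct buffers also count toward the container limit. With so little headroom, if garbage collection lags, the process can reach the container limit before the JVM raises its own OutOfMemoryError, and the container is killed.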
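The headroom in question can be worked out from the numbers in the describe output. The sketch below is only an illustration with hypothetical names; it treats everything outside the heap as one lump and does not model actual non-heap usage.

```java
public class HeapHeadroom {
    /** Memory left for everything outside the heap, in MiB. */
    public static long headroomMiB(long containerLimitMiB, long maxHeapMiB) {
        return containerLimitMiB - maxHeapMiB;
    }

    public static void main(String[] args) {
        long limitMiB = 4L * 1024;   // Limits: memory: 4Gi
        long oldHeapMiB = 3816;      // -Xmx3816M
        long newHeapMiB = 2L * 1024; // -Xmx2g, the adjusted setting

        System.out.println("headroom with -Xmx3816M: "
                + headroomMiB(limitMiB, oldHeapMiB) + " MiB");
        System.out.println("headroom with -Xmx2g: "
                + headroomMiB(limitMiB, newHeapMiB) + " MiB");
    }
}
```

With the original settings only 280 MiB remain for all non-heap memory; with a 2g heap the headroom grows to 2048 MiB.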
Adjustment:
Set both Xmx and Xms to 2g.
After continued observation, the instance returned to normal.
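While observing, it may help to confirm from inside the container that the new heap cap is in effect and to see how much non-heap memory the JVM reports. A minimal sketch using the standard java.lang.management API is shown below. The class name is hypothetical, and the non-heap figure reported by the JVM does not cover every native allocation (thread stacks, for example), so it should be read as a lower bound.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemoryReport {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx when set). */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Non-heap memory currently in use as reported by the JVM, in MiB. */
    public static long nonHeapUsedMiB() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage nonHeap = bean.getNonHeapMemoryUsage();
        return nonHeap.getUsed() / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
        System.out.println("non-heap used: " + nonHeapUsedMiB() + " MiB");
    }
}
```

Comparing these figures with the pod's memory limit gives a rough sense of how close the process runs to the 4Gi ceiling.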
Reference:
https://kubernetes.io/zh-cn/docs/concepts/configuration/manage-resources-containers/