012 k8s node OOM record

I. Tencent Cloud EventBridge alarm

(1) Given the alarm below, how to find the affected application and handle it:

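Cloud service product alarm notification
Dear Tencent Cloud user,
An event alarm has been triggered for the Cloud Virtual Machine product under your Tencent Cloud account (account ID: xxx). Please review and handle it promptly.
Alarm event: memory OOM
Alarm product: Cloud Virtual Machine
Alarm resource: uuid:9fdaxxx11-fb289621ff11 | deviceLanIp:10.x.x.x | deviceWanIp: | uniqVpcId:vpc-xx | instance: ins-xxx
Alarm region: ap-beijing
Event time: 2023-02-10 23:24:06 (UTC+08:00)
Event status: Notification
For more details, log in to the Tencent Cloud EventBridge console (https://console.cloud.tencent.com/eb) to view and manage events.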

(2) Log in to the machine for instance ins-xxx under account xxx and search the system logs for OOM entries:

root@k8s-node04-xx-xx:/var/log# egrep -i "oom" syslog*
syslog.1:Feb 10 23:20:06 k8s-node04-xx kernel: [55411453.207593] calico-node invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-997
syslog.1:Feb 10 23:20:06 k8s-node04-xx kernel: [55411453.223999]  oom_kill_process.cold+0xb/0x10
syslog.1:Feb 10 23:20:06 k8s-node04-xx kernel: [55411453.346001] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
syslog.1:Feb 10 23:20:06 k8s-node04-xx kernel: [55411453.346975] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=3b348612155fb5723c5a039c24304e00b1032b2ebdc2ef2995ca7e7a35cfc261,mems_allowed=0,global_oom,task_memcg=/kubepods/besteffort/pod2b9362bd-274f-407f-a6f5-ee30cfe367f7/0620891f6eb730a6c8b86cfa9062d9ee0b63bd7bb872e88a2e65ab54f43a8818,task=java,pid=2939478,uid=0
syslog.1:Feb 10 23:20:06 k8s-node04-xx kernel: [55411453.347082] Out of memory: Killed process 2939478 (java) total-vm:3737592kB, anon-rss:2342156kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:4820kB oom_score_adj:1000
syslog.1:Feb 10 23:20:06 k8s-node04-xx kernel: [55411453.482554] oom_reaper: reaped process 2939478 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

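The oom-kill line identifies the victim: the pod UID (2b9362bd-274f-407f-a6f5-ee30cfe367f7, taken from task_memcg) and the java PID (2939478). With these you can find the application affected by the OOM and keep a close watch on it. Note that constraint=CONSTRAINT_NONE and global_oom mean the node as a whole ran out of memory, rather than a single container hitting its cgroup limit; the killed pod sits under the besteffort cgroup with oom_score_adj 1000, which makes it the preferred victim.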
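To make the lookup repeatable, the fields can be pulled out of the oom-kill line with a small script. The following is a minimal sketch in Python, not part of the original troubleshooting session. The function name parse_oom_kill is hypothetical, and the regular expression assumes the cgroupfs-style path /kubepods/<qos>/pod<uid>/<container-id> seen in the log above; nodes using the systemd cgroup driver lay out the path differently and would need a different pattern.

```python
import re

# Sample copied from the oom-kill line in the syslog output above.
OOM_LINE = (
    "oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),"
    "cpuset=3b348612155fb5723c5a039c24304e00b1032b2ebdc2ef2995ca7e7a35cfc261,"
    "mems_allowed=0,global_oom,"
    "task_memcg=/kubepods/besteffort/pod2b9362bd-274f-407f-a6f5-ee30cfe367f7/"
    "0620891f6eb730a6c8b86cfa9062d9ee0b63bd7bb872e88a2e65ab54f43a8818,"
    "task=java,pid=2939478,uid=0"
)

# Assumption: cgroupfs driver layout. The QoS segment is optional because
# Guaranteed pods are placed directly under /kubepods in that layout.
OOM_KILL_RE = re.compile(
    r"task_memcg=/kubepods/(?:(?P<qos>besteffort|burstable)/)?"
    r"pod(?P<pod_uid>[0-9a-f-]+)/(?P<container_id>[0-9a-f]+)"
    r",task=(?P<task>[^,]+),pid=(?P<pid>\d+)"
)


def parse_oom_kill(line):
    """Return the pod UID, container ID, QoS class, task and PID, or None."""
    match = OOM_KILL_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # No QoS segment in the path is treated as a Guaranteed pod.
    info["qos"] = info["qos"] or "guaranteed"
    info["pid"] = int(info["pid"])
    return info


if __name__ == "__main__":
    for key, value in parse_oom_kill(OOM_LINE).items():
        print(f"{key}: {value}")
```

Given the pod UID, and assuming kubectl access to the cluster, one way to map it to a pod name is `kubectl get pods -A -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,UID:.metadata.uid` filtered with grep on the UID.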

posted @ 2023-02-11 21:40  arun_yh