The OOM Killer Mechanism
Introduction
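- This is a Linux kernel mechanism that steps in when memory is about to run out. It scores every process on how "bad" it is (processes that use a lot of memory, especially ones whose usage grows very quickly, score high) and kills the process with the highest score.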
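The score described above is exposed per process in /proc/<pid>/oom_score, so it is possible to look up which process currently ranks worst. A minimal sketch, assuming a Linux /proc filesystem; the helper names `read_oom_scores` and `worst_process` are made up for this illustration:

```python
import os


def read_oom_scores(proc_root="/proc"):
    """Return {pid: oom_score} for every process readable under proc_root."""
    scores = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only numeric directories are processes
            continue
        try:
            with open(os.path.join(proc_root, entry, "oom_score")) as f:
                scores[int(entry)] = int(f.read().strip())
        except (OSError, ValueError):
            # The process may have exited, or the file may be unreadable.
            continue
    return scores


def worst_process(scores):
    """Return the (pid, score) pair with the highest score, or None if empty."""
    if not scores:
        return None
    return max(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        print(worst_process(read_oom_scores()))
```

The result is only a snapshot: scores change as processes allocate and free memory, so the process reported here is not necessarily the one the kernel would pick later.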
How to Check
- shell (Ubuntu 18.04 LTS turned out to have no /var/log/messages file; CentOS 7 does have it)
    grep "Out of memory" /var/log/messages
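The same search can be scripted when the PID and process name are needed rather than the raw line. A minimal sketch: the regular expression is an assumption that covers the "Kill process" and "Killed process" wordings, the exact message text differs between kernel versions, and the sample line is made up for illustration rather than copied from a real log.

```python
import re

# Matches both "Kill process 123 (name)" and "Killed process 123 (name)".
# The wording varies between kernel versions, so treat this pattern as an
# assumption to adapt, not as a fixed format.
OOM_LINE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]*)\)")


def find_oom_kills(lines):
    """Return a list of (pid, process_name) for every matching log line."""
    kills = []
    for line in lines:
        match = OOM_LINE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


# Hypothetical sample line, only for demonstration.
sample = [
    "host kernel: Out of memory: Kill process 1234 (java) score 900 or sacrifice child",
    "host kernel: some unrelated message",
]
print(find_oom_kills(sample))  # prints [(1234, 'java')]
```

To scan a real log, pass an open file object instead of the sample list, for example `find_oom_kills(open("/var/log/messages", errors="replace"))`.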
When It Is Triggered
- When the kernel triggers the OOM mechanism it calls out_of_memory(). The call order is:
    __alloc_pages                               // called on memory allocation
      |--> __alloc_pages_nodemask
        |--> __alloc_pages_slowpath
          |--> __alloc_pages_may_oom            // the oom_killer_disabled flag is checked first; the default 0 means the OOM killer is enabled
            |--> out_of_memory                  // triggered
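- Linux manages all memory in pages. No matter how memory is requested, the request ends up in alloc_page(); when the allocation cannot be satisfied, the call chain above eventually reaches out_of_memory() and the OOM mechanism is triggered.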
- When the kernel detects that the system is short of memory it calls out_of_memory(). The source is as follows:
    /**
     * out_of_memory - kill the "best" process when we run out of memory
     * @oc: pointer to struct oom_control
     *
     * If we run out of memory, we have the choice between either
     * killing a random task (bad), letting the system crash (worse)
     * OR try to be smart about which process to kill. Note that we
     * don't have to be perfect here, we just have to be good.
     */
    bool out_of_memory(struct oom_control *oc)
    {
        // Amount of memory freed by the notifier callbacks
        unsigned long freed = 0;
        // Constraint that applies to the failing allocation
        enum oom_constraint constraint = CONSTRAINT_NONE;

        // The OOM killer has been disabled
        if (oom_killer_disabled)
            return false;

        // memcg is the kernel's memory controller for cgroups
        if (!is_memcg_oom(oc)) {
            // Run the blocking notifier call chain
            blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
            // Something was freed
            if (freed > 0)
                /* Got some memory back in the last second. */
                return true;
        }

        /*
         * If current has a pending SIGKILL or is exiting, then automatically
         * select it.  The goal is to allow it to allocate so that it may
         * quickly exit and free its memory.
         */
        if (task_will_free_mem(current)) {
            // Mark current as the OOM victim
            mark_oom_victim(current);
            // Wake the OOM reaper (quite a dramatic name)
            wake_oom_reaper(current);
            return true;
        }

        /*
         * The OOM killer does not compensate for IO-less reclaim.
         * pagefault_out_of_memory lost its gfp context so we have to
         * make sure exclude 0 mask - all other users should have at least
         * ___GFP_DIRECT_RECLAIM to get here.
         */
        if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS))
            return true;

        /*
         * Check if there were limitations on the allocation (only relevant for
         * NUMA and memcg) that may require different handling.
         */
        constraint = constrained_alloc(oc);
        if (constraint != CONSTRAINT_MEMORY_POLICY)
            oc->nodemask = NULL;
        check_panic_on_oom(oc, constraint);

        if (!is_memcg_oom(oc) && sysctl_oom_kill_allocating_task &&
            current->mm && !oom_unkillable_task(current, NULL, oc->nodemask) &&
            current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
            get_task_struct(current);
            oc->chosen = current;
            oom_kill_process(oc, "Out of memory (oom_kill_allocating_task)");
            return true;
        }

        // Pick the "worst" process to kill
        select_bad_process(oc);
        /* Found nothing?!?! */
        if (!oc->chosen) {
            dump_header(oc, NULL);
            pr_warn("Out of memory and no killable processes...\n");
            /*
             * If we got here due to an actual allocation at the
             * system level, we cannot survive this and will enter
             * an endless loop in the allocator. Bail out now.
             */
            if (!is_sysrq_oom(oc) && !is_memcg_oom(oc))
                panic("System is deadlocked on memory\n");
        }
        if (oc->chosen && oc->chosen != (void *)-1UL)
            oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
                             "Memory cgroup out of memory");
        return !!oc->chosen;
    }
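Two branches in out_of_memory() depend on administrator settings: check_panic_on_oom() and the sysctl_oom_kill_allocating_task test. To my knowledge these correspond to the vm.panic_on_oom and vm.oom_kill_allocating_task sysctls, readable under /proc/sys/vm. A minimal sketch for inspecting them, where the helper names `read_vm_sysctl` and `describe_oom_policy` are hypothetical:

```python
import os


def read_vm_sysctl(name, sysctl_root="/proc/sys/vm"):
    """Return the integer value of a vm sysctl, or None if it cannot be read."""
    try:
        with open(os.path.join(sysctl_root, name)) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def describe_oom_policy(sysctl_root="/proc/sys/vm"):
    """Collect the two settings that out_of_memory() consults."""
    return {
        # Non-zero: the kernel may panic instead of killing a process.
        "panic_on_oom": read_vm_sysctl("panic_on_oom", sysctl_root),
        # Non-zero: kill the task that triggered the OOM, skip the scan.
        "oom_kill_allocating_task": read_vm_sysctl(
            "oom_kill_allocating_task", sysctl_root
        ),
    }


if __name__ == "__main__":
    print(describe_oom_policy())
```

A value of None means the file was missing or unreadable, which is expected on non-Linux systems.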
- Selecting the worst process:
    /*
     * Simple selection loop. We choose the process with the highest number of
     * 'points'. In case scan was aborted, oc->chosen is set to -1.
     */
    static void select_bad_process(struct oom_control *oc)
    {
        if (is_memcg_oom(oc))
            mem_cgroup_scan_tasks(oc->memcg, oom_evaluate_task, oc);
        else {
            struct task_struct *p;

            // Take the RCU (Read-Copy-Update) read lock
            rcu_read_lock();
            // Walk every process
            for_each_process(p)
                if (oom_evaluate_task(p, oc))
                    break;
            rcu_read_unlock();
        }

        oc->chosen_points = oc->chosen_points * 1000 / oc->totalpages;
    }
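The last statement of select_bad_process() rescales the winner's points to a 0..1000 range relative to the total number of pages. A minimal sketch of that arithmetic, assuming the per-process points are already known (how oom_evaluate_task() computes them is not shown in the source above); the function names and sample numbers are made up:

```python
def normalize_points(chosen_points, totalpages):
    """Mirror `chosen_points * 1000 / totalpages` with integer division."""
    return chosen_points * 1000 // totalpages


def pick_worst(points_by_pid, totalpages):
    """Return (pid, normalized_score) for the highest-scoring pid, or None."""
    if not points_by_pid:
        return None
    pid = max(points_by_pid, key=points_by_pid.get)
    return pid, normalize_points(points_by_pid[pid], totalpages)


# Hypothetical numbers: three processes on a machine with 2,000,000 pages.
points = {101: 50_000, 202: 500_000, 303: 120_000}
print(pick_worst(points, 2_000_000))  # prints (202, 250)
```

Here pid 202 holds a quarter of all pages, so its normalized score is 250 out of 1000.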
- Killing the process:
    static void oom_kill_process(struct oom_control *oc, const char *message)
    {
        struct task_struct *victim = oc->chosen;
        struct mem_cgroup *oom_group;
        static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
                                      DEFAULT_RATELIMIT_BURST);

        /*
         * If the task is already exiting, don't alarm the sysadmin or kill
         * its children or threads, just give it access to memory reserves
         * so it can die quickly
         */
        task_lock(victim);
        if (task_will_free_mem(victim)) {
            mark_oom_victim(victim);
            wake_oom_reaper(victim);
            task_unlock(victim);
            put_task_struct(victim);
            return;
        }
        task_unlock(victim);

        if (__ratelimit(&oom_rs))
            dump_header(oc, victim);

        /*
         * Do we need to kill the entire memory cgroup?
         * Or even one of the ancestor memory cgroups?
         * Check this out before killing the victim task.
         */
        oom_group = mem_cgroup_get_oom_group(victim, oc->memcg);

        __oom_kill_process(victim, message);

        /*
         * If necessary, kill all tasks in the selected memory cgroup.
         */
        if (oom_group) {
            mem_cgroup_print_oom_group(oom_group);
            mem_cgroup_scan_tasks(oom_group, oom_kill_memcg_member,
                                  (void*)message);
            mem_cgroup_put(oom_group);
        }
    }
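Why Linux May Kill Your Process
- A memory leak.
- The process needs more memory than the system can provide.
- Another process on the same host may be using too many resources, but your process has the highest score, so the OOM killer kills yours instead. That is rather unfair.
- It even seems possible that a single task in a memory cgroup causes every task in that cgroup to be killed (see the oom_group branch in oom_kill_process()).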
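On the last point: as far as I know, the oom_group branch is controlled by the cgroup v2 file memory.oom.group, and when it contains 1 the whole cgroup is killed together. A minimal sketch for checking it, assuming cgroup v2 is mounted at /sys/fs/cgroup; the helper names `cgroup_v2_path` and `oom_group_enabled` are hypothetical:

```python
import os


def cgroup_v2_path(cgroup_file="/proc/self/cgroup"):
    """Return the cgroup v2 path of a process, or None if not found."""
    try:
        with open(cgroup_file) as f:
            for line in f:
                # cgroup v2 entries look like "0::/some/path".
                hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
                if hierarchy_id == "0" and controllers == "":
                    return path
    except (OSError, ValueError):
        return None
    return None


def oom_group_enabled(cgroup_path, cgroup_root="/sys/fs/cgroup"):
    """Return True/False for memory.oom.group, or None if it cannot be read."""
    target = os.path.join(cgroup_root, cgroup_path.lstrip("/"), "memory.oom.group")
    try:
        with open(target) as f:
            return f.read().strip() == "1"
    except OSError:
        return None


if __name__ == "__main__":
    path = cgroup_v2_path()
    print(path, oom_group_enabled(path) if path is not None else None)
```

A result of None means the file could not be read, for example on a cgroup v1 system or in the root cgroup, so it says nothing about group kills either way.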