Softlockup & Hardlockup Detection Mechanisms

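Introduction#

Linux has several built-in anomaly detection mechanisms, of which softlockup and hardlockup are two typical examples. The softlockup detector checks whether the kernel has gone a long time without scheduling any other task. The hardlockup detector goes a step further and checks whether the kernel has gone a long time without responding to interrupts. softlockup and hardlockup are defined as follows: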

A 'softlockup' is defined as a bug that causes the kernel to loop in kernel mode for more than 20 seconds, without giving other tasks a chance to run.
A 'hardlockup' is defined as a bug that causes the CPU to loop in kernel mode for more than 10 seconds, without letting other interrupts have a chance to run.

The two detection mechanisms are similar, so they share one design. Their detection targets differ, though, so the implementations differ in places.

watchdog#

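The watchdog mechanism is a common keep-alive technique. A task runs periodically and checks whether some value has been updated. This check is called "watching the dog", and the act of updating the value is called "touching the dog".
softlockup and hardlockup detection works per CPU, so every CPU has two dogs, one for softlockup and one for hardlockup:

  • softlockup dog: watchdog_touch_ts, the timestamp of the last touch.
  • hardlockup dog: hrtimer_interrupts, the number of hrtimer (high-resolution timer) interrupts that have occurred.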
static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);

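The kernel runs three classes of code. In descending priority they are NMI handlers, normal interrupt handlers, and tasks. In essence, softlockup checks whether tasks are still being scheduled while NMIs and normal interrupts are serviced normally, and hardlockup checks whether normal interrupts are still being serviced while NMIs are serviced normally.

Note: an NMI is a non-maskable interrupt, so its handler runs even when normal interrupts are disabled.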

softlockup#

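To meet this goal, softlockup needs a kernel thread that touches the dog (updates watchdog_touch_ts), and that thread must be started together with the softlockup check. It also needs a periodic timer that checks whether the distance between watchdog_touch_ts and now exceeds a threshold. If it does, a softlockup is reported. The default timeout softlockup_thresh is 20s (2 * watchdog_thresh). The check is implemented in is_softlockup (kernel/watchdog.c):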

static int is_softlockup(unsigned long touch_ts)
{
    unsigned long now = get_timestamp();

    if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh){
        /* Warn about unreasonable delays. */
        if (time_after(now, touch_ts + get_softlockup_thresh()))
            return now - touch_ts;
    }
    return 0;
}

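For softlockup detection to be reliable, the task that updates watchdog_touch_ts must have the highest task priority. Otherwise a low-priority task might fail to update the timestamp in time even though scheduling is working normally. In older kernels the task that updated watchdog_touch_ts was [watchdog/x]. After the STOP scheduling class (which has higher priority than real-time tasks) was introduced, the updating thread became [migration/x].

The migration thread is the highest-priority thread in the kernel and handles work such as CPU hotplug and stopping a CPU. It manages a work_queue. When there is work to do, migration becomes RUNNABLE and waits to be scheduled. As soon as scheduling happens, migration is guaranteed to get the CPU and update watchdog_touch_ts, which keeps the softlockup check reliable.

The softlockup check itself must run in a higher-priority interrupt context. The kernel's hrtimer can raise interrupts periodically, and its handler watchdog_timer_fn checks whether [migration/x] has updated watchdog_touch_ts as expected. The hrtimer period is softlockup_thresh / 5 (4s by default).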

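The overall flow of the softlockup check is:

  • hrtimer periodically fires the interrupt handler watchdog_timer_fn
    1. Insert the work item softlockup_fn into the work_queue
    2. Check whether watchdog_touch_ts is stale
    3. Return and wait for the next expiry
  • migration thread
    1. Woken by the work_queue
    2. Check the queue, take out softlockup_fn and run it
    3. Update watchdog_touch_ts
    4. The work_queue is empty, go back to sleep

If the migration thread stays queued for a long time without being scheduled (a task on that CPU is hogging it), a softlockup has occurred and the state of the CPU needs to be dumped.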

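Figure: overall flow of the softlockup check. The hrtimer fires periodically. In interrupt context it inserts softlockup_fn into the work_queue, wakes the migration thread, and performs the softlockup check (watch) on watchdog_touch_ts. The migration thread, once woken, checks its queue and runs softlockup_fn, which touches watchdog_touch_ts.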

hardlockup#

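hardlockup detection works much like softlockup detection, but the target differs: hardlockup detects normal interrupts going unserviced for a long time. The check is implemented in is_hardlockup in kernel/watchdog.c, which tests whether hrtimer_interrupts is still increasing. If it has not increased, a hardlockup is reported.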

/* watchdog detector functions */
bool is_hardlockup(void)
{
    unsigned long hrint = __this_cpu_read(hrtimer_interrupts);

    if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
        return true;

    __this_cpu_write(hrtimer_interrupts_saved, hrint);
    return false;
}

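The default hardlockup timeout watchdog_thresh is 10s, half the softlockup timeout. Unlike softlockup, hrtimer_interrupts carries no timestamp, so how is the timeout determined?
Linux uses a periodic NMI. It is based on the cycles event of the perf subsystem: a perf counter can be given an overflow threshold, and when the number of events reaches that threshold an NMI is raised. Cycle counts are also related to elapsed time. For details see watchdog_nmi_enable in kernel/watchdog.c. Following the call chain leads to hardlockup_detector_event_create (in kernel/watchdog_hld.c), which calls hw_nmi_get_sample_period (in arch/x86/kernel/apic/hw_nmi.c). This function is architecture-specific. It returns the number of cycles after which the counter overflows, chosen so that the NMI fires once per watchdog_thresh seconds: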

u64 hw_nmi_get_sample_period(int watchdog_thresh)
{
    return (u64)(cpu_khz) * 1000 * watchdog_thresh;
}

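The periodic NMI runs a callback that watches (checks whether hrtimer_interrupts has increased), while the hrtimer touches (increments hrtimer_interrupts) at regular intervals.

hardlockup and softlockup intersect at the hrtimer, so the hrtimer handler must both watch watchdog_touch_ts for the softlockup check and touch hrtimer_interrupts to update the interrupt count.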

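NOTE: updated 2024-03-15
The hardlockup timeout relies on the period of the cycles NMI. On CPUs with turbo mode, however, deriving time from a cycle count becomes inaccurate: the NMI period shrinks and causes false positives. In turbo mode the CPU adjusts its frequency and shuts cores down automatically. For example, when a single-threaded program runs on a quad-core processor, three cores may be shut down and the frequency of the running core raised, which improves performance and reduces power consumption. This causes two problems. First, the dynamic frequency makes the cycles-based NMI period inaccurate. Second, the clock of a stopped CPU is not updated. For this scenario the kernel provides the configuration option CONFIG_HARDLOCKUP_CHECK_TIMESTAMP. With it enabled, the NMI callback checks a timestamp and runs the hardlockup check only if 4/5 * watchdog_thresh (enough to guarantee at least one hrtimer_interrupts update) has elapsed since the previous check. In addition, if ktime is jiffies based (updated once per timer interrupt), jiffies do not advance on a stopped CPU, so a counter, nmi_rearmed, is used to decide whether enough time has passed. The code is as follows: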

#ifdef CONFIG_HARDLOCKUP_CHECK_TIMESTAMP
static DEFINE_PER_CPU(ktime_t, last_timestamp);
static DEFINE_PER_CPU(unsigned int, nmi_rearmed);
static ktime_t watchdog_hrtimer_sample_threshold __read_mostly;

void watchdog_update_hrtimer_threshold(u64 period)
{
	/*
	 * The hrtimer runs with a period of (watchdog_threshold * 2) / 5
	 *
	 * So it runs effectively with 2.5 times the rate of the NMI
	 * watchdog. That means the hrtimer should fire 2-3 times before
	 * the NMI watchdog expires. The NMI watchdog on x86 is based on
	 * unhalted CPU cycles, so if Turbo-Mode is enabled the CPU cycles
	 * might run way faster than expected and the NMI fires in a
	 * smaller period than the one deduced from the nominal CPU
	 * frequency. Depending on the Turbo-Mode factor this might be fast
	 * enough to get the NMI period smaller than the hrtimer watchdog
	 * period and trigger false positives.
	 *
	 * The sample threshold is used to check in the NMI handler whether
	 * the minimum time between two NMI samples has elapsed. That
	 * prevents false positives.
	 *
	 * Set this to 4/5 of the actual watchdog threshold period so the
	 * hrtimer is guaranteed to fire at least once within the real
	 * watchdog threshold.
	 */
	watchdog_hrtimer_sample_threshold = period * 2;
}

static bool watchdog_check_timestamp(void)
{
	ktime_t delta, now = ktime_get_mono_fast_ns();

	delta = now - __this_cpu_read(last_timestamp);
	if (delta < watchdog_hrtimer_sample_threshold) {
		/*
		 * If ktime is jiffies based, a stalled timer would prevent
		 * jiffies from being incremented and the filter would look
		 * at a stale timestamp and never trigger.
		 */
		if (__this_cpu_inc_return(nmi_rearmed) < 10)
			return false;
	}
	__this_cpu_write(nmi_rearmed, 0);
	__this_cpu_write(last_timestamp, now);
	return true;
}
#else
static inline bool watchdog_check_timestamp(void)
{
	return true;
}
#endif

watchdog configuration interfaces#

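Enable or disable the watchdog:

  • /proc/sys/kernel/soft_watchdog: enable or disable softlockup detection
  • /proc/sys/kernel/nmi_watchdog: enable or disable hardlockup detection
  • /proc/sys/kernel/watchdog: enable or disable both softlockup and hardlockup detection. Reading it returns the logical OR of soft_watchdog and nmi_watchdog.

Select which cores run the watchdog:

  • /proc/sys/kernel/watchdog_cpumask

Set the lockup timeout threshold:

  • /proc/sys/kernel/watchdog_thresh: sets the NMI watchdog timeout. softlockup_thresh is 2 * watchdog_thresh.

Set what happens on timeout:

  • /proc/sys/kernel/hardlockup_panic: whether to panic when a hardlockup occurs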

Relevant source code#

watchdog initialization#

Execution flow of watchdog_enable (kernel/watchdog.c):

  1. Start the hrtimer
    • Set the hrtimer interrupt handler to watchdog_timer_fn
    • Set the timer period to 2 * watchdog_thresh / 5
  2. Create the cycles perf event: watchdog_nmi_enable -> hardlockup_detector_perf_enable -> hardlockup_detector_event_create
    • Set the NMI period to watchdog_thresh
    • Set the interrupt handler to watchdog_overflow_callback
static void watchdog_enable(unsigned int cpu)
{   
    ...
    hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
    hrtimer->function = watchdog_timer_fn;
    hrtimer_start(hrtimer, ns_to_ktime(sample_period),
              HRTIMER_MODE_REL_PINNED_HARD);
    ...
    if (watchdog_enabled & NMI_WATCHDOG_ENABLED)
        watchdog_nmi_enable(cpu);
    ...
}

hrtimer#

Execution flow of watchdog_timer_fn (kernel/watchdog.c):

  1. Increment hrtimer_interrupts
  2. Insert softlockup_fn into the migration thread's work_queue so that migration becomes runnable
  3. Check for softlockup
/* watchdog kicker functions */
static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
    ...
    /* kick the hardlockup detector */
    watchdog_interrupt_count();
    ...
    /* kick the softlockup detector */
    if (completion_done(this_cpu_ptr(&softlockup_completion))) {
        reinit_completion(this_cpu_ptr(&softlockup_completion));
        stop_one_cpu_nowait(smp_processor_id(),
                softlockup_fn, NULL,
                this_cpu_ptr(&softlockup_stop_work));
    }
    duration = is_softlockup(touch_ts);
    if (unlikely(duration)) {
        ....
    }
    return HRTIMER_RESTART;
}

cycles NMI#

When the cycles counter overflows, the NMI callback watchdog_overflow_callback (kernel/watchdog_hld.c) checks for hardlockup:

static void watchdog_overflow_callback(struct perf_event *event,
                       struct perf_sample_data *data,
                       struct pt_regs *regs)
{
    if (is_hardlockup()) {
        ....
    }
    ...
    return;
}

References#

[1] lockup-watchdogs

posted by ZouTaooo