OpenWrt (11): How failsafe mode is triggered and how it runs

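Failsafe is a safe mode: when a configuration error or some other problem leaves the device unreachable, it boots a minimal configuration so that the user can regain control of the device.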

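Why is failsafe needed (a configuration error, or the device cannot be reached)? How is it triggered (key press or button, kernel command line, environment variable, network)? What happens once it is triggered (only a limited set of services such as SSH and a shell are started, the minimum needed to recover the system; the IP address is 192.168.1.1 with netmask 255.255.255.0)? What are its limitations (the root filesystem must be on a read-only partition, such as squashfs)?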

The rest of this article walks through the failsafe boot flow.

1 Failsafe-related flow in /etc/preinit

/etc/preinit sources the following scripts under /lib/preinit:

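lib/preinit/
├── 00_preinit.conf -- configuration file that defines the preinit parameters.
├── 02_default_set_state -- preinit_main hook: sources /etc/diag.sh and /lib/functions/leds.sh to provide set_state(), whose argument is one of preinit, failsafe, preinit_regular, upgrade, done.
├── 02_sysinfo -- preinit_main hook: reads system information such as the board name and model from the Device Tree and stores it under /tmp/sysinfo.
├── 10_indicate_failsafe -- failsafe hook: announces failsafe mode on the console, over the network, and via the LED.
├── 10_indicate_preinit -- preinit_main hook: configures the IP address of the preinit network interface, then announces the preinit stage over the network and via the LED.
├── 30_failsafe_wait -- preinit_main hook: waits for user input during boot to decide whether to enter failsafe mode.
├── 40_run_failsafe_hook -- preinit_main hook: runs the failsafe hooks once failsafe mode has been selected.
├── 50_indicate_regular_preinit -- preinit_main hook: announces a regular boot over the network and via the LED.
├── 70_initramfs_test -- preinit_main hook: detects whether an initramfs is in use and acts accordingly.
├── 80_mount_root -- preinit_main hook: mounts the root filesystem and restores the configuration files after a system upgrade. The hook is not added when the INITRAMFS environment variable is "1".
├── 81_urandom_seed -- writes the contents of the random seed file to /dev/urandom to improve the quality of the system's randomness.
├── 99_10_failsafe_dropbear -- failsafe hook: starts a Dropbear SSH server so the device can be reached remotely in failsafe mode.
├── 99_10_failsafe_login -- failsafe hook: starts a login shell in failsafe mode.
└── 99_10_run_init -- preinit_main hook: performs cleanup at the end of preinit, such as de-configuring the preinit network setup.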

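The scripts involved in failsafe are:

02_default_set_state: provides the LED indication functions.
10_indicate_failsafe: announces failsafe mode.
30_failsafe_wait: decides whether to enter failsafe mode, waiting for input if necessary.
40_run_failsafe_hook, 99_10_failsafe_dropbear, 99_10_failsafe_login: enter failsafe mode and start the SSH and shell services.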
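
Every script above registers its function with boot_hook_add, and the registered functions are later run by boot_run_hook. The following is a minimal sketch of such a registry, assuming a plain per-hook list variable; it is a simplified stand-in written for illustration, not the upstream implementation, and the three hook functions are stubs that only print a word.

```shell
#!/bin/sh
# Simplified stand-in for the preinit hook registry (assumption: the real
# OpenWrt helpers are more elaborate than this).

# Register function $2 on hook $1 by appending it to a per-hook list variable.
boot_hook_add() {
    local hook="$1" func="$2"
    eval "hooks_${hook}=\"\${hooks_${hook}} \$func\""
}

# Run every function registered on hook $1, in registration order.
boot_run_hook() {
    local hook="$1" func list
    eval "list=\"\${hooks_${hook}}\""
    for func in $list; do
        "$func"
    done
}

# Stub hook functions; the names mirror the failsafe hooks listed above.
indicate_failsafe() { echo "indicate"; }
failsafe_dropbear() { echo "dropbear"; }
failsafe_shell()    { echo "shell"; }

boot_hook_add failsafe indicate_failsafe
boot_hook_add failsafe failsafe_dropbear
boot_hook_add failsafe failsafe_shell

# Prints the three stub words, one per line, in registration order.
boot_run_hook failsafe
```

In this sketch the execution order is simply the order of the boot_hook_add calls, which is why the numeric prefixes of the file names matter when the files are sourced in sequence.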

1.1 02_default_set_state

/etc/diag.sh provides set_state(), which calls a different LED function depending on the state:

done -> status_led_off/status_led_on
preinit -> status_led_blink_preinit
failsafe -> status_led_blink_failsafe
preinit_regular/upgrade -> status_led_blink_preinit_regular
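
The mapping above can be sketched as a single case statement. This is an illustration only: the status_led_* functions are stubs that print what they would do, and for the done state the sketch assumes status_led_off followed by status_led_on, since the mapping lists both without saying how they are combined.

```shell
#!/bin/sh
# Stubs standing in for the helpers implemented in /lib/functions/leds.sh.
status_led_on()                    { echo "led on"; }
status_led_off()                   { echo "led off"; }
status_led_blink_preinit()         { echo "blink preinit"; }
status_led_blink_failsafe()        { echo "blink failsafe"; }
status_led_blink_preinit_regular() { echo "blink preinit_regular"; }

# Dispatch on the state name, following the mapping in the text.
set_state() {
    case "$1" in
    preinit)
        status_led_blink_preinit
        ;;
    failsafe)
        status_led_blink_failsafe
        ;;
    preinit_regular | upgrade)
        status_led_blink_preinit_regular
        ;;
    "done")
        # Assumption: turn the LED off, then on.
        status_led_off
        status_led_on
        ;;
    esac
}

set_state failsafe
```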

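/lib/functions/leds.sh implements the functions that diag.sh calls.

Three LED triggers are involved: none, timer, and heartbeat.

The blink functions are shown below. The two numbers passed to led_timer are the on and off times in milliseconds, so preinit blinks 5 times per second, failsafe 10 times per second, and preinit_regular 2.5 times per second.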

status_led_blink_preinit() {
    led_timer $status_led 100 100
}

status_led_blink_failsafe() {
    led_timer $status_led 50 50
}

status_led_blink_preinit_regular() {
    led_timer $status_led 200 200
}
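
The blink rates quoted above follow from the two led_timer arguments: one full cycle lasts on + off milliseconds, so the rate is 1000 / (on + off) cycles per second. A small sketch of that arithmetic, where blink_rate is a hypothetical helper name used only for this example:

```shell
#!/bin/sh
# blink_rate ON_MS OFF_MS: print the number of full on/off cycles per second.
blink_rate() {
    awk -v t_on="$1" -v t_off="$2" 'BEGIN { printf "%g\n", 1000 / (t_on + t_off) }'
}

blink_rate 100 100   # preinit
blink_rate 50 50     # failsafe
blink_rate 200 200   # preinit_regular
```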

1.2 10_indicate_failsafe

Tells the user that the device is entering failsafe mode:

indicate_failsafe_led () {
    set_state failsafe
}

indicate_failsafe() {
    [ "$pi_preinit_no_failsafe" = "y" ] && return
    echo "- failsafe -"
    preinit_net_echo "Entering Failsafe!\n"  # broadcast the failsafe notice over the network
    indicate_failsafe_led  # blink the LED in the failsafe pattern
}

boot_hook_add failsafe indicate_failsafe

1.3 30_failsafe_wait

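30_failsafe_wait implements the logic that decides whether to enter failsafe mode: it waits for a key press and acts on the key that was entered. This gives the administrator or user a chance to intervene when the system would otherwise fail to boot.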

fs_wait_for_key () {
    local timeout=$3
    local timer
    local do_keypress
    local keypress_true="$(mktemp)"
    local keypress_wait="$(mktemp)"
    local keypress_sec="$(mktemp)"
    if [ -z "$keypress_wait" ]; then
        keypress_wait=/tmp/.keypress_wait
        touch $keypress_wait
    fi
    if [ -z "$keypress_true" ]; then
        keypress_true=/tmp/.keypress_true
        touch $keypress_true
    fi
    if [ -z "$keypress_sec" ]; then
        keypress_sec=/tmp/.keypress_sec
        touch $keypress_sec
    fi

    trap "echo 'true' >$keypress_true; lock -u $keypress_wait ; rm -f $keypress_wait" INT
    trap "echo 'true' >$keypress_true; lock -u $keypress_wait ; rm -f $keypress_wait" USR1

    [ -n "$timeout" ] || timeout=1
    [ $timeout -ge 1 ] || timeout=1
    timer=$timeout
    lock $keypress_wait
    {
        while [ $timer -gt 0 ]; do
            pi_failsafe_net_message=true \
                preinit_net_echo "Please press button now to enter failsafe"
            echo "$timer" >$keypress_sec
            timer=$(($timer - 1))
            sleep 1
        done
        lock -u $keypress_wait
        rm -f $keypress_wait
    } &

    [ "$pi_preinit_no_failsafe" != "y" ] && echo "Press the [$1] key and hit [enter] $2"
    echo "Press the [1], [2], [3] or [4] key and hit [enter] to select the debug level"
    # if we're on the console we wait for input
    {
        while [ -r $keypress_wait ]; do
            timer="$(cat $keypress_sec)"

            [ -n "$timer" ] || timer=1
            timer="${timer%%\ *}"
            [ $timer -ge 1 ] || timer=1
            do_keypress=""
            {
                read -t "$timer" do_keypress
                case "$do_keypress" in
                $1)
                    echo "true" >$keypress_true
                    ;;
                1 | 2 | 3 | 4)
                    echo "$do_keypress" >/tmp/debug_level
                    ;;
                *)
                    continue;
                    ;;
                esac
                lock -u $keypress_wait
                rm -f $keypress_wait
            }
        done
    }
    lock -w $keypress_wait

    keypressed=1
    [ "$(cat $keypress_true)" = "true" ] && keypressed=0

    rm -f $keypress_true
    rm -f $keypress_wait
    rm -f $keypress_sec

    return $keypressed
}

failsafe_wait() {
    FAILSAFE=
    # If pi_preinit_no_failsafe is "y", failsafe is disabled: still wait for a
    # debug-level key, but never set FAILSAFE.
    [ "$pi_preinit_no_failsafe" = "y" ] && {
        fs_wait_for_key "" "" $fs_failsafe_wait_timeout
        return
    }
    # A failsafe= parameter in /proc/cmdline sets FAILSAFE to "true".
    grep -q 'failsafe=' /proc/cmdline && FAILSAFE=true && export FAILSAFE
    if [ "$FAILSAFE" != "true" ]; then
        # Otherwise call fs_wait_for_key and wait for the user to press [f] and hit [enter].
        fs_wait_for_key f 'to enter failsafe mode' $fs_failsafe_wait_timeout && FAILSAFE=true
        # If /tmp/failsafe_button exists, the failsafe button was pressed.
        [ -f "/tmp/failsafe_button" ] && FAILSAFE=true && echo "- failsafe button "$(cat /tmp/failsafe_button)" was pressed -"
        # If FAILSAFE is "true", export it and create the marker file /tmp/failsafe.
        [ "$FAILSAFE" = "true" ] && export FAILSAFE && touch /tmp/failsafe
    fi
}

boot_hook_add preinit_main failsafe_wait
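
The case statement inside fs_wait_for_key is essentially a three-way classification of the line read from the console. A minimal sketch of just that classification, with the timeout, locking and temporary files left out; classify_key is a hypothetical helper name introduced for this example:

```shell
#!/bin/sh
# classify_key FAILSAFE_KEY INPUT
# Prints what a line typed on the console would mean during the wait window.
classify_key() {
    case "$2" in
    "$1")
        # The configured failsafe key (f in failsafe_wait).
        echo "failsafe"
        ;;
    1 | 2 | 3 | 4)
        # A digit selects the debug level instead.
        echo "debug_level=$2"
        ;;
    *)
        # Anything else is ignored and the wait continues.
        echo "ignore"
        ;;
    esac
}

classify_key f f
classify_key f 3
classify_key f x
```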

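To summarize, the following influence whether failsafe mode is entered:

Key press or button: if the user presses the designated key ([f] followed by [enter] on the console) or the failsafe button during the boot wait window, the system enters failsafe mode.
Kernel command line: if /proc/cmdline contains the string failsafe= (for example failsafe=1), failsafe_wait sets FAILSAFE to "true" without waiting for a key.
Environment variable: setting pi_preinit_no_failsafe to "y" disables failsafe mode. failsafe_wait then only waits for a debug-level key, and run_failsafe_hook returns immediately.
Network: some systems may trigger failsafe mode through a signal or command sent over the network, especially in remote-management scenarios. The scripts shown here only broadcast messages with preinit_net_echo and do not listen for such a trigger.
FAILSAFE: once this variable is "true" (set by one of the mechanisms above), run_failsafe_hook runs the functions registered on the failsafe hook.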
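
The order in which failsafe_wait evaluates these conditions can be sketched as a function that takes its inputs as parameters, so it can be run on any Linux machine. This is an illustration under stated assumptions: decide_failsafe is a hypothetical name, the file paths are temporary stand-ins for /proc/cmdline and /tmp/failsafe_button, and the third argument replaces the interactive fs_wait_for_key call.

```shell
#!/bin/sh
# decide_failsafe CMDLINE_FILE BUTTON_FILE KEY_PRESSED
# Prints "failsafe" or "normal", checking the conditions in the same order
# as failsafe_wait. KEY_PRESSED ("yes"/"no") stands in for fs_wait_for_key.
decide_failsafe() {
    local cmdline="$1" button="$2" key_pressed="$3"
    [ "$pi_preinit_no_failsafe" = "y" ] && { echo "normal"; return 0; }
    grep -q 'failsafe=' "$cmdline" 2>/dev/null && { echo "failsafe"; return 0; }
    [ "$key_pressed" = "yes" ] && { echo "failsafe"; return 0; }
    [ -f "$button" ] && { echo "failsafe"; return 0; }
    echo "normal"
}

tmp="$(mktemp -d)"
echo "console=ttyS0,115200 root=/dev/mtdblock2" >"$tmp/cmdline_plain"
echo "console=ttyS0,115200 failsafe=1" >"$tmp/cmdline_failsafe"

decide_failsafe "$tmp/cmdline_plain" "$tmp/button" no       # nothing triggers
decide_failsafe "$tmp/cmdline_failsafe" "$tmp/button" no    # kernel command line
decide_failsafe "$tmp/cmdline_plain" "$tmp/button" yes      # key press
touch "$tmp/button"
decide_failsafe "$tmp/cmdline_plain" "$tmp/button" no       # button marker file

# With failsafe disabled, none of the triggers has any effect.
pi_preinit_no_failsafe=y
decide_failsafe "$tmp/cmdline_failsafe" "$tmp/button" yes
unset pi_preinit_no_failsafe

rm -rf "$tmp"
```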

1.4 40_run_failsafe_hook

Runs the failsafe hooks instead of continuing the normal boot when FAILSAFE is "true".

run_failsafe_hook() {
    # If pi_preinit_no_failsafe is "y", skip the failsafe check entirely.
    [ "$pi_preinit_no_failsafe" = "y" ] && return
    if [ "$FAILSAFE" = "true" ]; then
        # Create the lock file /tmp/.failsafe so that other boot steps do not interfere with failsafe mode.
        lock /tmp/.failsafe
        # Run everything registered on the failsafe hook: indicate_failsafe, failsafe_dropbear, failsafe_shell.
        boot_run_hook failsafe
        # Stay here, waiting on the lock, until /tmp/sysupgrade exists; the normal boot never resumes.
        while [ ! -e /tmp/sysupgrade ]; do
            lock -w /tmp/.failsafe
        done
        exit
    fi
}

boot_hook_add preinit_main run_failsafe_hook

1.5 99_10_failsafe_dropbear

Starts a Dropbear SSH server so the device can be reached remotely in failsafe mode.

#!/bin/sh

failsafe_dropbear () {
    # Generate two temporary SSH host keys with dropbearkey: one RSA, one ED25519.
    dropbearkey -t rsa -s 1024 -f /tmp/dropbear_rsa_failsafe_host_key
    dropbearkey -t ed25519 -f /tmp/dropbear_ed25519_failsafe_host_key
    # Start the SSH server with dropbear, passing both key files generated above.
    dropbear -r /tmp/dropbear_rsa_failsafe_host_key -r /tmp/dropbear_ed25519_failsafe_host_key <> /dev/null 2>&1
}

boot_hook_add failsafe failsafe_dropbear

1.6 99_10_failsafe_login

Starts a login shell in failsafe mode.

failsafe_shell() {
    # Find the console device named by the console= kernel parameter in /proc/cmdline.
    local console="$(sed -e 's/ /\n/g' /proc/cmdline | grep '^console=' | head -1 | sed -e 's/^console=//' -e 's/,.*//')"
    # If /proc/cmdline names no console device, fall back to "console".
    [ -n "$console" ] || console=console
    # Return without starting a shell unless /dev/$console exists and is a character device.
    [ -c "/dev/$console" ] || return 0
    while true; do
        # Input, output and errors are all attached to the console device.
        ash --login <"/dev/$console" >"/dev/$console" 2>"/dev/$console"
        sleep 1
    # The trailing & puts the whole loop in the background: the shell is restarted
    # whenever it exits, and the remaining boot steps are not blocked.
    done &
}

boot_hook_add failsafe failsafe_shell
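
The console lookup in failsafe_shell can be tried outside a router by passing the command line as a string. A minimal sketch, assuming a hypothetical helper name console_from_cmdline and using tr instead of the sed newline substitution so that it behaves the same across sed implementations:

```shell
#!/bin/sh
# console_from_cmdline CMDLINE
# Prints the device named by the first console= parameter, without the
# ",115200"-style options, or "console" if there is no such parameter.
console_from_cmdline() {
    local console
    console="$(printf '%s\n' "$1" | tr ' ' '\n' | grep '^console=' | head -n 1 | sed -e 's/^console=//' -e 's/,.*//')"
    [ -n "$console" ] || console=console
    echo "$console"
}

console_from_cmdline "console=ttyS0,115200 root=/dev/mtdblock2"
console_from_cmdline "root=/dev/mtdblock2 rootfstype=squashfs"
```

The first call prints the device name taken from console=, the second falls back to the default because no console= parameter is present.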

2 /etc/profile

/etc/profile is the login profile for ash. When FAILSAFE is set, it prints the contents of /etc/banner.failsafe and does not run the scripts in /etc/profile.d.

# If /tmp/.failsafe exists, set the environment variable FAILSAFE to 1.
[ -e /tmp/.failsafe ] && export FAILSAFE=1

[ -f /etc/banner ] && cat /etc/banner
[ -n "$FAILSAFE" ] && cat /etc/banner.failsafe
...
# Only when FAILSAFE is not set, source every script ending in .sh under /etc/profile.d/.
[ -n "$FAILSAFE" ] || {
    for FILE in /etc/profile.d/*.sh; do
        [ -e "$FILE" ] && . "$FILE"
    done
    unset FILE
}
...
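
The gating in /etc/profile can be reproduced with temporary files. This is a simplified sketch, not the real profile: load_profile is a hypothetical function, the marker file and profile.d directory are passed as parameters, and the banner is reduced to a single echo.

```shell
#!/bin/sh
# load_profile MARKER_FILE PROFILE_D_DIR
# Mimics the FAILSAFE gating: if the marker file exists, print a failsafe
# banner and skip PROFILE_D_DIR; otherwise source every *.sh file in it.
load_profile() {
    local marker="$1" dir="$2" file
    FAILSAFE=
    if [ -e "$marker" ]; then
        FAILSAFE=1
    fi
    if [ -n "$FAILSAFE" ]; then
        echo "failsafe banner"
    else
        for file in "$dir"/*.sh; do
            if [ -e "$file" ]; then
                . "$file"
            fi
        done
    fi
}

tmp="$(mktemp -d)"
mkdir "$tmp/profile.d"
echo 'echo "extra profile loaded"' >"$tmp/profile.d/10-extra.sh"

load_profile "$tmp/.failsafe" "$tmp/profile.d"   # marker absent: script is sourced
touch "$tmp/.failsafe"
load_profile "$tmp/.failsafe" "$tmp/profile.d"   # marker present: banner only

rm -rf "$tmp"
```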

3 /etc/init.d/boot

In /etc/init.d/boot, /tmp/.failsafe is touched when FAILSAFE is "true".

    [ "$FAILSAFE" = "true" ] && touch /tmp/.failsafe

Further reading: "[OpenWrt Wiki] Failsafe mode, factory reset, and recovery mode".

Posted on 2024-08-24 23:59 by ArnoldLu


Arnold Lu @ Nanjing