Signal-based preemptive scheduling of goroutines (version: 1.22.4)

 


1. Signal-based preemptive scheduling


1.1 The main function

Once the main goroutine is up and running, it starts the sysmon thread.

src/runtime/proc.go:146

func main() {
    mp := getg().m
    .....
    mainStarted = true
    if GOARCH != "wasm" { // no threads on wasm yet, so no sysmon
        systemstack(func() {
            newm(sysmon, nil, -1)    // start the sysmon thread
        })
    }
...
}

1.2 The sysmon function

sysmon wakes up once every delay microseconds (the delay starts at 20us and backs off until it settles at 10ms) and calls retake to perform preemption.

src/runtime/proc.go:5946

// Always runs without a P, so write barriers are not allowed.
//
//go:nowritebarrierrec
func sysmon() {
    ...
    for {
        if idle == 0 { // start with 20us sleep...
            delay = 20
        } else if idle > 50 { // start doubling the sleep after 1ms...
            delay *= 2
        }
        if delay > 10*1000 { // up to 10ms
            delay = 10 * 1000
        }
        usleep(delay)   // sleep for delay microseconds (not milliseconds) via a system call
        ...
        // retake P's blocked in syscalls
        // and preempt long running G's
        if retake(now) != 0 {             // retake Ps blocked in syscalls and preempt long-running Gs
            idle = 0
        } else {
            idle++
        }
        ...
        unlock(&sched.sysmonlock)
    }
}
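The backoff arithmetic in the loop above can be tried in isolation. The following is a minimal sketch, not runtime code: nextDelay and idleRoundsUntilCap are hypothetical helper names, and the sketch assumes retake never finds any work, so idle grows by one each round.

```go
package main

import "fmt"

// nextDelay reproduces the delay arithmetic of the sysmon loop.
// idle is the number of consecutive rounds in which retake found nothing
// to do; delay is the previous sleep in microseconds.
func nextDelay(idle int, delay uint32) uint32 {
	if idle == 0 { // work was found last round: reset to 20us
		delay = 20
	} else if idle > 50 { // idle for more than 50 rounds: start doubling
		delay *= 2
	}
	if delay > 10*1000 { // never sleep longer than 10ms
		delay = 10 * 1000
	}
	return delay
}

// idleRoundsUntilCap returns the value of idle at which the sleep first
// reaches the 10ms cap, assuming every round is idle.
func idleRoundsUntilCap() int {
	delay := uint32(0)
	for idle := 0; ; idle++ {
		delay = nextDelay(idle, delay)
		if delay == 10*1000 {
			return idle
		}
	}
}

func main() {
	fmt.Println(nextDelay(0, 0))      // 20
	fmt.Println(nextDelay(51, 20))    // 40
	fmt.Println(idleRoundsUntilCap()) // 59
}
```

In this model the sleep stays at 20us for the first 51 rounds and then doubles each round, which is why a long-idle sysmon ends up checking only once every 10ms.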

The usleep implementation, which issues the nanosleep system call:

TEXT runtime·usleep(SB),NOSPLIT,$16
    MOVL    $0, DX
    MOVL    usec+0(FP), AX
    MOVL    $1000000, CX
    DIVL    CX
    MOVQ    AX, 0(SP)
    MOVL    $1000, AX    // usec to nsec
    MULL    DX
    MOVQ    AX, 8(SP)

    // nanosleep(&ts, 0)
    MOVQ    SP, DI
    MOVL    $0, SI
    MOVL    $SYS_nanosleep, AX
    SYSCALL
    RET

1.3 The retake function

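On every call, retake walks all Ps in allp. For each P in the _Prunning or _Psyscall state, it checks whether the P has stayed on the same scheduling tick for longer than the preset time slice, that is, whether the current G has been running for 10ms or more. If so, it calls preemptone to preempt it.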

src/runtime/proc.go:6104

const forcePreemptNS = 10 * 1000 * 1000 // preset preemption time slice: 10ms

func retake(now int64) uint32 {
    n := 0
    lock(&allpLock)
    for i := 0; i < len(allp); i++ {
        pp := allp[i]
        if pp == nil {
            // This can happen if procresize has grown allp but not yet created new Ps.
            continue
        }
        pd := &pp.sysmontick
        s := pp.status
        sysretake := false
        if s == _Prunning || s == _Psyscall {
            // Preempt G if it's running for too long.
            t := int64(pp.schedtick)
            if int64(pd.schedtick) != t {
                pd.schedtick = uint32(t)  // the P has scheduled since the last check: sync pd.schedtick with pp.schedtick
                pd.schedwhen = now        // and restart the timer at now
            } else if pd.schedwhen+forcePreemptNS <= now {   // same schedtick for at least forcePreemptNS (10ms)
                preemptone(pp)  // request preemption
                // In case of syscall, preemptone() doesn't work, because there is no M wired to P.
                sysretake = true
            }
        }
        ...
    }
    unlock(&allpLock)
    return uint32(n)
}
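The tick comparison in retake is easier to follow with concrete timestamps. Below is a minimal sketch under simplifying assumptions: sysmontick is reduced to the two fields used here, and observe is a hypothetical helper that returns true where retake would call preemptone.

```go
package main

import "fmt"

const forcePreemptNS = 10 * 1000 * 1000 // 10ms, the same value as in the runtime

// sysmontick is a simplified copy of the two fields retake uses.
type sysmontick struct {
	schedtick uint32 // the P's schedtick at the last observation
	schedwhen int64  // time in ns of that observation
}

// observe models one visit of retake to a running P. It returns true
// when the P has stayed on the same schedtick for at least forcePreemptNS.
func observe(pd *sysmontick, schedtick uint32, now int64) bool {
	if pd.schedtick != schedtick {
		// The P has scheduled another G since the last visit:
		// remember the new tick and restart the timer.
		pd.schedtick = schedtick
		pd.schedwhen = now
		return false
	}
	return pd.schedwhen+forcePreemptNS <= now
}

func main() {
	var pd sysmontick
	const ms = int64(1000 * 1000)
	fmt.Println(observe(&pd, 1, 0*ms))  // false: first sighting of tick 1
	fmt.Println(observe(&pd, 1, 5*ms))  // false: only 5ms on the same tick
	fmt.Println(observe(&pd, 1, 10*ms)) // true: 10ms on the same tick
	fmt.Println(observe(&pd, 2, 20*ms)) // false: the P scheduled again
}
```

Note that the timer measures time since sysmon first saw the tick, not since the G actually started, so a G may run somewhat longer than 10ms before it is flagged.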

1.4 Cooperative preemption

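For cooperative preemption, see this article: https://www.cnblogs.com/shiqi17/articles/18255472. preemptone, shown below, first sets the cooperative preemption flags and then requests signal-based preemption.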

src/runtime/proc.go:6205

// Tell the goroutine running on processor P to stop.
// This function is purely best-effort. It can incorrectly fail to inform the
// goroutine. It can inform the wrong goroutine. Even if it informs the
// correct goroutine, that goroutine might ignore the request if it is
// simultaneously executing newstack.
// No lock needs to be held.
// Returns true if preemption request was issued.
// The actual preemption will happen at some point in the future
// and will be indicated by the gp->status no longer being
// Grunning
func preemptone(pp *p) bool {
    mp := pp.m.ptr()
    if mp == nil || mp == getg().m {
        return false
    }
    gp := mp.curg
    if gp == nil || gp == mp.g0 {
        return false
    }

    gp.preempt = true  // mark the goroutine for preemption

    // Every call in a goroutine checks for stack overflow by
    // comparing the current stack pointer to gp->stackguard0.
    // Setting gp->stackguard0 to StackPreempt folds
    // preemption into the normal stack overflow check.
    gp.stackguard0 = stackPreempt  // poison the stack guard so the compiler-inserted stack check at the next function call fails and notices the request

    // Request an async preemption of this P.
    if preemptMSupported && debug.asyncpreemptoff == 0 {
        pp.preempt = true
        preemptM(mp)  // send a signal for asynchronous preemption
    }

    return true
}
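The effect of the asynchronous request can be observed from user code. The sketch below is an illustration under assumptions, not a test from the runtime: spinThenStop is a hypothetical helper, and it assumes a platform where sysmon and signal-based preemption are available (for example linux/amd64 with the default GODEBUG settings). The spinning loop makes no function calls, so the stackguard0 check alone would never fire.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// spinThenStop starts a goroutine that spins in a loop without function
// calls, sleeps for sleepMs milliseconds, and then tells the spinner to
// stop. It returns true once the caller has regained the CPU and the
// spinner has exited.
func spinThenStop(sleepMs int) bool {
	prev := runtime.GOMAXPROCS(1) // one P, so the spinner can monopolise it
	defer runtime.GOMAXPROCS(prev)

	var stop atomic.Bool
	done := make(chan struct{})
	go func() {
		// The atomic load is normally inlined, so this loop has no
		// function prologue in which a cooperative check could run.
		for !stop.Load() {
		}
		close(done)
	}()

	time.Sleep(time.Duration(sleepMs) * time.Millisecond) // hands the P to the spinner
	stop.Store(true)                                      // reached only if this goroutine runs again
	<-done
	return true
}

func main() {
	fmt.Println("caller resumed:", spinThenStop(20))
}
```

With asynchronous preemption disabled (GODEBUG=asyncpreemptoff=1), the same program would be expected to hang, because nothing could take the only P away from the spinner.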

1.5 Signal-based preemption

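preemptM calls signalM to send the preemption signal to the M (thread) bound to the target P. When that thread receives the signal, the kernel interrupts it and runs the handler function registered for that signal.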

src/runtime/signal_unix.go:368

// preemptM sends a preemption request to mp. This request may be handled asynchronously and may be coalesced with other requests to
// the M. When the request is received, if the running G or P are marked for preemption and the goroutine is at an asynchronous
// safe-point, it will preempt the goroutine. It always atomically increments mp.preemptGen after handling a preemption request.
func preemptM(mp *m) {
    ...
    if mp.signalPending.CompareAndSwap(0, 1) {
        if GOOS == "darwin" || GOOS == "ios" {
            pendingPreemptSignals.Add(1)
        }
        // If multiple threads are preempting the same M, it may send many signals to the same M such that it hardly make progress, causing
        // live-lock problem. Apparently this could happen on darwin. See issue #37741. Only send a signal if there isn't already one pending.
        signalM(mp, sigPreempt)   // send the preemption signal
    }
    ...
}

signalM gets the pid and sends sigPreempt (SIGURG, signal number 23 on linux/amd64) to the target thread

src/runtime/os_linux.go:550

// signalM sends a signal to mp.
func signalM(mp *m, sig int) {
    tgkill(getpid(), int(mp.procid), sig)
}

The system call that delivers the signal:

src/runtime/sys_linux_amd64.s:171

TEXT ·tgkill(SB),NOSPLIT,$0
    MOVQ    tgid+0(FP), DI
    MOVQ    tid+8(FP), SI
    MOVQ    sig+16(FP), DX
    MOVL    $SYS_tgkill, AX
    SYSCALL
    RET
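A thread-directed signal of the same kind can be sent from ordinary Go code on Linux. The sketch below is an assumption-laden illustration: signalOwnThread is a hypothetical helper, it uses syscall.Tgkill and syscall.Gettid (Linux only), and it relies on the runtime forwarding SIGURG to os/signal subscribers. Because the runtime itself sends SIGURG for preemption, the signal received on the channel may also be one of those.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
	"time"
)

// signalOwnThread sends SIGURG to the calling OS thread with tgkill and
// reports whether a SIGURG was observed within timeoutMs milliseconds.
func signalOwnThread(timeoutMs int) bool {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGURG)
	defer signal.Stop(ch)

	// Pin the goroutine so Gettid refers to the thread we signal.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// tgkill(tgid, tid, sig): the same triple signalM passes.
	if err := syscall.Tgkill(syscall.Getpid(), syscall.Gettid(), syscall.SIGURG); err != nil {
		return false
	}

	select {
	case <-ch:
		return true
	case <-time.After(time.Duration(timeoutMs) * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("SIGURG observed:", signalOwnThread(1000))
}
```

tgkill takes both the process id and the thread id, which is what lets the runtime interrupt one specific M rather than whichever thread the kernel picks.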

2. Signal handling


2.1 Signal initialization

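Taking amd64 as an example, a Go program boots in the order _rt0_amd64_linux -> _rt0_amd64 -> rt0_go -> mstart -> mstart0 (the boot sequence is covered in a separate article).

A newly created M, that is, a new thread, enters through the thread start function mstart -> mstart0. For how a goroutine and a thread are started, see the articles "How Go starts a goroutine" and "How Go creates an M and binds it to a system thread".

mstart0 then calls mstart1. When the thread reaches minit, it initializes the signal stack and the signal mask; mstartm0, which runs only on m0, binds the handler functions to the signals.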

go/src/runtime/proc.go:1702

func mstart1() {
    gp := getg()
    ...
    minit()         // initialize the signal stack and the signal mask
    // Install signal handlers; after minit so that minit can prepare the thread to be able to handle the signals.
    if gp.m == &m0 {
        mstartm0()  // register the signal handler functions
    }
    ...
    schedule()      // start scheduling
}

2.1.1 Initializing the signal stack

go/src/runtime/os_linux.go:383

// Called to initialize a new m (including the bootstrap m). Called on the new thread, cannot allocate memory.
func minit() {
    minitSignals() // initialize the signal stack and the signal mask

    // Cgo-created threads and the bootstrap m are missing a procid. We need this for asynchronous preemption and it's useful in debuggers.
    getg().m.procid = uint64(gettid())
}

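minitSignalStack sets up the alternate signal stack of the current OS thread. The alternate signal stack is a separate stack used only while a signal is being handled, so the handler never runs on the stack of the code it interrupted.

sigaltstack(nil, &st) queries whether the current thread already has an alternate signal stack. If none is set (st.ss_flags&_SS_DISABLE != 0), or cgo is not in use (!iscgo), signalstack(&mp.gsignal.stack) installs the gsignal stack as the alternate signal stack.
If an alternate signal stack is already set (a non-Go thread set one and then called a Go function), setGsignalStack(&st, &mp.goSigStack) makes gsignal use that existing alternate stack instead.
mp.newSigstack records which choice was made, so that unminit can undo it.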

go/src/runtime/signal_unix.go:1261

// minitSignals is called when initializing a new m to set the
// thread's alternate signal stack and signal mask.
func minitSignals() {
    minitSignalStack()
    minitSignalMask()
}

// minitSignalStack is called when initializing a new m to set the
// alternate signal stack. If the alternate signal stack is not set
// for the thread (the normal case) then set the alternate signal
// stack to the gsignal stack. If the alternate signal stack is set
// for the thread (the case when a non-Go thread sets the alternate
// signal stack and then calls a Go function) then set the gsignal
// stack to the alternate signal stack. We also set the alternate
// signal stack to the gsignal stack if cgo is not used (regardless
// of whether it is already set). Record which choice was made in
// newSigstack, so that it can be undone in unminit.
func minitSignalStack() { 
    mp := getg().m
    var st stackt          // receives the thread's current alternate signal stack
    sigaltstack(nil, &st)  // query only: a nil first argument leaves the alternate stack unchanged
    if st.ss_flags&_SS_DISABLE != 0 || !iscgo {
        signalstack(&mp.gsignal.stack)
        mp.newSigstack = true
    } else {
        setGsignalStack(&st, &mp.goSigStack) // run gsignal on the existing alternate stack st, saving the old gsignal stack in mp.goSigStack
        mp.newSigstack = false
    }
}

2.1.2 Initializing the signal mask

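minitSignalMask sets the thread's signal mask. The mask lists the signals that are blocked for the thread, so it determines which signals the thread can receive.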

// minitSignalMask is called when initializing a new m to set the
// thread's signal mask. When this is called all signals have been
// blocked for the thread.  This starts with m.sigmask, which was set
// either from initSigmask for a newly created thread or by calling
// sigsave if this is a non-Go thread calling a Go function. It
// removes all essential signals from the mask, thus causing those
// signals to not be blocked. Then it sets the thread's signal mask.
// After this is called the thread can receive signals.
func minitSignalMask() {
    nmask := getg().m.sigmask
    for i := range sigtable {
        if !blockableSig(uint32(i)) {
            sigdelset(&nmask, i)
        }
    }
    sigprocmask(_SIG_SETMASK, &nmask, nil)
}
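The mask manipulation in minitSignalMask is plain bit clearing. Here is a toy model, assuming a 64-bit mask in which bit (sig-1) means "sig is blocked". The sigset type, its methods and unblockEssential are hypothetical names, and treating signals 11 and 27 as essential is only an illustrative choice, not a statement about the runtime's sigtable.

```go
package main

import "fmt"

// sigset is a toy signal mask: bit (sig-1) set means sig is blocked.
type sigset uint64

func (s *sigset) add(sig uint)         { *s |= 1 << (sig - 1) }
func (s *sigset) del(sig uint)         { *s &^= 1 << (sig - 1) }
func (s sigset) blocked(sig uint) bool { return s&(1<<(sig-1)) != 0 }

// unblockEssential mirrors the loop in minitSignalMask: every signal
// that must not be blocked is removed from the mask.
func unblockEssential(mask sigset, blockable func(sig uint) bool) sigset {
	for sig := uint(1); sig <= 64; sig++ {
		if !blockable(sig) {
			mask.del(sig)
		}
	}
	return mask
}

func main() {
	all := ^sigset(0)                              // start with every signal blocked
	essential := map[uint]bool{11: true, 27: true} // hypothetical choice for illustration
	mask := unblockEssential(all, func(sig uint) bool { return !essential[sig] })
	fmt.Println(mask.blocked(11), mask.blocked(27), mask.blocked(2)) // false false true
}
```

Starting from the inherited mask and only clearing bits means the thread keeps any blocking the program asked for, while the signals the runtime depends on can always be delivered.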

2.2 Binding the signal handler function

 src/runtime/proc.go:1745

// mstartm0 implements part of mstart1 that only runs on the m0.
//
// Write barriers are allowed here because we know the GC can't be
// running yet, so they'll be no-ops.
//
//go:yeswritebarrierrec
func mstartm0() {
    // Create an extra M for callbacks on threads not created by Go.
    // An extra M is also needed on Windows for callbacks created by
    // syscall.NewCallback. See issue #6751 for details.
    if (iscgo || GOOS == "windows") && !cgoHasExtraM {
        cgoHasExtraM = true
        newextram()
    }
    initsig(false)   // initialize signal handling
}

initsig binds the handler function to each signal:

// Initialize signals.
// Called by libpreinit so runtime may not be initialized.
//
//go:nosplit
//go:nowritebarrierrec
func initsig(preinit bool) {
    ...
    for i := uint32(0); i < _NSIG; i++ {
        t := &sigtable[i]
        if t.flags == 0 || t.flags&_SigDefault != 0 {
            continue
        }
        // We don't need to use atomic operations here because there shouldn't be any other goroutines running yet.
        fwdSig[i] = getsig(i)
        if !sigInstallGoHandler(i) {
            // Even if we are not installing a signal handler, set SA_ONSTACK if necessary.
            if fwdSig[i] != _SIG_DFL && fwdSig[i] != _SIG_IGN {
                setsigstack(i)
            } else if fwdSig[i] == _SIG_IGN {
                sigInitIgnored(i)
            }
            continue
        }
        handlingSig[i] = 1
        setsig(i, abi.FuncPCABIInternal(sighandler))   // install sighandler as the handler for this signal
    }
}

2.3 The signal handler function

src/runtime/signal_unix.go:619

// sighandler is invoked when a signal occurs. The global g will be
// set to a gsignal goroutine and we will be running on the alternate
// signal stack. The parameter gp will be the value of the global g
// when the signal occurred. The sig, info, and ctxt parameters are
// from the system signal handler: they are the parameters passed when
// the SA is passed to the sigaction system call.
//
// The garbage collector may have stopped the world, so write barriers
// are not allowed.
//
//go:nowritebarrierrec
func sighandler(sig uint32, info *siginfo, ctxt unsafe.Pointer, gp *g) {
    // The g executing the signal handler. This is almost always
    // mp.gsignal. See delayedSignal for an exception.
    gsignal := getg()
    mp := gsignal.m
    c := &sigctxt{info, ctxt}
    ...

    if sig == sigPreempt && debug.asyncpreemptoff == 0 && !delayedSignal {
        // Might be a preemption signal.
        doSigPreempt(gp, c)
        // Even if this was definitely a preemption signal, it
        // may have been coalesced with another signal, so we
        // still let it through to the application.
    }
    ...
}

2.4 Handling the preemption signal

2.4.1 Safety checks

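wantAsyncPreempt checks whether the goroutine (or its P) has been marked for preemption and whether the goroutine is still running.

isAsyncSafePoint then checks whether the interrupted program counter (PC), stack pointer (SP) and link register (LR) describe a safe point for preemption. ok reports whether this is a safe point, and newpc is the adjusted program counter to resume at.

ctxt.pushCall records newpc as the return address and injects a call to asyncPreempt.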

src/runtime/signal_unix.go:341

// doSigPreempt handles a preemption signal on gp.
func doSigPreempt(gp *g, ctxt *sigctxt) {
    // Check if this G wants to be preempted and is safe to
    // preempt.
    if wantAsyncPreempt(gp) {
        if ok, newpc := isAsyncSafePoint(gp, ctxt.sigpc(), ctxt.sigsp(), ctxt.siglr()); ok {
            // Adjust the PC and inject a call to asyncPreempt.
            ctxt.pushCall(abi.FuncPCABI0(asyncPreempt), newpc)
        }
    }

    // Acknowledge the preemption.
    gp.m.preemptGen.Add(1)
    gp.m.signalPending.Store(0)

    if GOOS == "darwin" || GOOS == "ios" {
        pendingPreemptSignals.Add(-1)
    }
}

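In short, doSigPreempt checks whether gp can be preempted safely. If it can, it adjusts the program counter (PC) and injects a call to asyncPreempt. Either way, it then acknowledges the preemption request and updates the related state.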
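The acknowledgement at the end of doSigPreempt pairs with the CompareAndSwap in preemptM: together they ensure that at most one preemption signal is in flight per M. A minimal model of that handshake follows, assuming Go 1.19 or later for the atomic types. toyM, request and handle are hypothetical names, and the sent counter stands in for the real signalM call.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// toyM holds only the counters that preemptM and doSigPreempt touch.
type toyM struct {
	signalPending atomic.Uint32 // 1 while a preemption signal is in flight
	preemptGen    atomic.Uint32 // number of preemption signals handled
	sent          int           // signals actually sent (stand-in for signalM)
}

// request models preemptM: send a signal only if none is pending.
func (m *toyM) request() bool {
	if m.signalPending.CompareAndSwap(0, 1) {
		m.sent++
		return true
	}
	return false // coalesced with the signal already in flight
}

// handle models the acknowledgement at the end of doSigPreempt.
func (m *toyM) handle() {
	m.preemptGen.Add(1)
	m.signalPending.Store(0)
}

func main() {
	var m toyM
	fmt.Println(m.request()) // true: signal sent
	fmt.Println(m.request()) // false: one is already pending
	m.handle()
	fmt.Println(m.request())                // true: a new signal may be sent
	fmt.Println(m.sent, m.preemptGen.Load()) // 2 1
}
```

The model is single-threaded for clarity; in the runtime the request and the handler run on different threads, which is why both sides use atomic operations.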

2.4.2 Pushing onto the stack to simulate a function call

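  1. sp := uintptr(c.rsp()) reads the stack pointer register (rsp) of the interrupted code, as saved in the signal context
  2. sp -= goarch.PtrSize moves the stack pointer down (the stack grows downward) to reserve one pointer-sized slot
  3. *(*uintptr)(unsafe.Pointer(sp)) = resumePC stores resumePC in the new top-of-stack slot (imitating the CPU pushing a return address for a CALL)
  4. c.set_rsp(uint64(sp)) updates rsp in the signal context to the new top of stack sp
  5. c.set_rip(uint64(targetPC)) updates the program counter register (rip) in the signal context to targetPC, so that the next instruction executed is the one at targetPC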

src/runtime/signal_amd64.go:80

func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
    // Make it look like we called target at resumePC.
    sp := uintptr(c.rsp())
    sp -= goarch.PtrSize
    *(*uintptr)(unsafe.Pointer(sp)) = resumePC  // push resumePC as the return address, as a CALL made at resumePC would
    c.set_rsp(uint64(sp))
    c.set_rip(uint64(targetPC))                 // set the PC to targetPC (asyncPreempt on this path)
}

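In short, pushCall fakes a function call: it makes it look as if the code at resumePC had called targetPC, and it sets the PC to targetPC. When signal handling finishes, the thread therefore continues at targetPC, which is the preemption function asyncPreempt.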
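The register and stack manipulation in pushCall can be replayed on a toy machine. This is a sketch only: toyCtx, its map-based memory and the ret method are hypothetical, and the addresses are made up. For simplicity it resumes at the interrupted PC itself, whereas doSigPreempt passes the adjusted newpc.

```go
package main

import "fmt"

const ptrSize = 8 // pointer size on amd64

// toyCtx stands in for the register state saved in the signal context.
type toyCtx struct {
	rsp, rip uint64
	mem      map[uint64]uint64 // toy memory: address -> 8-byte word
}

// pushCall performs the same steps as the runtime function of that name.
func (c *toyCtx) pushCall(targetPC, resumePC uint64) {
	sp := c.rsp
	sp -= ptrSize         // reserve one slot on the interrupted stack
	c.mem[sp] = resumePC  // store the fake return address
	c.rsp = sp            // publish the new stack pointer
	c.rip = targetPC      // resume execution at the target
}

// ret models the RET at the end of the injected function: pop the
// return address into rip.
func (c *toyCtx) ret() {
	c.rip = c.mem[c.rsp]
	c.rsp += ptrSize
}

func main() {
	c := &toyCtx{rsp: 0x7000, rip: 0x401000, mem: map[uint64]uint64{}}
	c.pushCall(0x45f000, c.rip)           // 0x45f000 plays the role of asyncPreempt
	fmt.Printf("%#x %#x\n", c.rsp, c.rip) // 0x6ff8 0x45f000
	c.ret()
	fmt.Printf("%#x %#x\n", c.rsp, c.rip) // 0x7000 0x401000
}
```

After the simulated return, both registers are back at their interrupted values, which is what lets the preempted goroutine later continue as if nothing had happened.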

2.4.3 Saving the execution context

asyncPreempt saves the values of all the relevant registers, calls asyncPreempt2 (which re-enters the scheduler), and then restores the registers and returns.

src/runtime/preempt_amd64.s:7

TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0
    PUSHQ BP
    MOVQ SP, BP
    // Save flags before clobbering them
    PUSHFQ
    // obj doesn't understand ADD/SUB on SP, but does understand ADJSP
    ADJSP $368
    // But vet doesn't know ADJSP, so suppress vet stack checking
    NOP SP
    MOVQ AX, 0(SP)
    MOVQ CX, 8(SP)
    MOVQ DX, 16(SP)
    MOVQ BX, 24(SP)
    MOVQ SI, 32(SP)
    MOVQ DI, 40(SP)
    MOVQ R8, 48(SP)
    MOVQ R9, 56(SP)
    MOVQ R10, 64(SP)
    MOVQ R11, 72(SP)
    MOVQ R12, 80(SP)
    MOVQ R13, 88(SP)
    MOVQ R14, 96(SP)
    MOVQ R15, 104(SP)
    #ifdef GOOS_darwin
    #ifndef hasAVX
    CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $0
    JE 2(PC)
    #endif
    VZEROUPPER
    #endif
    MOVUPS X0, 112(SP)
    MOVUPS X1, 128(SP)
    MOVUPS X2, 144(SP)
    MOVUPS X3, 160(SP)
    MOVUPS X4, 176(SP)
    MOVUPS X5, 192(SP)
    MOVUPS X6, 208(SP)
    MOVUPS X7, 224(SP)
    MOVUPS X8, 240(SP)
    MOVUPS X9, 256(SP)
    MOVUPS X10, 272(SP)
    MOVUPS X11, 288(SP)
    MOVUPS X12, 304(SP)
    MOVUPS X13, 320(SP)
    MOVUPS X14, 336(SP)
    MOVUPS X15, 352(SP)
    CALL ·asyncPreempt2(SB)
    MOVUPS 352(SP), X15
    MOVUPS 336(SP), X14
    MOVUPS 320(SP), X13
    MOVUPS 304(SP), X12
    MOVUPS 288(SP), X11
    MOVUPS 272(SP), X10
    MOVUPS 256(SP), X9
    MOVUPS 240(SP), X8
    MOVUPS 224(SP), X7
    MOVUPS 208(SP), X6
    MOVUPS 192(SP), X5
    MOVUPS 176(SP), X4
    MOVUPS 160(SP), X3
    MOVUPS 144(SP), X2
    MOVUPS 128(SP), X1
    MOVUPS 112(SP), X0
    MOVQ 104(SP), R15
    MOVQ 96(SP), R14
    MOVQ 88(SP), R13
    MOVQ 80(SP), R12
    MOVQ 72(SP), R11
    MOVQ 64(SP), R10
    MOVQ 56(SP), R9
    MOVQ 48(SP), R8
    MOVQ 40(SP), DI
    MOVQ 32(SP), SI
    MOVQ 24(SP), BX
    MOVQ 16(SP), DX
    MOVQ 8(SP), CX
    MOVQ 0(SP), AX
    ADJSP $-368
    POPFQ
    POPQ BP
    RET

2.4.4 Rescheduling

src/runtime/preempt.go:301

//go:nosplit
func asyncPreempt2() {
    gp := getg()
    gp.asyncSafePoint = true
    if gp.preemptStop {
        mcall(preemptPark)  // park the goroutine as _Gpreempted and reschedule
    } else {
        mcall(gopreempt_m)  // put the goroutine back on the run queue and reschedule
    }
    gp.asyncSafePoint = false
}

mcall is not covered again here; it simply switches to the g0 stack and runs the target function there.

src/runtime/asm_amd64.s:428

// func mcall(fn func(*g))
// Switch to m->g0's stack, call fn(g).
// Fn must never return. It should gogo(&g->sched)
// to keep running g.
TEXT runtime·mcall<ABIInternal>(SB), NOSPLIT, $0-8
    MOVQ    AX, DX    // DX = fn

    // Save state in g->sched. The caller's SP and PC are restored by gogo to
    // resume execution in the caller's frame (implicit return). The caller's BP
    // is also restored to support frame pointer unwinding.
    MOVQ    SP, BX    // hide (SP) reads from vet
    MOVQ    8(BX), BX    // caller's PC
    MOVQ    BX, (g_sched+gobuf_pc)(R14)
    LEAQ    fn+0(FP), BX    // caller's SP
    MOVQ    BX, (g_sched+gobuf_sp)(R14)
    // Get the caller's frame pointer by dereferencing BP. Storing BP as it is
    // can cause a frame pointer cycle, see CL 476235.
    MOVQ    (BP), BX // caller's BP
    MOVQ    BX, (g_sched+gobuf_bp)(R14)

    // switch to m->g0 & its stack, call fn
    MOVQ    g_m(R14), BX
    MOVQ    m_g0(BX), SI    // SI = g.m.g0
    CMPQ    SI, R14    // if g == m->g0 call badmcall
    JNE    goodm
    JMP    runtime·badmcall(SB)
goodm:
    MOVQ    R14, AX        // AX (and arg 0) = g
    MOVQ    SI, R14        // g = g.m.g0
    get_tls(CX)        // Set G in TLS
    MOVQ    R14, g(CX)
    MOVQ    (g_sched+gobuf_sp)(R14), SP    // sp = g0.sched.sp
    PUSHQ    AX    // open up space for fn's arg spill slot
    MOVQ    0(DX), R12
    CALL    R12        // fn(g)
    POPQ    AX
    JMP    runtime·badmcall2(SB)
    RET
Unbinding and rescheduling

src/runtime/proc.go:4088

// preemptPark parks gp and puts it in _Gpreempted.
//
//go:systemstack
func preemptPark(gp *g) {
    ...
    casGToPreemptScan(gp, _Grunning, _Gscan|_Gpreempted)
    dropg() // detach the goroutine from the M
    ....
    schedule() // reschedule
}

src/runtime/proc.go:4081

func gopreempt_m(gp *g) {
    goschedImpl(gp, true)
}


func goschedImpl(gp *g, preempted bool) {
    ...

    dropg()    // detach the goroutine from the M
    lock(&sched.lock)
    globrunqput(gp)
    unlock(&sched.lock)

    if mainStarted {
        wakep()
    }

    schedule()  // reschedule
}
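The queueing effect of goschedImpl can be pictured with a toy scheduler. This is a deliberately simplified model: toySched and preempt are hypothetical, goroutines are just names, and the "take the head of the global queue" policy is an assumption made for the sketch. The real schedule() consults more sources than the global queue, such as the P's local run queue.

```go
package main

import "fmt"

// toySched models only the running goroutine and the global run queue.
type toySched struct {
	running string
	globalQ []string
}

// preempt follows the outline of goschedImpl: detach the running
// goroutine, append it to the global run queue, then pick the next one.
func (s *toySched) preempt() {
	gp := s.running
	s.running = ""                    // dropg: detach the goroutine from the M
	s.globalQ = append(s.globalQ, gp) // globrunqput: tail of the global queue
	s.running = s.globalQ[0]          // schedule: toy policy, take the head
	s.globalQ = s.globalQ[1:]
}

func main() {
	s := toySched{running: "A", globalQ: []string{"B", "C"}}
	s.preempt()
	fmt.Println(s.running, s.globalQ) // B [C A]
}
```

Because the preempted goroutine joins the tail of the queue, the goroutines that were already waiting get to run before it does.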

 

posted @ G1733