Go Scheduler: GMP Internals (4)


Triggering scheduling

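This part briefly covers every point at which scheduling is triggered. Because the scheduler's runtime.schedule picks a new goroutine to run on the thread, finding all callers of that function gives us all the points where scheduling can be triggered.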

Here we focus on the following paths through which the runtime triggers scheduling:

  • Active parking: runtime.gopark -> runtime.park_m
  • System calls: runtime.exitsyscall -> runtime.exitsyscall0
  • Cooperative scheduling: runtime.Gosched -> runtime.gosched_m -> runtime.goschedImpl
  • System monitor: runtime.sysmon -> runtime.retake -> runtime.preemptone

At these points the thread is not handed directly to another task; instead, the scheduler's runtime.schedule runs a new round of scheduling.

Active parking

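runtime.gopark is the most common way to trigger scheduling. It suspends the current goroutine, and the suspended goroutine is not put back on any run queue. Let's look at how it is implemented: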

func gopark(unlockf func(*g, unsafe.Pointer) bool, lock unsafe.Pointer, reason waitReason, traceEv byte, traceskip int) {
	if reason != waitReasonSleep {
		checkTimeouts() // timeouts may expire while two goroutines keep the scheduler busy
	}
	mp := acquirem()
	gp := mp.curg
	status := readgstatus(gp)
	if status != _Grunning && status != _Gscanrunning {
		throw("gopark: bad g status")
	}
	mp.waitlock = lock
	mp.waitunlockf = unlockf
	gp.waitreason = reason
	mp.waittraceev = traceEv
	mp.waittraceskip = traceskip
	releasem(mp)
	// can't do anything that might move the G between Ms here.
	mcall(park_m)
}

This function uses runtime.mcall to switch to the g0 stack and call runtime.park_m:

func park_m(gp *g) {
	mp := getg().m

	if trace.enabled {
		traceGoPark(mp.waittraceev, mp.waittraceskip)
	}

	// N.B. Not using casGToWaiting here because the waitreason is
	// set by park_m's caller.
	casgstatus(gp, _Grunning, _Gwaiting)
	dropg()

	if fn := mp.waitunlockf; fn != nil {
		ok := fn(gp, mp.waitlock)
		mp.waitunlockf = nil
		mp.waitlock = nil
		if !ok {
			if trace.enabled {
				traceGoUnpark(gp, 2)
			}
			casgstatus(gp, _Gwaiting, _Grunnable)
			execute(gp, true) // Schedule it back, never returns.
		}
	}
	schedule()
}

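runtime.park_m switches the current goroutine from _Grunning to _Gwaiting and calls runtime.dropg to break the association between the thread and the goroutine. If the unlock callback waitunlockf returns false, the goroutine is made _Grunnable again and resumed at once with runtime.execute; otherwise runtime.schedule is called to start a new round of scheduling.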
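The unlock-callback branch of park_m is easy to miss, so here is a toy, single-threaded model of it. Everything below (the gStatus values, the g and m structs, casStatus and parkM) is hypothetical: it only borrows the names of the runtime's _Grunning/_Gwaiting/_Grunnable states, casgstatus and dropg, and is a sketch rather than runtime code.

```go
package main

import "fmt"

// Hypothetical goroutine states named after the runtime's _G* constants.
type gStatus int

const (
	gRunnable gStatus = iota // like _Grunnable
	gRunning                 // like _Grunning
	gWaiting                 // like _Gwaiting
)

func (s gStatus) String() string {
	return [...]string{"_Grunnable", "_Grunning", "_Gwaiting"}[s]
}

// g is a toy goroutine and m a toy thread running it.
type g struct{ status gStatus }

type m struct{ curg *g }

// casStatus mimics casgstatus: it panics if gp is not in the expected state.
func casStatus(gp *g, from, to gStatus) {
	if gp.status != from {
		panic(fmt.Sprintf("bad g status: have %v, want %v", gp.status, from))
	}
	gp.status = to
}

// parkM models park_m: Running -> Waiting, drop the M<->G link (dropg),
// then run the unlock callback. If the callback returns false the goroutine
// is made runnable and resumed at once (execute); otherwise the M would
// enter schedule(). It returns the name of the step taken.
func parkM(mp *m, unlockf func(*g) bool) string {
	gp := mp.curg
	casStatus(gp, gRunning, gWaiting)
	mp.curg = nil // dropg
	if unlockf != nil && !unlockf(gp) {
		casStatus(gp, gWaiting, gRunnable)
		casStatus(gp, gRunnable, gRunning) // execute sets _Grunning
		mp.curg = gp                       // and reattaches gp to the M
		return "execute"
	}
	return "schedule"
}

func main() {
	g1 := &g{status: gRunning}
	step := parkM(&m{curg: g1}, func(*g) bool { return true })
	fmt.Println(step, g1.status) // schedule _Gwaiting

	g2 := &g{status: gRunning}
	step = parkM(&m{curg: g2}, func(*g) bool { return false })
	fmt.Println(step, g2.status) // execute _Grunning
}
```

In the real runtime the callback typically releases the lock protecting the wait queue (for example a channel lock); returning false means the wait condition no longer holds, so parking would be pointless.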

Once the condition the goroutine is waiting for is met, the runtime calls runtime.goready to wake up the goroutine that went to sleep through runtime.gopark:

func goready(gp *g, traceskip int) {
	systemstack(func() {
		ready(gp, traceskip, true)
	})
}

func ready(gp *g, traceskip int, next bool) {
	if trace.enabled {
		traceGoUnpark(gp, traceskip)
	}

	status := readgstatus(gp)

	// Mark runnable.
	mp := acquirem() // disable preemption because it can be holding p in a local var
	if status&^_Gscan != _Gwaiting {
		dumpgstatus(gp)
		throw("bad g->status in ready")
	}

	// status is Gwaiting or Gscanwaiting, make Grunnable and put on runq
	casgstatus(gp, _Gwaiting, _Grunnable)
	runqput(mp.p.ptr(), gp, next)
	wakep()      // try to wake an idle P to run Gs: startm runs the P on an idle M, creating a new M if necessary
	releasem(mp) // re-enable preemption
}

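runtime.ready switches the ready goroutine to _Grunnable and adds it to the processor's run queue (because goready passes next=true, into the processor's runnext slot), where it waits to be picked up by the scheduler.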

System calls

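System calls also trigger the scheduler. To handle them, goroutines even have a dedicated state, _Gsyscall. Go wraps all the system calls provided by the operating system in functions written in assembly, such as syscall.Syscall and syscall.RawSyscall.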

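Before and after executing the system call through the INVOKE_SYSCALL assembly macro, these wrappers call the runtime's runtime.entersyscall and runtime.exitsyscall. It is exactly this layer of wrapping that lets the runtime do its preparation and cleanup work around the system call.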

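For performance reasons, however, if a system call does not need the runtime's involvement, syscall.RawSyscall is used instead: it skips the runtime functions entirely. The table below shows how Go classifies several system calls on Linux/386; whether the runtime participates is decided per call.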

System call          Wrapper
SYS_TIME             RawSyscall
SYS_GETTIMEOFDAY     RawSyscall
SYS_SETRLIMIT        RawSyscall
SYS_GETRLIMIT        RawSyscall
SYS_EPOLL_WAIT       Syscall

SYS_SETRLIMIT is used to set resource usage limits.

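Because a raw system call blocks the current thread without notifying the scheduler, only system calls that return immediately can be classified as RawSyscall, for example SYS_EPOLL_CREATE, SYS_EPOLL_WAIT (with a timeout of 0), and SYS_TIME.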

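A regular system call goes through a more involved process. Below we look at the preparation before entering the system call and the cleanup after it returns.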

Preparation

runtime.entersyscall obtains the current program counter and stack pointer and then calls runtime.reentersyscall, which prepares the goroutine for entering the system call:

func reentersyscall(pc, sp uintptr) {
	gp := getg()

	// Disable preemption because during this function g is in Gsyscall status,
	// but can have inconsistent g->sched, do not let GC observe it.
	gp.m.locks++

	// Entersyscall must not call any function that might split/grow the stack.
	// (See details in comment above.)
	// Catch calls that might, by replacing the stack guard with something that
	// will trip any stack check and leaving a flag to tell newstack to die.
	gp.stackguard0 = stackPreempt
	gp.throwsplit = true

	// Leave SP around for GC and traceback.
	save(pc, sp)
	gp.syscallsp = sp
	gp.syscallpc = pc
	casgstatus(gp, _Grunning, _Gsyscall)
	if gp.syscallsp < gp.stack.lo || gp.stack.hi < gp.syscallsp {
		systemstack(func() {
			print("entersyscall inconsistent ", hex(gp.syscallsp), " [", hex(gp.stack.lo), ",", hex(gp.stack.hi), "]\n")
			throw("entersyscall")
		})
	}

	if trace.enabled {
		systemstack(traceGoSysCall)
		// systemstack itself clobbers g.sched.{pc,sp} and we might
		// need them later when the G is genuinely blocked in a
		// syscall
		save(pc, sp)
	}

	if sched.sysmonwait.Load() {
		systemstack(entersyscall_sysmon)
		save(pc, sp)
	}

	if gp.m.p.ptr().runSafePointFn != 0 {
		// runSafePointFn may stack split if run on this stack
		systemstack(runSafePointFn)
		save(pc, sp)
	}

	gp.m.syscalltick = gp.m.p.ptr().syscalltick
	gp.sysblocktraced = true
	pp := gp.m.p.ptr()
	pp.m = 0
	gp.m.oldp.set(pp)
	gp.m.p = 0
	atomic.Store(&pp.status, _Psyscall)
	if sched.gcwaiting.Load() {
		systemstack(entersyscall_gcwait)
		save(pc, sp)
	}

	gp.m.locks--
}

The preparation consists of:

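  1. Disabling preemption on the thread (m.locks++), so the GC never observes a goroutine in _Gsyscall whose g->sched is still inconsistent
  2. Ensuring the current function cannot trigger stack splitting or stack growth
  3. Saving the current program counter (PC) and stack pointer (SP)
  4. Updating the goroutine's status to _Gsyscall
  5. Temporarily detaching the processor from the thread and setting the processor's status to _Psyscall
  6. Decrementing m.locks, which re-enables the preemption disabled in step 1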

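Note that runtime.reentersyscall detaches the processor from the thread. The thread itself blocks in the system call waiting for it to return, while the processor, now in _Psyscall, can be taken over to run other goroutines (for example, the system monitor's runtime.retake can hand it off to another thread).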
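The effect of detaching the P can be observed from user code. The sketch below (assuming Linux, where syscall.Pipe returns blocking descriptors) pins the program to one P with runtime.GOMAXPROCS(1), lets a goroutine block in read(2), and then has the main goroutine write to the pipe. Once the reader is blocked, that write can only happen if the P left behind in _Psyscall is taken over so that main can run again. readDuringBlockingSyscall is a hypothetical name, and the 50 ms sleep is an arbitrary choice to give the reader time to block.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
	"time"
)

// readDuringBlockingSyscall blocks one goroutine in read(2) while
// GOMAXPROCS is 1, then writes to the pipe from the main goroutine.
// Assumes Linux: syscall.Pipe returns blocking file descriptors.
func readDuringBlockingSyscall() (string, error) {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return "", err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	type result struct {
		data string
		err  error
	}
	done := make(chan result, 1)
	go func() {
		buf := make([]byte, 16)
		for {
			// syscall.Read goes through entersyscall, so the P is detached
			// while this thread waits in the kernel.
			n, err := syscall.Read(fds[0], buf)
			if err == syscall.EINTR {
				continue // interrupted by a signal: retry
			}
			if err != nil {
				done <- result{err: err}
				return
			}
			done <- result{data: string(buf[:n])}
			return
		}
	}()

	// Park main so the reader gets the only P and blocks in read(2).
	time.Sleep(50 * time.Millisecond)
	// Reaching this line means main got a P back while the reader's
	// thread is still stuck in the system call.
	if _, err := syscall.Write(fds[1], []byte("hi")); err != nil {
		return "", err
	}
	r := <-done
	return r.data, r.err
}

func main() {
	data, err := readDuringBlockingSyscall()
	fmt.Println(data, err) // hi <nil>
}
```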

Recovery

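When the system call finishes, runtime.exitsyscall is called to reallocate resources for the current goroutine. The function has two execution paths:

  1. Call runtime.exitsyscallfast
  2. Switch to the scheduler's goroutine g0 and call runtime.exitsyscall0
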
func exitsyscall() {
	gp := getg()

	gp.m.locks++ // see comment in entersyscall
	if getcallersp() > gp.syscallsp {
		throw("exitsyscall: syscall frame is no longer valid")
	}

	gp.waitsince = 0
	oldp := gp.m.oldp.ptr()
	gp.m.oldp = 0
	if exitsyscallfast(oldp) {
		// When exitsyscallfast returns success, we have a P so can now use
		// write barriers
		if goroutineProfile.active {
			// Make sure that gp has had its stack written out to the goroutine
			// profile, exactly as it was when the goroutine profiler first
			// stopped the world.
			systemstack(func() {
				tryRecordGoroutineProfileWB(gp)
			})
		}
		if trace.enabled {
			if oldp != gp.m.p.ptr() || gp.m.syscalltick != gp.m.p.ptr().syscalltick {
				systemstack(traceGoStart)
			}
		}
		// There's a cpu for us, so we can run.
		gp.m.p.ptr().syscalltick++
		// We need to cas the status and scan before resuming...
		casgstatus(gp, _Gsyscall, _Grunning)

		// Garbage collector isn't running (since we are),
		// so okay to clear syscallsp.
		gp.syscallsp = 0
		gp.m.locks--
		if gp.preempt {
			// restore the preemption request in case we've cleared it in newstack
			gp.stackguard0 = stackPreempt
		} else {
			// otherwise restore the real _StackGuard, we've spoiled it in entersyscall/entersyscallblock
			gp.stackguard0 = gp.stack.lo + _StackGuard
		}
		gp.throwsplit = false

		if sched.disable.user && !schedEnabled(gp) {
			// Scheduling of this goroutine is disabled.
			Gosched()
		}

		return
	}

	gp.sysexitticks = 0
	if trace.enabled {
		// Wait till traceGoSysBlock event is emitted.
		// This ensures consistency of the trace (the goroutine is started after it is blocked).
		for oldp != nil && oldp.syscalltick == gp.m.syscalltick {
			osyield()
		}
		// We can't trace syscall exit right now because we don't have a P.
		// Tracing code can invoke write barriers that cannot run without a P.
		// So instead we remember the syscall exit time and emit the event
		// in execute when we have a P.
		gp.sysexitticks = cputicks()
	}

	gp.m.locks--

	// Call the scheduler.
	mcall(exitsyscall0)

	// Scheduler returned, so we're allowed to run now.
	// Delete the syscallsp information that we left for
	// the garbage collector during the system call.
	// Must wait until now because until gosched returns
	// we don't know for sure that the garbage collector
	// is not running.
	gp.syscallsp = 0
	gp.m.p.ptr().syscalltick++
	gp.throwsplit = false
}

Both paths look for a processor P on which to run the current goroutine, each in a different way. The fast path runtime.exitsyscallfast has two branches:

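  1. If the goroutine's original processor is still in the _Psyscall state, it calls runtime.wirep to wire that processor back to the current thread
  2. If the scheduler has an idle processor, it calls runtime.acquirep to use that idle processor for the current goroutine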
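The first branch hinges on a compare-and-swap on the old P's status, which races with the system monitor retaking that P. The following is a toy model under stated assumptions: p, m, enterSyscall, exitSyscallFast and retake are hypothetical; the idle list is a plain slice rather than the lock-protected idle-P list; and storing the running status stands in for runtime.wirep. It needs Go 1.19+ for atomic.Int32.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Toy P states named after the runtime's _Pidle/_Prunning/_Psyscall.
const (
	pIdle int32 = iota
	pRunning
	pSyscall
)

type p struct {
	id     int
	status atomic.Int32
}

type m struct {
	p, oldp *p
}

// enterSyscall models the tail of reentersyscall: detach the P from the M,
// remember it in oldp and mark it _Psyscall.
func enterSyscall(mp *m) {
	pp := mp.p
	mp.oldp = pp
	mp.p = nil
	pp.status.Store(pSyscall)
}

// exitSyscallFast models the two branches of exitsyscallfast:
//  1. reclaim oldp if it is still _Psyscall (CAS to idle, then "wirep");
//  2. otherwise take a P from the (hypothetical, unsynchronized) idle list.
//
// It reports false when the slow path (exitsyscall0) would be needed.
func exitSyscallFast(mp *m, idle *[]*p) bool {
	oldp := mp.oldp
	mp.oldp = nil
	if oldp != nil && oldp.status.CompareAndSwap(pSyscall, pIdle) {
		mp.p = oldp
		oldp.status.Store(pRunning) // stands in for wirep
		return true
	}
	if n := len(*idle); n > 0 {
		pp := (*idle)[n-1]
		*idle = (*idle)[:n-1]
		mp.p = pp
		pp.status.Store(pRunning) // stands in for acquirep
		return true
	}
	return false
}

// retake models sysmon taking a P that stayed in _Psyscall too long.
func retake(pp *p) bool {
	return pp.status.CompareAndSwap(pSyscall, pIdle)
}

func main() {
	p0, p1 := &p{id: 0}, &p{id: 1}
	p0.status.Store(pRunning)
	mp := &m{p: p0}

	enterSyscall(mp)
	ok := exitSyscallFast(mp, &[]*p{})
	fmt.Println(ok, mp.p.id) // true 0: reclaimed the old P

	enterSyscall(mp)
	retake(p0) // sysmon got there first
	idle := []*p{p1}
	ok = exitSyscallFast(mp, &idle)
	fmt.Println(ok, mp.p.id) // true 1: fell back to an idle P

	enterSyscall(mp)
	retake(p1)
	fmt.Println(exitSyscallFast(mp, &[]*p{})) // false: slow path needed
}
```

Because both sides use a CAS on the same status word, exactly one of them wins: either the returning thread gets its P back, or sysmon has already handed it to someone else.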

The other, relatively slow path runtime.exitsyscall0 switches the current goroutine to _Grunnable and removes the association between the thread M and the goroutine:

  1. If an idle processor is obtained through runtime.pidleget, the goroutine is executed on that processor
  2. Otherwise, the goroutine is put on the global run queue to wait for the scheduler

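In the first case the goroutine is run directly with runtime.execute; in the second, the thread is stopped with runtime.stopm and, once woken up, calls runtime.schedule to trigger a new round of scheduling.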
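A sketch of the slow path's decision, assuming a hypothetical single-threaded sched type with an idle-P list and a global run queue (the runtime protects both with sched.lock and also handles goroutines locked to a thread, which this model omits). In the Go source the idle-P branch ends in runtime.execute and the other branch in runtime.stopm followed by runtime.schedule; the returned strings only name those calls.

```go
package main

import "fmt"

// sched is a toy stand-in for the runtime's scheduler state.
type sched struct {
	idlePs []int    // IDs of idle Ps
	globrq []string // global run queue
}

// exitSyscall0 models runtime.exitsyscall0 for goroutine gp, which has
// already been switched to _Grunnable and detached from its M (dropg).
// It returns the calls the M would make next.
func (s *sched) exitSyscall0(gp string) string {
	if n := len(s.idlePs); n > 0 {
		pid := s.idlePs[n-1] // pidleget
		s.idlePs = s.idlePs[:n-1]
		return fmt.Sprintf("acquirep(P%d); execute(%s)", pid, gp)
	}
	s.globrq = append(s.globrq, gp) // globrunqput
	return "stopm(); schedule()"
}

func main() {
	s := &sched{idlePs: []int{3}}
	fmt.Println(s.exitSyscall0("g7")) // acquirep(P3); execute(g7)
	fmt.Println(s.exitSyscall0("g8")) // stopm(); schedule()
	fmt.Println(s.globrq)             // [g8]
}
```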

Cooperative scheduling

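runtime.Gosched voluntarily gives up the processor so that other goroutines can run. It does not park the goroutine: the goroutine stays runnable, and the scheduler may later resume it on a different thread.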

func Gosched() {
	checkTimeouts()
	mcall(gosched_m)
}

func gosched_m(gp *g) {
	if trace.enabled {
		traceGoSched()
	}
	goschedImpl(gp)
}

func goschedImpl(gp *g) {
	status := readgstatus(gp)
	if status&^_Gscan != _Grunning {
		dumpgstatus(gp)
		throw("bad g status")
	}
	casgstatus(gp, _Grunning, _Grunnable)
	dropg()
	lock(&sched.lock)
	globrunqput(gp)
	unlock(&sched.lock)

	schedule()
}

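After a few hops we end up calling runtime.goschedImpl on the g0 stack. The runtime updates the goroutine's status to _Grunnable, gives up the current processor, puts the goroutine back on the global run queue, and finally calls runtime.schedule to trigger scheduling.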
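A small user-level example of cooperative yielding. With a single P (runtime.GOMAXPROCS(1)), the main goroutine spins until another goroutine sets a flag, calling runtime.Gosched on every iteration so that goschedImpl requeues it and the writer gets the P. Since Go 1.14, asynchronous preemption would eventually interrupt such a loop even without Gosched; the explicit yield just makes the hand-off prompt. spinUntilReady is a hypothetical name, and atomic.Bool requires Go 1.19+.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// spinUntilReady pins the program to one P, starts a writer goroutine and
// busy-waits for it, yielding the P with runtime.Gosched on every pass.
// It returns the value published by the writer.
func spinUntilReady() int {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var ready atomic.Bool
	value := 0
	go func() {
		value = 42        // written before the flag is set...
		ready.Store(true) // ...so it is visible once the flag is observed
	}()
	for !ready.Load() {
		// Requeue on the global run queue; this goroutine stays runnable.
		runtime.Gosched()
	}
	return value
}

func main() {
	fmt.Println(spinUntilReady()) // 42
}
```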

Posted by 每天提醒自己要学习