Golang 协程栈
前言
在1.4版本之前go的协程栈管理使用分段栈机制实现。实现方式:当检测到函数需要更多栈时,分配一块新栈,旧栈和新栈使用指针连接起来,函数返回就释放。 这样的机制存在2个问题:
- 多次循环调用同一个函数会出现“hot split”问题,例子:stacksplit.go
- 每次分配和释放都要额外消耗
为了解决这2个问题,官方使用:连续栈。连续栈的实现方式:当检测到需要更多栈时,分配一块比原来大一倍的栈,把旧栈数据copy到新栈,释放旧栈。
连续栈
栈的扩容和缩容代码量很大,所以精简了很大一部分。在看连续栈的源码前我们不妨思考一下下面的问题:
- 扩容和缩容的触发条件是什么?
- 扩容和缩容的大小如何计算出来?
- 扩容和缩容这个过程做了什么?对性能是否有影响?
栈扩容
1 func newstack() { 2 thisg := getg() 3 ...... 4 gp := thisg.m.curg 5 ...... 6 // Allocate a bigger segment and move the stack. 7 oldsize := gp.stack.hi - gp.stack.lo 8 newsize := oldsize * 2 // 比原来大一倍 9 ...... 10 // The goroutine must be executing in order to call newstack, 11 // so it must be Grunning (or Gscanrunning). 12 casgstatus(gp, _Grunning, _Gcopystack) //修改协程状态 13 14 // The concurrent GC will not scan the stack while we are doing the copy since 15 // the gp is in a Gcopystack status. 16 copystack(gp, newsize, true) //在下面会讲到 17 ...... 18 casgstatus(gp, _Gcopystack, _Grunning) 19 gogo(&gp.sched) 20 }
每一个函数执行都要占用栈空间,用于保存变量,参数等。运行在协程里的函数自然是占用运行它的协程栈。但协程的栈是有限的,如果发现不够用,会调用stackalloc
分配一块新的栈,大小比原来大一倍。
栈缩容
func shrinkstack(gp *g) { gstatus := readgstatus(gp) ...... oldsize := gp.stack.hi - gp.stack.lo newsize := oldsize / 2 // 比原来小1倍 // Don't shrink the allocation below the minimum-sized stack // allocation. if newsize < _FixedStack { return } // Compute how much of the stack is currently in use and only // shrink the stack if gp is using less than a quarter of its // current stack. The currently used stack includes everything // down to the SP plus the stack guard space that ensures // there's room for nosplit functions. avail := gp.stack.hi - gp.stack.lo //当已使用的栈占不到总栈的1/4 进行缩容 if used := gp.stack.hi - gp.sched.sp + _StackLimit; used >= avail/4 { return } copystack(gp, newsize, false) //在下面会讲到 }
栈的缩容主要是发生在GC期间。一个协程变成常驻状态,繁忙时需要占用很大的内存,但空闲时占用很少,这样会浪费很多内存,为了避免浪费Go在GC时对协程的栈进行了缩容,缩容也是分配一块新的内存替换原来的,大小只有原来的1/2。
扩容和缩容这个过程做了什么?
1 func copystack(gp *g, newsize uintptr, sync bool) { 2 ...... 3 old := gp.stack 4 ...... 5 used := old.hi - gp.sched.sp 6 7 // allocate new stack 8 new := stackalloc(uint32(newsize)) 9 ...... 10 // Compute adjustment. 11 var adjinfo adjustinfo 12 adjinfo.old = old 13 adjinfo.delta = new.hi - old.hi //用于旧栈指针的调整 14 15 //后面有机会和 select / chan 一起分析 16 // Adjust sudogs, synchronizing with channel ops if necessary. 17 ncopy := used 18 if sync { 19 adjustsudogs(gp, &adjinfo) 20 } else { 21 ...... 22 adjinfo.sghi = findsghi(gp, old) 23 24 // Synchronize with channel ops and copy the part of 25 // the stack they may interact with. 26 ncopy -= syncadjustsudogs(gp, used, &adjinfo) 27 } 28 //把旧栈数据复制到新栈 29 // Copy the stack (or the rest of it) to the new location 30 memmove(unsafe.Pointer(new.hi-ncopy), unsafe.Pointer(old.hi-ncopy), ncopy) 31 32 // Adjust remaining structures that have pointers into stacks. 33 // We have to do most of these before we traceback the new 34 // stack because gentraceback uses them. 35 adjustctxt(gp, &adjinfo) 36 adjustdefers(gp, &adjinfo) 37 adjustpanics(gp, &adjinfo) 38 ...... 39 // Swap out old stack for new one 40 gp.stack = new 41 gp.stackguard0 = new.lo + _StackGuard // NOTE: might clobber a preempt request 42 gp.sched.sp = new.hi - used 43 gp.stktopsp += adjinfo.delta 44 // Adjust pointers in the new stack. 45 gentraceback(^uintptr(0), ^uintptr(0), 0, gp, 0, nil, 0x7fffffff, adjustframe, noescape(unsafe.Pointer(&adjinfo)), 0) 46 ...... 47 //释放旧栈 48 stackfree(old) 49 }
func adjustpointer(adjinfo *adjustinfo, vpp unsafe.Pointer) { pp := (*uintptr)(vpp) p := *pp ...... //如果这个整数型数字在旧栈的范围,就调整 if adjinfo.old.lo <= p && p < adjinfo.old.hi { *pp = p + adjinfo.delta ...... } }
在扩容和缩容这个过程中,做了很多调整。从连续栈的实现方式上我们了解到,不管是扩容还是缩容,都重新申请一块新栈,然后把旧栈的数据复制到新栈。协程占用的物理内存完全被替换了,而Go在运行时会把指针保存到内存里面,例如:gp.sched.ctxt
,gp._defer
,gp._panic
,包括函数里的指针。这部分指针值会被转换成整数型uintptr
,然后 + delta
进行调整。
Frame调整
如果只是想了解栈的扩缩容,上面就够了。这部分深入到细节,没兴趣可以跳过。在了解Frame调整前,先了解下 Stack Frame。Stack Frame :函数运行时占用的内存空间,是栈上的数据集合,它包括:
- Local variables
- Saved copies of registers modified by subprograms that could need restoration
- Argument parameters
- Return address
FP
,SP
,PC
,LR
- FP: Frame Pointer
– Points to the bottom of the argument list
- SP: Stack Pointer
– Points to the top of the space allocated for local variables
- PC: Program Counter
- LR:Caller's Program Counter
Stack frame layout
// (x86) // +------------------+ // | args from caller | // +------------------+ <- frame->argp // | return address | // +------------------+ // | caller's BP (*) | (*) if framepointer_enabled && varp < sp // +------------------+ <- frame->varp // | locals | // +------------------+ // | args to callee | // +------------------+ <- frame->sp
在Go里针对X86和ARM的Stack frame layout会不一样,这里只对X86进行分析。
1 func bb(a *int, aa *int) { 2 var v1 int 3 println("v1 before morestack", uintptr(unsafe.Pointer(&v1))) 4 5 cc(0) 6 7 println("a after morestack", uintptr(unsafe.Pointer(a))) 8 println("aa after morestack", uintptr(unsafe.Pointer(aa))) 9 println("v1 after morestack", uintptr(unsafe.Pointer(&v1))) 10 } 11 12 // for morestack 13 func cc(i int){ 14 i++ 15 if i >= 30 { 16 println("morestack done") 17 }else{ 18 cc(i) 19 } 20 } 21 22 func main() { 23 wg := sync.WaitGroup{} 24 wg.Add(1) 25 go func() { 26 var a, aa int 27 a = 1000 28 aa = 1000 29 30 println("a before morestack", uintptr(unsafe.Pointer(&a))) 31 println("aa before morestack", uintptr(unsafe.Pointer(&aa))) 32 33 bb(&a, &aa) 34 wg.Done() 35 }() 36 wg.Wait() 37 }
结果:
a before morestack 824633925560 aa before morestack 824633925552 v1 before morestack 824633925504 morestack done a after morestack 824634142648 aa after morestack 824634142640 v1 after morestack 824634142592
从结果看出bb的参数a,aa和变量v1地址在经过扩容后发生了变化,这个变化是怎么实现的呢?我们主要围绕下面3个问题进行分析:
- 如何确认函数Frame的位置
- 如何找到函数参数,变量的指针
- 如何确认父函数的Frame
从gentraceback开始
1 func gentraceback(pc0, sp0, lr0 uintptr, gp *g, skip int, pcbuf *uintptr, max int, callback func(*stkframe, unsafe.Pointer) bool, v unsafe.Pointer, flags uint) int { 2 ...... 3 g := getg() 4 ...... 5 if pc0 == ^uintptr(0) && sp0 == ^uintptr(0) { // Signal to fetch saved values from gp. 6 if gp.syscallsp != 0 { 7 ...... 8 } else { 9 //运行位置 10 pc0 = gp.sched.pc 11 sp0 = gp.sched.sp 12 ...... 13 } 14 } 15 nprint := 0 16 var frame stkframe 17 frame.pc = pc0 18 frame.sp = sp0 19 ...... 20 f := findfunc(frame.pc) 21 ...... 22 frame.fn = f 23 24 n := 0 25 for n < max { 26 ...... 27 f = frame.fn 28 if f.pcsp == 0 { 29 // No frame information, must be external function, like race support. 30 // See golang.org/issue/13568. 31 break 32 } 33 ...... 34 if frame.fp == 0 { 35 sp := frame.sp 36 ...... 37 //计算FP 38 frame.fp = sp + uintptr(funcspdelta(f, frame.pc, &cache)) 39 if !usesLR { 40 // On x86, call instruction pushes return PC before entering new function. 41 frame.fp += sys.RegSize 42 } 43 } 44 var flr funcInfo 45 if topofstack(f, gp.m != nil && gp == gp.m.g0) { 46 ...... 47 } else if usesLR && f.funcID == funcID_jmpdefer { 48 ...... 49 } else { 50 var lrPtr uintptr 51 if usesLR { 52 ...... 53 } else { 54 if frame.lr == 0 { 55 //获取调用函数的PC值 56 lrPtr = frame.fp - sys.RegSize 57 frame.lr = uintptr(*(*sys.Uintreg)(unsafe.Pointer(lrPtr))) 58 } 59 } 60 flr = findfunc(frame.lr) 61 ...... 62 } 63 64 frame.varp = frame.fp 65 if !usesLR { 66 // On x86, call instruction pushes return PC before entering new function. 67 frame.varp -= sys.RegSize 68 } 69 ...... 70 if framepointer_enabled && GOARCH == "amd64" && frame.varp > frame.sp { 71 frame.varp -= sys.RegSize 72 } 73 ...... 74 if callback != nil || printing { 75 frame.argp = frame.fp + sys.MinFrameSize 76 ...... 77 } 78 ...... 79 //当前为调整frame 80 if callback != nil { 81 if !callback((*stkframe)(noescape(unsafe.Pointer(&frame))), v) { 82 return n 83 } 84 } 85 ...... 86 n++ 87 skipped: 88 ...... 89 //确认父Frame 90 // Unwind to next frame. 91 frame.fn = flr 92 frame.pc = frame.lr 93 frame.lr = 0 94 frame.sp = frame.fp 95 frame.fp = 0 96 frame.argmap = nil 97 ...... 98 } 99 ...... 100 return n 101 }
-
确认当前位置
当发生扩缩容时,Go的runtime已经把PC保存到
gp.sched.pc
,SP保存到gp.sched.sp
。 -
找出函数信息
函数的参数、变量个数,frame size,file line等信息,编译通过后被保存进执行文件,执行时被加载进内存,这部分数据可以通过PC获取出来:findfunc -> findmoduledatap
func findmoduledatap(pc uintptr) *moduledata { for datap := &firstmoduledata; datap != nil; datap = datap.next { if datap.minpc <= pc && pc < datap.maxpc { return datap } } return nil }
- 计算FP
frame.fp = sp + uintptr(funcspdelta(f, frame.pc, &cache))
SP我们可以理解为函数的顶端,FP是函数的底部,有了SP,缺函数长度(frame size)。其实我们可以根据pcsp获取,因为它已经被映射进了内存,详情请看Go 1.2 Runtime Symbol Information。知道了FP和SP,我们就可以知道函数在协程栈的具体位置。
-
获取父函数PC指令(LR)
lrPtr = frame.fp - sys.RegSize frame.lr = uintptr(*(*sys.Uintreg)(unsafe.Pointer(lrPtr)))
父函数的PC指令放在了stack frame图的
return address
位置,我们可以直接拿出来,根据这个指令我们获得父函数的信息。 - 确认父函数Frame
frame.fn = flr frame.pc = frame.lr frame.lr = 0 frame.sp = frame.fp frame.fp = 0 frame.argmap = nil
从stack frame图可以看到子函数的FP等于父函数SP。知道了父函数的SP和PC,重复上面的步骤就可以找出函数所在整条调用链,我们平时看到panic出现的调用链就是这样出来的。
以adjustframe结束
1 func adjustframe(frame *stkframe, arg unsafe.Pointer) bool { 2 adjinfo := (*adjustinfo)(arg) 3 ...... 4 f := frame.fn 5 ...... 6 locals, args := getStackMap(frame, &adjinfo.cache, true) 7 // Adjust local variables if stack frame has been allocated. 8 if locals.n > 0 { 9 size := uintptr(locals.n) * sys.PtrSize 10 adjustpointers(unsafe.Pointer(frame.varp-size), &locals, adjinfo, f) 11 } 12 13 // Adjust saved base pointer if there is one. 14 if sys.ArchFamily == sys.AMD64 && frame.argp-frame.varp == 2*sys.RegSize { 15 ...... 16 adjustpointer(adjinfo, unsafe.Pointer(frame.varp)) 17 } 18 // Adjust arguments. 19 if args.n > 0 { 20 ...... 21 adjustpointers(unsafe.Pointer(frame.argp), &args, adjinfo, f) 22 } 23 return true 24 }
通过gentraceback获取frame在协程栈的准确位置,结合 Stack frame layout,我们就可以知道函数参数argp
和变量varp
地址。在64位系统,每个指针占用8个字节。以8做为步长,就可得出函数参数和变量里的指针并进行调整。
连续栈的缺点
连续栈虽然解决了分段栈的2个问题,但这种实现方式也会带来其他问题:
- 更多的虚拟内存碎片。尤其是你需要更大的栈时,分配一块连续的内存空间会变得更困难
- 指针会被限制放入栈。在go里面不允许二个协程的指针相互指向。这会增加实现的复杂性。
收益
- 栈增长1倍快了10%,增长50%只快了2%,增长25%慢了20%
Hot split
性能问题。