Five things that make Go fast - a rough translation - five reasons Go is fast

Original article: https://dave.cheney.net/2014/06/07/five-things-that-make-go-fast

The translation is placed below each short passage.

Anthony Starks has remixed my original Google Present based slides using his fantastic Deck presentation tool. You can check out his remix over on his blog, mindchunk.blogspot.com.au/2014/06/remixing-with-deck.

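Anthony Starks: (a person's name)
remix: to mix together; to remix
original: the original item or work; original (adj.)
Present: to present; the present (time)
based: based on
slides: to slide; to fall; slides (in a presentation)
fantastic: strange, fantastic; excellent
Deck: deck, luggage hold (n.); decoration; armour plate; to dress up (vt.); also a personal name. Here, the name of Anthony Starks's presentation tool
presentation: presentation; description, statement, introduction
check out: to check, to examine; to check out and leave

Anthony Starks has reworked and improved the slides from my original talk using a presentation tool he built himself. You can find the tool and his remix on his blog.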

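I was recently invited to give a talk at Gocon, a fantastic Go conference held semi-annually in Tokyo, Japan. Gocon 2014 was an entirely community-run one-day event combining training and an afternoon of presentations surrounding the theme of Go in production.

recently: recently
invited to: invited to (do something)
conference: conference, meeting
semi-annually: twice a year
entirely: entirely, completely
community-run: run by the community
combining: combining
presentations: reports; offerings; appearance; performances. Here, talks
surround: to surround

Recently I was invited to speak at Gocon, a Go conference held every six months in Tokyo, Japan. Gocon 2014 was an entirely community-run, one-day event that combined training with an afternoon of talks on the theme "Go in production".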

The following is the text of my presentation. The original text was structured to force me to speak slowly and clearly, so I have taken the liberty of editing it slightly to be more readable.

I want to thank Bill Kennedy, Minux Ma, and especially Josh Bleecher Snyder, for their assistance in preparing this talk.

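structure: to organise, to arrange
force: to compel, to force
liberty: freedom ("take the liberty of": to allow oneself to do something)
slightly: slightly, a little

Below is the text of my talk. The original was structured so that I would speak slowly and clearly, so I have edited it slightly to make it more readable.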

Good afternoon.

My name is David.

I am delighted to be here at Gocon today. I have wanted to come to this conference for two years and I am very grateful to the organisers for extending me the opportunity to present to you today.

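delighted: pleased, happy
organisers: organisers
opportunity: opportunity

I am very happy to be at Gocon today. I have been hoping to attend this conference for two years, and I am very grateful to the organisers for giving me this opportunity to speak to you.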

I want to begin my talk with a question.

Why are people choosing to use Go ?

When people talk about their decision to learn Go, or use it in their product, they have a variety of answers, but there are always three that are at the top of their list.

variety: variety, kind

First, a question: why do people choose Go?

When people talk about why they learned Go, or why they use it in production, they give many different answers, but three come up most often:

Concurrency
Ease of deployment
Performance

These are the top three.

The first, Concurrency.

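Go’s concurrency primitives are attractive to programmers who come from single-threaded scripting languages like Node.js, Ruby, or Python, or from languages like C++ or Java with their heavyweight threading model.

primitive: primitive, early, simple; here, a basic building block of the language
attractive: attractive
single threaded: single-threaded
scripting: scripting
heavyweight: heavyweight

First: concurrency.

Go's simple concurrency features appeal to programmers coming from single-threaded scripting languages such as Node.js, Ruby and Python, and to those coming from languages such as C++ and Java, with their heavyweight threading models.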

Ease of deployment.

We have heard today from experienced Gophers who appreciate the simplicity of deploying Go applications.

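appreciate: to be grateful for; to enjoy; to value

Second: ease of deployment.

We have heard from experienced Go users (Gophers) that they really appreciate how simple it is to deploy Go applications.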

This leaves Performance.

I believe an important reason why people choose to use Go is because it is fast.

That leaves performance.

I believe an important reason people choose Go is that it is fast.

For my talk today I want to discuss five features that contribute to Go’s performance.

I will also share with you the details of how Go implements these features.

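feature: feature, characteristic
contribute: to contribute to, to help bring about

Today I want to discuss five features that contribute to Go's performance.
I will also share the details of how Go implements these features.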

1. Values

The first feature I want to talk about is Go’s efficient treatment and storage of values.

efficient: efficient
treatment: treatment, handling
storage: storage

The first feature is how efficiently Go handles and stores values.

This is an example of a value in Go. When compiled, gocon consumes exactly four bytes of memory.

Let’s compare Go with some other languages

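compiled: compiled
consumes: consumes, uses up
compare: to compare

This is an example of a value in Go. Once compiled, gocon occupies exactly four bytes of memory.

Let's compare this with some other languages.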

Due to the overhead of the way Python represents variables, storing the same value using Python consumes six times more memory.

This extra memory is used by Python to track type information, do reference counting, etc.

Let’s look at another example:

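overhead: above the ground, overhead; running costs. Here, extra cost
represents: represents
track: track, trace; to track
reference: reference
etc: and so on

Because of the way Python represents variables, storing the same value takes six times as much memory.

The extra space is used to track type information, keep reference counts, and so on.

Let's look at another example: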

Similar to Go, the Java int type consumes 4 bytes of memory to store this value.

However, to use this value in a collection like a List or Map, the compiler must convert it into an Integer object.

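collection: collecting, gathering
Here it means a collection, a kind of data type.

Like Go, Java's int type uses 4 bytes to store this value.

However, to use this value in a collection such as a List or Map, the compiler must convert it into an Integer object.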

So an integer in Java frequently looks more like this and consumes between 16 and 24 bytes of memory.

Why is this important ? Memory is cheap and plentiful, why should this overhead matter ?

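frequently: frequently, often
plentiful: plentiful, abundant

So in Java an integer frequently looks more like this, taking 16 to 24 bytes of memory.

Why does this matter? Memory is cheap and plentiful, so why should this overhead matter?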

This is a graph showing CPU clock speed vs memory bus speed.

Notice how the gap between CPU clock speed and memory bus speed continues to widen.

The difference between the two is effectively how much time the CPU spends waiting for memory.

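lags: lags, delays
graph: graph, chart
gap: gap
widen: to widen

This is a graph of CPU clock speed against memory bus speed.
Notice how the gap between CPU clock speed and memory bus speed keeps widening.
The difference between the two is, in effect, how much time the CPU spends waiting for memory.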

Since the late 1960s, CPU designers have understood this problem.

Their solution is a cache, an area of smaller, faster memory which is inserted between the CPU and main memory.

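CPU designers have understood this problem since the late 1960s.
Their solution is a cache: a smaller, faster area of memory placed between the CPU and main memory.

Much of the CPU's rapid progress depends on its cache. The more useful data the cache holds, the more effective it is, and a more effective cache leads to better program performance.
In dynamically typed languages, the elements of an array can each have a different type. They therefore usually have to be stored individually on the heap rather than in one contiguous array. That leaves the CPU cache with little to do, because the cache reads in a contiguous block of memory at a time. When data is scattered across unrelated memory addresses, the cache cannot help, and the CPU has to read from main memory.
Reading a value from the cache takes roughly 3 CPU clock cycles, while reading it from main memory takes on the order of 100. It is no surprise that program performance suffers.

So when defining variables, use the smallest type that will do, so that values stay in the CPU cache rather than in slower main memory.

Reference article on caches: https://blog.51cto.com/wingeek/274006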

This is a Location type which holds the location of some object in three dimensional space. It is written in Go, so each Location consumes exactly 24 bytes of storage.

We can use this type to construct an array type of 1,000 Locations, which consumes exactly 24,000 bytes of memory.

Inside the array, the Location structures are stored sequentially, rather than as pointers to 1,000 Location structures stored randomly.

This is important because now all 1,000 Location structures are in the cache in sequence, packed tightly together.

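Location: location, position
dimensional: dimensional
construct: to construct, to build
structures: structures
sequentially: in order, one after another
sequence: order, sequence
packed: packaged; full of
tightly: tightly

This is a Location type, which holds the position (X, Y, Z) of an object in three-dimensional space. Because it is written in Go, each Location takes exactly 24 bytes of storage.
We can use this type to build an array of 1,000 Locations, which takes exactly 24,000 bytes of memory.
Inside the array the Location structures are stored one after another, rather than as pointers to 1,000 Location structures stored at random places.
This matters because all 1,000 Location structures now sit in the cache in order, packed tightly together.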

Go lets you create compact data structures, avoiding unnecessary indirection.

Compact data structures utilise the cache better.

Better cache utilisation leads to better performance.
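The sketch below makes the cost of indirection concrete. It compares storing 1,000 Locations inline with storing 1,000 pointers to separately allocated Locations. Heap allocations are counted with testing.AllocsPerRun from the standard library. The helper names inline and indirect and the package-level sinks are hypothetical; the sinks exist only so the slices really end up on the heap. You should see about one allocation per run for the inline layout and roughly one per element for the pointer layout, though exact counts depend on the compiler:

```go
package main

import (
	"fmt"
	"testing"
)

// Location is a point in three-dimensional space (fields assumed).
type Location struct {
	X, Y, Z float64
}

// Package-level sinks keep the results reachable, so the allocations
// below really happen on the heap instead of being optimised away.
var (
	valueSink   []Location
	pointerSink []*Location
)

// inline stores 1,000 Locations directly in a single backing array.
func inline() {
	valueSink = make([]Location, 1000)
}

// indirect stores 1,000 pointers, each to a separately allocated
// Location that may end up anywhere on the heap.
func indirect() {
	ps := make([]*Location, 1000)
	for i := range ps {
		ps[i] = new(Location)
	}
	pointerSink = ps
}

func main() {
	// AllocsPerRun reports the average number of heap allocations per call.
	fmt.Println("inline:  ", testing.AllocsPerRun(10, inline))
	fmt.Println("indirect:", testing.AllocsPerRun(10, indirect))
}
```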

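compact: agreement, contract (n.); compact (adj.)
avoiding: avoiding; (also) annulling
indirection: indirection
utilise: to use, to make use of

Go lets you create compact data structures, avoiding unnecessary indirection.
Compact data structures make better use of the cache.
Better cache utilisation leads to better performance.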

2. Inlining

Function calls are not free.

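Function calls are not free.

This is because every function call has a cost.
That cost falls roughly into two parts: passing the arguments and saving the current context of the program.
The more arguments are passed, the more it costs to pass them. The more state (such as registers) has to be preserved, the more it costs to save the context.
A very simple function may do less useful work than the call itself costs, and then calling it is very poor value.
The purpose of inlining is to have the compiler embed the code of such simple callees directly into the caller.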

Three things happen when a function is called.

A new stack frame is created, and the details of the caller recorded.

Any registers which may be overwritten during the function call are saved to the stack.

The processor computes the address of the function and executes a branch to that new address.

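procedure: procedure, steps
stack frame: stack frame
registers: to register, to record; here, CPU registers
processor: processing; one who processes things; here, the CPU
execute: to execute, to carry out, to put into effect

When a function is called, three things happen:

1. A new stack frame is created, and details of the caller are recorded.
2. Any registers that may be overwritten during the call are saved to the stack.
3. The processor computes the address of the function and branches to that new address.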

Because function calls are very common operations, CPU designers have worked hard to optimise this procedure, but they cannot eliminate the overhead.

Depending on what the function does, this overhead may be trivial or significant.

A solution to reducing function call overhead is an optimisation technique called Inlining.

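unavoidable: unavoidable
unavoidable overhead: unavoidable cost
common: common, frequent, shared
operations: operations
optimise: to optimise
eliminate: to eliminate
depend on: to depend on
trivial: trivial, unimportant
significant: significant, important /sɪg'nɪfɪk(ə)nt/
reduce: to reduce
optimisation: optimisation
optimist: optimist
Mathematical optimization: optimisation in the mathematical sense
technique: technique, skill
Inlining: inlining

Function calls are extremely common operations, so CPU designers have worked hard to optimise them, but they cannot eliminate the overhead entirely.

Whether the overhead is large or small relative to the function depends on how much work the function does.

Inlining is an optimisation technique for reducing function-call overhead.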

The Go compiler inlines a function by treating the body of the function as if it were part of the caller.

Inlining has a cost; it increases binary size.

It only makes sense to inline when the overhead of calling a function is large relative to the work the function does, so only simple functions are candidates for inlining.

Complicated functions are usually not dominated by the overhead of calling them and are therefore not inlined.

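treating: treating, handling
increases: increases
binary: binary
sense: to recognise, to discern; a sense (faculty). "Make sense": to be reasonable
relative: relative
candidate: candidate
complicated: complicated
dominated: controlled; here, mostly accounted for by
therefore: therefore

The Go compiler inlines a function by treating its body as if it were part of the calling function.

Inlining has a cost: it increases the size of the binary.

Inlining only makes sense when the cost of calling a function is large relative to the work the function does, so only simple functions are candidates for inlining.

For complicated functions, the call overhead is usually a small part of the total cost, so they are not inlined.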

This example shows the function Double calling util.Max.

To reduce the overhead of the call to util.Max, the compiler can inline util.Max into Double, resulting in something like this
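The slides are not reproduced here. In this sketch, Max is assumed to compare two ints, and it sits in the same file rather than in a util package so the example is self-contained. DoubleInlined is a hypothetical, hand-written approximation of the result. The real substitution happens inside the compiler; your source code stays as written:

```go
package main

import "fmt"

// Max returns the larger of a and b. It is small and has no loops,
// which makes it a typical inlining candidate.
func Max(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// Double calls Max. After inlining, no call instruction remains here.
func Double(a, b int) int {
	return 2 * Max(a, b)
}

// DoubleInlined approximates Double once Max's body has been
// substituted into it: same behaviour, no function call.
func DoubleInlined(a, b int) int {
	temp := b
	if a > b {
		temp = a
	}
	return 2 * temp
}

func main() {
	fmt.Println(Double(3, 7), DoubleInlined(3, 7)) // 14 14
}
```

Both versions return the same result for any input. That is the point: inlining changes how the code runs, not what it computes.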

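This example shows the function Double calling util.Max.

To reduce the cost of the call to util.Max, the compiler can inline util.Max into Double, giving something like this: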

After inlining there is no longer a call to util.Max, but the behaviour of Double is unchanged.

Inlining isn’t exclusive to Go. Almost every compiled or JITed language performs this optimisation. But how does inlining in Go work?

The Go implementation is very simple. When a package is compiled, any small function that is suitable for inlining is marked and then compiled as usual.

Then both the source of the function and the compiled version are stored.

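exclusive: high-end, dedicated; "not exclusive to": not unique to
JITed: compiled just in time (JIT); JIT is another kind of compiler
perform: to perform, to carry out
implementation: carrying out, putting into practice, execution
suitable: suitable, appropriate

After inlining there is no longer a call to util.Max, but the behaviour of Double is unchanged.
Inlining is not unique to Go. Almost every compiled or JIT-compiled language performs this optimisation. But how does inlining work in Go?
Go's implementation is very simple. When a package is compiled, any small function suitable for inlining is marked and then compiled as usual.
Both the source of the function and the compiled version are then stored.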

This slide shows the contents of util.a. The source has been transformed a little to make it easier for the compiler to process quickly.

When the compiler compiles Double it sees that util.Max is inlinable, and the source of util.Max is available.

Rather than insert a call to the compiled version of util.Max, it can substitute the source of the original function.

Having the source of the function enables other optimizations.

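This slide shows the contents of util.a. The source has been transformed slightly so that the compiler can process it quickly.

When the compiler compiles Double, it sees that util.Max is inlinable and that the source of util.Max is available.

Rather than insert a call to the compiled version of util.Max, it can substitute the source of the original function. Having the source available enables further optimisations.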

In this example, although the function Test always returns false, Expensive cannot know that without executing it.

When Test is inlined, we get something like this
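Here is a sketch of the example, with a hypothetical expensiveWork function standing in for the costly code. Before inlining, Expensive has to call Test to find out that the branch is never taken. After inlining, the condition is the constant false, and the compiler can drop the branch entirely. ExpensiveAfterInlining is a hand-written illustration of that result, not actual compiler output:

```go
package main

import "fmt"

// Test always returns false, but a caller cannot know that without
// calling it, unless Test is inlined into the caller.
func Test() bool { return false }

// expensiveWork is a hypothetical stand-in for the costly code.
func expensiveWork() int {
	sum := 0
	for i := 0; i < 1_000_000; i++ {
		sum += i
	}
	return sum
}

// Expensive only runs the costly code if Test reports true.
func Expensive() int {
	if Test() {
		return expensiveWork()
	}
	return 0
}

// ExpensiveAfterInlining approximates Expensive once Test's body has
// been substituted: the condition is the constant false, so the branch
// is unreachable and dead-code elimination can remove it.
func ExpensiveAfterInlining() int {
	if false {
		return expensiveWork()
	}
	return 0
}

func main() {
	fmt.Println(Expensive(), ExpensiveAfterInlining()) // 0 0
}
```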

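dead code elimination: removing code that can never run
executing: executing

In this example, although the function Test always returns false, Expensive cannot know that without calling it. When Test is inlined, we get something like this: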

The compiler now knows that the expensive code is unreachable.

Not only does this save the cost of calling Test, it saves compiling or running any of the expensive code that is now unreachable.

The Go compiler can automatically inline functions across files and even across packages. This includes code that calls inlinable functions from the standard library.

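automatically: automatically
standard library: standard library

The compiler now knows that the expensive code is unreachable. This saves the cost of calling Test. It also saves the cost of compiling or running any of the expensive code, which is now unreachable.

The Go compiler can inline functions automatically across files and even across packages, including calls to inlinable functions in the standard library.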

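3. Escape Analysis

What is escaping? https://www.cnblogs.com/chenglc/p/9327700.html

An introduction to escape analysis: https://www.iteye.com/topic/473355

Put simply, an object escapes when it is created inside a function but is still referenced from outside after the function has returned. Because that outside reference remains, the object cannot be discarded along with the function's stack. It has to live on the heap until the GC reclaims it. The result is more memory in use and more work for the GC, which hurts performance.

The takeaway: for the sake of the GC and of performance, keep objects local to the function that creates them whenever you can, rather than letting references to them leave it.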

Mandatory garbage collection makes Go a simpler and safer language.

This does not imply that garbage collection makes Go slow, or that garbage collection is the ultimate arbiter of the speed of your program.

What it does mean is memory allocated on the heap comes at a cost. It is a debt that costs CPU time every time the GC runs until that memory is freed.

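mandatory: imperative; compulsory, obligatory
garbage: garbage
collection: collection
imply: to imply
ultimate: final, ultimate
arbiter: arbiter, judge
allocate: to allocate
heap: heap
debt: debt

Mandatory garbage collection (GC) makes Go a simpler and safer language. This does not imply that GC makes Go slow, or that GC is the ultimate arbiter of how fast your program runs.

What it does mean is that memory allocated on the heap has a cost: a debt paid in CPU time every time the GC runs, until that memory is freed.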

There is however another place to allocate memory, and that is the stack.

Unlike C, which forces you to choose if a value will be stored on the heap, via malloc, or on the stack, by declaring it inside the scope of the function, Go implements an optimisation called escape analysis.

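allocate: to allocate
force: force, strength (n.); to force, to impose, to push (vt.)
via: via, by means of
malloc: C's dynamic memory allocation function; malloc is short for "memory allocation"
declare: to declare, to announce
scope: field of view; scope, range
implement: to implement, to put into effect

There is, however, another place to allocate memory: the stack.
C forces you to choose whether a value is stored on the heap (via malloc) or on the stack (by declaring it inside the scope of a function). Go instead implements an optimisation called escape analysis.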

Escape analysis determines whether any references to a value escape the function in which the value is declared.

If no references escape, the value may be safely stored on the stack.

Values stored on the stack do not need to be allocated or freed.

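Let’s look at some examples.

determines: to make up one's mind; to determine; to limit
reference: mention; reference
declare: to declare, to announce

Escape analysis determines whether any reference to a value escapes the function in which the value is declared.
If no reference escapes, the value can safely be stored on the stack.
Values stored on the stack do not need to be allocated or freed.
Let's look at some examples.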

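Stack vs. heap?

The stack is local in scope. It is reclaimed automatically when the function finishes and is managed directly through the CPU's stack pointer, so it is efficient.
The heap has to be managed by the program (in Go, by the runtime and the GC), which is less efficient.
There is an article on this topic: Memory stack vs heap.
So even with a GC, values that do not need to be passed out of a function should be kept inside it as far as possible.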

Sum adds the numbers between 1 and 100 and returns the result. This is a rather unusual way to do this, but it illustrates how Escape Analysis works.

Because the numbers slice is only referenced inside Sum, the compiler will arrange to store the 100 integers for that slice on the stack, rather than the heap.

There is no need to garbage collect numbers, it is automatically freed when Sum returns.
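Here is a sketch of Sum consistent with the description; the exact body is an assumption. Building it with go build -gcflags=-m should report that the make call does not escape. The wording of the compiler's diagnostics varies between Go versions:

```go
package main

import "fmt"

// Sum adds the numbers from 1 to 100 in a roundabout way: it first
// fills a slice, then adds up its elements. The slice never leaves
// Sum, so escape analysis can keep its backing array on the stack.
func Sum() int {
	numbers := make([]int, 100)
	for i := range numbers {
		numbers[i] = i + 1
	}

	var sum int
	for _, n := range numbers {
		sum += n
	}
	return sum
}

func main() {
	fmt.Println(Sum()) // 5050
}
```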

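illustrate: to illustrate, to explain (not to be confused with illiterate: unable to read or write)
arrange: to arrange, to organise

Sum adds the numbers from 1 to 100 and returns the result. This is a rather unusual way to do it, but it illustrates how escape analysis works.

Because the numbers slice is only referenced inside Sum, the compiler arranges to store its 100 integers on the stack rather than on the heap. There is no need to garbage-collect numbers; it is freed automatically when Sum returns.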

This second example is also a little contrived. In CenterCursor we create a new Cursor and store a pointer to it in c.

Then we pass c to the Center() function which moves the Cursor to the center of the screen.

Then finally we print the X and Y locations of that Cursor.

Even though c was allocated with the new function, it will not be stored on the heap, because no reference to c escapes the CenterCursor function.
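Here is a sketch of the CenterCursor example. The Cursor fields and the 640 by 480 screen size are assumptions. new(Cursor) returns a pointer, but that pointer is only passed down to Center and never stored anywhere that outlives CenterCursor. Escape analysis can therefore place the Cursor on the stack. Passing c.X and c.Y to fmt.Println may copy those two ints to the heap, but that does not make c itself escape:

```go
package main

import "fmt"

// Cursor is a position on the screen (fields assumed).
type Cursor struct {
	X, Y int
}

// Hypothetical screen size.
const width, height = 640, 480

// Center moves c to the middle of the screen. It only uses the
// pointer during the call and does not store it anywhere.
func Center(c *Cursor) {
	c.X += width / 2
	c.Y += height / 2
}

// CenterCursor allocates a Cursor with new, but no reference to it
// outlives the function, so it can be placed on the stack.
func CenterCursor() {
	c := new(Cursor)
	Center(c)
	fmt.Println(c.X, c.Y)
}

func main() {
	CenterCursor() // prints 320 240
}
```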

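contrived: unnatural, forced

The second example is also a little contrived. In CenterCursor we create a new Cursor and store a pointer to it in c. Then we pass c to the Center() function, which moves the Cursor to the centre of the screen.
Finally we print the X and Y coordinates of the Cursor.
Even though c was allocated with the built-in new function, it is not stored on the heap, because no reference to c escapes CenterCursor.

Escape analysis in action: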

Go’s optimisations are always enabled by default. You can see the compiler’s escape analysis and inlining decisions with the -gcflags=-m switch.

Because escape analysis is performed at compile time, not run time, stack allocation will always be faster than heap allocation, no matter how efficient your garbage collector is.

I will talk more about the stack in the remaining sections of this talk.
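For contrast with Sum and CenterCursor, here is a small sketch of a value that does escape. The function names are hypothetical. In newCounter, a pointer to the local n is returned, so n outlives the call and escape analysis must move it to the heap. In localCounter, the pointer never leaves the function, so n can stay on the stack. Building with -gcflags=-m should show the difference, typically as a "moved to heap: n" line for the first function:

```go
package main

import "fmt"

// newCounter returns the address of its local variable n. The pointer
// outlives the call, so n escapes and must be allocated on the heap.
func newCounter() *int {
	n := 0
	return &n
}

// localCounter also takes the address of n, but the pointer never
// leaves the function, so n can stay on the stack.
func localCounter() int {
	n := 0
	p := &n
	*p += 42
	return n
}

func main() {
	c := newCounter()
	*c += 42
	fmt.Println(*c, localCounter()) // 42 42
}
```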

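perform: to perform, to carry out
remaining: remaining
section: section; region; department

Go's optimisations are enabled by default. You can see the compiler's escape analysis and inlining decisions with the -gcflags=-m flag.
Because escape analysis is performed at compile time rather than at run time, stack allocation will always be faster than heap allocation, no matter how efficient the garbage collector is.
I will say more about the stack in the remaining sections of this talk.

4. Goroutines

What is a goroutine? https://www.jianshu.com/p/7ebf732b6e1f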

https://baijiahao.baidu.com/s?id=1620972759226100794&wfr=spider&for=pc

Go has goroutines. These are the foundations for concurrency in Go.

I want to step back for a moment and explore the history that leads us to goroutines.

In the beginning computers ran one process at a time. Then in the 60’s the idea of multiprocessing, or time sharing became popular.

In a time-sharing system the operating system must constantly switch the attention of the CPU between these processes by recording the state of the current process, then restoring the state of another.

This is called process switching.

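foundations: establishment; foundation, basis
concurrency: concurrency
explore: to explore
constantly: constantly
switch: to switch

Goroutines are the foundation of concurrency in Go.

I want to step back and explore the history that led to goroutines. In the beginning, computers ran one process at a time. Then, in the 1960s, the idea of multiprocessing, or time sharing, became popular. In a time-sharing system the operating system must constantly switch the CPU between processes. It records the state of the current process and then restores the state of another.

This is called process switching.

The cost of process switching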

There are three main costs of a process switch.

First, the kernel needs to store the contents of all the CPU registers for that process, then restore the values for another process.

The kernel also needs to flush the CPU’s mappings from virtual memory to physical memory as these are only valid for the current process.

Finally there is the cost of the operating system context switch, and the overhead of the scheduler function to choose the next process to occupy the CPU.

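kernel: kernel
virtual: virtual
valid: valid
scheduler: scheduler
occupy: to occupy

There are three main costs to a process switch.
First, the kernel needs to store the contents of all the CPU registers for that process, then restore the values for another process.
The kernel also needs to flush the CPU's mappings from virtual to physical memory, because these mappings are only valid for the current process.
Finally, there is the cost of the operating system context switch, plus the overhead of the scheduler choosing the next process to run on the CPU.

Processor registers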

There are a surprising number of registers in a modern processor. I have difficulty fitting them on one slide, which should give you a clue how much time it takes to save and restore them.

Because a process switch can occur at any point in a process’ execution, the operating system needs to store the contents of all of these registers because it does not know which are currently in use.

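clue: clue; to give a clue
occur: to happen; to take place; to exist
execution: carrying out, execution

A modern processor has a surprising number of registers. I have trouble fitting them all on one slide, which should give you an idea of how long it takes to save and restore them.

A process switch can happen at any point in a process's execution. The operating system does not know which registers are currently in use, so it has to save the contents of all of them.

Threads

This led to the development of threads, which are conceptually the same as processes, but share the same memory space.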

As threads share address space, they are lighter than processes so are faster to create and faster to switch between.

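This led to the development of threads. Threads are conceptually like processes, except that multiple threads share the same memory space.

Because threads share an address space, they are lighter than processes, so they are faster to create and faster to switch between.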

Goroutines take the idea of threads a step further.

Goroutines are cooperatively scheduled, rather than relying on the kernel to manage their time sharing.

The switch between goroutines only happens at well defined points, when an explicit call is made to the Go runtime scheduler.

The compiler knows the registers which are in use and saves them automatically.

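cooperatively: cooperatively, jointly
relying: relying on
defined: clear (adj.); to define, to make clear (v.)
explicit: explicit, detailed, clear

Goroutines take the idea of threads a step further.

Goroutines are scheduled cooperatively, rather than relying on the kernel to manage their time sharing.

Switches between goroutines only happen at well-defined points, when an explicit call is made to the Go runtime scheduler.

The compiler knows which registers are in use and saves them automatically.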

While goroutines are cooperatively scheduled, this scheduling is handled for you by the runtime.

Places where Goroutines may yield to others are:

Channel send and receive operations, if those operations would block.
The Go statement, although there is no guarantee that the new goroutine will be scheduled immediately.
Blocking syscalls like file and network operations.
After being stopped for a garbage collection cycle.
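The small sketch below exercises two of these yield points; pingPong is a hypothetical name. The go statement starts a second goroutine. Each send on an unbuffered channel then blocks until the other goroutine receives, which gives the runtime scheduler a natural point to switch between them. The result does not depend on the order in which the scheduler runs the goroutines:

```go
package main

import "fmt"

// pingPong passes a counter back and forth between two goroutines
// over unbuffered channels, for the given number of rounds.
func pingPong(rounds int) int {
	ping := make(chan int)
	pong := make(chan int)

	// The go statement creates the goroutine; it is not guaranteed
	// to start running immediately.
	go func() {
		for v := range ping { // blocks until a value is sent
			pong <- v + 1 // blocks until the other side receives
		}
		close(pong)
	}()

	v := 0
	for i := 0; i < rounds; i++ {
		ping <- v  // blocking send: a point where the scheduler may switch
		v = <-pong // blocking receive: another possible switch point
	}
	close(ping) // ends the range loop so the goroutine can exit
	return v
}

func main() {
	fmt.Println(pingPong(5)) // 5
}
```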

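block: block; to block; in bulk
statement: statement
guarantee: guarantee

Although goroutines are scheduled cooperatively, the runtime handles this scheduling for you.

Places where goroutines may yield to others are:

Channel send and receive operations, if those operations would block.
The go statement, although there is no guarantee that the new goroutine will be scheduled immediately.
Blocking system calls, such as file and network operations.
After being stopped for a garbage-collection cycle.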

This is an example to illustrate some of the scheduling points described in the previous slide.

The thread, depicted by the arrow, starts on the left in the ReadFile function. It encounters os.Open, which blocks the thread while waiting for the file operation to complete, so the scheduler switches the thread to the goroutine on the right hand side.

Execution continues until the read from the c chan blocks, and by this time the os.Open call has completed, so the scheduler switches the thread back to the left hand side and continues to the file.Read function, which again blocks on file IO.

The scheduler switches the thread back to the right hand side for another channel operation, which has unblocked during the time the left hand side was running, but it blocks again on the channel send.

Finally the thread switches back to the left hand side as the Read operation has completed and data is available.

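illustrate: to add illustrations to; to explain, to clarify, to show
previous: previous
depicted: depicted
arrow: arrow
encounters: encounters, runs into
execution: carrying out, performing; (also) a legal execution

This example illustrates some of the scheduling points described on the previous slide.
The thread, shown by the arrow, starts on the left in the ReadFile function. It reaches os.Open, which blocks the thread while waiting for the file operation to complete, so the scheduler switches the thread to the goroutine on the right-hand side.
Execution continues until the read from the c chan blocks. By then the os.Open call has completed, so the scheduler switches the thread back to the left-hand side and continues to the file.Read function, which again blocks on file IO.
The scheduler switches the thread back to the right-hand side for another channel operation. That operation was unblocked while the left-hand side was running, but the goroutine blocks again on the channel send.
Finally, the thread switches back to the left-hand side, because the Read operation has completed and the data is available.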

This slide shows the low level runtime.Syscall function which is the base for all functions in the os package.

Any time your code results in a call to the operating system, it will go through this function.

The call to entersyscall informs the runtime that this thread is about to block.

This allows the runtime to spin up a new thread which will service other goroutines while the current thread is blocked.

This results in relatively few operating system threads per Go process, with the Go runtime taking care of assigning a runnable Goroutine to a free operating system thread.

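informs: informs, notifies
spin: to spin rapidly; "spin up": to start up
relatively: relatively, comparatively
assigning: assigning

This slide shows the low-level runtime.Syscall function, which underlies all the functions in the os package.
Any time your code results in a call to the operating system, it goes through this function.
The call to entersyscall tells the runtime that this thread is about to block.
This lets the runtime spin up a new thread to service other goroutines while the current thread is blocked.
As a result, each Go process uses relatively few operating system threads. The Go runtime takes care of assigning runnable goroutines to free operating system threads.

5. Segmented and Copying Stacks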

In the previous section I discussed how goroutines reduce the overhead of managing many, sometimes hundreds of thousands of concurrent threads of execution.

There is another side to the goroutine story, and that is stack management, which leads me to my final topic.

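segment: section, segment (here, a piece of allocated space)
segmented: divided, split up
copying: copying
segmented and copying stacks: stacks divided into segments, and stacks that are copied

In the previous section I discussed how goroutines reduce the overhead of managing many, sometimes hundreds of thousands of, concurrent threads of execution.

There is another side to the goroutine story, and that is stack management, which leads me to my final topic.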

This is a diagram of the memory layout of a process. The key thing we are interested in is the location of the heap and the stack.

Traditionally inside the address space of a process, the heap is at the bottom of memory, just above the program (text) and grows upwards.

The stack is located at the top of the virtual address space, and grows downwards.

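diagram: diagram, chart
layout: layout, arrangement, design, display
Traditionally: traditionally
located: located
virtual: virtual

This is a diagram of the memory layout of a process. The key thing we are interested in is the location of the heap and the stack.
Traditionally, inside the address space of a process, the heap is at the bottom of memory, just above the program (text), and grows upwards.
The stack is at the top of the virtual address space and grows downwards.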

Because the heap and stack overwriting each other would be catastrophic, the operating system usually arranges to place an area of unwritable memory between the stack and the heap to ensure that if they did collide, the program will abort.

This is called a guard page, and effectively limits the stack size of a process, usually in the order of several megabytes.

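guard: to guard, to control; guard, watch
guard page: guard page
catastrophic: /ˌkætəˈstrɔfɪk/ disastrous
arrange: to arrange, to organise
collide: to collide, to clash
abort: to miscarry; to fail early; to terminate, to call off

Overwriting each other would be catastrophic for the heap and the stack. The operating system therefore usually places an area of unwritable memory between them, so that if they did collide, the program would abort.
This is called a guard page. It effectively limits the stack size of a process, usually to something on the order of a few megabytes.

The next slide shows thread stacks and guard pages.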

We’ve discussed that threads share the same address space, so each thread must have its own stack.

Because it is hard to predict the stack requirements of a particular thread, a large amount of memory is reserved for each thread’s stack along with a guard page.

The hope is that this is more than will ever be needed and the guard page will never be hit.

The downside is that as the number of threads in your program increases, the amount of available address space is reduced.

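predict: to forecast, to predict
particular: particular, specific
reserved: reserved
increases: increases

Threads share the same address space, so each thread must have its own stack. It is hard to predict how much stack a particular thread will need, so a large amount of memory is reserved for each thread's stack, along with a guard page. The hope is that this is more than will ever be needed and that the guard page will never be hit. The downside is that as the number of threads in your program grows, the amount of available address space shrinks.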

We’ve seen that the Go runtime schedules a large number of goroutines onto a small number of threads, but what about the stack requirements of those goroutines ?

Instead of using guard pages, the Go compiler inserts a check as part of every function call to check if there is sufficient stack for the function to run. If there is not, the runtime can allocate more stack space.

Because of this check, a goroutine's initial stack can be made much smaller, which in turn permits Go programmers to treat goroutines as cheap resources.
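This sketch shows what "cheap" means in practice. Starting 100,000 goroutines, each of which increments a shared counter, is an ordinary thing to do in Go. The function name spawn is hypothetical. The memory this uses depends on the Go version, since the initial goroutine stack size has changed between releases:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spawn starts n goroutines and waits for all of them to finish.
// Each goroutine starts with a small stack that the runtime only
// grows if the goroutine actually needs more space.
func spawn(n int) int64 {
	var wg sync.WaitGroup
	var total int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&total, 1)
		}()
	}
	wg.Wait() // all increments happen before Wait returns
	return total
}

func main() {
	fmt.Println(spawn(100_000)) // 100000
}
```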

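sufficient: sufficient, enough
initial: initial
permit: to permit, to allow
treat: to entertain; to treat, to regard

We have seen that the Go runtime schedules a large number of goroutines onto a small number of threads, but what about the stack requirements of those goroutines?
Instead of using guard pages, the Go compiler inserts a check into every function call to see whether there is enough stack for the function to run. If there is not, the runtime allocates more stack space.
Because of this check, a goroutine's initial stack can be much smaller. This in turn lets Go programmers treat goroutines as cheap resources, without worrying as much about exhausting memory by starting many of them.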

This is a slide that shows how stacks are managed in Go 1.2.

When G calls to H there is not enough space for H to run, so the runtime allocates a new stack frame from the heap, then runs H on that new stack segment. When H returns, the stack area is returned to the heap before returning to G.

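frame: frame, structure
stack frame: stack frame

This slide shows how stacks are managed in Go 1.2.
When G calls H, there is not enough space for H to run. The runtime allocates a new stack frame from the heap and runs H on that new stack segment. When H returns, that stack area is returned to the heap before control returns to G.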

This method of managing the stack works well in general, but for certain types of code, usually recursive code, it can cause the inner loop of your program to straddle one of these stack boundaries.

For example, in the inner loop of your program, function G may call H many times in a loop.

Each time this will cause a stack split. This is known as the hot split problem.

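in general: in general
certain types: certain kinds
recursive: recursive
cause: to cause, to bring about
straddle: to sit astride; here, to span across
boundaries: boundaries

This way of managing the stack generally works well. For certain kinds of code, usually recursive code, it can cause the inner loop of your program to straddle one of these stack boundaries.
For example, in the inner loop of your program, function G may call H many times in a loop, and each call causes a stack split. This is known as the hot split problem.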

To solve hot splits, Go 1.3 has adopted a new stack management method.

Instead of adding and removing additional stack segments, if the stack of a goroutine is too small, a new, larger, stack will be allocated.

The old stack’s contents are copied to the new stack, then the goroutine continues with its new larger stack.

After the first call to H the stack will be large enough that the check for available stack space will always succeed.

This resolves the hot split problem.
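This sketch shows code that relies on growable stacks: a goroutine that recurses 100,000 calls deep. The goroutine starts with a stack of a few kilobytes. Whenever the check at function entry finds too little space, the runtime (since Go 1.3, as described above) allocates a larger stack and copies the old one across. The recursion therefore succeeds without any stack size being configured up front. The function name depth is hypothetical:

```go
package main

import "fmt"

// depth calls itself n times, so it needs n stack frames at its deepest.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return depth(n-1) + 1
}

func main() {
	done := make(chan int)
	// Run the recursion on a fresh goroutine, which begins with a small stack.
	go func() { done <- depth(100_000) }()
	fmt.Println(<-done) // 100000
}
```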

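To solve hot splits, Go 1.3 adopted a new stack management method.
If a goroutine's stack is too small, the runtime allocates a new, larger stack instead of adding and removing extra stack segments.
The contents of the old stack are copied to the new one, and the goroutine continues with its new, larger stack.
After the first call to H, the stack is large enough that the check for available stack space always succeeds.
This solves the hot split problem: the stack is no longer split and reallocated over and over.

6. Summary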

Values, Inlining, Escape Analysis, Goroutines, and segmented/copying stacks.

These are the five features that I chose to speak about today, but they are by no means the only things that make Go a fast programming language, just as there are more than three reasons that people cite as their reason to learn Go.

As powerful as these five features are individually, they do not exist in isolation.

For example, the way the runtime multiplexes goroutines onto threads would not be nearly as efficient without growable stacks.

Inlining reduces the cost of the stack size check by combining smaller functions into larger ones.

Escape analysis reduces the pressure on the garbage collector by automatically moving allocations from the heap to the stack.

Escape analysis also provides better cache locality.

Without growable stacks, escape analysis might place too much pressure on the stack.

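cite: to cite, to quote; to recall
individually: individually, separately
isolation: isolation
multiplex: multiple (adj.); here, to multiplex, i.e. run many goroutines over a few threads
pressure: pressure
garbage: garbage
allocation: allocation

These are the five features I chose to talk about today. They are by no means the only reasons Go is a fast programming language, just as people cite more than three reasons for learning Go.
As powerful as these five features are individually, they do not exist in isolation.
For example, the way the runtime multiplexes goroutines onto threads would be nowhere near as efficient without growable stacks.
Inlining reduces the cost of the stack size check by combining smaller functions into larger ones.
Escape analysis reduces the pressure on the GC by automatically moving allocations from the heap to the stack.
Escape analysis also provides better cache locality.
Without growable stacks, escape analysis might put too much pressure on the stack.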

* Thank you to the Gocon organisers for permitting me to speak today
* twitter / web / email details
* thanks to @offbymany, @billkennedy_go, and Minux for their assistance in preparing this talk.

Posted @ 2019-05-25 22:52 by 游小刀
