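Zero-Copy Conversion from string to []byte

Zero-copy

While working on a project, I was reading through the Gin source code and came across the two functions below. Neither of them copies any data. Let's analyze how they achieve that.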

1. Basic data structures

// StringToBytes converts string to byte slice without a memory allocation.
func StringToBytes(s string) (b []byte) {
	sh := *(*reflect.StringHeader)(unsafe.Pointer(&s))
	bh := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	bh.Data, bh.Len, bh.Cap = sh.Data, sh.Len, sh.Len
	return b
}

// BytesToString converts byte slice to string without a memory allocation.
func BytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

Let's look at how a byte slice and a string are represented at runtime.

// StringHeader is the runtime representation of a string.
// It cannot be used safely or portably and its representation may
// change in a later release.
// Moreover, the Data field is not sufficient to guarantee the data
// it references will not be garbage collected, so programs must keep
// a separate, correctly typed pointer to the underlying data.
type StringHeader struct {
	Data uintptr
	Len  int
}

// SliceHeader is the runtime representation of a slice.
// It cannot be used safely or portably and its representation may
// change in a later release.
// Moreover, the Data field is not sufficient to guarantee the data
// it references will not be garbage collected, so programs must keep
// a separate, correctly typed pointer to the underlying data.
type SliceHeader struct {
	Data uintptr
	Len  int
	Cap  int
}

The definitions above can be found in /go/src/reflect/value.go

The Data field has type uintptr. uintptr is an integer type that is large enough to hold any pointer.

// uintptr is an integer type that is large enough to hold the bit pattern of
// any pointer.
type uintptr uintptr

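The layouts of string and []byte are very close: the SliceHeader of a []byte simply carries one extra field, Cap. Because the first two fields (Data and Len) line up, BytesToString can reinterpret a slice header as a string header directly.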
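The layout claim can be checked with a small sketch. It assumes the standard gc toolchain, where a string header is two machine words and a slice header is three. The helper name headerWords is hypothetical and only used for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords (hypothetical helper) reports how many machine words the
// string header and the []byte header occupy on the current platform.
func headerWords() (strWords, sliceWords uintptr) {
	word := unsafe.Sizeof(uintptr(0)) // size of one machine word
	var s string
	var b []byte
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

func main() {
	s, b := headerWords()
	// With the gc toolchain: Data+Len for string, Data+Len+Cap for []byte.
	fmt.Printf("string header: %d words, []byte header: %d words\n", s, b)
}
```

Expressing the sizes in words rather than bytes keeps the check independent of whether the platform is 32-bit or 64-bit.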

A string in Go is read-only. A slice differs in that it can grow dynamically, which is why it carries the extra Cap field.
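reflect.StringHeader and reflect.SliceHeader are marked deprecated as of Go 1.20. As a sketch only, and assuming Go 1.20 or later, the same two conversions can be written with unsafe.StringData, unsafe.SliceData, unsafe.Slice and unsafe.String. The lowercase function names below are illustrative and are not taken from Gin.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes returns a []byte that shares memory with s (no copy).
// The result must be treated as read-only.
func stringToBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// bytesToString returns a string that shares memory with b (no copy).
// b must not be modified while the returned string is still in use.
func bytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := stringToBytes("hello world")
	// Len and Cap both equal the string length, as in the Gin version.
	fmt.Println(len(b), cap(b), bytesToString(b)) // prints "11 11 hello world"
}
```

This version avoids casting through header structs whose layout "may change in a later release", while keeping the same contract: the caller must never write through the returned slice.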

2. Performance difference

var x = "hello world"

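// BenchmarkBytesToString measures the zero-copy cast.
// Despite its name, it exercises StringToBytes.
func BenchmarkBytesToString(b *testing.B) {
   for i := 0; i < b.N; i++ { // i < b.N: run exactly b.N iterations
      _ = StringToBytes(x)
   }
}

// BenchmarkBytesToStringNormal measures the built-in []byte(x) conversion.
func BenchmarkBytesToStringNormal(b *testing.B) {
   for i := 0; i < b.N; i++ {
      _ = []byte(x)
   }
}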

The benchmarks above compare the two approaches. Here are the results:

go test -bench="." -benchmem

BenchmarkBytesToString-8                1000000000               0.3072 ns/op          0 B/op          0 allocs/op
BenchmarkBytesToStringNormal-8          238276338                5.011 ns/op           0 B/op          0 allocs/op
PASS
ok      hello/cmd       2.944s

With the unsafe cast, each operation takes 0.3072 ns and allocates no memory.

With the built-in []byte() conversion, each operation takes 5.011 ns and also allocates no memory.

This raises a question: why does the built-in string to []byte conversion not allocate either? Let's look at how the Go runtime handles it.

// The constant is known to the compiler.
// There is no fundamental theory behind this number.
const tmpStringBufSize = 32

type tmpBuf [tmpStringBufSize]byte

func stringtoslicebyte(buf *tmpBuf, s string) []byte {
	var b []byte
	if buf != nil && len(s) <= len(buf) {
		*buf = tmpBuf{}
		b = buf[:len(s)]
	} else {
		b = rawbyteslice(len(s))
	}
	copy(b, s)
	return b
}


// rawbyteslice allocates a new byte slice. The byte slice is not zeroed.
func rawbyteslice(size int) (b []byte) {
	cap := roundupsize(uintptr(size))
	p := mallocgc(cap, nil, false)
	if cap != uintptr(size) {
		memclrNoHeapPointers(add(p, uintptr(size)), cap-uintptr(size))
	}

	*(*slice)(unsafe.Pointer(&b)) = slice{p, size, int(cap)}
	return
}

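As the code shows, the runtime first checks the length during the conversion. If a temporary buffer was supplied (buf is non-nil, which the compiler arranges when the resulting slice does not escape) and the string is no longer than len(buf), that is tmpStringBufSize, currently 32, the bytes are simply copied into that buffer with copy(b, s) and no heap memory is requested. When the string is longer than 32 bytes, rawbyteslice is called to allocate memory, with the size rounded up from the string length. Either way the data is copied; only the allocation differs. So converting large strings frequently causes frequent allocations, and performance drops.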
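The 32-byte threshold can be observed with testing.AllocsPerRun. This is a rough sketch rather than a precise measurement: the names short, long, sink and allocsFor are hypothetical, and the closure writes to the converted slice so that the compiler has to produce a real copy that still does not escape.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

var (
	short = strings.Repeat("a", 32) // fits the 32-byte temporary buffer
	long  = strings.Repeat("a", 33) // one byte too large for it
	sink  byte                      // keeps the converted bytes in use
)

// allocsFor reports the average number of heap allocations per []byte(s)
// conversion when the resulting slice is modified but never escapes.
func allocsFor(s string) float64 {
	return testing.AllocsPerRun(1000, func() {
		b := []byte(s)
		b[0] ^= 1 // write to the copy so it cannot be elided
		sink += b[0]
	})
}

func main() {
	fmt.Println("32 bytes:", allocsFor(short))
	fmt.Println("33 bytes:", allocsFor(long))
}
```

Under the runtime logic quoted above, one would expect no allocation for the 32-byte string and one allocation per conversion for the 33-byte string, although the exact behaviour depends on the compiler version.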

Now let's run the same benchmarks with a much longer string and look at the results.

var x = `hello world
hello world hello worldhello worldhello world
hello worldhello worldhello worldhello world
hello worldhello worldhello worldhello worldhello
`

BenchmarkBytesToString-8                1000000000               0.3075 ns/op          0 B/op          0 allocs/op
BenchmarkBytesToStringNormal-8          20812899                49.95 ns/op          160 B/op          1 allocs/op

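The built-in conversion is now roughly ten times slower (5.011 ns/op to 49.95 ns/op) and allocates 160 B per operation, while the unsafe cast shows no noticeable change.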

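3. Asking one more "why"

If the unsafe cast is so much faster, why does everyone normally use the built-in conversion? Or why isn't the built-in conversion simply replaced by the cast? Consider the following code: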

// Approach 1: built-in conversion
func main() {
	b := []byte("hello")
	b[1] = 'S'
	fmt.Println(string(b))
}

// Approach 2: unsafe cast
func main() {
	defer func() {
		err := recover()
		if err != nil {
			log.Println(err)
		}
	}()
	x := "hello"
	b := StringToBytes(x)
	b[1] = 'S'
	fmt.Println(x)
}

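As mentioned above, Go strings are read-only. Many languages make a similar choice, for example Rust, which has become popular in recent years. Goroutines are one of Go's highlights, and because a read-only string can be shared between goroutines without synchronization, this design effectively reduces lock contention.

With the unsafe cast, the []byte still shares the string's data pointer. For a string literal such as "hello", that memory is read-only, so modifying its contents is illegal. The program dies with a fatal fault rather than an ordinary panic, and recover cannot intercept it.

The built-in conversion is therefore a compromise between performance and safety that lowers the mental burden on the user. For performance-sensitive applications such as gateways, or in places that convert between the two types very frequently, the unsafe cast is an option, provided the resulting slice is never modified.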
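The shared memory can also be demonstrated without crashing the program by going in the other direction, from a heap-allocated []byte to a string. The sketch below reuses the same cast as BytesToString under the hypothetical name bytesToStringNoCopy, and aliasDemo is likewise an illustrative helper.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToStringNoCopy uses the same cast as BytesToString above:
// the returned string shares b's backing array.
func bytesToStringNoCopy(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// aliasDemo shows that a "read-only" string changes when the slice it
// was cast from is modified afterwards.
func aliasDemo() (before, after string) {
	b := []byte("hello")        // writable copy made by the built-in conversion
	s := bytesToStringNoCopy(b) // s aliases b's memory
	before = string(b)          // independent copy taken before the write
	b[1] = 'S'                  // legal write to the slice...
	after = s                   // ...that is visible through the string
	return before, after
}

func main() {
	before, after := aliasDemo()
	fmt.Println(before, after) // prints "hello hSllo"
}
```

A string that silently changes its value breaks the immutability assumption that map keys and concurrent readers rely on, which is a second reason the cast needs care.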

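4. Digging a little deeper

Is the zero-copy discussed here for Go the same thing as zero-copy in operating systems? Why is zero-copy needed, and how do operating systems implement it? Which software and products use zero-copy, and what performance gains has it brought them? I had planned to cover these questions in this article, but for reasons of length they will have to wait for the next one.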

Posted 2022-03-07 10:39 by roverliang
