ByteBuffer in VictoriaMetrics

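VictoriaMetrics routinely processes a very large number of metrics, and that processing involves copying metric data. Allocating fresh memory for every copy would be expensive in both memory consumption and CPU time. VictoriaMetrics therefore reuses memory through ByteBuffer, which is built around sync.Pool. This post looks at how the two work together.

The ByteBuffer struct contains nothing but a byte slice: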

type ByteBuffer struct {
	// B is the underlying byte slice.
	B []byte
}

Using ByteBufferPool

To reuse ByteBuffers, VictoriaMetrics provides ByteBufferPool. Like most sync.Pool wrappers, it exposes a Get function and a Put function.

// ByteBufferPool is a pool of ByteBuffers.
type ByteBufferPool struct {
	p sync.Pool
}

// Get obtains a ByteBuffer from bbp.
func (bbp *ByteBufferPool) Get() *ByteBuffer {
	bbv := bbp.p.Get()
	if bbv == nil {
		return &ByteBuffer{}
	}
	return bbv.(*ByteBuffer)
}

// Put puts bb into bbp.
func (bbp *ByteBufferPool) Put(bb *ByteBuffer) {
	bb.Reset()
	bbp.p.Put(bb)
}

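Put returns a ByteBuffer to the pool. To keep the next user from seeing stale data, the buffer is reset before it goes back into sync.Pool. The Reset function is shown below; bb.B = bb.B[:0] is a common idiom that sets the slice length to zero while keeping the underlying array (and therefore its capacity) for reuse: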

func (bb *ByteBuffer) Reset() {
	bb.B = bb.B[:0]
}
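
To make the effect of bb.B[:0] concrete, here is a minimal sketch. The helper resetDemo is hypothetical and exists only for this illustration; it checks that Reset keeps the capacity and that a later append writes into the same backing array.

```go
package main

import "fmt"

// ByteBuffer mirrors the struct shown in the article.
type ByteBuffer struct {
	B []byte
}

// Reset truncates the slice to zero length; the backing array is kept.
func (bb *ByteBuffer) Reset() {
	bb.B = bb.B[:0]
}

// resetDemo is a hypothetical helper for this example. It reports the
// length and capacity right after Reset, and whether the next append
// reused the original backing array.
func resetDemo() (length, capacity int, reused bool) {
	bb := &ByteBuffer{B: make([]byte, 0, 64)}
	bb.B = append(bb.B, "hello"...)
	first := &bb.B[0] // address of the first byte before Reset

	bb.Reset()
	length, capacity = len(bb.B), cap(bb.B)

	bb.B = append(bb.B, "world"...)
	reused = first == &bb.B[0] // same address means no new allocation
	return length, capacity, reused
}

func main() {
	fmt.Println(resetDemo())
}
```

Reset does not zero the old bytes; it only hides them behind a zero length, which is why callers should always write through append or an explicit length rather than reading past len(bb.B).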

Usage example:

bb := bbPool.Get() // acquire buffer from pool
// perform decompressing in acquired buffer
bb.B, err = DecompressZSTD(bb.B[:0], src)
if err != nil {
    return nil, fmt.Errorf("cannot decompress: %w", err)
}
// unmarshal from temporary buffer to destination buffer
dst, err = unmarshalInt64NearestDelta(dst, bb.B)
bbPool.Put(bb) // release buffer to the pool, so it can be reused
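
The same Get/Put pattern can be sketched as a self-contained program. The function joinLabels and the variable name bbPool are assumptions made for this illustration and are not taken from VictoriaMetrics; the pool types repeat the definitions shown above.

```go
package main

import (
	"fmt"
	"sync"
)

// ByteBuffer and ByteBufferPool repeat the definitions from the article.
type ByteBuffer struct {
	B []byte
}

func (bb *ByteBuffer) Reset() { bb.B = bb.B[:0] }

type ByteBufferPool struct {
	p sync.Pool
}

func (bbp *ByteBufferPool) Get() *ByteBuffer {
	bbv := bbp.p.Get()
	if bbv == nil {
		return &ByteBuffer{}
	}
	return bbv.(*ByteBuffer)
}

func (bbp *ByteBufferPool) Put(bb *ByteBuffer) {
	bb.Reset()
	bbp.p.Put(bb)
}

var bbPool ByteBufferPool

// joinLabels is a hypothetical function: it builds "a,b,c" in a pooled
// buffer and returns the result as a string.
func joinLabels(labels []string) string {
	bb := bbPool.Get()
	defer bbPool.Put(bb) // released on every return path
	for i, l := range labels {
		if i > 0 {
			bb.B = append(bb.B, ',')
		}
		bb.B = append(bb.B, l...)
	}
	// string() copies the bytes, so the result stays valid after Put.
	return string(bb.B)
}

func main() {
	fmt.Println(joinLabels([]string{"job", "instance"}))
}
```

Using defer for Put is a design choice of this sketch: it also returns the buffer on early error returns, at the cost of a small defer overhead. Whatever leaves the function must be a copy, because the pooled bytes may be overwritten by the next user.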

leveledbytebufferpool

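This is a leveled pool of byte buffers. It addresses the mismatch between the size of the buffer that sync.Pool hands back and the size that is actually needed. Suppose vmagent scrapes metrics from two targets:

Target A exposes 100 metrics, while target B exposes 10,000.
Without leveled pools, a scrape of target B may receive from sync.Pool the buffer previously used for target A. That buffer is too small and has to be grown, which hurts performance instead of helping. Conversely, if a scrape of target A receives the buffer previously used for target B, memory is wasted.

leveledbytebufferpool provides the following 12 levels of pools: pools[0] holds buffers with capacities of 0 to 64 bytes, pools[1] holds 65 to 128 bytes, pools[2] holds 129 to 256 bytes, and so on.

Note: as the comment explains, leveling brings no benefit once the capacity exceeds 2^18 bytes, i.e. 256 KiB.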

// pools contains pools for byte slices of various capacities.
//
//	pools[0] is for capacities from 0 to 64, i.e. 0 to 2^6
//	pools[1] is for capacities from 65 to 128, i.e. 2^6+1 to 2^7
//	pools[2] is for capacities from 129 to 256, i.e. 2^7+1 to 2^8
//	...
//	pools[n] is for capacities from 2^(n+5)+1 to 2^(n+6)
//
// Limit the maximum capacity to 2^18, since there are no performance benefits
// in caching byte slices with bigger capacities.
var pools [12]sync.Pool

The ID of the pool for data of size bytes is computed as follows:

func getPoolIDAndCapacity(size int) (int, int) {
    size--                     // 1
    if size < 0 {
       size = 0
    }
    size >>= 6                 // 2
    id := bits.Len(uint(size)) // 3
    if id >= len(pools) {
       id = len(pools) - 1
    }
    return id, (1 << (id + 6)) // 4
}

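Step 1 (size--) handles the boundary case. Take 64 bytes: without the decrement, 64>>6 is 1, so 64-byte data would land in pools[1] instead of pools[0].

Skipping this step and putting 64-byte data straight into pools[1] would have little practical impact.
Step 2 (size >>= 6): the pool upper bounds are powers of two: 2^6 for pools[0], 2^7 for pools[1], 2^8 for pools[2], and so on. Shifting size right by 6 bits discards the 2^6 base and leaves a value whose bit length equals the pool ID, which step 3 extracts with bits.Len(uint(size)). The function could also be written as follows, subtracting 6 from the bit length instead of shifting; the result must then be clamped at 0, because small sizes have a bit length below 6: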
```
func getPoolIDAndCapacity(size int) (int, int) {
    size--
    if size < 0 {
       size = 0
    }

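    id := bits.Len(uint(size)) - 6 // subtract 6 directly instead of shifting
    if id < 0 {
       id = 0 // bit lengths below 6 (size <= 32) would otherwise give a negative ID
    }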
    if id >= len(pools) {
       id = len(pools) - 1
    }
    return id, (1 << (id + 6))
}
```
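Step 3: bits.Len returns the minimum number of bits needed to represent its argument, i.e. floor(log2(x))+1 for x > 0 and 0 for x = 0. Applied to the shifted size, that is exactly the pool ID.
Step 4: finally, the function returns the pool ID together with the maximum capacity of that pool.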
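
The boundary behaviour is easiest to see by running the function on a few sizes. The sketch below copies getPoolIDAndCapacity from above; the constant numPools is an assumption that stands in for len(pools), so the example needs no sync.Pool array.

```go
package main

import (
	"fmt"
	"math/bits"
)

// numPools stands in for len(pools) from the article (12 levels).
const numPools = 12

// getPoolIDAndCapacity follows the article's version step by step.
func getPoolIDAndCapacity(size int) (int, int) {
	size-- // make exact powers of two fall into the lower pool
	if size < 0 {
		size = 0
	}
	size >>= 6 // drop the 2^6 base
	id := bits.Len(uint(size))
	if id >= numPools {
		id = numPools - 1 // clamp oversized requests into the last pool
	}
	return id, 1 << (id + 6)
}

func main() {
	sizes := []int{0, 1, 64, 65, 128, 129, 256, 1 << 17, 1 << 20}
	for _, size := range sizes {
		id, capacity := getPoolIDAndCapacity(size)
		fmt.Printf("size=%-8d id=%-2d capacity=%d\n", size, id, capacity)
	}
}
```

Sizes 64 and 65 end up in different pools, which is the effect of the decrement in step 1. With 12 pools this calculation tops out at pool 11 with a capacity of 1<<17; every larger size is clamped to that pool.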

It is used like this:

func (sw *scrapeWork) scrape() {
    body := leveledbytebufferpool.Get(sw.previousResponseBodyLength)
    body.B = sw.ReadData(body.B[:0])
    sw.processScrapedData(body)
    leveledbytebufferpool.Put(body) // Put uses getPoolIDAndCapacity to pick the sync.Pool of the right size
}
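
The article does not list Get and Put themselves, so the following is only a sketch of how they could be built on getPoolIDAndCapacity. It is an assumption made for illustration and may differ from the actual VictoriaMetrics implementation.

```go
package main

import (
	"fmt"
	"math/bits"
	"sync"
)

type ByteBuffer struct {
	B []byte
}

func (bb *ByteBuffer) Reset() { bb.B = bb.B[:0] }

var pools [12]sync.Pool

func getPoolIDAndCapacity(size int) (int, int) {
	size--
	if size < 0 {
		size = 0
	}
	size >>= 6
	id := bits.Len(uint(size))
	if id >= len(pools) {
		id = len(pools) - 1
	}
	return id, 1 << (id + 6)
}

// Get returns a buffer from the level matching the requested capacity,
// or allocates a new one sized to that level if the pool is empty.
func Get(capacity int) *ByteBuffer {
	id, capacityNeeded := getPoolIDAndCapacity(capacity)
	if v := pools[id].Get(); v != nil {
		return v.(*ByteBuffer)
	}
	return &ByteBuffer{B: make([]byte, 0, capacityNeeded)}
}

// Put files the buffer under the level matching its current capacity.
// Buffers larger than the last level are dropped for the GC to collect.
func Put(bb *ByteBuffer) {
	capacity := cap(bb.B)
	id, poolCapacity := getPoolIDAndCapacity(capacity)
	if capacity <= poolCapacity {
		bb.Reset()
		pools[id].Put(bb)
	}
}

func main() {
	bb := Get(100)
	fmt.Println(len(bb.B), cap(bb.B) >= 100)
	Put(bb)
}
```

Because Put classifies a buffer by its capacity at release time, a buffer that grew during use moves to a higher level and is not handed back to small requests. sync.Pool may drop entries at any time, so callers must never rely on getting a previously released buffer back.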

ByteBuffer

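ByteBuffer implements the io.Writer and io.ReaderFrom interfaces.

io.Writer implementation

The Write method is straightforward: it simply appends the input data to the ByteBuffer. append grows the slice's capacity when needed.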

func (bb *ByteBuffer) Write(p []byte) (int, error) {
	bb.B = append(bb.B, p...)
	return len(p), nil
}
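
Because ByteBuffer satisfies io.Writer, it can be passed to any standard-library function that writes to one. The sketch below uses fmt.Fprintf; the helper formatSample is hypothetical and only serves this example.

```go
package main

import (
	"fmt"
	"io"
)

type ByteBuffer struct {
	B []byte
}

// Write appends p to the buffer, as in the article.
func (bb *ByteBuffer) Write(p []byte) (int, error) {
	bb.B = append(bb.B, p...)
	return len(p), nil
}

// Compile-time check that *ByteBuffer implements io.Writer.
var _ io.Writer = (*ByteBuffer)(nil)

// formatSample is a hypothetical helper: it formats one "name value"
// line into a ByteBuffer and returns the result as a string.
func formatSample(name string, value float64) string {
	var bb ByteBuffer
	fmt.Fprintf(&bb, "%s %g\n", name, value)
	return string(bb.B)
}

func main() {
	fmt.Print(formatSample("up", 1))
}
```

The pointer receiver matters here: Write reassigns bb.B, so the buffer has to be passed as &bb for the appended data to be visible to the caller.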

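io.ReaderFrom implementation

As its comment says, the purpose of ReadFrom is to read all the data from r.

The interesting part of ReadFrom is how it preallocates capacity and how it grows the buffer when capacity runs short. The buffer is grown in two places:

First, ResizeWithCopyMayOverallocate ensures that the initial buffer is at least 4*1024 bytes.
Second, inside the for loop, which needs enough room to read all the data: whenever the buffer runs low, it is grown by 30%.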

// ReadFrom reads all the data from r to bb until EOF.
func (bb *ByteBuffer) ReadFrom(r io.Reader) (int64, error) {
	b := bb.B
	bLen := len(b)                               //1
	b = ResizeWithCopyMayOverallocate(b, 4*1024) //2
	b = b[:cap(b)]                               //3
	offset := bLen                               //4
	for {
		if free := len(b) - offset; free < offset {//5
			// grow slice by 30% similar to how Go does this
			// https://go.googlesource.com/go/+/2dda92ff6f9f07eeb110ecbf0fc2d7a0ddd27f9d
			// higher growth rates could consume excessive memory when reading big amounts of data.
			n := 1.3 * float64(len(b))
			b = slicesutil.SetLength(b, int(n))
		}
		n, err := r.Read(b[offset:])               //6
		offset += n
		if err != nil {                            //7
			bb.B = b[:offset]
			if err == io.EOF {
				err = nil
			}
			return int64(offset - bLen), err
		}
	}
}

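Step 1: record the length of b, i.e. the amount of data already in the slice.

Step 2: because the ByteBuffer may have come from ByteBufferPool.Get, its slice capacity may be too small for the data being read. ResizeWithCopyMayOverallocate handles this by ensuring that the slice capacity is at least n bytes. If the capacity is already sufficient, it returns the sub-slice of length n; otherwise it allocates a new slice and returns its sub-slice of length n. The capacity of the new slice comes from roundToNearestPow2, which rounds n up to the smallest power of two that is not less than n. The listing below shows the closely related ResizeNoCopyMayOverallocate, which, as its comment states, does not copy the contents of b into a newly allocated buffer.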

// ResizeNoCopyMayOverallocate resizes b to minimum n bytes and returns the resized buffer (which may be newly allocated).
//
// If newly allocated buffer is returned then b contents isn't copied to it.
func ResizeNoCopyMayOverallocate(b []byte, n int) []byte {
	if n <= cap(b) {
		return b[:n]
	}
	nNew := roundToNearestPow2(n)
	bNew := make([]byte, nNew)
	return bNew[:n]
}

// roundToNearestPow2 rounds n to the nearest power of 2
//
// It is expected that n > 0
func roundToNearestPow2(n int) int {
	pow2 := uint8(bits.Len(uint(n - 1)))
	return 1 << pow2
}
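
Despite its name, roundToNearestPow2 rounds up, never down. A few sample values make that visible; the function body is copied from the listing above, and the sample inputs are chosen for this illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// roundToNearestPow2 is copied from the article. It expects n > 0 and
// returns the smallest power of two that is not less than n.
func roundToNearestPow2(n int) int {
	pow2 := uint8(bits.Len(uint(n - 1)))
	return 1 << pow2
}

func main() {
	for _, n := range []int{1, 5, 4096, 4097} {
		fmt.Printf("%d -> %d\n", n, roundToNearestPow2(n))
	}
}
```

An exact power of two such as 4096 is returned unchanged, while 4097 jumps to 8192, so in the worst case the new buffer is almost twice the requested size.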

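Step 3: extend the length of b to its full capacity.
Step 4: set offset to the end of the data already in b.
Step 5: compute the remaining space, free. If less than half of the buffer is free (free < offset), grow the buffer length by 30%.
Step 6: read data into the space after offset and advance offset by the number of bytes read.
Step 7: when Read returns an error, set the ByteBuffer's slice to b[:offset]. An io.EOF error means all data has been read, so it is converted to nil; the function returns the number of bytes read.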
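
The steps above can be tried out with a self-contained sketch. It keeps the loop from the article but replaces the two library helpers (ResizeWithCopyMayOverallocate and slicesutil.SetLength) with plain inline allocations, which is an assumption of this example and not how the library does it. The helper readDemo is hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

type ByteBuffer struct {
	B []byte
}

// Compile-time check that *ByteBuffer implements io.ReaderFrom.
var _ io.ReaderFrom = (*ByteBuffer)(nil)

// ReadFrom appends everything read from r to bb until EOF.
func (bb *ByteBuffer) ReadFrom(r io.Reader) (int64, error) {
	b := bb.B
	bLen := len(b) // step 1: data already present
	if cap(b) < 4*1024 {
		// step 2 (simplified): ensure at least 4 KiB, keeping old data
		b = append(make([]byte, 0, 4*1024), b...)
	}
	b = b[:cap(b)] // step 3
	offset := bLen // step 4
	for {
		if free := len(b) - offset; free < offset { // step 5
			grown := make([]byte, int(1.3*float64(len(b))))
			copy(grown, b[:offset])
			b = grown
		}
		n, err := r.Read(b[offset:]) // step 6
		offset += n
		if err != nil { // step 7
			bb.B = b[:offset]
			if err == io.EOF {
				err = nil
			}
			return int64(offset - bLen), err
		}
	}
}

// readDemo is a hypothetical helper: it starts with a buffer that
// already holds "head:" and reads size bytes of 'x' after it.
func readDemo(size int) (int64, *ByteBuffer, error) {
	bb := &ByteBuffer{B: []byte("head:")}
	n, err := bb.ReadFrom(strings.NewReader(strings.Repeat("x", size)))
	return n, bb, err
}

func main() {
	n, bb, err := readDemo(100000)
	fmt.Println(n, len(bb.B), err)
}
```

The returned count covers only the newly read bytes, so existing data in the buffer is preserved and excluded from the total. Growing by 30% instead of doubling trades a few extra reallocations for lower peak memory on large inputs, as the comment in the original code explains.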

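If the data does not come from an io.Reader, it can also be written into the buffer with a Write method like the one below. Note that this simplified variant overwrites the buffer's contents rather than appending, and it returns nothing, so it does not satisfy io.Writer.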

package util

import (
	"sync"
)

// ByteBuffer implements a simple byte buffer.
type ByteBuffer struct {
	// B is the underlying byte slice.
	B []byte
}

// Reset resets bb.
func (bb *ByteBuffer) Reset() {
	bb.B = bb.B[:0]
}

// resize resizes b to n bytes and returns b (which may be newly allocated).
func resize(b []byte, n int) []byte {
	if nn := n - cap(b); nn > 0 {
		b = append(b[:cap(b)], make([]byte, nn)...)
	}
	return b[:n]
}

// Write copies data into bb, replacing its previous contents.
func (bb *ByteBuffer) Write(data []byte) {
	bb.B = resize(bb.B, len(data))
	copy(bb.B, data)
}

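Summary

This library can be used to read data from an io.Reader without worrying about the buffer being too small, and ByteBufferPool makes it possible to reuse buffers efficiently.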

posted @ 2022-04-06 23:10 charlieroro
