GC optimization: notes on stack memory, Span, NativeMemory, pointers, and pooled memory

Converting between structs and memory pointers

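There are many ways to do this described online, and in my measurements their performance differs considerably. One of my projects is extremely performance-sensitive, so I studied and benchmarked the options repeatedly. The approaches below were the fastest in my tests.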

First, define the struct:

[DebuggerDisplay("NameLength = {NameLength}, NodeIndex = {NodeIndex}, ParentNodeIndex = {ParentNodeIndex}, CreationTime = {StandardInformation.CreationTime}")]
[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Unicode)]
internal unsafe struct FileEntryNode : IEquatable<FileEntryNode> {
    internal const byte MaxLength = 20;
    internal const byte ExtensionNameMaxLength = 10;

    // [FieldOffset(0)]
    internal readonly Attributes Attributes;
    // [FieldOffset(4)] 
    internal readonly UInt32 NodeIndex;
    // [FieldOffset(8)] 
    internal readonly UInt32 ParentNodeIndex;
    // [FieldOffset(16)] 
    internal readonly UInt64 Size;
    // [FieldOffset(32)]  
    internal readonly StandardInformation StandardInformation;
    internal readonly byte LogicalStatus;
    internal byte NameLength;
    internal readonly int NameOffset;
    internal readonly int ExtensionNameIndex;
    internal readonly byte ParentCount;
    public readonly int PathLength;
}

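Reading a struct from a memory pointer

Of the variants I tested, this one was the fastest. It returns a pointer into the buffer, so nothing is copied: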

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public FileEntryNode* ReadNodePointerByPosition(long position){
        Debug.Assert(position < Size);
        // Add the byte offset first, then cast. Writing (FileEntryNode*)this.Pointer + position does not work as intended:
        // the cast binds tighter than +, so position would be scaled by sizeof(FileEntryNode).
        void* ptr = this.Pointer + position;
        FileEntryNode* value = (FileEntryNode*)ptr;
        return value;
    }

If the result is delivered through an out parameter rather than as the return value, this form performed best:

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Read(in long position, out T value){
        Debug.Assert(position < Size);
        if (ReadUseAccessor) {
            Accessor.Read(position, out value);
        }
        else{
            var valueSize = EntrySize;
            byte* ptr = this.Pointer + position;
            value = default(T);
            var valuePtr = Unsafe.AsPointer(ref value);
            Buffer.MemoryCopy(ptr, valuePtr, valueSize, valueSize);
            // Unsafe.CopyBlockUnaligned(valuePtr, ptr, valueSize);
        }
    }

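The following method has the same effect but is slower than the first one, since it copies the struct instead of returning a pointer to it.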

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public PinyinFullNode ReadNode(in long position){
    Debug.Assert(position < Size);
    var valueSize = EntrySize;
    byte* ptr = this.Pointer + position;
    PinyinFullNode value = default(PinyinFullNode);
    var valuePtr = Unsafe.AsPointer(ref value);
    Buffer.MemoryCopy(ptr, valuePtr, valueSize, valueSize);
    return value;
}

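If you also have variable-length data, record its length in the main struct. Read the main struct first with the method above, then use the recorded length to read the variable-length part.
In my tests the following code was the fastest: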

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public ReadOnlySpan<char> ReadString(in uint offset, in int length)
{
    void* pointer = Pointer + offset;
    var fileName = new ReadOnlySpan<char>(pointer, length);
    return fileName;
}
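To make the header-then-payload idea concrete, here is a minimal sketch over a byte span rather than a raw pointer. The NameHeader struct, the VariableLengthSketch class, and the layout (a 6-byte header immediately followed by UTF-16 characters) are assumptions made up for this illustration, not the FileEntryNode layout above, and the safe MemoryMarshal APIs stand in for the pointer code.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical fixed-size header, used only for this illustration.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct NameHeader
{
    public uint NodeIndex;
    public ushort NameLength; // length of the name in chars, not bytes
}

public static class VariableLengthSketch
{
    // Reads the header at 'offset', then returns a view over the name that follows it.
    public static ReadOnlySpan<char> ReadName(ReadOnlySpan<byte> buffer, int offset, out NameHeader header)
    {
        int headerSize = Unsafe.SizeOf<NameHeader>(); // 6 bytes with Pack = 1

        // Step 1: read the fixed-size part, which carries the length.
        header = MemoryMarshal.Read<NameHeader>(buffer.Slice(offset));

        // Step 2: use the recorded length to slice the variable-length part.
        ReadOnlySpan<byte> nameBytes = buffer.Slice(offset + headerSize, header.NameLength * sizeof(char));

        // Reinterpret the bytes as chars; no copy and no string allocation.
        return MemoryMarshal.Cast<byte, char>(nameBytes);
    }
}
```

The returned span is only a view: it stays valid as long as the underlying buffer does, and a string is allocated only if the caller calls ToString() on it.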

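The code above is the fastest I arrived at after repeated study and testing. So what does the slow way look like?
Roughly like this. Search online and almost everything you find is inefficient code of this kind: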

// converts byte[] to struct
public static T RawDeserialize<T>(byte[] rawData, int position) {
	int rawsize = Marshal.SizeOf(typeof(T));
	if (rawsize > rawData.Length - position) throw new ArgumentException("Not enough data to fill struct. Array length from position: " + (rawData.Length - position) + ", Struct length: " + rawsize);
	IntPtr buffer = Marshal.AllocHGlobal(rawsize);
	Marshal.Copy(rawData, position, buffer, rawsize);
	T retobj = (T)Marshal.PtrToStructure(buffer, typeof(T));
	Marshal.FreeHGlobal(buffer);
	return retobj;
}

// converts a struct to byte[]
public static byte[] RawSerialize(object anything) {
	int rawSize = Marshal.SizeOf(anything);
	IntPtr buffer = Marshal.AllocHGlobal(rawSize);
	Marshal.StructureToPtr(anything, buffer, false);
	byte[] rawDatas = new byte[rawSize];
	Marshal.Copy(buffer, rawDatas, 0, rawSize);
	Marshal.FreeHGlobal(buffer);
	return rawDatas;
}

Writing a struct to a memory pointer

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Write(in long position, ref T value) {
        Debug.Assert(position < Size);
        if (WriteUseAccessor) {
            this.Accessor.Write(position, ref value);
        }
        else {
            var valueSize = EntrySize;
            byte* ptr = this.Pointer + position;
            var valuePtr = Unsafe.AsPointer(ref value);
            // Unsafe.CopyBlockUnaligned(ptr, valuePtr, (uint) valueSize);
            Buffer.MemoryCopy(valuePtr, ptr, valueSize, valueSize);
        }
    }
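For comparison, a write/read round trip can also be sketched without raw pointers. This is a bounds-checked variant that uses Unsafe.WriteUnaligned and MemoryMarshal.Read in place of Buffer.MemoryCopy; the Entry struct and the StructBufferSketch class are hypothetical names, and I have not benchmarked this against the pointer version above.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical blittable record, used only for this illustration.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Entry
{
    public uint NodeIndex;
    public ulong Size;
}

public static class StructBufferSketch
{
    // Copies 'value' into the buffer starting at byte 'position'.
    public static void Write<T>(Span<byte> buffer, int position, in T value) where T : unmanaged
    {
        // Slice throws if the struct does not fit, a check the raw-pointer version leaves to Debug.Assert.
        Span<byte> target = buffer.Slice(position, Unsafe.SizeOf<T>());
        Unsafe.WriteUnaligned(ref MemoryMarshal.GetReference(target), value);
    }

    // Reads a T back from byte 'position'; the position does not need to be aligned.
    public static T Read<T>(ReadOnlySpan<byte> buffer, int position) where T : unmanaged
    {
        return MemoryMarshal.Read<T>(buffer.Slice(position, Unsafe.SizeOf<T>()));
    }
}
```

Usage would look like StructBufferSketch.Write(buffer, 3, entry) followed by StructBufferSketch.Read<Entry>(buffer, 3). The unmanaged constraint keeps reference-containing types out, which the copy-based code relies on implicitly.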

stackalloc

Allocate on the stack to reduce GC pressure:

var wordMatchCounts = stackalloc float[wordCount];
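Stack space is limited, so a common pattern is to use stackalloc only for small buffers and fall back to a pooled array otherwise. The sketch below assumes a threshold of 256 floats; that number, and the StackallocSketch and SumOfSquares names, are illustrative choices rather than anything from the original project.

```csharp
using System;
using System.Buffers;

public static class StackallocSketch
{
    // Assumed threshold: keep stack buffers small to stay clear of stack overflow.
    private const int MaxStackFloats = 256;

    // Fills a scratch buffer with i*i for i in [0, count) and returns the sum.
    public static float SumOfSquares(int count)
    {
        float[]? rented = null;

        // Small requests live on the stack; large ones borrow from the shared pool.
        Span<float> scratch = count <= MaxStackFloats
            ? stackalloc float[MaxStackFloats]
            : (rented = ArrayPool<float>.Shared.Rent(count));
        scratch = scratch.Slice(0, count);

        try
        {
            for (int i = 0; i < scratch.Length; i++)
                scratch[i] = (float)i * i;

            float sum = 0;
            foreach (float v in scratch)
                sum += v;
            return sum;
        }
        finally
        {
            // Only the pooled path has anything to give back.
            if (rented != null)
                ArrayPool<float>.Shared.Return(rented);
        }
    }
}
```

Assigning the stackalloc result to a Span<float> instead of a float* also means the method does not need an unsafe context.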

Span

Span supports the idea of reinterpret_cast: a Span<byte> can be reinterpreted as a Span<int>, where index 0 of the Span<int> maps to the first four bytes of the Span<byte>. So when you read a byte buffer, you can pass it safely and efficiently to methods that operate on grouped bytes as integers.
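In current .NET the reinterpretation is done with MemoryMarshal.Cast. A minimal sketch (the SpanCastSketch class and SumAsInt32 method are names invented for the example):

```csharp
using System;
using System.Runtime.InteropServices;

public static class SpanCastSketch
{
    // Sums the buffer as 32-bit integers without copying it.
    public static int SumAsInt32(Span<byte> bytes)
    {
        // Same memory, viewed as ints in the machine's native byte order.
        // Trailing bytes that do not fill a whole int are left out of the view.
        Span<int> ints = MemoryMarshal.Cast<byte, int>(bytes);

        int sum = 0;
        foreach (int value in ints)
            sum += value;
        return sum;
    }
}
```

Because both spans alias the same memory, writing through the Span<int> also changes the underlying bytes.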

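Span-backed collections: ValueListBuilder and .AsSpan()

ValueListBuilder is a non-public performance helper in the .NET Core source code. It is constructed over a caller-supplied Span<T>, typically stack memory, and exposes the collected items through .AsSpan().

It is used in String.Replace: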

public unsafe string Replace(string oldValue, string? newValue) {
	  ArgumentException.ThrowIfNullOrEmpty(oldValue, nameof (oldValue));
	  if (newValue == null)	newValue = string.Empty;
	  // The decompiler renders this as an untyped 512-byte stack allocation; it is equivalent to stackalloc int[128].
	  ValueListBuilder<int> valueListBuilder = new ValueListBuilder<int>(stackalloc int[128]);

	  if (oldValue.Length == 1)
	  {
		if (newValue.Length == 1)
		  return this.Replace(oldValue[0], newValue[0]);
		char ch = oldValue[0];
		int elementOffset = 0;

		while (true)
		{
		  int num = SpanHelpers.IndexOf(ref Unsafe.Add<char>(ref this._firstChar, elementOffset), ch, this.Length - elementOffset);
		  if (num >= 0){
			valueListBuilder.Append(elementOffset + num);
			elementOffset += num + 1;
		  }
		  else break;
		}
	  }
	  else{
		int elementOffset = 0;
		while (true){
		  int num = SpanHelpers.IndexOf(ref Unsafe.Add<char>(ref this._firstChar, elementOffset), this.Length - elementOffset, ref oldValue._firstChar, oldValue.Length);
		  if (num >= 0){
			valueListBuilder.Append(elementOffset + num);
			elementOffset += num + oldValue.Length;
		  }
		  else break;
		}
	  }
	  if (valueListBuilder.Length == 0) return this;
	  string str = this.ReplaceHelper(oldValue.Length, newValue, valueListBuilder.AsSpan());
	  valueListBuilder.Dispose();
	  return str;
	}

.NET also turns a list directly back into a Span<T> internally, using CollectionsMarshal.AsSpan<string>(List<string>):

	private static unsafe string JoinCore<T>(ReadOnlySpan<char> separator, IEnumerable<T> values){

	  if (typeof (T) == typeof (string)){
		if (values is List<string> list)
		  return string.JoinCore(separator, (ReadOnlySpan<string>) CollectionsMarshal.AsSpan<string>(list));
		if (values is string[] array)
		  return string.JoinCore(separator, new ReadOnlySpan<string>(array));
	  }
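CollectionsMarshal is a public type (available since .NET 5), so the same trick works in application code. A minimal sketch, with ListSpanSketch and DoubleAll as made-up names:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class ListSpanSketch
{
    // Doubles every element in place, working directly on the list's backing array.
    public static void DoubleAll(List<int> list)
    {
        Span<int> items = CollectionsMarshal.AsSpan(list);
        for (int i = 0; i < items.Length; i++)
            items[i] *= 2;

        // Do not add or remove items while 'items' is in use: a resize would
        // leave the span pointing at the old backing array.
    }
}
```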

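ref struct and ref locals: read value types by reference to avoid copying

Reading a value type through ref avoids copying it. Keep in mind that any change made through the ref also changes the original value it refers to, because you are essentially working through a pointer.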

ref var hierarchy = ref ph[i];
ref var words = ref hierarchy.Words;
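The difference between a copy and a ref local is easiest to see side by side. This is a small sketch with a hypothetical Counter struct, not the hierarchy type from the snippet above:

```csharp
public struct Counter
{
    public int Hits;
}

public static class RefLocalSketch
{
    // Returns counters[0].Hits as observed after a change through a copy,
    // and again after a change through a ref local.
    public static (int viaCopy, int viaRef) Demo()
    {
        var counters = new Counter[2];

        Counter copy = counters[0];   // copies the struct out of the array
        copy.Hits++;                  // changes only the copy
        int afterCopy = counters[0].Hits;

        ref Counter alias = ref counters[0]; // no copy: alias refers to the array element
        alias.Hits++;                        // changes counters[0] itself
        int afterRef = counters[0].Hits;

        return (afterCopy, afterRef);
    }
}
```

The array element is untouched after the copy is modified and is updated once the ref local is used.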

Unsafe.IsNullRef

Unsafe.IsNullRef tells you whether a ref is null. If the user never initializes Foo.X, it is a null reference by default:

ref struct Foo {
  public ref int X;
  public bool IsNull => Unsafe.IsNullRef(ref X);
  public Foo(ref int x) { X = ref x; }
}
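Null refs also show up in the base class library. As an illustration, CollectionsMarshal.GetValueRefOrNullRef (.NET 6 and later) returns a null ref when a dictionary key is missing, and Unsafe.IsNullRef is the way to test for that. NullRefSketch and TryIncrement are names made up for this sketch.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class NullRefSketch
{
    // Increments the counter stored under 'key' in place.
    // Returns false, and changes nothing, if the key is absent.
    public static bool TryIncrement(Dictionary<string, int> counts, string key)
    {
        ref int slot = ref CollectionsMarshal.GetValueRefOrNullRef(counts, key);
        if (Unsafe.IsNullRef(ref slot))
            return false; // no entry: 'slot' is a null ref and must not be read or written

        slot++; // single lookup, value updated inside the dictionary
        return true;
    }
}
```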

NativeMemory

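Compared with Marshal.AllocHGlobal and Marshal.FreeHGlobal, the NativeMemory.* APIs are now the recommended choice. They have several advantages:

You can choose whether the memory is zero-initialized
You can control memory alignment
Sizes are passed as nuint, so a 64-bit process can allocate more than int.MaxValue bytes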
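The first two points can be sketched as follows. This assumes .NET 6 or later and a project compiled with AllowUnsafeBlocks; NativeMemorySketch and Demo are hypothetical names, and the sizes are arbitrary.

```csharp
using System;
using System.Runtime.InteropServices;

public static unsafe class NativeMemorySketch
{
    public static int Demo()
    {
        // Zero-initialized block of 16 ints. NativeMemory.Alloc would leave the contents undefined.
        int* zeroed = (int*)NativeMemory.AllocZeroed(16, (nuint)sizeof(int));

        // 256 bytes aligned to a 64-byte boundary.
        void* aligned = NativeMemory.AlignedAlloc(256, 64);
        try
        {
            var span = new Span<int>(zeroed, 16); // wrap the native block for bounds-checked access
            span[3] = 5;

            bool isAligned = ((nuint)aligned % 64) == 0;
            return isAligned ? span[3] + span[0] : -1; // span[0] is still 0 from AllocZeroed
        }
        finally
        {
            NativeMemory.Free(zeroed);
            NativeMemory.AlignedFree(aligned); // aligned blocks need the matching free call
        }
    }
}
```

Native memory is invisible to the GC, so every allocation needs an explicit free; the try/finally is there for that reason.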

A look at how string.Join is implemented

CollectionsMarshal.AsSpan(valuesList)

if (values is List<string?> valuesList) {
    return JoinCore(separator.AsSpan(), CollectionsMarshal.AsSpan(valuesList));
}
if (values is string?[] valuesArray)
{
    return JoinCore(separator.AsSpan(), new ReadOnlySpan<string?>(valuesArray));
}

Join

public static string Join(string? separator, IEnumerable<string?> values)
{
	if (values is List<string?> valuesList)
	{
		return JoinCore(separator.AsSpan(), CollectionsMarshal.AsSpan(valuesList));
	}
	if (values is string?[] valuesArray)
	{
		return JoinCore(separator.AsSpan(), new ReadOnlySpan<string?>(valuesArray));
	}
	if (values == null)
	{
		ThrowHelper.ThrowArgumentNullException(ExceptionArgument.values);
	}
	using (IEnumerator<string?> en = values.GetEnumerator())
	{
		if (!en.MoveNext())
		{
			return Empty;
		}
		string? firstValue = en.Current;
		if (!en.MoveNext())
		{
			// Only one value available
			return firstValue ?? Empty;
		}
		// Null separator and values are handled by the StringBuilder
		var result = new ValueStringBuilder(stackalloc char[256]);
		result.Append(firstValue);
		do
		{
			result.Append(separator);
			result.Append(en.Current);
		}
		while (en.MoveNext());
		return result.ToString();
	}
}

JoinCore

private static string JoinCore(ReadOnlySpan<char> separator, ReadOnlySpan<string?> values)
{
	if (values.Length <= 1)
	{
		return values.IsEmpty ?
			Empty :
			values[0] ?? Empty;
	}

	long totalSeparatorsLength = (long)(values.Length - 1) * separator.Length;
	if (totalSeparatorsLength > int.MaxValue)
	{
		ThrowHelper.ThrowOutOfMemoryException();
	}
	int totalLength = (int)totalSeparatorsLength;

	// Calculate the length of the resultant string so we know how much space to allocate.
	foreach (string? value in values)
	{
		if (value != null)
		{
			totalLength += value.Length;
			if (totalLength < 0) // Check for overflow
			{
				ThrowHelper.ThrowOutOfMemoryException();
			}
		}
	}

	// Copy each of the strings into the result buffer, interleaving with the separator.
	string result = FastAllocateString(totalLength);
	int copiedLength = 0;

	for (int i = 0; i < values.Length; i++)
	{
		// It's possible that another thread may have mutated the input array
		// such that our second read of an index will not be the same string
		// we got during the first read.

		// We range check again to avoid buffer overflows if this happens.

		if (values[i] is string value)
		{
			int valueLen = value.Length;
			if (valueLen > totalLength - copiedLength)
			{
				copiedLength = -1;
				break;
			}

			// Fill in the value.
			FillStringChecked(result, copiedLength, value);
			copiedLength += valueLen;
		}

		if (i < values.Length - 1)
		{
			// Fill in the separator.
			// Special-case length 1 to avoid additional overheads of CopyTo.
			// This is common due to the char separator overload.

			ref char dest = ref Unsafe.Add(ref result._firstChar, copiedLength);

			if (separator.Length == 1)
			{
				dest = separator[0];
			}
			else
			{
				separator.CopyTo(new Span<char>(ref dest, separator.Length));
			}

			copiedLength += separator.Length;
		}
	}

	// If we copied exactly the right amount, return the new string.  Otherwise,
	// something changed concurrently to mutate the input array: fall back to
	// doing the concatenation again, but this time with a defensive copy. This
	// fall back should be extremely rare.
	return copiedLength == totalLength ?
		result :
		JoinCore(separator, values.ToArray().AsSpan());
}

Internal performance helper types in .NET Core

ValueStringBuilder

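Reading the string.Join source shows that it uses a non-public ValueStringBuilder internally. When constructing it you can have it use stack memory or pooled memory, which lowers GC pressure and memory overhead.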

// Null separator and values are handled by the StringBuilder
var result = new ValueStringBuilder(stackalloc char[256]);
result.Append(firstValue);
do {
    result.Append(separator);
    result.Append(en.Current);
}
while (en.MoveNext());
return result.ToString();

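Object reuse

ArrayPool (pooled arrays) and PooledList (pooled collections)

ArrayPool maintains a shared pool of arrays, cached in thread-local slots. When you need an array, rent one from the ArrayPool instead of creating a new array each time. This avoids frequent memory allocations and so reduces the number of GCs.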
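A minimal rent-and-return sketch using ArrayPool<T>.Shared (the ArrayPoolSketch class and SumFirstN method are names invented for the example):

```csharp
using System;
using System.Buffers;

public static class ArrayPoolSketch
{
    // Sums 1..n using a rented scratch array instead of a fresh allocation.
    public static int SumFirstN(int n)
    {
        // Rent may hand back a larger array than requested, so track the logical length yourself.
        int[] buffer = ArrayPool<int>.Shared.Rent(n);
        try
        {
            for (int i = 0; i < n; i++)
                buffer[i] = i + 1;

            int sum = 0;
            for (int i = 0; i < n; i++)
                sum += buffer[i];
            return sum;
        }
        finally
        {
            // Returned arrays are not cleared by default; pass clearArray: true for sensitive data.
            ArrayPool<int>.Shared.Return(buffer);
        }
    }
}
```

A rented array may still contain data from its previous user, which is why the loop writes every slot before reading it.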

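PooledList is a third-party library. Instead of allocating a new array to store its data, it stores the data in an array rented from the ArrayPool.
Wrap it in using so the array is returned to the ArrayPool when you are done. If you forget to return it, the finalizer returns it.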

Posted 2024-11-16 18:56 by darklx
