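GC optimization notes: stack memory, Span, NativeMemory, pointers, and pooled memory
Converting between structs and memory pointers
Many approaches circulate online, and in my measurements their performance differs considerably. One of my projects is extremely performance-sensitive, so I benchmarked the options repeatedly and settled on the fastest approach I found.
First, define the struct data structure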
[DebuggerDisplay("NameLength = {NameLength}, NodeIndex = {NodeIndex}, ParentNodeIndex = {ParentNodeIndex}, CreationTime = {StandardInformation.CreationTime}")]
[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Unicode)]
internal unsafe struct FileEntryNode : IEquatable<FileEntryNode> {
internal const byte MaxLength = 20;
internal const byte ExtensionNameMaxLength = 10;
// [FieldOffset(0)]
internal readonly Attributes Attributes;
// [FieldOffset(4)]
internal readonly UInt32 NodeIndex;
// [FieldOffset(8)]
internal readonly UInt32 ParentNodeIndex;
// [FieldOffset(16)]
internal readonly UInt64 Size;
// [FieldOffset(32)]
internal readonly StandardInformation StandardInformation;
internal readonly byte LogicalStatus;
internal byte NameLength;
internal readonly int NameOffset;
internal readonly int ExtensionNameIndex;
internal readonly byte ParentCount;
public readonly int PathLength;
}
Reading a struct from a position in memory
Of the variants I tested, this method was the fastest
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public FileEntryNode* ReadNodePointerByPosition(long position){
Debug.Assert(position < Size);
// Compute the byte address first, then cast it. Writing (FileEntryNode*)this.Pointer + position does not work:
// the cast binds first, so position is scaled by sizeof(FileEntryNode) instead of being a byte offset.
void* ptr = this.Pointer + position;
FileEntryNode* value = (FileEntryNode*)ptr;
return value;
}
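To make the pointer cast concrete, here is a minimal, self-contained sketch. `DemoNode` and `PointerReadDemo` are hypothetical names invented for this example (they stand in for `FileEntryNode` and the reader class), and the buffer comes from `NativeMemory` rather than from whatever backs `this.Pointer` in the project. It needs .NET 6 or later and `AllowUnsafeBlocks`.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct DemoNode              // hypothetical stand-in for FileEntryNode
{
    public uint NodeIndex;
    public uint ParentNodeIndex;
    public byte NameLength;
}

public static unsafe class PointerReadDemo
{
    public static uint ReadSecondNodeIndex()
    {
        int entrySize = sizeof(DemoNode);
        byte* buffer = (byte*)NativeMemory.AllocZeroed((nuint)(entrySize * 2));
        try
        {
            // Fill the second entry in place through a typed pointer.
            DemoNode* second = (DemoNode*)(buffer + entrySize);
            second->NodeIndex = 42;
            second->NameLength = 5;

            // Read it back: add the byte offset first, cast second. No copy is made.
            long position = entrySize;
            DemoNode* value = (DemoNode*)(buffer + position);
            return value->NodeIndex;
        }
        finally
        {
            NativeMemory.Free(buffer);
        }
    }
}
```

The returned pointer is only valid while the underlying buffer is alive, which is why the sketch reads the field before freeing the memory.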
If the result is handed back through an out parameter rather than as the return value, this form performed best:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void Read(in long position, out T value){
Debug.Assert(position < Size);
if (ReadUseAccessor) {
Accessor.Read(position, out value);
}
else{
var valueSize = EntrySize;
byte* ptr = this.Pointer + position;
value = default(T);
var valuePtr = Unsafe.AsPointer(ref value);
Buffer.MemoryCopy(ptr, valuePtr, valueSize, valueSize);
// Unsafe.CopyBlockUnaligned(valuePtr, ptr, valueSize);
}
}
The following method has the same effect but is slower than the first one.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public PinyinFullNode ReadNode(in long position){
Debug.Assert(position < Size);
var valueSize = EntrySize;
byte* ptr = this.Pointer + position;
PinyinFullNode value = default(PinyinFullNode);
var valuePtr = Unsafe.AsPointer(ref value);
Buffer.MemoryCopy(ptr, valuePtr, valueSize, valueSize);
return value;
}
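If you also have variable-length data (data whose length is not fixed), store its length in the main struct. Read the main struct with the method above, then pass that length in to convert the variable-length part.
In my tests, the code below was the fastest: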
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public ReadOnlySpan<char> ReadString(in uint offset, in int length)
{
void* pointer = Pointer + offset;
var fileName = new ReadOnlySpan<char>(pointer, length);
return fileName;
}
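The methods above are the fastest versions I arrived at after repeated testing. So what does the slow approach look like?
Roughly like this. Most of what you find by searching online is slow code of this kind: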
// converts byte[] to struct
public static T RawDeserialize(byte[] rawData, int position) {
int rawsize = Marshal.SizeOf(typeof(T));
if (rawsize > rawData.Length - position) throw new ArgumentException("Not enough data to fill struct. Array length from position: " + (rawData.Length - position) + ", Struct length: " + rawsize);
IntPtr buffer = Marshal.AllocHGlobal(rawsize);
Marshal.Copy(rawData, position, buffer, rawsize);
T retobj = (T)Marshal.PtrToStructure(buffer, typeof(T));
Marshal.FreeHGlobal(buffer);
return retobj;
}
// converts a struct to byte[]
public static byte[] RawSerialize(object anything) {
int rawSize = Marshal.SizeOf(anything);
IntPtr buffer = Marshal.AllocHGlobal(rawSize);
Marshal.StructureToPtr(anything, buffer, false);
byte[] rawDatas = new byte[rawSize];
Marshal.Copy(buffer, rawDatas, 0, rawSize);
Marshal.FreeHGlobal(buffer);
return rawDatas;
}
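For comparison with the Marshal-based code above, here is a sketch of a safe, span-based way to do the same byte[] conversion without AllocHGlobal. This is not one of the methods benchmarked in these notes. It swaps in `MemoryMarshal.Read` and `MemoryMarshal.AsBytes`, and `DemoHeader` and `SpanStructDemo` are hypothetical names. It assumes the struct contains no reference-type fields.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct DemoHeader            // hypothetical struct, for illustration only
{
    public uint NodeIndex;
    public byte NameLength;
}

public static class SpanStructDemo
{
    // byte[] -> struct: reinterprets the bytes starting at 'position'.
    // Throws if fewer bytes remain than the struct needs.
    public static T Read<T>(byte[] data, int position) where T : struct
        => MemoryMarshal.Read<T>(data.AsSpan(position));

    // struct -> byte[]: views the struct's own bytes as a span and copies them out.
    public static byte[] ToBytes<T>(T value) where T : struct
        => MemoryMarshal.AsBytes(MemoryMarshal.CreateReadOnlySpan(ref value, 1)).ToArray();
}
```

A round trip such as `SpanStructDemo.Read<DemoHeader>(SpanStructDemo.ToBytes(header), 0)` should give back an equal struct. The only allocation is the resulting byte[].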
Writing a struct to a memory pointer
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void Write(in long position, ref T value) {
Debug.Assert(position < Size);
if (WriteUseAccessor) {
this.Accessor.Write(position, ref value);
}
else {
var valueSize = EntrySize;
byte* ptr = this.Pointer + position;
var valuePtr = Unsafe.AsPointer(ref value);
// Unsafe.CopyBlockUnaligned(ptr, valuePtr, (uint) valueSize);
Buffer.MemoryCopy(valuePtr, ptr, valueSize, valueSize);
}
}
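Putting the write path and the variable-length read together, the sketch below writes a fixed-size header followed by a name, then reads both back. `DemoEntry` and `PointerWriteDemo` are hypothetical names. The header-then-name layout is an assumption made for this example, not necessarily the layout used in the project. It needs .NET 6 or later and `AllowUnsafeBlocks`.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct DemoEntry             // hypothetical fixed-size header
{
    public uint NodeIndex;
    public int NameLength;          // number of UTF-16 chars stored after the header
}

public static unsafe class PointerWriteDemo
{
    public static (uint NodeIndex, string Name) RoundTrip(uint nodeIndex, string name)
    {
        int headerSize = sizeof(DemoEntry);
        int nameBytes = name.Length * sizeof(char);
        byte* buffer = (byte*)NativeMemory.Alloc((nuint)(headerSize + nameBytes));
        try
        {
            // Write: copy the header bytes, then the name characters right after it.
            var header = new DemoEntry { NodeIndex = nodeIndex, NameLength = name.Length };
            Buffer.MemoryCopy(&header, buffer, headerSize, headerSize);
            name.AsSpan().CopyTo(new Span<char>(buffer + headerSize, name.Length));

            // Read: get the header first, then use its length to view the name in place.
            DemoEntry* read = (DemoEntry*)buffer;
            var view = new ReadOnlySpan<char>(buffer + headerSize, read->NameLength);
            return (read->NodeIndex, view.ToString());
        }
        finally
        {
            NativeMemory.Free(buffer);
        }
    }
}
```

The `ReadOnlySpan<char>` is a view over the native buffer, so the sketch turns it into a string before the buffer is freed.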
stackalloc
Use stack memory to reduce GC pressure
var wordMatchCounts = stackalloc float[wordCount];
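The line above yields a raw `float*` and needs an unsafe context. As a sketch of a safe variant, `stackalloc` can be assigned to a `Span<float>`, with a heap fallback when the requested size is large. The 256-element threshold and the names `StackallocDemo` and `SumOfSquares` are arbitrary choices for this example.

```csharp
using System;

public static class StackallocDemo
{
    public static float SumOfSquares(int count)
    {
        // Small buffers live on the stack; large ones fall back to the heap
        // so that an unexpectedly big count cannot overflow the stack.
        Span<float> buffer = count <= 256 ? stackalloc float[count] : new float[count];

        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = i * i;
        }

        float sum = 0;
        foreach (float v in buffer)
        {
            sum += v;
        }
        return sum;
    }
}
```

The stack buffer disappears when the method returns, so the span must not be stored anywhere that outlives the call.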
Span
Span<T> supports the idea of reinterpret_cast: a span over one element type can be reinterpreted as a span over another element type that views the same memory.
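One way to do such a reinterpreting cast is `MemoryMarshal.Cast`. The sketch below uses byte and int as the element types. That pairing is my choice for illustration, and `SpanCastDemo` is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;

public static class SpanCastDemo
{
    public static (int IntCount, byte LastByte) Reinterpret()
    {
        Span<byte> bytes = stackalloc byte[8];
        bytes.Clear();

        // Same memory, viewed as ints: 8 bytes become 2 ints, nothing is copied.
        Span<int> ints = MemoryMarshal.Cast<byte, int>(bytes);

        // Writing through the int view is visible through the byte view.
        ints[1] = -1;                       // sets bytes 4..7 to 0xFF
        return (ints.Length, bytes[7]);
    }
}
```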
Spans can also back a collection: ValueListBuilder & .AsSpan()
ValueListBuilder is an internal performance helper in the .NET Core source.
It is used in String.Replace:
public unsafe string Replace(string oldValue, string? newValue) {
ArgumentException.ThrowIfNullOrEmpty(oldValue, nameof (oldValue));
if (newValue == null) newValue = string.Empty;
// ISSUE: untyped stack allocation
ValueListBuilder<int> valueListBuilder = new ValueListBuilder<int>(new Span<int>((void*) __untypedstackalloc(new IntPtr(512)), 128));
if (oldValue.Length == 1)
{
if (newValue.Length == 1)
return this.Replace(oldValue[0], newValue[0]);
char ch = oldValue[0];
int elementOffset = 0;
while (true)
{
int num = SpanHelpers.IndexOf(ref Unsafe.Add<char>(ref this._firstChar, elementOffset), ch, this.Length - elementOffset);
if (num >= 0){
valueListBuilder.Append(elementOffset + num);
elementOffset += num + 1;
}
else break;
}
}
else{
int elementOffset = 0;
while (true){
int num = SpanHelpers.IndexOf(ref Unsafe.Add<char>(ref this._firstChar, elementOffset), this.Length - elementOffset, ref oldValue._firstChar, oldValue.Length);
if (num >= 0){
valueListBuilder.Append(elementOffset + num);
elementOffset += num + oldValue.Length;
}
else break;
}
}
if (valueListBuilder.Length == 0) return this;
string str = this.ReplaceHelper(oldValue.Length, newValue, valueListBuilder.AsSpan());
valueListBuilder.Dispose();
return str;
}
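ValueListBuilder itself is internal to the runtime, so it cannot be used directly. The sketch below is a heavily simplified imitation of the pattern, not the runtime's actual implementation: a ref struct that starts on a caller-supplied (typically stackalloc) span and moves to an ArrayPool array only when it runs out of room. `MiniListBuilder` and `MiniListBuilderDemo` are hypothetical names.

```csharp
#nullable enable
using System;
using System.Buffers;

public ref struct MiniListBuilder   // hypothetical, simplified imitation
{
    private Span<int> _span;        // current storage: stack buffer or rented array
    private int[]? _rented;         // non-null once we have grown into the pool
    private int _pos;

    public MiniListBuilder(Span<int> initialBuffer)
    {
        _span = initialBuffer;
        _rented = null;
        _pos = 0;
    }

    public int Length => _pos;

    public void Append(int item)
    {
        if (_pos == _span.Length) Grow();
        _span[_pos++] = item;
    }

    public ReadOnlySpan<int> AsSpan() => _span.Slice(0, _pos);

    private void Grow()
    {
        int[] bigger = ArrayPool<int>.Shared.Rent(Math.Max(_span.Length * 2, 4));
        _span.CopyTo(bigger);
        if (_rented != null) ArrayPool<int>.Shared.Return(_rented);
        _rented = bigger;
        _span = bigger;
    }

    public void Dispose()
    {
        if (_rented != null)
        {
            ArrayPool<int>.Shared.Return(_rented);
            _rented = null;
        }
    }
}

public static class MiniListBuilderDemo
{
    // Collects every index at which 'c' occurs in 'text'.
    public static int[] FindAll(string text, char c)
    {
        var builder = new MiniListBuilder(stackalloc int[4]);
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == c) builder.Append(i);
        }
        int[] result = builder.AsSpan().ToArray();
        builder.Dispose();
        return result;
    }
}
```

As long as the number of matches fits the initial stack buffer, collecting them allocates nothing on the heap. Only the final `ToArray` in this demo allocates.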
.NET can expose a list's backing storage directly as a Span<T>: CollectionsMarshal.AsSpan<string>(List<string>)
private static unsafe string JoinCore<T>(ReadOnlySpan<char> separator, IEnumerable<T> values){
if (typeof (T) == typeof (string)){
if (values is List<string> list)
return string.JoinCore(separator, (ReadOnlySpan<string>) CollectionsMarshal.AsSpan<string>(list));
if (values is string[] array)
return string.JoinCore(separator, new ReadOnlySpan<string>(array));
}
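ref struct and ref locals: read value types by ref to avoid copying them
Reading a value type through ref avoids copying it. Be aware, though, that any change made through the ref local also changes the original value, because you are essentially working with a pointer.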
ref var hierarchy = ref ph[i];
ref var words = ref hierarchy.Words;
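The difference between a copy and a ref local can be seen in a small sketch. `Counter` and `RefLocalDemo` are hypothetical names used only here.

```csharp
using System;

public struct Counter
{
    public int Hits;
}

public static class RefLocalDemo
{
    public static (int AfterCopy, int AfterRef) Run()
    {
        var items = new Counter[1];

        // Plain local: a copy of the element, so the array is not affected.
        Counter copy = items[0];
        copy.Hits++;
        int afterCopy = items[0].Hits;      // still 0

        // ref local: an alias for the element itself, so the array changes.
        ref Counter alias = ref items[0];
        alias.Hits++;
        int afterRef = items[0].Hits;       // now 1

        return (afterCopy, afterRef);
    }
}
```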
Unsafe.IsNullRef
Unsafe.IsNullRef can be used to check whether a ref is null. If the user never initializes Foo.X, it defaults to a null reference:
ref struct Foo {
public ref int X;
public bool IsNull => Unsafe.IsNullRef(ref X);
public Foo(ref int x) { X = ref x; }
}
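A short usage sketch of the same idea, comparing a bound ref field with a default-initialized one. `IntRef` and `NullRefDemo` are hypothetical names. ref fields require C# 11 (.NET 7 or later).

```csharp
using System;
using System.Runtime.CompilerServices;

public ref struct IntRef            // hypothetical, mirrors the Foo struct above
{
    public ref int Value;

    public IntRef(ref int value)
    {
        Value = ref value;
    }

    public bool IsNull => Unsafe.IsNullRef(ref Value);
}

public static class NullRefDemo
{
    public static (bool BoundIsNull, bool DefaultIsNull) Check()
    {
        int x = 5;
        var bound = new IntRef(ref x);      // Value refers to x
        IntRef unbound = default;           // Value was never assigned

        return (bound.IsNull, unbound.IsNull);
    }
}
```

Checking `IsNull` before reading `Value` matters because dereferencing a null ref throws a `NullReferenceException`.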
NativeMemory
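Compared with Marshal.AllocHGlobal and Marshal.FreeHGlobal, the NativeMemory.* APIs are now the recommended choice and offer several advantages:
- You control whether the memory is zero-initialized
- You control the memory alignment
- Size parameters are nuint, so a 64-bit process can allocate more than int.MaxValue bytes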
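The three points can be sketched with the corresponding calls: `Alloc`, `AllocZeroed`, and `AlignedAlloc`. `NativeMemoryDemo` is a hypothetical name, and the sizes are arbitrary. It needs .NET 6 or later and `AllowUnsafeBlocks`.

```csharp
using System;
using System.Runtime.InteropServices;

public static unsafe class NativeMemoryDemo
{
    public static (bool AllZero, bool Aligned) Run()
    {
        const int size = 64;
        void* raw = NativeMemory.Alloc((nuint)size);                  // contents are uninitialized
        byte* zeroed = (byte*)NativeMemory.AllocZeroed((nuint)size);  // contents are zero-filled
        void* aligned = NativeMemory.AlignedAlloc((nuint)size, 64);   // address is a multiple of 64
        try
        {
            bool allZero = true;
            for (int i = 0; i < size; i++)
            {
                allZero &= zeroed[i] == 0;
            }
            bool isAligned = (ulong)aligned % 64 == 0;
            return (allZero, isAligned);
        }
        finally
        {
            // Each allocation is released with its matching free call.
            NativeMemory.Free(raw);
            NativeMemory.Free(zeroed);
            NativeMemory.AlignedFree(aligned);
        }
    }
}
```

Memory from `AlignedAlloc` has to be released with `AlignedFree`, not `Free`.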
A look inside the string.Join implementation
CollectionsMarshal.AsSpan(valuesList)
if (values is List<string?> valuesList) {
return JoinCore(separator.AsSpan(), CollectionsMarshal.AsSpan(valuesList));
}
if (values is string?[] valuesArray)
{
return JoinCore(separator.AsSpan(), new ReadOnlySpan<string?>(valuesArray));
}
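CollectionsMarshal.AsSpan is a public API (in System.Runtime.InteropServices), so the same trick is available to application code. A minimal sketch, with `ListSpanDemo` as a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class ListSpanDemo
{
    // Doubles every element in place by writing through the list's backing array.
    public static void DoubleAll(List<int> list)
    {
        Span<int> span = CollectionsMarshal.AsSpan(list);
        for (int i = 0; i < span.Length; i++)
        {
            span[i] *= 2;
        }
    }
}
```

The span points at the list's current backing array, so items should not be added to or removed from the list while the span is in use.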
Join
public static string Join(string? separator, IEnumerable<string?> values)
{
if (values is List<string?> valuesList)
{
return JoinCore(separator.AsSpan(), CollectionsMarshal.AsSpan(valuesList));
}
if (values is string?[] valuesArray)
{
return JoinCore(separator.AsSpan(), new ReadOnlySpan<string?>(valuesArray));
}
if (values == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.values);
}
using (IEnumerator<string?> en = values.GetEnumerator())
{
if (!en.MoveNext())
{
return Empty;
}
string? firstValue = en.Current;
if (!en.MoveNext())
{
// Only one value available
return firstValue ?? Empty;
}
// Null separator and values are handled by the StringBuilder
var result = new ValueStringBuilder(stackalloc char[256]);
result.Append(firstValue);
do
{
result.Append(separator);
result.Append(en.Current);
}
while (en.MoveNext());
return result.ToString();
}
}
JoinCore
private static string JoinCore(ReadOnlySpan<char> separator, ReadOnlySpan<string?> values)
{
if (values.Length <= 1)
{
return values.IsEmpty ?
Empty :
values[0] ?? Empty;
}
long totalSeparatorsLength = (long)(values.Length - 1) * separator.Length;
if (totalSeparatorsLength > int.MaxValue)
{
ThrowHelper.ThrowOutOfMemoryException();
}
int totalLength = (int)totalSeparatorsLength;
// Calculate the length of the resultant string so we know how much space to allocate.
foreach (string? value in values)
{
if (value != null)
{
totalLength += value.Length;
if (totalLength < 0) // Check for overflow
{
ThrowHelper.ThrowOutOfMemoryException();
}
}
}
// Copy each of the strings into the result buffer, interleaving with the separator.
string result = FastAllocateString(totalLength);
int copiedLength = 0;
for (int i = 0; i < values.Length; i++)
{
// It's possible that another thread may have mutated the input array
// such that our second read of an index will not be the same string
// we got during the first read.
// We range check again to avoid buffer overflows if this happens.
if (values[i] is string value)
{
int valueLen = value.Length;
if (valueLen > totalLength - copiedLength)
{
copiedLength = -1;
break;
}
// Fill in the value.
FillStringChecked(result, copiedLength, value);
copiedLength += valueLen;
}
if (i < values.Length - 1)
{
// Fill in the separator.
// Special-case length 1 to avoid additional overheads of CopyTo.
// This is common due to the char separator overload.
ref char dest = ref Unsafe.Add(ref result._firstChar, copiedLength);
if (separator.Length == 1)
{
dest = separator[0];
}
else
{
separator.CopyTo(new Span<char>(ref dest, separator.Length));
}
copiedLength += separator.Length;
}
}
// If we copied exactly the right amount, return the new string. Otherwise,
// something changed concurrently to mutate the input array: fall back to
// doing the concatenation again, but this time with a defensive copy. This
// fall back should be extremely rare.
return copiedLength == totalLength ?
result :
JoinCore(separator, values.ToArray().AsSpan());
}
Internal performance helpers in .NET Core
ValueStringBuilder
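Reading the string.Join source shows that it uses a non-public ValueStringBuilder internally. When constructing it you can choose stack memory or pooled memory as its buffer, which lowers GC pressure and memory overhead.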
// Null separator and values are handled by the StringBuilder
var result = new ValueStringBuilder(stackalloc char[256]);
result.Append(firstValue);
do {
result.Append(separator);
result.Append(en.Current);
}
while (en.MoveNext());
return result.ToString();
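Object reuse
ArrayPool (pooled arrays) & PooledList (pooled collections)
ArrayPool maintains a shared pool of arrays, backed by per-thread slots. When you need an array you rent one from the ArrayPool instead of allocating a new one every time, which avoids frequent allocations and so reduces the number of GCs.
PooledList is a third-party library. Instead of allocating its backing array with new, it stores its data in arrays taken from the ArrayPool.
Wrapping it in a using statement returns the array to the ArrayPool when you are done with it. If you forget to return it, the finalizer returns it instead.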
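The rent-and-return cycle with the built-in ArrayPool can be sketched as follows. `ArrayPoolDemo` is a hypothetical name. PooledList is not shown, since its API is not described in these notes.

```csharp
using System;
using System.Buffers;

public static class ArrayPoolDemo
{
    // Sums 1..count using a rented scratch array instead of a new allocation.
    public static int SumFirst(int count)
    {
        // The rented array may be longer than requested, so slice it to 'count'.
        int[] rented = ArrayPool<int>.Shared.Rent(count);
        try
        {
            Span<int> scratch = rented.AsSpan(0, count);
            for (int i = 0; i < scratch.Length; i++)
            {
                scratch[i] = i + 1;
            }

            int sum = 0;
            foreach (int v in scratch)
            {
                sum += v;
            }
            return sum;
        }
        finally
        {
            // Always hand the array back; do not touch it after this point.
            ArrayPool<int>.Shared.Return(rented);
        }
    }
}
```

Rented arrays are not cleared by default, which is why the sketch writes every element before reading it.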