SourceGenerator 生成db to class代码优化结果记录 二

优化

在上一篇留下的 Dapper AOT 还有什么特别优化点的问题

在仔细阅读生成代码和源码之后,终于得到了答案

个人之前一直以为 Dapper AOT 只用了迭代器去实现,所以理应差不多实现代码却又极大差距,思维陷入了僵局,一度以为有什么黑魔法

结果 Dapper AOT 没有用迭代器去实现!!! 靠北啦,还以为迭代器有新姿势可以优化了

不再使用迭代器

List<BenchmarkTest.Dog> results = new();
try
{
    while (reader.Read())
    {
        results.Add(ReadOne(reader, readOnlyTokens));
    }
    return results;
}

当然就只能要求 用户必须使用 AsList 方法,因为 ToList 会导致复制list的问题, 导致负优化,

像这样

 connection.Query<Dog>("select * from dog").AsList();

// AsList 实现
public static List<T> AsList<T>(this IEnumerable<T>? source) => source switch
{
    null => null!,
    List<T> list => list,
    _ => Enumerable.ToList(source),
};

使用 span

再没有了迭代器方法限制, span 就可以放飞自我,随意使用了

public static BenchmarkTest.Dog ReadOne(this IDataReader reader, ref ReadOnlySpan<int> ss)
{
    var d = new BenchmarkTest.Dog();
    for (int j = 0; j < ss.Length; j++)
    {

使用 ArrayPool 减少内存占用

public Span<int> GetTokens()
{
    FieldCount = Reader!.FieldCount;
    if (Tokens is null || Tokens.Length < FieldCount)
    {
        // no leased array, or existing lease is not big enough; rent a new array
        if (Tokens is not null) ArrayPool<int>.Shared.Return(Tokens);
        Tokens = ArrayPool<int>.Shared.Rent(FieldCount);
    }
    return MemoryMarshal.CreateSpan(ref MemoryMarshal.GetArrayDataReference(Tokens), FieldCount);
}

数据小时使用栈分配

 var s = reader.FieldCount <= 64 ? MemoryMarshal.CreateSpan(ref MemoryMarshal.GetReference(stackalloc int[reader.FieldCount]), reader.FieldCount) :  state.GetTokens();

提前生成部分 hashcode 进行比较

因为比较现在也并不耗时了, 所以 缓存也没有必要了, 也一并移除

public static void GenerateReadTokens(this IDataReader reader, Span<int> s)
{
    for (int i = 0; i < reader.FieldCount; i++)
    {
        var name = reader.GetName(i);
        var type = reader.GetFieldType(i);
        switch (EntitiesGenerator.NormalizedHash(name))
        {
            
            case 742476188U:
                s[i] = type == typeof(int) ? 1 : 2; 
                break;

            case 2369371622U:
                s[i] = type == typeof(string) ? 3 : 4; 
                break;

            case 1352703673U:
                s[i] = type == typeof(float) ? 5 : 6; 
                break;

            default:
                break;
        }
    }
}

性能测试说明

BenchmarkDotNet

这里特别说明一下

使用的 BenchmarkDotNet ,其本身已经考虑了 jit优化等等方面, 有预热,超多次执行,

结果值也是按照统计学有考虑结果集分布情况处理,移除变差大的值(比如少数的孤立的极大极小值), 差异不大情况,一般显示平均值,有大差异时还会显示 中位值

感兴趣的童鞋可以去 https://github.com/dotnet/BenchmarkDotNet 了解

chole 有点棘手,为了方便mock,所以 copy了部分源码,只比较实体映射部分

测试数据

测试数据 正如之前说过, 采用 手动 mock 方式,避免 db 驱动 、db 执行、mock库 等等 带来的执行差异影响

class

非常简单的类,当然不能代表所有情况,不过简单测试够用了

public class Dog
{
    public int? Age { get; set; }
    public string Name { get; set; }
    public float? Weight { get; set; }
}

mock 数据

 public class TestDbConnection : DbConnection
 {
     public int RowCount { get; set; }

    public IDbCommand CreateCommand()
    {
        return new TestDbCommand() { RowCount = RowCount };
    }
}

public class TestDbCommand : DbCommand
{
    public int RowCount { get; set; }

    public IDataParameterCollection Parameters { get; } = new TestDataParameterCollection();

   public IDbDataParameter CreateParameter()
      {
         return new TestDataParameter();
      }

        protected override DbDataReader ExecuteDbDataReader(CommandBehavior behavior)
        {
            return new TestDbDataReader() { RowCount = RowCount };
        }
}

    public class TestDbDataReader : DbDataReader
    {
        public int RowCount { get; set; }
        private int calls = 0;
        public override object this[int ordinal] 
        {
            get
            {
                switch (ordinal)
                {
                    case 0:
                        return "XX";
                    case 1:
                        return 2;
                    case 2:
                        return 3.3f;
                    default:
                        return null;
                }
            }
        
        }
      public override int FieldCount => 3;

      public override Type GetFieldType(int ordinal)
      {
          switch (ordinal)
          {
              case 0:
                  return typeof(string);
              case 1:
                  return typeof(int);
              case 2:
                  return typeof(float);
              default:
                  return null;
          }
      }

      public override float GetFloat(int ordinal)
      {
          switch (ordinal)
          {
              case 2:
                  return 3.3f;
              default:
                  return 0;
          }
      }
        public override int GetInt32(int ordinal)
        {
            switch (ordinal)
            {
                case 1:
                    return 2;
                default:
                    return 0;
            }
        }
        public override string GetName(int ordinal)
        {
            switch (ordinal)
            {
                case 0:
                    return "Name";
                case 1:
                    return "Age";
                case 2:
                    return "Weight";
                default:
                    return null;
            }
        }
        public override string GetString(int ordinal)
        {
            switch (ordinal)
            {
                case 0:
                    return "XX";
                default:
                    return null;
            }
        }

        public override object GetValue(int ordinal)
        {
            switch (ordinal)
            {
                case 0:
                    return "XX";
                case 1:
                    return 2;
                case 2:
                    return 3.3f;
                default:
                    return null;
            }
        }

        public override bool Read()
        {
            calls++;
            return calls <= RowCount;
        }
}

Benchmark 代码

    [MemoryDiagnoser, Orderer(summaryOrderPolicy: SummaryOrderPolicy.FastestToSlowest), GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory), CategoriesColumn]
    public class ObjectMappingTest
    {
        [Params(1, 1000, 10000, 100000, 1000000)]
        public int RowCount { get; set; }

        [Benchmark(Baseline = true)]
        public void SetClass()
        {
            var connection = new TestDbConnection() { RowCount = RowCount };
            var dogs = new List<Dog>();
            try
            {
                connection.Open();
                var cmd = connection.CreateCommand();
                cmd.CommandText = "select ";
                using (var reader = cmd.ExecuteReader(CommandBehavior.Default))
                {
                    while (reader.Read())
                    {
                        var dog = new Dog();
                        dogs.Add(dog);
                        dog.Name = reader.GetString(0);
                        dog.Age = reader.GetInt32(1);
                        dog.Weight = reader.GetFloat(2);
                    }
                }
            }
            finally
            {
                connection.Close();
            }
        }

        [Benchmark]
        public void DapperAOT()
        {
            var connection = new TestDbConnection() { RowCount = RowCount };
            var dogs = connection.Query<Dog>("select * from dog").AsList();
        }

        [Benchmark]
        public void SourceGenerator()
        {
            var connection = new TestDbConnection() { RowCount = RowCount };
            List<Dog> dogs;
            try
            {
                connection.Open();
                var cmd = connection.CreateCommand();
                cmd.CommandText = "select ";
                using (var reader = cmd.ExecuteReader(CommandBehavior.Default))
                {
                    dogs = reader.ReadTo<Dog>().AsList();
                }
            }
            finally
            {
                connection.Close();
            }
        }

        [Benchmark]
        public void Chloe()
        {
            var connection = new TestDbConnection() { RowCount = RowCount };
            try
            {
                connection.Open();
                var cmd = connection.CreateCommand();
                var dogs = new InternalSqlQuery<Dog>(cmd, "select").AsList();
            }
            finally
            {
                connection.Close();
            }
        }
    }

完整代码可以参考 https://github.com/fs7744/SlowestEM

测试结果


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3880/23H2/2023Update/SunValley3)
13th Gen Intel Core i9-13900KF, 1 CPU, 32 logical and 24 physical cores
.NET SDK 9.0.100-preview.6.24328.19
  [Host]     : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2


Method RowCount Mean Error StdDev Ratio RatioSD Gen0 Gen1 Gen2 Allocated Alloc Ratio
DapperAOT 1 294.1 ns 5.79 ns 7.73 ns 0.63 0.02 0.0234 0.0229 - 440 B 1.00
Dapper 1 321.7 ns 6.40 ns 5.99 ns 0.70 0.02 0.0405 0.0401 - 768 B 1.75
SourceGenerator 1 408.7 ns 6.38 ns 5.33 ns 0.89 0.01 0.0234 0.0229 - 440 B 1.00
SetClass 1 460.6 ns 4.82 ns 4.51 ns 1.00 0.00 0.0234 0.0229 - 440 B 1.00
Chloe 1 498.9 ns 8.99 ns 12.31 ns 1.09 0.03 0.0453 0.0448 - 856 B 1.95
SetClass 1000 4,751.0 ns 84.12 ns 86.38 ns 1.00 0.00 3.0212 1.2894 - 56912 B 1.00
SourceGenerator 1000 11,402.9 ns 220.27 ns 244.83 ns 2.39 0.04 3.0212 1.2817 - 56912 B 1.00
DapperAOT 1000 11,421.3 ns 121.00 ns 113.18 ns 2.41 0.05 3.0212 0.6409 - 56912 B 1.00
Dapper 1000 29,601.8 ns 447.50 ns 396.69 ns 6.25 0.15 5.5542 1.0986 - 105192 B 1.85
Chloe 1000 66,872.0 ns 150.27 ns 133.21 ns 14.12 0.27 2.9297 0.9766 - 57328 B 1.01
SetClass 10000 106,271.3 ns 2,111.19 ns 3,468.75 ns 1.00 0.00 41.6260 41.6260 41.6260 662782 B 1.00
DapperAOT 10000 172,867.7 ns 2,079.77 ns 1,945.42 ns 1.65 0.05 41.5039 41.5039 41.5039 662782 B 1.00
SourceGenerator 10000 181,916.1 ns 1,653.15 ns 1,465.47 ns 1.74 0.05 41.5039 41.5039 41.5039 662782 B 1.00
Dapper 10000 705,883.0 ns 8,517.90 ns 7,550.89 ns 6.74 0.19 82.0313 81.0547 41.0156 1143062 B 1.72
Chloe 10000 746,825.0 ns 3,067.25 ns 2,869.11 ns 7.15 0.21 41.0156 41.0156 41.0156 663198 B 1.00
SetClass 100000 1,191,303.2 ns 20,831.95 ns 19,486.22 ns 1.00 0.00 498.0469 498.0469 498.0469 6098016 B 1.00
DapperAOT 100000 1,794,197.8 ns 17,937.20 ns 16,778.47 ns 1.51 0.03 498.0469 498.0469 498.0469 6098016 B 1.00
SourceGenerator 100000 1,973,894.9 ns 26,063.73 ns 24,380.03 ns 1.66 0.03 496.0938 496.0938 496.0938 6098016 B 1.00
Dapper 100000 4,357,237.9 ns 85,065.76 ns 83,545.95 ns 3.66 0.09 492.1875 492.1875 492.1875 10898296 B 1.79
Chloe 100000 7,524,264.2 ns 91,289.38 ns 85,392.15 ns 6.32 0.14 492.1875 492.1875 492.1875 6098432 B 1.00
SetClass 1000000 49,990,270.7 ns 987,172.66 ns 1,829,787.82 ns 1.00 0.00 3300.0000 3300.0000 1400.0000 56778489 B 1.00
DapperAOT 1000000 56,473,264.7 ns 995,473.43 ns 1,427,678.25 ns 1.13 0.05 3555.5556 3555.5556 1777.7778 56779066 B 1.00
SourceGenerator 1000000 58,368,836.3 ns 1,153,542.14 ns 2,080,074.43 ns 1.17 0.06 3555.5556 3555.5556 1777.7778 56779066 B 1.00
Chloe 1000000 110,416,752.0 ns 1,562,298.26 ns 1,461,374.77 ns 2.19 0.10 3400.0000 3400.0000 1600.0000 56781312 B 1.00
Dapper 1000000 138,433,886.4 ns 2,765,190.70 ns 4,385,885.48 ns 2.77 0.14 6250.0000 6250.0000 2000.0000 104779052 B 1.85

SourceGenerator 基本等同 DapperAOT 了, 除了没有使用 Interceptor, 以及各种情况细节没有考虑之外, 两者性能一样

SourceGenerator 肯定现在性能优化最佳方式,毕竟可以生成代码文件,上手难度其实比 emit 之类小多了

posted @ 2024-08-03 14:17  victor.x.qu  阅读(310)  评论(4编辑  收藏  举报