CLR via C# Reading Notes 2-3: Cache Lines and False Sharing

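I was not sure how best to render "Cache Lines" and "False Sharing" in Chinese, so these notes use the English terms throughout (if anything here is wrong, corrections are welcome).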

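Modern CPUs generally have multiple cores and cache memory inside the CPU (usually in several levels, such as L1 and L2, and often L3).

These caches sit on the CPU chip itself and are far faster than the main memory on the motherboard.

In general, the CPU loads data from main memory into the cache to get better performance (especially for frequently used data).

By default the cache is divided into 64-byte regions called cache lines (the number may differ between platforms; it can be queried with the Win32 API function GetLogicalProcessorInformation).

Only one core can write to a given cache line at any point in time.

In other words, several cores cannot modify the same cache line simultaneously (and today's CPUs are all multi-core...).

Because 64 bytes is enough room for several data items at once, for example 16 Int32 values, unrelated pieces of data can end up in the same cache line.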
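The arithmetic behind the "16 Int32 values" figure can be sketched as follows. This is only an illustration: the names CacheLineMath, ElementsPerLine and SameLine are made up for this sketch, the 64-byte line size is an assumption (as noted above, it can differ by platform), and SameLine treats its arguments as byte offsets from a line-aligned base address.

```csharp
using System;

internal static class CacheLineMath
{
    // Assumed cache line size in bytes; real hardware may differ.
    public const int AssumedCacheLineSize = 64;

    // How many elements of the given size fit into one assumed cache line.
    public static int ElementsPerLine(int elementSize)
    {
        return AssumedCacheLineSize / elementSize;
    }

    // True when two byte offsets (measured from a line-aligned base)
    // fall into the same assumed cache line.
    public static bool SameLine(long offsetA, long offsetB)
    {
        return offsetA / AssumedCacheLineSize == offsetB / AssumedCacheLineSize;
    }

    public static void Main()
    {
        Console.WriteLine(ElementsPerLine(sizeof(int))); // 16
        Console.WriteLine(SameLine(0, 4));               // True
        Console.WriteLine(SameLine(0, 64));              // False
    }
}
```

Two adjacent Int32 fields are 4 bytes apart, so they will usually land in the same line, while two fields 64 bytes apart never can (under the 64-byte assumption).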

Take the following code as an example (it is for demo purposes only, so please do not read too much into its naming and design...)

private class Data
{
    public Int32 field1;
    public Int32 field2;
}

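The two fields will most likely sit in the same cache line, so it is entirely possible that while one thread is working on field1, another thread running on a different CPU that wants to work on field2 has to wait until the first thread is done before it can get access to that cache line. This is false sharing: the threads share no data, only a cache line.

When the data is accessed very intensively, this causes a large performance loss.

See the following code: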

using System;
using System.Diagnostics;
using System.Threading;

internal static class FalseSharing
{
    private class Data
    {
        public Int32 field1;
        public Int32 field2;
    }
    private const Int32 iterations = 100000000; // 100 million
    private static Int32 s_operations = 2;
    private static Int64 s_startTime;
    public static void Main()
    {
        // Allocate an object and record the start time
        Data data = new Data();
        s_startTime = Stopwatch.GetTimestamp();
        // Have 2 threads access their own fields within the structure
        ThreadPool.QueueUserWorkItem(o => AccessData(data, 0));
        ThreadPool.QueueUserWorkItem(o => AccessData(data, 1));
        // For testing, block the Main thread
        Console.ReadLine();
    }
    private static void AccessData(Data data, Int32 field)
    {
        // The threads in here each access their own field within the Data object
        for (Int32 x = 0; x < iterations; x++)
            if (field == 0) data.field1++; else data.field2++;
        // Whichever thread finishes last, shows the time it took
        if (Interlocked.Decrement(ref s_operations) == 0)
            Console.WriteLine("Access time: {0:N0}", Stopwatch.GetTimestamp() - s_startTime);
    }
}

On my machine this code took 2,471,930,060 (Stopwatch timestamp ticks). My machine really is slow... anyone who is interested is welcome to see how fast it runs on theirs.
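The number reported above is a raw difference between two Stopwatch.GetTimestamp() values, and the length of one tick depends on the machine. To compare results across machines, the delta can be converted to milliseconds with Stopwatch.Frequency (ticks per second). A minimal sketch, where TimestampConversion and TicksToMilliseconds are names invented for this example:

```csharp
using System;
using System.Diagnostics;

internal static class TimestampConversion
{
    // Converts a Stopwatch.GetTimestamp() delta into milliseconds.
    // 'frequency' is the number of ticks per second (Stopwatch.Frequency).
    public static double TicksToMilliseconds(long ticks, long frequency)
    {
        return ticks * 1000.0 / frequency;
    }

    public static void Main()
    {
        long start = Stopwatch.GetTimestamp();

        // Some work to time; the loop body itself is irrelevant here.
        long sum = 0;
        for (int i = 0; i < 10000000; i++) sum += i;

        long elapsedTicks = Stopwatch.GetTimestamp() - start;
        Console.WriteLine("Sum: {0:N0}", sum);
        Console.WriteLine("Elapsed: {0:N2} ms",
            TicksToMilliseconds(elapsedTicks, Stopwatch.Frequency));
    }
}
```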

If the Data class is changed to the following definition:

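    // Requires: using System.Runtime.InteropServices;
    [StructLayout(LayoutKind.Explicit)] // specify the memory layout explicitly
    private class Data
    {
        // These two fields are separated now and no longer in the same cache line
        [FieldOffset(0)]  // byte offset 0 within the object
        public Int32 field1;
        [FieldOffset(64)] // byte offset 64 within the object
        public Int32 field2;
    }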

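then on my machine the run time becomes 1,258,994,700 (Stopwatch timestamp ticks).

With the layout changed so that field2 is offset by 64 bytes, the program needs more memory (the object now spans two cache lines) but runs faster.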
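One way to double-check that an explicit layout really puts the two fields 64 bytes apart is to ask Marshal.OffsetOf. The sketch below uses a struct named PaddedData and a helper class LayoutCheck, both invented for this example; it reports the marshaled layout, which I am assuming matches the in-memory layout for a simple type like this with only Int32 fields.

```csharp
using System;
using System.Runtime.InteropServices;

internal static class LayoutCheck
{
    // Hypothetical struct mirroring the explicit layout discussed above.
    [StructLayout(LayoutKind.Explicit)]
    private struct PaddedData
    {
        [FieldOffset(0)] public int field1;
        [FieldOffset(64)] public int field2;
    }

    // Byte offset of field1 as reported by the marshaling layer.
    public static int OffsetOfField1()
    {
        return Marshal.OffsetOf<PaddedData>("field1").ToInt32();
    }

    // Byte offset of field2 as reported by the marshaling layer.
    public static int OffsetOfField2()
    {
        return Marshal.OffsetOf<PaddedData>("field2").ToInt32();
    }

    public static void Main()
    {
        Console.WriteLine(OffsetOfField1()); // 0
        Console.WriteLine(OffsetOfField2()); // 64
    }
}
```

Two addresses that are 64 bytes apart can never fall into the same 64-byte cache line, whatever the object's own alignment, which is why the offset of 64 is sufficient here.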

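The most practical corollary:

All arrays in C# derive from System.Array and get some special treatment, for example bounds checking.

Bounds checking: whenever you access an element of an array, the CLR verifies that the index lies within the valid range of the array (0 <= index < Length).

This means that no matter which element of the array is accessed, the CLR has to read Length first.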
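The bounds check described above is easy to observe from C#: an index outside the valid range raises IndexOutOfRangeException instead of reading neighbouring memory. A small sketch (BoundsCheckDemo and ThrowsOnIndex are names made up for this illustration):

```csharp
using System;

internal static class BoundsCheckDemo
{
    // Returns true when reading values[index] fails the CLR bounds check.
    public static bool ThrowsOnIndex(int[] values, int index)
    {
        try
        {
            _ = values[index]; // the CLR compares index against values.Length here
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        int[] vals = new int[] { 1, 2, 3, 4, 5 };
        Console.WriteLine(ThrowsOnIndex(vals, 4));  // False (last valid index)
        Console.WriteLine(ThrowsOnIndex(vals, 5));  // True  (index == Length)
        Console.WriteLine(ThrowsOnIndex(vals, -1)); // True  (negative index)
    }
}
```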

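Length is an Int32 value stored just before the array elements that records the length of the array.

So in memory the array's data structure looks roughly like this: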

int[] vals = new int[] { 1, 2, 3, 4, 5 };
// Length  1st element  2nd element  3rd element  4th element  5th element
//   5         1            2            3            4            5

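Assuming the default cache line size of 64 bytes, the array's Length and its first few elements will end up in the same cache line.

It is therefore important to avoid having a thread A that keeps writing to the memory around Length, that is, to the first few elements of the array.

While it does so, any thread that wants to read another part of the array still has to read Length for the bounds check, so it has to wait until thread A is done before it can read that other part.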
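One way to act on this advice, sketched here as my own illustration rather than anything taken from the book, is to let each thread accumulate into a local variable and write to the shared array only once at the end, so that the cache line holding Length and the first elements is not being modified over and over. LocalAccumulation and CountInParallel are hypothetical names.

```csharp
using System;
using System.Threading;

internal static class LocalAccumulation
{
    // Each worker counts in a local variable and touches the shared
    // array exactly once, when it stores its final result.
    public static int[] CountInParallel(int workers, int iterationsPerWorker)
    {
        int[] results = new int[workers];
        Thread[] threads = new Thread[workers];

        for (int i = 0; i < workers; i++)
        {
            int slot = i; // per-thread copy of the loop variable for the lambda
            threads[i] = new Thread(() =>
            {
                int local = 0; // thread-private, never shared with other threads
                for (int x = 0; x < iterationsPerWorker; x++) local++;
                results[slot] = local; // single write to the shared array
            });
            threads[i].Start();
        }

        foreach (Thread t in threads) t.Join();
        return results;
    }

    public static void Main()
    {
        int[] results = CountInParallel(4, 1000000);
        Console.WriteLine(string.Join(",", results)); // 1000000,1000000,1000000,1000000
    }
}
```

The alternative of incrementing results[slot] inside the loop would have every thread repeatedly writing to neighbouring elements at the start of the array, which is exactly the access pattern the paragraph above warns against.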

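PS: the following is only my own inference and has not been verified.

A cache line should allow concurrent reads but not concurrent writes, and a write blocks all other operations on that line (reads as well as other writes).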

posted on 2010-11-25 12:10 by 听说读写
