A survey of .NET garbage collection and the changes CLR 4.0 brings, Part 2 (series: what is new in CLR 4.0)

Continued from the previous post: A survey of .NET garbage collection and the changes CLR 4.0 brings, Part 1.

The changes CLR 4.0 brings are still not covered in this post; please see the next one.

Memory release and compaction

After building the graph of reachable objects, the garbage collector reclaims every object that is not in the graph, that is, every object that is no longer needed. Freeing those objects leaves the managed heap fragmented, so the collector scans the heap and slides the surviving objects down to lower addresses, producing one contiguous block of live objects followed by one contiguous block of free memory. Because the survivors have moved, every reference to them is then updated to point to the object's new location. This step is called compaction: it tamps the managed heap down.
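The mark, slide and fix-up steps above can be sketched with a toy model. This is an illustrative assumption, not the CLR's implementation: the hypothetical `ToyObject` lives at an integer address and refers to other objects by address, so moving an object forces every reference and root pointing at it to be rewritten through a forwarding table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model (hypothetical, not CLR code): objects refer to each other BY ADDRESS.
public sealed class ToyObject
{
    public int Size;
    public List<int> Refs = new List<int>(); // addresses of referenced objects
}

public static class ToyCompactor
{
    // heap: address -> object; roots: addresses held by roots.
    // Returns the first free address after compaction.
    public static int Collect(SortedDictionary<int, ToyObject> heap, List<int> roots)
    {
        // 1. Mark: find every address reachable from the roots.
        var live = new HashSet<int>();
        var pending = new Stack<int>(roots);
        while (pending.Count > 0)
        {
            int addr = pending.Pop();
            if (!live.Add(addr)) continue;
            foreach (int r in heap[addr].Refs) pending.Push(r);
        }

        // 2. Plan: slide survivors toward address 0 in address order,
        //    recording old -> new addresses in a forwarding table.
        var forward = new Dictionary<int, int>();
        int next = 0;
        foreach (var entry in heap) // SortedDictionary iterates in key order
        {
            if (!live.Contains(entry.Key)) continue; // garbage: its space is reclaimed
            forward[entry.Key] = next;
            next += entry.Value.Size;
        }

        // 3. Relocate the objects and fix up every reference and root.
        var moved = new SortedDictionary<int, ToyObject>();
        foreach (var pair in forward)
        {
            var obj = heap[pair.Key];
            obj.Refs = obj.Refs.Select(r => forward[r]).ToList();
            moved[pair.Value] = obj;
        }
        heap.Clear();
        foreach (var pair in moved) heap[pair.Key] = pair.Value;
        for (int i = 0; i < roots.Count; i++) roots[i] = forward[roots[i]];
        return next;
    }
}

public static class Program
{
    public static void Main()
    {
        // A at 0 (size 10) refers to C at 30 (size 10); B at 10 (size 20) is unreachable.
        var heap = new SortedDictionary<int, ToyObject>
        {
            [0] = new ToyObject { Size = 10, Refs = { 30 } },
            [10] = new ToyObject { Size = 20 },
            [30] = new ToyObject { Size = 10 },
        };
        var roots = new List<int> { 0 };
        int free = ToyCompactor.Collect(heap, roots);
        Console.WriteLine($"free space starts at {free}"); // 20
        Console.WriteLine(string.Join(",", heap.Keys));     // 0,10
        Console.WriteLine(heap[0].Refs[0]);                 // 10: A's reference to C was fixed up
    }
}
```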

Next comes the concept of generations, which was introduced to improve the overall performance of the garbage collector.

Generations

Think about what would happen if the garbage collector examined every object in the whole managed heap on every collection. Would it be slow? Yes. That is why Microsoft introduced generations.

Why do generations make the garbage collector faster? Based on studies of a large number of real programs, Microsoft made the following assumptions, which hold for most applications:

  • The newer an object is, the shorter its lifetime tends to be.
  • The older an object is, the longer its lifetime tends to be.
  • Newer objects tend to have strong relationships to each other and are frequently accessed around the same time.
  • Compacting a portion of the heap is faster than compacting the whole heap.

With generations, most garbage collection work can be confined to a small region of the heap, which improves the collector's performance.

Let's look at how the garbage collector actually implements generations:

Generation 0: newly created objects, that is, objects that have never been through a garbage collection.

Generation 1: objects that survived a generation 0 collection.

Generation 2: objects that survived a generation 1 or generation 2 collection. Generation 2 is the highest generation the garbage collector supports, so an object that survives a generation 2 collection simply stays in generation 2.
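Promotion can be observed with the real `GC.MaxGeneration`, `GC.Collect()` and `GC.GetGeneration` APIs. The sketch below keeps one object reachable while forcing full collections; on a typical workstation GC it shows the object moving from generation 0 to 1 to 2 and then staying at 2, though exact results may vary with runtime version and GC configuration.

```csharp
using System;

public static class GenerationDemo
{
    // Returns the generation of one reachable object observed after
    // 0, 1, 2 and 3 forced full collections.
    public static int[] ObservePromotion()
    {
        var obj = new object();
        var seen = new int[4];
        seen[0] = GC.GetGeneration(obj);      // freshly allocated: generation 0
        for (int i = 1; i < seen.Length; i++)
        {
            GC.Collect();                     // obj survives, so it is promoted...
            seen[i] = GC.GetGeneration(obj);  // ...but never beyond GC.MaxGeneration
        }
        GC.KeepAlive(obj);
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine($"GC.MaxGeneration = {GC.MaxGeneration}");
        Console.WriteLine(string.Join(" -> ", ObservePromotion()));
    }
}
```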

When a program starts, the garbage collector sets aside an initial amount of memory for each generation; the initial sizes are determined by the .NET Framework's own policies. The collector records the start address and size of each generation. The three generations are contiguous: generation 2 lies below generation 1, and generation 1 lies below generation 0. The CLR always allocates new managed objects in generation 0. If generation 0 has enough free memory, the CLR simply advances a pointer to complete the allocation, which is very fast. When generation 0 does not have enough room for a new object, the CLR triggers a garbage collection that reclaims the generation 0 objects that are no longer needed. The collector then compacts the surviving generation 0 objects down to lower addresses, moves the allocation pointer to the start of the free space, and updates the affected references to the objects' new addresses. At this point generation 0 is empty, because every object that survived the collection now belongs to generation 1.
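The pointer-bump allocation described above can be sketched as follows. This is a simplified model under stated assumptions: the hypothetical `BumpAllocator` ignores object headers, alignment and the real CLR's allocation contexts, and simply reports `-1` at the point where the CLR would trigger a generation 0 collection.

```csharp
using System;

// Toy model of generation 0 allocation (hypothetical, not CLR code):
// a contiguous region plus a "next free" pointer.
public sealed class BumpAllocator
{
    private readonly int _capacity;
    private int _next; // offset of the first free byte

    public BumpAllocator(int capacity) { _capacity = capacity; }

    public int Used => _next;

    // Returns the offset of the new object, or -1 if generation 0 is full
    // (where the real CLR would start a generation 0 collection).
    public int TryAllocate(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (_capacity - _next < size) return -1;
        int result = _next;
        _next += size; // allocation is just "bump the pointer"
        return result;
    }

    // Survivors have been promoted out of generation 0, so it is empty again.
    public void ResetAfterCollection() { _next = 0; }
}

public static class Program
{
    public static void Main()
    {
        var gen0 = new BumpAllocator(100);
        Console.WriteLine(gen0.TryAllocate(40)); // 0
        Console.WriteLine(gen0.TryAllocate(40)); // 40
        Console.WriteLine(gen0.TryAllocate(40)); // -1: full, time for a GC
        gen0.ResetAfterCollection();
        Console.WriteLine(gen0.TryAllocate(40)); // 0
    }
}
```

The point of the design is that the fast path is a comparison and an addition, with no search for a free block; all the expensive work is deferred to the collection.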

Collecting only generation 0 is a partial collection, which differs from the full collection described earlier. A generation 0 collection still starts from the roots, but the following steps differ. When the collector finds a root that points to an object in generation 1 or 2, it ignores that root and moves on to the next one; when it finds a root that points to an object in generation 0, it adds that object to the graph. In this way the collector processes only the objects in generation 0. This shortcut is valid only if the application has not written to generation 1 or 2 memory in a way that makes an object there refer to an object in generation 0. In practice, applications do write to generation 1 and 2 objects, so the CLR maintains a dedicated data structure, the card table, which records whether the application has written to generation 1 or 2 memory. If the application wrote to generation 1 or 2 before the current generation 0 collection, the objects recorded in the card table are also treated as roots during that collection. This is what allows a generation 0 collection to be done correctly.
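A minimal sketch of the card-table idea, assuming a hypothetical `CardTable` class, a made-up card size, and a `WriteBarrier` method standing in for the code the runtime runs on reference stores; the real CLR's card size, layout and clearing policy differ. Old-generation memory is divided into fixed-size cards, a store into an old-generation slot marks its card, and a generation 0 collection then scans only the marked cards for extra roots.

```csharp
using System;
using System.Collections.Generic;

// Toy card table (hypothetical, not CLR code).
public sealed class CardTable
{
    private readonly int _cardSize;
    private readonly bool[] _dirty;

    public CardTable(int oldGenBytes, int cardSize)
    {
        _cardSize = cardSize;
        _dirty = new bool[(oldGenBytes + cardSize - 1) / cardSize];
    }

    // Conceptually run on every reference store into generation 1 or 2.
    public void WriteBarrier(int oldGenAddress) => _dirty[oldGenAddress / _cardSize] = true;

    // Cards a generation 0 collection must scan in addition to the normal roots.
    public List<int> DirtyCards()
    {
        var cards = new List<int>();
        for (int i = 0; i < _dirty.Length; i++)
            if (_dirty[i]) cards.Add(i);
        return cards;
    }

    // In this toy model the cards are simply cleared after the collection.
    public void ClearAll() => Array.Clear(_dirty, 0, _dirty.Length);
}

public static class Program
{
    public static void Main()
    {
        var cards = new CardTable(oldGenBytes: 1024, cardSize: 128);
        cards.WriteBarrier(130); // card 1
        cards.WriteBarrier(200); // card 1 again
        cards.WriteBarrier(900); // card 7
        Console.WriteLine(string.Join(",", cards.DirtyCards())); // 1,7
        cards.ClearAll();
        Console.WriteLine(cards.DirtyCards().Count);             // 0
    }
}
```

Marking a whole card rather than an exact address keeps the per-store cost tiny, at the price of scanning a few unchanged objects that share a dirty card.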

The paragraphs above describe one condition that triggers a generation 0 collection: generation 0 does not have enough memory for a new object. Calling GC.Collect(0) also collects generation 0, and GC.Collect() with no argument collects all generations. In addition, the garbage collector maintains a threshold for each generation; when generation 0 reaches its threshold, a generation 0 collection also occurs. A generation 1 collection happens when GC.Collect(1) is called or when generation 1 reaches its threshold, and generation 2 has similar trigger conditions. Collecting generation 1 also collects generation 0, and collecting generation 2 also collects generations 1 and 0. Objects that survive a collection of generation n are moved to generation n+1; when n = 2, the survivors simply remain in generation 2.
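The rule that collecting generation n also collects every younger generation can be checked with the real `GC.CollectionCount(int)` API, which counts collections of each generation. The helper name `CountCollections` is an illustrative assumption. Typically a `GC.Collect(1)` adds one to the counts for generations 0 and 1, and `GC.Collect()` adds one to all three, although background collections running at the same time can raise the numbers further.

```csharp
using System;

public static class TriggerDemo
{
    // Returns how many collections of generations 0, 1 and 2
    // happened while running the given action.
    public static int[] CountCollections(Action action)
    {
        int[] before = { GC.CollectionCount(0), GC.CollectionCount(1), GC.CollectionCount(2) };
        action();
        return new[]
        {
            GC.CollectionCount(0) - before[0],
            GC.CollectionCount(1) - before[1],
            GC.CollectionCount(2) - before[2],
        };
    }

    public static void Main()
    {
        // A generation 1 collection also collects generation 0...
        Console.WriteLine(string.Join(",", CountCollections(() => GC.Collect(1))));
        // ...and a full collection collects generations 0, 1 and 2.
        Console.WriteLine(string.Join(",", CountCollections(() => GC.Collect())));
    }
}
```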

Finalization of objects

Finalization is a safety net for when programmers forget to call Close, Dispose or a similar method to clean up resources. For example, when an instance of the class below is created, it is allocated in generation 0, and because the class has a finalizer, a reference to the object is also added to the finalization queue maintained by the CLR.

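using System;

public class BaseObj {
    public BaseObj() { }

    // C# does not allow "protected override void Finalize()"; the compiler
    // turns this destructor syntax into an override of Object.Finalize.
    ~BaseObj() {
        // Perform resource cleanup code here...
        // Example: close a file or a network connection.
        Console.WriteLine("In Finalize.");
    }
}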

When the object becomes garbage, the garbage collector moves its reference from the finalization queue to the to-be-finalized queue (Jeffrey Richter calls it the freachable queue), and the object is added back to the reference graph, because the freachable queue counts as a root. A dedicated CLR thread takes objects off the freachable queue one by one and runs their Finalize methods to clean up resources. The object is therefore not reclaimed right away: only after its Finalize method has run and it has left the freachable queue does nothing refer to it any more, and a later collection reclaims its memory.
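The two-collection lifecycle can be observed with the real `GC.Collect()` and `GC.WaitForPendingFinalizers()` APIs; the `Tracked` and `FinalizationDemo` names are hypothetical. The object is allocated in a separate, non-inlined method so that no local variable in the caller keeps it alive. On a typical runtime `Run()` returns 1, but finalizer timing is not guaranteed, so treat this as a sketch.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public class Tracked
{
    public static int FinalizedCount;

    ~Tracked()
    {
        Interlocked.Increment(ref FinalizedCount); // runs on the finalizer thread
    }
}

public static class FinalizationDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void AllocateGarbage() { new Tracked(); }

    public static int Run()
    {
        Interlocked.Exchange(ref Tracked.FinalizedCount, 0);
        AllocateGarbage();
        GC.Collect();                  // unreachable, but it has a finalizer: it moves to
                                       // the freachable queue and survives this collection
        GC.WaitForPendingFinalizers(); // the finalizer thread runs ~Tracked()
        GC.Collect();                  // now nothing refers to it: its memory is reclaimed
        return Volatile.Read(ref Tracked.FinalizedCount);
    }

    public static void Main() => Console.WriteLine($"finalized: {Run()}");
}
```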

The GC class has two public static methods you may want to know about: GC.ReRegisterForFinalize and GC.SuppressFinalize. ReRegisterForFinalize adds a reference to the object to the finalization queue again (meaning the object needs to be finalized), while SuppressFinalize marks the object as no longer needing finalization, so the CLR will not call its Finalize method.

Because an object whose class has a Finalize method is added to the finalization queue automatically when it is created with new, few situations call for ReRegisterForFinalize. The typical one is resurrection. Resurrection means that the Finalize method makes a root refer to the object again, so the object becomes reachable and is not reclaimed. However, its reference is no longer in the finalization queue, so ReRegisterForFinalize must be called to add it back if the object should be finalized again later. Since resurrection is rare in real applications, ReRegisterForFinalize is rarely needed.
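A sketch of resurrection combined with the real `GC.ReRegisterForFinalize` API; the `Phoenix` class and its static fields are hypothetical. The first time the finalizer runs, it stores `this` in a static field (a root) and re-registers itself; once that root is cleared, the object is finalized a second time. Without the `ReRegisterForFinalize` call, the second finalization would not happen. On a typical runtime `Run()` returns 2.

```csharp
using System;
using System.Runtime.CompilerServices;

public class Phoenix
{
    public static Phoenix Resurrected; // a root the finalizer can write to
    public static int FinalizeCalls;

    ~Phoenix()
    {
        FinalizeCalls++;
        if (FinalizeCalls == 1)
        {
            Resurrected = this;             // resurrection: reachable again
            GC.ReRegisterForFinalize(this); // otherwise ~Phoenix() would never run again
        }
    }
}

public static class ResurrectionDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void AllocateGarbage() { new Phoenix(); }

    static void CollectAndFinalize()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }

    public static int Run()
    {
        Phoenix.FinalizeCalls = 0;
        Phoenix.Resurrected = null;
        AllocateGarbage();
        CollectAndFinalize();       // 1st finalization: the object resurrects itself
        Phoenix.Resurrected = null; // drop the resurrected reference again
        CollectAndFinalize();       // 2nd finalization, thanks to ReRegisterForFinalize
        return Phoenix.FinalizeCalls;
    }

    public static void Main() => Console.WriteLine($"Finalize calls: {Run()}");
}
```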

SuppressFinalize is used more often. It is meant for classes that implement both a Finalize method and a Dispose() method to release resources: calling GC.SuppressFinalize(this) in Dispose() tells the CLR not to run the Finalize method. Finalization is a safety net for when programmers forget to call Close or Dispose. If Dispose() is called, Finalize() does not run and release the resources a second time; if Dispose() is forgotten, Finalize is the last line of defense for releasing them. Together they form a double safety net.
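The double safety net can be sketched with the common `Dispose(bool disposing)` pattern. `ResourceHolder`, its counters and `DisposeDemo` are hypothetical names for illustration; `GC.SuppressFinalize` is the real API. One object is disposed explicitly and one is forgotten. On a typical runtime each counter ends up at 1, because the disposed object's finalizer was suppressed and the forgotten one is cleaned up by its finalizer.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical resource wrapper showing Dispose + finalizer as a double safety net.
public class ResourceHolder : IDisposable
{
    public static int ReleasedByDispose;
    public static int ReleasedByFinalizer;
    private bool _disposed;

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this); // already cleaned up: the CLR will not call ~ResourceHolder()
    }

    // Safety net: runs only if Dispose() was never called.
    ~ResourceHolder()
    {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;
        if (disposing) ReleasedByDispose++;
        else ReleasedByFinalizer++;
        // Release the (hypothetical) unmanaged resource here. When disposing is
        // false, do not touch other managed objects: they may already be finalized.
        _disposed = true;
    }
}

public static class DisposeDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void UseAndDispose() { new ResourceHolder().Dispose(); }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void UseAndForget() { new ResourceHolder(); }

    public static void Run()
    {
        ResourceHolder.ReleasedByDispose = 0;
        ResourceHolder.ReleasedByFinalizer = 0;
        UseAndDispose();               // cleaned up deterministically
        UseAndForget();                // Dispose() forgotten
        GC.Collect();
        GC.WaitForPendingFinalizers(); // the forgotten object's finalizer runs here
    }

    public static void Main()
    {
        Run();
        Console.WriteLine($"by Dispose: {ResourceHolder.ReleasedByDispose}, by finalizer: {ResourceHolder.ReleasedByFinalizer}");
    }
}
```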

Finalization has a significant impact on garbage collector performance, and the CLR guarantees neither when nor in what order Finalize methods run. Avoid relying on finalization wherever possible; instead, clean up resources in your own methods or in methods named Close or Dispose. Consider implementing the IDisposable interface and putting the cleanup code in its Dispose method.
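A minimal sketch of the recommended approach: a class that implements `IDisposable`, has no finalizer, and is used through C#'s `using` statement so that `Dispose()` runs deterministically. `LogFile` and `UsingDemo` are hypothetical names, and the "file" is simulated by a flag.

```csharp
using System;

// Hypothetical class: releases its resource in Dispose() and has no finalizer.
public sealed class LogFile : IDisposable
{
    public bool IsOpen { get; private set; }

    public LogFile() { IsOpen = true; } // stands in for opening a real file

    public void Dispose()
    {
        if (!IsOpen) return; // calling Dispose() twice is harmless
        IsOpen = false;      // stands in for closing the file
    }
}

public static class UsingDemo
{
    // Returns whether the file is still open after the using block.
    public static bool Run()
    {
        var file = new LogFile();
        using (file)
        {
            // ... write to the file ...
        }   // the compiler emits a try/finally that calls file.Dispose() here
        return file.IsOpen;
    }

    public static void Main()
    {
        Console.WriteLine($"open after using: {Run()}");
    }
}
```

Because the class has no finalizer, its instances never go through the finalization queue, so they avoid the extra collection and the performance cost described above.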

Large object heap

The large object heap stores objects of 85,000 bytes or larger. Its initial memory region usually lies above generation 0 and is not adjacent to it. Generations 0, 1 and 2 together are called the small object heap. When the CLR allocates a new object smaller than 85,000 bytes, it allocates it in generation 0; an object of 85,000 bytes or more is allocated in the large object heap.
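The threshold can be observed with the real `GC.GetGeneration` API, which reports objects on the large object heap as generation 2. The sketch assumes the default threshold of 85,000 bytes (newer .NET versions let you change it through configuration). A `byte[85000]` carries 85,000 bytes of data plus an object header, so it is at or above the threshold.

```csharp
using System;

public static class LohDemo
{
    public static int GenerationOf(int byteCount) => GC.GetGeneration(new byte[byteCount]);

    public static void Main()
    {
        // Below the threshold: allocated in generation 0 of the small object heap.
        Console.WriteLine(GenerationOf(84000));
        // At or above the threshold: allocated in the large object heap,
        // which GC.GetGeneration reports as generation 2.
        Console.WriteLine(GenerationOf(85000));
    }
}
```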

Because large objects are big, collecting them is expensive, so large objects are collected only during generation 2 collections. Large-object collection also starts from the roots and looks for reachable objects; unreachable large objects are reclaimed. After reclaiming them, the garbage collector does not compact the large object heap, because moving large objects is costly. Instead, it records the reusable ranges in a free list; when two adjacent large objects are both reclaimed, their space is recorded as a single free block. New large objects can then be allocated from the ranges on the free list.
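A toy model of such a free list, as an illustrative assumption rather than the CLR's actual data structure: the hypothetical `FreeList` keeps freed ranges sorted by address, merges a newly freed range with neighbours that touch it (so two adjacent dead objects become one block), and serves allocations first-fit without moving anything.

```csharp
using System;
using System.Collections.Generic;

// Toy free list for a non-compacting heap (hypothetical, not CLR code).
public sealed class FreeList
{
    // address -> size of each free block
    private readonly SortedDictionary<int, int> _blocks = new SortedDictionary<int, int>();

    public IEnumerable<KeyValuePair<int, int>> Blocks => _blocks;

    public void Free(int address, int size)
    {
        // Merge with a free block that ends exactly where this one starts.
        int prevStart = -1;
        foreach (var b in _blocks)
            if (b.Key + b.Value == address) prevStart = b.Key;
        if (prevStart >= 0)
        {
            size += _blocks[prevStart];
            _blocks.Remove(prevStart);
            address = prevStart;
        }
        // Merge with a free block that starts exactly where this one ends.
        int nextSize;
        if (_blocks.TryGetValue(address + size, out nextSize))
        {
            _blocks.Remove(address + size);
            size += nextSize;
        }
        _blocks[address] = size;
    }

    // First fit: returns an address, or -1 if no free block is big enough.
    public int Allocate(int size)
    {
        int found = -1;
        foreach (var b in _blocks)
            if (b.Value >= size) { found = b.Key; break; }
        if (found < 0) return -1;
        int remaining = _blocks[found] - size;
        _blocks.Remove(found);
        if (remaining > 0) _blocks[found + size] = remaining;
        return found;
    }
}

public static class Program
{
    public static void Main()
    {
        var free = new FreeList();
        free.Free(0, 100000);      // large object A is reclaimed
        free.Free(100000, 200000); // adjacent large object B is reclaimed: merged with A
        Console.WriteLine(string.Join(";", free.Blocks)); // [0, 300000]
        Console.WriteLine(free.Allocate(120000));          // 0
        Console.WriteLine(string.Join(";", free.Blocks)); // [120000, 180000]
        Console.WriteLine(free.Allocate(500000));          // -1
    }
}
```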

Allocating and collecting large objects both cost more than they do for small objects, so in practice avoid allocating large objects that are released soon afterwards. Consider allocating a pool of large objects once and reusing it, rather than repeatedly allocating and reclaiming them.
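A minimal, single-threaded sketch of such a pool; `LargeBufferPool` is a hypothetical name, and newer .NET versions also ship `System.Buffers.ArrayPool<T>` for the same purpose. Renting reuses a previously returned buffer instead of allocating a new large object, so the large object heap sees far fewer allocations.

```csharp
using System;
using System.Collections.Generic;

// Minimal pool of large buffers (hypothetical, NOT thread-safe).
public sealed class LargeBufferPool
{
    private readonly Stack<byte[]> _free = new Stack<byte[]>();
    public int BufferSize { get; }

    public LargeBufferPool(int bufferSize) { BufferSize = bufferSize; }

    // Reuse a returned buffer if one is available; allocate only otherwise.
    public byte[] Rent() => _free.Count > 0 ? _free.Pop() : new byte[BufferSize];

    // Returned buffers are not cleared; callers must not rely on their contents.
    public void Return(byte[] buffer)
    {
        if (buffer == null || buffer.Length != BufferSize)
            throw new ArgumentException("Buffer does not belong to this pool.", nameof(buffer));
        _free.Push(buffer);
    }
}

public static class Program
{
    public static void Main()
    {
        var pool = new LargeBufferPool(1024 * 1024); // 1 MB buffers live on the large object heap
        byte[] first = pool.Rent();
        pool.Return(first);
        byte[] second = pool.Rent();
        Console.WriteLine(ReferenceEquals(first, second)); // True: no new allocation
    }
}
```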

 

To be continued…

References

Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework, by Jeffrey Richter: http://msdn.microsoft.com/en-us/magazine/bb985010.aspx

Garbage Collection Part 2: Automatic Memory Management in the Microsoft .NET Framework, by Jeffrey Richter: http://msdn.microsoft.com/en-us/magazine/bb985011.aspx

Garbage Collector Basics and Performance Hints, by Rico Mariani (Microsoft): http://msdn.microsoft.com/en-us/library/ms973837.aspx

Top 20 .NET garbage collection (GC) articles: http://drowningintechnicaldebt.com/blogs/royashbrook/archive/2007/06/22/top-20-net-garbage-collection-gc-articles.aspx

Large Object Heap Uncovered, by Maoni Stephens: http://msdn.microsoft.com/en-us/magazine/cc534993.aspx

Garbage Collection (MSDN Library): http://msdn.microsoft.com/en-us/library/0xy59wtx.aspx


posted on 2009-05-15 00:54  mikelij
