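[Classic Article Translation] [Unfinished] [Updated 9-6] Performance Considerations for Run-Time Technologies in the .NET Framework
Original author: Emmanuel Schanzer
Summary:
This article surveys the technologies at work in the managed world and gives a technical explanation of how each one affects performance. It covers garbage collection, the JIT, remoting, ValueTypes, security, and more.
Overview:
The .NET run time introduces several advanced technologies aimed at security, ease of development, and performance. As a developer, it is important to understand each of these technologies and to use them effectively in your code. The advanced tools provided by the run time make it easier to build a robust application, but making that application fly is (and always has been) the responsibility of the developer.
This white paper should give you a deeper understanding of the technologies at work in .NET and help you tune your code for speed. Note that this is not a spec sheet. There is plenty of solid technical information out there already. The goal here is to present the information with a strong tilt toward performance, so it may not answer every technical question you have. If you cannot find the answer you are looking for here, I recommend looking further in the MSDN online library.
I am going to cover the following technologies, giving a high-level overview of their purpose and why they affect performance. Then I will delve into some lower-level implementation details and use sample code to illustrate how to get performance and speed out of each technology.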
Garbage Collection
Thread Pool
The JIT
AppDomains
Security
Remoting
ValueTypes
Garbage Collection
=======================
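Basics
Garbage collection (GC) frees the programmer from a common and difficult-to-debug class of errors by releasing the memory of objects that are no longer used. The general path of an object's lifetime looks like the code below, in both managed and native code:
    Foo a = new Foo();    // Allocate memory for the object and initialize it
    ...a...               // Use the object
    delete a;             // Tear down the state of the object, clean up,
                          // and free the memory for that object
In native code, you need to do all of this yourself. Missing the allocation or cleanup phase results in unpredictable behavior that is difficult to debug, and forgetting to free objects results in memory leaks. The path for memory allocation in the CLR is very close to the one we have just covered. If we add the GC-specific information, we end up with something that looks very similar.
    Foo a = new Foo();    // Allocate memory for the object and initialize it
    ...a...               // Use the object (it is strongly reachable)
    a = null;             // a becomes unreachable (out of scope, nulled, etc.)
                          // Eventually a collection occurs and a's resources
                          // are torn down
                          // The memory is reclaimed
Until the object can be freed, the same steps are taken in both worlds. In native code, you need to remember to free the object when you are done with it. In managed code, once the object is no longer reachable, the GC can collect it. Of course, if your resource needs special attention to be freed (closing a socket, for example), the GC needs your help to handle it correctly. The rule that you write code to clean up a resource before it is freed still applies, and you do so with the Dispose() and Finalize() methods. I will talk about the differences between these two later on.
If you keep a pointer to a resource around, the GC has no way of knowing whether you intend to use that resource in the future. This means that all of the rules you used in native code for explicitly freeing objects still apply, but most of the time the GC will handle everything for you. Instead of worrying about memory management one hundred percent of the time, you only have to worry about it about five percent of the time.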
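The CLR garbage collector is a generational, mark-and-compact collector. It follows several principles that allow it to achieve excellent performance. First, short-lived objects tend to be smaller and are accessed often. The GC divides the allocation graph into several sub-graphs, called generations, which allow it to spend as little time as possible collecting. Gen 0 contains young, frequently used objects. It also tends to be the smallest, and takes about 10 milliseconds to collect. Because the GC can ignore the other generations during this collection, it delivers much higher performance. Gen 1 and Gen 2 are for larger, older objects that are collected less frequently. When a Gen 1 collection occurs, Gen 0 is collected as well. A Gen 2 collection is a full collection, and it is the only time the GC traverses the entire memory graph. The GC also makes intelligent use of the CPU caches, which lets it tune the memory subsystem for the specific processor it runs on. This optimization is not easily available with native allocation, and it can help improve the performance of your application.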
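Generations can be observed from managed code. The sketch below is illustrative only: it forces collections with GC.Collect(), which production code should rarely do, and the class and method names (GenerationDemo, ObserveGenerations) are made up for this example. It samples the generation of one surviving object right after allocation and after each of two forced collections.

```csharp
using System;

public static class GenerationDemo
{
    // Samples the generation of a single object at three points in time.
    // Forcing collections here is purely for demonstration.
    public static int[] ObserveGenerations()
    {
        object survivor = new object();
        int[] seen = new int[3];
        seen[0] = GC.GetGeneration(survivor);   // freshly allocated
        GC.Collect();
        seen[1] = GC.GetGeneration(survivor);   // after surviving one collection
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor);   // after surviving two collections
        GC.KeepAlive(survivor);                 // keep the reference live until here
        return seen;
    }

    public static void Print()
    {
        int[] seen = ObserveGenerations();
        Console.WriteLine("Generations observed: {0}, {1}, {2} (max generation: {3})",
            seen[0], seen[1], seen[2], GC.MaxGeneration);
    }
}
```

On the CLR a small surviving object normally starts in the youngest generation and moves to an older one each time it survives, but the exact numbers depend on the runtime, so the sketch reports them rather than assuming them.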
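When Does a Collection Happen?
When a memory allocation is requested, the GC checks whether a collection is needed. It looks at the amount of memory that could be reclaimed, the amount of memory remaining, and the size of each generation, and then uses a heuristic to make the decision. Until a collection occurs, object allocation is usually as fast as in C or C++, or faster.
What Happens When a Collection Occurs?
Let's walk through the steps the garbage collector takes during a collection. The GC maintains a list of roots, which point into the GC heap. If an object is live, there is a root that leads to its location in the heap. Objects in the heap can also point to each other. This graph of pointers (the reachability graph) is what the GC must search in order to free memory. The order of events is as follows: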
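1. The managed heap keeps all of its allocation space in one contiguous block. When the remaining block is too small to satisfy a request, the GC is triggered.
2. The GC follows each root and all the pointers behind it, building a list of the objects that this traversal cannot reach.
3. Every object that cannot be reached from any root is considered collectable and is marked for collection.
4. Removing objects from the reachability graph makes most objects collectable. However, some resources need special handling. When you define an object, you have the option of writing a Dispose() method, a Finalize() method, or both. I will discuss the differences between the two, and when to use each, later on.
5. The final step of a collection is the memory compaction phase. All objects that are still in use are moved into one contiguous block, and all pointers and roots are updated.
6. By compacting the live objects and updating the start address of the free space, the GC keeps all free memory contiguous. If there is now enough space for the allocation, the GC returns control to the program. If not, it raises an OutOfMemoryException.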
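The mark and compact phases in the steps above can be sketched with a toy model. This is not how the CLR is implemented: "addresses" here are just indexes into a list, there are no generations or finalizers, and every name (ToyObject, ToyHeap, Allocate, Collect) is invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: an "address" is an index into ToyHeap.Slots.
public sealed class ToyObject
{
    public string Name;
    public List<int> References = new List<int>();   // addresses of other objects
    public ToyObject(string name) { Name = name; }
}

public sealed class ToyHeap
{
    public List<ToyObject> Slots = new List<ToyObject>();   // contiguous allocation area
    public List<int> Roots = new List<int>();                // addresses held by roots

    public int Allocate(string name, params int[] references)
    {
        ToyObject obj = new ToyObject(name);
        obj.References.AddRange(references);
        Slots.Add(obj);
        return Slots.Count - 1;   // new objects always go at the end
    }

    // Marks everything reachable from the roots, slides the survivors together,
    // and rewrites every root and reference to the new addresses.
    // Returns the number of objects reclaimed.
    public int Collect()
    {
        bool[] marked = new bool[Slots.Count];
        Stack<int> pending = new Stack<int>(Roots);
        while (pending.Count > 0)
        {
            int address = pending.Pop();
            if (marked[address]) continue;
            marked[address] = true;
            foreach (int child in Slots[address].References) pending.Push(child);
        }

        int[] forward = new int[Slots.Count];   // old address -> new address
        List<ToyObject> compacted = new List<ToyObject>();
        for (int old = 0; old < Slots.Count; old++)
        {
            if (!marked[old]) continue;
            forward[old] = compacted.Count;
            compacted.Add(Slots[old]);
        }

        foreach (ToyObject obj in compacted)
            for (int i = 0; i < obj.References.Count; i++)
                obj.References[i] = forward[obj.References[i]];
        for (int i = 0; i < Roots.Count; i++) Roots[i] = forward[Roots[i]];

        int reclaimed = Slots.Count - compacted.Count;
        Slots = compacted;
        return reclaimed;
    }
}
```

For example, allocating "a", then "garbage", then "b" (which references "a") and rooting only "b" leaves one unreachable object. After Collect() the two survivors sit next to each other at the start of the heap, and the root and the reference inside "b" point at their new positions.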
Object Cleanup
Some objects require special handling before their resources can be returned. A few examples of such resources are files, network sockets, or database connections. Simply releasing the memory on the heap isn't going to be enough, since you want these resources closed gracefully. To perform object cleanup, you can write a Dispose() method, a Finalize() method, or both.
A Finalize() method:
- Is called by the GC
- Is not guaranteed to be called in any order, or at a predictable time
- After being called, frees memory after the next GC
- Keeps all child objects live until the next GC
A Dispose() method:
- Is called by the programmer
- Is ordered and scheduled by the programmer
- Returns resources upon completion of the method
Managed objects that hold only managed resources don't require these methods. Your program will probably use only a few complex resources, and chances are you know what they are and when you need them. If you know both of these things, there's no reason to rely on finalizers, since you can do the cleanup manually. There are several reasons that you want to do this, and they all have to do with the finalizer queue.
In the GC, when an object that has a finalizer is marked collectable, it and any objects it points to are placed in a special queue. A separate thread walks down this queue, calling the Finalize() method of each item in the queue. The programmer has no control over this thread, or over the order of items placed in the queue. The GC may return control to the program without having finalized any objects in the queue. Those objects may remain in memory, tucked away in the queue, for a long time. Calls to Finalize() are made automatically, and there is no direct performance impact from the call itself. However, the non-deterministic model for finalization can definitely have other indirect consequences:
- In a scenario where you have resources that need to be released at a specific time, you lose control with finalizers. Say you have a file open, and it needs to be closed for security reasons. Even when you set the object to null, and force a GC immediately, the file will remain open until its Finalize() method is called, and you have no idea when this could happen.
- N objects that require disposal in a certain order may not be handled correctly.
- An enormous object and its children may take up far too much memory, require additional collections and hurt performance. These objects may not be collected for a long time.
- A small object to be finalized may have pointers to large resources that could be freed at any time. These objects will not be freed until the object to be finalized is taken care of, creating unnecessary memory pressure and forcing frequent collections.
The state diagram in Figure 3 illustrates the different paths your object can take in terms of finalization or disposal.
As you can see, finalization adds several steps to the object's lifetime. If you dispose of an object yourself, the object can be collected and the memory returned to you in the next GC. When finalization needs to occur, you have to wait until the actual method gets called. Since you are not given any guarantees about when this happens, you can have a lot of memory tied up and be at the mercy of the finalization queue. This can be extremely problematic if your object is connected to a whole tree of objects, and they all sit in memory until finalization occurs.
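By contrast, disposal is ordered by the programmer. The sketch below uses a hypothetical TrackedResource class that only records when it is opened and closed; nested using statements release the resources in the reverse order of acquisition, at a known point in the code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource that records when it is acquired and released.
public sealed class TrackedResource : IDisposable
{
    private readonly string name;
    private readonly List<string> log;
    private bool disposed;

    public TrackedResource(string name, List<string> log)
    {
        this.name = name;
        this.log = log;
        log.Add("open " + name);
    }

    public void Dispose()
    {
        if (disposed) return;   // a second call is harmless
        disposed = true;
        log.Add("close " + name);
    }
}

public static class DisposalOrderDemo
{
    public static List<string> Run()
    {
        List<string> log = new List<string>();
        using (TrackedResource connection = new TrackedResource("connection", log))
        using (TrackedResource transaction = new TrackedResource("transaction", log))
        {
            log.Add("work");
        }   // transaction is disposed first, then connection
        return log;
    }
}
```

Run() returns the entries "open connection", "open transaction", "work", "close transaction", "close connection", in that order. With finalizers alone, neither the order nor the timing of the two "close" steps could be relied on.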
Choosing Which Garbage Collector to Use
The CLR has two different GCs: Workstation (mscorwks.dll) and Server (mscorsvr.dll). When running in Workstation mode, latency is more of a concern than space or efficiency. A server with multiple processors and clients connected over a network can afford some latency, but throughput is now a top priority. Rather than shoehorn both of these scenarios into a single GC scheme, Microsoft has included two garbage collectors that are tailored to each situation.
Server GC:
- Multiprocessor (MP) Scalable, Parallel
- One GC thread per CPU
- Program paused during marking
Workstation GC:
- Minimizes pauses by running concurrently during full collections
The server GC is designed for maximum throughput, and scales with very high performance. Memory fragmentation on servers is a much more severe problem than on workstations, making garbage collection an attractive proposition. In a uniprocessor scenario, both collectors work the same way: workstation mode, without concurrent collection. On an MP machine, the Workstation GC uses the second processor to run the collection concurrently, minimizing delays while diminishing throughput. The Server GC uses multiple heaps and collection threads to maximize throughput and scale better.
You can choose which GC to use when you host the run time. When you load the run time into a process, you specify what collector to use. Loading the API is discussed in the .NET Framework Developer's Guide. For an example of a simple program that hosts the run time and selects the server GC, take a look at the Appendix.
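As an aside that goes beyond the framework versions this article describes: later versions of the .NET Framework (2.0 onward) also let an application request the server GC through a gcServer element with enabled="true" in the runtime section of its configuration file, and expose the active mode through System.Runtime.GCSettings. The following is a minimal sketch, assuming such a version; GcModeReport and Describe are names made up for this example.

```csharp
using System;
using System.Runtime;

public static class GcModeReport
{
    // Reports which collector flavor the current process is running with.
    // GCSettings.IsServerGC is not available on the 1.x runtimes.
    public static string Describe()
    {
        string mode = GCSettings.IsServerGC ? "Server" : "Workstation";
        return mode + " GC on " + Environment.ProcessorCount + " logical processor(s)";
    }
}
```

Logging the result of Describe() at startup is a cheap way to confirm that a deployment is actually running with the collector you intended.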
Myth: Garbage Collection Is Always Slower Than Doing It by Hand
Actually, until a collection is called, the GC is a lot faster than doing it by hand in C. This surprises a lot of people, so it's worth some explanation. First of all, notice that finding free space occurs in constant time. Since all free space is contiguous, the GC simply follows the pointer and checks to see if there's enough room. In C, a call to malloc() typically results in a search of a linked list of free blocks. This can be time consuming, especially if your heap is badly fragmented. To make matters worse, several implementations of the C run time lock the heap during this procedure. Once the memory is allocated or freed, the list has to be updated. In a garbage-collected environment, allocation is nearly free, and the memory is released during collection. More advanced programmers will reserve large blocks of memory and handle allocation within that block themselves. The problem with this approach is that memory fragmentation becomes a huge problem for programmers, and it forces them to add a lot of memory-handling logic to their applications. In the end, a garbage collector doesn't add a lot of overhead. Allocation is as fast or faster, and compaction is handled automatically, freeing programmers to focus on their applications.
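The difference between the two allocation strategies can be sketched with two toy allocators over an imaginary address space. Neither class reflects a real implementation of the CLR or of any C run time; BumpAllocator, FreeListAllocator, and the BlocksVisited counter are invented here to make the cost visible.

```csharp
using System;
using System.Collections.Generic;

// Contiguous free space: allocation is a bounds check plus a pointer increment.
public sealed class BumpAllocator
{
    private int next;
    private readonly int limit;
    public BumpAllocator(int size) { limit = size; }

    // Returns -1 at the point where a real collector would start a collection.
    public int Allocate(int size)
    {
        if (next + size > limit) return -1;
        int address = next;
        next += size;
        return address;
    }
}

// Fragmented free space: allocation walks a list of free blocks (first fit).
public sealed class FreeListAllocator
{
    private readonly LinkedList<int[]> free = new LinkedList<int[]>();   // { start, length }
    public int BlocksVisited;   // number of list nodes examined so far

    public void AddFreeBlock(int start, int length)
    {
        free.AddLast(new int[] { start, length });
    }

    public int Allocate(int size)
    {
        for (LinkedListNode<int[]> node = free.First; node != null; node = node.Next)
        {
            BlocksVisited++;
            if (node.Value[1] < size) continue;   // too small, keep searching
            int address = node.Value[0];
            node.Value[0] += size;                // shrink the block from the front
            node.Value[1] -= size;
            if (node.Value[1] == 0) free.Remove(node);
            return address;
        }
        return -1;   // no block is large enough
    }
}
```

With free blocks of 8, 8, and 64 bytes, a 32-byte request has to examine all three nodes before it succeeds, and the count grows with fragmentation. The bump allocator does the same amount of work for every request.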
In the future, garbage collectors could perform other optimizations that make it even faster. Hot spot identification and better cache usage are possible, and can make enormous speed differences. A smarter GC could pack pages more efficiently, thereby minimizing the number of page fetches that occur during execution. All of these could make a garbage-collected environment faster than doing things by hand.
Some people may wonder why GC isn't available in other environments, like C or C++. The answer is types. Those languages allow casting of pointers to any type, making it extremely difficult to know what a pointer refers to. In a managed environment like the CLR, we can guarantee enough about the pointers to make GC possible. The managed world is also the only place where we can safely stop thread execution to perform a GC: in C++ these operations are either unsafe or very limited.
Tuning for Speed
The biggest worry for a program in the managed world is memory retention. Some of the problems that you'll find in unmanaged environments are not an issue in the managed world: memory leaks and dangling pointers are not much of a problem here. Instead, programmers need to be careful about leaving resources connected when they no longer need them.
The most important heuristic for performance is also the easiest one to learn for programmers who are used to writing native code: keep track of the allocations you make, and free them when you're done. The GC has no way of knowing that you aren't going to use a 20KB string that you built if it's part of an object that's being kept around. Suppose you have this object tucked away in a vector somewhere, and you never intend to use that string again. Setting the field to null will let the GC collect those 20KB later, even if you still need the object for other purposes. If you don't need the object anymore, make sure you're not keeping references to it. (Just like in native code.) For smaller objects, this is less of a problem. Any programmer that's familiar with memory management in native code will have no problem here: all the same common sense rules apply. You just don't have to be so paranoid about them.
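A minimal sketch of the 20KB string case follows. Report and RetentionDemo are hypothetical names, and the WeakReference probe is only indicative: whether the string is actually reclaimed by the forced collection depends on the runtime and on build settings (debug builds in particular can keep temporaries alive longer).

```csharp
using System;

// Hypothetical long-lived object that holds a large, temporary string.
public sealed class Report
{
    public string Title;
    public string RenderedText;   // large, and only needed for a while

    public Report(string title)
    {
        Title = title;
        RenderedText = new string('x', 10000);   // 10,000 UTF-16 chars, about 20KB
    }

    // Drops the only reference to the large string; the Report stays usable.
    public void ReleaseRenderedText() { RenderedText = null; }
}

public static class RetentionDemo
{
    // Returns true if the string was reclaimed once the field was cleared.
    // Treat the result as an observation, not a guarantee.
    public static bool StringReclaimedAfterRelease()
    {
        Report report = new Report("Q3");
        WeakReference probe = new WeakReference(report.RenderedText);
        report.ReleaseRenderedText();
        GC.Collect();             // forced only for the demonstration
        GC.KeepAlive(report);     // the report itself is still alive here
        return !probe.IsAlive;
    }
}
```

The point is the ReleaseRenderedText() call: the Report stays in whatever collection holds it, but the 20KB payload is no longer reachable through it.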
The second important performance concern deals with object cleanup. As I mentioned earlier, finalization has profound impacts on performance. The most common example is that of a managed handler to an unmanaged resource: you need to implement some kind of cleanup method, and this is where performance becomes an issue. If you depend on finalization, you open yourself up to the performance problems I listed earlier. Something else to keep in mind is that the GC is largely unaware of memory pressure in the native world, so you may be using a ton of unmanaged resources just by keeping a pointer around in the managed heap. A single pointer doesn't take up a lot of memory, so it could be a while before a collection is needed. To get around these performance problems, while still playing it safe when it comes to memory retention, you should pick a design pattern to work with for all the objects that require special cleanup.
The programmer has four options when dealing with object cleanup:
1. Implement Both
This is the recommended design for object cleanup. This is an object with some mix of unmanaged and managed resources. An example would be System.Windows.Forms.Control. This has an unmanaged resource (HWND) and potentially managed resources (DataConnection, etc.). If you are unsure of when you make use of unmanaged resources, you can open the manifest for your program in ILDASM and check for references to native libraries. Another alternative is to use vadump.exe to see what resources are loaded along with your program. Both of these may provide you with insight as to what kind of native resources you use.
The pattern below gives users a single recommended place to override cleanup logic: Dispose(bool). This provides maximum flexibility, as well as a catch-all in case Dispose() is never called. The combination of maximum speed and flexibility, together with the safety-net approach, makes this the best design to use.
Example:
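    public class MyClass : IDisposable
    {
        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);   // cleanup is done, so skip finalization
        }

        protected virtual void Dispose(bool disposing)
        {
            if (disposing)
            {
                ...   // release managed resources
            }
            ...       // release unmanaged resources
        }

        ~MyClass()
        {
            Dispose(false);   // safety net if Dispose() was never called
        }
    }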
2. Implement Dispose() Only
This is when an object has only managed resources, and you want to make sure that its cleanup is deterministic. An example of such an object is System.Web.UI.Control.
Example:
    public class MyClass : IDisposable
    {
        public virtual void Dispose()
        {
            ...   // release managed resources deterministically
        }
    }
3. Implement Finalize() Only
This is needed in extremely rare situations, and I strongly recommend against it. The implication of a Finalize() only object is that the programmer has no idea when the object is going to be collected, yet is using a resource complex enough to require special cleanup. This situation should never occur in a well-designed project, and if you find yourself in it you should go back and find out what went wrong.
Example:
    public class MyClass
    {
        ...
        ~MyClass()
        {
            ...   // cleanup run by the finalizer thread
        }
    }
4. Implement Neither
This is for a managed object that points only to other managed objects that are neither disposable nor finalizable.
Recommendation
The recommendations for dealing with memory management should be familiar: release objects when you're done with them, and keep an eye out for leaving pointers to objects. When it comes to object cleanup, implement both a Finalize() and a Dispose() method for objects with unmanaged resources. This will prevent unexpected behavior later, and enforce good programming practices.
The downside here is that you force people to have to call Dispose(). There is no performance loss here, but some people might find it frustrating to have to think about disposing of their objects. However, I think it's worth the aggravation to use a model that makes sense. Besides, this forces people to be more attentive to the objects they allocate, since they can't blindly trust the GC to always take care of them. For programmers coming from a C or C++ background, forcing a call to Dispose() will probably be beneficial, since it's the kind of thing they are more familiar with.
Dispose() should be supported on objects that hold on to unmanaged resources anywhere in the tree of objects underneath them; however, Finalize() need be placed only on those objects that are specifically holding on to these resources, such as an OS handle or an unmanaged memory allocation. I suggest creating small managed objects as "wrappers" that implement Finalize() in addition to supporting Dispose(), which would be called by the parent object's Dispose(). Since the parent objects do not have a finalizer, the entire tree of objects will not survive a collection regardless of whether or not Dispose() was called.
A good rule of thumb for finalizers is to use them only on the most primitive object that requires finalization. Suppose I have a large managed resource that includes a database connection: I would make it possible for the connection itself to be finalized, but make the rest of the object disposable. That way I can call Dispose() and free the managed portions of the object immediately, without having to wait for the connection to be finalized. Remember: use Finalize() only where you have to, when you have to.
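The wrapper idea might look like the sketch below. NativeBuffer and ImageCache are hypothetical classes; the unmanaged resource is simulated with a block from Marshal.AllocHGlobal, and only the small wrapper carries a finalizer.

```csharp
using System;
using System.Runtime.InteropServices;

// Small wrapper: the only class in the tree that needs a finalizer.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr handle;

    public NativeBuffer(int bytes) { handle = Marshal.AllocHGlobal(bytes); }

    public bool IsReleased { get { return handle == IntPtr.Zero; } }

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);   // nothing left for the finalizer to do
    }

    ~NativeBuffer() { Release(); }   // safety net if Dispose() was never called

    private void Release()
    {
        if (handle == IntPtr.Zero) return;   // already released
        Marshal.FreeHGlobal(handle);
        handle = IntPtr.Zero;
    }
}

// Parent: disposable but not finalizable, so its managed state does not
// have to wait in the finalization queue.
public sealed class ImageCache : IDisposable
{
    private readonly NativeBuffer pixels = new NativeBuffer(4096);
    private byte[] thumbnails = new byte[64 * 1024];   // purely managed state

    public bool IsReleased { get { return pixels.IsReleased && thumbnails == null; } }

    public void Dispose()
    {
        pixels.Dispose();    // forwards to the wrapper
        thumbnails = null;   // lets the managed buffer be collected
    }
}
```

If a caller forgets Dispose(), only the tiny NativeBuffer is held for finalization; the 64KB managed array in ImageCache can be reclaimed as soon as the cache is unreachable. Later versions of the framework provide SafeHandle for this wrapper role.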
Note to C and C++ programmers: the destructor syntax in C# creates a finalizer, not a disposal method!
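One way to see this is with reflection: a class written with destructor syntax ends up declaring a protected Finalize() method. The types and the FinalizerProbe helper below are made up for the demonstration.

```csharp
using System;
using System.Reflection;

public class WithDestructor
{
    ~WithDestructor() { }   // C# destructor syntax
}

public class Plain { }

public static class FinalizerProbe
{
    // True if the type itself declares a Finalize() method.
    // DeclaredOnly excludes the one inherited from System.Object.
    public static bool DeclaresFinalizer(Type type)
    {
        MethodInfo method = type.GetMethod(
            "Finalize",
            BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.DeclaredOnly);
        return method != null;
    }
}
```

DeclaresFinalizer(typeof(WithDestructor)) returns true and DeclaresFinalizer(typeof(Plain)) returns false, so the "destructor" is run by the GC's finalizer thread rather than at a point the programmer chooses.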
Thread Pool
=======================
JIT
=======================
AppDomain
=======================
Security
=======================
Remoting
=======================
ValueTypes
=======================
Original article:
Performance Considerations for Run-Time Technologies in the .NET Framework