On the GC Trade-off: Pros and Cons of Garbage Collection
Several weeks ago, I was asked to say something about the GC trade-off. It is a good question, but I don't think I gave a great answer at the time; some aspects simply went unmentioned. I have thought the question over since then, and this blog post is my attempt at a much better answer.
Let's get back to the original question first:
"You have experience with C/C++ and Java/C#/Python. Could you say something about the GC trade-off? And how would you compare languages with and without GC?"
My answer is:
The intention of GC is to reduce the difficulty of memory management and to avoid some typical memory-related errors, e.g. segmentation faults and memory leaks. Programming in a language with GC is considered higher-level: programmers refer to objects through references instead of raw pointers (memory addresses). So, generally speaking, programming in a language with GC is easier, because you will never crash with a segmentation fault and you never need to release allocated memory explicitly (though you still need to clear references to objects you no longer use).
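To make that last point concrete, here is a minimal Java sketch (my own illustration, not from the original discussion): a toy stack never frees memory explicitly, but pop() must clear the vacated slot, or the GC will keep the popped object alive indefinitely.

```java
import java.util.Arrays;

// Minimal illustrative stack: no explicit free() anywhere, but pop()
// must null out the vacated slot so the GC can reclaim the object.
public class SimpleStack {
    private Object[] elements = new Object[16];
    private int size = 0;

    public void push(Object e) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size);
        }
        elements[size++] = e;
    }

    public Object pop() {
        Object result = elements[--size];
        elements[size] = null; // clear the "unused" reference; forgetting this leaks
        return result;
    }
}
```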
But every coin has two sides, and GC does have real downsides.
The most obvious downside is that a reference is a form of indirection, so object access is not as efficient as direct memory access through a pointer. But I don't think this is a big issue for most of the systems I have worked on.
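If you want to feel the indirection, here is a rough Java sketch (the names and sizes are mine, and a serious measurement would use a harness like JMH): a long[] stores values inline in one contiguous block, while a Long[] stores a reference per element that must be chased on every access.

```java
// Rough sketch of the indirection cost, not a rigorous benchmark.
// long[] stores values inline; Long[] stores references to boxed
// objects, so every element access follows an extra pointer.
public class IndirectionSketch {
    public static void main(String[] args) {
        final int n = 10_000_000;
        long[] primitives = new long[n];
        Long[] boxed = new Long[n];
        for (int i = 0; i < n; i++) {
            primitives[i] = i;
            boxed[i] = (long) i; // each element is a separate heap object
        }

        long t0 = System.nanoTime();
        long sum1 = 0;
        for (long v : primitives) sum1 += v;
        long t1 = System.nanoTime();
        long sum2 = 0;
        for (Long v : boxed) sum2 += v;   // unboxing = chasing a reference
        long t2 = System.nanoTime();

        System.out.printf("inline: %d ms, boxed: %d ms (sums %d/%d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sum1, sum2);
    }
}
```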
What really matters is that, by taking over a large part of memory management, GC brings both convenience and inconvenience to programmers: "no need to care" also means "loss of control". In a language with GC, programmers have no way to reclaim a large piece of memory immediately. You can reset all references to the object, and you can call GC.collect() to notify the GC to work, but there is no guarantee that the memory will be reclaimed right away. When that actually happens depends on the GC's internal algorithm, and that is, unfortunately, non-deterministic.
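In Java, for instance, the closest equivalent is System.gc(), and it is explicitly only a hint. A minimal sketch of the situation (the allocation size is arbitrary; run with a large enough heap, e.g. -Xmx1g):

```java
// A hint is all you get: System.gc() requests a collection,
// but the JVM makes no promise about when (or whether) it runs.
public class GcHintSketch {
    public static void main(String[] args) {
        byte[] big = new byte[512 * 1024 * 1024]; // ~512 MB
        System.out.println("allocated " + big.length + " bytes");

        big = null;       // drop the only reference; the memory is now "unused"
        System.gc();      // request a collection; reclamation is NOT guaranteed here

        Runtime rt = Runtime.getRuntime();
        System.out.printf("used after hint: %d MB%n",
                (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024));
    }
}
```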
So, in an application that needs to manage memory very efficiently, such as a server using gigabytes of memory, it should be possible to allocate and release memory in a deterministic manner, and here GC hurts. The common OOM issue in languages with GC arises partly because unreachable objects pending collection still occupy a lot of memory, and partly because objects that are unintentionally still referenced can never be collected at all; eventually the system runs into OOM. When an OOM happens, it is often even harder to diagnose than memory issues in languages like C/C++.
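Here is a hedged Java sketch of the second kind of failure (the static cache is hypothetical, purely for illustration): each buffer is logically finished with, yet a forgotten reference keeps it reachable, so the GC can never reclaim it and the heap eventually fills up.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative leak: each request's buffer is logically done with,
// but the static list still references it, so GC can never reclaim it.
// Run long enough (or with a small -Xmx) and this ends in OutOfMemoryError.
public class LeakSketch {
    private static final List<byte[]> cache = new ArrayList<>(); // forgotten reference

    static void handleRequest() {
        byte[] buffer = new byte[1024 * 1024]; // 1 MB of per-request work
        cache.add(buffer);                     // "cached" and never evicted
    }

    public static void main(String[] args) {
        while (true) {
            handleRequest();
        }
    }
}
```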
GC also introduces non-determinism in latency. In real-time systems such as a stock trading system, latency is usually considered much more important than throughput: the system should keep latency as low as possible. But the timing of GC is non-deterministic, and it is not under the programmer's control either. While the GC is running, latency can be several orders of magnitude higher than when it is idle. This kind of behavior is not acceptable in a real-time system, which must respond to requests with low and predictable latency.
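One crude way to observe this (my own sketch, not a proper measurement) is to have one Java thread churn out garbage while another samples the clock in a tight loop; on an otherwise quiet machine, gaps far beyond the sleep interval are usually stop-the-world GC pauses.

```java
// Crude pause detector: one thread keeps the allocator busy, another
// samples the clock. Gaps far beyond the ~1 ms sleep interval on a
// quiet machine are usually stop-the-world GC pauses.
public class PauseSketch {
    private static volatile byte sink; // keeps the allocation from being optimized away

    public static void main(String[] args) throws InterruptedException {
        Thread churn = new Thread(() -> {
            while (true) {
                byte[] garbage = new byte[64 * 1024];
                sink = garbage[0];
            }
        });
        churn.setDaemon(true);
        churn.start();

        long last = System.nanoTime();
        while (true) {
            Thread.sleep(1);
            long now = System.nanoTime();
            long gapMs = (now - last) / 1_000_000;
            if (gapMs > 10) { // expected ~1 ms; large outliers suggest a GC pause
                System.out.println("pause-like gap: " + gapMs + " ms");
            }
            last = now;
        }
    }
}
```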
To sum up, by taking over a large part of memory management, GC brings both convenience and inconvenience. In applications where memory usage and latency are not concerns, such as scripts or tools, languages with GC are usually the better choice. Otherwise, languages without GC should be chosen. It depends on the application.