Effective C++ Notes - Item 50: Understand when it makes sense to replace new and delete.
Let's return to fundamentals for a moment. Why would anybody want to replace the compiler-provided versions of operator new or operator delete in the first place? These are three of the most common reasons:
- To detect usage errors. Failure to delete memory conjured up by new leads to memory leaks. Using more than one delete on newed memory yields undefined behavior. If operator new keeps a list of allocated addresses and operator delete removes addresses from the list, it's easy to detect such usage errors. Similarly, a variety of programming mistakes can lead to data overruns (writing beyond the end of an allocated block) and underruns (writing prior to the beginning of an allocated block). Custom operator news can overallocate blocks so there's room to put known byte patterns ("signatures") before and after the memory made available to clients. operator deletes can check to see if the signatures are still intact. If they're not, an overrun or underrun occurred sometime during the life of the allocated block, and operator delete can log that fact, along with the value of the offending pointer.
- To improve efficiency. The versions of operator new and operator delete that ship with compilers are designed for general-purpose use. They have to be acceptable for long-running programs (e.g., web servers), but they also have to be acceptable for programs that execute for less than a second. They have to handle series of requests for large blocks of memory, small blocks, and mixtures of the two. They have to accommodate allocation patterns ranging from the dynamic allocation of a few blocks that exist for the duration of the program to constant allocation and deallocation of a large number of short-lived objects. They have to worry about heap fragmentation, a process that, if unchecked, eventually leads to the inability to satisfy requests for large blocks of memory, even when ample free memory is distributed across many small blocks. Given the demands made on memory managers, it's no surprise that the operator news and operator deletes that ship with compilers take a middle-of-the-road strategy. They work reasonably well for everybody, but optimally for nobody. If you have a good understanding of your program's dynamic memory usage patterns, you can often find that custom versions of operator new and operator delete outperform the default ones. By "outperform," I mean they run faster — sometimes orders of magnitude faster — and they require less memory — up to 50% less. For some (though by no means all) applications, replacing the stock new and delete with custom versions is an easy way to pick up significant performance improvements.
- To collect usage statistics. Before heading down the path of writing custom news and deletes, it's prudent to gather information about how your software uses its dynamic memory. What is the distribution of allocated block sizes? What is the distribution of their lifetimes? Do they tend to be allocated and deallocated in FIFO ("first in, first out") order, LIFO ("last in, first out") order, or something closer to random order? Do the usage patterns change over time, e.g., does your software have different allocation/deallocation patterns in different stages of execution? What is the maximum amount of dynamically allocated memory in use at any one time (i.e., its "high water mark")? Custom versions of operator new and operator delete make it easy to collect this kind of information (a minimal sketch follows this list).
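To make the statistics-gathering idea concrete, here is a minimal sketch (not the book's code) of a global operator new and operator delete that count outstanding allocations and record a high-water mark. The counter names are invented for this example, the sketch is not thread-safe, and because the ordinary (unsized) operator delete doesn't know how big a block is, it tracks live blocks rather than live bytes. The signatures use the C++11 replaceable-function forms.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

namespace {
    // hypothetical counters for this sketch; a real collector might also
    // histogram block sizes or record allocation lifetimes
    std::size_t blocksInUse = 0;        // blocks currently allocated
    std::size_t highWaterMark = 0;      // maximum simultaneous blocks seen
    std::size_t totalAllocations = 0;   // total calls to operator new
}

void* operator new(std::size_t size)
{
    if (size == 0) size = 1;            // zero-byte requests must still yield a unique pointer

    void *p = std::malloc(size);
    if (!p) throw std::bad_alloc();

    ++totalAllocations;
    ++blocksInUse;
    if (blocksInUse > highWaterMark) highWaterMark = blocksInUse;
    return p;
}

void operator delete(void *p) noexcept
{
    if (p) {
        --blocksInUse;
        std::free(p);
    }
}
```

Like the first-pass example below, this sketch ignores details such as the loop that calls the new-handler on allocation failure (the conventions Item 51 discusses).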
In concept, writing a custom operator new is pretty easy. For example, here's a quick first pass at a global operator new that facilitates the detection of under- and overruns. There are a lot of little things wrong with it, but we'll worry about those in a moment.
```cpp
static const int signature = 0xDEADBEEF;
typedef unsigned char Byte;

// this code has several flaws — see below
void* operator new(std::size_t size) throw(std::bad_alloc)
{
    using namespace std;

    size_t realSize = size + 2 * sizeof(int);   // increase size of request so 2
                                                // signatures will also fit inside

    void *pMem = malloc(realSize);              // call malloc to get the actual memory
    if (!pMem) throw bad_alloc();

    // write signature into first and last parts of the memory
    *(static_cast<int*>(pMem)) = signature;
    *(reinterpret_cast<int*>(static_cast<Byte*>(pMem) + realSize - sizeof(int))) = signature;

    // return a pointer to the memory just past the first signature
    return static_cast<Byte*>(pMem) + sizeof(int);
}
```
Many computer architectures require that data of particular types be placed in memory at particular kinds of addresses. For example, an architecture might require that pointers occur at addresses that are a multiple of four (i.e., be four-byte aligned) or that doubles must occur at addresses that are a multiple of eight (i.e., be eight-byte aligned). Failure to follow such constraints could lead to hardware exceptions at runtime. Other architectures are more forgiving, though they may offer better performance if alignment preferences are satisfied. For example, doubles may be aligned on any byte boundary on the Intel x86 architecture, but access to them is a lot faster if they are eight-byte aligned.
Alignment is relevant here, because C++ requires that all operator news return pointers that are suitably aligned for any data type. malloc labors under the same requirement, so having operator new return a pointer it gets from malloc is safe. However, in operator new above, we're not returning a pointer we got from malloc, we're returning a pointer we got from malloc offset by the size of an int. There is no guarantee that this is safe! If the client called operator new to get enough memory for a double (or, if we were writing operator new[], an array of doubles) and we were running on a machine where ints were four bytes in size but doubles were required to be eight-byte aligned, we'd probably return a pointer with improper alignment. That might cause the program to crash. Or it might just cause it to run more slowly.
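One way to address the alignment problem (a sketch, not the book's code) is to make the space reserved for the leading signature as large as the strictest fundamental alignment, so the pointer handed back to clients is as well aligned as the pointer malloc produced. The sketch below assumes a C++11-or-later compiler for std::max_align_t, and it writes the trailing signature with memcpy so that store need not be int-aligned.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

static const int signature = 0xDEADBEEF;
typedef unsigned char Byte;

void* operator new(std::size_t size)
{
    using namespace std;

    // reserve a full maximally aligned prefix for the leading signature,
    // so pMem + prefix is as aligned as pMem itself
    const size_t prefix = alignof(max_align_t);
    const size_t realSize = size + prefix + sizeof(int);

    void *pMem = malloc(realSize);
    if (!pMem) throw bad_alloc();

    *(static_cast<int*>(pMem)) = signature;                     // leading signature
    memcpy(static_cast<Byte*>(pMem) + realSize - sizeof(int),   // trailing signature,
           &signature, sizeof(int));                            // copied byte by byte

    return static_cast<Byte*>(pMem) + prefix;                   // still maximally aligned
}
```

A matching operator delete would have to step the client's pointer back by the same prefix before handing it to free, and that is also where it could verify that both signatures are still intact.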
The topic of this Item is knowing when it can make sense to replace the default versions of new and delete, either globally or on a per-class basis. We're now in a position to summarize when that makes sense in more detail than we did before.
- To detect usage errors (as above).
- To collect statistics about the use of dynamically allocated memory (also as above).
- To increase the speed of allocation and deallocation. General-purpose allocators are often (though not always) a lot slower than custom versions, especially if the custom versions are designed for objects of a particular type. Class-specific allocators are an example application of fixed-size allocators such as those offered by Boost's Pool library (a sketch of one follows this list). If your application is single-threaded, but your compilers' default memory management routines are threadsafe, you may be able to win measurable speed improvements by writing thread-unsafe allocators. Of course, before jumping to the conclusion that operator new and operator delete are worth speeding up, be sure to profile your program to confirm that these functions are truly a bottleneck.
- To reduce the space overhead of default memory management. General-purpose memory managers are often (though not always) not just slower than custom versions, they often use more memory, too. That's because they often incur some overhead for each allocated block. Allocators tuned for small objects (such as those in Boost's Pool library) essentially eliminate such overhead.
- To compensate for suboptimal alignment in the default allocator. As I mentioned earlier, it's fastest to access doubles on the x86 architecture when they are eight-byte aligned. Alas, the operator news that ship with some compilers don't guarantee eight-byte alignment for dynamic allocations of doubles. In such cases, replacing the default operator new with one that guarantees eight-byte alignment could yield big increases in program performance.
- To cluster related objects near one another. If you know that particular data structures are generally used together and you'd like to minimize the frequency of page faults when working on the data, it can make sense to create a separate heap for the data structures so they are clustered together on as few pages as possible. Placement versions of new and delete (see Item 52) can make it possible to achieve such clustering (a second sketch after this list illustrates the idea).
- To obtain unconventional behavior. Sometimes you want operators new and delete to do something that the compiler-provided versions don't offer. For example, you might want to allocate and deallocate blocks in shared memory, but have only a C API through which to manage that memory. Writing custom versions of new and delete (probably placement versions — again, see Item 52) would allow you to drape the C API in C++ clothing. As another example, you might write a custom operator delete that overwrites deallocated memory with zeros in order to increase the security of application data.
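To make the speed and space points concrete, here is a minimal sketch of the kind of class-specific fixed-size allocator that libraries such as Boost's Pool package up. Widget is a hypothetical class invented for this example; its operator new pops slots off a free list that is refilled one large chunk at a time, and its operator delete pushes slots back. The sketch is single-threaded and never returns memory to the system.

```cpp
#include <cstddef>
#include <new>

class Widget {
public:
    static void* operator new(std::size_t size);
    static void operator delete(void *p, std::size_t size) noexcept;

private:
    struct FreeListNode { FreeListNode *next; };   // link stored in unused object slots

    static FreeListNode *freeList;                 // head of the free list

    double a, b;                                   // hypothetical data members
};

Widget::FreeListNode *Widget::freeList = 0;

void* Widget::operator new(std::size_t size)
{
    // a different size means a derived class is being allocated;
    // let the global allocator handle it (per Item 51's conventions)
    if (size != sizeof(Widget)) return ::operator new(size);

    if (!freeList) {                               // free list empty: grab one big chunk
        const std::size_t chunkSize = 256;         // objects per refill
        char *chunk = static_cast<char*>(::operator new(chunkSize * sizeof(Widget)));
        for (std::size_t i = 0; i < chunkSize; ++i) {
            FreeListNode *node = reinterpret_cast<FreeListNode*>(chunk + i * sizeof(Widget));
            node->next = freeList;                 // thread each slot onto the list
            freeList = node;
        }
    }

    FreeListNode *head = freeList;                 // pop one slot: no per-block header,
    freeList = head->next;                         // no searching, no locking
    return head;
}

void Widget::operator delete(void *p, std::size_t size) noexcept
{
    if (!p) return;
    if (size != sizeof(Widget)) { ::operator delete(p); return; }

    FreeListNode *node = static_cast<FreeListNode*>(p);
    node->next = freeList;                         // push the slot back onto the free list
    freeList = node;
}
```

Allocation and deallocation here are just a couple of pointer assignments, and there is no per-block bookkeeping overhead, which is where both the speed and the space savings of fixed-size allocators come from.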
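As a sketch of the clustering idea, and of the placement forms of new and delete that Item 52 covers, here is a hypothetical Arena that hands out memory from one contiguous buffer, plus a Node class whose placement operator new draws from it. All names here are invented for this example; it assumes C++11 and is not meant as a production arena.

```cpp
#include <cstddef>
#include <new>

// A hypothetical bump-pointer arena: everything allocated from it lives in
// one contiguous buffer, so related objects share as few pages as possible.
class Arena {
public:
    explicit Arena(std::size_t bytes)
      : buffer_(static_cast<char*>(::operator new(bytes))), size_(bytes), used_(0) {}
    ~Arena() { ::operator delete(buffer_); }       // everything is reclaimed at once

    void* allocate(std::size_t n)
    {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t offset = (used_ + align - 1) / align * align;  // round up
        if (offset + n > size_) throw std::bad_alloc();
        used_ = offset + n;
        return buffer_ + offset;
    }
    void deallocate(void*) {}                      // individual frees are no-ops

private:
    char *buffer_;
    std::size_t size_, used_;
};

class Node {                                       // a data structure we want clustered
public:
    static void* operator new(std::size_t size, Arena &a) { return a.allocate(size); }
    static void operator delete(void *p, Arena &a) noexcept { a.deallocate(p); }
                                                   // matching placement delete, used only
                                                   // if Node's constructor throws
    static void operator delete(void*) noexcept {} // normal delete: the arena owns the memory
private:
    double key;
    Node *left, *right;
};

// usage sketch:
//   Arena arena(1024 * 1024);
//   Node *n = new (arena) Node;   // constructed inside the arena's buffer
//   delete n;                     // runs the destructor; memory stays with the arena
```

The same placement-new pattern is how a custom allocator can drape a C shared-memory API in C++ clothing: the arena's allocate and deallocate would simply forward to that API instead of to the global operator new and delete.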
Things to Remember
- There are many valid reasons for writing custom versions of new and delete, including improving performance, debugging heap usage errors, and collecting heap usage information.