Pimpls - Beauty Marks You Can Depend On
Managing dependencies well is an essential part of writing solid code. As I've argued before,[1] C++'s greatest strength is that it supports two powerful methods of abstraction: object-oriented programming and generic programming. Both are fundamentally tools to help manage dependencies. In fact, all of the common OO and generic buzzwords (encapsulation, polymorphism, type independence), along with every design pattern I know of, really describe ways to manage interdependencies within a software system.
When we talk about dependencies, we usually think of runtime dependencies like class interactions. In this column, I'll focus instead on how to analyze and manage compile-time dependencies.
A Header That Could Use Some Work
In C++, when anything in a header file changes, all code that includes that header either directly or indirectly must be recompiled. To show how to reduce this kind of dependency, I'll present an example header and show how it can be improved step by step. Along the way I'll examine and apply three major ways to reduce compile-time dependencies:
o Avoid gratuitous #includes. Use forward declarations whenever a definition isn't required. (This may sound obvious to experienced programmers, but it's trickier than you might think for templates.)
o Avoid unnecessary membership. Use the Pimpl Idiom to fully hide a class' private implementation details.
o Avoid unnecessary inheritance.
So let's begin: Here is the initial version of a "problem" header file. Before reading on, take a little time to look at it and decide how it could be improved. Note: The comments are important!
// x.h: original header
//
#include <iostream>
#include <ostream>
#include <list>

// None of A, B, C, D or E are templates.
// Only A and C have virtual functions.
#include "a.h"  // class A
#include "b.h"  // class B
#include "c.h"  // class C
#include "d.h"  // class D
#include "e.h"  // class E

class X : public A, private B {
public:
    X( const C& );
    B  f( int, char* );
    C  f( int, C );
    C& g( B );
    E  h( E );
    virtual std::ostream& print( std::ostream& ) const;

private:
    std::list<C> clist_;
    D            d_;
};

inline std::ostream& operator<<( std::ostream& os, const X& x ) {
    return x.print(os);
}
Do you see a few things you'd do differently?
Remove Gratuitous Headers, Use Forward Declarations
Right off the bat, x.h is clearly including far too many other headers. This is a Bad Thing, because it means that every client that includes x.h is also forced to include all of the other headers mentioned in x.h. While this probably isn't much of an overhead for a relatively small standard header like list, it could be a substantial overhead for class headers like c.h (after all, who knows what else gets pulled in by c.h?).
Of the first two standard headers mentioned in x.h, one can be immediately removed because it's not needed at all, and the second can be replaced with a smaller header:
o Remove iostream. Many programmers #include <iostream> purely out of habit as soon as they see anything resembling a stream nearby. X does make use of streams, that's true; but it doesn't mention anything specifically from iostream. At the most, X needs ostream alone, and even that can be whittled down:
o Replace ostream with iosfwd. Parameter and return types only need to be forward-declared, so instead of the full definition of ostream we really only need its forward declaration. In the old days, you could just replace "#include <ostream>" with "class ostream;" in this situation, because ostream used to be a class. Alas, no more -- ostream is now typedef'd as basic_ostream<char>, and that basic_ostream template gets a bit messy to forward-declare. All is not lost, though; the standard library helpfully provides the header iosfwd, which contains forward declarations for all of the stream templates (including basic_ostream) and their standard typedefs (including ostream). So all we need to do is replace "#include <ostream>" with "#include <iosfwd>".[2]
There, that was easy. We can...
... what? "Not so fast!" I hear some of you say. "This header does a lot more with ostream than just mention it as a parameter or return type. The inlined operator<< actually uses an ostream object! So it must need ostream's definition, right?"
That's a reasonable question. Happily, the answer is: No, it doesn't. Consider again the function in question:
inline std::ostream& operator<<( std::ostream& os, const X& x ) {
    return x.print(os);
}
This function mentions an ostream as both a parameter and a return type (which most people know doesn't require a definition), and it passes its ostream parameter in turn as a parameter to another function (which many people don't know doesn't require a definition either). As long as that's all we're doing with the ostream object, there's no need for a full ostream definition. Of course, we would need the full definition if we tried to call any member functions, for example, but we're not doing anything like that here.
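To check this claim, here is a minimal single-file sketch. The first part stands in for a header that includes only <iosfwd>, so std::ostream is an incomplete type there; the inline operator<< still compiles because it only binds and forwards references. The class name Widget, its output text, and the describe helper are hypothetical, and in a real project the second part would live in its own .cpp file.

```cpp
// ---- widget.h part: only forward declarations of the stream types ----
#include <iosfwd>

class Widget {
public:
    // Declaring a function that takes and returns std::ostream& needs no definition.
    std::ostream& print( std::ostream& os ) const;
};

// Binding a reference and passing it on works with an incomplete std::ostream.
inline std::ostream& operator<<( std::ostream& os, const Widget& w ) {
    return w.print(os);
}

// ---- widget.cpp part: actually writing to the stream needs the full definition ----
#include <ostream>
#include <sstream>
#include <string>

std::ostream& Widget::print( std::ostream& os ) const {
    return os << "Widget";   // calls a stream operator, so <ostream> is required here
}

// Usage: stream a Widget into a string through the inline operator<<.
std::string describe( const Widget& w ) {
    std::ostringstream oss;
    oss << w;
    return oss.str();
}
```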
So, as I was saying, we can only get rid of one of the other headers just yet:
o Replace e.h with a forward declaration. E is just being mentioned as a parameter and as a return type, so no definition is required and x.h shouldn't be pulling in e.h in the first place. All we need to do is replace "#include "e.h"" with "class E;".
o Leave a.h and b.h (for now). We can't get rid of these because X inherits from both A and B, and you always have to have full definitions for base classes so that the compiler can determine X's object size, virtual functions, and other fundamentals. (Can you anticipate how to remove one of these? Think about it: Which one can you remove, and why/how? The answer will come shortly.)
o Leave list, c.h and d.h (for now). We can't get rid of these right away because a list<C> and a D appear as private data members of X. Although C appears as neither a base class nor a member, it is being used to instantiate the list member, and most current compilers require that when you instantiate list<C> you be able to see the definition of C.
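The E bullet above can be checked with a stripped-down sketch. X here is a hypothetical reduction of the article's class to just h, and E's contents and h's behavior are invented for illustration; the point is that the declaration of h compiles while E is only forward-declared, and E's definition is needed only where h is defined or called.

```cpp
// ---- x.h part: E is only forward-declared ----
class E;

class X {
public:
    E h( E );   // E as parameter and return type: a declaration needs no definition
};

// ---- x.cpp part (or any code that defines or calls h): E must be complete ----
class E {       // hypothetical stand-in for e.h
public:
    explicit E( int v ) : value( v ) {}
    int value;
};

E X::h( E e ) {
    return E( e.value + 1 );   // hypothetical behavior, just so h does something
}
```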
Here's how the header looks after this initial cleanup pass:
// x.h: sans gratuitous headers
//
#include <iosfwd>
#include <list>

// None of A, B, C or D are templates.
// Only A and C have virtual functions.
#include "a.h"  // class A
#include "b.h"  // class B
#include "c.h"  // class C
#include "d.h"  // class D
class E;

class X : public A, private B {
public:
    X( const C& );
    B  f( int, char* );
    C  f( int, C );
    C& g( B );
    E  h( E );
    virtual std::ostream& print( std::ostream& ) const;

private:
    std::list<C> clist_;
    D            d_;
};

inline std::ostream& operator<<( std::ostream& os, const X& x ) {
    return x.print(os);
}
This isn't bad, but we can still do quite a bit better.
The Beauty of Pimpls
C++ lets us easily encapsulate the private parts of a class from unauthorized access. Unfortunately, because of the header file approach inherited from C, it can take a little more work to encapsulate dependencies on a class' privates. "But," you say, "the whole point of encapsulation is that the client code shouldn't have to know or care about a class' private implementation details, right?" Right. In C++, client code doesn't need to know or care about access to a class' privates (unless it's a friend, it isn't allowed any). But because the privates are visible in the header, client code does have to depend on every type they mention.
How can we better insulate clients from a class' private implementation details? One good way is to use a special form of the handle/body idiom[3] (what I call the Pimpl Idiom because of the intentionally pronounceable "pimpl_" pointer[4]) as a compilation firewall.[5] [6] [7]
A "pimpl" is just an opaque pointer used to hide the private members of a class. That is, instead of writing:
// file x.h
class X {
    // public and protected members
private:
    // private members; whenever these change,
    // all client code must be recompiled
};
We write instead:
// file x.h
class X {
    // public and protected members
private:
    class XImpl* pimpl_;  // a pointer to a forward-declared class
};

// file x.cpp
struct XImpl {
    // private members; fully hidden, can be
    // changed at will without recompiling clients
};
(Yes, it's legal to forward-declare XImpl as a class and then define it as a struct.)
Every X object dynamically allocates its XImpl object. If you think of an object as a physical block, we've essentially lopped off a large chunk of the block and in its place left only "a little bump on the side": the opaque pointer, or "pimpl."
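The lifetime rules behind that "little bump" can be sketched as follows. The article's pimpl_ is a raw pointer; this present-day variant substitutes std::unique_ptr (that substitution is mine, not the article's, and std::make_unique needs C++14). It shows the heap allocation on every construction, a destructor defined where XImpl is complete so unique_ptr can delete it, and a deep copy so copies don't share hidden state. The member names add, count and items are hypothetical, and both "files" are collapsed into one listing.

```cpp
// ---- x.h part: only the interface and an opaque pointer ----
#include <cstddef>
#include <memory>
#include <string>

class X {
public:
    X();
    ~X();                               // declared here, defined below where XImpl is complete
    X( const X& other );                // deep-copies the hidden state
    X& operator=( const X& other );
    void add( const std::string& item );  // hypothetical public operations
    std::size_t count() const;

private:
    struct XImpl;                       // forward declaration only
    std::unique_ptr<XImpl> pimpl_;      // the "little bump on the side"
};

// ---- x.cpp part: the hidden implementation ----
#include <vector>

struct X::XImpl {
    std::vector<std::string> items;     // private data; can change without touching the header
};

// Every construction allocates an XImpl on the heap.
X::X() : pimpl_( std::make_unique<XImpl>() ) {}

// Defaulted here so that unique_ptr deletes XImpl at a point where it is a complete type.
X::~X() = default;

// No move operations are declared, so moves fall back to these copies and pimpl_ is never null.
X::X( const X& other ) : pimpl_( std::make_unique<XImpl>( *other.pimpl_ ) ) {}

X& X::operator=( const X& other ) {
    *pimpl_ = *other.pimpl_;            // copy the hidden state into our existing XImpl
    return *this;
}

// Each access to hidden data goes through one extra indirection.
void X::add( const std::string& item ) { pimpl_->items.push_back( item ); }
std::size_t X::count() const { return pimpl_->items.size(); }
```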
The major advantages of this idiom come from the fact that it breaks compile-time dependencies:
1. Types mentioned only in a class' implementation need no longer be defined for client code, which can eliminate extra #includes and improve compile speeds.
2. A class' implementation can be changed -- that is, private members can be freely added or removed -- without recompiling client code.
The major costs of this idiom are in performance:
1. Each construction/destruction must allocate/deallocate memory.
2. Each access of a hidden member can require at least one extra indirection. (If the hidden member being accessed itself uses a back pointer to call a function in the visible class, there will be multiple indirections.)
I'll talk more about these and other pimpl issues in my next column. For now, in our example, there were three headers whose definitions were needed simply because they appeared as private members of X. If we instead restructure X to use a pimpl, one of these headers (c.h) can be replaced with a forward declaration (C can't vanish from the header entirely, because it is still mentioned elsewhere as a parameter and return type), and the other two (list and d.h) can disappear completely:
// x.h: after converting to use a pimpl
//
#include <iosfwd>
#include "a.h"  // class A (has virtual functions)
#include "b.h"  // class B (has no virtual functions)
class C;
class E;

class X : public A, private B {
public:
    X( const C& );
    B  f( int, char* );
    C  f( int, C );
    C& g( B );
    E  h( E );
    virtual std::ostream& print( std::ostream& ) const;

private:
    class XImpl* pimpl_;  // opaque pointer to forward-declared class
};

inline std::ostream& operator<<( std::ostream& os, const X& x ) {
    return x.print(os);
}

The private details go into X's implementation file, where client code never sees them and therefore never depends on them:

// Implementation file x.cpp
//
#include "x.h"
#include <list>
#include "c.h"  // the definitions of C and D are now needed only here
#include "d.h"

struct XImpl {
    std::list<C> clist_;
    D            d_;
};
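To see the pieces fit together, here is a compilable single-file sketch of that split. The bases A and B are omitted to keep it short, C and D are hypothetical stand-ins for c.h and d.h, and the constructor and print behavior are invented for illustration. Copying is deleted because a raw owning pointer would otherwise be copied shallowly; the article defers the copying question to the next column.

```cpp
// ---- x.h part (bases A and B omitted to keep the sketch short) ----
#include <iosfwd>

class C;                              // a forward declaration is enough in the header

class X {
public:
    X( const C& );
    virtual ~X();
    X( const X& ) = delete;           // copying is outside the scope of this sketch
    X& operator=( const X& ) = delete;
    virtual std::ostream& print( std::ostream& ) const;

private:
    class XImpl* pimpl_;              // opaque pointer to forward-declared class
};

inline std::ostream& operator<<( std::ostream& os, const X& x ) {
    return x.print(os);
}

// ---- hypothetical stand-ins for c.h and d.h ----
#include <string>

class C { public: std::string name; };
class D { public: int state = 0; };

// ---- x.cpp part: the only place that needs <list>, C and D ----
#include <list>
#include <ostream>

struct XImpl {
    std::list<C> clist_;
    D            d_;
};

X::X( const C& c ) : pimpl_( new XImpl ) {
    pimpl_->clist_.push_back( c );    // hypothetical behavior: remember the C we were given
}

X::~X() { delete pimpl_; }

std::ostream& X::print( std::ostream& os ) const {
    os << "X holding";
    for ( const C& c : pimpl_->clist_ ) os << ' ' << c.name;
    return os;
}

// ---- usage: stream an X into a string ----
#include <sstream>

std::string render( const X& x ) {
    std::ostringstream oss;
    oss << x;
    return oss.str();
}
```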
Remove Unnecessary Inheritance
In my experience, many C++ programmers still seem to march to the "It isn't OO unless you inherit!" battle hymn, by which I mean that they use inheritance more than necessary. I'll save the whole lecture for another time and place, but my bottom line is simply that inheritance (including but not limited to IS-A) is a much stronger relationship than HAS-A or USES-A. When it comes to managing dependencies, therefore, you should always prefer composition/membership over inheritance wherever possible. To paraphrase a well-known mathematician: 'Use as strong a relationship as necessary, but no stronger.'
In this example, X is derived publicly from A and privately from B. Recall that public inheritance should always model IS-A and satisfy the Liskov Substitution Principle (LSP).[8] In this case X IS-A A and there's naught wrong with it, so we'll leave that as it is. But did you notice the interesting thing about B? The interesting thing is this: B is a private base class of X, but B has no virtual functions. Now, the only reason you would choose private inheritance over composition/membership is to gain access to protected members, which most of the time means "to override a virtual function." Since B has no virtual functions, there's probably no reason to prefer the stronger relationship of inheritance.[9] Instead, X should probably have just a plain member of type B. Since that member should be private, and to get rid of the b.h header entirely, this member should live in X's hidden pimpl_ portion.
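The reasoning about B can be illustrated with a small before/after comparison. B, tag, label and the two X variants are hypothetical; the sketch simply shows that when B has no virtual functions to override, a plain member gives X exactly the same reuse as private inheritance, through a weaker relationship.

```cpp
#include <string>

// Hypothetical stand-in for B: a concrete class with no virtual functions.
class B {
public:
    std::string tag() const { return "B-tag"; }
};

// Before: private inheritance, used only to reuse B's public interface.
class XInherits : private B {
public:
    std::string label() const { return "X:" + tag(); }
};

// After: a plain member gives the same reuse through a weaker relationship.
// In the pimpl'd X, this member would live inside XImpl rather than in the class itself.
class XContains {
public:
    std::string label() const { return "X:" + b_.tag(); }

private:
    B b_;
};
```

Had B declared a virtual function that X needed to override, the private-inheritance version would be the one that could do the job, which is the exception noted in [9].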
// x.h: after removing unnecessary inheritance
//
#include <iosfwd>
#include "a.h"  // class A
class B;
class C;
class E;

class X : public A {
public:
    X( const C& );
    B  f( int, char* );
    C  f( int, C );
    C& g( B );
    E  h( E );
    virtual std::ostream& print( std::ostream& ) const;

private:
    class XImpl* pimpl_;  // this now quietly includes a B
};

inline std::ostream& operator<<( std::ostream& os, const X& x ) {
    return x.print(os);
}
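Because f and g still mention B in their signatures, the final header needs only a forward declaration of B, while B's definition moves to x.cpp together with the hidden member. The following single-file sketch isolates that part. Base A, C and E are omitted; g is simplified from the article's C& g( B ) to return void; and B's contents and the behavior of f and g are hypothetical.

```cpp
#include <cstddef>
#include <string>

// ---- x.h part: B is only forward-declared; b.h is no longer included ----
class B;

class X {
public:
    X();
    ~X();
    X( const X& ) = delete;             // copying is outside the scope of this sketch
    X& operator=( const X& ) = delete;
    B    f( int n, char* s );           // B by value in a declaration needs no definition
    void g( B b );                      // simplified from the article's C& g( B )

private:
    class XImpl* pimpl_;                // this now quietly includes a B
};

// ---- x.cpp part: the only place that needs B's definition ----
class B {                               // hypothetical stand-in for b.h
public:
    std::string text;
};

struct XImpl {
    B b_;                               // formerly X's private base class, now a hidden member
};

X::X() : pimpl_( new XImpl ) {}
X::~X() { delete pimpl_; }

// Hypothetical behavior: append the first n characters of s to the hidden B, return a copy.
B X::f( int n, char* s ) {
    pimpl_->b_.text.append( s, static_cast<std::size_t>( n ) );
    return pimpl_->b_;
}

// Hypothetical behavior: replace the hidden B.
void X::g( B b ) { pimpl_->b_ = b; }
```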
The Bottom Line
x.h is still using other class names all over the place, but clients of X need only pay for the #includes of a.h and iosfwd. What an improvement over the original!
In the next column, I'll conclude my focus on the Pimpl Idiom. I'll analyze how it can best be used, then demonstrate how to overcome its main disadvantage.
Notes
1. H. Sutter. "C++ State of the Union" (C++ Report, January 1998).
2. Once you see iosfwd, you might think that the same trick would work for other standard library templates like list and string. However, there are no comparable "stringfwd" or "listfwd" standard headers. The iosfwd header was created to give streams special treatment for backwards compatibility, to avoid breaking code written in years past for the "old" non-templated version of the iostreams subsystem.
3. J. Coplien. Advanced C++ Programming Styles and Idioms (Addison-Wesley, 1992).
4. I always used to write impl_. The eponymous pimpl_ was actually coined several years ago by friend and colleague Jeff Sumner, due in equal parts to a penchant for Hungarian-style "p" prefixes for pointer variables and an occasional taste for horrid puns.
5. J. Lakos. Large-Scale C++ Software Design (Addison-Wesley, 1996).
6. S. Meyers. Effective C++, 2nd edition (Addison-Wesley, 1998).
7. R. Murray. C++ Strategies and Tactics (Addison-Wesley, 1993).
8. For lots of good discussion about applying the LSP, see the papers available online at www.oma.com, and the book Designing Object-Oriented C++ Applications Using the Booch Method by Robert C. Martin (Prentice-Hall, 1995). Yes, Bob is now also the editor of this magazine, but I've been recommending the papers and the book since long before that.
9. Unless X needs access to some protected function or data in B, of course, but for now I'll assume that this is not the case.