CPPFormatLibary提升效率的优化原理

CPPFormatLibary，以下简称FL，介绍：关于CPPFormatLibary。

与stringstream，甚至C库的sprintf系列想比，FL在速度上都有优势，而且是在支持.net格式化风格的基础上。要实现这一点，需要多种优化结合在一起，有代码技巧方面的，也有设计策略上的。下面简要的对这些内容进行讲解：

1. Pattern缓存

在C库函数sprintf中，比如这行代码：

1          char szBuf[64];
2          sprintf_s(szBuf, "%d--#--%8.2f--#--%s", 100, -40.2f, " String ");

格式化字符串"%d--#--%8.2f--#--%s"在每次函数调用的时候，都需要分析一次，依次找出对应的格式化符，在实际开发过程中，多数情况下，格式化字符串并没有任何不同，因此这个分析属于重复分析。因此在设计FL时将这样的格式化字符串称为PatternList，并使用Hash容器对这个PatternList进行存储，在每次格式化之前，首先在容器中查找对应字符串的Pattern是否已经存在，有的话则直接使用已经分析的结果。

下面的代码是Pattern的定义，PatternList则为对应的数组：

 1         /**
 2         * @brief This is the description of a Format unit
 3         * @example {0} {0:d}
 4         */
 5         template < typename TCharType >
 6         class TFormatPattern
 7         {
 8         public:
 9             typedef TCharType                                    CharType;
10             typedef unsigned char                                ByteType;
11             typedef std::size_t                                  SizeType;
12 
13             enum EFormatFlag
14             {
15                 FF_Raw,
16                 FF_None,
17                 FF_Decimal,
18                 FF_Exponent,
19                 FF_FixedPoint,
20                 FF_General,
21                 FF_CSV,
22                 FF_Percentage,
23                 FF_Hex
24             };
25 
26             enum EAlignFlag
27             {
28                 AF_Right,
29                 AF_Left
30             };
31 
32             TFormatPattern() :
33                 Start((SizeType)-1),
34                 Len(0),
35                 Flag(FF_Raw),
36                 Align(AF_Right),
37                 Index((ByteType)-1),
38                 Precision((ByteType)-1),
39                 Width((ByteType)-1)
40 
41             {
42             }
43 
44             SizeType  GetLegnth() const
45             {
46                 return Len;
47             }
48 
49             bool    IsValid() const
50             {
51                 return Start != -1 && Len != -1 && Index >= 0;
52             }
53 
54             bool    HasWidth() const
55             {
56                 return Width != (ByteType)-1;
57             }
58 
59             bool    HasPrecision() const
60             {
61                 return Precision != (ByteType)-1;
62             }
63 
64         public:
65             SizeType      Start;
66             SizeType      Len;
67             EFormatFlag   Flag;
68             EAlignFlag    Align;
69 
70             ByteType      Index;
71             ByteType      Precision;
72             ByteType      Width;
73         };

这个Pattern就代表了分析格式化字符串的每一个单元。

1         StandardLibrary::FormatTo(str, "{0}--#--{1,8}--#--{2}", 100, -40.2f, " String ");

在这行代码中，PatternList一共有5个Pattern，分别是：

    {0} 参数0

     --#-- 原始类型

    {1,8} 参数1 宽度8

    --#-- 原始类型 纯字符串

    {2} 参数2

这样设计可以优化掉重复的字符串Parse。

2.各种类型到字符串转换的算法优化

这部分代码完全存在于文件Algorithm.hpp中，这里面包含了诸如int、double等转换为字符串的快速算法，实测性能优于sprintf和atoi之类。通过这些基础算法的优化，性能可以得到相当不错的提升。

 1        template < typename TCharType >
 2         inline void StringReverse(TCharType* Start, TCharType* End)
 3         {
 4             TCharType Aux;
 5 
 6             while (Start < End)
 7             {
 8                 Aux = *End;
 9                 *End-- = *Start;
10                 *Start++ = Aux;
11             }
12         }
13 
14         namespace Detail
15         {
16             const char DigitMap[] = 
17             { 
18                 '0', '1', '2', '3', '4', '5', '6',
19                 '7', '8', '9', 'A', 'B', 'C', 'D',
20                 'E', 'F'
21             };
22         }       
23 
24         template < typename TCharType >
25         inline SIZE_T Int64ToString(INT64 Value, TCharType* Buf, INT Base)
26         {
27             assert(Base > 0 && static_cast<SIZE_T>(Base) <= _countof(Detail::DigitMap));
28 
29             TCharType* Str = Buf;
30 
31             UINT64 UValue = (Value < 0) ? -Value : Value;
32 
33             // Conversion. Number is reversed.
34             do
35             {
36                 *Str++ = Detail::DigitMap[UValue%Base];
37             } while (UValue /= Base);
38 
39             if (Value < 0)
40             {
41                 *Str++ = '-';
42             }
43 
44             *Str = '\0';
45 
46             // Reverse string
47             StringReverse<TCharType>(Buf, Str - 1);
48 
49             return Str - Buf;
50         }

上面这段代码展示的是快速整数到字符串的转换。据说基于sse指令的各种转换会更快，然而担心兼容性问题影响跨平台，我并未采用。

3. 栈容器和栈字符串

这部分代码存在于文件Utility.hpp中，这部分代码的优化原理就是在需要的动态内存并不大的时候，直接使用栈内存，当内存需求大到超过一定阀值的时候，自动申请堆内存并将栈数据转移到堆内存上。在大多数情况下，我们需要的内存都是很少，因此在绝大多数情况下，都能起到相当显著的优化效果。

  1         template <
  2             typename T,
  3             INT DefaultLength = 0xFF,
  4             INT ExtraLength = 0
  5         >
  6         class TAutoArray
  7         {
  8         public:
  9             typedef TAutoArray<T, DefaultLength, ExtraLength>   SelfType;
 10 
 11             enum
 12             {
 13                 DEFAULT_LENGTH = DefaultLength
 14             };
 15 
 16             class ConstIterator : Noncopyable
 17             {
 18             public:
 19                 ConstIterator(const SelfType& InRef) :
 20                     Ref(InRef),
 21                     Index( InRef.GetLength()>0?0:-1 )
 22                 {
 23                 }
 24 
 25                 bool IsValid() const
 26                 {
 27                     return Index < Ref.GetLength();
 28                 }
 29 
 30                 bool Next()
 31                 {
 32                     ++Index;
 33 
 34                     return IsValid();
 35                 }
 36 
 37                 const T& operator *() const
 38                 {
 39                     const T* Ptr = Ref.GetDataPtr();
 40 
 41                     return Ptr[Index];
 42                 }
 43             private:
 44                 ConstIterator& operator = (const ConstIterator&);
 45                 ConstIterator(ConstIterator&);
 46             protected:
 47                 const SelfType&   Ref;
 48                 SIZE_T            Index;
 49             };
 50 
 51             TAutoArray() :
 52                 Count(0),
 53                 AllocatedCount(0),
 54                 HeapValPtr(NULL)
 55             {
 56             }
 57 
 58             ~TAutoArray()
 59             {
 60                 ReleaseHeapData();
 61 
 62                 Count = 0;
 63             }
 64 
 65             TAutoArray(const SelfType& Other) :
 66                 Count(Other.Count),
 67                 AllocatedCount(Other.AllocatedCount),
 68                 HeapValPtr(NULL)
 69             {
 70                 if (Count > 0)
 71                 {
 72                     if (Other.IsDataOnStack())
 73                     {
 74                         Algorithm::CopyArray(Other.StackVal, Other.StackVal + Count, StackVal);
 75                     }
 76                     else
 77                     {
 78                         HeapValPtr = Allocate(AllocatedCount);
 79                         Algorithm::CopyArray(Other.HeapValPtr, Other.HeapValPtr + Count, HeapValPtr);
 80                     }
 81                 }
 82             }
 83 
 84             SelfType& operator = (const SelfType& Other)
 85             {
 86                 if (this == &Other)
 87                 {
 88                     return *this;
 89                 }
 90 
 91                 ReleaseHeapData();
 92 
 93                 Count = Other.Count;
 94                 AllocatedCount = Other.AllocatedCount;
 95                 HeapValPtr = NULL;
 96 
 97                 if (Count > 0)
 98                 {
 99                     if (Other.IsDataOnStack())
100                     {
101                         Algorithm::CopyArray(Other.StackVal, Other.StackVal + Count, StackVal);
102                     }
103                     else
104                     {
105                         HeapValPtr = Allocate(AllocatedCount);
106                         Algorithm::CopyArray(Other.HeapValPtr, Other.HeapValPtr + Count, HeapValPtr);
107                     }
108                 }
109 
110                 return *this;
111             }
112 
113             SelfType& TakeFrom(SelfType& Other)
114             {
115                 if (this == &Other)
116                 {
117                     return *this;
118                 }
119 
120                 Count = Other.Count;
121                 AllocatedCount = Other.AllocatedCount;
122                 HeapValPtr = Other.HeapValPtr;
123 
124                 if (Count > 0 && Other.IsDataOnStack())
125                 {
126                     Algorithm::MoveArray(Other.StackVal, Other.StackVal + Count, StackVal);
127                 }
128 
129                 Other.Count = 0;
130                 Other.AllocatedCount = 0;
131                 Other.HeapValPtr = NULL;
132             }
133 
134             void TakeTo(SelfType& Other)
135             {
136                 Other.TakeFrom(*this);
137             }
138 
139 #if FL_PLATFORM_HAS_RIGHT_VALUE_REFERENCE
140             TAutoArray( SelfType && Other ) :
141                 Count(Other.Count),
142                 AllocatedCount(Other.AllocatedCount),
143                 HeapValPtr(Other.HeapValPtr)
144             {
145                 if (Count > 0 && Other.IsDataOnStack())
146                 {
147                     Algorithm::MoveArray(Other.StackVal, Other.StackVal + Count, StackVal);
148                 }
149 
150                 Other.Count = 0;
151                 Other.AllocatedCount = 0;
152                 Other.HeapValPtr = NULL;
153             }
154 
155             SelfType& operator = (SelfType&& Other )
156             {
157                 return TakeFrom(Other);
158             }
159 #endif
160 
161             bool  IsDataOnStack() const
162             {
163                 return HeapValPtr == NULL;
164             }
165 
166             void  AddItem(const T& InValue)
167             {
168                 if (IsDataOnStack())
169                 {
170                     if (Count < DEFAULT_LENGTH)
171                     {
172                         StackVal[Count] = InValue;
173                         ++Count;
174                     }
175                     else if (Count == DEFAULT_LENGTH)
176                     {
177                         InitialMoveDataToHeap();
178 
179                         assert(Count < AllocatedCount);
180 
181                         HeapValPtr[Count] = InValue;
182                         ++Count;
183                     }
184                     else
185                     {
186                         assert(false && "internal error");
187                     }
188                 }
189                 else
190                 {
191                     if (Count < AllocatedCount)
192                     {
193                         HeapValPtr[Count] = InValue;
194                         ++Count;
195                     }
196                     else
197                     {
198                         ExpandHeapSpace();
199 
200                         assert(Count < AllocatedCount);
201                         HeapValPtr[Count] = InValue;
202                         ++Count;
203                     }
204                 }
205             }
206 
207             SIZE_T GetLength() const
208             {
209                 return Count;
210             }
211 
212             SIZE_T GetAllocatedCount() const
213             {
214                 return AllocatedCount;
215             }
216 
217             T* GetDataPtr()
218             {
219                 return IsDataOnStack() ? StackVal : HeapValPtr;
220             }
221 
222             const T* GetDataPtr() const
223             {
224                 return IsDataOnStack() ? StackVal : HeapValPtr;
225             }
226 
227             T* GetUnusedPtr()
228             {
229                 return IsDataOnStack() ? StackVal + Count : HeapValPtr + Count;
230             }
231 
232             const T* GetUnusedPtr() const
233             {
234                 return IsDataOnStack() ? StackVal + Count : HeapValPtr + Count;
235             }
236 
237             SIZE_T GetCapacity() const
238             {                
239                 return IsDataOnStack() ?
240                     DEFAULT_LENGTH - Count :
241                     AllocatedCount - Count;
242             }
243             
244             T& operator []( SIZE_T Index )
245             {
246                 assert( Index < GetLength() );
247                 
248                 return GetDataPtr()[Index];
249             }
250             
251             const T& operator []( SIZE_T Index ) const
252             {
253                 assert( Index < GetLength() );
254                 
255                 return GetDataPtr()[Index];
256             }
257 
258         protected:
259             void  InitialMoveDataToHeap()
260             {
261                 assert(HeapValPtr == NULL);
262 
263                 AllocatedCount = DEFAULT_LENGTH * 2;
264 
265                 HeapValPtr = Allocate(AllocatedCount);
266 
267 #if FL_PLATFORM_HAS_RIGHT_VALUE_REFERENCE
268                 Algorithm::MoveArray(StackVal, StackVal + Count, HeapValPtr);
269 #else
270                 Algorithm::CopyArray(StackVal, StackVal + Count, HeapValPtr);
271 #endif
272             }
273 
274             void  ExpandHeapSpace()
275             {
276                 SIZE_T NewCount = AllocatedCount * 2;
277                 assert(NewCount > AllocatedCount);
278 
279                 T* DataPtr = Allocate(NewCount);
280 
281                 assert(DataPtr);
282 
283 #if FL_PLATFORM_HAS_RIGHT_VALUE_REFERENCE
284                 Algorithm::MoveArray(HeapValPtr, HeapValPtr + Count, DataPtr);
285 #else
286                 Algorithm::CopyArray(HeapValPtr, HeapValPtr + Count, DataPtr);
287 #endif
288                 ReleaseHeapData();
289 
290                 HeapValPtr = DataPtr;
291                 AllocatedCount = NewCount;
292             }
293 
294             void  ReleaseHeapData()
295             {
296                 if (HeapValPtr)
297                 {
298                     delete[] HeapValPtr;
299                     HeapValPtr = NULL;
300                 }
301 
302                 AllocatedCount = 0;
303             }
304 
305             static T*  Allocate(const SIZE_T InAllocatedCount)
306             {
307                 // +ExtraLength this is a hack method for saving string on it.
308                 return new T[InAllocatedCount + ExtraLength];
309             }
310 
311         protected:
312             SIZE_T        Count;
313             SIZE_T        AllocatedCount;
314 
315             // +ExtraLength this is a hack method for saving string on it.
316             T             StackVal[DEFAULT_LENGTH + ExtraLength];
317 
318             T*            HeapValPtr;
319         };

上面这段代码展示的就是这个设想的实现。这是一个模板类，基于这个类实现了栈字符串。同时默认的PatternList也是使用这个模板来保存的，这样在节约了大量的内存分配操作之后，性能得到进一步的提升。

  1 // String Wrapper
  2         template < typename TCharType >
  3         class TAutoString :
  4             public TAutoArray< TCharType, 0xFF, 2 >
  5         {
  6         public:
  7             typedef TAutoArray< TCharType, 0xFF, 2 > Super;
  8             typedef Mpl::TCharTraits<TCharType>      CharTraits;
  9             typedef TCharType                        CharType;
 10 
 11 #if !FL_COMPILER_MSVC
 12             using Super::Count;
 13             using Super::AllocatedCount;
 14             using Super::HeapValPtr;
 15             using Super::StackVal;
 16             using Super::Allocate;
 17             using Super::IsDataOnStack;
 18             using Super::DEFAULT_LENGTH;
 19             using Super::GetDataPtr;
 20             using Super::ReleaseHeapData;
 21 #endif
 22 
 23             TAutoString()
 24             {
 25             }
 26 
 27             TAutoString(const CharType* pszStr)
 28             {
 29                 if (pszStr)
 30                 {
 31                     const SIZE_T Length = CharTraits::length(pszStr);
 32 
 33                     Count = Length;
 34 
 35                     if (Length <= DEFAULT_LENGTH)
 36                     {
 37                         CharTraits::copy(pszStr, pszStr + Length, StackVal);
 38                         StackVal[Count] = 0;
 39                     }
 40                     else
 41                     {
 42                         HeapValPtr = Allocate(Length);
 43                         CharTraits::copy(pszStr, pszStr + Length, HeapValPtr);
 44                         HeapValPtr[Count] = 0;
 45                     }
 46                 }
 47             }
 48 
 49             void  AddChar(CharType InValue)
 50             {
 51                 AddItem(InValue);
 52 
 53                 if (IsDataOnStack())
 54                 {
 55                     StackVal[Count] = 0;
 56                 }
 57                 else
 58                 {
 59                     HeapValPtr[Count] = 0;
 60                 }
 61             }
 62 
 63             void AddStr(const CharType* pszStart, const CharType* pszEnd = NULL)
 64             {
 65                 const SIZE_T Length = pszEnd ? pszEnd - pszStart : CharTraits::length(pszStart);
 66 
 67                 if (IsDataOnStack())
 68                 {
 69                     if (Count + Length <= DEFAULT_LENGTH)
 70                     {
 71                         CharTraits::copy(StackVal+Count, pszStart, Length);
 72                         Count += Length;
 73 
 74                         StackVal[Count] = 0;
 75                     }
 76                     else
 77                     {
 78                         assert(!HeapValPtr);
 79 
 80                         AllocatedCount = static_cast<SIZE_T>((Count + Length)*1.5f);
 81                         HeapValPtr = Allocate(AllocatedCount);
 82                         assert(HeapValPtr);
 83 
 84                         if (Count > 0)
 85                         {
 86                             CharTraits::copy(HeapValPtr, StackVal, Count);
 87                         }
 88 
 89                         CharTraits::copy(HeapValPtr+Count, pszStart, Length);
 90 
 91                         Count += Length;
 92 
 93                         HeapValPtr[Count] = 0;
 94                     }
 95                 }
 96                 else
 97                 {
 98                     if (Count + Length <= AllocatedCount)
 99                     {
100                         CharTraits::copy(HeapValPtr+Count, pszStart, Length);
101                         Count += Length;
102 
103                         HeapValPtr[Count] = 0;
104                     }
105                     else
106                     {
107                         SIZE_T NewCount = static_cast<SIZE_T>((Count + Length)*1.5f);
108 
109                         CharType* DataPtr = Allocate(NewCount);
110 
111                         if (Count > 0)
112                         {
113                             CharTraits::copy(DataPtr, HeapValPtr, Count);
114                         }
115 
116                         ReleaseHeapData();
117 
118                         CharTraits::copy(DataPtr, pszStart, Length);
119 
120                         Count += Length;
121 
122                         AllocatedCount = NewCount;
123                         HeapValPtr = DataPtr;
124 
125                         HeapValPtr[Count] = 0;
126                     }
127                 }
128             }
129 
130             const TCharType* CStr() const
131             {
132                 return GetDataPtr();
133             }
134 
135             // is is a internal function
136             // 
137             void InjectAdd(SIZE_T InCount)
138             {
139                 Count += InCount;
140 
141                 assert(IsDataOnStack() ? (Count <= DEFAULT_LENGTH) : (Count < AllocatedCount));
142             }
143 
144         protected:
145             void  AddItem(const TCharType& InValue)
146             {
147                 Super::AddItem(InValue);
148             }
149         };

上面展示的是基于栈内存容器实现的栈字符串，在大多数情况下，我们格式化字符串时都采用栈字符串来保存结果，这样可以显著的提升性能。

同时栈容器和栈字符串，都特别适合于当临时容器和临时字符串，因为多数情况下它们都优化掉了可能需要动态内存分配的操作。所以它们的使用并不局限于这一个小地方。

4. 基于C++ 11的优化

除了引入了C++ 11的容器unordered_map之外，还引入了右值引用等新内容，在某些情况下，可以带来一定的性能提升。

 1 #if FL_PLATFORM_HAS_RIGHT_VALUE_REFERENCE
 2             TAutoArray( SelfType && Other ) :
 3                 Count(Other.Count),
 4                 AllocatedCount(Other.AllocatedCount),
 5                 HeapValPtr(Other.HeapValPtr)
 6             {
 7                 if (Count > 0 && Other.IsDataOnStack())
 8                 {
 9                     Algorithm::MoveArray(Other.StackVal, Other.StackVal + Count, StackVal);
10                 }
11 
12                 Other.Count = 0;
13                 Other.AllocatedCount = 0;
14                 Other.HeapValPtr = NULL;
15             }
16 
17             SelfType& operator = (SelfType&& Other )
18             {
19                 return TakeFrom(Other);
20             }
21 #endif

上面展示的是基于右值引用的优化。

除此之外还是用了线程局部存储(TLS)，这依赖于编译器是否支持。前面提到了我们采用Hash容器来存储Pattern缓存，然而在单线程的时候自然无需多余考虑，当需要支持多线程时，则全局唯一的Hash容器的访问都需要加锁，而加锁是有性能开销的。幸好C++ 11带来了内置的TLS支持，其结果就是每个线程会独立保存一份这样的Pattern缓存，因此无需对其访问加锁，这样无疑效率会更高。缺陷则是会损失部分内存。所有的这些都可以通过预先的宏定义来进行开关，使用者可以自行决定使用TLS还是Lock，或者不支持多线程。

 1         template < typename TPolicy >
 2         class TGlobalPatternStorage : 
 3             public TPatternStorage<TPolicy>
 4         {
 5         public:
 6             static TGlobalPatternStorage* GetStorage()
 7             {
 8 #if FL_WITH_THREAD_LOCAL
 9                 struct ManagedStorage
10                 {
11                     typedef Utility::TScopedLocker<System::CriticalSection>  LockerType;
12                     
13                     System::CriticalSection                                  ManagedCS;
14                     Utility::TAutoArray<TGlobalPatternStorage*>              Storages;
15                     
16                     ~ManagedStorage()
17                     {
18                         LockerType Locker(ManagedCS);
19                         
20                         for( SIZE_T i=0; i<Storages.GetLength(); ++i )
21                         {
22                             delete Storages[i];
23                         }
24                     }
25                     
26                     void AddStorage( TGlobalPatternStorage* Storage )
27                     {
28                         assert(Storage);
29                         
30                         LockerType Locker(ManagedCS);
31                         
32                         Storages.AddItem(Storage);
33                     }
34                 };
35                 
36                 static ManagedStorage StaticManager;
37                 
38                 static FL_THREAD_LOCAL TGlobalPatternStorage* StaticStorage = NULL;
39                 
40                 if( !StaticStorage )
41                 {
42                     StaticStorage = new TGlobalPatternStorage();
43                     
44                     StaticManager.AddStorage(StaticStorage);
45                 }
46                 
47                 return StaticStorage;
48 #else
49                 static TGlobalPatternStorage StaticStorage;
50                 return &StaticStorage;
51 #endif
52             }
53         };

如上所示为项目中使用TLS的代码。

总结

在将这一系列的优化结合起来之后，可以使得FL的整体效率处于较高水平，不低于C库函数，同时还具备其它格式化库不具备的功能，对于代码安全性等各方面的增强，都有帮助。下面是Test.cpp的测试结果，FL代表的是使用FL库的耗时，CL代表的C库的耗时，同时此测试模拟了多线程环境。

Windows Visual Studio 2013 Release下的输出：

0x64
Test20, -10.0050,  X ,  X
0x64
Test20, -10.0050,  X ,  X
1920 FLElapse:0.0762746
1920 CLElapse:0.269722
1636 FLElapse:0.0756153
7732 FLElapse:0.0766446
7956 FLElapse:0.0762051
7956 CLElapse:0.285714
1636 CLElapse:0.288648
7732 CLElapse:0.289193

Mac Xcode Release:

99
Test20, -10.0050,  X ,  X 
18446744073709551615 FLElapse:0.0901681
18446744073709551615 CLElapse:0.19329
18446744073709551615 FLElapse:0.147378
18446744073709551615 FLElapse:0.150375
18446744073709551615 FLElapse:0.153342
18446744073709551615 CLElapse:0.303508
18446744073709551615 CLElapse:0.308418
18446744073709551615 CLElapse:0.307407

这并非完全的测试，更多的测试需要在实际使用过程中来验证。