STL String and Unicode

Source: http://msdn.microsoft.com/zh-cn/magazine/cc188714(en-us).aspx

In short, the simplest way to make STL strings adapt to Unicode builds is to define tstring as follows:
#include<string>
#ifdef _UNICODE
#define tstring wstring
#else
#define tstring string
#endif
using namespace std;
We can then use tstring wherever we store _T("XX") strings. For details, see the Q&A below. I was happy to find that the solution is exactly what I had been using before I read this article.
 
Q: I use the Standard Template Library (STL) std::string class very often in my C++ programs, but I have a problem when it comes to Unicode. When using regular C-style strings I can use TCHAR and the _T macro to write code that compiles for either Unicode or ASCII, but I always find it difficult to get this ASCII/Unicode combination working with the STL string class. Do you have any suggestions?
A: Sure. It's easy, once you know how TCHAR and _T work. The basic idea is that TCHAR is either char or wchar_t, depending on whether _UNICODE is defined:
// abridged from tchar.h
#ifdef _UNICODE
typedef wchar_t TCHAR;
#define __T(x) L ## x
#else
typedef char TCHAR;
#define __T(x) x
#endif 
When you choose Unicode as the character set in your project settings, the compiler compiles with _UNICODE defined. If you select MBCS (Multi-Byte Character Set), the compiler builds without _UNICODE. Everything hinges on whether _UNICODE is defined. Similarly, every Windows® API function that takes char pointers has an A (ANSI) and a W (wide/Unicode) version, and the unsuffixed name is defined as one of these depending on UNICODE (without the underscore, which is the symbol the Windows headers test; the Unicode project setting defines both symbols):
#ifdef UNICODE
#define CreateFile CreateFileW
#else
#define CreateFile CreateFileA
#endif
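The same selection pattern can be tried without the Windows headers. The following is only a sketch: CountCharsA, CountCharsW, CountChars and DEMO_TEXT are hypothetical names invented for illustration, not part of any real API.

```cpp
#include <cassert>   // for the usage checks mentioned below
#include <cstddef>
#include <string>

// Hypothetical stand-ins for a Windows-style A/W function pair.
inline std::size_t CountCharsA(const char* s) {
    return std::char_traits<char>::length(s);
}
inline std::size_t CountCharsW(const wchar_t* s) {
    return std::char_traits<wchar_t>::length(s);
}

// The unsuffixed name and the literal macro switch together, in the same way
// as CreateFile in the excerpt above.
#ifdef UNICODE
#define CountChars CountCharsW
#define DEMO_TEXT(x) L##x
#else
#define CountChars CountCharsA
#define DEMO_TEXT(x) x
#endif
```

With or without UNICODE defined, CountChars(DEMO_TEXT("hello")) compiles and returns 5, because the literal and the function are switched by the same symbol.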
Likewise, there's _tprintf and _tscanf for printf and scanf. All the 't' versions use TCHARs instead of chars. So how can you apply all this to std::string? Easy. STL already has a wstring class that uses wide characters (defined in the file xstring). Both string and wstring are typedef-ed as template classes using basic_string, which lets you create a string class using any character type. Here's how STL defines string and wstring:
// (from include/xstring)
typedef basic_string<char,
char_traits<char>, allocator<char> >
string;
typedef basic_string<wchar_t,
char_traits<wchar_t>, allocator<wchar_t> >
wstring;
The templates are parameterized by the underlying character type (char or wchar_t), so all you need for a TCHAR version is to mimic the definitions using TCHAR:
typedef basic_string<TCHAR,
char_traits<TCHAR>,
allocator<TCHAR> >
tstring;
Now you have a tstring that's based on TCHAR—that is, either char or wchar_t, depending on the value of _UNICODE. I'm showing you this to point out how STL uses basic_string to implement strings based on any underlying character type. Defining a new typedef isn't the most efficient way to solve your problem. A better way is to simply #define tstring to either string or wstring, like so:
#ifdef _UNICODE
#define tstring wstring
#else
#define tstring string
#endif
This is better because STL already defines string and wstring, so why use templates to create another string class that's the same as one of these, just to call it tstring? You can use #define to define tstring to string or wstring, which will save you from creating another template class (though compilers are getting so smart these days it wouldn't surprise me if the duplicate class were discarded).
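Whether the typedef really produces a second class can be checked directly. In standard C++, a basic_string specialization with the default traits and allocator is the same type as string or wstring, so the sketch below is expected to report true on a conforming compiler. DemoChar and tstring_t are hypothetical names used here so that the example compiles without tchar.h.

```cpp
#include <cassert>   // for the usage checks mentioned below
#include <memory>
#include <string>
#include <type_traits>

// DemoChar is a hypothetical stand-in for TCHAR.
#ifdef _UNICODE
typedef wchar_t DemoChar;
#else
typedef char DemoChar;
#endif

// Same shape as the TCHAR-based typedef shown earlier.
typedef std::basic_string<DemoChar,
                          std::char_traits<DemoChar>,
                          std::allocator<DemoChar> > tstring_t;

// True when the typedef names std::string or std::wstring itself rather than
// a distinct class.
inline bool tstring_is_standard_type() {
    return std::is_same<tstring_t, std::string>::value ||
           std::is_same<tstring_t, std::wstring>::value;
}
```

One practical difference between the two approaches is that a typedef obeys namespaces and scopes, while a #define replaces every token spelled tstring, including unrelated identifiers.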
In any case, once you have tstring, you can write code like this:
tstring s = _T("Hello, world");
_tprintf(_T("s =%s\n"), s.c_str());

The method basic_string::c_str returns a const pointer to the underlying character type; in this case, that pointer is either const char* or const wchar_t*.
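The return type of c_str can be confirmed at compile time. This is a small sketch assuming a C++11 compiler; c_str_types_match is a hypothetical helper name.

```cpp
#include <cassert>   // for the usage check mentioned below
#include <string>
#include <type_traits>

// Returns true when c_str() yields a const pointer to each string's own
// character type.
inline bool c_str_types_match() {
    return std::is_same<decltype(std::string().c_str()), const char*>::value &&
           std::is_same<decltype(std::wstring().c_str()), const wchar_t*>::value;
}
```

Because the pointer type follows the string type, the result of c_str on a tstring always matches what the corresponding _t function expects.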
Figure 2 shows a simple program I wrote that illustrates tstring. It writes "Hello, world" to a file and reports how many bytes were written. I set the project up so it uses Unicode for the Debug build and MBCS for the Release build. You can compile both builds and run them to compare the results. Figure 3 shows a sample run.
Figure 3 tstring in Action 
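The Figure 2 program itself is not reproduced here. As a rough sketch of why the two builds report different byte counts, the hypothetical helper below (payload_bytes is an invented name, not taken from the original program) computes how many bytes the characters of a string occupy.

```cpp
#include <cassert>   // for the usage checks mentioned below
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes occupied by the characters of a
// string, excluding the terminating null.
template <typename CharT>
std::size_t payload_bytes(const std::basic_string<CharT>& s) {
    return s.size() * sizeof(CharT);
}
```

For "Hello, world" this gives 12 bytes as a std::string and 12 * sizeof(wchar_t) bytes as a std::wstring. The size of wchar_t is 2 with the Microsoft compiler and is commonly 4 elsewhere.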
By the way, MFC's CString is now married to ATL so that both MFC and ATL use the same string implementation. The combined implementation uses a template class called CStringT that works like STL's basic_string in the sense that it lets you create a CString class based on any underlying character type. The MFC include file afxstr.h defines three string types, like so:
typedef ATL::CStringT<wchar_t,
StrTraitMFC<wchar_t>> CStringW;
typedef ATL::CStringT<char,
StrTraitMFC<char>> CStringA;
typedef ATL::CStringT<TCHAR,
StrTraitMFC<TCHAR>> CString;
CStringW, CStringA, and CString are just what you would expect: wide, narrow (ANSI), and TCHAR versions of CString.
So which is better, STL or CStrings? Both classes are fine, and you should use whichever you like best. One issue to consider is which libraries you want to link with and whether you're already using ATL/MFC or not. From a coding perspective, I prefer CString for two features. First, you can initialize a CString from either wide or char strings:
CString s1 = "foo";
CString s2 = _T("bar");
Both initializations work because CString silently performs whatever conversions are necessary. With STL strings, you can't initialize a tstring without using _T() because you can't initialize a wstring from a char* or vice versa. The other feature I like about CString is its automatic conversion operator to LPCTSTR, which lets you write the following:
CString s;
LPCTSTR lpsz = s;
With STL, on the other hand, you have to explicitly call c_str. This is really nit-picking and some would even argue it's better to know when you're performing a conversion. For example, CStrings can get you in trouble with functions that use C-style variable arguments (varargs), such as printf:
printf("s=%s\n", s); // Error: thinks s is char*
printf("s=%s\n", (LPCTSTR)s); // required
Without the cast you can get garbage results because printf expects s to be char*. I'm sure many readers have made this error. Preventing this sort of mishap is no doubt one reason the designers of STL chose not to provide a conversion operator, insisting instead that you invoke c_str. In general, the STL folks tend to be a little more academic and purist types, whereas the Redmontonians are a little more practical and loosey-goosey. Hey, whatever. The practical differences between std::string and CString are slim.
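With STL strings the same rule applies to every varargs function: the string object is never converted for you, so c_str has to be called explicitly. A minimal sketch, using plain std::string and std::snprintf so that it builds without MFC (describe is a hypothetical helper name):

```cpp
#include <cassert>   // for the usage check mentioned below
#include <cstdio>
#include <string>

// Hypothetical helper: formats "s=<value>" into a std::string.
inline std::string describe(const std::string& s) {
    char buf[64];
    // Pass s.c_str(), never s itself: varargs functions perform no
    // conversion on their arguments. Output longer than the buffer is
    // truncated by snprintf.
    std::snprintf(buf, sizeof buf, "s=%s", s.c_str());
    return std::string(buf);
}
```

Here describe("foo") yields "s=foo". Passing s directly to a %s specifier would be undefined behavior, although many compilers warn about it.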
posted @ 2009-02-22 19:56  能巴