(Repost) What are TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR (etc.)?

Original article: http://www.codeproject.com/Articles/76252/What-are-TCHAR-WCHAR-LPSTR-LPWSTR-LPCTSTR-etc

Many C++ Windows programmers get confused over what bizarre identifiers like TCHAR and LPCTSTR are. In this article, I will do my best to clear the fog.

In general, a character can be represented in 1 byte or 2 bytes. Let's say a 1-byte character is an ANSI character; all English characters are represented through this encoding. And let's say a 2-byte character is Unicode, which can represent ALL languages in the world.

The Visual C++ compiler supports char and wchar_t as native data-types for ANSI and Unicode characters respectively. There is a more precise definition of Unicode, but for the purposes of this article, treat it as the two-byte character that Windows uses for multiple-language support.

What if you want your C/C++ code to be independent of character encoding/mode used? 
Suggestion: Use generic data-types and names to represent characters and string.

For example, instead of replacing:

char cResponse; // 'Y' or 'N'
char sUsername[64];
// str* functions

with

wchar_t cResponse; // 'Y' or 'N'
wchar_t sUsername[64];
// wcs* functions

In order to support multiple languages (i.e. Unicode) in your program, you can simply code it in a more generic manner:

 

#include<TCHAR.H> // Implicit or explicit include
TCHAR cResponse; // 'Y' or 'N'
TCHAR sUsername[64];
// _tcs* functions

The following project setting in General page describes which Character Set is to be used for compilation:
(General -> Character Set)

This way, when your project is being compiled as Unicode, the TCHAR would translate to wchar_t. If it is being compiled as ANSI/MBCS, it would be translated to char. You are free to use char and wchar_t, and project settings will not affect any direct use of these keywords.

TCHAR is defined as:

#ifdef _UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif

The macro _UNICODE is defined when you set Character Set to "Use Unicode Character Set", and therefore TCHAR would mean wchar_t. When Character Set is set to "Use Multi-Byte Character Set", TCHAR would mean char.
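To make the switch concrete, here is a minimal portable sketch of the same mechanism. It does not use the real TCHAR.H: DEMO_UNICODE, DemoTChar, demoBufferBytes and demoIsWide are hypothetical names invented for illustration, and the real headers may differ in detail. Note also that wchar_t is 2 bytes on Windows but may be 4 bytes on other platforms, so the sketch relies on sizeof rather than on a fixed size.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the real _UNICODE switch; uncomment to flip modes.
// #define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DemoTChar;   // "Unicode" build: generic character is wide
#else
typedef char DemoTChar;      // "ANSI/MBCS" build: generic character is narrow
#endif

// Returns how many bytes a buffer of `count` generic characters occupies.
inline std::size_t demoBufferBytes(std::size_t count) {
    assert(count > 0);  // a zero-length buffer cannot even hold the terminating null
    return count * sizeof(DemoTChar);
}

// Reports whether this translation unit was compiled in the wide mode.
constexpr bool demoIsWide() {
    return std::is_same<DemoTChar, wchar_t>::value;
}
```

With DEMO_UNICODE left undefined, DemoTChar is char and demoBufferBytes(64) is 64; defining the macro before the typedef makes the same call return 64 * sizeof(wchar_t) without any change to the calling code.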

Likewise, to support multiple character sets from a single code base, and possibly multiple languages, use the generic functions (macros). Instead of using strcpy, strlen, strcat (including the secure versions suffixed with _s), or wcscpy, wcslen, wcscat (including the secure versions), you should use the _tcscpy, _tcslen, _tcscat functions.

As you know strlen is prototyped as:

size_t strlen(const char*);

And, wcslen is prototyped as:

size_t wcslen(const wchar_t* );

You may better use _tcslen, which is logically prototyped as:

size_t _tcslen(const TCHAR* );

WC is for Wide Character. Therefore, wcs stands for wide-character-string. In the same way, _tcs means _T Character String. And, as you know, _T may be char or wchar_t, logically.

But, in reality, _tcslen (and other _tcs functions) are actually not functions, but macros. They are defined simply as:

#ifdef _UNICODE
#define _tcslen wcslen 
#else
#define _tcslen strlen
#endif

You should refer to TCHAR.H to look up more macro definitions like this.
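The mapping can be tried out without the Windows headers. The following is only a sketch under assumptions: DEMO_UNICODE, DemoTChar, demo_tcslen and demoNameLength are hypothetical names that mimic the pattern described above, not the contents of TCHAR.H.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-in for _UNICODE; uncomment to select the wide variants.
// #define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DemoTChar;
#define demo_tcslen std::wcslen   // generic name resolves to the wide function
#else
typedef char DemoTChar;
#define demo_tcslen std::strlen   // generic name resolves to the narrow function
#endif

// Counts the characters (not the bytes) in a generic string.
inline std::size_t demoNameLength(const DemoTChar* name) {
    assert(name != nullptr);
    return demo_tcslen(name);     // the preprocessor has already picked strlen or wcslen
}
```

Because the choice is made by the preprocessor, demoNameLength compiles to a direct call to strlen or wcslen; there is no run-time dispatch involved.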

You might ask why they are defined as macros, and not implemented as functions instead? The reason is simple: A library or DLL may export a single function, with same name and prototype (Ignore overloading concept of C++). For instance, when you export a function as:

void _TPrintChar(char);

How is the client supposed to call it as:

void _TPrintChar(wchar_t);

_TPrintChar cannot be magically converted into a function taking a 2-byte character. There have to be two separate functions:

void PrintCharA(char); // A = ANSI 
void PrintCharW(wchar_t); // W = Wide character

And a simple macro, as defined below, would hide the difference:

#ifdef _UNICODE
#define _TPrintChar PrintCharW
#else
#define _TPrintChar PrintCharA
#endif

The client would simply call it as:

TCHAR cChar;
_TPrintChar(cChar);

Note that both TCHAR and _TPrintChar would map to either Unicode or ANSI, and therefore cChar and the argument to function would be either char or wchar_t.
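The A/W pair plus a generic name can be sketched end to end as follows. Everything here is hypothetical and for illustration only: DEMO_UNICODE, DemoTChar, DemoPrintCharA, DemoPrintCharW, DemoPrintChar and demoAskResponse are made-up names, and the functions return a labelled string instead of printing so that the selected variant is easy to observe.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for _UNICODE; uncomment to select the W variant.
// #define DEMO_UNICODE

// Two separate functions, as a library would have to export them.
inline std::string DemoPrintCharA(char c) {
    return std::string("A:") + c;
}
inline std::string DemoPrintCharW(wchar_t c) {
    // Narrowing cast is an assumption of this sketch: it only handles ASCII input.
    return std::string("W:") + static_cast<char>(c);
}

// One generic name that hides the difference from the caller.
#ifdef DEMO_UNICODE
typedef wchar_t DemoTChar;
#define DemoPrintChar DemoPrintCharW
#else
typedef char DemoTChar;
#define DemoPrintChar DemoPrintCharA
#endif

// Client code: written once, compiled against either variant.
inline std::string demoAskResponse() {
    DemoTChar cChar = 'Y';
    return DemoPrintChar(cChar);
}
```

The client function never mentions the A or W suffix; the typedef and the macro always change together, so the argument type and the function being called stay in agreement.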

Macros avoid these complications, and allow us to use either the ANSI or the Unicode function for characters and strings. Most of the Windows functions that take a string or a character are implemented this way, and for the programmer's convenience only one name (a macro!) needs to be used. SetWindowText is one example:

// WinUser.H
#ifdef UNICODE
#define SetWindowText  SetWindowTextW
#else
#define SetWindowText  SetWindowTextA
#endif // !UNICODE

There are very few functions that do not have macros, and are available only with the W or A suffix. One example is ReadDirectoryChangesW, which doesn't have an ANSI equivalent.

 


You all know that we use double quotation marks to represent strings. The string represented in this manner is ANSI-string, having 1-byte each character. Example:

 

"This is ANSI String. Each letter takes 1 byte."

The string given above is not Unicode, and would not be suitable for multi-language support. To represent a Unicode string, you need to use the prefix L. An example:

L"This is Unicode string. Each letter would take 2 bytes, including spaces."

Note the L at the beginning of string, which makes it a Unicode string. All characters (I repeat all characters) would take two bytes, including all English letters, spaces, digits, and the null character. Therefore, length of Unicode string would always be in multiple of 2-bytes. A Unicode string of length 7 characters would need 14 bytes, and so on. Unicode string taking 15 bytes, for example, would not be valid in any context.

In general, string would be in multiple of sizeof(TCHAR) bytes!
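The byte arithmetic can be checked with sizeof. This is a small sketch with hypothetical helper names (literalBytes, wideCharsInBytes); it counts the terminating null as part of the literal, and it uses sizeof(wchar_t) instead of a hard-coded 2 because wchar_t is 2 bytes on Windows but may be larger on other platforms.

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by a narrow string literal, including its terminating null.
template <std::size_t N>
constexpr std::size_t literalBytes(const char (&)[N]) {
    return N * sizeof(char);
}

// Bytes occupied by a wide (L-prefixed) string literal, including its terminating null.
template <std::size_t N>
constexpr std::size_t literalBytes(const wchar_t (&)[N]) {
    return N * sizeof(wchar_t);
}

// Converts a byte count back to a count of wide characters.
inline std::size_t wideCharsInBytes(std::size_t bytes) {
    assert(bytes % sizeof(wchar_t) == 0);  // a partial wide character is never valid
    return bytes / sizeof(wchar_t);
}
```

For the 7-letter word "Unicode", literalBytes("Unicode") covers 8 characters of 1 byte each, while literalBytes(L"Unicode") covers the same 8 characters at sizeof(wchar_t) bytes each.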

When you need to express hard-coded string, you can use:

"ANSI String"; // ANSI
L"Unicode String"; // Unicode

_T("Either string, depending on compilation"); // ANSI or Unicode
// or use TEXT macro, if you need more readability

The non-prefixed string is ANSI string, the L prefixed string is Unicode, and string specified in _T or TEXT would be either, depending on compilation.
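A simplified sketch of how such a literal macro can work is shown here. DEMO_UNICODE, DemoTChar, DEMO_T, demoGreeting and demoGreetingLength are hypothetical names; the real _T and TEXT macros are defined by the Windows headers and may be written differently. The idea is token pasting: in the wide build the preprocessor glues an L in front of the literal.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for _UNICODE; uncomment to get wide literals.
// #define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DemoTChar;
#define DEMO_T(x) L##x    // pastes the L prefix onto the literal: "abc" becomes L"abc"
#else
typedef char DemoTChar;
#define DEMO_T(x) x       // leaves the literal untouched
#endif

// Returns a hard-coded string whose character type follows the build mode.
inline const DemoTChar* demoGreeting() {
    return DEMO_T("Hello");
}

// Counts the characters of the greeting, independent of character width.
inline std::size_t demoGreetingLength() {
    const DemoTChar* text = demoGreeting();
    assert(text != nullptr);
    std::size_t length = 0;
    while (text[length] != 0) {
        ++length;
    }
    return length;
}
```

The character count is the same in both modes; only the number of bytes behind each character changes.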

String classes, like MFC/ATL's CString, implement two versions using a macro. There are two classes named CStringA for ANSI and CStringW for Unicode. When you use CString (which is a macro/typedef), it translates to either of the two classes.

Okay. The TCHAR type-definition was for a single character. You can definitely declare an array of TCHAR. What if you want to express a character-pointer, or a const-character-pointer - which one of the following?

// ANSI characters
foo_ansi(char*);
foo_ansi(const char*);
/*const*/ char* pString;
 
// Unicode/wide-string
foo_uni(WCHAR*); // or wchar_t*
foo_uni(const WCHAR*);
/*const*/ WCHAR* pString;
 
// Independent 
foo_char(TCHAR*);
foo_char(const TCHAR*);
/*const*/ TCHAR* pString;

After reading about the TCHAR stuff, you'd definitely select the last one as your choice. But here is a better alternative. Before that, note that the TCHAR.H header file declares only the TCHAR datatype; for the following types, you need to include Windows.h (they are defined in WinNT.h).

NOTE: If your project implicitly or explicitly includes Windows.h, you need not include TCHAR.H for the TCHAR type itself; the _T macro and the _tcs* routines still come from TCHAR.H.

  • char* replacement: LPSTR
  • const char* replacement: LPCSTR
  • WCHAR* replacement: LPWSTR
  • const WCHAR* replacement: LPCWSTR (C before W, since const is before WCHAR)
  • TCHAR* replacement: LPTSTR
  • const TCHAR* replacement: LPCTSTR
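
The list above amounts to a handful of typedefs. The sketch below restates them under hypothetical Demo-prefixed names (the real ones come from the Windows headers and carry extra annotations), with static_assert lines confirming each equivalence. DEMO_UNICODE and demoCountChars are likewise assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the UNICODE switch; uncomment for the wide build.
// #define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DemoTChar;
#else
typedef char DemoTChar;
#endif

typedef char*            DemoLPSTR;    // char*
typedef const char*      DemoLPCSTR;   // const char*
typedef wchar_t*         DemoLPWSTR;   // WCHAR*
typedef const wchar_t*   DemoLPCWSTR;  // const WCHAR*
typedef DemoTChar*       DemoLPTSTR;   // TCHAR*
typedef const DemoTChar* DemoLPCTSTR;  // const TCHAR*

static_assert(std::is_same<DemoLPCSTR, const char*>::value, "LPCSTR is const char*");
static_assert(std::is_same<DemoLPCWSTR, const wchar_t*>::value, "LPCWSTR is const wchar_t*");
static_assert(std::is_same<DemoLPCTSTR, const DemoTChar*>::value, "LPCTSTR follows TCHAR");

// A function taking a generic read-only string; returns its length in characters.
inline std::size_t demoCountChars(DemoLPCTSTR text) {
    assert(text != nullptr);
    std::size_t count = 0;
    while (text[count] != 0) {
        ++count;
    }
    return count;
}
```

Reading the names left to right helps: LP is "long pointer", C is const, W or T selects the character type, and STR says it points to a string.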

Now, I hope you understand the following signatures:

BOOL SetCurrentDirectory( LPCTSTR lpPathName );
DWORD GetCurrentDirectory(DWORD nBufferLength,LPTSTR lpBuffer);

Continuing. You must have seen some functions/methods asking you to pass the number of characters, or returning the number of characters. With functions like GetCurrentDirectory, you need to pass the number of characters, and not the number of bytes. For example:

TCHAR sCurrentDir[255];
 
// Pass 255 (the character count), not 255*2 (the byte count)
GetCurrentDirectory(255, sCurrentDir);
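
A common way to avoid mixing up the two counts is to derive the character count from the array itself. The sketch below is portable and does not call the real API: demoFillPath is a hypothetical stand-in that behaves roughly like a function taking a character count, and DEMO_UNICODE, DemoTChar and demoCountOf are also invented for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the UNICODE switch; uncomment for the wide build.
// #define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DemoTChar;
#else
typedef char DemoTChar;
#endif

// Number of elements (characters) in a fixed-size array, regardless of element size.
template <typename T, std::size_t N>
constexpr std::size_t demoCountOf(T (&)[N]) {
    return N;
}

// Hypothetical API: copies a fixed path into `buffer`, whose capacity is given in
// CHARACTERS. Returns the characters written (excluding the null) on success, or
// the required capacity (including the null) when the buffer is missing or too small.
inline std::size_t demoFillPath(std::size_t capacity, DemoTChar* buffer) {
    static const DemoTChar kPath[] = {'C', ':', '\\', 'T', 'e', 'm', 'p', '\0'};
    const std::size_t needed = demoCountOf(kPath);  // 8 characters including the null
    if (buffer == nullptr || capacity < needed) {
        return needed;
    }
    for (std::size_t i = 0; i < needed; ++i) {
        buffer[i] = kPath[i];
    }
    return needed - 1;
}
```

A caller would write demoFillPath(demoCountOf(sCurrentDir), sCurrentDir); passing sizeof(sCurrentDir) instead would overstate the capacity in a wide build, because sizeof reports bytes.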

On the other hand, if you need to allocate a number of characters, you must allocate the proper number of bytes. In C++, you can simply use new:

LPTSTR pBuffer; // TCHAR* 

pBuffer = new TCHAR[128]; // Allocates 128 or 256 BYTES, depending on compilation.

But if you use memory allocation functions like malloc, LocalAlloc, GlobalAlloc, etc., you must specify the number of bytes!

pBuffer = (TCHAR*) malloc (128 * sizeof(TCHAR) );

Typecasting the return value is required, as you know. The expression in malloc's argument ensures that it allocates the desired number of bytes, making room for the desired number of characters.
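
One way to keep the multiplication from being forgotten is to wrap it in a small helper. This is only a sketch: DEMO_UNICODE, DemoTChar and demoAllocChars are hypothetical names, and the caller remains responsible for releasing the memory with free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the UNICODE switch; uncomment for the wide build.
// #define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DemoTChar;
#else
typedef char DemoTChar;
#endif

// Allocates room for `count` generic characters. The caller thinks in characters;
// the helper converts to bytes for malloc. Returns nullptr if allocation fails.
inline DemoTChar* demoAllocChars(std::size_t count) {
    assert(count > 0);
    void* raw = std::malloc(count * sizeof(DemoTChar));
    return static_cast<DemoTChar*>(raw);
}
```

With the helper, demoAllocChars(128) reserves space for 128 characters in either build mode, and indices 0 through 127 are valid once the returned pointer has been checked against nullptr.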

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

 
Ajay Vijayvargiya

Software Developer (Senior)

Started programming with GwBasic back in 1996 (Those lovely days!). Found the hidden talent!
 
Touched COBOL and Quick Basic for a while. 
 
Finally learned C and C++ entirely on my own, and fell in love with C++, still in love! Began with Turbo C 2.0/3.0, then to VC6 for 4 years! Finally on VC2008/2010.
 
I enjoy programming, mostly the system programming, but the UI is always on top of MFC! Quite experienced on other environments and platforms, but I prefer Visual C++. Zeal to learn, and to share!
Posted 2012-03-13 15:25 by 天堂大鸟