
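Author: zyl910

  Since the wchar_t type was introduced into C, string handling has become increasingly complicated. String output, for example, has two functions, printf and wprintf. When the arguments include both char strings and wchar_t strings, how should the format specifiers be written? This article explores that question.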


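I. Reviewing the documentation

  First, let us look through each compiler's documentation and the C99 standard to see how they describe the format specifiers.


1.1 VC documentation

  On the MSDN website, the format strings of printf and wprintf are described in "Format Specification Fields: printf and wprintf Functions" (http://msdn.microsoft.com/en-us/library/56e442dc(v=vs.110).aspx). Excerpt: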
A format specification, which consists of optional and required fields, has the following form:
% [flags] [width] [.precision] [{h | l | ll | I | I32 | I64}]type

  First click "type" to look at the types, which leads to the "printf Type Field Characters" page (http://msdn.microsoft.com/en-us/library/hf4y5e3w(v=vs.110).aspx). Excerpt:
printf Type Field Characters

Character  Type            Output format
c          int or wint_t   When used with printf functions, specifies a single-byte character; when used with wprintf functions, specifies a wide character.
C          int or wint_t   When used with printf functions, specifies a wide character; when used with wprintf functions, specifies a single-byte character.
s          String          When used with printf functions, specifies a single-byte-character string; when used with wprintf functions, specifies a wide-character string. Characters are displayed up to the first null character or until the precision value is reached.
S          String          When used with printf functions, specifies a wide-character string; when used with wprintf functions, specifies a single-byte-character string. Characters are displayed up to the first null character or until the precision value is reached.

 


  Go back, then follow the "Size Specification" link (http://msdn.microsoft.com/en-us/library/tcxf1dw6(v=vs.110).aspx). Excerpt:

To specify                                            Use prefix   With type specifier
Single-byte character with printf functions           h            c or C
Single-byte character with wprintf functions          h            c or C
Wide character with printf functions                  l            c or C
Wide character with wprintf functions                 l            c or C
Single-byte-character string with printf functions    h            s or S
Single-byte-character string with wprintf functions   h            s or S
Wide-character string with printf functions           l            s or S
Wide-character string with wprintf functions          l            s or S
Wide character                                        w            c
Wide-character string                                 w            s
 

 

Thus to print single-byte or wide-characters with printf functions and wprintf functions, use format specifiers as follows.

To print character as   Use function   With format specifier
single byte             printf         c, hc, or hC
single byte             wprintf        C, hc, or hC
wide                    wprintf        c, lc, lC, or wc
wide                    printf         C, lc, lC, or wc
 

To print strings with printf functions and wprintf functions, use the prefixes h and l analogously with format type-specifiers s and S.


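  The excerpts above introduce many format specifiers. Sorting them out, the three most useful ones for strings are:
hs: a char string in both printf and wprintf.
ls: a wchar_t string in both printf and wprintf.
s: a char string in printf, but a wchar_t string in wprintf. This is convenient in combination with TCHAR.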


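1.2 BCB documentation

  Open "C Runtime Library Reference" in the BCB6 help file and type "printf" into the index; the description of the format specifiers is quickly found:

  [Screenshot of the help page]

  On inspection, it is compatible with VC. hs/ls/s can be used to handle char/wchar_t/TCHAR strings respectively.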


1.3 GCC documentation

  My machine runs Fedora 17 with GCC 4.7.0 installed.
  Open a terminal and enter "man 3 wprintf" to view the documentation of the wprintf function. Excerpt:
c
If no l modifier is present, the int argument is converted to a wide character by a call to the btowc(3) function, and the resulting wide character is written. If an l modifier is present, the wint_t (wide character) argument is written.

s
If no l modifier is present: The const char * argument is expected to be a pointer to an array of character type (pointer to a string) containing a multibyte character sequence beginning in the initial shift state. Characters from the array are converted to wide characters (each by a call to the mbrtowc(3) function with a conversion state starting in the initial state before the first byte). The resulting wide characters are written up to (but not including) the terminating null wide character. If a precision is specified, no more wide characters than the number specified are written. Note that the precision determines the number of wide characters written, not the number of bytes or screen positions. The array must contain a terminating null byte, unless a precision is given and it is so small that the number of converted wide characters reaches it before the end of the array is reached.
If an l modifier is present: The const wchar_t * argument is expected to be a pointer to an array of wide characters. Wide characters from the array are written up to (but not including) a terminating null wide character. If a precision is specified, no more than the number specified are written. The array must contain a terminating null wide character, unless a precision is given and it is smaller than or equal to the number of wide characters in the array.


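  According to the description above, GCC seems to support only these two format specifiers for strings:
s: a char string in both printf and wprintf.
ls: a wchar_t string in both printf and wprintf.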


1.4 The C99 standard

  Section "7.24.2.1 The fwprintf function" of the C99 standard describes the format specifiers of fwprintf and the other wide-character functions. Excerpt:
7 The length modifiers and their meanings are:

h
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing); or that a following n conversion specifier applies to a pointer to a short int argument.

l (ell)
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a long int or unsigned long int argument; that a following n conversion specifier applies to a pointer to a long int argument; that a following c conversion specifier applies to a wint_t argument; that a following s conversion specifier applies to a pointer to a wchar_t argument; or has no effect on a following a, A, e, E, f, F, g, or G conversion specifier.

……

8 The conversion specifiers and their meanings are:

c
If no l length modifier is present, the int argument is converted to a wide character as if by calling btowc and the resulting wide character is written.
If an l length modifier is present, the wint_t argument is converted to wchar_t and written.

s
If no l length modifier is present, the argument shall be a pointer to the initial element of a character array containing a multibyte character sequence beginning in the initial shift state. Characters from the array are converted as if by repeated calls to the mbrtowc function, with the conversion state described by an mbstate_t object initialized to zero before the first multibyte character is converted, and written up to (but not including) the terminating null wide character. If the precision is specified, no more than that many wide characters are written. If the precision is not specified or is greater than the size of the converted array, the converted array shall contain a null wide character.
If an l length modifier is present, the argument shall be a pointer to the initial element of an array of wchar_t type. Wide characters from the array are written up to (but not including) a terminating null wide character. If the precision is specified, no more than that many wide characters are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null wide character.


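  As can be seen, in the C99 standard c and s take only the "l" length modifier: without "l" the argument is a char (or char string), and with "l" it is a wide character (or wchar_t string).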


1.5 Summary

  From the material above, the following table can be compiled:

      VC and BCB          GCC and C99 standard
      printf    wprintf   printf    wprintf
s     char      wchar_t   char      char
S     wchar_t   char      *         *
hs    char      char      *         *
ls    wchar_t   wchar_t   wchar_t   wchar_t

*: undefined.


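II. Test program

  Having consulted the documentation above, I think it is worth writing a test program to check how well each compiler actually supports the wchar_t format specifiers.
  The code of the test program is as follows: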

#include <stdio.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

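const char* psa = "CHAR";    // Single-byte (char) string.
const wchar_t* psw = L"WCHAR";    // Wide (wchar_t) string.
const wchar_t* pst = L"TCHAR";    // String whose type matches the function used (wprintf here, so wchar_t).

int main(void)
{
    setlocale(LC_ALL, "");    // Use the system's current locale (code page).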
    
    // test
    wprintf(L"A:\t%hs\n", psa);
    wprintf(L"W:\t%ls\n", psw);
    wprintf(L"T:\t%s\n", pst);
    
    return 0;
}

 

  If it runs correctly, the program's output should be:
A: CHAR
W: WCHAR
T: TCHAR


III. Test results

3.1 Testing VC6 and BCB6

  As expected, both VC6 and BCB6 produced the correct output:
A: CHAR
W: WCHAR
T: TCHAR


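3.2 Testing GCC on Fedora

  Fedora 17, GCC 4.7.0:

  [Screenshot of the program output]

  The wrong output for item 3 is easy to understand: both the GCC documentation and the C99 standard specify that "s without l denotes a char string", while pst is actually a wchar_t string.
  The correct output for item 1, by contrast, is a little puzzling at first, since neither the GCC documentation nor the C99 standard defines an "h" length modifier for s. The explanation is that the documentation says "s without l denotes a char string", and "hs" contains no "l", so it is treated as a char string. Note that C99 states the behavior is undefined when a length modifier appears with a conversion specifier it is not defined for, so this is the behavior observed with this library rather than something the standard guarantees.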


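3.3 Testing GCC in MinGW

  MinGW (20120426), GCC 4.6.2:

  [Screenshot of the program output]

  Although MinGW also uses the GCC compiler, its format-specifier rules are adjusted for compatibility with the Windows environment and match those of VC (by default MinGW relies on Microsoft's C runtime for these functions).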


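IV. Conclusion

  Based on the test results above, the earlier table is revised as follows:

      VC, BCB, MinGW      GCC on Linux, C99 standard
      printf    wprintf   printf    wprintf
s     char      wchar_t   char      char
S     wchar_t   char      *         *
hs    char      char      char      char
ls    wchar_t   wchar_t   wchar_t   wchar_t

*: undefined.

  In summary:
1) To output a char string, use "hs".
2) To output a wchar_t string, use "ls".
3) To output a TCHAR string, use "s". This works only with compilers for the Windows platform, such as VC, BCB and MinGW.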

 

References:
"ISO/IEC 9899:1999 (C99)". ISO/IEC, 1999. www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
"C99 Standard" (in Chinese). yourtommy. http://blog.csdn.net/yourtommy/article/details/7495033
"[VS2012] Format Specification Fields: printf and wprintf Functions". http://msdn.microsoft.com/en-us/library/56e442dc(v=vs.110).aspx
"[VS2012] printf Type Field Characters". http://msdn.microsoft.com/en-us/library/hf4y5e3w(v=vs.110).aspx
"[VS2012] Size Specification". http://msdn.microsoft.com/en-us/library/tcxf1dw6(v=vs.110).aspx
"wprintf(3) - Linux manual page". http://www.kernel.org/doc/man-pages/online/pages/man3/wprintf.3.html

 

Source code download:
https://files.cnblogs.com/zyl910/wcharfmt.rar

Posted on 2012-07-30 18:12 by zyl910