Strings in C - silentjesse

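For this topic, see Sections 2.2 and 4.3 of C++ Primer; the explanation there is very clear.

A value such as 42 is treated in a program as a literal constant. It is a literal because we can speak of it only in terms of its value, and a constant because its value cannot be changed. Every literal has an associated type: for example, 0 is an int and 3.14159 is a double. Literals exist only for the built-in types. There are no literals of class types, and hence no literals of any of the library types.

Character Literals

Printable character literals are written by enclosing the character within single quotation marks: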

  'a'         '2'         ','         ' ' // blank

Such literals are of type char. Preceding a character literal with an L gives a wide-character literal of type wchar_t, as in

  L'a'

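Escape Sequences for Nonprintable Characters

Some characters are nonprintable. A nonprintable character is one that has no visible image, such as a backspace or a control character. Other characters have special meaning in the language, such as the single and double quotation marks and the backslash. Nonprintable characters and special characters are written using an escape sequence. An escape sequence begins with a backslash. The language defines the following escape sequences: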

newline	`\n`	horizontal tab	`\t`
vertical tab	`\v`	backspace	`\b`
carriage return	`\r`	formfeed	`\f`
alert (bell)	`\a`	backslash	`\\`
question mark	`\?`	single quote	`\'`
double quote	`\"`

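We can write any character as a generalized escape sequence of the form

     \ooo

where ooo represents a sequence of up to three octal digits whose value is the numeric value of the character. The following examples are literal constants written using the ASCII character set: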

     \7 (bell)      \12 (newline)     \40 (blank)
     \0 (null)      \062 ('2')        \115 ('M')

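The character '\0' is often called the "null character"; as we shall see, it has a very special meaning.

We can also write a character using a hexadecimal escape sequence

     \xddd

consisting of a backslash, an x, and one or more hexadecimal digits.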
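As a small check of the escape forms described above, the following sketch compares several spellings of the same characters at compile time. It assumes an ASCII-compatible execution character set, as the examples above do, and a C++11 compiler for `static_assert`; the function name `escapes_agree` is hypothetical.

```cpp
// Assumes an ASCII-compatible execution character set.
static_assert('\12' == '\n', "octal 12 is the newline character");
static_assert('\x0A' == '\n', "hex 0A is also the newline character");
static_assert('\062' == '2', "octal 062 is the digit 2");
static_assert('\115' == 'M', "octal 115 is the letter M");
static_assert('\0' == 0, "the null character has the value zero");

// Hypothetical helper: returns true when the octal and hexadecimal
// spellings of the space character agree with the plain literal.
bool escapes_agree() {
    return '\40' == ' ' && '\x20' == ' ';
}
```

If any of the assertions were wrong, the file would fail to compile, which makes this a cheap way to test one's reading of an escape sequence.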

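String Literals

All of the literals we have seen so far have primitive built-in types. There is one additional kind of literal, the string literal, that is more complicated. String literals are arrays of constant characters.

String literal constants are written as zero or more characters enclosed in double quotation marks. Nonprintable characters are represented by their escape sequences.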

     "Hello World!"                 // simple string literal
     ""                             // empty string literal
     "\nCC\toptions\tfile.[cC]\n"   // string literal using newlines and tabs

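For compatibility with C, the compiler automatically appends a null character to every string literal in C++. The character literal

     'A' // single quote: character literal

represents the single character A, whereas

     "A" // double quote: character string literal

represents an array of two characters: the letter A and the null character.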
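The difference between the two literals can be observed with `sizeof`. The sketch below is illustrative only; the helper `literal_size` is a hypothetical name, and `static_assert` requires C++11.

```cpp
#include <cstddef>

// In C++ a character literal has type char, so its size is 1.
static_assert(sizeof('A') == 1, "character literal is one char");
// "A" is an array of two const char: 'A' and the null character.
static_assert(sizeof("A") == 2, "string literal includes the null");
// Even the empty string literal holds the null character.
static_assert(sizeof("") == 1, "empty literal is just the null");

// Hypothetical helper: number of array elements in a string literal,
// including the terminating null.
template <std::size_t N>
std::size_t literal_size(const char (&)[N]) {
    return N;
}
```

For example, `literal_size("Hello World!")` yields 13: twelve visible characters plus the null.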

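Just as there is a wide character literal, such as

     L'a'

there is a wide string literal, again preceded by L, such as

     L"a wide string literal"

A wide string literal is an array of constant wide characters, likewise terminated by a wide null character.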

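Concatenated String Literals

Two string literals (or two wide string literals) that appear adjacent to one another and are separated only by spaces, tabs, or newlines are concatenated into a single new string literal. This makes it easy to write long literals across separate lines: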

     // concatenated long string literal
     std::cout << "a multi-line "
                  "string literal "
                  "using concatenation"
               << std::endl;

When executed, this statement prints:

     a multi-line string literal using concatenation
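To confirm that the concatenation happens in the compiler and leaves no trace at run time, the following sketch compares the multi-line spelling with a single-line one. The names `joined` and `same_as_single_literal` are hypothetical.

```cpp
#include <cstring>

// The three adjacent literals are joined by the compiler into one literal.
const char *joined = "a multi-line "
                     "string literal "
                     "using concatenation";

// Hypothetical helper: true if the joined literal equals the one-line spelling.
bool same_as_single_literal() {
    return std::strcmp(joined,
        "a multi-line string literal using concatenation") == 0;
}
```

Note that each piece carries its own trailing space; the compiler inserts nothing between adjacent literals.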

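What happens if you attempt to concatenate a string literal and a wide string literal? For example:

     // Concatenating plain and wide character strings is undefined
     std::cout << "multi-line " L"literal " << std::endl;

Under the C++03 rules that the book describes, the result is undefined: the standard does not define the behavior of concatenating the two different types. The program might appear to work, but it might also crash or produce garbage values, and it might behave differently under different compilers. (Since C++11 the rule differs: the unprefixed literal is treated as if it had the same prefix as the other one, so the result is a wide string literal.)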

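Multi-Line Literals

There is a more primitive (and less commonly used) way to handle long strings that depends on an infrequently used program formatting feature: putting a backslash as the last character on a line causes that line and the next to be treated as a single line.

As noted in Section 1.4.1, C++ programs are largely free-format. There are only a few places where whitespace may not be inserted, and one of them is the middle of a word. In particular, we may not ordinarily break a line in the middle of a word. We can do so, however, by using a backslash: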

     // ok: A \ before a newline ignores the line break
     std::cou\
t << "Hi" << st\
d::endl;

is equivalent to

     std::cout << "Hi" << std::endl;

We could use this feature to write a long string literal:

     // multiline string literal
     std::cout << "a multi-line \
string literal \
using a backslash"
               << std::endl;

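Note that the backslash must be the last thing on the line: no comments or trailing whitespace are allowed. Also, any leading spaces or tabs on the subsequent lines are part of the literal. For this reason, the continuation lines of a long literal do not have the normal indentation.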
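The effect of indenting a continuation line can be measured directly. In this sketch (the names `indented`, `flush`, and `extra_chars_from_indent` are hypothetical) the only difference between the two literals is the four spaces that begin the continuation line of the first one.

```cpp
#include <cstddef>
#include <cstring>

// The continuation line is indented by four spaces,
// so those four spaces become part of the literal.
const char *indented = "one \
    two";

// The continuation line starts in the first column: nothing extra is added.
const char *flush = "one \
two";

// Hypothetical helper: how many characters the indentation contributed.
std::size_t extra_chars_from_indent() {
    return std::strlen(indented) - std::strlen(flush);
}
```

Here `flush` has the same contents as "one two" (7 characters), while `indented` is 4 characters longer.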

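C-Style Character Strings

We noted earlier that the type of a string literal is an array of constant characters. We can now be more explicit: the type of a string literal is an array of const char. A string literal is an instance of a more general construct that C++ inherits from C: the C-style character string. C-style strings are not actually a type in either C or C++. Instead, they are null-terminated arrays of characters: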

          char ca1[] = {'C', '+', '+'};        // no null, not C-style string
          char ca2[] = {'C', '+', '+', '\0'};  // explicit null
          char ca3[] = "C++";     // null terminator added automatically
          const char *cp = "C++"; // null terminator added automatically
          char *cp1 = ca1;   // points to first element of an array, but not C-style string
          char *cp2 = ca2;   // points to first element of a null-terminated char array

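Neither ca1 nor cp1 is a C-style string: ca1 is a character array without a terminating null, and the pointer cp1 points to ca1, so it does not point to a null-terminated array. The other declarations are all C-style strings. Recall that in most expressions the name of an array is converted to a pointer to the first element of the array, so ca2 and ca3 can each be used as a pointer to the first element of its array.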
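The presence or absence of the null shows up in the array sizes. The following sketch repeats three of the declarations above and checks them; `ends_with_null` is a hypothetical helper, and `static_assert` assumes C++11.

```cpp
#include <cstddef>

char ca1[] = {'C', '+', '+'};        // three elements, no null
char ca2[] = {'C', '+', '+', '\0'};  // four elements, explicit null
char ca3[] = "C++";                  // four elements, null added by the compiler

static_assert(sizeof(ca1) == 3, "ca1 has no room for a null");
static_assert(sizeof(ca2) == 4, "ca2 holds an explicit null");
static_assert(sizeof(ca3) == 4, "ca3 holds the compiler-added null");

// Hypothetical helper: true if the last element of the array is the null character.
template <std::size_t N>
bool ends_with_null(const char (&a)[N]) {
    return a[N - 1] == '\0';
}
```

The helper only works on real arrays; once an array has been converted to a pointer such as cp1, its size is no longer available.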

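Using C-Style Strings

C-style strings are manipulated through (const) char* pointers. One frequent usage pattern uses pointer arithmetic to traverse the C-style string: the pointer is tested and incremented until we reach the terminating null character: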

          const char *cp = "some value";
          while (*cp) {
              // do something to *cp
              ++cp;
          }

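The condition in the while dereferences the const char* pointer cp and tests the resulting character for its true or false value. A true value is any character other than the null, so the loop continues until it reaches the null character that terminates the array to which cp points. The body of the while does whatever processing is needed and concludes by incrementing cp to advance the pointer to the next character in the array.

If the array that cp addresses is not null-terminated, this loop is apt to fail. In that case the loop keeps reading from the position cp points to, past the end of the array, until it encounters a null character somewhere in memory.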
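As one concrete use of the traversal pattern above, the sketch below counts how often a character occurs in a C-style string. The function name `count_char` is hypothetical, and the function relies on the precondition discussed above: its argument must be null-terminated.

```cpp
#include <cstddef>

// Hypothetical helper: counts how many times ch occurs in the
// null-terminated string s. Precondition: s is non-null and null-terminated.
std::size_t count_char(const char *s, char ch) {
    std::size_t n = 0;
    while (*s) {          // stop at the terminating null
        if (*s == ch)
            ++n;
        ++s;              // advance to the next character
    }
    return n;
}
```

For instance, `count_char("some value", 'e')` returns 2.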

C Library String Functions

Table 4.1. C-Style Character String Functions

`strlen(s)`	Returns the length of `s`, not counting the null.
`strcmp(s1, s2)`	Compares `s1` and `s2` for equality. Returns 0 if `s1 == s2`, a positive value if `s1 > s2`, a negative value if `s1 < s2`.
`strcat(s1, s2)`	Appends `s2` to `s1`. Returns `s1`.
`strcpy(s1, s2)`	Copies `s2` into `s1`. Returns `s1`.
`strncat(s1, s2, n)`	Appends `n` characters from `s2` onto `s1`. Returns `s1`.
`strncpy(s1, s2, n)`	Copies `n` characters from `s2` into `s1`. Returns `s1`.

To use these functions, we must include the associated C header file:

          #include <cstring>

The pointer(s) passed to these routines must be nonzero, and each pointer must point to the initial character in a null-terminated array. Some of these functions write to a string they are passed. They assume that the array to which they write is large enough to hold whatever characters the function generates; it is up to the programmer to ensure that the target string is big enough.

When we compare library strings, we do so using the normal relational operators. We can use these operators to compare pointers to C-style strings, but the effect is quite different: what we are actually comparing is the pointer values, not the strings to which they point:

          if (cp1 < cp2) // compares addresses, not the values pointed to

Assuming cp1 and cp2 point to elements in the same array (or one past the end of that array), the effect of this comparison is to compare the address in cp1 with the address in cp2. If the pointers do not address the same array, the result of the comparison is not meaningful (the book calls it undefined; the C++ standard leaves it unspecified).

To compare the strings, we must use strcmp and interpret the result:

          const char *cp1 = "A string example";
          const char *cp2 = "A different string";
          int i = strcmp(cp1, cp2);    // i is positive
          i = strcmp(cp2, cp1);        // i is negative
          i = strcmp(cp1, cp1);        // i is zero

The strcmp function returns three possible values: 0 if the strings are equal, or a positive or negative value, depending on whether the first string is larger or smaller than the second.
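Because only the sign of strcmp's result is meaningful, it is often wrapped in small predicates. The following is a minimal sketch with hypothetical names (`cstr_equal`, `cstr_less`); both arguments are assumed to be null-terminated.

```cpp
#include <cstring>

// Hypothetical helper: true if the two strings hold the same characters.
bool cstr_equal(const char *a, const char *b) {
    return std::strcmp(a, b) == 0;
}

// Hypothetical helper: true if a orders before b, character by character.
bool cstr_less(const char *a, const char *b) {
    return std::strcmp(a, b) < 0;
}
```

A predicate such as `cstr_less` compares the characters, whereas `a < b` on the pointers themselves would compare addresses.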

Never Forget About the Null-Terminator

When using the C library string functions it is essential to remember that the strings must be null-terminated:

          char ca[] = {'C', '+', '+'}; // not null-terminated
          cout << strlen(ca) << endl; // disaster: ca isn't null-terminated

In this case, ca is an array of characters but is not null-terminated, so what happens is undefined. The strlen function assumes that it can rely on finding a null character at the end of its argument. The most likely effect of this call is that strlen will keep looking through the memory that follows wherever ca happens to reside until it encounters a null character. In any event, the value returned by strlen will not be correct.
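When the size of the array is known, one way to avoid running off its end is to search for the null within that size only. The sketch below uses the standard function `std::memchr` for this; the wrapper name `bounded_length` is hypothetical and is not part of the C library.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns the string length if a null character occurs
// within the first `size` bytes of buf, or `size` if none is found.
std::size_t bounded_length(const char *buf, std::size_t size) {
    const void *nul = std::memchr(buf, '\0', size);
    if (!nul)
        return size;   // not null-terminated within the given bounds
    return static_cast<std::size_t>(static_cast<const char *>(nul) - buf);
}
```

A result equal to `size` signals that no terminator was found, so the caller can tell an unterminated array such as ca apart from a proper C-style string.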

Caller Is Responsible for Size of a Destination String

The array that we pass as the first argument to strcat and strcpy must be large enough to hold the generated string. The code we show here, although a common usage pattern, is fraught with the potential for serious error:

          // Dangerous: What happens if we miscalculate the size of largeStr?
          char largeStr[16 + 18 + 2];         // will hold cp1 a space and cp2
          strcpy(largeStr, cp1);              // copies cp1 into largeStr
          strcat(largeStr, " ");              // adds a space at end of largeStr
          strcat(largeStr, cp2);              // concatenates cp2 to largeStr
          // prints A string example A different string
          cout << largeStr << endl;

The problem is that we could easily miscalculate the size needed in largeStr. Similarly, if we later change the sizes of the strings to which either cp1 or cp2 point, then the calculated size of largeStr will be wrong. Unfortunately, programs similar to this code are widely distributed. Programs with such code are error-prone and often lead to serious security vulnerabilities.
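One way to reduce the risk of a miscounted size is to compute it from the strings themselves. The following sketch, which is not from the book, does this with `std::strlen` and a `std::vector<char>` buffer; the function name `join_with_space` is hypothetical, and `data()` on a vector assumes C++11.

```cpp
#include <cstring>
#include <vector>

// Hypothetical helper: joins a and b with a single space. The buffer size is
// computed from the actual lengths instead of being hard-coded.
std::vector<char> join_with_space(const char *a, const char *b) {
    // characters of a + one space + characters of b + terminating null
    std::vector<char> buf(std::strlen(a) + 1 + std::strlen(b) + 1);
    std::strcpy(buf.data(), a);    // copies a, including its null
    std::strcat(buf.data(), " ");  // appends the separating space
    std::strcat(buf.data(), b);    // appends b
    return buf;
}
```

With the two example strings the buffer comes out at 36 elements, matching the hand-computed 16 + 18 + 2, but it stays correct if either string changes.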

When Using C-Style Strings, Use the `strn` Functions

If you must use C-style strings, it is usually safer to use the strncat and strncpy functions instead of strcat and strcpy:

          char largeStr[16 + 18 + 2]; // to hold cp1, a space, and cp2
          strncpy(largeStr, cp1, 17); // size to copy includes the null
          strncat(largeStr, " ", 2);  // pedantic, but a good habit
          strncat(largeStr, cp2, 19); // cp2 has 18 characters; strncat then writes a null

The trick to using these versions is to properly calculate the value that controls how many characters get copied. In particular, we must always remember to account for the null when copying or concatenating characters. We must allocate space for the null because that is the character that terminates largeStr after each call. Let's walk through these calls in detail:

On the call to strncpy, we ask to copy 17 characters: all the characters in cp1 plus the null. Leaving room for the null is necessary so that largeStr is properly terminated. After the strncpy call, largeStr has a strlen value of 16. Remember, strlen counts the characters in a C-style string, not including the null.

When we call strncat, we pass a count of 2, enough to cover the space and the null that terminates the string literal. After this call, largeStr has a strlen of 17. The null that had ended largeStr is overwritten by the space that we appended, and a new null is written after that space. (Strictly speaking, strncat's count limits only the characters taken from the source; it always writes a terminating null after them, so the destination needs room for one character more than the count.)

When we append cp2 in the second call, we again pass a count that covers all the characters in cp2 plus the null. After this call, the strlen of largeStr is 35: 16 characters from cp1, 18 from cp2, and 1 for the space that separates the two strings.

The array size of largeStr remains 36 throughout, which includes the element for the terminating null.

These operations are safer than the simpler versions that do not take a size argument, as long as we calculate the size argument correctly. If we ask to copy or concatenate more characters than the target array can hold, we will still overrun that array. If the string we are copying from or concatenating is bigger than the requested size, we will inadvertently truncate the new version; note also that strncpy writes no null when the source holds at least as many characters as the count. Truncating is safer than overrunning the array, but it is still an error.
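The truncation case deserves care because strncpy leaves the destination unterminated when the source does not fit. The following sketch, with the hypothetical name `copy_truncating`, ties the count to the destination size and terminates the result explicitly; it assumes the destination holds at least one element.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies src into dest (capacity dest_size, which must be
// at least 1) and always null-terminates. Returns true if src fit completely.
bool copy_truncating(char *dest, std::size_t dest_size, const char *src) {
    std::strncpy(dest, src, dest_size - 1);  // may leave dest unterminated
    dest[dest_size - 1] = '\0';              // so terminate explicitly
    return std::strlen(src) < dest_size;     // false means the copy was cut short
}
```

Returning a flag lets the caller treat truncation as the error it is, rather than silently continuing with a shortened string.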

Whenever Possible, Use Library `string`s

None of these issues matter if we use C++ library strings:

          string largeStr = cp1; // initialize largeStr as a copy of cp1
          largeStr += " ";       // add space at end of largeStr
          largeStr += cp2;       // concatenate cp2 onto end of largeStr

Now the library handles all memory management, and we need no longer worry if the size of either string changes.
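Wrapped as a function, the library-string version might look like the sketch below. The name `join` is hypothetical; both arguments are assumed to be valid null-terminated strings.

```cpp
#include <string>

// Hypothetical helper: joins two C-style strings with a space using std::string.
// No size has to be computed by hand; the string grows as needed.
std::string join(const char *a, const char *b) {
    std::string result = a;  // initialize result as a copy of a
    result += " ";           // add a space at the end
    result += b;             // concatenate b onto the end
    return result;
}
```

When an interface still requires a C-style string, `result.c_str()` gives a null-terminated view of the contents.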

For most applications, in addition to being safer, it is also more efficient to use library strings rather than C-style strings, so C-style strings are best avoided where possible.

posted on 2012-11-24 20:30 silentjesse