Encoding Reference

2011-02-16 14:37  Clingingboy

 

I. ASCII

See the standard ASCII table. ASCII defines only 128 characters (codes 0 to 127).

http://baike.baidu.com/view/15482.htm

Sample code from MSDN:

using System;
using System.Text;

class ASCIIEncodingExample
{
    public static void Main()
    {
        // The encoding.
        ASCIIEncoding ascii = new ASCIIEncoding();

        // A Unicode string with two characters outside the ASCII code range.
        String unicodeString =
            "This Unicode string contains two characters " +
            "with codes outside the ASCII code range, " +
            "Pi (\u03a0) and Sigma (\u03a3).";
        Console.WriteLine("Original string:");
        Console.WriteLine(unicodeString);

        // Save positions of the special characters for later reference.
        int indexOfPi = unicodeString.IndexOf('\u03a0');
        int indexOfSigma = unicodeString.IndexOf('\u03a3');

        // Encode string.
        Byte[] encodedBytes = ascii.GetBytes(unicodeString);
        Console.WriteLine();
        Console.WriteLine("Encoded bytes:");
        foreach (Byte b in encodedBytes)
        {
            Console.Write("[{0}]", b);
        }
        Console.WriteLine();

        // Notice that the special characters have been replaced with
        // the value 63, which is the ASCII character code for '?'.
        Console.WriteLine();
        Console.WriteLine(
            "Value at position of Pi character: {0}",
            encodedBytes[indexOfPi]
        );
        Console.WriteLine(
            "Value at position of Sigma character: {0}",
            encodedBytes[indexOfSigma]
        );

        // Decode bytes back to string.
        // Notice that Pi and Sigma are lost: they come back as '?'.
        String decodedString = ascii.GetString(encodedBytes);
        Console.WriteLine();
        Console.WriteLine("Decoded bytes:");
        Console.WriteLine(decodedString);
    }
}

Output:

[image: console output of the ASCII sample]
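Because ASCII covers only codes 0 to 127, it can be useful to check a string before encoding it so that nothing is silently replaced with '?'. The following is a minimal sketch; the class AsciiCheck and the helper IsAscii are hypothetical names introduced here for illustration and are not part of the MSDN sample.

```csharp
using System;

public static class AsciiCheck
{
    // Hypothetical helper: returns true only when every UTF-16 code unit
    // in the string falls inside the 128-character ASCII range (0..127).
    public static bool IsAscii(string text)
    {
        foreach (char c in text)
        {
            if (c > 127)
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsAscii("Pi and Sigma"));   // True
        Console.WriteLine(IsAscii("Pi (\u03a0)"));    // False
    }
}
```

A string that fails this check will lose characters when passed through ASCIIEncoding, as the sample above shows.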

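II. Unicode

UTF-8 encoding represents each code point as a sequence of 1 to 4 bytes.

Prefer UTF-8 whenever possible: the .NET implementation (UTF8Encoding) is optimized for speed.
Reference: http://baike.baidu.com/view/40801.htm

Example: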

using System;
using System.Text;

class UTF8EncodingExample
{
    public static void Main()
    {
        // Create a UTF-8 encoding.
        UTF8Encoding utf8 = new UTF8Encoding();

        // A Unicode string with two characters outside an 8-bit code range.
        String unicodeString =
            "This unicode string contains two characters " +
            "with codes outside an 8-bit code range, " +
            "Pi (\u03a0) and Sigma (\u03a3).";
        Console.WriteLine("Original string:");
        Console.WriteLine(unicodeString);

        // Encode the string.
        Byte[] encodedBytes = utf8.GetBytes(unicodeString);
        Console.WriteLine();
        Console.WriteLine("Encoded bytes:");
        foreach (Byte b in encodedBytes)
        {
            Console.Write("[{0}]", b);
        }
        Console.WriteLine();

        // Decode bytes back to string.
        // Notice Pi and Sigma characters are still present.
        String decodedString = utf8.GetString(encodedBytes);
        Console.WriteLine();
        Console.WriteLine("Decoded bytes:");
        Console.WriteLine(decodedString);
    }
}

Output:

[image: console output of the UTF-8 sample]
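The 1-to-4-byte range of UTF-8 can be observed directly by asking the encoder for the byte count of single code points. This is only an illustrative sketch: the class Utf8Lengths, the helper Utf8ByteCount, and the sample characters are choices made here and do not come from the MSDN sample.

```csharp
using System;
using System.Text;

public static class Utf8Lengths
{
    // Hypothetical helper: number of bytes UTF-8 needs for the given string.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    public static void Main()
    {
        string[] samples =
        {
            "A",           // U+0041, ASCII range
            "\u03a0",      // U+03A0, Greek capital Pi
            "\u20ac",      // U+20AC, euro sign
            "\U0001F600"   // U+1F600, outside the Basic Multilingual Plane
        };

        foreach (string s in samples)
        {
            // ConvertToUtf32 returns the code point starting at index 0,
            // combining a surrogate pair when there is one.
            int codePoint = char.ConvertToUtf32(s, 0);
            Console.WriteLine("U+{0:X4}: {1} byte(s)", codePoint, Utf8ByteCount(s));
        }
        // Prints:
        // U+0041: 1 byte(s)
        // U+03A0: 2 byte(s)
        // U+20AC: 3 byte(s)
        // U+1F600: 4 byte(s)
    }
}
```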

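Other encodings:

  1. UTF-7 encoding represents Unicode characters as sequences of 7-bit ASCII characters.
  2. UTF-16 encoding represents each code point as a sequence of one or two 16-bit integers.
  3. UTF-32 encoding represents each code point as a single 32-bit integer.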
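The difference between UTF-16 and UTF-32 in the list above shows up when a code point lies outside the Basic Multilingual Plane. The sketch below is an assumption-labeled illustration: the class EncodingSizes and the helper ByteCount are hypothetical names, while Encoding.Unicode (little-endian UTF-16) and Encoding.UTF32 are the standard .NET encoding instances.

```csharp
using System;
using System.Text;

public static class EncodingSizes
{
    // Hypothetical helper: bytes needed to encode text with the given encoding
    // (no byte order mark is counted).
    public static int ByteCount(Encoding encoding, string text)
    {
        return encoding.GetByteCount(text);
    }

    public static void Main()
    {
        string pi = "\u03a0";         // U+03A0, inside the Basic Multilingual Plane
        string face = "\U0001F600";   // U+1F600, outside it

        // UTF-16: one 16-bit unit for Pi, two (a surrogate pair) for U+1F600.
        Console.WriteLine(ByteCount(Encoding.Unicode, pi));    // 2
        Console.WriteLine(ByteCount(Encoding.Unicode, face));  // 4

        // UTF-32: always one 32-bit integer per code point.
        Console.WriteLine(ByteCount(Encoding.UTF32, pi));      // 4
        Console.WriteLine(ByteCount(Encoding.UTF32, face));    // 4

        // A .NET string is UTF-16 internally, so Length counts 16-bit units.
        Console.WriteLine(face.Length);                        // 2
    }
}
```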