Encoding Reference
2011-02-16 14:37 Clingingboy
1. ASCII
See the standard ASCII table; ASCII defines only 128 characters.
http://baike.baidu.com/view/15482.htm
MSDN sample code:
using System;
using System.Text;

class AsciiEncodingExample
{
    public static void Main()
    {
        // The encoding.
        ASCIIEncoding ascii = new ASCIIEncoding();

        // A Unicode string with two characters outside the ASCII code range.
        String unicodeString =
            "This Unicode string contains two characters " +
            "with codes outside the ASCII code range, " +
            "Pi (\u03a0) and Sigma (\u03a3).";
        Console.WriteLine("Original string:");
        Console.WriteLine(unicodeString);

        // Save positions of the special characters for later reference.
        // ASCII uses one byte per character, so these character indexes
        // are also valid indexes into the encoded byte array.
        int indexOfPi = unicodeString.IndexOf('\u03a0');
        int indexOfSigma = unicodeString.IndexOf('\u03a3');

        // Encode string.
        Byte[] encodedBytes = ascii.GetBytes(unicodeString);
        Console.WriteLine();
        Console.WriteLine("Encoded bytes:");
        foreach (Byte b in encodedBytes)
        {
            Console.Write("[{0}]", b);
        }
        Console.WriteLine();

        // Notice that the special characters have been replaced with
        // the value 63, which is the ASCII character code for '?'.
        Console.WriteLine();
        Console.WriteLine(
            "Value at position of Pi character: {0}",
            encodedBytes[indexOfPi]
        );
        Console.WriteLine(
            "Value at position of Sigma character: {0}",
            encodedBytes[indexOfSigma]
        );

        // Decode bytes back to string.
        // Notice missing Pi and Sigma characters.
        String decodedString = ascii.GetString(encodedBytes);
        Console.WriteLine();
        Console.WriteLine("Decoded bytes:");
        Console.WriteLine(decodedString);
    }
}
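Output: the encoded bytes at the positions of Pi and Sigma are both 63 (the ASCII code for '?'), so the decoded string shows '?' in place of each of them.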
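The silent replacement with '?' shown in the sample can hide data loss. As a minimal sketch, assuming a strict check is wanted, an ASCII encoding can be requested with exception fallbacks so that unencodable characters raise an error instead. The class and method names here (AsciiStrictDemo, IsAsciiEncodable) are hypothetical and are not part of the MSDN sample.

```csharp
using System;
using System.Text;

public static class AsciiStrictDemo
{
    // Hypothetical helper: returns true when every character of the
    // text can be represented in ASCII without substitution.
    public static bool IsAsciiEncodable(string text)
    {
        // Assumption: both fallbacks are set to throw rather than
        // substitute '?', which is the default replacement behavior.
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strictAscii.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // At least one character lies outside the ASCII range.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsAsciiEncodable("plain text"));   // True
        Console.WriteLine(IsAsciiEncodable("Pi (\u03a0)"));  // False
    }
}
```

Whether to throw or to substitute depends on the application; the default replacement is convenient for display, while the strict variant suits cases where losing characters must be detected.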
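2. Unicode
UTF-8 encoding represents each code point as a sequence of one to four bytes.
Prefer this encoding where possible; its .NET implementation (UTF8Encoding) is optimized for speed.
Reference: http://baike.baidu.com/view/40801.htm
Example: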
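The one-to-four-byte range can be checked directly. The following is a small sketch that measures the UTF-8 length of one character from each size class; the class and method names (Utf8LengthDemo, Utf8ByteCount) and the chosen sample characters are illustrative assumptions.

```csharp
using System;
using System.Text;

public static class Utf8LengthDemo
{
    // Hypothetical helper: number of bytes UTF-8 needs for the text.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    public static void Main()
    {
        Console.WriteLine(Utf8ByteCount("A"));           // U+0041, prints 1
        Console.WriteLine(Utf8ByteCount("\u03a0"));      // U+03A0 (Pi), prints 2
        Console.WriteLine(Utf8ByteCount("\u20ac"));      // U+20AC (euro sign), prints 3
        Console.WriteLine(Utf8ByteCount("\U0001F600"));  // U+1F600, prints 4
    }
}
```

Code points up to U+007F take one byte, so text that is pure ASCII has identical bytes in ASCII and UTF-8.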
using System;
using System.Text;

class Utf8EncodingExample
{
    public static void Main()
    {
        // Create a UTF-8 encoding.
        UTF8Encoding utf8 = new UTF8Encoding();

        // A Unicode string with two characters outside an 8-bit code range.
        String unicodeString =
            "This unicode string contains two characters " +
            "with codes outside an 8-bit code range, " +
            "Pi (\u03a0) and Sigma (\u03a3).";
        Console.WriteLine("Original string:");
        Console.WriteLine(unicodeString);

        // Encode the string.
        Byte[] encodedBytes = utf8.GetBytes(unicodeString);
        Console.WriteLine();
        Console.WriteLine("Encoded bytes:");
        foreach (Byte b in encodedBytes)
        {
            Console.Write("[{0}]", b);
        }
        Console.WriteLine();

        // Decode bytes back to string.
        // Notice Pi and Sigma characters are still present.
        String decodedString = utf8.GetString(encodedBytes);
        Console.WriteLine();
        Console.WriteLine("Decoded bytes:");
        Console.WriteLine(decodedString);
    }
}
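Output: Pi is encoded as the two bytes [206][160] and Sigma as [206][163], and both characters are still present in the decoded string.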
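Other encodings:
- UTF-7 represents Unicode characters as sequences of 7-bit ASCII characters.
- UTF-16 represents each code point as a sequence of one or two 16-bit integers.
- UTF-32 represents each code point as a single 32-bit integer.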
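The UTF-16 and UTF-32 sizes listed above can be sketched the same way, by dividing the encoded byte count by the size of one code unit. The names below (CodeUnitDemo, Utf16Units, Utf32Units) are hypothetical. UTF-7 is left out of the sketch because newer .NET versions mark Encoding.UTF7 as obsolete.

```csharp
using System;
using System.Text;

public static class CodeUnitDemo
{
    // Hypothetical helper: number of 16-bit code units in UTF-16.
    // GetByteCount does not include a byte order mark.
    public static int Utf16Units(string text)
    {
        return Encoding.Unicode.GetByteCount(text) / 2;
    }

    // Hypothetical helper: number of 32-bit code units in UTF-32.
    public static int Utf32Units(string text)
    {
        return Encoding.UTF32.GetByteCount(text) / 4;
    }

    public static void Main()
    {
        string pi = "\u03a0";         // inside the Basic Multilingual Plane
        string emoji = "\U0001F600";  // outside it, needs a surrogate pair

        Console.WriteLine(Utf16Units(pi));     // prints 1
        Console.WriteLine(Utf16Units(emoji));  // prints 2
        Console.WriteLine(Utf32Units(pi));     // prints 1
        Console.WriteLine(Utf32Units(emoji));  // prints 1
    }
}
```

Because .NET strings are stored as UTF-16, string.Length reports code units rather than code points, so it returns 2 for the second sample string.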