Reading and Writing Unicode Strings (UTF-8, UTF-16, …)
Writing a UTF-16 string:
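using System.IO;
using System.Text;

class TestDataGenerator
{
    // Writes recordLength copies of the character 的 to fileName, encoded as UTF-16
    public static void CreateNewTestDataFile(string fileName, int recordLength)
    {
        using (FileStream fs = File.Create(fileName))
        {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < recordLength; i++)
            {
                sb.Append('的');
            }
            // Encode the string as UTF-16 little-endian bytes (no byte order mark is written)
            byte[] content = Encoding.Unicode.GetBytes(sb.ToString());
            fs.Write(content, 0, content.Length);
        }
    }
}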
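The key line is Encoding.Unicode.GetBytes(sb.ToString()), which encodes the string as a UTF-16 byte stream (Encoding.Unicode is UTF-16 little-endian).
Call the function to generate a test file containing 100 characters: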
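To see exactly what Encoding.Unicode produces, here is a minimal sketch, assuming C# 9 or later top-level statements, that dumps the bytes for a single 的 (U+7684). Because the encoding is little-endian, the low byte comes first. GetBytes does not write a byte order mark; the BOM is only returned separately by GetPreamble().

```csharp
using System;
using System.Text;

// Encoding.Unicode is UTF-16 little-endian
byte[] utf16 = Encoding.Unicode.GetBytes("的");   // U+7684, low byte first
byte[] bom = Encoding.Unicode.GetPreamble();      // the BOM; GetBytes never includes it

Console.WriteLine(BitConverter.ToString(utf16)); // 84-76
Console.WriteLine(BitConverter.ToString(bom));   // FF-FE
```

This is also why the generated test file has no extra header bytes: only the encoded characters are written.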
TestDataGenerator.CreateNewTestDataFile("test.txt", 100);
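The file is 200 bytes. UTF-16 stores every character in the Basic Multilingual Plane in 2 bytes, including Chinese characters such as 的 (and ASCII characters too), so 100 characters take 200 bytes. Characters outside the BMP, such as most emoji, take 4 bytes as a surrogate pair.
Reading the Unicode string back: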
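The 200-byte size can be checked without touching the disk. This sketch (again assuming C# top-level statements) uses Encoding.GetByteCount to compare UTF-16 and UTF-8 on the same 100-character string, and also confirms two details: an ASCII character takes 2 bytes in UTF-16, and a character outside the BMP, such as the emoji U+1F600, takes 4 bytes and counts as 2 in string.Length.

```csharp
using System;
using System.Text;

string text = new string('的', 100);   // same content as test.txt
string emoji = "\U0001F600";            // U+1F600, outside the BMP

int utf16Bytes = Encoding.Unicode.GetByteCount(text);   // 100 chars x 2 bytes
int utf8Bytes = Encoding.UTF8.GetByteCount(text);       // 100 chars x 3 bytes
int asciiUtf16 = Encoding.Unicode.GetByteCount("A");    // ASCII is also 2 bytes in UTF-16
int emojiUtf16 = Encoding.Unicode.GetByteCount(emoji);  // surrogate pair: 4 bytes

Console.WriteLine($"{utf16Bytes} {utf8Bytes} {asciiUtf16} {emojiUtf16} {emoji.Length}");
// 200 300 2 4 2
```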
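using System.IO;
using System.Text;

// Read the first 100 bytes of test.txt (50 UTF-16 characters)
byte[] blob = new byte[100];
int bytesRead;
using (FileStream fs = new FileStream("test.txt", FileMode.Open, FileAccess.Read))
{
    bytesRead = fs.Read(blob, 0, blob.Length); // may return fewer bytes than requested
}
string strUtf16 = Encoding.Unicode.GetString(blob, 0, bytesRead); // matching decoder
string strUtf8 = Encoding.UTF8.GetString(blob, 0, bytesRead);     // wrong decoder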
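The Watch window shows that decoding these bytes as UTF-8 produces garbled text. The bytes were written as UTF-16, and UTF-8 stores characters such as Chinese characters in 3 bytes rather than UTF-16's 2, so a UTF-8 decoder reads the same bytes as something entirely different.
Next, try storing the characters as UTF-8 instead, by changing the encoding line in CreateNewTestDataFile: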
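The garbled UTF-8 result can be reproduced in memory. In UTF-16LE each 的 is the byte pair 84 76. To a UTF-8 decoder, 0x84 is a continuation byte with no lead byte in front of it, so it is invalid and replaced with U+FFFD, while 0x76 is simply the ASCII letter v. A sketch, assuming the default replacement behaviour of Encoding.UTF8 (it substitutes U+FFFD rather than throwing):

```csharp
using System;
using System.Text;

// 100 bytes = 50 copies of 的 in UTF-16LE, the same bytes read from test.txt
byte[] blob = Encoding.Unicode.GetBytes(new string('的', 50));

string strUtf16 = Encoding.Unicode.GetString(blob); // matching decoder
string strUtf8 = Encoding.UTF8.GetString(blob);     // wrong decoder

Console.WriteLine(strUtf16 == new string('的', 50)); // True
// Each 84 76 pair decodes to U+FFFD (stray continuation byte) followed by 'v'
Console.WriteLine(strUtf8[0] == '\uFFFD' && strUtf8[1] == 'v'); // True
Console.WriteLine(strUtf8.IndexOf('的') < 0); // True: no 的 survives
```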
byte[] content = Encoding.UTF8.GetBytes(sb.ToString());
fs.Write(content, 0, content.Length);
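After refreshing, the file's properties show that the size is now 300 bytes: UTF-8 encodes each 的 in 3 bytes, so 100 × 3 = 300.
Likewise, decoding UTF-8 bytes as UTF-16 does not work either; strUtf16 shows garbled text.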
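The reverse mistake can be reproduced the same way. This sketch writes 100 copies of 的 as UTF-8 (300 bytes, no BOM, like the modified CreateNewTestDataFile) to a temporary file whose name, test-utf8.txt, is only a hypothetical placeholder, then decodes the bytes both ways. The UTF-16 decoder pairs up E7 9A, 84 E7, 9A 84, … and yields 150 unrelated characters; only the decoder that matches the writer recovers the text. Since GetBytes writes no BOM, nothing in the file records which encoding was used, so the reader has to know it.

```csharp
using System;
using System.IO;
using System.Text;

string original = new string('的', 100);
string path = Path.Combine(Path.GetTempPath(), "test-utf8.txt"); // hypothetical file name

// Write UTF-8 without a BOM, as the modified CreateNewTestDataFile does
File.WriteAllBytes(path, Encoding.UTF8.GetBytes(original));

byte[] blob = File.ReadAllBytes(path);
string strUtf16 = Encoding.Unicode.GetString(blob); // wrong decoder: 2 bytes per char
string strUtf8 = Encoding.UTF8.GetString(blob);     // matching decoder

Console.WriteLine(blob.Length);         // 300
Console.WriteLine(strUtf16.Length);     // 150
Console.WriteLine(strUtf8 == original); // True
File.Delete(path);
```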