ClearCanvas Source Code Analysis 3: Garbled Chinese Characters

　　Some DICOM (.dcm) files produced in China contain Chinese text, and when ClearCanvas displays those images, some of the information shows up garbled.

　　The garbling itself is a simple problem: the bytes are being decoded with the wrong character set.
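To make the cause concrete, here is a minimal sketch (not ClearCanvas code) that encodes a Chinese name as GB2312 bytes and then decodes those same bytes twice, once with the right encoding and once with a wrong one. The class name `MojibakeDemo`, the helper `DecodeGb2312BytesAs` and the sample name are made up for illustration. It assumes .NET Core / .NET 5+, where legacy code pages have to be registered first.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    static MojibakeDemo()
    {
        // Assumption: running on .NET Core / .NET 5+, where "gb2312" is only
        // available after registering the code pages provider. On .NET Framework
        // the encoding is built in and this line can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: encode text as GB2312 bytes (standing in for the bytes
    // read from a DCM file), then decode them with the named encoding.
    public static string DecodeGb2312BytesAs(string text, string encodingName)
    {
        byte[] raw = Encoding.GetEncoding("gb2312").GetBytes(text);
        return Encoding.GetEncoding(encodingName).GetString(raw);
    }

    public static void Main()
    {
        string name = "张三";
        // Same bytes, right character set: the name comes back intact.
        Console.WriteLine(DecodeGb2312BytesAs(name, "gb2312"));
        // Same bytes, wrong character set: every byte becomes an unrelated Latin-1 character.
        Console.WriteLine(DecodeGb2312BytesAs(name, "iso-8859-1"));
    }
}
```

The bytes never change between the two calls. Only the character set used to interpret them does, which is why the fix is about picking the encoding and not about repairing data.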

　　First, let's look at how ClearCanvas parses characters (that is, the bytes read from the DCM file).

This is the character conversion implementation. It looks quite complicated. Does converting a byte array really take this much code?

Taken literally, the name means: parsing of the specific character set.

　　I couldn't follow it myself; there are so many standards involved.

　　ClearCanvas's own explanation is quoted below; experts may want to work through it.

// What does "isomorphic code page" mean? And why use the Arabic code page (Windows-1256) as
// isomorphic code page?
//
// The use of the isomorphic code page is in enabling us to store DICOM strings in Unicode form,
// for example, storing the Patient's Name tag data in a SQL Server 2005 database, so that
// the tag data can be, e.g. conveyed unaltered to a C-FIND query SCU. Note that the goal isn't
// to convert the DICOM string into a Unicode string, i.e. not to convert a string of double-byte
// characters into Unicode Kanji characters; the goal is to represent the double-byte characters
// in Unicode so that it can be stored or transmitted as if we were dealing with the original
// DICOM source data. For example, if the DICOM string contains an escape sequence, the
// escape sequence should be preserved in the Unicode string.
//
// In order to accomplish this, an appropriate code page must be used. How does the code page
// fit into this? To convert to and from DICOM string data and Unicode data, you must specify to
// the converter which 'code page' the DICOM data is or is to be represented in, since there is,
// and there will not be anything inherent in the string that conveys encoding information.
//
// Why must the appropriate code page be used? A code page basically serves as a mapping from
// single bytes to characters (in a logical sense). For example, the byte \x1a represents the
// Escape character. The problem arises, however, when for certain code pages, certain bytes
// have no mapping to characters. Thus, when this code page is used to convert DICOM data
// to Unicode data, if the converter runs into these problem bytes, it cannot represent the
// byte as a Unicode character (and will represent it as a question mark '?').
//
// This applies in the reverse direction as well. Which finally brings us to Windows-1256.
// As it turns out, the Arabic code page is the only one for which every byte value (00 to ff)
// has a character representation in Unicode. Therefore, I call it the Isomorphic Code Page:
// it allows characters to be encoded in both directions and still 'look' the same.
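The round-trip property described in that comment can be checked with a few lines. This is only a sketch, not ClearCanvas code: the class `IsomorphicCodePageDemo` and the helper `CountRoundTrippedBytes` are names I made up, and it again assumes .NET Core / .NET 5+ with the code pages provider registered. It pushes every byte value from 00 to ff through bytes -> string -> bytes and counts how many come back unchanged.

```csharp
using System;
using System.Text;

public static class IsomorphicCodePageDemo
{
    static IsomorphicCodePageDemo()
    {
        // Assumption: .NET Core / .NET 5+; on .NET Framework this line can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: counts how many of the 256 byte values survive
    // a bytes -> string -> bytes round trip through the given code page.
    public static int CountRoundTrippedBytes(int codePage)
    {
        Encoding encoding = Encoding.GetEncoding(codePage);

        byte[] all = new byte[256];
        for (int i = 0; i < all.Length; i++)
        {
            all[i] = (byte)i;
        }

        string asText = encoding.GetString(all);
        byte[] back = encoding.GetBytes(asText);

        int survived = 0;
        for (int i = 0; i < all.Length && i < back.Length; i++)
        {
            if (back[i] == all[i])
            {
                survived++;
            }
        }
        return survived;
    }

    public static void Main()
    {
        // 1256 = Windows-1256 (Arabic), the code page the comment calls isomorphic.
        Console.WriteLine(CountRoundTrippedBytes(1256));
        // 20127 = US-ASCII, where bytes 0x80..0xFF have no character mapping.
        Console.WriteLine(CountRoundTrippedBytes(20127));
    }
}
```

If the comment holds, the first count should be the full 256, while US-ASCII loses the upper half because those bytes are replaced during decoding. That loss is exactly the "question mark" problem the comment describes.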

Finally, the solution. Here in China, decoding with gb2312 is enough. If your files use UTF-8, change the encoding accordingly.

Encoding.Default.GetString(buffer);

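On .NET Framework running on Simplified Chinese Windows, Encoding.Default is code page 936, which is the same encoding .NET returns for the name "gb2312". To avoid depending on the system locale, it can also be written explicitly as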

Encoding.GetEncoding("gb2312").GetString(buffer);
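If the files come from mixed sources, hard-coding one encoding may not be enough. Below is a rough sketch, not what ClearCanvas does, that picks the encoding from the value of the DICOM Specific Character Set attribute (0008,0005) and falls back to code page 936 otherwise. The class `DicomTextDecoder` and its methods are hypothetical, only a handful of defined terms are handled, and values that use ISO 2022 escape sequences are deliberately left out.

```csharp
using System;
using System.Text;

public static class DicomTextDecoder
{
    static DicomTextDecoder()
    {
        // Assumption: .NET Core / .NET 5+; on .NET Framework this line can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: maps a few Specific Character Set values to .NET code pages.
    public static Encoding EncodingFor(string specificCharacterSet)
    {
        switch ((specificCharacterSet ?? string.Empty).Trim())
        {
            case "ISO_IR 192":
                return Encoding.UTF8;                // UTF-8
            case "GB18030":
                return Encoding.GetEncoding(54936);  // GB18030
            case "GBK":
                return Encoding.GetEncoding(936);    // GBK
            case "ISO_IR 100":
                return Encoding.GetEncoding(28591);  // ISO 8859-1 (Latin-1)
            default:
                // Assumption: missing or unhandled values are treated as Chinese text
                // in code page 936, following the advice in this post.
                return Encoding.GetEncoding(936);
        }
    }

    // Decodes the raw bytes of a text element using the chosen encoding.
    public static string Decode(byte[] buffer, string specificCharacterSet)
    {
        return EncodingFor(specificCharacterSet).GetString(buffer);
    }

    public static void Main()
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("张三");
        Console.WriteLine(Decode(utf8Bytes, "ISO_IR 192"));

        byte[] gbkBytes = EncodingFor("GBK").GetBytes("张三");
        Console.WriteLine(Decode(gbkBytes, null)); // no character set given: falls back to 936
    }
}
```

The fallback is the debatable part. Many files that contain Chinese text do not declare a character set at all, so defaulting to 936 suits the situation described here, but it would be the wrong guess for files from elsewhere.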

Posted 2017-03-16 09:25 by 走楼梯
