Java字符集
2012-08-22 16:05 javaspring 阅读(491) 评论(0) 编辑 收藏 举报通常而言,把明文的字符序列转换成计算机能理解的二进制序列称为编码,把二进制序列转换成普通人能看懂的明文字符串称为解码。
JDK1.4提供了Charset来处理字节序列和字符序列之间的转换关系,该类包含了用于创建解码器和编码器的方法,还提供了Charset所支持的字符集的方法,Charset类是不可变的。
Charset类提供了一个availableCharset()的静态方法来获取当前JDK所支持的所有字符集,下面小试牛刀
import java.nio.charset.Charset; import java.util.SortedMap; public class Test { public static void main(String[] args) throws Exception { SortedMap<String,Charset> sm= Charset.availableCharsets(); for(String str:sm.keySet()) { System.out.println(sm.get(str)); } } }结果如下
Big5 Big5-HKSCS EUC-JP EUC-KR GB18030 GB2312 GBK IBM-Thai IBM00858 IBM01140 IBM01141 IBM01142 IBM01143 IBM01144 IBM01145 IBM01146 IBM01147 IBM01148 IBM01149 IBM037 IBM1026 IBM1047 IBM273 IBM277 IBM278 IBM280 IBM284 IBM285 IBM297 IBM420 IBM424 IBM437 IBM500 IBM775 IBM850 IBM852 IBM855 IBM857 IBM860 IBM861 IBM862 IBM863 IBM864 IBM865 IBM866 IBM868 IBM869 IBM870 IBM871 IBM918 ISO-2022-CN ISO-2022-JP ISO-2022-JP-2 ISO-2022-KR ISO-8859-1 ISO-8859-13 ISO-8859-15 ISO-8859-2 ISO-8859-3 ISO-8859-4 ISO-8859-5 ISO-8859-6 ISO-8859-7 ISO-8859-8 ISO-8859-9 JIS_X0201 JIS_X0212-1990 KOI8-R KOI8-U Shift_JIS TIS-620 US-ASCII UTF-16 UTF-16BE UTF-16LE UTF-32 UTF-32BE UTF-32LE UTF-8 windows-1250 windows-1251 windows-1252 windows-1253 windows-1254 windows-1255 windows-1256 windows-1257 windows-1258 windows-31j x-Big5-HKSCS-2001 x-Big5-Solaris x-euc-jp-linux x-EUC-TW x-eucJP-Open x-IBM1006 x-IBM1025 x-IBM1046 x-IBM1097 x-IBM1098 x-IBM1112 x-IBM1122 x-IBM1123 x-IBM1124 x-IBM1364 x-IBM1381 x-IBM1383 x-IBM33722 x-IBM737 x-IBM833 x-IBM834 x-IBM856 x-IBM874 x-IBM875 x-IBM921 x-IBM922 x-IBM930 x-IBM933 x-IBM935 x-IBM937 x-IBM939 x-IBM942 x-IBM942C x-IBM943 x-IBM943C x-IBM948 x-IBM949 x-IBM949C x-IBM950 x-IBM964 x-IBM970 x-ISCII91 x-ISO-2022-CN-CNS x-ISO-2022-CN-GB x-iso-8859-11 x-JIS0208 x-JISAutoDetect x-Johab x-MacArabic x-MacCentralEurope x-MacCroatian x-MacCyrillic x-MacDingbat x-MacGreek x-MacHebrew x-MacIceland x-MacRoman x-MacRomania x-MacSymbol x-MacThai x-MacTurkish x-MacUkraine x-MS932_0213 x-MS950-HKSCS x-MS950-HKSCS-XP x-mswin-936 x-PCK x-SJIS_0213 x-UTF-16LE-BOM X-UTF-32BE-BOM X-UTF-32LE-BOM x-windows-50220 x-windows-50221 x-windows-874 x-windows-949 x-windows-950 x-windows-iso2022jp一旦知道了字符集的别名之后,程序就可以调用Charset的forName()来创建对应的Charset对象,forName()方法的参数就是相应字符集的别名
Charset cs=Charset.forName("ISO-8859-1");
Charset cs=Charset.forName("GBK");
获得了Charset对象的之后,就可已通过该对象的newDecode()和newEncode()这两个方法分别返回CharsetDecode和CharsetEncode对象,代表该Charset的解码器和编码器,调用CharsetDecode的decode()方法就可以将ByteBuffer转换成CharBuffer,调用CharsetEncode就可以将CharBuffer或String转换成ByteBuffer。
import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.charset.Charset; import java.nio.charset.CharsetDecoder; import java.nio.charset.CharsetEncoder; public class Test { public static void main(String[] args) throws Exception { Charset cs=Charset.forName("GBK"); CharsetDecoder cd=cs.newDecoder(); CharsetEncoder ce=cs.newEncoder(); CharBuffer cb=CharBuffer.allocate(6); cb.put("张"); cb.put("译"); cb.put("成"); cb.flip(); ByteBuffer bb=ce.encode(cb); for(int i=0;i<bb.capacity();i++) { System.out.println(bb.get(i)); } System.out.println(cd.decode(bb)); } }
Charset还提供了一下方法处理编码问题
CharBuffer decode(ByteBuffer bb)
ByteBuffer encode(CharBuffer cb)
ByteBuffer encode(String str)
我就不解释这三个方法了,估计都能估计出来