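Java character encoding issues
I looked into this today and am writing it down.
Redis is used as the intermediary here; any other I/O channel could be used instead and would behave the same way.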
Test1
String s1 = "我要测试";
String s2 = "I want to test";
String s3 = "경쟁력, 네이버";
redis.lpush("testencode", s1);
redis.lpush("testencode", s2);
redis.lpush("testencode", s3);
System.out.println(redis.lpop("testencode"));
System.out.println(redis.lpop("testencode"));
System.out.println(redis.lpop("testencode"));
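Result: all correct
Note: Java strings are Unicode internally (UTF-16), so if both the sender and the receiver are written in Java, no manual transcoding is needed (provided both ends use the same default encoding).
When Java sends a String to I/O or receives one from I/O, it is converted to and from bytes implicitly. If no charset is given, the platform default encoding is normally used, although a client library may hard-code its own choice. The encoding of the source file itself does not take priority at runtime; it only decides how the compiler reads the string literals, so it has to match what javac expects (the -encoding option).
So here the strings are encoded to UTF-8 when they are sent and decoded from UTF-8 back to Unicode when they are received, which is why nothing goes wrong.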
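To see the same round trip without Redis, here is a minimal sketch that does the encode and decode steps by hand. The class name RoundTripDemo and the helper roundTrip are made up for illustration; the sketch assumes that the I/O layer does nothing more than carry the bytes unchanged.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Encode with an explicit charset, then decode with the same charset.
    // This stands in for "send to I/O" followed by "receive from I/O".
    static String roundTrip(String s, Charset cs) {
        byte[] wire = s.getBytes(cs);   // String (UTF-16 internally) -> bytes
        return new String(wire, cs);    // bytes -> String
    }

    public static void main(String[] args) {
        String[] samples = {"我要测试", "I want to test", "경쟁력, 네이버"};
        // The charset used when getBytes() / new String(bytes) get no argument.
        System.out.println("default charset: " + Charset.defaultCharset());
        for (String s : samples) {
            String back = roundTrip(s, StandardCharsets.UTF_8);
            System.out.println(back + " unchanged: " + s.equals(back));
        }
    }
}
```

Passing the charset explicitly on both sides removes the dependency on the platform default, which is the safer habit when sender and receiver may run on different machines.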
Test2
String s1 = "我要测试";
byte[] key = "testencode".getBytes();
byte[] b1 = s1.getBytes("gb2312"); // encode explicitly instead of relying on the default encoding
redis.lpush(key, b1);
System.out.println(new String(redis.lpop(key),"gb2312"));
//System.out.println(new String(redis.lpop(key)));
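Result: correct
Note: because the bytes were already encoded as gb2312 before sending, the receiver has to decode them as gb2312 too. Using the default (the commented-out line) would decode them with the default encoding, UTF-8 here, and the text would come out garbled.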
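The mismatch can be reproduced without Redis. The sketch below is only an illustration: MismatchDemo and decode are made-up names, and it assumes the JVM ships the GB2312 charset (it is part of the extended charsets in a standard JDK, but it is not one of the charsets every Java platform is required to support).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MismatchDemo {
    // Decode raw bytes with the given charset.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // Assumption: GB2312 is available in this JVM.
        Charset gb2312 = Charset.forName("GB2312");
        String s1 = "我要测试";
        byte[] b1 = s1.getBytes(gb2312);

        // The same text has a different byte length in each encoding.
        System.out.println("GB2312 bytes: " + b1.length);
        System.out.println("UTF-8 bytes:  " + s1.getBytes(StandardCharsets.UTF_8).length);

        // Decoding with the charset used for encoding restores the text.
        System.out.println(decode(b1, gb2312));
        // Decoding with a different charset yields replacement characters.
        System.out.println(decode(b1, StandardCharsets.UTF_8));
    }
}
```

The bytes themselves carry no label saying which encoding produced them, so the receiver has to be told, or has to guess.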
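All of the conversions above assume the original encoding is known. Sometimes the receiver has no way of knowing the original encoding, and then the encoding has to be detected.
I used JCharDet for this; its interface is not well designed and it is rather awkward to use.
Reference: http://blog.csdn.net/chenvsa/article/details/7445569
I modified it a bit: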
import java.io.IOException;

import org.mozilla.intl.chardet.nsDetector;
import org.mozilla.intl.chardet.nsICharsetDetectionObserver;
import org.mozilla.intl.chardet.nsPSMDetector;

public class CharsetDetector {
    private boolean found = false;
    private String result;
    private int lang = nsPSMDetector.ALL;

    public String[] detectCharset(byte[] bytes) throws IOException {
        // Reset the state so the same instance can be used for several calls.
        found = false;
        result = null;

        // Initialize the nsDetector with the language hint.
        nsDetector det = new nsDetector(lang);

        // Set an observer: Notify() is called when a matching charset is found.
        det.Init(new nsICharsetDetectionObserver() {
            public void Notify(String charset) {
                found = true;
                result = charset;
            }
        });

        int len = bytes.length;
        boolean isAscii = det.isAscii(bytes, len);

        // Feed the detector only if the data is not pure ASCII.
        if (!isAscii) {
            det.DoIt(bytes, len, false);
        }
        det.DataEnd();

        String[] prob;
        if (isAscii) {
            found = true;
            prob = new String[] {"ASCII"};
        } else if (found) {
            // The detector settled on exactly one charset.
            prob = new String[] {result};
        } else {
            // No single match: fall back to the list of probable charsets.
            prob = det.getProbableCharsets();
        }
        return prob;
    }

    public String[] detectChineseCharset(byte[] bytes) throws IOException {
        // Narrow the detection to Chinese encodings.
        lang = nsPSMDetector.CHINESE;
        return detectCharset(bytes);
    }
}
Usage:
CharsetDetector cd = new CharsetDetector();
String[] probableSet = {};
try {
probableSet = cd.detectChineseCharset(b1);
} catch (IOException e) {
e.printStackTrace();
}
for (String charset : probableSet)
{
System.out.println(charset);
}
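Since the detector returns a list of candidates, a cheap way to narrow that list with the standard library alone is to try a strict decode with each candidate and drop the ones that fail. This is a different technique from JCharDet's statistical detection, and the sketch below is only that, a sketch: StrictDecodeCheck, decodesCleanly and plausibleCharsets are made-up names. It can rule a charset out, but it cannot prove a charset is the right one, because single-byte charsets such as ISO-8859-1 accept any byte sequence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.ArrayList;
import java.util.List;

public class StrictDecodeCheck {
    // True if the bytes decode under cs without malformed or unmappable input.
    static boolean decodesCleanly(byte[] bytes, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting U+FFFD
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Keep only the candidates that this JVM supports and that decode cleanly.
    static List<String> plausibleCharsets(byte[] bytes, String... candidates) {
        List<String> ok = new ArrayList<>();
        for (String name : candidates) {
            try {
                if (Charset.isSupported(name) && decodesCleanly(bytes, Charset.forName(name))) {
                    ok.add(name);
                }
            } catch (IllegalArgumentException e) {
                // The detector may return a name that is not a legal charset name; skip it.
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        // Assumption: GB2312 is available in this JVM.
        byte[] b1 = "我要测试".getBytes(Charset.forName("GB2312"));
        System.out.println(plausibleCharsets(b1, "UTF-8", "GB2312", "GB18030"));
    }
}
```

For example, the names printed by the usage snippet above could be passed to plausibleCharsets as the candidates.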