Garbled text when reading a file with IO streams: a file encoding problem and how to fix it

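With the code below, a UTF-8 encoded file comes out garbled when read. FileReader decodes with the JVM's default charset (on JDK versions before 18 this follows the operating system, for example GBK on Simplified Chinese Windows), not with the file's actual encoding.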

File myFile = new File("path/to/file");

BufferedReader in = new BufferedReader(new FileReader(myFile));  

It should be changed to name the charset explicitly:

BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(myFile), "UTF-8"));
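
As a minimal sketch of the same fix, the example below uses the `StandardCharsets.UTF_8` constant instead of a charset name string and closes the reader with try-with-resources. The class name `Utf8ReadDemo`, the helper `readFirstLine`, and the temporary file are hypothetical and exist only for illustration.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical demo class, not part of the original post.
final class Utf8ReadDemo {
    // Reads the first line of a file, decoding its bytes explicitly as UTF-8.
    static String readFirstLine(File file) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("utf8-demo", ".txt");
        tmp.deleteOnExit();
        // "\u4e2d\u6587" is the two-character Chinese word for "Chinese text".
        Files.write(tmp.toPath(),
                "\u4e2d\u6587 text".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFirstLine(tmp));
    }
}
```

Passing a `Charset` object rather than a name avoids the checked `UnsupportedEncodingException` that the string-based constructor declares.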

If the file is saved in ANSI encoding (on Simplified Chinese Windows this means GB2312/GBK), read it this way instead:

InputStreamReader reader = new InputStreamReader(new FileInputStream(new File("path/to/file")), "gb2312");
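
To illustrate why the charset passed to the reader has to match the file, here is a small sketch that decodes the same bytes two ways. The class `MojibakeDemo` and its `decode` helper are hypothetical, and the example assumes the running JDK ships the GBK charset (full JDK builds normally do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original post.
final class MojibakeDemo {
    // Decodes bytes with the given charset; the same bytes give different text per charset.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK
        byte[] gbkBytes = "\u4e2d\u6587".getBytes(gbk);

        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(decode(gbkBytes, gbk));
        // Decoding the same bytes as UTF-8 yields replacement characters (garbled).
        System.out.println(decode(gbkBytes, StandardCharsets.UTF_8));
    }
}
```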

Below is the method I use to detect a file's encoding:

// Imports needed at the top of the enclosing class file:
// java.io.BufferedInputStream, java.io.File, java.io.FileInputStream, java.io.IOException

/**
 * Guesses the encoding of an uploaded file.
 * Checks for a byte order mark (BOM) first, then scans for UTF-8
 * multi-byte sequences, and falls back to GBK. This is a heuristic.
 */
public static String get_charset(File file) {
    String charset = "GBK";
    byte[] first3Bytes = new byte[3];
    // try-with-resources closes the stream on every path, including the early return
    try (BufferedInputStream bis = new BufferedInputStream(
            new FileInputStream(file))) {
        boolean checked = false;
        bis.mark(3); // read limit must cover the 3 bytes read below so reset() is valid
        int read = bis.read(first3Bytes, 0, 3);
        if (read == -1)
            return charset; // empty file
        if (first3Bytes[0] == (byte) 0xFF && first3Bytes[1] == (byte) 0xFE) {
            charset = "UTF-16LE";
            checked = true;
        } else if (first3Bytes[0] == (byte) 0xFE
                && first3Bytes[1] == (byte) 0xFF) {
            charset = "UTF-16BE";
            checked = true;
        } else if (first3Bytes[0] == (byte) 0xEF
                && first3Bytes[1] == (byte) 0xBB
                && first3Bytes[2] == (byte) 0xBF) {
            charset = "UTF-8";
            checked = true;
        }
        bis.reset();
        if (!checked) {
            // No BOM: scan the bytes for the first non-ASCII sequence
            while ((read = bis.read()) != -1) {
                if (read >= 0xF0)
                    break;
                if (0x80 <= read && read <= 0xBF) // a lone byte in 0x80-0xBF also counts as GBK
                    break;
                if (0xC0 <= read && read <= 0xDF) {
                    read = bis.read();
                    if (0x80 <= read && read <= 0xBF)
                        // two-byte pattern (0xC0-0xDF)(0x80-0xBF):
                        // could also be GB-encoded, so keep scanning
                        continue;
                    else
                        break;
                } else if (0xE0 <= read && read <= 0xEF) { // can still be wrong, but rarely
                    read = bis.read();
                    if (0x80 <= read && read <= 0xBF) {
                        read = bis.read();
                        if (0x80 <= read && read <= 0xBF) {
                            charset = "UTF-8"; // well-formed three-byte sequence
                            break;
                        } else
                            break;
                    } else
                        break;
                }
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

    return charset;
}
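
The byte-range scan above stops at the first three-byte sequence it finds. As an alternative sketch (not the method used in this post), a strict `CharsetDecoder` can check whether an entire byte array is well-formed UTF-8. The class `Utf8Validator` and the method `isValidUtf8` are hypothetical names, and the example assumes the content fits in memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original post.
final class Utf8Validator {
    // Returns true only if every byte sequence in the array is well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        // The GBK encoding of the same two characters, written out as raw bytes.
        byte[] gbk = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        System.out.println(isValidUtf8(utf8));
        System.out.println(isValidUtf8(gbk));
    }
}
```

Note that pure ASCII content passes this check too, since ASCII is valid in both UTF-8 and GBK; in that case either charset decodes it identically.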

When calling it, branch on whether the detected encoding is UTF-8 or ANSI (GBK):

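String charset = get_charset(new File(filepath));
BufferedReader br;
// Compare strings with equals(), not ==, which only compares references
if ("GBK".equals(charset)) {
    // GBK is a superset of GB2312, so it decodes both
    br = new BufferedReader(new InputStreamReader(
            new FileInputStream(new File(filepath)), "GBK"));
} else if ("UTF-8".equals(charset)) {
    br = new BufferedReader(new InputStreamReader(
            new FileInputStream(filepath), "UTF-8"));
} else {
    // UTF-16LE / UTF-16BE: pass the detected name through so br is never null
    br = new BufferedReader(new InputStreamReader(
            new FileInputStream(filepath), charset));
}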
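
One detail to watch when the detection found a BOM: Java's UTF-8, UTF-16LE and UTF-16BE decoders do not strip it, so the first character read is U+FEFF. The sketch below opens a reader with the detected charset name and skips that character if present. The class `BomAwareReader` and the method `open` are hypothetical names introduced for illustration.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original post.
final class BomAwareReader {
    // Opens a reader for the given charset name and skips a leading BOM (U+FEFF) if present.
    static BufferedReader open(File file, String charsetName) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), Charset.forName(charsetName)));
        try {
            br.mark(1);              // remember the start so one char can be un-read
            int first = br.read();
            if (first != '\uFEFF') {
                br.reset();          // not a BOM (or empty file): rewind
            }
            return br;
        } catch (IOException e) {
            br.close();              // do not leak the stream if the probe fails
            throw e;
        }
    }
}
```

A caller could use it as `BufferedReader br = BomAwareReader.open(new File(filepath), charset);`, where `charset` is the name returned by the detection method.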
posted @ 2017-11-23 15:48  飞鸿踏雪不留痕