Java IO: Encoding (Code Tables, Encoding, Decoding, Conversion Streams)

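Encoding

What is encoding?

A computer stores everything as binary, yet what we see on screen are characters such as 中国, a, or 1.

The computer does not store the characters themselves. Instead, each character is assigned a number according to a fixed rule. When the user types a, the number that corresponds to a is what gets stored. Mapping characters to numbers in this way is called encoding.

A computer can only work with binary data.

To make computers usable with the writing systems of every country, each character is represented by a number, and these one-to-one correspondences are collected into a table. That table is a code table (also called a character set or charset).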
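As a small illustration of "a character is stored as a number", the sketch below prints the number Unicode assigns to a few characters. The class and method names (CodePoints, codeOf) are made up for this example, and the Chinese character is written as a Unicode escape so the result does not depend on how the source file itself is encoded.

```java
public class CodePoints {
    // Returns the number Java stores for the character (its UTF-16 code unit).
    static int codeOf(char c) {
        return c; // a char widens to int without any conversion
    }

    public static void main(String[] args) {
        System.out.println(codeOf('a'));                            // 97
        System.out.println(codeOf('1'));                            // 49
        System.out.println(Integer.toHexString(codeOf('\u4e2d')));  // 4e2d (the character 中)
    }
}
```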

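For example:

Take the Chinese character 中. (The numbers below are made up for illustration; they are not the real code values.)

In UTF-8, 中 might map to: utf-8 --> 100

In GBK it would probably not be 100: gbk --> 150

Clearly, the same character maps to different numbers in different encodings.

Different countries and regions use different code tables:

GBK is used in mainland China.

Big5 is used for Traditional Chinese in Taiwan, so Big5 does not recognize a character that exists only in Simplified Chinese.

There is also ASCII, the American Standard Code for Information Interchange.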

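Code tables

Common code tables:

ASCII	American Standard Code for Information Interchange. Uses 7 bits of one byte.
ISO8859-1	Latin code table (Western European). Uses all 8 bits of one byte.
GB2312	Chinese code table for Simplified Chinese.
GBK	Extension of GB2312 that adds many more Chinese characters and symbols.
Unicode	International standard that assigns a number (code point) to the characters of almost all writing systems. Java represents strings internally as UTF-16, which uses two bytes for most characters and four bytes for characters outside the Basic Multilingual Plane.
UTF-8	Variable-length encoding of Unicode that uses one to four bytes per character.

The ones we will meet most often are ISO8859-1, GBK, and UTF-8.

ISO8859-1 is also known as Latin-1 or the "Western European" encoding.

ASCII covers only English letters, digits, punctuation, and control characters, and it does not fill all 256 positions of a byte. ISO8859-1 builds on ASCII and adds 96 letters and symbols in the previously unused range 0xA0-0xFF for languages that write Latin letters with diacritics, such as German and French. It is still a single-byte encoding, just more complete than ASCII.

From these tables it is clear that the Chinese character 中 has no code in ISO8859-1.

A character can also have different codes in two code tables. Some characters are shared between tables: for example, Big5 and GBK both contain the characters that are written the same way in Simplified and Traditional Chinese, but each table assigns them different numbers.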
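To check the claim that 中 has no code in ISO8859-1, the following sketch asks the charset's encoder directly. Latin1Check and latin1CanEncode are hypothetical names chosen for this example; the calls themselves are standard java.nio.charset APIs.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Check {
    // True if ISO-8859-1 has a code for the character.
    static boolean latin1CanEncode(char c) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        System.out.println(latin1CanEncode('\u00e9')); // é: true
        System.out.println(latin1CanEncode('\u4e2d')); // 中: false

        // getBytes does not fail on such a character; it substitutes '?' (byte 63).
        byte[] bytes = "\u4e2d".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(bytes.length + " " + bytes[0]); // 1 63
    }
}
```

The silent substitution is worth knowing about: once 中 has been turned into '?', the original character cannot be recovered from the bytes.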

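For example:

Suppose 中国 is saved with GBK and is stored as 100 200 (illustrative numbers).

What happens if the file is opened with Big5? ...

Each encoding maps those numbers differently.

So whatever encoding was used to write the data must also be used to read it.

ISO8859-1: one byte per character.

GBK: one or two bytes. ASCII characters take one byte and Chinese characters take two.

UTF-8: the widely promoted universal encoding. It is variable length, one to four bytes per character. English letters take one byte and most Chinese characters take three, which saves space for text that is mostly ASCII.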
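The byte counts above can be checked with a short program. This is only a sketch: ByteLengths and byteLength are names invented here, and because GBK is not one of the charsets every Java runtime is required to provide, the GBK part is guarded with Charset.isSupported.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteLengths {
    // Number of bytes the text occupies once encoded with the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String ascii = "a";
        String han = "\u4e2d"; // 中
        System.out.println(byteLength(ascii, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(byteLength(ascii, StandardCharsets.UTF_8));      // 1
        System.out.println(byteLength(han, StandardCharsets.UTF_8));        // 3

        // GBK is optional in a Java runtime, so check before using it.
        if (Charset.isSupported("GBK")) {
            Charset gbk = Charset.forName("GBK");
            System.out.println(byteLength(ascii, gbk)); // 1
            System.out.println(byteLength(han, gbk));   // 2
        }

        // A character outside the Basic Multilingual Plane needs four bytes in UTF-8.
        String supplementary = new String(Character.toChars(0x1F600));
        System.out.println(byteLength(supplementary, StandardCharsets.UTF_8)); // 4
    }
}
```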

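Encoding:

String ---> byte array

The getBytes() methods of String perform encoding: they convert a string to its binary form, and an overload lets you name the code table.

If no code table is given, the method uses the JVM's default charset, which has traditionally been taken from the operating system.

Note: on Windows in mainland China the default is usually GBK. Since JDK 18 the default charset is UTF-8 unless configured otherwise.

In a Java program, System.getProperty("file.encoding") returns the current default encoding.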
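Because the default differs between machines, it can help to print it and to pass an explicit charset whenever the bytes must be predictable. The sketch below shows both; DefaultCharsetDemo and encodeUtf8 are illustrative names, not part of any library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetDemo {
    // Encodes with an explicit charset so the result does not depend on the machine.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The charset that getBytes() with no argument would use on this JVM.
        System.out.println(Charset.defaultCharset());
        System.out.println(System.getProperty("file.encoding"));

        // 中 encoded as UTF-8 is always the same three bytes.
        System.out.println(Arrays.toString(encodeUtf8("\u4e2d"))); // [-28, -72, -83]
    }
}
```

Using the StandardCharsets constants also avoids the checked UnsupportedEncodingException that the getBytes(String) overload declares.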

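Decoding:

byte array ---> String

This is done by the String constructors.

Note: decode with the same character set (code table) that was used for encoding, otherwise the result is very likely garbled. The exception is a compatible character set: text encoded with a subset such as GB2312 can be decoded with its superset GBK.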

import java.util.Arrays;

// Encoding and decoding. The printed values assume the default charset is GBK.
public class EncodeDecodeDemo {
    public static void main(String[] args) throws Exception {
        String value = System.getProperty("file.encoding");
        System.out.println("Default encoding: " + value);

        String str = "中";
        // Encode
        byte[] bytes = str.getBytes();         // default charset
        byte[] bytes2 = str.getBytes("gbk");   // d6d0
        byte[] bytes3 = str.getBytes("utf-8"); // e4b8ad

        System.out.println(Arrays.toString(bytes));  // [-42, -48] when the default is GBK
        System.out.println(Arrays.toString(bytes2)); // [-42, -48]
        System.out.println(Arrays.toString(bytes3)); // [-28, -72, -83]

        // Decode
        // Encoded as GBK, decoded as UTF-8: garbled
        String str2 = new String(bytes2, "utf-8");
        System.out.println(str2); // ??

        // Encoded as UTF-8, decoded as GBK: garbled
        str2 = new String(bytes3, "gbk");
        System.out.println(str2); // 涓?
        // GBK is a superset of GB2312, so this works
        str = new String("中国".getBytes("gb2312"), "gbk");
        System.out.println(str); // 中国
    }
}
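The String constructor never complains about a wrong charset; it quietly produces replacement characters. If a program needs to notice the mismatch, one option is a CharsetDecoder configured to report errors. The following is a sketch under that assumption; StrictDecode and decodesCleanly are hypothetical names, and the byte values are the ones shown in the listing above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Returns true if the bytes form valid text in the given charset.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false; // the bytes are not valid in this charset
        }
    }

    public static void main(String[] args) {
        byte[] gbkBytes = {(byte) 0xD6, (byte) 0xD0};               // 中 in GBK
        byte[] utf8Bytes = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD}; // 中 in UTF-8
        System.out.println(decodesCleanly(utf8Bytes, StandardCharsets.UTF_8)); // true
        System.out.println(decodesCleanly(gbkBytes, StandardCharsets.UTF_8));  // false
    }
}
```

A clean decode does not prove the charset is the right one. ISO-8859-1, for instance, accepts every byte sequence, so this check is most useful with strict encodings such as UTF-8.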

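A file can be saved in any encoding, but it has to be decoded with the matching one.

Character streams do the encoding and decoding for us automatically. When we write a Chinese character, the character stream encodes it before it is stored; when we read, the character stream decodes the bytes so that we see a character again. This is needed because a file only ever holds binary data.

When copying an image, however, the data is raw binary rather than meaningful characters, so it cannot be converted through a code table.

Drawbacks of character streams:

1. They cannot copy images or video.

2. Files should be copied with byte streams, not character streams. Reading through a character stream decodes the data and writing encodes it again. That work is unnecessary, and if the read and write code tables do not match, the copy is corrupted.

For example, when FileReader is used without specifying an encoding, it uses the default encoding (GBK on the author's system). If the file is actually UTF-8, it is still decoded as GBK, and the text comes out garbled.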
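For comparison, here is a minimal sketch of a byte-for-byte copy, which is safe for images, video, and text alike because nothing is decoded. The class ByteCopy, its methods, and the temporary file names are assumptions made for this example.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Arrays;

public class ByteCopy {
    // Copies a file byte for byte; no decoding or encoding takes place.
    static void copy(File source, File target) throws IOException {
        try (FileInputStream in = new FileInputStream(source);
             FileOutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[1024];
            int len;
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
        }
    }

    // Writes data to a temporary file, copies it, and reports whether the copy is identical.
    static boolean copiesExactly(byte[] data) {
        try {
            File source = File.createTempFile("copy-src", ".bin");
            File target = File.createTempFile("copy-dst", ".bin");
            try {
                Files.write(source.toPath(), data);
                copy(source, target);
                return Arrays.equals(data, Files.readAllBytes(target.toPath()));
            } finally {
                source.delete();
                target.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Arbitrary binary data that is not valid text in GBK or UTF-8.
        byte[] data = {(byte) 0xFF, (byte) 0xD8, 0x00, (byte) 0x81};
        System.out.println(copiesExactly(data)); // true
    }
}
```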

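Reading Chinese text with a byte stream

Read the data from the byte input stream into a byte array, then decode it with the matching code table.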

import java.io.FileInputStream;
import java.io.IOException;

public class TestIo {
    public static void main(String[] args) throws IOException {
        readFileByInputStream("c:\\a.txt");
    }

    private static void readFileByInputStream(String path) throws IOException {
        // try-with-resources closes the stream even if reading fails
        try (FileInputStream fis = new FileInputStream(path)) {
            int len;
            byte[] buffer = new byte[1024];
            while ((len = fis.read(buffer)) != -1) {
                // Decode the bytes that were read as GBK
                System.out.println(new String(buffer, 0, len, "gbk"));
            }
        }
    }
}

Note: if the code table used for decoding does not match the one used for encoding, the output is garbled.

import java.io.FileInputStream;
import java.io.IOException;

public class TestIo {
    public static void main(String[] args) throws IOException {
        // This file is encoded in GBK
        readFileByInputStream("c:\\a.txt");
    }

    private static void readFileByInputStream(String path) throws IOException {
        try (FileInputStream fis = new FileInputStream(path)) {
            int len;
            byte[] buffer = new byte[1024];
            while ((len = fis.read(buffer)) != -1) {
                // Decoding GBK bytes as UTF-8 gives the wrong characters
                System.out.println(new String(buffer, 0, len, "utf-8"));
            }
        }
    }
}
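One caveat with the loops above: each buffer is decoded on its own, so a multi-byte character that happens to straddle two reads is cut in half and comes out garbled even when the charset is correct. The sketch below reproduces this with a deliberately tiny buffer and shows one way around it, which is to collect all bytes and decode once. ChunkDecoding, readChunked, and readAll are names invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ChunkDecoding {
    // Decodes each chunk separately, the same way the loops above do.
    static String readChunked(InputStream in, Charset charset, int bufferSize) {
        try {
            StringBuilder text = new StringBuilder();
            byte[] buffer = new byte[bufferSize];
            int len;
            while ((len = in.read(buffer)) != -1) {
                text.append(new String(buffer, 0, len, charset));
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Collects every byte first and decodes once, so no character is split.
    static String readAll(InputStream in, Charset charset, int bufferSize) {
        try {
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            byte[] buffer = new byte[bufferSize];
            int len;
            while ((len = in.read(buffer)) != -1) {
                collected.write(buffer, 0, len);
            }
            return new String(collected.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u56fd"; // 中国, six bytes in UTF-8
        byte[] data = original.getBytes(StandardCharsets.UTF_8);
        // A 4-byte buffer ends in the middle of the second character.
        String chunked = readChunked(new ByteArrayInputStream(data), StandardCharsets.UTF_8, 4);
        String whole = readAll(new ByteArrayInputStream(data), StandardCharsets.UTF_8, 4);
        System.out.println(chunked.equals(original)); // false
        System.out.println(whole.equals(original));   // true
    }
}
```

With a 1024-byte buffer and a short file the problem stays hidden, which is why it is easy to miss. Wrapping the stream in an InputStreamReader also avoids it, since the reader keeps incomplete byte sequences until the rest arrives.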

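Writing Chinese text with a byte stream

Writing requires encoding, and the code table can be specified. You encode the string yourself and write the resulting bytes to the file through a byte stream. Use String's getBytes method: the no-argument version encodes with the default code table, and the other overloads accept a code table.

Using the default encoding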

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class TestIo {
    public static void main(String[] args) throws IOException {
        String path = "c:\\test.txt";
        writeFileByOutputStream(path, "世界你好");
        readFileByInputStream(path);
    }

    private static void writeFileByOutputStream(String path, String content)
            throws IOException {
        try (FileOutputStream fos = new FileOutputStream(path)) {
            // Encode the string with the default charset
            byte[] bytes = content.getBytes();
            // Write the bytes to the file through the byte stream
            fos.write(bytes);
        }
    }

    private static void readFileByInputStream(String path) throws IOException {
        try (FileInputStream fis = new FileInputStream(path)) {
            int len;
            byte[] buffer = new byte[1024];
            while ((len = fis.read(buffer)) != -1) {
                // Decode the bytes with the default charset
                System.out.println(new String(buffer, 0, len));
            }
        }
    }
}

Encoding with UTF-8

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class TestIo {
    public static void main(String[] args) throws IOException {
        String path = "c:\\test.txt";
        writeFileByOutputStream(path, "世界你好");
        readFileByInputStream(path);
    }

    private static void writeFileByOutputStream(String path, String content)
            throws IOException {
        try (FileOutputStream fos = new FileOutputStream(path)) {
            // Encode the string as UTF-8
            byte[] bytes = content.getBytes("utf-8");
            // Write the bytes to the file through the byte stream
            fos.write(bytes);
        }
    }

    private static void readFileByInputStream(String path) throws IOException {
        try (FileInputStream fis = new FileInputStream(path)) {
            int len;
            byte[] buffer = new byte[1024];
            while ((len = fis.read(buffer)) != -1) {
                // Decode the bytes as UTF-8, matching the encoding used for writing
                System.out.println(new String(buffer, 0, len, "utf-8"));
            }
        }
    }
}

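Once it is clear that byte streams can also handle Chinese text correctly, it follows that a character stream is really a byte stream plus a code table (the default one unless another is specified). It encodes and decodes automatically, and underneath it still reads the file with a byte stream.

Conversion streams make this explicit.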
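The claim that a character stream is a byte stream plus a code table can be checked in memory: writing text through an OutputStreamWriter should leave exactly the bytes that getBytes produces for the same charset. This is a sketch with made-up names (WriterIsBytesPlusCharset, writeThroughWriter), using a ByteArrayOutputStream in place of a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriterIsBytesPlusCharset {
    // Writes text through a character stream and returns the bytes that reach the byte stream.
    static byte[] writeThroughWriter(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the writer flushes its internal buffer into the byte stream.
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u4e16\u754c\u4f60\u597d"; // 世界你好
        byte[] viaWriter = writeThroughWriter(text, StandardCharsets.UTF_8);
        byte[] viaGetBytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(viaWriter, viaGetBytes)); // true
    }
}
```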

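Conversion streams

InputStreamReader

The API documentation describes it as a bridge from byte streams to character streams. Its constructors take a byte stream and, optionally, an encoding. What does that give us? It wraps a byte stream and decodes the bytes automatically. It is a subclass of Reader, so it belongs to the character stream family. That is why conversion streams are called the bridge between byte streams and character streams.

InputStreamReader is the bridge from byte streams to character streams.

Testing InputStreamReader:

Step 1: create a text file encoded in GBK, named gbk.txt for easy identification, and a text file encoded in UTF-8, named utf.txt.
Step 2: write the characters "中国" into each file.

Step 3: write test methods that read the files with InputStreamReader using the default encoding, GBK, and UTF-8.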

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class Demo4 {
    public static void main(String[] args) throws IOException {
        File file = new File("c:\\a.txt");
        File fileGBK = new File("c:\\gbk.txt");
        File fileUTF = new File("c:\\utf.txt");
        // Default encoding
        testReadFile(file);
        // GBK file, decoded as GBK
        testReadFile(fileGBK, "gbk");
        // UTF-8 file, decoded as UTF-8
        testReadFile(fileUTF, "utf-8");
    }

    // InputStreamReader reads the file using the default encoding.
    private static void testReadFile(File file) throws IOException {
        try (FileInputStream fis = new FileInputStream(file);
             InputStreamReader ins = new InputStreamReader(fis)) {
            int ch;
            while ((ch = ins.read()) != -1) {
                System.out.print((char) ch);
            }
        }
    }

    // Reads the file using the given encoding.
    private static void testReadFile(File file, String encod)
            throws IOException {
        try (FileInputStream fis = new FileInputStream(file);
             InputStreamReader ins = new InputStreamReader(fis, encod)) {
            int ch;
            while ((ch = ins.read()) != -1) {
                System.out.print((char) ch);
            }
        }
    }
}

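Class OutputStreamWriter

There is also OutputStreamWriter, which wraps an OutputStream.

OutputStreamWriter is the bridge from character streams to byte streams.

Testing OutputStreamWriter:

1. Use OutputStreamWriter with the default encoding, GBK, and UTF-8 to write the characters "中国" into the default-encoded file, the GBK file, and the UTF-8 file respectively.

2. Read the files back with the testReadFile methods from the previous example, passing the matching code table.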

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class Demo4 {
    public static void main(String[] args) throws IOException {
        File file = new File("c:\\a.txt");
        File fileGBK = new File("c:\\gbk.txt");
        File fileUTF = new File("c:\\utf.txt");
        // Write
        // Write using the default code table
        testWriteFile(file);
        // Write to the GBK file using GBK
        testWriteFile(fileGBK, "gbk");
        // Write to the UTF-8 file using UTF-8
        testWriteFile(fileUTF, "utf-8");

        // Read
        // Default encoding
        testReadFile(file);
        // GBK file, decoded as GBK
        testReadFile(fileGBK, "gbk");
        // UTF-8 file, decoded as UTF-8
        testReadFile(fileUTF, "utf-8");
    }

    // Writes to the file using the default code table.
    private static void testWriteFile(File file) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file);
             OutputStreamWriter ops = new OutputStreamWriter(fos)) {
            ops.write("中国");
        }
    }

    // Writes to the file using the given code table.
    private static void testWriteFile(File file, String encod)
            throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file);
             OutputStreamWriter ops = new OutputStreamWriter(fos, encod)) {
            ops.write("中国");
        }
    }

    // InputStreamReader reads the file using the default encoding.
    private static void testReadFile(File file) throws IOException {
        try (FileInputStream fis = new FileInputStream(file);
             InputStreamReader ins = new InputStreamReader(fis)) {
            int ch;
            while ((ch = ins.read()) != -1) {
                System.out.print((char) ch);
            }
        }
    }

    // Reads the file using the given encoding.
    private static void testReadFile(File file, String encod)
            throws IOException {
        try (FileInputStream fis = new FileInputStream(file);
             InputStreamReader ins = new InputStreamReader(fis, encod)) {
            int ch;
            while ((ch = ins.read()) != -1) {
                System.out.print((char) ch);
            }
        }
    }
}

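InputStreamReader: the bridge from bytes to characters.

OutputStreamWriter: the bridge from characters to bytes.

They perform the conversion and are themselves character streams, so a byte stream object must be passed in when constructing them.

Constructors:

InputStreamReader(InputStream)

Uses the default code table (GBK on the author's system).

InputStreamReader(InputStream,String charSet)

Lets you specify the code table.

OutputStreamWriter(OutputStream)

Uses the default code table (GBK on the author's system).

OutputStreamWriter(OutputStream,String charSet)

Lets you specify the code table.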
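Both classes also have constructors that take a java.nio.charset.Charset instead of a name, which avoids misspelled charset names. The sketch below uses those overloads for an in-memory round trip; CharsetConstructors and roundTrip are hypothetical names, and byte arrays stand in for files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetConstructors {
    // Encodes the text with a writer, then decodes the bytes again with a reader.
    static String roundTrip(String text, Charset charset) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (Writer writer = new OutputStreamWriter(sink, charset)) {
                writer.write(text);
            }
            StringBuilder result = new StringBuilder();
            try (Reader reader = new InputStreamReader(
                    new ByteArrayInputStream(sink.toByteArray()), charset)) {
                int ch;
                while ((ch = reader.read()) != -1) {
                    result.append((char) ch);
                }
            }
            return result.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u56fd"; // 中国
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));    // true
        System.out.println(roundTrip(text, StandardCharsets.UTF_16BE).equals(text)); // true
    }
}
```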

Note:

The character streams for files are subclasses of the conversion streams.

Reader
|--InputStreamReader
   |--FileReader
Writer
|--OutputStreamWriter
   |--FileWriter
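The hierarchy can be confirmed with reflection. HierarchyCheck and isDirectSubclass are names made up for this small sketch.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class HierarchyCheck {
    // True if parent is the immediate superclass of child.
    static boolean isDirectSubclass(Class<?> child, Class<?> parent) {
        return child.getSuperclass() == parent;
    }

    public static void main(String[] args) {
        System.out.println(isDirectSubclass(FileReader.class, InputStreamReader.class));  // true
        System.out.println(isDirectSubclass(FileWriter.class, OutputStreamWriter.class)); // true
    }
}
```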

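Note:

When FileReader is created with only a file name or a File, it uses the default code table.

To use a specific code table, use a conversion stream. (Before Java 11 this was the only option; Java 11 added FileReader constructors that accept a Charset.)

If the default encoding is GBK:

FileReader fr = new FileReader("a.txt"); // reads a.txt using the default GBK

InputStreamReader isr = new InputStreamReader(new FileInputStream("a.txt")); // also reads a.txt using the default GBK

These two statements mean the same thing.

However, if the characters in a.txt were encoded as UTF-8, this FileReader cannot decode them correctly. The code table has to be specified when reading, so a conversion stream is needed:

InputStreamReader isr = new InputStreamReader(new FileInputStream("a.txt"), "utf-8");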
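A natural use of the two conversion streams together is converting text from one encoding to another: decode with an InputStreamReader, then encode again with an OutputStreamWriter. The following is a sketch under the assumption that the input charset is known. Transcode and transcode are invented names, and in-memory streams and UTF-16BE are used so that it runs on any Java runtime; with files, the same pattern would wrap a FileInputStream and a FileOutputStream, and Charset.forName("GBK") could be passed where GBK is available.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcode {
    // Decodes the input bytes with `from`, then encodes the characters with `to`.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer writer = new OutputStreamWriter(sink, to)) {
            char[] buffer = new char[1024];
            int len;
            while ((len = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, len);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer has been closed (and flushed) by this point.
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u56fd"; // 中国
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.UTF_16BE);
        System.out.println(utf8.length);  // 6
        System.out.println(utf16.length); // 4
        System.out.println(new String(utf16, StandardCharsets.UTF_16BE).equals(text)); // true
    }
}
```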

Posted 2016-03-30 20:33 by loveincode
