Java I/O (Part 2): File NIO

I. The Five Unix I/O Models

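File reads and writes are performed by calling interfaces provided by the operating system. For disk I/O, data is generally copied from disk into kernel space and then from kernel space into user space. To reduce I/O time, the kernel usually keeps a page cache in memory, and an application's reads are served from that cache when possible. In other words, when user space issues an I/O operation and the data is not in the kernel cache, the kernel must first read it from the underlying disk, cache it, and then copy it to user space.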

The article "I/O Models, Part 1: The Five Unix I/O Models" describes blocking versus non-blocking and synchronous versus asynchronous I/O.

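Blocking I/O: taking a read as an example, the process blocks from the moment it issues the read, first while the kernel reads the data from disk (this can be regarded as the data-preparation phase) and then while the data is copied to user space.

Non-blocking I/O: if the data is not ready, the call returns immediately and the process is free to do other work, for example polling in a loop to check whether the data is ready.

I/O multiplexing: the process monitors multiple channels through select. As soon as any of them becomes ready it can perform the read or write; while no event occurs, the process stays blocked in select.

Signal-driven I/O: the call returns as soon as the process has initiated the I/O operation. When the data is ready, the kernel notifies the process, which then handles it by copying the data to user space.

Asynchronous I/O: after the process initiates the I/O operation, it is notified only once the data has already been copied to user space.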

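Synchronous I/O means the requesting process is blocked until the I/O operation completes. In blocking I/O, non-blocking I/O, I/O multiplexing, and signal-driven I/O the process is still blocked while the data is copied from kernel space to user space, so all four count as synchronous I/O. The figure below shows how the five I/O models behave: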

II. Java NIO

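New I/O (NIO) is the Java I/O library introduced in JDK 1.4 with the goal of improving speed; the old I/O packages have since been reimplemented on top of NIO. I/O covers both file I/O and network I/O. The speed gain comes from using structures closer to the way the operating system itself performs I/O: channels and buffers. The application interacts with a buffer, and the buffer interacts with a channel. The most basic buffer, and the one that interacts with channels, is ByteBuffer, a buffer that stores bytes.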

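The core of NIO consists of channels (Channel), buffers (Buffer), and selectors (Selector). The figure below shows how a channel and a buffer interact: data is read from a channel into a buffer and written from a buffer to a channel, and each channel corresponds to a concrete data source.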

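A channel can be thought of as the entity that represents a data resource; reads and writes go through it. The commonly used channels are FileChannel, SocketChannel, ServerSocketChannel, and DatagramChannel. FileChannel operates on files on the local disk, and the other three are used for network transfer.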

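Besides the basic ByteBuffer, there are buffers for the other primitive types: CharBuffer, IntBuffer, ShortBuffer, LongBuffer, FloatBuffer, and DoubleBuffer. For details, see "Java NIO Tutorial Series (2): Introduction to Channels and FileChannel in Detail".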

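Selector: multiple channels can be registered with one selector, as shown in the figure below. In non-blocking mode, select() detects which channels have become ready, so a single thread can serve requests from multiple clients, which is how multiplexing is achieved. For details, see "Java NIO Tutorial Series (1): Java NIO Overview".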
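The select() flow can be tried without any network setup. The following is a minimal sketch, not taken from the referenced tutorial: it assumes an in-process java.nio.channels.Pipe stands in for a client connection, and the class name SelectorSketch and the method roundTrip are hypothetical names chosen for illustration. It also assumes the message is non-empty and fits in a single 64-byte read.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class SelectorSketch {

    // Sends msg through an in-process pipe and reads it back once the selector
    // reports the read end as ready. Assumes a short, non-empty message.
    static String roundTrip(String msg) throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SinkChannel sink = pipe.sink();
             Pipe.SourceChannel source = pipe.source()) {

            // A channel must be in non-blocking mode before it can be registered.
            source.configureBlocking(false);
            source.register(selector, SelectionKey.OP_READ);

            sink.write(ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8)));

            StringBuilder received = new StringBuilder();
            selector.select(); // blocks until at least one registered channel is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove(); // handled keys have to be removed by hand
                if (key.isReadable()) {
                    ByteBuffer buffer = ByteBuffer.allocate(64);
                    ((Pipe.SourceChannel) key.channel()).read(buffer);
                    buffer.flip();
                    received.append(StandardCharsets.UTF_8.decode(buffer));
                }
            }
            return received.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("ping"));
    }
}
```

With more channels registered on the same selector, the same loop would serve all of them from one thread, which is the point of multiplexing.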

This article starts with file I/O, which mainly involves FileChannel and ByteBuffer.

1. Obtaining a channel

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

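/**
 * 1. NIO: channels and buffers
 * 2. A buffer is the container for data going into and out of a channel; the application does not interact with the channel directly
 * 3. The only buffer that interacts with a channel directly is ByteBuffer (a byte buffer); the other buffer types are variations built on it
 * 4. Three classes in the old I/O library, FileInputStream, FileOutputStream, and RandomAccessFile, were modified so that they can produce a FileChannel
 * 5. Because these are all byte streams, Reader and Writer cannot be used to produce a channel
 * @author bob
 *
 */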
public class GetChannel {
    
    private static final int BSIZE = 1024;
    
    public static void main(String[] args) throws IOException{

        FileChannel fc = new FileOutputStream("niodata.txt").getChannel();
        fc.write(ByteBuffer.wrap("some text".getBytes()));
        fc.close();
        
        fc = new RandomAccessFile("niodata.txt", "rw").getChannel();
        fc.position(fc.size());
        fc.write(ByteBuffer.wrap(" some more".getBytes()));
        fc.close();
        
        fc = new FileInputStream("niodata.txt").getChannel();
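        ByteBuffer buffer = ByteBuffer.allocate(BSIZE); // for read-only access, the buffer must be sized explicitly with the static allocate()
        /**
         * The channel reads data from the file into the ByteBuffer; the buffer's position advances by the number of bytes actually read.
         * Before the data can be processed further, flip() must be called to reset position.
         */
        fc.read(buffer);
        fc.close(); // the data is already in the buffer, so the channel can be closed
        /**
         * Sets limit to the current position, then sets position to 0.
         * Usually called after put() or a channel read(), before the data in the ByteBuffer is read back out or written to a channel.
         */
        buffer.flip();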
        while (buffer.hasRemaining()) {
            System.out.print((char) buffer.get());
        }
    }
}

Output: some text some more

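As the comments explain, the three classes FileInputStream, FileOutputStream, and RandomAccessFile produce a FileChannel, and the channel reads and writes through a ByteBuffer. Each read() from the channel fills the buffer with at most BSIZE bytes; because this example calls read() only once, any bytes in the file beyond the first BSIZE are never read.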
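To read a file that is larger than the buffer, read() has to be called in a loop until it returns -1. The following is a minimal sketch under some assumptions: the class name ReadAllSketch and the helper readAll are hypothetical, the channel is opened with FileChannel.open() rather than through a FileInputStream, and bufferSize is assumed to be greater than zero.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReadAllSketch {

    // Reads the whole file through a buffer of bufferSize bytes (must be > 0).
    static byte[] readAll(Path file, int bufferSize) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
            while (fc.read(buffer) != -1) {   // -1 signals end of file
                buffer.flip();                // limit = bytes read, position = 0
                collected.write(buffer.array(), 0, buffer.limit());
                buffer.clear();               // make room for the next read
            }
        }
        return collected.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("niodata", ".txt");
        try {
            Files.write(tmp, "some text some more".getBytes(StandardCharsets.US_ASCII));
            // A 4-byte buffer forces several read() calls for the 19-byte file.
            System.out.println(new String(readAll(tmp, 4), StandardCharsets.US_ASCII));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```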

2. Copying a file

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/**
 * File copy
 * read() advances position by the number of bytes read
 * Before write(), flip() must reset position to 0
 *
 */
public class ChannelCopy {
    
    private static final int BSIZE= 1024;
    
    public static void main(String[] args) throws IOException{

        FileChannel in = new FileInputStream("niodata.txt").getChannel();
        FileChannel out = new FileOutputStream("niodatacopy.txt").getChannel();
        
        ByteBuffer buffer = ByteBuffer.allocate(BSIZE);
        
        while(in.read(buffer) != -1) {
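            buffer.flip(); // prepare for writing
            while (buffer.hasRemaining()) { // a single write() is not guaranteed to drain the buffer
                out.write(buffer);
            }
            /**
             * Empty the buffer.
             * Clears this buffer.  The position is set to zero, the limit is set to
             * the capacity, and the mark is discarded.
             */
            buffer.clear(); // prepare for the next read
        }
        in.close();
        out.close();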
    }
}

A better approach is to connect the two channels directly with the special methods transferTo() and transferFrom().

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

/**
 * Direct channel-to-channel transfer
 * transferTo/transferFrom are more efficient than a read-and-copy loop
 *
 */
public class TransferTo {

    public static void main(String[] args) throws IOException{

        FileChannel in = new FileInputStream("niodata.txt").getChannel();
        FileChannel out = new FileOutputStream("niodatacopy1.txt").getChannel();
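        /**
         * Choosing where the copy starts in the source file
         */
//        in.transferTo(0, in.size(), out); // position is an offset in the source file; a value > 0 takes effect
        
//        out.transferFrom(in, 3, in.size()); // here position is an offset in the target file; 3 lies beyond the end of the empty target, so nothing is transferred
        
        in.position(2); // transferFrom reads from the source channel's current position, so this takes effect
        out.transferFrom(in, 0, in.size());
        in.close();
        out.close();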
    }
}

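Both methods take position and count parameters. For transferTo(), position is the offset in the source file at which reading starts, with 0 meaning the beginning of the file. For transferFrom(), position is the offset in the target file at which writing starts, and the source channel is read from its own current position. count is the maximum number of bytes to transfer.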
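The different meaning of position in the two methods is easy to check with temporary files. The following is a sketch under assumptions: the class TransferPositionSketch and its helpers copySkipping and transferFromAt are hypothetical names, and the files are opened with FileChannel.open() for brevity.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferPositionSketch {

    // Copies src into dst, starting at byte offset `skip` of the SOURCE file.
    static long copySkipping(Path src, Path dst, long skip) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = skip;
            long size = in.size();
            while (position < size) {
                // transferTo may move fewer bytes than requested, so loop.
                long moved = in.transferTo(position, size - position, out);
                if (moved <= 0) {
                    break;
                }
                position += moved;
            }
            return position - skip;
        }
    }

    // Calls transferFrom once, with targetPosition as an offset in the TARGET file.
    // The target is truncated first, so any targetPosition > 0 lies past its end.
    static long transferFromAt(Path src, Path dst, long targetPosition) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            return out.transferFrom(in, targetPosition, in.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".txt");
        Path dst = Files.createTempFile("dst", ".txt");
        try {
            Files.write(src, "some text some more".getBytes(StandardCharsets.US_ASCII));
            System.out.println(copySkipping(src, dst, 5));   // bytes copied from offset 5
            System.out.println(new String(Files.readAllBytes(dst), StandardCharsets.US_ASCII));
            System.out.println(transferFromAt(src, dst, 3)); // 0: position is past the end of dst
        } finally {
            Files.deleteIfExists(src);
            Files.deleteIfExists(dst);
        }
    }
}
```

The last call returns 0 because, per the FileChannel documentation, transferFrom() transfers no bytes when the given position is greater than the target file's current size.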

3. Buffers for the primitive types

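Besides the commonly used ByteBuffer, there are buffers for the other primitive types, as shown in the figure below. For how each Buffer is used, see "Thinking in Java" (《Java编程思想》) or "Java NIO Tutorial Series (3): Socket Channels".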

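Buffers of the different types can be created directly, or a view buffer can be used to look at an underlying ByteBuffer as a particular primitive type, which raises the question of character encoding. The following example looks at a buffer through a char view.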

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;

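/**
 * String.getBytes() uses the platform default encoding, here UTF-8
 * asCharBuffer() does no decoding: it treats every two bytes as one UTF-16 char (big-endian by default),
 * so printing UTF-8 bytes through toString() of the view produces garbage
 * 1. Decode with the platform encoding;
 * 2. Or use UTF-16/UTF-16BE when writing to the file
 * 3. Keep the encoding used for output consistent with the one used for input
 *
 */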
public class BufferToText {
    
    private static final int BSIZE = 1024;

    public static void main(String[] args) throws IOException{

        // Part 1: write using the platform default encoding, UTF-8
        FileChannel fc = new FileOutputStream("data2.txt").getChannel();
        fc.write(ByteBuffer.wrap("some text".getBytes()));
        fc.close();
        //1. Calling toString() on asCharBuffer() directly prints garbage because of the encoding mismatch: 獯浥⁴數
        fc = new FileInputStream("data2.txt").getChannel();
        ByteBuffer buffer = ByteBuffer.allocate(BSIZE);
        fc.read(buffer);
        buffer.flip();
        System.out.println(buffer.asCharBuffer());
        fc.close();
        //2. Decode the buffer with the platform encoding; prints: Decoded using UTF-8:some text
        buffer.rewind();
        String encoding = System.getProperty("file.encoding");
        System.out.println("Decoded using " + encoding + ":" 
                + Charset.forName(encoding).decode(buffer));
        
        // Part 2: write with an explicit UTF-16BE encoding; the output is correct, because asCharBuffer() reads each pair of bytes as one big-endian UTF-16 char
        fc = new FileOutputStream("data2.txt").getChannel();
        fc.write(ByteBuffer.wrap("some text".getBytes("UTF-16BE")));
        fc.close();
        
        fc = new FileInputStream("data2.txt").getChannel();
        buffer.clear();
        fc.read(buffer);
        buffer.flip();
        System.out.println(buffer.asCharBuffer());
        
        fc.close();

        // Part 3: write through asCharBuffer() directly; the encoding matches, so the output is correct
        fc = new FileOutputStream("data2.txt").getChannel();
//        buffer = ByteBuffer.allocate(24);
        buffer.clear();
        buffer.asCharBuffer().put("some text");
//        System.out.println(buffer.position());
        fc.write(buffer);
        fc.close();
        
        fc = new FileInputStream("data2.txt").getChannel();
        buffer.clear();
        fc.read(buffer);
        fc.close();
        buffer.flip();
        System.out.println(buffer.asCharBuffer());
    }
}

Output:

獯浥⁴數
Decoded using UTF-8:some text
some text
some text

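In the first case, the text is written in the platform default encoding, UTF-8. Calling toString() on the CharBuffer view from buffer.asCharBuffer() prints garbage, while decoding the buffer as UTF-8 prints the text correctly. In the second case, the text is written as UTF-16BE and printed the same way through toString(); the output is correct, which shows that buffer.asCharBuffer() does not decode anything: it reads each pair of bytes as one UTF-16 char in the buffer's byte order, which is big-endian by default. In the third case, the text is both written and read through a CharBuffer view, and it prints correctly. The encoding used for reading must therefore match the one used for writing.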
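The same comparison can be made in memory, without touching a file. This is a small sketch; the class CharViewSketch and the helpers viewAsChars and decodeUtf8 are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CharViewSketch {

    // Reinterprets each pair of bytes as one char (big-endian by default); no decoding happens.
    static String viewAsChars(byte[] bytes) {
        return ByteBuffer.wrap(bytes).asCharBuffer().toString();
    }

    // Decodes the bytes with an explicit charset instead.
    static String decodeUtf8(byte[] bytes) {
        return StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "some text".getBytes(StandardCharsets.UTF_8);       // 9 bytes
        byte[] utf16be = "some text".getBytes(StandardCharsets.UTF_16BE); // 18 bytes

        // 9 bytes give only 4 chars through the view; the odd trailing byte is ignored.
        System.out.println(viewAsChars(utf8).length());
        System.out.println(decodeUtf8(utf8));
        System.out.println(viewAsChars(utf16be));
    }
}
```

The first view pairs the bytes of 's' and 'o' (0x73, 0x6F) into the single char U+736F, which is why the text turns into CJK characters.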

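Encoding is tied to byte order: different machines store data using different byte orderings. "Big endian" (most significant byte first, as in UTF-16BE) places the most significant byte at the lowest memory address, while "little endian" (least significant byte first) places the most significant byte at the highest memory address. Byte order matters whenever a stored value is larger than one byte, such as an int or a float. A ByteBuffer stores data in big-endian order by default; the order can be changed by passing ByteOrder.BIG_ENDIAN or ByteOrder.LITTLE_ENDIAN to its order() method.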
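A short sketch makes the two byte orders visible by writing the same int under each setting. The class EndianSketch and the helper encodeInt are hypothetical names, and the value 0x0A0B0C0D is chosen only because its four bytes are easy to tell apart.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class EndianSketch {

    // Writes one int into a 4-byte buffer using the given byte order.
    static byte[] encodeInt(int value, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES).order(order);
        buffer.putInt(value);
        return buffer.array();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        // Most significant byte (0x0A) first.
        System.out.println(Arrays.toString(encodeInt(value, ByteOrder.BIG_ENDIAN)));    // [10, 11, 12, 13]
        // Least significant byte (0x0D) first.
        System.out.println(Arrays.toString(encodeInt(value, ByteOrder.LITTLE_ENDIAN))); // [13, 12, 11, 10]
    }
}
```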

import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.DoubleBuffer;
import java.nio.IntBuffer;

/**
 * View buffers
 * Look at the underlying ByteBuffer through a window of a particular primitive type
 *
 */
public class ViewBuffers {

    public static void main(String[] args) {
        
        ByteBuffer buffer = ByteBuffer.wrap(new byte[] {0, 0, 0, 0, 0, 0, 0, 'a'});
//        System.out.println(buffer.position());
        System.out.print("Byte buffer: ");
        while(buffer.hasRemaining()) {
            System.out.print(buffer.position() + "->" + buffer.get() + ", ");
        }
        System.out.println();
        
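        // Commented out: the chars do not display as expected (the first three chars of this view are NUL, U+0000, and print as nothing visible)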
//        buffer.rewind();
//        CharBuffer charBuffer = buffer.asCharBuffer();
//        System.out.print("Char buffer: ");
//        while(charBuffer.hasRemaining()) {
//            System.out.print(charBuffer.position() + "->" + charBuffer.get() + ", ");
//        }
//        System.out.println();
        
        buffer.rewind();
        IntBuffer intBuffer = buffer.asIntBuffer();
        System.out.print("Int buffer: ");
        while(intBuffer.hasRemaining()) {
            System.out.print(intBuffer.position() + "->" + intBuffer.get() + ", ");
        }
        System.out.println();
        
        buffer.rewind();
        DoubleBuffer doubleBuffer = buffer.asDoubleBuffer();
        System.out.print("Double buffer: ");
        while(doubleBuffer.hasRemaining()) {
            System.out.print(doubleBuffer.position() + "->" + doubleBuffer.get() + ", ");
        }
        System.out.println();
    }
}

Output:

Byte buffer: 0->0, 1->0, 2->0, 3->0, 4->0, 5->0, 6->0, 7->97, 
Int buffer: 0->0, 1->97, 
Double buffer: 0->4.8E-322, 

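In this example, each element of the IntBuffer view covers 4 bytes (one int), and each element of the DoubleBuffer view covers 8 bytes (one double).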
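A view also works in the other direction: values written through it land in the underlying ByteBuffer. The following sketch illustrates this; the class ViewWriteSketch and the helper intsToBytes are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.util.Arrays;

public class ViewWriteSketch {

    // Writes ints through an IntBuffer view and returns the bytes that back it.
    static byte[] intsToBytes(int... values) {
        ByteBuffer bytes = ByteBuffer.allocate(values.length * Integer.BYTES);
        IntBuffer ints = bytes.asIntBuffer();
        ints.put(values); // advances the view's position, not the ByteBuffer's
        return bytes.array();
    }

    public static void main(String[] args) {
        // Each int occupies 4 bytes, big-endian by default.
        System.out.println(Arrays.toString(intsToBytes(97, 1))); // [0, 0, 0, 97, 0, 0, 0, 1]
    }
}
```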

4. Buffer details

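A Buffer has four key indices: mark, position, limit, and capacity.

mark: calling reset() moves position back to the mark, so the data from that point on can be processed again.

position: the index at which the Buffer will next be read or written.

limit: the boundary for reading or writing the Buffer; position never exceeds limit.

capacity: the total capacity of the buffer.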
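The four indices are easiest to follow by printing them after each operation. This is a minimal sketch; the class BufferIndexSketch and the helpers state and trace are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class BufferIndexSketch {

    static String state(String step, ByteBuffer b) {
        return step + ": position=" + b.position() + " limit=" + b.limit()
                + " capacity=" + b.capacity();
    }

    // Runs a typical write, flip, read, mark, reset, clear cycle and records the indices.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        ByteBuffer b = ByteBuffer.allocate(8);
        log.add(state("allocate", b)); // position=0 limit=8 capacity=8

        b.put((byte) 1).put((byte) 2).put((byte) 3);
        log.add(state("put x3", b));   // position=3 limit=8 capacity=8

        b.flip();                      // limit = old position, position = 0
        log.add(state("flip", b));     // position=0 limit=3 capacity=8

        b.get();                       // position moves to 1
        b.mark();                      // remember position 1
        b.get();
        b.get();
        log.add(state("get x3", b));   // position=3 limit=3 capacity=8

        b.reset();                     // back to the marked position
        log.add(state("reset", b));    // position=1 limit=3 capacity=8

        b.clear();                     // position = 0, limit = capacity, mark discarded
        log.add(state("clear", b));    // position=0 limit=8 capacity=8
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```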

For a detailed introduction, see pages 560-563 of "Thinking in Java" (《Java编程思想》) or "Java NIO Tutorial Series (3): Socket Channels".

posted @ 2019-01-26 18:05 水木竹水
