MapReduce Framework Principles: How OutputFormat Works
OutputFormat Overview
OutputFormat specifies the final output data format of an MR program.
The default is TextOutputFormat, which writes one record per line into the specified output directory, in files whose names start with part-r- followed by a number (part-r-00000, part-r-00001, ...). By default, one result file is produced for each ReduceTask.
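As a point of reference, when the default TextOutputFormat is used the Driver only has to set an output directory. The snippet below is a minimal sketch of that default setup inside a Driver's main(); the WordCount class names and the input/output paths are assumed for illustration and do not come from this section.

// Minimal sketch of a Driver that relies on the default TextOutputFormat.
// WordCountDriver/WordCountMapper/WordCountReducer and the paths are assumed names.
Job job = Job.getInstance(new Configuration());
job.setJarByClass(WordCountDriver.class);
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
job.setNumReduceTasks(2); // two ReduceTasks -> part-r-00000 and part-r-00001
FileInputFormat.setInputPaths(job, new Path("/input"));
FileOutputFormat.setOutputPath(job, new Path("/output")); // result files are created under this directory
job.waitForCompletion(true);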
Custom OutputFormat
Detailed steps for a custom OutputFormat:
- Define MyOutputFormat extending FileOutputFormat<K, V>, where the generic parameters are the Reducer's output key and value types.
- Override its getRecordWriter() method, which must return a RecordWriter object; this is where you decide where the final files end up.
- Create a RecordWriter class that extends RecordWriter<K, V> and override its two methods, write() and close(); write() is where you define the output destination and the output data format.
- In the Driver, call job.setOutputFormatClass() to specify which OutputFormat implementation class to use (a skeleton tying these steps together follows the note below).
Note: if partitioning is configured and a number of ReduceTasks is set, then, as covered earlier, as many result files are produced as there are ReduceTasks. That behavior comes from the default TextOutputFormat, which writes one result file per ReduceTask. With a custom OutputFormat, however, results are written only to the locations we specify ourselves.
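Putting these steps together, the pieces fit roughly as in the skeleton below. Text and NullWritable here are placeholders for whatever the Reducer actually emits; the two examples that follow show complete versions.

// Skeleton only; Text/NullWritable stand in for the Reducer's real output types.
public class MyOutputFormat extends FileOutputFormat<Text, NullWritable> {
    @Override
    public RecordWriter<Text, NullWritable> getRecordWriter(TaskAttemptContext context)
            throws IOException, InterruptedException {
        // Hand back the writer that performs the actual output
        return new MyRecordWriter();
    }
}

public class MyRecordWriter extends RecordWriter<Text, NullWritable> {
    @Override
    public void write(Text key, NullWritable value) throws IOException, InterruptedException {
        // Decide where each record goes and in what format it is written
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException, InterruptedException {
        // Release any streams or connections opened for writing
    }
}

// In the Driver:
job.setOutputFormatClass(MyOutputFormat.class);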
Hands-on Examples
Example 1: Writing data to MySQL
Requirement: write the phone traffic data to a MySQL database, sorted by total traffic in ascending order.
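The FlowBean class itself is not shown in this section. A minimal sketch that is consistent with the getters the RecordWriter calls below (getPhone, getUpFlow, getDownFlow, getSumFlow) and with ascending sorting by total traffic could look like this; the field names and serialization order are assumptions.

// Hypothetical FlowBean sketch; only the parts needed for sorting and output are shown.
public class FlowBean implements WritableComparable<FlowBean> {
    private String phone;
    private int upFlow;
    private int downFlow;
    private int sumFlow;

    // getters used by MyRecordWriter
    public String getPhone() { return phone; }
    public int getUpFlow() { return upFlow; }
    public int getDownFlow() { return downFlow; }
    public int getSumFlow() { return sumFlow; }

    @Override
    public int compareTo(FlowBean o) {
        // ascending order by total traffic
        return Integer.compare(this.sumFlow, o.sumFlow);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(phone);
        out.writeInt(upFlow);
        out.writeInt(downFlow);
        out.writeInt(sumFlow);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        phone = in.readUTF();
        upFlow = in.readInt();
        downFlow = in.readInt();
        sumFlow = in.readInt();
    }
}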
Code:
- FlowOutputFormat.java
public class FlowOutputFormat extends FileOutputFormat<FlowBean, NullWritable> {
    @Override
    public RecordWriter<FlowBean, NullWritable> getRecordWriter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        return new MyRecordWriter();
    }
}
- MyRecordWriter.java
public class MyRecordWriter extends RecordWriter<FlowBean, NullWritable> {
    /**
     * Define the output format and the output destination in this method.
     * @param flowBean     key emitted by the Reduce phase
     * @param nullWritable value emitted by the Reduce phase
     */
    @SneakyThrows
    @Override
    public void write(FlowBean flowBean, NullWritable nullWritable) throws IOException, InterruptedException {
        // Write each record as one row of the phone_flow table
        Connection connection = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/sx_bigdata?serverTimezone=UTC", "root", "root");
        PreparedStatement preparedStatement =
                connection.prepareStatement("insert into phone_flow values (?, ?, ?, ?)");
        preparedStatement.setString(1, flowBean.getPhone());
        preparedStatement.setInt(2, flowBean.getUpFlow());
        preparedStatement.setInt(3, flowBean.getDownFlow());
        preparedStatement.setInt(4, flowBean.getSumFlow());
        int i = preparedStatement.executeUpdate();
        if (i > 0) {
            System.out.println("Insert succeeded!");
        } else {
            System.out.println("Insert failed!");
        }
        // Close the statement before the connection
        preparedStatement.close();
        connection.close();
    }

    @Override
    public void close(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
    }
}
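One design consideration for the code above: write() opens and closes a JDBC connection for every record, which is expensive for large outputs. A hedged alternative, using the same table and columns as above, is to open the connection once in the constructor and close it in close():

// Sketch of a connection-reusing variant; same database, table, and columns as above.
public class MyRecordWriter extends RecordWriter<FlowBean, NullWritable> {
    private Connection connection;
    private PreparedStatement preparedStatement;

    @SneakyThrows
    public MyRecordWriter() {
        // Open the connection once per RecordWriter instead of once per record
        connection = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/sx_bigdata?serverTimezone=UTC", "root", "root");
        preparedStatement = connection.prepareStatement("insert into phone_flow values (?, ?, ?, ?)");
    }

    @SneakyThrows
    @Override
    public void write(FlowBean flowBean, NullWritable nullWritable) throws IOException, InterruptedException {
        preparedStatement.setString(1, flowBean.getPhone());
        preparedStatement.setInt(2, flowBean.getUpFlow());
        preparedStatement.setInt(3, flowBean.getDownFlow());
        preparedStatement.setInt(4, flowBean.getSumFlow());
        preparedStatement.executeUpdate();
    }

    @SneakyThrows
    @Override
    public void close(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        // Release JDBC resources once the task is done writing
        if (preparedStatement != null) {
            preparedStatement.close();
        }
        if (connection != null) {
            connection.close();
        }
    }
}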
- FlowDriver.java
job.setOutputFormatClass(FlowOutputFormat.class);
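For context, the line above sits in the Driver's job configuration. The sketch below shows the surrounding setup under stated assumptions: FlowMapper and FlowReducer are assumed class names (they are not shown in this section), and the input/output paths are hypothetical.

// Sketch of the Driver configuration around the line above.
Job job = Job.getInstance(new Configuration());
job.setJarByClass(FlowDriver.class);
job.setMapperClass(FlowMapper.class);          // assumed Mapper class name
job.setReducerClass(FlowReducer.class);        // assumed Reducer class name
job.setMapOutputKeyClass(FlowBean.class);
job.setMapOutputValueClass(NullWritable.class);
job.setOutputKeyClass(FlowBean.class);
job.setOutputValueClass(NullWritable.class);
job.setOutputFormatClass(FlowOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path("/input/flow"));   // hypothetical input path
// FileOutputFormat subclasses still require an output directory (used for the _SUCCESS marker
// and commit files), even though the actual records go to MySQL.
FileOutputFormat.setOutputPath(job, new Path("/output/flow")); // hypothetical path
job.waitForCompletion(true);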
Example 2: Writing data to specified files on HDFS
Requirement: write the word-count results to HDFS, with words whose first letter is uppercase stored in /upper.txt and words whose first letter is lowercase stored in /lower.txt.
Code:
- MyOutputFormat.java
public class MyOutputFormat extends FileOutputFormat<Text, LongWritable> {
    @SneakyThrows
    @Override
    public RecordWriter<Text, LongWritable> getRecordWriter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        return new MyRecordWriter(taskAttemptContext);
    }
}
- MyRecordWriter.java
public class MyRecordWriter extends RecordWriter<Text, LongWritable> {
    FSDataOutputStream fsDataOutputStream1;
    FSDataOutputStream fsDataOutputStream2;

    public MyRecordWriter(TaskAttemptContext taskAttemptContext) throws Exception {
        Configuration configuration = taskAttemptContext.getConfiguration();
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.218.55:9000"), configuration, "root");
        Path out1 = new Path("/test/school/upper.txt");
        Path out2 = new Path("/test/school/lower.txt");
        // Delete any previous output files before creating new ones
        if (fs.exists(out1)) {
            fs.delete(out1, true);
        }
        if (fs.exists(out2)) {
            fs.delete(out2, true);
        }
        fsDataOutputStream1 = fs.create(out1);
        fsDataOutputStream2 = fs.create(out2);
    }

    @Override
    public void write(Text text, LongWritable longWritable) throws IOException, InterruptedException {
        // Route each word to upper.txt or lower.txt by the case of its first character
        char firstWord = text.toString().charAt(0);
        String line = text + "\t" + longWritable.get() + "\r\n";
        if (Character.isUpperCase(firstWord)) {
            fsDataOutputStream1.write(line.getBytes());
        } else {
            fsDataOutputStream2.write(line.getBytes());
        }
    }

    @Override
    public void close(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        // Close both output streams when the task finishes
        if (fsDataOutputStream1 != null) {
            fsDataOutputStream1.close();
        }
        if (fsDataOutputStream2 != null) {
            fsDataOutputStream2.close();
        }
    }
}
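A design caveat for this RecordWriter: every ReduceTask gets its own RecordWriter instance, and each instance deletes and recreates the same two fixed HDFS paths, so with more than one ReduceTask the tasks would overwrite one another's output; the pattern above therefore assumes a single ReduceTask. Note also that line.getBytes() uses the platform default charset; passing an explicit charset such as StandardCharsets.UTF_8 would make the output encoding predictable.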
- FlowDriver.java
job.setOutputFormatClass(MyOutputFormat.class);
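As in Example 1, setting a custom OutputFormat does not remove the need for an output directory: because MyOutputFormat extends FileOutputFormat, the job still checks that an output path is configured (and writes the _SUCCESS marker there), so the Driver should keep its FileOutputFormat.setOutputPath() call alongside the line above.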