hadoop中的压缩方式
1、在代码中设置压缩
设置我们的map阶段的压缩
Configuration configuration = new Configuration();
configuration.set("mapreduce.map.output.compress","true");
configuration.set("mapreduce.map.output.compress.codec","org.apache.hadoop.io.compress.SnappyCodec");
设置我们的reduce阶段的压缩
configuration.set("mapreduce.output.fileoutputformat.compress","true");
configuration.set("mapreduce.output.fileoutputformat.compress.type","RECORD");
configuration.set("mapreduce.output.fileoutputformat.compress.codec","org.apache.hadoop.io.compress.SnappyCodec");
2、配置全局的MapReduce压缩
我们可以修改mapred-site.xml配置文件,然后重启集群,以便对所有的mapreduce任务进行压缩
map输出数据进行压缩
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
reduce输出数据进行压缩
<property> <name>mapreduce.output.fileoutputformat.compress</name>
<value>true</value>
</property>
<property> <name>mapreduce.output.fileoutputformat.compress.type</name>
<value>RECORD</value>
</property>
<property> <name>mapreduce.output.fileoutputformat.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value> </property>
所有节点都要修改mapred-site.xml,修改完成之后记得重启集群