Enabling LZO compression in Hadoop and Hive

Environment:

   ubuntu

   hadoop-2.6.0 

   hive-1.1.0

 
1  
sudo apt-get install liblzo2-dev
 
hadoop@idex140:~/modules/hadoop-2.6.0$ dpkg -L liblzo2-2  (list the files installed by the package)
/.
/usr
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/liblzo2.so.2.0.0
/usr/share
/usr/share/doc
/usr/share/doc/liblzo2-2
/usr/share/doc/liblzo2-2/THANKS
/usr/share/doc/liblzo2-2/AUTHORS
/usr/share/doc/liblzo2-2/changelog.Debian.gz
/usr/share/doc/liblzo2-2/copyright
/usr/share/doc/liblzo2-2/LZO.TXT.gz
/usr/lib/x86_64-linux-gnu/liblzo2.so.2

 

2  
wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.09.tar.gz

 

3  
tar -xzvf lzo-2.09.tar.gz  
cd lzo-2.09
export CFLAGS=-m64 (for a 64-bit operating system)
./configure --enable-shared --prefix /usr/local/lzo-2.09
make && sudo make install

 

5  
sudo apt-get install lzop

 

6  
hadoop@master:~/hadoop-lzo$ C_INCLUDE_PATH=/usr/local/lzo-2.09/include/ \
   > LIBRARY_PATH=/usr/local/lzo-2.09/lib/ \
   > CXXFLAGS=-m64 \
   > mvn clean package  (first change hadoop.version to the version you are running)

 

7   

tar -cBf - -C target/native/Linux-amd64-64/lib . | tar -xBvf - -C ~/modules/hadoop-2.6.0/lib/native/

  

8  

cp ${HADOOP_LZO_HOME}/target/hadoop-lzo-0.4.20-SNAPSHOT.jar  ${HADOOP_HOME}/share/hadoop/common/lib/
source /etc/profile
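
A quick sanity check after steps 7 and 8 can save a debugging round later. The following is a minimal sketch, not part of the original procedure: `check_lzo_install` is a hypothetical helper name, and the default `HADOOP_HOME` is assumed from the paths used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the hadoop-lzo jar and the native
# libgplcompression library were copied into the Hadoop tree.
check_lzo_install() {
  local hadoop_home="$1" missing=0
  if ! ls "$hadoop_home"/share/hadoop/common/lib/hadoop-lzo-*.jar >/dev/null 2>&1; then
    echo "missing: hadoop-lzo jar"
    missing=1
  fi
  if ! ls "$hadoop_home"/lib/native/libgplcompression* >/dev/null 2>&1; then
    echo "missing: libgplcompression native library"
    missing=1
  fi
  return "$missing"
}

# Assumed default location; override by exporting HADOOP_HOME first.
if check_lzo_install "${HADOOP_HOME:-$HOME/modules/hadoop-2.6.0}"; then
  echo "hadoop-lzo files are in place"
fi
```

The function returns non-zero when either file is missing, so it can also be run on each worker node after step 9.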

 

9  Repeat the steps above on the other nodes 

 scp lzo-2.09.tar.gz  hadoop-slave1:/home/hadoop/
 scp lzo-2.09.tar.gz  hadoop-slave2:/home/hadoop/
 
 ./configure --enable-shared --prefix /usr/local/lzo-2.09
 make && sudo make install
 
 sudo apt-get install liblzo2-dev
 sudo apt-get install lzop
 
 scp -r libgpl* hadoop-slave1:/home/hadoop/modules/hadoop-2.6.0/lib/native/
 scp -r libgpl* hadoop-slave2:/home/hadoop/modules/hadoop-2.6.0/lib/native/
 
 scp   ${HADOOP_LZO_HOME}/target/hadoop-lzo-0.4.20-SNAPSHOT.jar hadoop-slave1:${HADOOP_HOME}/share/hadoop/common/lib/
 scp   ${HADOOP_LZO_HOME}/target/hadoop-lzo-0.4.20-SNAPSHOT.jar hadoop-slave2:${HADOOP_HOME}/share/hadoop/common/lib/
 source /etc/profile
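
With more than a couple of workers, the per-host copy commands are easier to generate in a loop. This is a rough sketch under a few assumptions: `sync_commands` is a hypothetical helper, every node uses the same `HADOOP_HOME` path, and the defaults below are taken from the paths in this post. It only prints the commands (a dry run), so they can be reviewed before being executed.

```shell
#!/usr/bin/env bash
# Assumed defaults; export the variables beforehand to override them.
HADOOP_HOME="${HADOOP_HOME:-$HOME/modules/hadoop-2.6.0}"
HADOOP_LZO_HOME="${HADOOP_LZO_HOME:-$HOME/hadoop-lzo}"

# Hypothetical helper: print the scp commands for each host given as an argument.
sync_commands() {
  local host
  for host in "$@"; do
    echo "scp ${HADOOP_LZO_HOME}/target/hadoop-lzo-0.4.20-SNAPSHOT.jar ${host}:${HADOOP_HOME}/share/hadoop/common/lib/"
    echo "scp -r ${HADOOP_HOME}/lib/native/libgpl* ${host}:${HADOOP_HOME}/lib/native/"
  done
}

sync_commands hadoop-slave1 hadoop-slave2
```

Piping the output into `sh` would run the copies once the list looks right.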

 

10 Update the Hadoop configuration files

   (1) Append the following to $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
# add lzo environment variables
export LD_LIBRARY_PATH=/usr/local/lzo-2.09/lib

   (2) Edit core-site.xml

 
      <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,
          org.apache.hadoop.io.compress.DefaultCodec,
          com.hadoop.compression.lzo.LzoCodec,
          com.hadoop.compression.lzo.LzopCodec,
          org.apache.hadoop.io.compress.BZip2Codec,
          org.apache.hadoop.io.compress.SnappyCodec</value>
      </property>
      <property>
        <name>io.compression.codec.lzo.class</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
      </property>
   (3) Edit mapred-site.xml
 
      <property>
       <name>mapred.child.env</name>
        <value>LD_LIBRARY_PATH=/usr/local/lzo-2.09/lib</value>
      </property>
       <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
      </property>
      <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
      </property>
      <property>
       <name>mapreduce.output.fileoutputformat.compress.type</name>
       <value>BLOCK</value>
      </property>
      <property>
       <name>mapreduce.output.fileoutputformat.compress</name>
       <value>false</value>
      </property>
      <property>
       <name>mapreduce.output.fileoutputformat.compress.codec</name>
       <value>org.apache.hadoop.io.compress.DefaultCodec</value>
      </property>
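
After editing, it is worth confirming that the LZO codec really made it into the file Hadoop reads. A minimal sketch, assuming the install path used in this post; `has_codec` is a hypothetical helper that just searches the file for the class name. `hdfs getconf -confKey` prints the effective value when a configured Hadoop client is on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if config file $1 mentions codec class $2.
has_codec() {
  grep -q -F "$2" "$1"
}

# Assumed default location of core-site.xml.
conf="${HADOOP_HOME:-$HOME/modules/hadoop-2.6.0}/etc/hadoop/core-site.xml"
if [ -f "$conf" ] && has_codec "$conf" com.hadoop.compression.lzo.LzoCodec; then
  echo "LzoCodec is listed in $conf"
fi

# Only query Hadoop itself when the client is installed.
if command -v hdfs >/dev/null 2>&1; then
  hdfs getconf -confKey io.compression.codecs || true
fi
```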

 PS:

       Compression of intermediate (map output) results

       Scope      | Property name (current)             | Default                                    | Deprecated property name
       Hadoop job | mapreduce.map.output.compress       | false                                      | mapred.compress.map.output
       Hadoop job | mapreduce.map.output.compress.codec | org.apache.hadoop.io.compress.DefaultCodec | mapred.map.output.compression.codec
       Hive job   | hive.exec.compress.intermediate     | false                                      |

       Compression of final output

       Scope      | Property name (current)                          | Default                                    | Deprecated property name
       Hadoop job | mapreduce.output.fileoutputformat.compress       | false                                      | mapred.output.compress
       Hadoop job | mapreduce.output.fileoutputformat.compress.type  | RECORD                                     | mapred.output.compression.type
       Hadoop job | mapreduce.output.fileoutputformat.compress.codec | org.apache.hadoop.io.compress.DefaultCodec | mapred.output.compression.codec
       Hive job   | hive.exec.compress.output                        | false                                      |
 
11  Create a Hive test table that can store LZO-compressed data
 
    CREATE TABLE rawdata(
      appkey string, uid string, uidtype string                            
    )                
    COMMENT 'This is the staging of raw data'
    PARTITIONED BY (day INT)
    ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY '\t' 
    STORED AS INPUTFORMAT 
      'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; 
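
To try the table out, a small compressed file can be loaded into one partition. The sketch below is only an illustration: the sample rows, the temporary file location, and the partition value `20150605` are made up, and `make_sample` is a hypothetical helper. The cluster-facing part runs only when both `lzop` and `hive` are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write two tab-separated rows matching the
# rawdata columns (appkey, uid, uidtype). The values are invented.
make_sample() {
  printf 'app1\tu001\timei\napp2\tu002\tmac\n' > "$1"
}

sample="${TMPDIR:-/tmp}/rawdata_sample.tsv"
make_sample "$sample"

# Only touch Hive when the tools are actually available.
if command -v lzop >/dev/null 2>&1 && command -v hive >/dev/null 2>&1; then
  lzop -f "$sample"    # writes ${sample}.lzo next to the input file
  hive -e "LOAD DATA LOCAL INPATH '${sample}.lzo' INTO TABLE rawdata PARTITION (day=20150605);
           SELECT * FROM rawdata WHERE day=20150605 LIMIT 10;" \
    || echo "hive load failed; check the table definition and codec settings"
fi
```

If the SELECT returns the two rows in readable form, the input format is decoding the .lzo file correctly.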
 
posted @ 2015-06-05 15:41  清山布衣