Configuring LZO Compression for Hadoop in Detail


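Operating system: CentOS 5.4 (64-bit); Hadoop version: hadoop-0.20.2
Packages required to set up LZO: gcc, ant, lzo, and the LZO encoder/decoder (hadoop-lzo); the lzo-devel package is also required.
Files to edit for LZO: core-site.xml, mapred-site.xml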

1: Install the JDK (64-bit) and set its environment variables
JDK installation itself is not covered in detail here.
export JAVA_HOME=/usr/java/jdk1.6.0_21
export PATH=$PATH:$JAVA_HOME/bin
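The native LZO libraries built later are 64-bit, so the JVM that runs Hadoop must be 64-bit as well. A minimal check, assuming a HotSpot-based JDK whose `java -version` banner contains "64-Bit" (as in "Java HotSpot(TM) 64-Bit Server VM"); the helper name `is_64bit_jvm` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `java -version` output on stdin and succeeds
# only if the banner mentions a 64-Bit VM.
is_64bit_jvm() {
  grep -q '64-Bit'
}

# Usage (java -version prints to stderr, so redirect it to stdout):
#   "$JAVA_HOME/bin/java" -version 2>&1 | is_64bit_jvm && echo "64-bit JVM"
```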

2: Install the LZO library on every node in the cluster (download: http://www.oberhumer.com/opensource/lzo/download/lzo-2.04.tar.gz)

cd /opt/ysz/src/lzo-2.04

./configure --enable-shared

make

make install

# Edit /etc/ld.so.conf, add /usr/local/lib/, then run /sbin/ldconfig

Alternatively: cp /usr/local/lib/liblzo2.* /usr/lib64/

# Without this step, you will eventually hit the following error: lzo.LzoCompressor: java.lang.UnsatisfiedLinkError: Cannot load liblzo2.so.2 (liblzo2.so.2: cannot open shared object file: No such file or directory)
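Because this edit has to be repeated on every node, it is easy to end up with duplicate entries in /etc/ld.so.conf. A minimal idempotent sketch, assuming it is run as root on each node; the helper name `ensure_ld_path` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append DIR to an ld.so.conf-style file only if that
# exact line is not already present, so re-running it is harmless.
ensure_ld_path() {
  local conf="$1" dir="$2"
  if grep -qxF "$dir" "$conf" 2>/dev/null; then
    return 0
  fi
  # If the file is non-empty and lacks a trailing newline, add one first
  # so the new entry lands on its own line.
  if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
    printf '\n' >> "$conf"
  fi
  printf '%s\n' "$dir" >> "$conf"
}

# Usage on each node, then refresh the linker cache and confirm:
#   ensure_ld_path /etc/ld.so.conf /usr/local/lib
#   /sbin/ldconfig
#   /sbin/ldconfig -p | grep liblzo2
```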

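3: Install Ant and set its environment variables
Remove the old version: yum remove ant
Install the new version:
wget http://labs.renren.com/apache-mirror//ant/binaries/apache-ant-1.8.2-bin.tar.gz
tar -zxvf apache-ant-1.8.2-bin.tar.gz -C /usr/local
Add Ant's environment variables:
vi /etc/profile
export ANT_HOME=/usr/local/apache-ant-1.8.2
export PATH=$PATH:$ANT_HOME/bin
Then reload the profile: source /etc/profile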
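Since the yum-installed Ant is removed in favour of 1.8.2, it is worth confirming that the `ant` now on the PATH is the new one. A sketch, assuming the `ant -version` banner has the form "Apache Ant(TM) version 1.8.2 compiled on ..."; the helper name `ant_at_least` and the 1.8 threshold are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ant -version` output on stdin and succeed only
# if the reported version is at least MAJOR.MINOR.
ant_at_least() {
  local want_major="$1" want_minor="$2" major minor
  # Extract "X Y" from "... version X.Y.Z ..."; empty if no version is found.
  read -r major minor <<< "$(sed -n 's/.*version \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')"
  [ -n "$major" ] && [ -n "$minor" ] || return 1
  (( major > want_major || (major == want_major && minor >= want_minor) ))
}

# Usage, after sourcing /etc/profile:
#   ant -version | ant_at_least 1 8 && echo "Ant is new enough"
```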

 

4: Install the LZO encoder/decoder
Install lzo and lzo-devel:
wget http://pkgs.repoforge.org/lzo/lzo-devel-2.04-1.el5.rf.x86_64.rpm
wget http://pkgs.repoforge.org/lzo/lzo-2.04-1.el5.rf.x86_64.rpm
rpm -ivh lzo*.rpm
#yum install lzo lzo-devel

Install hadoop-lzo:
wget https://github.com/kevinweil/hadoop-lzo/archive/master.zip
unzip master.zip
cd hadoop-lzo-master/
export CFLAGS=-m64
export CXXFLAGS=-m64
ant compile-native tar # builds the hadoop-lzo jar and the native libraries
cp build/hadoop-lzo-*.jar $HADOOP_HOME/lib
cp build/native/Linux-amd64-64/lib/* $HADOOP_HOME/lib/native/Linux-amd64-64/
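If the build silently fails to produce the native library, the copy step can leave Hadoop without it. A sketch that copies the jar and native files and fails loudly when either is missing; it assumes the native libraries end up under build/native/Linux-amd64-64/lib (check build/native after your own build), and the helper name `install_hadoop_lzo` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the hadoop-lzo jar and native libraries from a
# build directory into a Hadoop installation, failing if either is missing.
# ASSUMPTION: native libraries live under BUILD/native/Linux-amd64-64/lib.
install_hadoop_lzo() {
  local build="$1" hadoop_home="$2"
  local native_src="$build/native/Linux-amd64-64/lib"
  local native_dst="$hadoop_home/lib/native/Linux-amd64-64"
  local -a jars libs
  local saved_glob
  saved_glob=$(shopt -p nullglob || true)  # remember the caller's setting
  shopt -s nullglob                        # unmatched globs expand to nothing
  jars=("$build"/hadoop-lzo-*.jar)
  libs=("$native_src"/libgplcompression*)
  eval "$saved_glob"
  if [ "${#jars[@]}" -eq 0 ] || [ "${#libs[@]}" -eq 0 ]; then
    echo "hadoop-lzo jar or native libraries not found under $build" >&2
    return 1
  fi
  mkdir -p "$native_dst"                   # also creates $hadoop_home/lib
  cp "${jars[@]}" "$hadoop_home/lib/"
  cp "${libs[@]}" "$native_dst/"
}

# Usage, from inside hadoop-lzo-master/ after the ant build:
#   install_hadoop_lzo build "$HADOOP_HOME"
```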

5: Configure Hadoop
core-site.xml:
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

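mapred-site.xml (Hadoop 0.20.x reads the mapred.* property names below; mapreduce.map.output.compress and mapreduce.map.output.compress.codec are the names used by later releases):
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>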
</property>

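<property>
<name>mapred.child.env</name>
<value>JAVA_LIBRARY_PATH=/home/hadoop/hadoop/lib/native/Linux-amd64-64</value>
</property>
Here /home/hadoop/hadoop is the $HADOOP_HOME directory; adjust the path to match your installation.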
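The codec list in core-site.xml is one long comma-separated value, so a typo in a class name is easy to miss. A rough check, assuming the io.compression.codecs property appears once and its value contains no `<` character (a real XML parser would be more robust); the helper name `codec_listed` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if CLASS is an exact entry in the
# comma-separated io.compression.codecs value of the given config file.
codec_listed() {
  local conf="$1" cls="$2" value
  # Join all lines, then capture the <value> following the codecs <name>.
  value=$(tr -d '\r\n' < "$conf" |
    sed -n 's|.*<name>io\.compression\.codecs</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p')
  # One entry per line, surrounding blanks trimmed, then an exact match.
  tr ',' '\n' <<< "$value" |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
    grep -qxF "$cls"
}

# Usage:
#   codec_listed "$HADOOP_HOME/conf/core-site.xml" com.hadoop.compression.lzo.LzopCodec \
#     && echo "LzopCodec registered"
```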

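6: Test
Note: copy the lib and conf directories on the namenode to every datanode.
Index the .lzo files under /datatest/input (they can be created with lzop, whose installation is described below):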
hadoop jar $HADOOP_HOME/lib/hadoop-lzo-0.4.15.jar com.hadoop.compression.lzo.LzoIndexer /datatest/input

Alternatively, build the indexes with a distributed MapReduce job:

hadoop jar $HADOOP_HOME/lib/hadoop-lzo-0.4.15.jar com.hadoop.compression.lzo.DistributedLzoIndexer /datatest/input
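Both indexers write an index file next to each compressed file (foo.lzo gets foo.lzo.index); a .lzo file without one is still readable but is not split across mappers. A sketch for spotting files the indexer missed, assuming paths without spaces; the helper name `missing_lzo_indexes` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one path per line on stdin and print every
# *.lzo path that has no matching *.lzo.index path in the same listing.
missing_lzo_indexes() {
  local paths p
  paths=$(cat)
  while IFS= read -r p; do
    [[ "$p" == *.lzo ]] || continue
    grep -qxF "$p.index" <<< "$paths" || printf '%s\n' "$p"
  done <<< "$paths"
}

# Usage: the path is the last field of each `hadoop fs -ls` line; the
# "Found N items" header is ignored because it does not end in .lzo.
#   hadoop fs -ls /datatest/input | awk '{print $NF}' | missing_lzo_indexes
```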

7: Install lzop
wget http://www.lzop.org/download/lzop-1.03.tar.gz
tar -zxvf lzop-1.03.tar.gz
cd lzop-1.03
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
./configure
make && make install
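Compress a file with lzop and upload it to HDFS (-9 selects the highest compression level; -U deletes the original file once compression succeeds):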
lzop -U -9 66_22_2011-04-14.txt
$HADOOP_HOME/bin/hadoop fs -copyFromLocal /home/hdfs/66_22_2011-04-14.txt.lzo /user/s3/ifocus
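LzopCodec and the indexers expect files in lzop's container format, which starts with a fixed 9-byte magic header (hex 89 4c 5a 4f 00 0d 0a 1a 0a). A sketch that checks this header locally before uploading; the helper name `is_lzop_file` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if FILE starts with the 9-byte lzop magic
# 0x89 'L' 'Z' 'O' 0x00 '\r' '\n' 0x1a '\n'.
is_lzop_file() {
  local f="$1" head_hex
  # Dump the first 9 bytes as hex and strip the spaces/newline od adds.
  head_hex=$(head -c 9 "$f" 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$head_hex" = "894c5a4f000d0a1a0a" ]
}

# Usage before uploading:
#   is_lzop_file 66_22_2011-04-14.txt.lzo && \
#     $HADOOP_HOME/bin/hadoop fs -copyFromLocal 66_22_2011-04-14.txt.lzo /user/s3/ifocus
```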

 

posted @ 2012-11-07 19:16  出发一路向北