Setting up Snappy compression on Hadoop 3.1.3 + HBase 2.2.4

HBase can only use snappy if Hadoop supports it, so the setup starts at the bottom layer: add snappy to Hadoop first, then to HBase.

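Once snappy is configured, it is worth stress-testing the cluster to be safe, checking both the storage compression ratio and the performance impact. The performance test report is linked from the original post.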

Install the Snappy native library:


Download snappy:

hadoop@hadoop1$ wget https://src.fedoraproject.org/repo/pkgs/snappy/snappy-1.1.4.tar.gz/sha512/\
873f655713611f4bdfc13ab2a6d09245681f427fbd4f6a7a880a49b8c526875dbdd623e203905450268f542be24a2dc9dae50e6acc1516af1d2ffff3f96553da/\
snappy-1.1.4.tar.gz
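
The download URL above embeds a long hex string after `sha512/`, which appears to be the SHA-512 digest of the tarball. Assuming that is the case, a minimal sketch for checking the download before building could look like this (`verify_sha512` is a hypothetical helper name, not part of any tool):

```shell
# verify_sha512 FILE EXPECTED_HEX
# Succeeds only if the SHA-512 digest of FILE equals EXPECTED_HEX.
verify_sha512() {
  file="$1"
  expected="$2"
  actual=$(sha512sum "$file" | awk '{print $1}')
  [ "$actual" = "$expected" ]
}

# Digest copied from the download URL (assumed to be the tarball's SHA-512).
SNAPPY_SHA512=873f655713611f4bdfc13ab2a6d09245681f427fbd4f6a7a880a49b8c526875dbdd623e203905450268f542be24a2dc9dae50e6acc1516af1d2ffff3f96553da

# Only run the check when the tarball is present in the current directory.
if [ -f snappy-1.1.4.tar.gz ]; then
  if verify_sha512 snappy-1.1.4.tar.gz "$SNAPPY_SHA512"; then
    echo "checksum OK"
  else
    echo "checksum MISMATCH" >&2
  fi
fi
```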

Build and install snappy

hadoop@hadoop1$ mkdir -p /tmp/snappy
hadoop@hadoop1$ tar zxvf snappy-1.1.4.tar.gz -C /tmp/snappy
hadoop@hadoop1$ cd /tmp/snappy/snappy-1.1.4
hadoop@hadoop1$ ./autogen.sh
hadoop@hadoop1$ ./configure
hadoop@hadoop1$ make
hadoop@hadoop1$ sudo make install

By default the build installs the libraries into /usr/local/lib; copy them to /usr/lib64 as well.

hadoop@hadoop1$ sudo cp -dr /usr/local/lib/* /usr/lib64
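
The libraries get copied to several places in this guide, so it can help to confirm that each target directory really contains a snappy shared object. A small sketch, assuming the files follow the usual `libsnappy.so*` naming (`has_snappy_lib` is a hypothetical helper):

```shell
# has_snappy_lib DIR
# Succeeds if DIR contains at least one file matching libsnappy.so*.
has_snappy_lib() {
  dir="$1"
  for f in "$dir"/libsnappy.so*; do
    # If the glob matched nothing, $f is the literal pattern and -e fails.
    [ -e "$f" ] && return 0
  done
  return 1
}

# Report on the directories used in this guide.
for d in /usr/local/lib /usr/lib64; do
  if has_snappy_lib "$d"; then
    echo "$d: libsnappy found"
  else
    echo "$d: libsnappy missing"
  fi
done
```

The same function can be pointed at $HADOOP_HOME/lib/native after the later copy steps.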

 

Install hadoop-snappy


Install the build dependencies for hadoop-snappy

hadoop@hadoop1$ sudo apt-get install pkg-config libtool automake maven -y

Download and package hadoop-snappy

hadoop@hadoop1$ git clone https://github.com/electrum/hadoop-snappy.git
hadoop@hadoop1$ cd hadoop-snappy && mvn package 

 

Configure snappy for Hadoop


Add the snappy native libraries to $HADOOP_HOME/lib/native/

hadoop@hadoop1$ cp -dr /usr/local/lib/* /opt/hadoop-3.1.3/lib/native

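Copy hadoop-snappy-0.0.1-SNAPSHOT.jar into $HADOOP_HOME/lib, and copy the native libraries built by hadoop-snappy into $HADOOP_HOME/lib/native/ (adjust the source path to wherever you cloned hadoop-snappy):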

hadoop@hadoop1$ cp -r /home/hadoop/snappy/hadoop-snappy/target/hadoop-snappy-0.0.1-SNAPSHOT.jar $HADOOP_HOME/lib
hadoop@hadoop1$ cp /home/hadoop/snappy/hadoop-snappy/target/hadoop-snappy-0.0.1-SNAPSHOT-tar/hadoop-snappy-0.0.1-SNAPSHOT/lib/native/Linux-amd64-64/* $HADOOP_HOME/lib/native/

Add the following to hadoop-env.sh

export LD_LIBRARY_PATH=/opt/hadoop-3.1.3/lib/native:/usr/local/lib/

Add the following to core-site.xml

<!-- Register the available compression codecs -->
<property>
     <name>io.compression.codecs</name>
     <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property> 
     <name>io.compression.codec.lzo.class</name>
     <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

Add the following to mapred-site.xml

<!-- Set these to true to compress job output and map output -->
<property>
      <name>mapreduce.output.fileoutputformat.compress</name>
      <value>true</value>
</property>
<property>
      <name>mapreduce.map.output.compress</name>      
      <value>true</value>    
</property>
<!-- Codec used for job output -->
<property>
      <name>mapreduce.output.fileoutputformat.compress.codec</name>
      <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
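
Before running a job, it may save time to check that Hadoop can actually load the native snappy library. `hadoop checknative -a` prints one line per native library; the sketch below assumes the snappy line has the form `snappy:  true <path>` and wraps the check in a hypothetical helper, `snappy_native_loaded`, which reads that output on stdin:

```shell
# snappy_native_loaded
# Reads `hadoop checknative -a` output on stdin and succeeds if the
# snappy line reports true. The line format is an assumption.
snappy_native_loaded() {
  grep -Eq '^snappy:[[:space:]]+true'
}

# Run the check only where the hadoop command is available.
if command -v hadoop >/dev/null 2>&1; then
  if hadoop checknative -a 2>/dev/null | snappy_native_loaded; then
    echo "native snappy library loaded"
  else
    echo "native snappy library NOT loaded" >&2
  fi
fi
```

If the check fails, the library paths in hadoop-env.sh and the contents of $HADOOP_HOME/lib/native are the first things to look at.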

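The snappy configuration is now complete. The command below verifies it. Here /input is a directory on HDFS containing a few arbitrary text files, and /output must not exist yet, otherwise the job fails.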

hadoop@hadoop1$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output

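If the job succeeds, list the output directory (the listing below is from a run that wrote to /output5); the result file should carry a .snappy suffix: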

hadoop@hadoop1:~$ hadoop fs -ls /output5
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2020-08-02 06:11 /output5/_SUCCESS
-rw-r--r--   2 hadoop supergroup       6994 2020-08-02 06:11 /output5/part-r-00000.snappy

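For comparison, here is the result of the same job on the same /input text without snappy: 6994 bytes with snappy versus 23635 bytes without, so the compressed output is roughly 30% of the uncompressed size.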

hadoop@hadoop1:~$ hadoop fs -ls /output4
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2020-08-02 06:06 /output4/_SUCCESS
-rw-r--r--   2 hadoop supergroup      23635 2020-08-02 06:06 /output4/part-r-00000
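
The ratio between the two runs can be worked out directly from the byte counts in the listings. A minimal sketch (`compression_pct` is a hypothetical helper name):

```shell
# compression_pct COMPRESSED_BYTES ORIGINAL_BYTES
# Prints the compressed size as a percentage of the original, one decimal.
compression_pct() {
  awk -v c="$1" -v o="$2" 'BEGIN { printf "%.1f\n", c / o * 100 }'
}

# Sizes taken from the /output5 (snappy) and /output4 (plain) listings.
compression_pct 6994 23635   # prints 29.6
```

This is a single small wordcount output, so the figure is only an indication; real table data may compress quite differently.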

Configure snappy for HBase


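Copy hadoop-snappy-0.0.1-SNAPSHOT.jar into $HBASE_HOME/lib, and symlink $HADOOP_HOME/lib/native to $HBASE_HOME/lib/native/Linux-amd64-64 (create the $HBASE_HOME/lib/native directory first if it does not exist). As before, adjust the jar path to wherever you built hadoop-snappy.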

hadoop@hadoop1$ cp /home/hadoop/hadoop-snappy/target/hadoop-snappy-0.0.1-SNAPSHOT.jar $HBASE_HOME/lib

hadoop@hadoop1$ ln -s /opt/hadoop-3.1.3/lib/native /opt/hbase-2.2.4/lib/native/Linux-amd64-64

Add the following to hbase-env.sh

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/hadoop-3.1.3/lib/native/:/usr/local/lib
export HBASE_LIBRARY_PATH=$HBASE_LIBRARY_PATH:/opt/hbase-2.2.4/lib/native/Linux-amd64-64/:/usr/local/lib/
export CLASSPATH=$CLASSPATH:$HBASE_LIBRARY_PATH

Add the following to hbase-site.xml

<property>
      <name>hbase.regionserver.codecs</name>
      <value>snappy</value>
</property>

Then verify that snappy works in HBase

hbase org.apache.hadoop.hbase.util.CompressionTest file:///home/hadoop/ouput snappy 

Output like the following, ending in SUCCESS, means the codec works

hadoop@hadoop1:~$ hbase org.apache.hadoop.hbase.util.CompressionTest file:///home/hadoop/ouput snappy 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hbase-2.2.4/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2020-08-02 10:01:34,858 INFO  [main] metrics.MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
2020-08-02 10:01:34,921 INFO  [main] compress.CodecPool: Got brand-new compressor [.snappy]
2020-08-02 10:01:34,924 INFO  [main] compress.CodecPool: Got brand-new compressor [.snappy]
2020-08-02 10:01:34,983 INFO  [main] compress.CodecPool: Got brand-new decompressor [.snappy]
SUCCESS

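Open the hbase shell and create a table that uses snappy. One point deserves emphasis: pre-split the table into multiple regions when creating it. By default a new table has a single region, so a stress test would hammer only the regionserver hosting that region; the load would concentrate on one machine and the results would not reflect the performance of the cluster.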

hbase(main):004:0> create 'snappy-test',  {NUMREGIONS => 10, SPLITALGO => 'HexStringSplit' },{ NAME => 'data', COMPRESSION => 'snappy'}
Created table snappy-test
Took 1.2345 seconds                                                                                                                                               
=> Hbase::Table - snappy-test
hbase(main):005:0> put 'snappy-test', '001', 'data:addr', 'beijing'
Took 0.0078 seconds                                                                                                                                               
hbase(main):006:0> put 'snappy-test', '001', 'data:comp', 'baidu' 
Took 0.0036 seconds                                                                                                                                               
hbase(main):007:0> describe 'snappy-test'
Table snappy-test is ENABLED                                                                                                                                      
snappy-test                                                                                                                                                       
COLUMN FAMILIES DESCRIPTION                                                                                                                                       
{NAME => 'data', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false'
, DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY
 => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'snappy', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}           

1 row(s)

QUOTAS                                                                                                                                                            
0 row(s)
Took 0.0963 seconds                                                                                                                                               
hbase(main):008:0> 
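
To make the pre-splitting concrete, the sketch below lays out where the region boundaries would fall if the 8-digit hex key space 00000000 to ffffffff were cut into N equal parts. This is an illustration under that assumption only; the exact boundaries HBase's HexStringSplit produces may differ in rounding, and `hex_split_points` is a hypothetical helper:

```shell
# hex_split_points N
# Prints the N-1 boundaries that cut 00000000..ffffffff into N equal ranges.
hex_split_points() {
  n="$1"
  step=$(( 0xFFFFFFFF / n ))   # width of each range, integer division
  i=1
  while [ "$i" -lt "$n" ]; do
    printf '%08x\n' $(( step * i ))
    i=$(( i + 1 ))
  done
}

# Ten regions need nine split points.
hex_split_points 10
```

Row keys only spread across these regions if they are themselves distributed over the hex range (for example hashed keys); short sequential keys such as '001' all land in the first region.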

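As the create command shows, calling this a "snappy-compressed table" is not quite accurate. Snappy is a compression option set on the 'data' column family, not a property of the 'snappy-test' table, so the snappy setting reported by desc 'snappy-test' is a column family attribute. With several column families, you can decide for each one whether to enable snappy.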

To illustrate, the next table has two column families. You can compress both, neither, or just one of them; here only 'data' uses snappy:

hbase(main):010:0> create 'snappy-test3', {NUMREGIONS => 10, SPLITALGO => 'HexStringSplit' }, {NAME => 'data',  COMPRESSION => 'snappy'}, {NAME=> 'data1'}
Created table snappy-test3
Took 2.3475 seconds                                                                                                                                                                                                                           
=> Hbase::Table - snappy-test3
hbase(main):012:0> desc 'snappy-test3'
Table snappy-test3 is ENABLED                                                                                                                                                                                                                 
snappy-test3                                                                                                                                                                                                                                  
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                                                                   
{NAME => 'data', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPL
ICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'SNAPPY', BLOCKCACHE => 'true', BLOCKSIZE => '65536'} 

{NAME => 'data1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REP
LICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}  

2 row(s)

QUOTAS                                                                                                                                                                                                                                        
0 row(s)
Took 0.2907 seconds                                                                                                                                                                                                                           
hbase(main):013:0>
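
With many column families the describe output gets hard to read. As a rough sketch, the text can be reduced to one "family codec" pair per line with standard tools; `family_compression` is a hypothetical helper that assumes each family is printed as a `{NAME => '...', ... COMPRESSION => '...', ...}` block like the ones above:

```shell
# family_compression
# Reads `describe` output on stdin and prints "<family> <codec>" per family.
family_compression() {
  # Join wrapped lines, pull out each {NAME ... COMPRESSION => '...'} span,
  # then keep only the family name and the codec.
  tr -d '\n' |
    grep -o "{NAME => '[^']*'[^}]*COMPRESSION => '[^']*'" |
    sed "s/{NAME => '\([^']*\)'.*COMPRESSION => '\([^']*\)'/\1 \2/"
}

# Possible usage on a machine with HBase installed:
#   echo "describe 'snappy-test3'" | hbase shell -n | family_compression
```

For the snappy-test3 table above this would be expected to list data with SNAPPY and data1 with NONE.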

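That completes the snappy setup for Hadoop and HBase. If you run into problems or want to discuss anything, feel free to comment or contact me; I am online every day.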

 

posted @ 2020-08-03 01:08  wen1995