Scala + Spark + HBase
- Spark: Features and Use Cases
- Spark is a general-purpose parallel computing framework. It implements distributed computing on the MapReduce model, but intermediate results can be kept in memory, so stages no longer need to read from and write to HDFS.
- Features:
- Simple and convenient; uses the Scala language (which pairs naturally with the RDD API).
- Fast: intermediate results are cached in memory (see the sketch after the use-case list below).
- Highly fault tolerant.
- A rich set of operations.
- Broadcast variables: every node can keep a local copy of a small dataset.
- Core abstraction: the RDD (Resilient Distributed Dataset).
- Use cases:
- Iterative algorithms: iterative machine learning and graph algorithms, including PageRank, K-means clustering, and logistic regression.
- Interactive data mining: users run multiple ad-hoc queries over the same subset of the data.
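A minimal sketch of caching and broadcast, runnable in spark-shell (where sc is provided); the input path /tmp/points.txt and the weight values are hypothetical:

// Cache the parsed input so each iteration reuses it instead of re-reading HDFS.
val points = sc.textFile("/tmp/points.txt")
  .map(_.split(",").map(_.toDouble))
  .cache()

// Ship a small lookup table to every node once, as a broadcast variable.
val weights = sc.broadcast(Array(0.5, 1.5, 2.0))

for (i <- 1 to 10) {
  val cost = points
    .map(p => p.zip(weights.value).map { case (x, w) => x * w }.sum)
    .reduce(_ + _)
  println(s"iteration $i, cost $cost")
}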
- In the previous post we set up a ZooKeeper + Hadoop cluster; next we add Scala + Spark + HBase to round out the cluster.
- Upload the Scala, Spark, and HBase packages with rz -E
- tar -zxvf scala-2.11.8.tgz
- tar -zxf spark-2.0.1-bin-hadoop2.7.tgz
- Spark Installation Guide
- Install the JDK and Scala
- Install a Java runtime: sudo apt-get install openjdk-7-jre-headless (note this installs a Java 7 JRE; the configuration below assumes JDK 1.8.0_101, so a Java 8 JDK is preferable).
- Download Scala: http://www.scala-lang.org/
- Extract it: tar -zxvf scala-2.11.8.tgz
- Open /etc/profile with sudo vim /etc/profile (or edit ~/.bashrc) and append the following paths:
- export SCALA_HOME=/data/app/scala-2.11.8
- export SPARK_HOME=/data/app/spark-2.0.1-bin-hadoop2.7
- export HBASE_HOME=/data/app/hbase-1.2.3
- export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$ZOOKEEPER/bin:$HADOOP/bin:$HADOOP/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$HBASE_HOME/bin
- Apply the changes: source /etc/profile
- Type scala at the command line to test:
scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101).
Type in expressions for evaluation. Or try :help.
scala>
- If you see this prompt, Scala is installed successfully.
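As a further check, evaluate an expression in the REPL:

scala> 1 + 1
res0: Int = 2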
- Install Spark
- Download Spark: http://spark.apache.org/downloads.html
- Extract it: tar -zxf spark-2.0.1-bin-hadoop2.7.tgz
- Open /etc/profile with sudo vim /etc/profile and add the paths (already done above, so this step can be skipped).
- Edit the Spark environment file (Spark ships only conf/spark-env.sh.template, so copy it to spark-env.sh first if needed): vi /home/soft/app/spark-2.0.1-bin-hadoop2.7/conf/spark-env.sh
export JAVA_HOME=/home/soft/app/jdk1.8.0_101
export SCALA_HOME=/home/soft/app/scala-2.11.8
export HADOOP_HOME=/home/soft/app/hadoop-2.7.3
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=node1:2181,node2:2181,node3:2181,node4:2181,node5:2181 -Dspark.deploy.zookeeper.dir=/spark"
export SPARK_EXECUTOR_MEMORY=5g
export SPARK_WORKER_MEMORY=7g
export SPARK_LOG_DIR=/data/logs/spark_logs/
Keep the SPARK_DAEMON_JAVA_OPTS export on a single line; if the quoted value wraps across lines, spark-shell reports the "unexpected EOF while looking for matching" errors shown in the output below. Also create the log directory: mkdir -pv /data/logs/spark_logs/
Then list the worker nodes:
vi /home/soft/app/spark-2.0.1-bin-hadoop2.7/conf/slaves
node1
node2
node3
node4
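Copy the same configuration to every node. The standalone cluster can then be started from the master with $SPARK_HOME/sbin/start-all.sh; because spark.deploy.recoveryMode=ZOOKEEPER is set above, a standby master started on another node with sbin/start-master.sh will take over if the active master fails.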
Now launch spark-shell:
media@node1:~$ spark-shell
/data/app/spark-2.0.1-bin-hadoop2.7/conf/spark-env.sh: line 72: unexpected EOF while looking for matching `"'
/data/app/spark-2.0.1-bin-hadoop2.7/conf/spark-env.sh: line 76: syntax error: unexpected end of file
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/07/11 11:41:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/11 11:41:47 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://10.31.81.41:4040
Spark context available as 'sc' (master = local[*], app id = local-1499744506831).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
If you see this banner and prompt, Spark is installed successfully. (The two syntax errors at the top of the output trace back to the wrapped SPARK_DAEMON_JAVA_OPTS line in spark-env.sh; keeping that export on one line clears them.)
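As a quick smoke test, run a small job in the shell:

scala> sc.parallelize(1 to 100).map(_ * 2).sum()
res0: Double = 10100.0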
Install HBase
HBase is configured under its conf directory. First, the region servers:
cd /home/soft/app/hbase-1.2.3/conf
cat regionservers
node1
node2
node3
node4
node2 runs as a standby HMaster:
cat backup-masters
node2
cat hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Apache Software Foundation license header omitted for brevity -->
<configuration>
  <property>
    <name>dfs.ha.namenodes.ns</name>
    <value>node1,node2</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ns/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>16000</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>16000</value>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>16010</value>
  </property>
  <property>
    <name>hbase.regionserver.port</name>
    <value>16020</value>
  </property>
  <property>
    <name>hbase.regionserver.info.port</name>
    <value>16030</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
    <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node1,node2,node3,node4,node5</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/data/zk_data</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>180000</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>9000</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/data/hbase/tmp</value>
  </property>
</configuration>
Create the temp directory declared in hbase.tmp.dir: mkdir -pv /data/hbase/tmp
Finally, set the JDK and Hadoop configuration paths in hbase-env.sh (HBASE_CLASSPATH must point at the Hadoop conf directory so HBase can resolve the HA nameservice ns used in hbase.rootdir):
vi hbase-env.sh
export JAVA_HOME=/home/media/app/jdk1.8
export HBASE_CLASSPATH=/home/media/app/hadoop-2.7.3/etc/hadoop
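To tie the three components together, here is a minimal Scala sketch that writes a cell to HBase and reads it back. It assumes a table named test with column family cf already exists (created beforehand, e.g. with create 'test','cf' in the hbase shell) and that the HBase client jars are on the classpath (e.g. spark-shell --jars ...); the table, row, and column names are hypothetical:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

// Point the client at the ZooKeeper quorum configured in hbase-site.xml.
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "node1,node2,node3,node4,node5")
conf.set("hbase.zookeeper.property.clientPort", "2181")

val conn  = ConnectionFactory.createConnection(conf)
val table = conn.getTable(TableName.valueOf("test"))

// Write one cell, then read it back.
val put = new Put(Bytes.toBytes("row1"))
put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes("hello hbase"))
table.put(put)

val result = table.get(new Get(Bytes.toBytes("row1")))
println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col1"))))

table.close()
conn.close()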