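Setting up a Hadoop 2.6.0 + Spark 1.4.0 pseudo-distributed cluster on Ubuntu 14.10 (tested and working)
Preface: I have set up clusters many times before, but because I never kept notes, every time I had to dig through blog posts all over again, which was a real hassle. This time I am recording the whole setup process.
0. Environment: Ubuntu 14.10, Hadoop 2.6.0, spark-1.4.0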
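1. Install JDK 1.7
(1) Download jdk-7u25-linux-i586.tar.gz;
(2) Extract jdk-7u25-linux-i586.tar.gz and move the result under /opt/java/jdk/ (giving /opt/java/jdk/jdk1.7.0_25);
(3) Configure the Java environment variables:
Append the following to /etc/profile: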
#set java env
export JAVA_HOME=/opt/java/jdk/jdk1.7.0_25
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
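To see what these lines actually resolve to, here is a minimal Python sketch that mimics the shell's expansion of simple export lines, applied in order. It is an illustration only and assumes plain NAME=value lines without quoting or command substitution. expand_exports and the starting PATH value are hypothetical, not part of any tool.

```python
from string import Template


def expand_exports(lines, env):
    """Apply simple 'export NAME=value' lines in order (hypothetical helper).

    $NAME and ${NAME} references are expanded against the variables defined
    so far; comments and non-export lines are skipped. Quoting, command
    substitution and other shell features are deliberately not handled.
    """
    result = dict(env)
    for raw in lines:
        line = raw.strip()
        if not line.startswith("export "):
            continue
        name, _, value = line[len("export "):].partition("=")
        # safe_substitute leaves unknown $VARS as-is instead of raising KeyError
        result[name] = Template(value).safe_substitute(result)
    return result


profile = [
    "#set java env",
    "export JAVA_HOME=/opt/java/jdk/jdk1.7.0_25",
    "export JRE_HOME=${JAVA_HOME}/jre",
    "export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib",
    "export PATH=${JAVA_HOME}/bin:$PATH",
]
env = expand_exports(profile, {"PATH": "/usr/bin:/bin"})
for key in ("JAVA_HOME", "JRE_HOME", "CLASSPATH", "PATH"):
    print(f"{key}={env[key]}")
```

Note that PATH puts ${JAVA_HOME}/bin first. Because earlier PATH entries win, this JDK's java shadows any other java already installed.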
(4) Verify. The installation succeeded if you see output like this:
hadoop@ubuntu:~/installs$ java -version
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) Client VM (build 23.25-b01, mixed mode)
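This check has to pass for every account that will run Hadoop, so it can help to script it. Below is a minimal Python sketch. It assumes only the 1.x version string format shown above. parse_java_version and installed_java_version are hypothetical helper names.

```python
import re
import shutil
import subprocess

# Matches the first line of `java -version` for both Oracle ("java version")
# and OpenJDK ("openjdk version") builds that use the 1.x numbering scheme.
_VERSION_RE = re.compile(r'(?:java|openjdk) version "1\.(\d+)\.(\d+)_(\d+)"')


def parse_java_version(output):
    """Return (feature, minor, update), e.g. (7, 0, 25), or None if not found."""
    m = _VERSION_RE.search(output)
    return tuple(int(g) for g in m.groups()) if m else None


def installed_java_version():
    """Run `java -version` for the current user; None if java is not on PATH."""
    if shutil.which("java") is None:
        return None
    # java -version writes its banner to stderr, not stdout
    proc = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_version(proc.stderr)
```

If the hadoop account sees the JDK installed above, calling installed_java_version() as that user should give (7, 0, 25).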
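Important: I first installed the JDK as root. After switching to the hadoop user, java -version failed. The cause turned out to be that I had put the Java environment variables in ~/.bashrc, which only affects the account that owns it (root here). Moving them to /etc/profile, which the login shells of all users read, fixed the problem. Run source /etc/profile or log in again for the change to take effect.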
2. Install and configure SSH
Online installation kept failing, so I chose an offline installation:
(1) Download the SSH packages
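"Search for openssh at launchpad.net/ubuntu/, then pick the matching version under the development codename of your release. The installation in this article was done on Ubuntu 12.10, whose codename is Quantal Quetzal, on i386, so the following three files were downloaded: openssh-client_6.0p1-3ubuntu1_i386.deb, openssh-server_6.0p1-3ubuntu1_i386.deb, ssh_6.0p1-3ubuntu1_all.deb."
(The quoted passage was written for Ubuntu 12.10. Ubuntu 14.10, used in this post, has the codename Utopic Unicorn, so look under that name.)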
(2) Run the install commands
Run the following commands in order:
sudo dpkg -i openssh-client_6.0p1-3ubuntu1_i386.deb
sudo dpkg -i openssh-server_6.0p1-3ubuntu1_i386.deb
sudo dpkg -i ssh_6.0p1-3ubuntu1_all.deb
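The file names follow Debian's usual package_version_architecture.deb pattern. The order matters: openssh-server depends on openssh-client, and the ssh metapackage depends on both. Below is a small Python sketch that rebuilds the command list from a version string and an architecture. ssh_install_commands is a hypothetical helper, and you still need to look up the exact version strings for your release on Launchpad.

```python
def deb_name(package, version, arch):
    """Debian binary package file name: <package>_<version>_<arch>.deb."""
    return f"{package}_{version}_{arch}.deb"


def ssh_install_commands(version, arch):
    """dpkg commands in dependency order (hypothetical helper).

    openssh-server depends on openssh-client, and the architecture-independent
    'ssh' metapackage depends on both, so install them in this order.
    """
    debs = [
        deb_name("openssh-client", version, arch),
        deb_name("openssh-server", version, arch),
        deb_name("ssh", version, "all"),
    ]
    return [f"sudo dpkg -i {deb}" for deb in debs]


for cmd in ssh_install_commands("6.0p1-3ubuntu1", "i386"):
    print(cmd)
```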
(3) Verify: if ssh localhost logs you in, the installation succeeded.
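(4) Passwordless SSH login (as root)
ssh-keygen -t rsa -P ""    (then just press Enter at every prompt)
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
The Hadoop start scripts log in over SSH as the user who runs them, so the key must be set up for that account. If you start Hadoop as the hadoop user, as in the transcripts later in this post, run the same two commands as hadoop, with ~/.ssh in place of /root/.ssh.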
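The cat >> line appends the key again every time it is rerun. Separately, sshd (with its default StrictModes setting) refuses key files that other users can write to. Below is a minimal Python sketch of an idempotent version. It assumes an OpenSSH-style authorized_keys file. authorize_key is a hypothetical helper, not part of OpenSSH.

```python
import os


def authorize_key(pub_key_path, authorized_keys_path):
    """Add a public key to authorized_keys unless it is already present.

    Returns True if the key was appended. The file is then set to mode 0600,
    the permission commonly recommended for authorized_keys.
    """
    with open(pub_key_path) as f:
        key = f.read().strip()

    content = ""
    if os.path.exists(authorized_keys_path):
        with open(authorized_keys_path) as f:
            content = f.read()

    added = key not in (line.strip() for line in content.splitlines())
    if added:
        with open(authorized_keys_path, "a") as f:
            if content and not content.endswith("\n"):
                f.write("\n")  # don't glue the new key onto the last line
            f.write(key + "\n")
    os.chmod(authorized_keys_path, 0o600)
    return added
```

Run it once per account that needs passwordless login, for example authorize_key(os.path.expanduser("~/.ssh/id_rsa.pub"), os.path.expanduser("~/.ssh/authorized_keys")).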
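3. Install and configure Hadoop
(1) Install Hadoop
Extract hadoop-2.6.0.tar.gz into /opt/hadoop/ (this gives /opt/hadoop/hadoop-2.6.0, referred to below as ${HADOOP_HOME});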
(2) Configure Hadoop (the files below are all in ${HADOOP_HOME}/etc/hadoop)
In hadoop-env.sh, append the Java environment variable:
#java env
export JAVA_HOME=/opt/java/jdk/jdk1.7.0_25
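Appending works because the last assignment wins, but it leaves the stock export JAVA_HOME=${JAVA_HOME} line in place. Hardcoding the path matters because the start scripts launch the daemons over SSH, where variables from /etc/profile are commonly not set. That is a frequent cause of "JAVA_HOME is not set" errors. Below is a minimal Python sketch that replaces the existing line instead of adding a second one. set_java_home is a hypothetical helper, and the sample text only approximates the stock file.

```python
import re

_JAVA_HOME_LINE = re.compile(r"^export JAVA_HOME=.*$", re.MULTILINE)


def set_java_home(env_sh, java_home):
    """Return hadoop-env.sh text with JAVA_HOME pinned to java_home.

    Every uncommented 'export JAVA_HOME=...' line at the start of a line is
    rewritten; if there is none, one is appended at the end.
    """
    new_line = f"export JAVA_HOME={java_home}"
    if _JAVA_HOME_LINE.search(env_sh):
        # a function replacement stops re from interpreting backslashes in the path
        return _JAVA_HOME_LINE.sub(lambda _m: new_line, env_sh)
    return env_sh.rstrip("\n") + "\n" + new_line + "\n"


stock = "# The java implementation to use.\nexport JAVA_HOME=${JAVA_HOME}\n"
print(set_java_home(stock, "/opt/java/jdk/jdk1.7.0_25"), end="")
```

To apply it, read ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh, pass the text through set_java_home, and write the result back.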
(3) Configure core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
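Writing these files by hand is where typos and unbalanced tags creep in. One option is to generate them. Below is a Python sketch that builds a *-site.xml from (name, value) pairs with the standard library's ElementTree, which also escapes characters such as & and <. make_site_xml is a hypothetical helper. It emits only the configuration/property/name/value structure used in this post.

```python
import xml.etree.ElementTree as ET


def make_site_xml(properties):
    """Build Hadoop *-site.xml text from an iterable of (name, value) pairs."""
    root = ET.Element("configuration")
    for name, value in properties:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # ET.indent exists from Python 3.9 on
        ET.indent(root)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


core_site = make_site_xml([
    ("fs.defaultFS", "hdfs://localhost:9000"),
    ("hadoop.tmp.dir", "/home/hadoop/tmp"),
])
print(core_site)
```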
(4) Configure hdfs-site.xml (replication is 1 because there is only one DataNode)
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/dfs/data</value>
  </property>
</configuration>
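Hadoop does not reject unknown property names; it silently ignores them. A misspelled key such as dfs.datannode.data.dir therefore fails quietly, and the DataNode falls back to its default directory, file://${hadoop.tmp.dir}/dfs/data. Below is a small Python sketch that flags unknown keys and suggests the closest known one using difflib. KNOWN_HDFS_KEYS is a hand-picked subset chosen for this example, not the full list from hdfs-default.xml.

```python
import difflib

# Hand-picked subset of keys from Hadoop 2.x hdfs-default.xml (illustrative,
# not exhaustive).
KNOWN_HDFS_KEYS = sorted([
    "dfs.replication",
    "dfs.namenode.name.dir",
    "dfs.datanode.data.dir",
    "dfs.namenode.checkpoint.dir",
    "dfs.blocksize",
    "dfs.permissions.enabled",
])


def check_keys(keys, known=KNOWN_HDFS_KEYS):
    """Map each unrecognised key to its closest known key (or None)."""
    report = {}
    for key in keys:
        if key not in known:
            close = difflib.get_close_matches(key, known, n=1, cutoff=0.8)
            report[key] = close[0] if close else None
    return report


print(check_keys(["dfs.replication", "dfs.namenode.name.dir", "dfs.datannode.data.dir"]))
```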
(5) Configure mapred-site.xml (Hadoop 2.6.0 ships only mapred-site.xml.template, so copy it to mapred-site.xml first)
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
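In the Hadoop 2.6.0 distribution, etc/hadoop contains mapred-site.xml.template rather than mapred-site.xml. A setup script can therefore create the file from the template only when it is missing. Below is a minimal Python sketch using shutil. ensure_site_file is a hypothetical helper; in this setup conf_dir would be ${HADOOP_HOME}/etc/hadoop.

```python
import os
import shutil


def ensure_site_file(conf_dir, name="mapred-site.xml"):
    """Create conf_dir/name from conf_dir/name.template if it is missing.

    Returns True if a copy was made, False if the file already existed.
    Raises FileNotFoundError if neither the file nor its template exists.
    """
    target = os.path.join(conf_dir, name)
    if os.path.exists(target):
        return False  # never overwrite a file that may already have been edited
    template = target + ".template"
    if not os.path.exists(template):
        raise FileNotFoundError(template)
    shutil.copyfile(template, target)
    return True
```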
(6) Configure yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Site specific YARN configuration properties -->
</configuration>
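Before formatting the NameNode, it can be worth reading the four files back to confirm that each one parses and contains what you intended. Below is a minimal Python sketch that assumes the simple property/name/value layout used above. load_site_xml and load_conf_dir are hypothetical helpers. They do not reproduce Hadoop's own override rules, such as final properties or the *-default.xml values.

```python
import os
import xml.etree.ElementTree as ET

SITE_FILES = ("core-site.xml", "hdfs-site.xml", "mapred-site.xml", "yarn-site.xml")


def load_site_xml(text):
    """Parse *-site.xml text into {name: value}; XML comments are skipped."""
    root = ET.fromstring(text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            # if a name repeats, this sketch simply keeps the last value
            props[name] = (prop.findtext("value") or "").strip()
    return props


def load_conf_dir(conf_dir):
    """Load every site file that exists in conf_dir, keyed by file name.

    A malformed file raises xml.etree.ElementTree.ParseError.
    """
    result = {}
    for fname in SITE_FILES:
        path = os.path.join(conf_dir, fname)
        if os.path.exists(path):
            with open(path, "rb") as f:
                result[fname] = load_site_xml(f.read())
    return result
```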
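(7) Format the NameNode, then start the cluster (from ${HADOOP_HOME})
bin/hdfs namenode -format
sbin/start-all.sh
In Hadoop 2.x, start-all.sh is deprecated and simply runs sbin/start-dfs.sh and sbin/start-yarn.sh, which you can also run directly.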
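After start-all.sh, two quick checks besides the web UIs are available. One is the JDK's jps tool, which lists running Java processes. The other is whether the expected ports accept connections. Below is a minimal Python sketch of both. It assumes the five daemons a pseudo-distributed Hadoop 2 cluster with YARN normally runs, and the ports used in this post: 9000 from fs.defaultFS, 50070 for the NameNode UI and 8088 for the ResourceManager UI. missing_daemons, port_open and closed_ports are hypothetical helpers.

```python
import socket

# Daemons expected in a pseudo-distributed Hadoop 2.x cluster running YARN.
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode",
                    "ResourceManager", "NodeManager"}

# Ports used in this walkthrough.
EXPECTED_PORTS = {"fs.defaultFS RPC": 9000, "NameNode web UI": 50070,
                  "ResourceManager web UI": 8088}


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """jps prints one '<pid> <main class>' line per JVM; return what is absent."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            running.add(parts[1])
    return set(expected) - running


def port_open(host, port, timeout=1.0):
    """True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def closed_ports(host="localhost", ports=EXPECTED_PORTS):
    """Return the subset of ports that nothing is listening on."""
    return {label: port for label, port in ports.items() if not port_open(host, port)}
```

Pass missing_daemons the text printed by jps, run as the user that started Hadoop. For example, use subprocess.run(["jps"], capture_output=True, text=True).stdout.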
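You can check whether the cluster started correctly in two ways. One is the web UIs at localhost:50070 (HDFS NameNode) and localhost:8088 (YARN ResourceManager). The other is bin/hadoop dfsadmin -report; as the output notes, bin/hdfs dfsadmin -report is the non-deprecated form. Example output: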
hadoop@ubuntu:/opt/hadoop/hadoop-2.6.0$ bin/hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
15/10/22 01:34:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 19945680896 (18.58 GB)
Present Capacity: 13635391488 (12.70 GB)
DFS Remaining: 13635178496 (12.70 GB)
DFS Used: 212992 (208 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (1):
Name: 127.0.0.1:50010 (localhost)
Hostname: ubuntu
Decommission Status : Normal
Configured Capacity: 19945680896 (18.58 GB)
DFS Used: 212992 (208 KB)
Non DFS Used: 6310289408 (5.88 GB)
DFS Remaining: 13635178496 (12.70 GB)
DFS Used%: 0.00%
DFS Remaining%: 68.36%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Oct 22 01:34:25 PDT 2015
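To script the same check, here is a minimal Python sketch that pulls a few numbers out of the report text with regular expressions. It assumes the field labels shown in the report above, which may differ in other Hadoop versions. parse_dfsadmin_report and looks_healthy are hypothetical helpers. The cluster-wide summary comes before the per-DataNode sections, so re.search picks up the summary values.

```python
import re


def parse_dfsadmin_report(text):
    """Extract a few cluster-wide figures from 'dfsadmin -report' output."""
    def first_int(pattern):
        m = re.search(pattern, text)  # first match = cluster summary section
        return int(m.group(1)) if m else None

    return {
        "configured_capacity": first_int(r"Configured Capacity: (\d+)"),
        "dfs_remaining": first_int(r"DFS Remaining: (\d+)"),
        "missing_blocks": first_int(r"Missing blocks: (\d+)"),
        "live_datanodes": first_int(r"Live datanodes \((\d+)\)"),
    }


def looks_healthy(report):
    """At least one live DataNode and no missing blocks."""
    return bool(report["live_datanodes"]) and report["missing_blocks"] == 0
```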
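(8) Run WordCount (the first two commands are run from /opt/hadoop/hadoop-2.6.0)
$ bin/hadoop fs -mkdir /input
$ bin/hadoop fs -copyFromLocal /home/test.txt /input
$ cd /opt/hadoop/hadoop-2.6.0/share/hadoop/mapreduce
$ /opt/hadoop/hadoop-2.6.0/bin/hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output
View the result:
$ /opt/hadoop/hadoop-2.6.0/bin/hadoop fs -cat /output/*
The output directory /output must not exist before the job runs. To rerun the job, delete it first with bin/hadoop fs -rm -r /output.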
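To sanity-check the job's output on a small file, you can compute the same counts locally. Below is a minimal Python sketch. It assumes the bundled WordCount example splits lines the way java.util.StringTokenizer does by default (on space, tab, newline, carriage return and form feed). It also assumes the job writes one word<TAB>count line per word, sorted by key. local_word_count and format_like_hadoop are hypothetical helpers.

```python
import re
from collections import Counter

# java.util.StringTokenizer's default delimiters: space, \t, \n, \r, \f
_DELIMITERS = re.compile(r"[ \t\n\r\f]+")


def local_word_count(text):
    """Count whitespace-separated tokens; punctuation stays attached to words."""
    return Counter(tok for tok in _DELIMITERS.split(text) if tok)


def format_like_hadoop(counts):
    """'word<TAB>count' lines sorted by the words' UTF-8 bytes, the order in
    which Hadoop compares Text keys (so 'Hello' sorts before 'hadoop')."""
    ordered = sorted(counts, key=lambda word: word.encode("utf-8"))
    return "".join(f"{word}\t{counts[word]}\n" for word in ordered)


print(format_like_hadoop(local_word_count("hello world\nHello hadoop hello\n")), end="")
```

With a single reducer (the default), the job's part-r-00000 file should match this output for the same input text.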
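4. Install and configure Spark 1.4
Extract spark-1.4.0-bin-hadoop2.6.tgz so that its contents sit directly under /opt/spark/, the Spark home used in the transcript below.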
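Verify: run a bundled example with bin/run-example SparkPi 10, which should print a line starting with "Pi is roughly". Alternatively, open the web UI at localhost:4040, which is only served while a Spark application (for example spark-shell) is running.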
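SparkPi estimates pi by Monte Carlo sampling. It throws random points into the square [-1, 1] x [-1, 1] and counts the fraction that lands inside the unit circle, which approaches pi/4. The argument 10 is the number of slices (partitions) the sampling is split across. Below is a single-process Python sketch of the same idea. It assumes the example's usual 100,000 samples per slice. estimate_pi is a hypothetical helper and runs without Spark.

```python
import random


def estimate_pi(slices=2, samples_per_slice=100_000, seed=None):
    """Monte Carlo estimate of pi, done sequentially instead of in parallel."""
    rng = random.Random(seed)
    n = slices * samples_per_slice
    inside = 0
    for _ in range(n):
        x = rng.random() * 2 - 1  # uniform in [-1, 1)
        y = rng.random() * 2 - 1
        if x * x + y * y < 1:     # point falls inside the unit circle
            inside += 1
    return 4.0 * inside / n       # area ratio circle/square = pi/4


print(estimate_pi(slices=10, seed=1))
```

Spark spreads the sampling loop across partitions and adds up the per-partition counts with a reduce. The final arithmetic is essentially the same.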
If the installation succeeded, running bin/spark-shell from the Spark directory prints output like the following:
hadoop@ubuntu:/opt/spark$ bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/10/22 01:44:26 INFO SecurityManager: Changing view acls to: hadoop
15/10/22 01:44:26 INFO SecurityManager: Changing modify acls to: hadoop
15/10/22 01:44:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/10/22 01:44:26 INFO HttpServer: Starting HTTP Server
15/10/22 01:44:27 INFO Utils: Successfully started service 'HTTP class server' on port 51327.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) Client VM, Java 1.7.0_25)
Type in expressions to have them evaluated.
Type :help for more information.
15/10/22 01:44:36 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.111.130 instead (on interface eth0)
15/10/22 01:44:36 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/10/22 01:44:36 INFO SparkContext: Running Spark version 1.4.0
15/10/22 01:44:36 INFO SecurityManager: Changing view acls to: hadoop
15/10/22 01:44:36 INFO SecurityManager: Changing modify acls to: hadoop
15/10/22 01:44:36 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/10/22 01:44:37 INFO Slf4jLogger: Slf4jLogger started
15/10/22 01:44:37 INFO Remoting: Starting remoting
15/10/22 01:44:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.111.130:35977]
15/10/22 01:44:38 INFO Utils: Successfully started service 'sparkDriver' on port 35977.
15/10/22 01:44:38 INFO SparkEnv: Registering MapOutputTracker
15/10/22 01:44:38 INFO SparkEnv: Registering BlockManagerMaster
15/10/22 01:44:38 INFO DiskBlockManager: Created local directory at /tmp/spark-08e380aa-a102-48a2-91e3-b358cb2a6a35/blockmgr-d25aa3bd-b1af-4746-9d1a-edd7e8f1e08c
15/10/22 01:44:38 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/10/22 01:44:39 INFO HttpFileServer: HTTP File server directory is /tmp/spark-08e380aa-a102-48a2-91e3-b358cb2a6a35/httpd-4113cef7-2865-4efd-890a-19fcbde49bcb
15/10/22 01:44:39 INFO HttpServer: Starting HTTP Server
15/10/22 01:44:39 INFO Utils: Successfully started service 'HTTP file server' on port 33633.
15/10/22 01:44:39 INFO SparkEnv: Registering OutputCommitCoordinator
15/10/22 01:44:41 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/10/22 01:44:41 INFO SparkUI: Started SparkUI at http://192.168.111.130:4040
15/10/22 01:44:42 INFO Executor: Starting executor ID driver on host localhost
15/10/22 01:44:42 INFO Executor: Using REPL class URI: http://192.168.111.130:51327
15/10/22 01:44:45 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37625.
15/10/22 01:44:45 INFO NettyBlockTransferService: Server created on 37625
15/10/22 01:44:45 INFO BlockManagerMaster: Trying to register BlockManager
15/10/22 01:44:45 INFO BlockManagerMasterEndpoint: Registering block manager localhost:37625 with 267.3 MB RAM, BlockManagerId(driver, localhost, 37625)
15/10/22 01:44:45 INFO BlockManagerMaster: Registered BlockManager
15/10/22 01:44:45 INFO SparkILoop: Created spark context..
Spark context available as sc.
15/10/22 01:44:48 INFO HiveContext: Initializing execution hive, version 0.13.1
15/10/22 01:44:49 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/10/22 01:44:49 INFO ObjectStore: ObjectStore, initialize called
15/10/22 01:44:50 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/10/22 01:44:50 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/10/22 01:44:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
Thu Oct 22 01:44:51 PDT 2015 Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)
15/10/22 01:44:51 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
----------------------------------------------------------------
Thu Oct 22 01:44:51 PDT 2015:
Booting Derby version The Apache Software Foundation - Apache Derby - 10.10.1.1 - (1458268): instance a816c00e-0150-8eb8-dd90-0000186374f8
on database directory /tmp/spark-ea20e824-5489-4ead-a2d7-c8b14434dc51/metastore with class loader sun.misc.Launcher$AppClassLoader@1b56848
Loaded from file:/opt/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar
java.vendor=Oracle Corporation
java.runtime.version=1.7.0_25-b15
user.dir=/opt/spark
os.name=Linux
os.arch=i386
os.version=3.16.0-23-generic
derby.system.home=null
Database Class Loader started - derby.database.classpath=''
15/10/22 01:44:53 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/10/22 01:44:53 INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
15/10/22 01:44:54 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/10/22 01:44:54 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/10/22 01:44:55 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/10/22 01:44:55 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/10/22 01:44:55 INFO ObjectStore: Initialized ObjectStore
15/10/22 01:44:56 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
15/10/22 01:44:56 INFO HiveMetaStore: Added admin role in metastore
15/10/22 01:44:56 INFO HiveMetaStore: Added public role in metastore
15/10/22 01:44:56 INFO HiveMetaStore: No user is added in admin role, since config is empty
15/10/22 01:44:57 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/10/22 01:44:57 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
scala>
References:
1. http://www.aboutyun.com/thread-10554-1-1.html
2. http://www.linuxidc.com/Linux/2013-04/82814.htm
3. http://blog.csdn.net/jediael_lu/article/details/45314317