Setting Up a Pseudo-Distributed Hadoop 2.6.0 + Spark 1.4.0 Cluster on Ubuntu 14.10 (Tested and Working)

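Preface: I have set up clusters several times before, but because I never kept notes, each time I ended up digging through blog posts, which was tedious. This post records the setup process step by step.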

 

0. Environment: Ubuntu 14.10, Hadoop 2.6.0, spark-1.4.0

1. Install JDK 1.7

  (1) Download jdk-7u25-linux-i586.tar.gz;

  (2) Extract jdk-7u25-linux-i586.tar.gz and move the result to /opt/java/jdk/;

  (3) Configure the Java environment variables:

    Append the following to /etc/profile:

  #set java env
  export JAVA_HOME=/opt/java/jdk/jdk1.7.0_25
  export JRE_HOME=${JAVA_HOME}/jre
  export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
  export PATH=${JAVA_HOME}/bin:$PATH

  (4) Verify. Output like the following means the installation succeeded:

hadoop@ubuntu:~/installs$ java -version
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) Client VM (build 23.25-b01, mixed mode)

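  Important: I had installed the JDK as root, and java -version then failed after I switched to the hadoop user. The cause turned out to be that I had put the Java environment variables in ~/.bashrc, which applies only to the user who owns it. Moving them to /etc/profile fixed the problem.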
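  As a minimal sketch of the fix described in the note above, the helper below appends the four variables to a profile file only if they are not already there, so it can be re-run safely. The function name append_java_env and the marker-based check are illustrative assumptions, not part of the original setup; writing to /etc/profile requires root.

```shell
# Hypothetical helper: append the Java variables to a profile file once.
append_java_env() {
    profile="$1"
    # The marker comment used in this article doubles as the "already done" check.
    if grep -qF '#set java env' "$profile" 2>/dev/null; then
        return 0
    fi
    # Quoted 'EOF' keeps ${JAVA_HOME} etc. literal instead of expanding them now.
    cat >> "$profile" <<'EOF'
#set java env
export JAVA_HOME=/opt/java/jdk/jdk1.7.0_25
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
EOF
}

# Usage (as root): append_java_env /etc/profile && . /etc/profile
```

  Because /etc/profile is read by login shells of every user, the hadoop user picks up the same JAVA_HOME after logging in again.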

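2. Install and configure ssh

  Because online installation kept failing, I chose to install offline:

  (1) Download the ssh packages

   "Search for openssh on launchpad.net/Ubuntu/ and, from the results, pick the matching version under the corresponding release code name. This article was installed on Ubuntu 12.10, whose code name is Quantal Quetzal, on i386, so the following three files were downloaded: openssh-client_6.0p1-3ubuntu1_i386.deb, openssh-server_6.0p1-3ubuntu1_i386.deb, ssh_6.0p1-3ubuntu1_all.deb."

  (2) Run the install commands

  Run the following commands in order: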

  sudo dpkg -i openssh-client_6.0p1-3ubuntu1_i386.deb
  sudo dpkg -i openssh-server_6.0p1-3ubuntu1_i386.deb
  sudo dpkg -i ssh_6.0p1-3ubuntu1_all.deb

  (3) Verify: if ssh localhost logs you in, the installation succeeded.

  (4) Passwordless ssh login (as root)

  ssh-keygen -t rsa -P ""    # then press Enter at each prompt
  cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
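
  Running the cat line above twice leaves a duplicate entry, and sshd's StrictModes option (on by default) checks the modes and ownership of these files before accepting a key. The sketch below handles both; the function name authorize_key is hypothetical and the paths in the usage line are the ones from this article.

```shell
# Hypothetical helper: add a public key to authorized_keys exactly once
# and apply restrictive permissions to the directory and the file.
authorize_key() {
    pub="$1"        # e.g. /root/.ssh/id_rsa.pub
    ssh_dir="$2"    # e.g. /root/.ssh
    auth="$ssh_dir/authorized_keys"
    mkdir -p "$ssh_dir"
    touch "$auth"
    # -x: whole-line match, -F: fixed string, so the key is not treated as a regex.
    if ! grep -qxF -- "$(cat "$pub")" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    chmod 700 "$ssh_dir"
    chmod 600 "$auth"
}

# Usage: authorize_key /root/.ssh/id_rsa.pub /root/.ssh
```

  After this, ssh localhost should log in without asking for a password.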

3. Install and configure Hadoop

  (1) Install Hadoop

  Extract hadoop-2.6.0.tar.gz to /opt/hadoop/;

  (2) Configure Hadoop (under {HADOOP_HOME}/etc/hadoop)

  Edit hadoop-env.sh and append the Java environment variable:

  #java env
  export JAVA_HOME=/opt/java/jdk/jdk1.7.0_25

  (3) Configure core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>

</configuration>

  (4) Configure hdfs-site.xml

<configuration>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

<property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/dfs/name</value>
</property>

<property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/dfs/data</value>
</property>
</configuration>
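
  The directories named in core-site.xml and hdfs-site.xml must be writable by the user that runs Hadoop. Creating them up front is optional, but it surfaces permission problems before the namenode is formatted. A minimal sketch, where make_hadoop_dirs is a hypothetical helper name:

```shell
# Hypothetical helper: create the tmp, name and data directories under one base path.
make_hadoop_dirs() {
    base="$1"    # /home/hadoop in this article
    mkdir -p "$base/tmp" "$base/dfs/name" "$base/dfs/data"
}

# Usage (as the hadoop user): make_hadoop_dirs /home/hadoop
```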

  (5) Configure mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
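
  Hadoop 2.x tarballs typically ship only mapred-site.xml.template in the configuration directory, so mapred-site.xml may need to be created from it first. The sketch below copies the template only when the real file is missing; ensure_mapred_site is a hypothetical helper name.

```shell
# Hypothetical helper: create mapred-site.xml from the template if it does not exist yet.
ensure_mapred_site() {
    conf_dir="$1"    # {HADOOP_HOME}/etc/hadoop
    if [ ! -f "$conf_dir/mapred-site.xml" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    fi
}

# Usage: ensure_mapred_site /opt/hadoop/hadoop-2.6.0/etc/hadoop
```

  An existing mapred-site.xml is left untouched, so edits already made to it are not lost.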

  (6) Configure yarn-site.xml

<configuration>

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

<!-- Site specific YARN configuration properties -->

</configuration>

  (7) Format the namenode and start the cluster

  bin/hdfs namenode -format

  sbin/start-all.sh
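
  A quick way to see whether every daemon came up is the JDK's jps tool. The sketch below reads jps output and prints the daemons that are missing; the helper name missing_daemons is hypothetical, and the list of five daemons is what I would expect start-all.sh to launch in a pseudo-distributed setup.

```shell
# Hypothetical helper: read `jps` output on stdin and print expected daemons that are absent.
missing_daemons() {
    # jps prints "<pid> <name>"; keep only the names.
    running=$(awk '{print $2}')
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -x matches the whole line, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
    done
}

# Usage: jps | missing_daemons
```

  Empty output means all five daemons are running; otherwise the logs directory under the Hadoop installation is the place to look.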

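  You can check whether the cluster started correctly through the web UIs at localhost:50070 and localhost:8088, or with the bin/hadoop dfsadmin -report command (the output notes that this form is deprecated in favor of the hdfs command), as shown below: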

hadoop@ubuntu:/opt/hadoop/hadoop-2.6.0$ bin/hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

15/10/22 01:34:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 19945680896 (18.58 GB)
Present Capacity: 13635391488 (12.70 GB)
DFS Remaining: 13635178496 (12.70 GB)
DFS Used: 212992 (208 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (1):

Name: 127.0.0.1:50010 (localhost)
Hostname: ubuntu
Decommission Status : Normal
Configured Capacity: 19945680896 (18.58 GB)
DFS Used: 212992 (208 KB)
Non DFS Used: 6310289408 (5.88 GB)
DFS Remaining: 13635178496 (12.70 GB)
DFS Used%: 0.00%
DFS Remaining%: 68.36%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Oct 22 01:34:25 PDT 2015

 

  (8) Run WordCount

$ bin/hadoop fs -mkdir /input
$ bin/hadoop fs -copyFromLocal /home/test.txt /input
$ cd /opt/hadoop/hadoop-2.6.0/share/hadoop/mapreduce
$ /opt/hadoop/hadoop-2.6.0/bin/hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output
View the result:
$ /opt/hadoop/hadoop-2.6.0/bin/hadoop fs -cat /output/*
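
  To sanity-check the job's result on a small input, the same counts can be computed locally. This is only a sketch: local_wordcount is a hypothetical helper, and it assumes the example job splits words on whitespace, so ordering and edge cases may differ from the Hadoop output.

```shell
# Hypothetical helper: count whitespace-separated words in a local file,
# printing "word<TAB>count" lines like the WordCount output.
local_wordcount() {
    # Put one token per line, drop empty lines, then count identical words.
    tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | LC_ALL=C sort | uniq -c |
        awk '{print $2 "\t" $1}'
}

# Usage: local_wordcount /home/test.txt
```

  Note that the job refuses to start if /output already exists, so remove it with bin/hadoop fs -rm -r /output before running WordCount again.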

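4. Install and configure Spark 1.4

  Extract spark-1.4.0-bin-hadoop2.6.tgz to /opt/spark/

  Verify: use the web UI at localhost:4040 (served only while a Spark application such as spark-shell is running) or run a bundled example (bin/run-example SparkPi 10).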
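
  The SparkPi example prints a line of the form "Pi is roughly 3.14..." among a lot of log output. The sketch below checks that such a line is present and that the value is plausible; check_pi is a hypothetical helper and the 3.0 to 3.3 bounds are an arbitrary assumption, since the estimate varies from run to run.

```shell
# Hypothetical helper: succeed if stdin contains a "Pi is roughly <value>" line
# whose value lies in a plausible range.
check_pi() {
    awk '/Pi is roughly/ { v = $4; ok = (v > 3.0 && v < 3.3) }
         END { exit (ok ? 0 : 1) }'
}

# Usage (from the Spark directory):
#   bin/run-example SparkPi 10 2>/dev/null | check_pi && echo "SparkPi OK"
```

  Discarding stderr in the usage line assumes Spark's default logging configuration, which writes log messages to stderr.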

  Success: running spark-shell from the Spark directory prints the following:

hadoop@ubuntu:/opt/spark$ bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/10/22 01:44:26 INFO SecurityManager: Changing view acls to: hadoop
15/10/22 01:44:26 INFO SecurityManager: Changing modify acls to: hadoop
15/10/22 01:44:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/10/22 01:44:26 INFO HttpServer: Starting HTTP Server
15/10/22 01:44:27 INFO Utils: Successfully started service 'HTTP class server' on port 51327.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) Client VM, Java 1.7.0_25)
Type in expressions to have them evaluated.
Type :help for more information.
15/10/22 01:44:36 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.111.130 instead (on interface eth0)
15/10/22 01:44:36 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/10/22 01:44:36 INFO SparkContext: Running Spark version 1.4.0
15/10/22 01:44:36 INFO SecurityManager: Changing view acls to: hadoop
15/10/22 01:44:36 INFO SecurityManager: Changing modify acls to: hadoop
15/10/22 01:44:36 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/10/22 01:44:37 INFO Slf4jLogger: Slf4jLogger started
15/10/22 01:44:37 INFO Remoting: Starting remoting
15/10/22 01:44:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.111.130:35977]
15/10/22 01:44:38 INFO Utils: Successfully started service 'sparkDriver' on port 35977.
15/10/22 01:44:38 INFO SparkEnv: Registering MapOutputTracker
15/10/22 01:44:38 INFO SparkEnv: Registering BlockManagerMaster
15/10/22 01:44:38 INFO DiskBlockManager: Created local directory at /tmp/spark-08e380aa-a102-48a2-91e3-b358cb2a6a35/blockmgr-d25aa3bd-b1af-4746-9d1a-edd7e8f1e08c
15/10/22 01:44:38 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/10/22 01:44:39 INFO HttpFileServer: HTTP File server directory is /tmp/spark-08e380aa-a102-48a2-91e3-b358cb2a6a35/httpd-4113cef7-2865-4efd-890a-19fcbde49bcb
15/10/22 01:44:39 INFO HttpServer: Starting HTTP Server
15/10/22 01:44:39 INFO Utils: Successfully started service 'HTTP file server' on port 33633.
15/10/22 01:44:39 INFO SparkEnv: Registering OutputCommitCoordinator
15/10/22 01:44:41 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/10/22 01:44:41 INFO SparkUI: Started SparkUI at http://192.168.111.130:4040
15/10/22 01:44:42 INFO Executor: Starting executor ID driver on host localhost
15/10/22 01:44:42 INFO Executor: Using REPL class URI: http://192.168.111.130:51327
15/10/22 01:44:45 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37625.
15/10/22 01:44:45 INFO NettyBlockTransferService: Server created on 37625
15/10/22 01:44:45 INFO BlockManagerMaster: Trying to register BlockManager
15/10/22 01:44:45 INFO BlockManagerMasterEndpoint: Registering block manager localhost:37625 with 267.3 MB RAM, BlockManagerId(driver, localhost, 37625)
15/10/22 01:44:45 INFO BlockManagerMaster: Registered BlockManager
15/10/22 01:44:45 INFO SparkILoop: Created spark context..
Spark context available as sc.
15/10/22 01:44:48 INFO HiveContext: Initializing execution hive, version 0.13.1
15/10/22 01:44:49 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/10/22 01:44:49 INFO ObjectStore: ObjectStore, initialize called
15/10/22 01:44:50 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/10/22 01:44:50 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/10/22 01:44:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
Thu Oct 22 01:44:51 PDT 2015 Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)
15/10/22 01:44:51 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
----------------------------------------------------------------
Thu Oct 22 01:44:51 PDT 2015:
Booting Derby version The Apache Software Foundation - Apache Derby - 10.10.1.1 - (1458268): instance a816c00e-0150-8eb8-dd90-0000186374f8 
on database directory /tmp/spark-ea20e824-5489-4ead-a2d7-c8b14434dc51/metastore with class loader sun.misc.Launcher$AppClassLoader@1b56848 
Loaded from file:/opt/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar
java.vendor=Oracle Corporation
java.runtime.version=1.7.0_25-b15
user.dir=/opt/spark
os.name=Linux
os.arch=i386
os.version=3.16.0-23-generic
derby.system.home=null
Database Class Loader started - derby.database.classpath=''
15/10/22 01:44:53 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/10/22 01:44:53 INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5.  Encountered: "@" (64), after : "".
15/10/22 01:44:54 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/10/22 01:44:54 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/10/22 01:44:55 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/10/22 01:44:55 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/10/22 01:44:55 INFO ObjectStore: Initialized ObjectStore
15/10/22 01:44:56 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
15/10/22 01:44:56 INFO HiveMetaStore: Added admin role in metastore
15/10/22 01:44:56 INFO HiveMetaStore: Added public role in metastore
15/10/22 01:44:56 INFO HiveMetaStore: No user is added in admin role, since config is empty
15/10/22 01:44:57 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/10/22 01:44:57 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> 

 

References:

1. http://www.aboutyun.com/thread-10554-1-1.html

2. http://www.linuxidc.com/Linux/2013-04/82814.htm

3. http://blog.csdn.net/jediael_lu/article/details/45314317

posted @ 2015-10-22 16:47  MERRU