Setting Up a Hadoop Development Environment

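I had read a little about big data before, but setting up the environment left me frustrated because I never got it working. I happened to have some free time these past few days, so I went back and redid the configuration I had given up on, and to my surprise it worked. I took a lot of detours along the way and went through a great deal of material online, so I am finally writing up my own notes.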

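Hadoop has three installation modes: standalone, pseudo-distributed, and fully distributed. Fully distributed mode is what companies run in production and needs several machines; I don't have them, so I won't cover it. Pseudo-distributed mode runs all the Hadoop daemons on a single machine (a setup spread over several virtual machines is really a small fully distributed cluster). I mainly want to do data analysis and am not concerned with operations for now, so a single machine is enough for me. This post walks through the pseudo-distributed configuration on one host.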

          

I. Configuring Hadoop

First, download a Hadoop release from the Apache releases page: http://hadoop.apache.org/releases.html#News

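Install a JDK on Linux and set up passwordless SSH login. I did not bother with passwordless login at first, but later found out how convenient it is. To set it up, run ssh-keygen in a terminal and press Enter at every prompt, then append the generated public key (~/.ssh/id_rsa.pub by default) to ~/.ssh/authorized_keys.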
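The passwordless-login step can be sketched as the script below. This is only a minimal sketch, assuming the default key path ~/.ssh/id_rsa and an empty passphrase; authorize_key and setup_passwordless_ssh are hypothetical helper names made up for illustration, not part of OpenSSH.

```shell
# Hypothetical helper: append a public key to an authorized_keys file,
# skipping the append if the exact key line is already present.
authorize_key() {
    pub_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    if ! grep -qxF -e "$(cat "$pub_file")" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi
    # Keep the file private; with its default StrictModes setting sshd
    # ignores authorized_keys files that other users can write to.
    chmod 600 "$auth_file"
}

# Hypothetical helper: create a key pair if needed, then authorize it locally.
setup_passwordless_ssh() {
    key=${1:-"$HOME/.ssh/id_rsa"}
    mkdir -p "$(dirname "$key")"
    chmod 700 "$(dirname "$key")"
    # -N '' sets an empty passphrase, which is what pressing Enter does.
    [ -f "$key" ] || ssh-keygen -t rsa -N '' -f "$key"
    authorize_key "$key.pub" "$(dirname "$key")/authorized_keys"
}

# Usage (not run automatically):
#   setup_passwordless_ssh
#   ssh localhost    # should log in without asking for a password
```

Calling authorize_key a second time leaves the file unchanged, so the script is safe to re-run.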

I unpacked Hadoop into /home/admin1/下载/hadoop-2.5.2. Most of the files to edit are under etc/hadoop.

1. core-site.xml

<configuration>

   <property>

      <name>fs.defaultFS</name>

      <value>hdfs://ubuntu2:9000</value>

   </property>
 
   <property>

      <name>hadoop.tmp.dir</name>
      <value>/home/admin1/hadoop/hadoop-2.5.2/tmp/hadoop</value>

   </property>


</configuration>


2. hdfs-site.xml
 <configuration>

  <property>

     <name>dfs.replication</name>

     <value>1</value>

  </property>

</configuration>

3. mapred-site.xml

<configuration>

 	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
    </property>

</configuration>


4. yarn-site.xml

<configuration>

<property>
			<name>yarn.resourcemanager.hostname</name>
			<value>ubuntu2</value>
    </property>
		<!-- how reducers fetch map output -->
    <property>
			<name>yarn.nodemanager.aux-services</name>
			<value>mapreduce_shuffle</value>
     </property>
</configuration>
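A quick way to double-check the values typed into the site files is to read them back from the shell. The helper below is a rough line-based sketch, not a real XML parser: it assumes each name and value element sits on a single line, as in the listings above, and site_property is a hypothetical name chosen for this example.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop *-site.xml file. Prints nothing if the name is absent.
site_property() {
    # $1 = XML file, $2 = property name
    awk -v key="$2" '
        /<name>/ {
            n = $0
            sub(/.*<name>/, "", n)
            sub(/<\/name>.*/, "", n)
            found = (n == key)
        }
        found && /<value>/ {
            v = $0
            sub(/.*<value>/, "", v)
            sub(/<\/value>.*/, "", v)
            print v
            exit
        }
    ' "$1"
}

# Usage, from the Hadoop installation directory:
#   site_property etc/hadoop/core-site.xml fs.defaultFS
#   site_property etc/hadoop/yarn-site.xml yarn.resourcemanager.hostname
```

Once Hadoop itself is on the PATH, hdfs getconf -confKey fs.defaultFS reports the value Hadoop actually resolved, which is the more reliable check.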

5. Add the following to hadoop-env.sh:

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64


# The jsvc implementation to use. Jsvc is required to run secure datanodes.
#export JSVC_HOME=${JSVC_HOME}

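# HADOOP_CONF_DIR must point at the directory that holds the *-site.xml files (etc/hadoop under the install root).
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/admin1/hadoop/hadoop-2.5.2/etc/hadoop"}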
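6. Remember to change the entry in the slaves file to your host name. I replaced localhost with ubuntu2, because my host name is ubuntu2.

JAVA_HOME must also be set in hadoop-env.sh, as already shown in step 5. Use the absolute path of your JDK, not a relative one:

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
If you get an error at run time, the most likely cause is that JAVA_HOME is not configured.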

Open the system profile from the command line: sudo gedit /etc/profile

and add the following at the very end:

JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

PATH=$JAVA_HOME/bin:$PATH

CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export JAVA_HOME

export PATH

export CLASSPATH 
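Before starting Hadoop it may be worth confirming that the JAVA_HOME written to /etc/profile really is an absolute path to a JDK. The check below is a small sketch under that assumption; check_java_home is a hypothetical helper name, not a Hadoop or JDK command.

```shell
# Hypothetical helper: succeed only if the argument looks like a usable JDK root.
check_java_home() {
    dir=$1
    case $dir in
        /*) ;;    # absolute path: fine
        *)  echo "JAVA_HOME must be an absolute path: '$dir'" >&2
            return 1 ;;
    esac
    if [ ! -x "$dir/bin/java" ]; then
        echo "no executable found at $dir/bin/java" >&2
        return 1
    fi
    return 0
}

# Usage: reload the profile in the current shell, then check the variable.
#   . /etc/profile
#   check_java_home "$JAVA_HOME" && "$JAVA_HOME/bin/java" -version
```

An empty or unset JAVA_HOME fails the first test, since an empty string does not start with a slash.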

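With that, Hadoop should basically be ready to run (log out and back in, or run source /etc/profile, so the new variables take effect).

To start it, run the following from /home/admin1/下载/hadoop-2.5.2: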
bin/hadoop namenode -format

sbin/start-all.sh

The web interfaces are at:
http://localhost:50070/ (HDFS NameNode)

http://localhost:8088/cluster (YARN ResourceManager)


To shut everything down: sbin/stop-all.sh
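Besides the web pages, the JDK's jps tool lists the running Java processes, which gives a quick way to see whether every daemon came up after sbin/start-all.sh. The sketch below assumes a single-node setup like the one configured here; missing_daemons is a hypothetical helper that only filters jps output.

```shell
# Daemons that start-all.sh launches on a single-node setup.
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"

# Hypothetical helper: read jps output on stdin and print each expected
# daemon that does not appear in it. Prints nothing when all are running.
missing_daemons() {
    jps_output=$(cat)
    for daemon in $EXPECTED_DAEMONS; do
        # jps prints "<pid> <main class>"; compare the class name as a whole
        # field so that NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
            echo "$daemon"
        fi
    done
}

# Usage:
#   jps | missing_daemons
```

If a name is printed, the matching log file under the logs directory of the Hadoop installation is the place to look next.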


Troubleshooting:

If the hadoop command reports an error, set the Hadoop environment variables:
export HADOOP_HOME=/home/admin1/下载/hadoop-2.5.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH


If the DataNode does not start:
rm -rf /usr/hadoop/tmp/*
rm -rf /tmp/hadoop*
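Run these two commands before the first startup step (formatting the NameNode). If your hadoop.tmp.dir is somewhere else, as in the core-site.xml above, clear that directory instead.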
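The post does not say why clearing the temporary directories helps. One common cause, offered here as an assumption rather than a diagnosis of this particular machine, is that re-formatting the NameNode generates a new clusterID while the DataNode still holds the old one. The sketch below compares the two IDs; it assumes the default dfs/name and dfs/data sub-directories under hadoop.tmp.dir, and cluster_id and same_cluster are hypothetical helper names.

```shell
# Hypothetical helper: print the clusterID stored in a Hadoop VERSION file.
cluster_id() {
    sed -n 's/^clusterID=//p' "$1"
}

# Hypothetical helper: succeed if the NameNode and DataNode under a given
# hadoop.tmp.dir carry the same, non-empty clusterID.
same_cluster() {
    tmp_dir=$1
    nn=$(cluster_id "$tmp_dir/dfs/name/current/VERSION")
    dn=$(cluster_id "$tmp_dir/dfs/data/current/VERSION")
    [ -n "$nn" ] && [ "$nn" = "$dn" ]
}

# Usage (path taken from the core-site.xml shown earlier):
#   same_cluster /home/admin1/hadoop/hadoop-2.5.2/tmp/hadoop \
#       || echo "clusterID mismatch: stop Hadoop, clear the tmp dir, re-format"
```

Keep in mind that clearing the directory and re-formatting erases everything stored in HDFS, which is acceptable on a fresh test setup but not on one that holds data.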


II. Configuring Eclipse

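I installed Eclipse directly on Linux, and I was using version 4.5, which led to all sorts of strange problems. At first I thought hadoop-eclipse-plugin-2.5.2.jar was at fault, so I downloaded Ant and hadoop2x-eclipse-plugin-master and rebuilt the plugin. The build goes roughly like this: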

1. Download hadoop2x-eclipse-plugin-master: https://github.com/winghc/hadoop2x-eclipse-plugin

2. Download Ant: http://ant.apache.org/bindownload.cgi

3. Unpack both. On Windows you need to set the environment variables; on Linux it also runs without them. Then go into the /hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin directory you just unpacked.

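4. Starting from that directory, edit the following files:

In the ivy directory: libraries.properties

The versions in this file must match the ones your Hadoop actually ships with. If you don't know a version, look it up under share/hadoop in your Hadoop directory. There is also a simple shortcut: copy my file below as it is, and when the build complains that some jar has the wrong version, fix that entry according to the error message.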
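Looking up a version under share/hadoop can be done from the shell, since each jar carries its version in the file name. The helper below is a sketch under that naming assumption (artifact, a hyphen, then a version starting with a digit); jar_version is a hypothetical name.

```shell
# Hypothetical helper: print the version embedded in "<artifact>-<version>.jar"
# for the first matching jar found below a directory.
jar_version() {
    lib_dir=$1
    artifact=$2
    find "$lib_dir" -name "$artifact-[0-9]*.jar" | sort | head -n 1 |
        sed "s|.*/$artifact-||; s|\.jar\$||"
}

# Usage: compare the result with the matching line in libraries.properties.
#   jar_version /home/admin1/下载/hadoop-2.5.2/share/hadoop guava
#   jar_version /home/admin1/下载/hadoop-2.5.2/share/hadoop commons-lang
```

The pattern requires a digit right after the hyphen, so asking for commons-lang does not pick up a commons-lang3 jar by accident.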

#   Licensed under the Apache License, Version 2.0 (the "License");
#   you may not use this file except in compliance with the License.
#   You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.
 
#This properties file lists the versions of the various artifacts used by hadoop and components.
#It drives ivy and the generation of a maven POM
# This is the version of hadoop we are generating
hadoop.version=2.5.2
hadoop-gpl-compression.version=0.1.0
 
#These are the versions of our dependencies (in alphabetical order)
apacheant.version=1.7.0
ant-task.version=2.0.10
 
asm.version=3.2
aspectj.version=1.6.5
aspectj.version=1.6.11
 
checkstyle.version=4.2
 
commons-cli.version=1.2
commons-codec.version=1.4
commons-collections.version=3.2.1
commons-configuration.version=1.6
commons-daemon.version=1.0.13
commons-httpclient.version=3.1
commons-lang.version=2.6
commons-logging.version=1.1.3
commons-logging-api.version=1.0.4
commons-math.version=3.1.1
commons-el.version=1.0
commons-fileupload.version=1.2
commons-io.version=2.4
commons-net.version=3.1
core.version=3.1.1
coreplugin.version=1.3.2
 
hsqldb.version=1.8.0.10
 
ivy.version=2.1.0
 
jasper.version=5.5.12
jackson.version=1.9.13
#not able to figureout the version of jsp & jsp-api version to get it resolved throught ivy
# but still declared here as we are going to have a local copy from the lib folder
jsp.version=2.1
jsp-api.version=5.5.12
jsp-api-2.1.version=6.1.14
jsp-2.1.version=6.1.14
jets3t.version=0.6.1
jetty.version=6.1.26
jetty-util.version=6.1.26
jersey-core.version=1.9
jersey-json.version=1.9
jersey-server.version=1.9
junit.version=4.11
jdeb.version=0.8
jdiff.version=1.0.9
json.version=1.0
 
kfs.version=0.1
 
log4j.version=1.2.17
lucene-core.version=2.3.1
 
mockito-all.version=1.8.5
jsch.version=0.1.42
 
oro.version=2.0.8
 
rats-lib.version=0.5.1
 
servlet.version=4.0.6
servlet-api.version=2.5
slf4j-api.version=1.7.5
slf4j-log4j12.version=1.7.5
 
wagon-http.version=1.0-beta-2
xmlenc.version=0.52
xerces.version=1.4.4

protobuf.version=2.5.0
guava.version=11.0.2
netty.version=3.6.2.Final

build.xml

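The same applies here: if a version does not match, fix it the same way as above. Remember that the XML declaration must start at the very beginning of the file, with nothing in front of it.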

<?xml version="1.0" encoding="UTF-8" standalone="no"?>  
      
    <!--  
       Licensed to the Apache Software Foundation (ASF) under one or more  
       contributor license agreements.  See the NOTICE file distributed with  
       this work for additional information regarding copyright ownership.  
       The ASF licenses this file to You under the Apache License, Version 2.0  
       (the "License"); you may not use this file except in compliance with  
       the License.  You may obtain a copy of the License at  
      
           http://www.apache.org/licenses/LICENSE-2.0  
      
       Unless required by applicable law or agreed to in writing, software  
       distributed under the License is distributed on an "AS IS" BASIS,  
       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
       See the License for the specific language governing permissions and  
       limitations under the License.  
    -->  
      
    <project default="jar" name="eclipse-plugin">  
      
      <import file="../build-contrib.xml"/>  
      
      <path id="eclipse-sdk-jars">  
        <fileset dir="${eclipse.home}/plugins/">  
          <include name="org.eclipse.ui*.jar"/>  
          <include name="org.eclipse.jdt*.jar"/>  
          <include name="org.eclipse.core*.jar"/>  
          <include name="org.eclipse.equinox*.jar"/>  
          <include name="org.eclipse.debug*.jar"/>  
          <include name="org.eclipse.osgi*.jar"/>  
          <include name="org.eclipse.swt*.jar"/>  
          <include name="org.eclipse.jface*.jar"/>  
      
          <include name="org.eclipse.team.cvs.ssh2*.jar"/>  
          <include name="com.jcraft.jsch*.jar"/>  
        </fileset>   
      </path>  
      
      <path id="hadoop-sdk-jars">  
        <fileset dir="${hadoop.home}/share/hadoop/mapreduce">  
          <include name="hadoop*.jar"/>  
        </fileset>   
        <fileset dir="${hadoop.home}/share/hadoop/hdfs">  
          <include name="hadoop*.jar"/>  
        </fileset>   
        <fileset dir="${hadoop.home}/share/hadoop/common">  
          <include name="hadoop*.jar"/>  
        </fileset>   
      </path>  
      
      
      
      <!-- Override classpath to include Eclipse SDK jars -->  
      <path id="classpath">  
        <pathelement location="${build.classes}"/>  
        <!--pathelement location="${hadoop.root}/build/classes"/-->  
        <path refid="eclipse-sdk-jars"/>  
        <path refid="hadoop-sdk-jars"/>  
      </path>  
      
      <!-- Skip building if eclipse.home is unset. -->  
      <target name="check-contrib" unless="eclipse.home">  
        <property name="skip.contrib" value="yes"/>  
        <echo message="eclipse.home unset: skipping eclipse plugin"/>  
      </target>  
      
     <!--<target name="compile" depends="init, ivy-retrieve-common" unless="skip.contrib">-->  
     <!-- depends="init, ivy-retrieve-common" removed here -->  
     <target name="compile"  unless="skip.contrib">  
        <echo message="contrib: ${name}"/>  
        <javac  
         encoding="${build.encoding}"  
         srcdir="${src.dir}"  
         includes="**/*.java"  
         destdir="${build.classes}"  
         debug="${javac.debug}"  
         deprecation="${javac.deprecation}">  
         <classpath refid="classpath"/>  
        </javac>  
      </target>  
      
      <!-- Override jar target to specify manifest -->  
      <target name="jar" depends="compile" unless="skip.contrib">  
        <mkdir dir="${build.dir}/lib"/>  
        <copy  todir="${build.dir}/lib/" verbose="true">  
              <fileset dir="${hadoop.home}/share/hadoop/mapreduce">  
               <include name="hadoop*.jar"/>  
              </fileset>  
        </copy>  
        <copy  todir="${build.dir}/lib/" verbose="true">  
              <fileset dir="${hadoop.home}/share/hadoop/common">  
               <include name="hadoop*.jar"/>  
              </fileset>  
        </copy>  
        <copy  todir="${build.dir}/lib/" verbose="true">  
              <fileset dir="${hadoop.home}/share/hadoop/hdfs">  
               <include name="hadoop*.jar"/>  
              </fileset>  
        </copy>  
        <copy  todir="${build.dir}/lib/" verbose="true">  
              <fileset dir="${hadoop.home}/share/hadoop/yarn">  
               <include name="hadoop*.jar"/>  
              </fileset>  
        </copy>  
      
        <copy  todir="${build.dir}/classes" verbose="true">  
              <fileset dir="${root}/src/java">  
               <include name="*.xml"/>  
              </fileset>  
        </copy>  
      
      
      
        <copy file="${hadoop.home}/share/hadoop/common/lib/protobuf-java-${protobuf.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
        <copy file="${hadoop.home}/share/hadoop/common/lib/log4j-${log4j.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
        <copy file="${hadoop.home}/share/hadoop/common/lib/commons-cli-${commons-cli.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
        <copy file="${hadoop.home}/share/hadoop/common/lib/commons-configuration-${commons-configuration.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
        <copy file="${hadoop.home}/share/hadoop/common/lib/commons-lang-${commons-lang.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
        <!-- commons-collections dependency added here -->  
        <copy file="${hadoop.home}/share/hadoop/common/lib/commons-collections-${commons-collections.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
        <copy file="${hadoop.home}/share/hadoop/common/lib/jackson-core-asl-${jackson.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
        <copy file="${hadoop.home}/share/hadoop/common/lib/jackson-mapper-asl-${jackson.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
        <copy file="${hadoop.home}/share/hadoop/common/lib/slf4j-log4j12-${slf4j-log4j12.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
        <copy file="${hadoop.home}/share/hadoop/common/lib/slf4j-api-${slf4j-api.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
        <copy file="${hadoop.home}/share/hadoop/common/lib/guava-${guava.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
        <copy file="${hadoop.home}/share/hadoop/common/lib/hadoop-auth-${hadoop.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
        <copy file="${hadoop.home}/share/hadoop/common/lib/commons-cli-${commons-cli.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
        <copy file="${hadoop.home}/share/hadoop/common/lib/netty-${netty.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
      
        <jar  
          jarfile="${build.dir}/hadoop-${name}-${version}.jar"  
          manifest="${root}/META-INF/MANIFEST.MF">  
          <manifest>  
         <attribute name="Bundle-ClassPath"   
            value="classes/,   
     lib/hadoop-mapreduce-client-core-${hadoop.version}.jar,  
     lib/hadoop-mapreduce-client-common-${hadoop.version}.jar,  
     lib/hadoop-mapreduce-client-jobclient-${hadoop.version}.jar,  
     lib/hadoop-auth-${hadoop.version}.jar,  
     lib/hadoop-common-${hadoop.version}.jar,  
     lib/hadoop-hdfs-${hadoop.version}.jar,  
     lib/protobuf-java-${protobuf.version}.jar,  
     lib/log4j-${log4j.version}.jar,  
     lib/commons-cli-${commons-cli.version}.jar,
lib/commons-configuration-${commons-configuration.version}.jar,
lib/commons-httpclient-${commons-httpclient.version}.jar,
lib/commons-lang-${commons-lang.version}.jar,
lib/jackson-core-asl-${jackson.version}.jar,
lib/jackson-mapper-asl-${jackson.version}.jar,
lib/slf4j-log4j12-${slf4j-log4j12.version}.jar,
lib/slf4j-api-${slf4j-api.version}.jar,
     lib/guava-${guava.version}.jar,  
     lib/netty-${netty.version}.jar"/>  
         </manifest>  
          <fileset dir="${build.dir}" includes="classes/ lib/"/>  
          <!--fileset dir="${build.dir}" includes="*.xml"/-->  
          <fileset dir="${root}" includes="resources/ plugin.xml"/>  
        </jar>  
      </target>  
      
    </project>  

makePlus.sh

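ant jar -Dversion=2.5.2 -Declipse.home=/home/admin1/Public/eclipse \
    -Dhadoop.home=/home/admin1/下载/hadoop-2.5.2

Note: -Declipse.home is your Eclipse installation directory and -Dhadoop.home is your Hadoop installation directory.

Then run ./makePlus.sh to build. When the build finishes, copy the resulting jar into Eclipse's plugins directory and restart Eclipse.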


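OK, at this point I was falling apart. The process is far too complicated, and after all that, restarting Eclipse still did not load the plugin. It turned out that after all this effort the culprit was Eclipse itself, which left me speechless. I deleted version 4.5, switched to 4.4, and it worked. There was one more small hiccup: after configuring the plugin I could not right-click and choose New Hadoop, so I ran this on the command line: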

./eclipse -clean -consolelog -debug

After restarting, the problem was gone.

Here is the run configuration:


The folders I just created show up in the browser:



That completes the setup, and Hadoop development can now begin.

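Summary: When learning something new, the first step is the hardest to take, and it can be painful and frustrating. Keep at it: rest when you are tired, then carry on until it is done. This setup took me nearly two days, with many detours and many different approaches. I also tried building a cluster out of several virtual machines, but in the end found that it was not what I needed. There is plenty of material online, so learn to pick out what is useful to you; don't blindly follow code from the web, and think things through for yourself. Believe that you will get there in the end. Nothing is impossible, only unimagined!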




posted on 2016-05-25 16:27 by 王大王
