Hadoop in Practice

Environment used in this article:

OS:CentOS 6.6

JDK:1.7.0_79

User:xavier

Hadoop:1.0.4

 

1. Installing Hadoop:
1.1. Install and configure Java
(1)vi /etc/profile
Add the following:
    #Set Java Environment
    export JAVA_HOME="/usr/java/jdk1.7.0_79"
    export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
    export PATH="$JAVA_HOME/bin:$PATH"
(2)source /etc/profile

(3)update-alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_79/bin/java 300

(4)update-alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_79/bin/javac 300

(5)update-alternatives --config java


1.2. Install and configure SSH:
(1)ssh-keygen -t rsa -P ''

(2)cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

(3)chmod 600 ~/.ssh/authorized_keys

(4)Uncomment the following lines in /etc/ssh/sshd_config by removing the leading #:

RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile      .ssh/authorized_keys

Restart sshd so the change takes effect (service sshd restart), then check whether SSH is using port 22: curl localhost:22
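Steps (2) and (3) can be folded into a small helper that is safe to run more than once. This is only a sketch: the function name authorize_key and its two arguments are made up for illustration, and it assumes a POSIX shell with grep and chmod available.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSSH): append a public key to
# authorized_keys only if it is not already there, then fix permissions.
# Usage: authorize_key <public-key-file> <ssh-directory>
authorize_key() {
    pubkey_file=$1
    ssh_dir=$2
    auth_file="$ssh_dir/authorized_keys"

    mkdir -p "$ssh_dir"
    touch "$auth_file"

    # -F fixed string, -x whole-line match, -q quiet, -f read pattern from file
    if ! grep -qxFf "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi

    # With StrictModes enabled, sshd ignores keys whose files are too open
    chmod 700 "$ssh_dir"
    chmod 600 "$auth_file"
}
```

For the setup in this article the call would be authorize_key ~/.ssh/id_rsa.pub ~/.ssh, run after ssh-keygen has created the key pair.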

 

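[Important note]

Remember to open the required ports in the firewall.

Method 1: (not recommended; it disables iptables and SELinux, which may leave the system open to attack)

(1)Stop the firewall: service iptables stop

(2)Keep the firewall from starting at boot: chkconfig iptables off

(3)Disable SELinux: vim /etc/sysconfig/selinux

In that file, change SELINUX=enforcing to SELINUX=disabled (this takes effect after a reboot)

(4)Switch SELinux to permissive mode for the current session: setenforce 0

(5)Check the SELinux status: getenforce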

 

Method 2: (recommended) open only the required ports in iptables

(1)vi /etc/sysconfig/iptables and insert, above any REJECT rule and the COMMIT line:

#Xavier Setting for Hadoop
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9000 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9001 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50010 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50020 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50030 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50060 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50070 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50075 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50090 -j ACCEPT
#Xavier Setting End

(2)Restart the iptables service: service iptables restart
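The nine rules above differ only in the port number, so they can be generated instead of typed. A minimal sketch, assuming a POSIX shell; the names HADOOP_PORTS and print_hadoop_rules are invented here, and the output is meant to be pasted into /etc/sysconfig/iptables by hand, not applied automatically.

```shell
#!/bin/sh
# Ports taken from the rule list above: Hadoop 1.x default daemon ports plus
# 9000 and 9001, the NameNode and JobTracker ports this article configures.
HADOOP_PORTS="9000 9001 50010 50020 50030 50060 50070 50075 50090"

# Print one ACCEPT rule per port, in the syntax used by /etc/sysconfig/iptables.
print_hadoop_rules() {
    for port in $HADOOP_PORTS; do
        echo "-A INPUT -m state --state NEW -m tcp -p tcp --dport $port -j ACCEPT"
    done
}

print_hadoop_rules
```

Keeping the ports in one variable makes it easier to adjust the list if your core-site.xml or mapred-site.xml uses different values.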



1.3. Configure Hadoop:
1.3.1>conf/hadoop-env.sh:
export JAVA_HOME="/usr/java/jdk1.7.0_79"

1.3.2>conf/core-site.xml:
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

1.3.3>conf/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

1.3.4>conf/mapred-site.xml:
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>
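For a repeatable setup, the three site files can also be written from a script. The sketch below is a convenience under stated assumptions, not something Hadoop ships: write_site_xml and CONF_DIR are names invented here, and because the helper overwrites its target file, CONF_DIR defaults to a scratch directory instead of the real conf directory.

```shell
#!/bin/sh
# Hypothetical helper: write a Hadoop site file containing one property.
# Usage: write_site_xml <file> <property-name> <property-value>
write_site_xml() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
    <property>
        <name>$2</name>
        <value>$3</value>
    </property>
</configuration>
EOF
}

# CONF_DIR is an assumption; point it at your Hadoop conf directory when ready.
CONF_DIR=${CONF_DIR:-${TMPDIR:-/tmp}/hadoop-conf-preview}
mkdir -p "$CONF_DIR"

# Same names and values as sections 1.3.2 to 1.3.4
write_site_xml "$CONF_DIR/core-site.xml"   fs.default.name    hdfs://localhost:9000
write_site_xml "$CONF_DIR/hdfs-site.xml"   dfs.replication    1
write_site_xml "$CONF_DIR/mapred-site.xml" mapred.job.tracker localhost:9001
```

Inspect the generated files first, then rerun with CONF_DIR set to the conf directory of your Hadoop installation.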

1.4. Start Hadoop:
1.4.1>Format the Hadoop file system (HDFS):
bin/hadoop namenode -format

1.4.2>Start all daemons:
bin/start-all.sh

1.4.3>Check in a browser:
http://localhost:50030 JobTracker
http://localhost:50070 NameNode


1.4.4>Stop all daemons:
bin/stop-all.sh
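After step 1.4.2 it is worth confirming that every daemon actually came up. In this single-node layout Hadoop 1.x runs five Java processes (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker), and the JDK's jps tool lists them. A sketch, with the function name missing_daemons invented for illustration; it reads jps-style output on standard input so that it can be tried without a running cluster.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output ("<pid> <ClassName>" per line) on
# stdin and print the name of each expected Hadoop 1.x daemon that is absent.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare against the whole second field, so that
        # "SecondaryNameNode" is not mistaken for "NameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'
        then
            echo "$daemon"
        fi
    done
}
```

Typical use is jps | missing_daemons, which prints nothing when all five daemons are running.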

2. Using Hadoop from Eclipse:
2.1. Download and install hadoop-eclipse-plugin-1.0.4.jar:
Download: https://github.com/jrnz/hadoop-eclipse-plugin-1.0.4

(Ideally, compile the plugin yourself so that it matches your Eclipse version; see Section 3, Compiling the Hadoop Plugin for Eclipse.)

2.2 Set the Hadoop MapReduce installation directory

2.2. Create a Hadoop project and configure the parameters:

Parameter               Value       Description
Location name           Hadoop
MapReduce Master Host   localhost   IP address of the NameNode
MapReduce Master Port   9001        MapReduce port; see your own mapred-site.xml
DFS Master Port         9000        DFS port; see your own core-site.xml
User name               xavier


2.3. Switch to Advanced parameters and configure:

Parameter            Value                   Description
fs.default.name      hdfs://localhost:9000   See core-site.xml
hadoop.tmp.dir       /tmp/hadoop-xavier      See core-site.xml (default: /tmp/hadoop-${user.name})
mapred.job.tracker   localhost:9001          See mapred-site.xml

2.4 Run WordCount.java on Hadoop:

(1)The WordCount.java source is in /home/xavier/Hadoop/src/examples/org/apache/hadoop/examples

(2)Create word.txt, create the working directory with hadoop fs -mkdir /xavier/WordCount/, then upload the file with hadoop fs -copyFromLocal ./word.txt /xavier/WordCount/word.txt
(3)Create the WordCount project and set the following in its Run Configurations:

1>Program arguments:

hdfs://localhost:9000/xavier/WordCount/word.txt hdfs://localhost:9000/xavier/WordCount/count

2>VM arguments:

-Djava.library.path=/home/xavier/Hadoop/lib/native/Linux-amd64-64

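(4)In Eclipse, export the WordCount project as wordcount.jar, ticking both the Java source files and the generated class files options

(5)Put wordcount.jar in the project directory, and in the main function of WordCount.java add the following line after Configuration conf = new Configuration();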

conf.set("mapred.jar", "wordcount.jar");
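To sanity-check the job's result, the same counts can be computed locally with standard tools. This is a rough equivalent and not the Hadoop code path: it assumes the stock example splits its input on whitespace and writes word<TAB>count lines sorted by key, and local_wordcount is a name made up here.

```shell
#!/bin/sh
# Hypothetical local cross-check for WordCount: split on whitespace,
# count each token, and print "word<TAB>count" in byte order.
# Usage: local_wordcount <text-file>
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" |
        grep -v '^$' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2 "\t" $1 }'
}
```

Running local_wordcount word.txt and comparing it with the files the job writes under /xavier/WordCount/count (viewable with hadoop fs -cat) should show the same pairs if the assumptions above hold.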

 

3. Compiling the Hadoop Plugin for Eclipse

3.1 Install Ant:

See:

 
3.2 In src/contrib under the Hadoop directory, edit build-contrib.xml
Add the following inside the project element:
<property name="eclipse.home" value="/home/xavier/Eclipse"/>
where value is the Eclipse installation directory
 
3.3 In src/contrib/eclipse-plugin under the Hadoop directory, edit build.xml
1>Comment out every <copy> element under <target name="jar">
2>Add:
<copy file="${hadoop.root}/hadoop-core-1.0.4.jar" tofile="${build.dir}/lib/hadoop-core.jar" verbose="true"/>
<copy file="${hadoop.root}/lib/commons-cli-1.2.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.root}/lib/commons-configuration-1.6.jar" tofile="${build.dir}/lib/commons-configuration-1.6.jar" verbose="true"/>
<copy file="${hadoop.root}/lib/commons-httpclient-3.0.1.jar" tofile="${build.dir}/lib/commons-httpclient-3.0.1.jar" verbose="true"/>
<copy file="${hadoop.root}/lib/commons-lang-2.4.jar" tofile="${build.dir}/lib/commons-lang-2.4.jar" verbose="true"/>
<copy file="${hadoop.root}/lib/jackson-core-asl-1.8.8.jar" tofile="${build.dir}/lib/jackson-core-asl-1.8.8.jar" verbose="true"/>
<copy file="${hadoop.root}/lib/jackson-mapper-asl-1.8.8.jar" tofile="${build.dir}/lib/jackson-mapper-asl-1.8.8.jar" verbose="true"/>
 
3.4 In src/contrib/eclipse-plugin/META-INF under the Hadoop directory, edit MANIFEST.MF
Append the following after lib/hadoop-core.jar (in the Bundle-ClassPath entry):
,lib/commons-configuration-1.6.jar,lib/commons-httpclient-3.0.1.jar,lib/commons-lang-2.4.jar,lib/jackson-core-asl-1.8.8.jar,lib/commons-cli-1.2.jar
 
3.5 In a terminal, change to src/contrib/eclipse-plugin under the Hadoop directory, type ant, and press Enter to build
 
3.6 The compiled jar file is in build/contrib/eclipse-plugin under the Hadoop directory
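The <copy> elements in step 3.3 name exact jar versions, so the build stops if any of those files is absent. Before running ant, the names can be checked against the Hadoop directory. A sketch only: missing_plugin_jars is an invented name, and the list is copied from step 3.3 for Hadoop 1.0.4.

```shell
#!/bin/sh
# Hypothetical pre-flight check: print every jar named in step 3.3 that is
# missing under the given Hadoop directory.
# Usage: missing_plugin_jars <hadoop-directory>
missing_plugin_jars() {
    hadoop_root=$1
    for jar in \
        hadoop-core-1.0.4.jar \
        lib/commons-cli-1.2.jar \
        lib/commons-configuration-1.6.jar \
        lib/commons-httpclient-3.0.1.jar \
        lib/commons-lang-2.4.jar \
        lib/jackson-core-asl-1.8.8.jar \
        lib/jackson-mapper-asl-1.8.8.jar
    do
        # Report only the files that do not exist
        [ -f "$hadoop_root/$jar" ] || echo "$jar"
    done
}
```

For the layout used in this article the call would be missing_plugin_jars /home/xavier/Hadoop; empty output means every file referenced by build.xml is present.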

 

posted @ 2015-06-15 10:46  XavierJZhang