Hadoop in Action
Environment used in this article:
OS: CentOS 6.6
JDK: 1.7.0_79
User: xavier
Hadoop: 1.0.4
1. Installing Hadoop:
1.1. Install and configure Java
(1) vi /etc/profile
Add the following:
#Set Java Environment
export JAVA_HOME="/usr/java/jdk1.7.0_79"
export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
export PATH="$JAVA_HOME/bin:$PATH"
(2) source /etc/profile
(3) update-alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_79/bin/java 300
(4) update-alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_79/bin/javac 300
(5) update-alternatives --config java
1.2. Install and configure SSH:
(1)ssh-keygen -t rsa -P ''
(2)cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
(3) chmod 600 ~/.ssh/authorized_keys
(4) Uncomment the following lines in /etc/ssh/sshd_config (remove the leading #):
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
Check that sshd is listening on port 22: netstat -tlnp | grep :22
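Passwordless login only works if sshd accepts the key files, and sshd is strict about permissions: ~/.ssh should be 700 and authorized_keys 600. A minimal sketch that exercises the same permissions in a scratch directory, so it never touches the real ~/.ssh:

```shell
# Sketch: reproduce the permissions the steps above must end up with.
# Uses a throwaway directory instead of the real ~/.ssh.
tmp=$(mktemp -d)
mkdir -p "$tmp/.ssh"
chmod 700 "$tmp/.ssh"
touch "$tmp/.ssh/authorized_keys"
chmod 600 "$tmp/.ssh/authorized_keys"
stat -c '%a' "$tmp/.ssh"                  # prints 700
stat -c '%a' "$tmp/.ssh/authorized_keys"  # prints 600
rm -rf "$tmp"
```

If passwordless login still fails after these steps, `ssh -v localhost` usually shows which file sshd rejected.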
[Important note]
Be sure to open the ports Hadoop needs in the firewall.
Option 1 (not recommended: it disables iptables and SELinux entirely, leaving the system exposed):
(1) Stop the firewall: service iptables stop
(2) Disable it at boot: chkconfig iptables off
(3) Disable SELinux permanently: vim /etc/sysconfig/selinux
Change SELINUX=enforcing to SELINUX=disabled
(4) Put SELinux into permissive mode for the current session: setenforce 0
(5) Check the SELinux status: getenforce
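The SELinux edit in step (3) can also be done non-interactively with sed. A sketch that runs against a scratch copy (assumed here) rather than the real /etc/sysconfig/selinux:

```shell
# Sketch: step (3) as a one-line sed, applied to a scratch copy
# instead of the real /etc/sysconfig/selinux.
tmp=$(mktemp)
echo 'SELINUX=enforcing' > "$tmp"
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' "$tmp"
cat "$tmp"   # prints SELINUX=disabled
rm -f "$tmp"
```

Point the same sed at /etc/sysconfig/selinux (as root) to apply it for real; a reboot is needed for the change to take full effect.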
Option 2 (recommended): open only the specific ports Hadoop needs in iptables
(1) vi /etc/sysconfig/iptables and insert:
#Xavier Setting for Hadoop
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9000 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9001 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50010 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50020 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50030 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50060 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50070 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50075 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50090 -j ACCEPT
#Xavier Setting End
(2) Restart the iptables service: service iptables restart
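The nine rules above differ only in the port number, so they can be generated from a port list; adding or removing a Hadoop port then only touches one line. A minimal sketch:

```shell
# Sketch: generate the iptables ACCEPT rules above from a port list.
for port in 9000 9001 50010 50020 50030 50060 50070 50075 50090; do
  printf -- '-A INPUT -m state --state NEW -m tcp -p tcp --dport %s -j ACCEPT\n' "$port"
done
```

Paste the output into /etc/sysconfig/iptables above the final COMMIT line, then restart the iptables service as in step (2).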
1.3. Configuring Hadoop:
1.3.1> conf/hadoop-env.sh:
export JAVA_HOME="/usr/java/jdk1.7.0_79"
1.3.2>conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
1.3.3>conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
1.3.4>conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
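All three *-site.xml files share the same <name>/<value> layout, so a value can be pulled out with sed for quick scripted checks. A minimal sketch, where the inline string stands in for reading a real config file:

```shell
# Sketch: extract a property value from Hadoop *-site.xml-style markup.
# The inline string mimics core-site.xml; for a real file use conf=$(cat conf/core-site.xml).
conf='<configuration><property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property></configuration>'
echo "$conf" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'   # prints hdfs://localhost:9000
```

This is only a quick check for single-property files; for anything more complex, a real XML tool is safer than sed.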
1.4. Starting Hadoop:
1.4.1> Format the HDFS filesystem:
bin/hadoop namenode -format
1.4.2> Start all daemons:
bin/start-all.sh
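After start-all.sh, `jps` should list the five Hadoop 1.x daemons: NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker. A sketch that checks a jps listing for all five; the sample output below is illustrative, not captured from a real run:

```shell
# Sketch: verify all five Hadoop 1.x daemons appear in a jps listing.
# On a live node, replace the sample with: sample=$(jps)
sample='1234 NameNode
1235 DataNode
1236 SecondaryNameNode
1237 JobTracker
1238 TaskTracker'
for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
  if echo "$sample" | grep -q "$d"; then echo "$d up"; else echo "$d MISSING"; fi
done
```

If a daemon is missing, its log under the Hadoop logs/ directory is the first place to look.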
1.4.3> Check the web UIs in a browser:
http://localhost:50030 JobTracker
http://localhost:50070 NameNode
1.4.4> Stop all daemons:
bin/stop-all.sh
2. Using Hadoop from Eclipse:
2.1. Download and install hadoop-eclipse-plugin-1.0.4.jar:
Download: https://github.com/jrnz/hadoop-eclipse-plugin-1.0.4
(Ideally, build the plugin yourself to match your Eclipse version; see section 3, Building the Hadoop plugin for Eclipse.)
2.2. Set the Hadoop/MapReduce installation directory
2.3. Create a Hadoop location with these parameters:
Parameter | Value | Notes |
Location name | Hadoop | |
Map/Reduce Master | Host: localhost | JobTracker address; see your mapred-site.xml |
Map/Reduce Master | Port: 9001 | MapReduce port; see your mapred-site.xml |
DFS Master | Port: 9000 | DFS port; see your core-site.xml |
User name | xavier | |
2.4. Switch to Advanced parameters and set:
Parameter | Value | Notes |
fs.default.name | hdfs://localhost:9000 | see core-site.xml |
hadoop.tmp.dir | /tmp/hadoop-xavier | see core-site.xml |
mapred.job.tracker | localhost:9001 | see mapred-site.xml |
2.5. Running WordCount.java on Hadoop:
(1) The WordCount.java source is under /home/xavier/Hadoop/src/examples/org/apache/hadoop/examples
(2) Create word.txt, create the working directory with hadoop fs -mkdir /xavier/WordCount/, then upload the file with hadoop fs -copyFromLocal ./word.txt /xavier/WordCount/word.txt
(3) Create a WordCount project and set its Run Configurations:
1> Program Arguments:
hdfs://localhost:9000/xavier/WordCount/word.txt hdfs://localhost:9000/xavier/WordCount/count
2> VM Arguments:
-Djava.library.path=/home/xavier/Hadoop/lib/native/Linux-amd64-64
(4) In Eclipse, export the WordCount project as wordcount.jar, checking: Java source files and generated class files
(5) Put wordcount.jar in the project directory, and in WordCount.java's main(), after Configuration conf = new Configuration(); add:
conf.set("mapred.jar", "wordcount.jar");
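To sanity-check the job's result (viewable with hadoop fs -cat /xavier/WordCount/count/part-00000), a pure-shell word count over the same input produces the numbers the MapReduce job should report. A minimal sketch with toy input standing in for word.txt:

```shell
# Sketch: word count in plain shell, for cross-checking the MapReduce output.
# The printf input is a toy stand-in for word.txt.
printf 'hello world\nhello hadoop\n' |
  tr -s ' ' '\n' |      # one word per line
  sort | uniq -c |      # count duplicates
  awk '{print $2 "\t" $1}'   # word<TAB>count, like the job's output
# prints:
# hadoop  1
# hello   2
# world   1
```

On large inputs this is only practical for spot checks, but on small test files it verifies the job end to end.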
3. Building the Hadoop plugin for Eclipse
3.1. Install Ant:
Refer to:
3.2. Add the dependency jars to the plugin's build file (typically src/contrib/eclipse-plugin/build.xml) so they are bundled into the plugin:
<copy file="${hadoop.root}/lib/commons-cli-1.2.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.root}/lib/commons-configuration-1.6.jar" tofile="${build.dir}/lib/commons-configuration-1.6.jar" verbose="true"/>
<copy file="${hadoop.root}/lib/commons-httpclient-3.0.1.jar" tofile="${build.dir}/lib/commons-httpclient-3.0.1.jar" verbose="true"/>
<copy file="${hadoop.root}/lib/commons-lang-2.4.jar" tofile="${build.dir}/lib/commons-lang-2.4.jar" verbose="true"/>
<copy file="${hadoop.root}/lib/jackson-core-asl-1.8.8.jar" tofile="${build.dir}/lib/jackson-core-asl-1.8.8.jar" verbose="true"/>
<copy file="${hadoop.root}/lib/jackson-mapper-asl-1.8.8.jar" tofile="${build.dir}/lib/jackson-mapper-asl-1.8.8.jar" verbose="true"/>