1 Preparation
1.1 Servers
10.18.11.130 (master), hostname: rac1
10.16.11.253 (datanode), hostname: mos5200app
10.18.11.159 (datanode), hostname: rac4
1.2 JDK version
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)
The JDK can be downloaded from Oracle's official site.
Download URL:
http://www.oracle.com/technetwork/java/javase/downloads/jdk6u38-downloads-1877406.html
1.3 Hadoop version
hadoop-1.1.1
Download URL:
http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-1.1.1/
1.4 Server operating system
SUSE Linux 10
1.5 SSH (Linux) package
SUSE ships with SSH out of the box, so installation is skipped here.
1.6 Ant (Linux) package
Version: apache-ant-1.8.4
Download URL:
http://ant.apache.org/bindownload.cgi
2 Installation
2.1 Create the user and group
For the cluster to work smoothly, create an identical group and user on the namenode and on every datanode. In this setup the group is hadoop, the user is hadoop, and the password is hadoop.
2.1.1 Create the group
groupadd -g 301 hadoop
(fixing the GID to 301 keeps it consistent across all machines)
2.1.2 Add the user
useradd -g hadoop -d /home/hadoop -s /bin/bash -m hadoop
passwd hadoop
2.2 Install and configure SSH
Hadoop cluster communication runs over SSH, so configuring it is essential. SSH must be installed and configured on the namenode and on every datanode.
2.2.1 Edit the sshd_config file
Edit /etc/ssh/sshd_config, find the following lines, and remove the leading "#":
vi /etc/ssh/sshd_config
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
Add the allowed accounts:
AllowUsers root hadoop
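The edits above can also be applied non-interactively. A minimal sketch, assuming GNU sed; the helper function name is my own, and a .bak backup of the file is kept:

```shell
# Force the three sshd_config settings from above, in place.
# Pass the path to sshd_config (normally /etc/ssh/sshd_config).
enable_pubkey_auth() {
    sed -i.bak \
        -e 's|^#\{0,1\}RSAAuthentication.*|RSAAuthentication yes|' \
        -e 's|^#\{0,1\}PubkeyAuthentication.*|PubkeyAuthentication yes|' \
        -e 's|^#\{0,1\}AuthorizedKeysFile.*|AuthorizedKeysFile .ssh/authorized_keys|' \
        "$1"
}
```

Usage (as root): `enable_pubkey_auth /etc/ssh/sshd_config`, then restart sshd as in the next step.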
2.2.2 Restart the SSH service:
service sshd restart
2.2.3 Generate the keys
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Passwordless access between the namenode and datanodes
Stop the firewall: sudo SuSEfirewall2 stop
Copy each machine's public key to the others (note: repeat for every namenode/datanode pair), e.g. from a datanode to the namenode:
scp /home/hadoop/.ssh/id_rsa.pub hadoop@10.18.11.130:/home/hadoop/
cat /home/hadoop/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
2.2.4 Test
Note: make sure SSH login between the machines works without a password prompt.
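The check can be scripted; a sketch using the hostnames from this guide, run as the hadoop user on each node (BatchMode makes ssh fail instead of prompting, so a misconfigured pair shows up immediately):

```shell
# Verify passwordless SSH to every cluster node, including this one.
for host in rac1 mos5200app rac4; do
    ssh -o BatchMode=yes hadoop@"$host" hostname >/dev/null 2>&1 \
        && echo "$host: OK" \
        || echo "$host: password still required"
done
```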
2.3 JDK setup
A JDK is a prerequisite for running Hadoop; it must be installed and configured on the namenode and on every datanode.
2.3.1 Transfer the prepared JDK to each server.
Transfer via FTP in binary mode.
The target directory is
/usr
Unpack it there once the transfer finishes.
2.3.2 Environment variables
Add the following to the hadoop user's .profile:
JAVA_HOME=/usr/jdk1.7.0_10
CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib:.
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
Note: despite the directory name, /usr/jdk1.7.0_10 actually holds JDK 6; JDK 1.7 did not pass our tests at the time, so we switched back to 1.6.
2.3.3 Test the JDK installation
java -version
The correct output is:
hadoop@rac1:/usr> java -version
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)
Note: this step was verified under the hadoop user.
2.4 Hadoop installation
2.4.1 Transfer the downloaded Hadoop package to the namenode and every datanode
The target path is
/home/software/
2.4.2 Unpack the Hadoop package on all machines
After unpacking, the result should be /home/software/hadoop-1.1.1
2.4.3 Hadoop environment variables
Add the following to the hadoop user's .profile:
HADOOP_HOME=/home/software/hadoop-1.1.1
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME PATH
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_CONF_DIR=/home/software/hadoop-1.1.1/conf
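After editing .profile, the variables can be checked without logging out. A small sketch:

```shell
# Re-read the profile and confirm the variables are in effect.
. ~/.profile
echo "$HADOOP_HOME"    # expect /home/software/hadoop-1.1.1
hadoop version         # the first line should name Hadoop 1.1.1
```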
2.4.4 Edit the hosts file on every machine
Open the machine's /etc/hosts file
and add:
#127.0.0.1 localhost rac1
10.18.11.130 rac1
10.16.11.253 mos5200app
10.18.11.159 rac4
Note: if a 127.0.0.1 entry exists, comment it out.
Mind the order: the namenode comes first, followed by the datanodes, and the order must be identical on all three machines.
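Name resolution can be spot-checked on each node; a quick sketch using the hostnames above:

```shell
# Each node should resolve every cluster hostname to the address in /etc/hosts.
for host in rac1 mos5200app rac4; do
    ping -c 1 "$host" > /dev/null 2>&1 \
        && echo "$host resolves" \
        || echo "$host FAILED to resolve"
done
```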
2.4.5 Edit the Hadoop configuration files on the namenode
Edit /home/software/hadoop-1.1.1/conf/masters and add:
rac1
Edit the slaves file and add:
mos5200app
rac4
Edit /home/software/hadoop-1.1.1/conf/hdfs-site.xml and add the following inside the root <configuration> element:
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hdfs_data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
Edit /home/software/hadoop-1.1.1/conf/core-site.xml and add the following inside the root <configuration> element:
<property>
<name>fs.default.name</name>
<value>hdfs://10.18.11.130:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoopDATA/tmp/</value>
</property>
<property>
<name>hadoop.native.lib</name>
<value>true</value>
</property>
Note: replace 10.18.11.130 in <value>hdfs://10.18.11.130:9000</value> with your namenode's IP.
Edit /home/software/hadoop-1.1.1/conf/mapred-site.xml and add the following inside the root <configuration> element:
<property>
<name>mapred.job.tracker</name>
<value>10.18.11.130:9001</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx256m</value>
</property>
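A malformed *-site.xml makes the daemons fail at startup, so the edited files are worth a quick check. A crude well-formedness sketch, assuming each file has exactly one <configuration> root on its own line (the helper name is my own; `xmllint --noout`, if installed, is a stricter alternative):

```shell
# Sanity-check a Hadoop *-site.xml: it should contain exactly one
# <configuration> opening tag and one matching closing tag.
check_site_xml() {
    opens=$(grep -c '<configuration>' "$1")
    closes=$(grep -c '</configuration>' "$1")
    if [ "$opens" -eq 1 ] && [ "$closes" -eq 1 ]; then
        echo "$1: ok"
    else
        echo "$1: unbalanced <configuration> tags"
    fi
}
```

Usage: `for f in core-site.xml hdfs-site.xml mapred-site.xml; do check_site_xml "$HADOOP_CONF_DIR/$f"; done`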
2.4.6 Rebuild the Hadoop native libraries
Copy the prepared Ant package to each machine; the target folder is /opt.
Unpack Ant.
Configure environment variables in /home/hadoop/.profile:
ANT_HOME=/opt/apache-ant-1.8.4
PATH=$ANT_HOME/bin:$PATH
export ANT_HOME HADOOP_HOME PATH
Compile
In the $HADOOP_HOME directory, run:
$ ant compile-native
When the build finishes, the compiled native libraries can be found under $HADOOP_HOME/build/native. Copy all files from
build/native/Linux-amd64-64/lib
into Hadoop's default native-library directory under $HADOOP_HOME:
lib/native/Linux-amd64-64/
Note: the build output directory can be deleted afterwards.
2.4.7 Distribute Hadoop
Copy the fully configured Hadoop tree from the namenode to every datanode; the target directory is again /home/software.
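The copy can be scripted from the namenode, using the datanode hostnames configured in 2.4.4:

```shell
# Push the configured Hadoop tree from the namenode to every datanode.
for host in mos5200app rac4; do
    scp -r /home/software/hadoop-1.1.1 hadoop@"$host":/home/software/
done
```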
2.4.8 Start Hadoop
On the namenode, run: start-all.sh
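On a brand-new cluster, HDFS normally has to be formatted once before the first start; a hedged sketch of the full first-start sequence (formatting erases any existing HDFS metadata, so run it only once):

```shell
# On the namenode, as the hadoop user:
hadoop namenode -format   # first start of a fresh cluster only
start-all.sh              # starts NameNode and JobTracker here, and the slave daemons over SSH
```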
2.5 Test the cluster
2.5.1 The jps command
Namenode:
hadoop@rac1:~> jps
23854 JobTracker
23732 SecondaryNameNode
14989 Jps
23487 NameNode
29934 TaskTracker
hadoop@rac1:~>
datanode:
hadoop@mos5200app:~> jps
18939 Jps
32214 TaskTracker
32080 DataNode
hadoop@mos5200app:~>
2.5.2 hadoop dfsadmin -report
This command reports the cluster status.
namenode:
hadoop@rac1:~> hadoop dfsadmin -report
Configured Capacity: 415503699968 (386.97 GB)
Present Capacity: 225885907968 (210.37 GB)
DFS Remaining: 220155703296 (205.04 GB)
DFS Used: 5730204672 (5.34 GB)
DFS Used%: 2.54%
Under replicated blocks: 55
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Name: 10.18.11.159:50010
Decommission Status : Normal
Configured Capacity: 266211225600 (247.93 GB)
DFS Used: 2865048576 (2.67 GB)
Non DFS Used: 184850818048 (172.16 GB)
DFS Remaining: 78495358976(73.1 GB)
DFS Used%: 1.08%
DFS Remaining%: 29.49%
Last contact: Sun Jan 13 18:20:14 CST 2013
Name: 10.16.11.253:50010
Decommission Status : Normal
Configured Capacity: 149292474368 (139.04 GB)
DFS Used: 2865156096 (2.67 GB)
Non DFS Used: 4766973952 (4.44 GB)
DFS Remaining: 141660344320(131.93 GB)
DFS Used%: 1.92%
DFS Remaining%: 94.89%
Last contact: Sun Jan 13 18:20:15 CST 2013
Datanode:
hadoop@mos5200app:~> hadoop dfsadmin -report
Configured Capacity: 415503699968 (386.97 GB)
Present Capacity: 225885907968 (210.37 GB)
DFS Remaining: 220155703296 (205.04 GB)
DFS Used: 5730204672 (5.34 GB)
DFS Used%: 2.54%
Under replicated blocks: 55
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Name: 10.18.11.159:50010
Decommission Status : Normal
Configured Capacity: 266211225600 (247.93 GB)
DFS Used: 2865048576 (2.67 GB)
Non DFS Used: 184850818048 (172.16 GB)
DFS Remaining: 78495358976(73.1 GB)
DFS Used%: 1.08%
DFS Remaining%: 29.49%
Last contact: Sun Jan 13 18:20:56 GMT+08:00 2013
Name: 10.16.11.253:50010
Decommission Status : Normal
Configured Capacity: 149292474368 (139.04 GB)
DFS Used: 2865156096 (2.67 GB)
Non DFS Used: 4766973952 (4.44 GB)
DFS Remaining: 141660344320(131.93 GB)
DFS Used%: 1.92%
DFS Remaining%: 94.89%
Last contact: Sun Jan 13 18:20:57 GMT+08:00 2013
2.5.3 Browsing the HDFS file system with hadoop dfs -ls
The namenode and the datanodes should all show the same listing:
hadoop@mos5200app:~> hadoop dfs -ls
Found 6 items
drwxr-xr-x - hadoop supergroup 0 2013-01-10 14:51 /user/hadoop/collect
drwxr-xr-x - hadoop supergroup 0 2013-01-09 17:26 /user/hadoop/in
drwxr-xr-x - hadoop supergroup 0 2013-01-09 17:29 /user/hadoop/out
drwxr-xr-x - hadoop supergroup 0 2013-01-09 17:11 /user/hadoop/testdir
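As a final smoke test, the example jar bundled with the Hadoop 1.1.1 distribution can run a small MapReduce job against the in directory visible in the listing above; the wc-out output path is arbitrary (it must not exist yet):

```shell
# Run the bundled wordcount example; results land in wc-out on HDFS.
hadoop jar $HADOOP_HOME/hadoop-examples-1.1.1.jar wordcount in wc-out
hadoop dfs -cat 'wc-out/part-*' | head
```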