Hadoop cluster configuration
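I searched online for this many times over a long period, but kept failing because of one small mistake or another. Today it finally worked. Here is the process: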
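Basic environment:
Master: cloud003, IP 192.168.140.203
Slave: cloud004, IP 192.168.140.204
Note: when assigning the virtual machines' IP addresses, be sure to use NAT networking mode.
Operating system: Ubuntu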
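The configuration files below refer to the machines by hostname (cloud003, cloud004), so both names must resolve to the IPs above on both machines, typically through /etc/hosts. The sketch below is one way to add those mappings; the function name add_host_entry is hypothetical, and it assumes a POSIX shell and root access when pointed at /etc/hosts.

```shell
#!/bin/sh
# Hypothetical helper: append "IP<TAB>hostname" to a hosts file unless that
# exact IP/hostname pair is already present, so it is safe to run repeatedly.
add_host_entry() {
  hosts_file=$1
  ip=$2
  name=$3
  # awk exits 0 if some line has the IP in field 1 and the name in a later field.
  if awk -v ip="$ip" -v name="$name" '
      $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
      END { exit !found }' "$hosts_file"; then
    echo "already present: $ip $name"
  else
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    echo "added: $ip $name"
  fi
}
```

For example, as root on both machines: `add_host_entry /etc/hosts 192.168.140.203 cloud003` and `add_host_entry /etc/hosts 192.168.140.204 cloud004`.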
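Now the installation begins; downloading Java and the other packages is skipped here.
Install Java, Hadoop and SSH on every machine, in the same directory on each one (the start scripts on the master launch the daemons on the slaves using the same paths).
Java and Hadoop are installed under /usr/local.
Set the environment variables in /etc/profile so that other users can use them as well: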
export JAVA_HOME=/usr/local/jdk1.6.0_25
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export JRE_HOME=$JAVA_HOME/jre
export HADOOP_HOME=/usr/local/hadoop-0.20.2
export PATH=$HADOOP_HOME/bin:$PATH
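New values in /etc/profile only take effect in new login shells or after running `. /etc/profile`. Note that commands run over ssh by the start scripts generally do not read /etc/profile, so Hadoop 0.20 also expects JAVA_HOME to be set in conf/hadoop-env.sh. A quick sanity check is to confirm that each *_HOME variable points at an existing directory; the function below is a hypothetical helper written for that, not part of Java or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: for each variable name given, report whether it is set
# and names an existing directory. Returns non-zero if any check fails.
check_home_dirs() {
  status=0
  for var in "$@"; do
    # Indirect lookup: read the value of the variable whose name is in $var.
    eval "dir=\${$var:-}"
    if [ -n "$dir" ] && [ -d "$dir" ]; then
      echo "ok: $var=$dir"
    else
      echo "missing: $var='$dir'"
      status=1
    fi
  done
  return $status
}
```

Usage: `. /etc/profile; check_home_dirs JAVA_HOME JRE_HOME HADOOP_HOME`. After that, `java -version` and `hadoop version` should both run from any directory.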
Passwordless SSH login
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
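chmod 644 ~/.ssh/authorized_keys
This step is critical: authorized_keys must be writable only by its owner. If anyone else has write permission on it, sshd refuses to use it and key-based SSH login will not work. I once spent a long, frustrating time on exactly this while configuring SSH.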
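With its default StrictModes setting, sshd ignores authorized_keys if the file, or the ~/.ssh directory containing it, is writable by group or others. The hypothetical helper below checks both using GNU stat (as on Ubuntu); it does not check the home directory itself, which sshd also inspects.

```shell
#!/bin/sh
# Hypothetical helper: report whether an .ssh directory and its authorized_keys
# file are free of group/other write permission (what StrictModes requires).
check_ssh_perms() {
  ssh_dir=${1:-$HOME/.ssh}
  rc=0
  for path in "$ssh_dir" "$ssh_dir/authorized_keys"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path"
      rc=1
      continue
    fi
    mode=$(stat -c %a "$path")   # GNU stat: octal permission bits, e.g. 644
    # 022 = group-write bit (020) plus other-write bit (002).
    if [ $(( 0$mode & 022 )) -ne 0 ]; then
      echo "too open: $path (mode $mode)"
      rc=1
    else
      echo "ok: $path (mode $mode)"
    fi
  done
  return $rc
}
```

If it reports a problem, `chmod 700 ~/.ssh` and `chmod 600 ~/.ssh/authorized_keys` (or 644, as above) fix it.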
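The steps above are identical on both machines; the following ones differ:
Copy id_dsa.pub from cloud003 to cloud004 and append its contents to authorized_keys there; likewise, append the id_dsa.pub generated on cloud004 to authorized_keys on cloud003.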
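One way to do the copy-and-append, assuming the same user account on both machines: on cloud003 run `scp ~/.ssh/id_dsa.pub cloud004:cloud003_id_dsa.pub`, then on cloud004 append that file with the hypothetical helper below, and repeat in the other direction. The helper skips keys already present, so rerunning it does not create duplicate lines. Where it is installed, OpenSSH's `ssh-copy-id -i ~/.ssh/id_dsa.pub cloud004` does the same job in one step.

```shell
#!/bin/sh
# Hypothetical helper: append each public key line from $1 to the
# authorized_keys file $2, skipping lines that are already present.
append_keys() {
  pub=$1
  auth=$2
  # Create the file if needed and keep it writable only by its owner.
  touch "$auth" && chmod 600 "$auth" || return 1
  while IFS= read -r key || [ -n "$key" ]; do
    [ -z "$key" ] && continue
    # -x: whole-line match, -F: treat the key as a fixed string, not a regex.
    grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
  done < "$pub"
}
```

On cloud004, for example: `append_keys ~/cloud003_id_dsa.pub ~/.ssh/authorized_keys`.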
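Test whether SSH works; each login should succeed without a password prompt, both to the machine itself and to the other machine:
on cloud003: ssh 192.168.140.203, then ssh 192.168.140.204
on cloud004: ssh 192.168.140.204, then ssh 192.168.140.203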
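A non-interactive version of this test, assuming OpenSSH: with `-o BatchMode=yes`, ssh fails instead of prompting for a password, so a password prompt shows up as a failure. Batch mode also disables host key confirmation, so log in interactively once per host to accept its key before relying on this. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that each host accepts key-based SSH login.
# BatchMode=yes makes ssh fail rather than ask for a password.
check_passwordless_ssh() {
  rc=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "ok: $host"
    else
      echo "FAILED: $host"
      rc=1
    fi
  done
  return $rc
}
```

Run `check_passwordless_ssh 192.168.140.203 192.168.140.204 cloud003 cloud004` on both machines; every line should start with "ok".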
The next part is the key one:
Modifying the Hadoop configuration files
Master configuration (files in the conf directory):
conf/masters contains:
cloud003
conf/slaves contains:
cloud004
conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://cloud003:9000</value>
</property>
</configuration>
conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>cloud003:9001</value>
</property>
</configuration>
Slave configuration (hdfs-site.xml does not need to be set here):
conf/masters contains:
cloud003
conf/slaves contains:
cloud004
conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://cloud003:9000</value>
</property>
</configuration>
conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>cloud003:9001</value>
</property>
</configuration>
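Since masters, slaves, core-site.xml and mapred-site.xml are identical on both machines, the files can be generated by one script run on each node. The sketch below writes the contents listed above into a conf directory; the function name and the MASTER, SLAVES and REPLICATION variables are hypothetical, with defaults taken from this setup. It also writes hdfs-site.xml, which the slave does not need.

```shell
#!/bin/sh
# Hypothetical helper: write masters, slaves and the three *-site.xml files
# described above into the directory $1 (e.g. $HADOOP_HOME/conf).
write_hadoop_conf() {
  conf_dir=$1
  master=${MASTER:-cloud003}
  slaves=${SLAVES:-cloud004}      # space-separated list of slave hostnames
  replication=${REPLICATION:-2}
  mkdir -p "$conf_dir" || return 1

  echo "$master" > "$conf_dir/masters"
  # $slaves is left unquoted on purpose so each hostname goes on its own line.
  printf '%s\n' $slaves > "$conf_dir/slaves"

  cat > "$conf_dir/core-site.xml" <<EOF
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://$master:9000</value>
</property>
</configuration>
EOF

  cat > "$conf_dir/hdfs-site.xml" <<EOF
<configuration>
<property>
<name>dfs.replication</name>
<value>$replication</value>
</property>
</configuration>
EOF

  cat > "$conf_dir/mapred-site.xml" <<EOF
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>$master:9001</value>
</property>
</configuration>
EOF
}
```

For example, `write_hadoop_conf /usr/local/hadoop-0.20.2/conf` on each machine; back up the original conf files first if you want to keep them.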
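Once configuration is complete, run the following on cloud003 from the $HADOOP_HOME directory. Format the NameNode only once, when the cluster is first set up; reformatting erases the existing HDFS metadata.
1: bin/hadoop namenode -format # format HDFS
2: bin/start-dfs.sh
3: bin/start-mapred.sh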
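To confirm that the daemons actually started, the JDK's `jps` command lists running Java processes by main class name. The hypothetical helper below reads `jps` output on stdin and reports which expected Hadoop 0.20 daemon names are missing. With the masters and slaves files above, cloud003 should at least show NameNode, SecondaryNameNode and JobTracker, and cloud004 DataNode and TaskTracker; if one is missing, its log under $HADOOP_HOME/logs is the place to look.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output ("<pid> <MainClass>" per line) on
# stdin and check that each daemon name given as an argument appears in it.
check_daemons() {
  jps_out=$(cat)
  rc=0
  for daemon in "$@"; do
    if printf '%s\n' "$jps_out" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "running: $daemon"
    else
      echo "missing: $daemon"
      rc=1
    fi
  done
  return $rc
}
```

Usage: `jps | check_daemons NameNode SecondaryNameNode JobTracker` on cloud003, and `jps | check_daemons DataNode TaskTracker` on cloud004.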
Testing:
bin/hadoop jar ./hadoop-0.20.2-examples.jar wordcount input /usr/root/output
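The wordcount job needs an `input` directory that already exists in HDFS; relative HDFS paths resolve under the current user's HDFS home directory (/user/<name>). The sketch below uploads a local directory as input (conf/ is used purely as convenient sample text), runs the same example job, and prints the result. The function name and the HADOOP override are hypothetical; it is meant to be run from $HADOOP_HOME on cloud003.

```shell
#!/bin/sh
# Hypothetical helper: upload input, run the wordcount example, show output.
# HADOOP can point at another hadoop launcher; defaults to bin/hadoop.
run_wordcount_test() {
  hadoop=${HADOOP:-bin/hadoop}
  local_input=${1:-conf}
  output=${2:-/usr/root/output}
  # Copy some local text files into HDFS as the job input.
  "$hadoop" fs -put "$local_input" input || return 1
  # The job fails if $output already exists; remove it first with fs -rmr.
  "$hadoop" jar hadoop-0.20.2-examples.jar wordcount input "$output" || return 1
  # Quoted so the local shell leaves the glob for Hadoop to expand in HDFS.
  "$hadoop" fs -cat "$output/part-*"
}
```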
--View details:
bin/hadoop dfsadmin -report
The output is as follows:
Configured Capacity: 83492536320 (77.76 GB)
Present Capacity: 72159903744 (67.2 GB)
DFS Remaining: 72159100928 (67.2 GB)
DFS Used: 802816 (784 KB)
DFS Used%: 0%
Under replicated blocks: 7
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Name: 192.168.140.203:50010
Decommission Status : Normal
Configured Capacity: 41746268160 (38.88 GB)
DFS Used: 401408 (392 KB)
Non DFS Used: 5471211520 (5.1 GB)
DFS Remaining: 36274655232(33.78 GB)
DFS Used%: 0%
DFS Remaining%: 86.89%
Last contact: Sat Jun 25 01:20:39 PDT 2011
Name: 192.168.140.204:50010
Decommission Status : Normal
Configured Capacity: 41746268160 (38.88 GB)
DFS Used: 401408 (392 KB)
Non DFS Used: 5861421056 (5.46 GB)
DFS Remaining: 35884445696(33.42 GB)
DFS Used%: 0%
DFS Remaining%: 85.96%
Last contact: Sat Jun 25 01:20:15 PDT 2011
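The key line for judging whether the cluster came up is `Datanodes available: 2 (2 total, 0 dead)`: both DataNodes registered with the NameNode. A small pair of hypothetical helpers that extract those counts from `dfsadmin -report` output, assuming the 0.20 report format shown above, could look like this:

```shell
#!/bin/sh
# Hypothetical helpers: read `bin/hadoop dfsadmin -report` output on stdin and
# print a count from the "Datanodes available: N (T total, D dead)" line.
live_datanodes() {
  sed -n 's/^Datanodes available: \([0-9][0-9]*\).*/\1/p'
}
dead_datanodes() {
  sed -n 's/^Datanodes available: .*, \([0-9][0-9]*\) dead).*/\1/p'
}
```

For example: `[ "$(bin/hadoop dfsadmin -report | live_datanodes)" = 2 ] && echo "both DataNodes are up"`.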