Hadoop集群搭建
本文将介绍在Linux(Red Hat 9)环境下搭建Hadoop集群,此Hadoop集群主要由三台机器组成,主机名分别为:
linux 192.168.35.101
linux02 192.168.35.102
linux03 192.168.35.103
从map reduce计算的角度讲,linux作为master节点,linux02和linux03作为slave节点。
从hdfs数据存储角度讲,linux作为namenode节点,linux02和linux03作为datanode节点。
一台namenode机,主机名为linux,hosts文件内容如下:
127.0.0.1 linux localhost.localdomain localhost
192.168.35.101 linux linux.localdomain linux
192.168.35.102 linux02
192.168.35.103 linux03
两台datanode机,主机名为linux02和linux03
>linux02的hosts文件
127.0.0.1 linux02 localhost.localdomain localhost
192.168.35.102 linux02 linux02.localdomain linux02
192.168.35.101 linux
192.168.35.103 linux03
>inux03的hosts文件
127.0.0.1 linux03 localhost.localdomain localhost
192.168.35.103 linux03 linux03.localdomain linux03
192.168.35.101 linux
192.168.35.102 linux02
1.安装JDK
> 从java.cun.com下载jdk-6u7-linux-i586.bin
> ftp上传jdk到linux的root目录下
> 进入root目录,先后执行命令
chmod 755 jdk-6u18-linux-i586-rpm.bin
./jdk-6u18-linux-i586-rpm.bin
一路按提示下去就会安装成功
> 配置环境变量
cd进入/etc目录,vi编辑profile文件,将下面的内容追加到文件末尾
export JAVA_HOME=/usr/java/jdk1.6.0_18
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
注意:三台机器都要安装JDK~
2.设置Master/Slave机器之间可以通过SSH无密钥互相访问
最好三台机器的使用相同的账户名,我是直接使用的root账户
操作namenode机linux:
以用户root登录linux,在/root目录下执行下述命令:
ssh-keygen -t rsa
一路回车下去即可在目录/root/.ssh/下建立两个文件id_rsa.pub和id_rsa。
接下来,需要进入/root/.ssh目录,执行如下命令:
cd .ssh
再把is_rsa.pub文件复制到linux02和linux03机器上去。
scp -r id_rsa.pub root@192.168.35.102:/root/.ssh/authorized_keys_01
scp -r id_rsa.pub root@192.168.35.103:/root/.ssh/authorized_keys_01
操作datanode机linux02:
以用户root登录linux02,在目录下执行命令:
ssh-keygen -t rsa
一路回车下去即可在目录/root/.ssh/下建立两个文件 id_rsa.pub和id_rsa。
接下来,需要进入/root/.ssh目录,执行如下命令:
cd .ssh
再把is_rsa.pub文件复制到namenode机linux上去。
scp -r id_rsa.pub root@192.168.35.101:/root/.ssh/authorized_keys_02
操作datanode机linux03:
以用户root登录linux03,在目录下执行命令:
ssh-keygen -t rsa
一路回车下去即可在目录/root/.ssh/下建立两个文件 id_rsa.pub和id_rsa。
接下来,需要进入/root/.ssh目录,执行如下命令:
cd .ssh
再把is_rsa.pub文件复制到namenode机linux上去。
scp -r id_rsa.pub root@192.168.35.101:/root/.ssh/authorized_keys_03
*******************************************************************************
上述方式分别为linux\linux02\linux03机器生成了rsa密钥,并且把linux的id_rsa.pub复制到linux02\linux03上去了,而把linux02和linux03上的id_rsa.pub复制到linux上去了。
接下来还要完成如下步骤:
linux机:
以root用户登录linux,并且进入目录/root/.ssh下,执行如下命令:
cat id_rsa.pub >> authorized_keys
cat authorized_keys_02 >> authorized_keys
cat authorized_keys_03 >> authorized_keys
chmod 644 authorized_keys
linux02机:
以root用户登录linux02,并且进入目录/root/.ssh下,执行如下命令:
cat id_rsa.pub >> authorized_keys
cat authorized_keys_01 >> authorized_keys
chmod 644 authorized_keys
linux03机:
以root用户登录linux03,并且进入目录/root/.ssh下,执行如下命令:
cat id_rsa.pub >> authorized_keys
cat authorized_keys_01 >> authorized_keys
chmod 644 authorized_keys
通过上述配置,现在以用户root登录linux机,既可以无密钥认证方式访问linux02和linux03了,同样也可以在linux02和linux03上以ssh linux方式连接到linux上进行访问了。
3.安装和配置Hadoop
> 在namenode机器即linux机上安装hadoop
我下载的是hadoop-0.20.2.tar.gz,ftp上传到linux机的/root目录上,解压到安装目录/usr/hadoop,最终hadoop的根目录是/usr/hadoop/hadoop-0.20.2/
编辑/etc/profile文件,在文件尾部追加如下内容:
export HADOOP_HOME=/usr/hadoop/hadoop-0.20.2
export PATH=$HADOOP_HOME/bin:$PATH
> 配置Hadoop
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.35.101:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop/hadoop-${user.name}</value>
</property>
</configuration>
hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.35.101:9001</value>
</property>
</configuration>
masters
192.168.35.101
slaves
192.168.35.102
192.168.35.103
至此,hadoop的简单配置已经完成
> 将在namenode机器上配置好的hadoop部署到datanode机器上
这里使用scp命令进行远程传输,先后执行命令
scp -r /usr/hadoop/hadoop-0.20.2 root@192.168.35.102:/usr/hadoop/
scp -r /usr/hadoop/hadoop-0.20.2 root@192.168.35.103:/usr/hadoop/
4.测试
以root用户登入namenode机linux,进入目录/usr/hadoop/hadoop-0.20.2/
cd /usr/hadoop/hadoop-0.20.2
> 执行格式化
[root@linux hadoop-0.20.2]# bin/hadoop namenode -format
11/07/26 21:16:03 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = linux/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /home/hadoop/name ? (Y or N) Y
11/07/26 21:16:07 INFO namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
11/07/26 21:16:07 INFO namenode.FSNamesystem: supergroup=supergroup
11/07/26 21:16:07 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/07/26 21:16:07 INFO common.Storage: Image file of size 94 saved in 0 seconds.
11/07/26 21:16:07 INFO common.Storage: Storage directory /home/hadoop/name has been successfully formatted.
11/07/26 21:16:07 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at linux/127.0.0.1
************************************************************/
> 启动hadoop
[root@linux hadoop-0.20.2]# bin/start-all.sh
starting namenode, logging to /usr/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-linux.out
192.168.35.102: starting datanode, logging to /usr/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-linux02.out
192.168.35.103: starting datanode, logging to /usr/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-linux03.out
192.168.35.101: starting secondarynamenode, logging to /usr/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-linux.out
starting jobtracker, logging to /usr/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-linux.out
192.168.35.103: starting tasktracker, logging to /usr/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-linux03.out
192.168.35.102: starting tasktracker, logging to /usr/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-linux02.out
[root@linux hadoop-0.20.2]#
> 用jps命令查看进程
[root@linux hadoop-0.20.2]# jps
7118 SecondaryNameNode
7343 Jps
6955 NameNode
7204 JobTracker
[root@linux hadoop-0.20.2]#