Setting Up a Hadoop Distributed Environment
I. System Environment
10.0.0.11 master centos6.6 x86_64
10.0.0.12 slave1 centos6.6 x86_64
II. Configure hosts
Add entries for master and slave1 to the hosts file on both servers.
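With the IPs from section I, /etc/hosts on both machines ends up containing:
10.0.0.11 master
10.0.0.12 slave1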
III. Passwordless SSH Login
On master:
#ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
#cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
If successful, ssh master now logs in without prompting for a password.
Run the same commands on slave1; if successful, ssh slave1 on slave1 likewise logs in without a password.
Append the contents of master's public key id_dsa.pub to slave1's authorized_keys.
Append the contents of slave1's public key id_dsa.pub to master's authorized_keys.
Once both appends are done, master and slave1 can log into each other without a password (one-liners below).
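A minimal way to do the two appends over the network, assuming root password logins are still enabled at this point:
On master: #cat ~/.ssh/id_dsa.pub | ssh slave1 'cat >> ~/.ssh/authorized_keys'
On slave1: #cat ~/.ssh/id_dsa.pub | ssh master 'cat >> ~/.ssh/authorized_keys'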
PS: the real constraint behind "don't create .ssh by hand" is permissions: sshd ignores the keys unless ~/.ssh is mode 700 and authorized_keys is mode 600. Letting ssh-keygen create the directory gets this right automatically.
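If the directory already exists or was created by hand, the usual fix is to tighten the permissions sshd expects:
#chmod 700 ~/.ssh
#chmod 600 ~/.ssh/authorized_keys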
IV. Install Java
yum -y install java-1.7.0-openjdk*
After installation, configure the environment variable.
Edit /etc/profile and append at the end:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64
Run source /etc/profile to apply the change immediately.
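A quick sanity check that the JDK and the variable are in place:
#java -version
#echo $JAVA_HOME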
V. Install Hadoop
Download the latest stable binary release from the official site (2.6.0 at the time of writing).
Extract it to /usr/local/hadoop; one way is sketched below.
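One way to fetch and unpack it; the mirror URL below is an assumption, pick a current one from hadoop.apache.org:
#wget http://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
#tar -zxf hadoop-2.6.0.tar.gz -C /usr/local
#mv /usr/local/hadoop-2.6.0 /usr/local/hadoop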
Edit /etc/profile and append at the end:
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
Run source /etc/profile to apply it immediately.
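To confirm the PATH change took effect:
#hadoop version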
Create the required directories:
mkdir -p /usr/local/hadoop/dfs/name
mkdir -p /usr/local/hadoop/dfs/data
mkdir -p /usr/local/hadoop/tmp
VI. Hadoop Configuration
1. etc/hadoop/slaves
List the hostnames of all slave nodes, one per line; this setup has only slave1.
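For this setup the file therefore contains a single line:
slave1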
2. etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>
3. etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
4. etc/hadoop/mapred-site.xml (if the file does not exist, copy it from mapred-site.xml.template first)
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
5. etc/hadoop/yarn-site.xml (note the shuffle class key must match the aux-service name mapreduce_shuffle)
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>
6. Copy the Hadoop directory to every node (sketch below).
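One way to copy, using scp over the passwordless root login from section III (assumes the same /usr/local layout on slave1):
#scp -r /usr/local/hadoop root@slave1:/usr/local/
The /etc/profile edits from sections IV and V also need to be repeated on slave1.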
VII. Startup
1. Format the NameNode:
hdfs namenode -format
(in 2.x, hadoop namenode -format is deprecated in favor of hdfs namenode -format)
2. Start all daemons with start-all.sh (deprecated in 2.x; see the equivalent below).
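start-all.sh still works in 2.6, but the non-deprecated equivalent is the pair of scripts shipped in sbin (already on PATH from section V):
#start-dfs.sh
#start-yarn.sh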
Check the process status on master and slave1 with jps. Expected daemons:
On master: NameNode, SecondaryNameNode, ResourceManager
On slave1: DataNode, NodeManager
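If everything is up, a simple HDFS round trip should succeed and the web UIs should respond (8088 comes from yarn-site.xml above; 50070 is the Hadoop 2.x NameNode HTTP default):
#hdfs dfs -mkdir /test
#hdfs dfs -ls /
NameNode UI: http://master:50070
ResourceManager UI: http://master:8088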