Setting up a distributed Hadoop cluster in Docker
1. Pull an Ubuntu image and set up the Java environment
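Step 1 can be sketched as a Dockerfile; the base image tag, the OpenJDK package name, and the JAVA_HOME path are assumptions (the original installs Java manually inside a running container):

```dockerfile
# Hedged sketch of step 1: an Ubuntu base image with OpenJDK and sshd.
# Package names and the JAVA_HOME path are assumptions.
FROM ubuntu:16.04
RUN apt-get update && \
    apt-get install -y openjdk-8-jdk openssh-server vim
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```

Containers would then be started from this image, one per cluster node, e.g. `docker run -d --name hadoop1 <image>`.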
2. Download the Hadoop tarball and configure /etc/hosts on every container
172.17.0.5 hadoop1
172.17.0.6 hadoop2
172.17.0.2 hadoop3
3. Set JAVA_HOME in hadoop-env.sh, mapred-env.sh, and yarn-env.sh
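A minimal sketch of the edit, assuming Java was installed from the openjdk-8-jdk package (the exact path varies with the install method):

```shell
# Line to set in hadoop-env.sh, mapred-env.sh and yarn-env.sh;
# the JDK path below is an assumption
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```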
4. Configure core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/root/data/tmp</value>
  </property>
</configuration>
5. Configure hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop3:50090</value>
  </property>
</configuration>
6. Configure the slaves file
hadoop1
hadoop2
hadoop3
7. Configure yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop2</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>106800</value>
  </property>
</configuration>
8. Configure mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop1:19888</value>
  </property>
</configuration>
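With the jobhistory addresses above pointing at hadoop1, the JobHistory server also has to be started there separately once the cluster is up; in Hadoop 2.x the command (run from HADOOP_HOME on hadoop1) is:

```shell
# Serves the job history web UI on port 19888 as configured above
sbin/mr-jobhistory-daemon.sh start historyserver
```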
9. Set up SSH login
Install and start sshd, then verify it is running:
apt-get install openssh-server
service ssh start
ps -e | grep ssh
Generate a key pair
ssh-keygen -t rsa
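The key generation can be made non-interactive (empty passphrase) so the later ssh-copy-id steps need no prompts; the temporary directory below is only for illustration, and on the actual nodes you would omit -f and accept the default ~/.ssh/id_rsa path:

```shell
# Generate an RSA key pair non-interactively with an empty passphrase.
# mktemp -d is just for illustration; on a real node use the default
# ~/.ssh/id_rsa location so ssh-copy-id picks the key up automatically.
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$keydir/id_rsa"
ls "$keydir"
```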
Set the root password:
passwd
Allow root login over SSH by editing the sshd config and setting PermitRootLogin yes, then restart sshd:
vim /etc/ssh/sshd_config
/etc/init.d/ssh restart
Distribute the public key to every node:
ssh-copy-id hadoop1
ssh-copy-id hadoop2
ssh-copy-id hadoop3
Format the NameNode (on hadoop1):
hdfs namenode -format
Start the HDFS cluster on hadoop1:
sbin/start-dfs.sh
If startup fails with:
The authenticity of host '127.17.0.2 (127.17.0.2)' can't be established. Host key verification failed.
edit the SSH client config and set StrictHostKeyChecking no:
vi /etc/ssh/ssh_config
Start YARN on hadoop1:
sbin/start-yarn.sh
Start the ResourceManager on hadoop2 (start-yarn.sh only launches the ResourceManager on the node it is run from, while yarn-site.xml assigns it to hadoop2):
sbin/yarn-daemon.sh start resourcemanager
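Once both HDFS and YARN are up, each node can be checked with jps from hadoop1, using the passwordless SSH set up earlier; given the roles configured above, hadoop2 should additionally show a ResourceManager and hadoop3 a SecondaryNameNode:

```shell
# List the Java daemons on every node; hostnames follow the
# /etc/hosts entries configured in step 2
for h in hadoop1 hadoop2 hadoop3; do
  echo "== $h =="
  ssh "$h" jps
done
```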