Hadoop
HDFS distributed file system
Environment preparation:
1. Install the Java environment
yum -y install java-1.8.0-openjdk-devel
2. Configure /etc/hosts (the same entries on every node, so all hosts resolve each other by name)
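For example, a minimal /etc/hosts consistent with the addresses used later in these notes (192.168.1.23 for node2 is an assumption; only .21, .22 and .24 actually appear below):
192.168.1.21    nn01
192.168.1.22    node1
192.168.1.23    node2
192.168.1.24    node3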
3. Configure a password-less SSH trust relationship from the NameNode to all nodes
# remove stale host keys
rm -rf /root/.ssh/known_hosts
# in /etc/ssh/ssh_config, disable the interactive yes/no host-key prompt
Host *
    StrictHostKeyChecking no
# generate the key pair (no passphrase, written to the default location used below)
ssh-keygen -b 2048 -t rsa -N '' -f /root/.ssh/id_rsa
# deploy the public key to each node (see the loop sketch below)
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.24
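start-dfs.sh logs in over SSH to start every role, including on the NameNode itself, so the key must be deployed to all four hosts. A minimal loop, assuming the hostnames from /etc/hosts above:
for h in nn01 node1 node2 node3; do
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@$h
done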
Configuration files (reference: https://hadoop.apache.org/docs/r2.7.6/)
Environment configuration file: /usr/local/hadoop/etc/hadoop/hadoop-env.sh
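These notes do not show the edits themselves; the two lines that usually need changing in hadoop-env.sh are JAVA_HOME and HADOOP_CONF_DIR. A minimal sketch, assuming the OpenJDK path created by the yum install above (the exact directory varies by minor version; confirm with readlink -f /usr/bin/java):
export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk"
export HADOOP_CONF_DIR="/usr/local/hadoop/etc/hadoop"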
Core configuration file: /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- <value>file:///</value> would use the local file system -->
        <!-- use HDFS -->
        <value>hdfs://nn01:9000</value>
    </property>
    <property>
        <!-- data storage directory -->
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoop</value>
    </property>
</configuration>
HDFS configuration file: /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <!-- namenode address and port -->
        <name>dfs.namenode.http-address</name>
        <value>nn01:50070</value>
    </property>
    <property>
        <!-- secondarynamenode address and port -->
        <name>dfs.namenode.secondary.http-address</name>
        <value>nn01:50090</value>
    </property>
    <property>
        <!-- number of block replicas -->
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>
Worker-node list file: /usr/local/hadoop/etc/hadoop/slaves
node1
node2
node3
Start the HDFS cluster
ALL nodes: create the data storage directory: mkdir /var/hadoop
Copy nn01:/usr/local/hadoop to all node machines:
rsync -aSH --delete /usr/local/hadoop node1:/usr/local/
rsync -aSH --delete /usr/local/hadoop node2:/usr/local/
rsync -aSH --delete /usr/local/hadoop node3:/usr/local/
Run the format operation on the NameNode:
/usr/local/hadoop/bin/hdfs namenode -format
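If the format succeeds, the NameNode metadata directory derived from hadoop.tmp.dir should now exist (path assumes the /var/hadoop value set in core-site.xml above):
ls /var/hadoop/dfs/name/current    # should contain fsimage* and VERSION files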
Start the cluster:
/usr/local/hadoop/sbin/start-dfs.sh
Verify the running roles with jps on every node:
jps
Verify the cluster state on the NameNode:
/usr/local/hadoop/bin/hdfs dfsadmin -report
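The report should list all three datanodes as live; a quick filter (the grep pattern assumes the 2.7.x report format):
/usr/local/hadoop/bin/hdfs dfsadmin -report | grep -i 'live datanodes'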
Configure /usr/local/hadoop/etc/hadoop/mapred-site.xml
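In Hadoop 2.7.x this file does not exist out of the box; create it from the bundled template first:
cd /usr/local/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml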
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Configure /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>nn01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Synchronize the configuration files to all node machines:
rsync -aSH --delete /usr/local/hadoop/etc/hadoop/ node1:/usr/local/hadoop/etc/hadoop/
rsync -aSH --delete /usr/local/hadoop/etc/hadoop/ node2:/usr/local/hadoop/etc/hadoop/
rsync -aSH --delete /usr/local/hadoop/etc/hadoop/ node3:/usr/local/hadoop/etc/hadoop/
Start the YARN cluster:
/usr/local/hadoop/sbin/start-yarn.sh
Verify:
jps
/usr/local/hadoop/bin/yarn node -list
Browse the web UIs:
http://192.168.1.21:50070/    # namenode (the address configured above)
http://192.168.1.21:50090/    # secondarynamenode
http://192.168.1.21:8088/     # resourcemanager
http://192.168.1.22:50075/    # datanode
http://192.168.1.22:8042/     # nodemanager
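As a final end-to-end check, the bundled word-count example can be run against the cluster (a sketch; the jar name must match the installed release, assumed to be 2.7.6 here):
cd /usr/local/hadoop
bin/hadoop fs -mkdir /input
bin/hadoop fs -put etc/hadoop/*.xml /input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount /input /output
bin/hadoop fs -cat /output/*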