Hadoop single-node deployment
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
https://www.cnblogs.com/ee900222/p/hadoop_1.html
1. Install the JDK and set the environment variables
mkdir /usr/java
tar xf jdk1.8.0_221.tar.gz -C /usr/java
vi /etc/profile
...
export JAVA_HOME=/usr/java/jdk1.8.0_221
export PATH=$JAVA_HOME/bin:$PATH:$HOME/bin
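After reloading the profile (for example with source /etc/profile), a quick check can confirm that JAVA_HOME points at a usable JDK. This is only a sketch and not part of the original steps; the function name check_java_home is hypothetical.

```shell
# Hypothetical helper: succeed only if the given directory looks like a JDK home.
check_java_home() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$dir/bin/java" ]; then
        echo "no executable found at $dir/bin/java" >&2
        return 1
    fi
    echo "JAVA_HOME looks usable: $dir"
}

# Report on the current environment without aborting the shell.
check_java_home "${JAVA_HOME:-}" || true
```

If the check fails, the export lines in /etc/profile are the first place to look.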
Download Hadoop
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.0/hadoop-2.7.0.tar.gz
tar xf hadoop-2.7.0.tar.gz -C /home/centos
ln -s /home/centos/hadoop-2.7.0 /home/centos/hadoop
Set the Hadoop environment variables
vi ~/.bash_profile
export HADOOP_HOME=/home/centos/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
source ~/.bash_profile
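Verify the installation. The job below uses an example bundled with Hadoop: it counts the strings in input that match the regular expression dfs[a-z.]+.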
cd $HADOOP_HOME
mkdir input
cp etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'
cat output/*
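To sanity-check the job's output, roughly the same counts can be produced locally with standard shell tools. This is an illustrative equivalent, not how Hadoop implements the example; the helper name count_matches is hypothetical, and the column layout differs slightly from the job's output.

```shell
# Hypothetical helper: count every match of an extended regex across the files
# in a directory, most frequent first (roughly what the grep example reports).
count_matches() {
    dir=$1
    pattern=$2
    # -o prints each match on its own line, -h omits file names, -E enables ERE.
    grep -ohE "$pattern" "$dir"/* | sort | uniq -c | sort -k1,1nr -k2
}

# Compare with the contents of output/ after the job has run.
if [ -d input ]; then
    count_matches input 'dfs[a-z.]+'
fi
```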
2. Configure pseudo-distributed mode
Edit etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://safe01:9000</value>
    </property>
</configuration>
Edit etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
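The two files can also be written non-interactively. The sketch below assumes the target directory and the NameNode host are passed as arguments. The function name write_hdfs_config and the scratch directory ./conf-demo are hypothetical; the host safe01 and port 9000 come from the configuration above.

```shell
# Hypothetical helper: write minimal core-site.xml and hdfs-site.xml files.
write_hdfs_config() {
    conf_dir=$1
    namenode_host=$2
    mkdir -p "$conf_dir"
    # fs.defaultFS tells clients where the NameNode listens.
    cat > "$conf_dir/core-site.xml" <<EOF
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://${namenode_host}:9000</value>
    </property>
</configuration>
EOF
    # A single node can hold only one replica of each block.
    cat > "$conf_dir/hdfs-site.xml" <<EOF
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF
}

# Write into a scratch directory first; copy into etc/hadoop once reviewed.
write_hdfs_config ./conf-demo safe01
```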
Configure passwordless SSH
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Alternatively, set up passwordless SSH with the following commands
ssh-keygen -t rsa
ssh-copy-id safe01
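With its default StrictModes setting, sshd refuses key authentication when ~/.ssh or authorized_keys is writable by other users, so it can help to tighten the permissions and then test the login non-interactively. A minimal sketch; both function names are hypothetical.

```shell
# Hypothetical helper: restrict permissions on an SSH directory.
fix_ssh_perms() {
    ssh_dir=$1
    chmod 700 "$ssh_dir"
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys"
    fi
}

# Hypothetical helper: succeed only if key-based login works without a prompt.
check_passwordless_ssh() {
    # BatchMode=yes makes ssh fail instead of asking for a password.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Example usage (commented out so that sourcing this file changes nothing):
# fix_ssh_perms ~/.ssh
# check_passwordless_ssh safe01 && echo "passwordless SSH works"
```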
Run a MapReduce job on HDFS. First, format the filesystem:
hdfs namenode -format
Start the NameNode and DataNode
sbin/start-dfs.sh
Run jps to check; the output should include a NameNode and a DataNode (start-dfs.sh also starts a SecondaryNameNode)
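The jps check can be scripted so that a missing daemon is reported by name. This is a sketch only; the function name missing_daemons is hypothetical, and it assumes jps prints one "pid name" pair per line.

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {
    running=$(awk '{print $2}')
    for name in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$running" | grep -qx "$name"; then
            echo "$name"
        fi
    done
}

# Prints nothing when all three daemons are running.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons NameNode DataNode SecondaryNameNode
fi
```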
The NameNode web UI is at http://ip:50070
cd $HADOOP_HOME
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/centos
hdfs dfs -put etc/hadoop /user/centos/input
hadoop fs -ls /user/centos/input
Run the Hadoop job
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep /user/centos/input output 'dfs[a-z.]+'
View the results
hdfs dfs -cat output/*
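A MapReduce job fails if its output directory already exists, so the directory has to be removed before the job is rerun. A minimal sketch, assuming the hdfs command is on PATH; the function name remove_if_exists and the HDFS_CMD override variable are hypothetical.

```shell
# Command used to talk to HDFS; overridable for testing (hypothetical variable).
HDFS_CMD=${HDFS_CMD:-hdfs}

# Hypothetical helper: delete an HDFS path only if it exists.
remove_if_exists() {
    # `dfs -test -e` exits with 0 when the path exists.
    if "$HDFS_CMD" dfs -test -e "$1"; then
        "$HDFS_CMD" dfs -rm -r "$1"
    fi
}

# Example usage before rerunning the grep job:
# remove_if_exists output
```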
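Configure YARN
Edit etc/hadoop/mapred-site.xml (in Hadoop 2.7.0 this file does not exist by default; create it by copying etc/hadoop/mapred-site.xml.template):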
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
Edit etc/hadoop/yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
Start the ResourceManager and NodeManager
sbin/start-yarn.sh
The ResourceManager web UI is at http://ip:8088/
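The web UIs can also be probed from the command line, which is handy on a server without a browser. A sketch assuming curl is installed; the function name check_ui and the HOST value are hypothetical.

```shell
# Hypothetical helper: succeed if an HTTP GET on the URL does not return an error.
check_ui() {
    # -f: fail on HTTP errors, -sS: quiet but still print errors,
    # --max-time 5: give up after five seconds.
    curl -fsS -o /dev/null --max-time 5 "$1"
}

# Example usage; HOST is an assumption, replace it with the machine's address.
# HOST=safe01
# check_ui "http://$HOST:50070/" && echo "NameNode UI is up"
# check_ui "http://$HOST:8088/"  && echo "ResourceManager UI is up"
```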