1. Introduction to HDFS cluster roles
NameNode: the master of the HDFS cluster; manages the file system namespace and metadata.
SecondaryNameNode: assists the NameNode with metadata housekeeping (periodically merges the edit log into the fsimage).
DataNode: stores the actual data blocks.
2. HDFS cluster node roles in this example
node1 : NameNode, SecondaryNameNode, DataNode
node2: DataNode
Note:
# to use host names instead of host IPs, add these entries to /etc/hosts on every node
$ sudo gedit /etc/hosts
node1_ip node1
node2_ip node2
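To confirm that the names resolve, a quick check from each node (assuming the /etc/hosts entries above):
$ ping -c 1 node1
$ ping -c 1 node2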
3. Create a normal user hadoop and a group hadoop
$ sudo adduser hadoop
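On Debian/Ubuntu, adduser also creates a matching hadoop group by default. A quick check that the user and group exist:
$ id hadoop
uid=1001(hadoop) gid=1001(hadoop) groups=1001(hadoop)   (the numeric ids may differ)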
4. Download the Hadoop package from https://archive.apache.org/dist/hadoop/common/current/
5. Untar the Hadoop package
$ sudo tar -zxvf hadoop-3.3.6.tar.gz -C /opt/software/hadoop
Introduction to the folders in hadoop-3.3.6:
bin: Hadoop command-line tools.
etc: Hadoop configuration files.
include: C header files.
lib: native libraries (.so files).
libexec: scripts for configuring Hadoop (.sh and .cmd).
licenses-binary: license files.
6. Configure the Hadoop cluster (namely the HDFS cluster) on node1
workers: lists the DataNode hosts.
hadoop-env.sh: environment variables needed when running the Hadoop cluster.
core-site.xml: Hadoop core configuration file.
hdfs-site.xml: HDFS core configuration file.
All these files are in the hadoop-3.3.6/etc/hadoop directory.
6.1 Configure the workers file
$ cd hadoop-3.3.6/etc/hadoop
$ gedit workers
node1
node2
...
start-dfs.sh reads this file and starts a DataNode on each host listed here (over SSH), so every DataNode in the cluster must be listed in workers.
6.2 Configure hadoop-env.sh
# add at the end of file hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk-11 # replace with your actual JDK install path
export HADOOP_HOME=/opt/software/hadoop/hadoop-3.3.6
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export HADOOP_SECURE_PID_DIR=$HADOOP_HOME/pids
export HADOOP_PID_DIR=$HADOOP_HOME/pids
Create a pids directory under hadoop-3.3.6:
$ cd /opt/software/hadoop/hadoop-3.3.6
$ mkdir pids
Note:
to find the JDK install path (when the JDK was installed from a Debian package):
$ dpkg -L jdk-11
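For example, if dpkg -L shows the JDK under /usr/lib/jvm/jdk-11, a quick sanity check of that path (adjust to your actual install) is:
$ /usr/lib/jvm/jdk-11/bin/java -version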
6.3 Configure core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:8020</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
</configuration>
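fs.defaultFS is the default file system URI: clients and DataNodes use it to reach the NameNode RPC service on node1, port 8020. io.file.buffer.size (131072 bytes = 128 KB) is the buffer size used for file I/O. Once the cluster is running (step 12), a quick way to see this in action is that any path without an explicit scheme resolves against this URI, e.g.:
$ hdfs dfs -ls /          # same as: hdfs dfs -ls hdfs://node1:8020/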
6.4 Configure hdfs-site.xml
<configuration>
    <property>
        <name>dfs.datanode.data.dir.perm</name>
        <value>700</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/nn</value>
    </property>
    <property>
        <name>dfs.namenode.hosts</name>
        <value>node1,node2</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>268435456</value>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>100</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/dn</value>
    </property>
</configuration>
dfs.datanode.data.dir.perm: permission for the DataNode data directories; 700 is rwx------.
dfs.namenode.name.dir: directory for metadata on the NameNode.
dfs.namenode.hosts: which DataNodes the NameNode allows to connect, i.e. which nodes may join the cluster.
dfs.blocksize: HDFS block size in bytes; 268435456 = 256 x 1024 x 1024 = 256 MB.
dfs.namenode.handler.count: number of NameNode server threads handling RPC requests.
dfs.datanode.data.dir: directory for storing data blocks on each DataNode.
Create the directories /data/nn and /data/dn:
# on node1
$ sudo mkdir -p /data/nn
$ sudo mkdir -p /data/dn
# on node2
$ sudo mkdir -p /data/dn
7. Copy the hadoop directory to all other nodes
# on node1
$ cd /opt/software
$ scp -r hadoop user@node2_ip:~
$ ssh user@node2_ip
$ sudo mv ~/hadoop /opt/software
(If the directory /opt/software does not exist on node2, create it first.)
Note:
$ scp -r hadoop user@node2_ip:/opt/software
permission denied
Copying directly into /opt/software fails with "permission denied" because user has no write permission there, which is why the files are first copied to the home directory and then moved with sudo.
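One possible alternative (a sketch, assuming user can run sudo on node2) is to create /opt/software on node2 with the right owner first, and then scp into it directly:
# on node2
$ sudo mkdir -p /opt/software
$ sudo chown user:user /opt/software
# on node1
$ scp -r hadoop user@node2_ip:/opt/software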
8. Configure /etc/profile on all nodes
$ sudo gedit /etc/profile
export HADOOP_HOME=/opt/software/hadoop/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# make /etc/profile take effect
$ source /etc/profile
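Note that source /etc/profile only affects the current shell; new login shells pick the variables up automatically. A quick check that the variables are set:
$ echo $HADOOP_HOME
$ hadoop version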
9. Set up passwordless SSH login for user hadoop
9.1 Generate a public/private key pair on all nodes
$ su hadoop
$ ssh-keygen -t rsa
Press Enter at every prompt to accept the defaults.
9.2 Copy the public key from every node to node1
# on all nodes
$ ssh-copy-id node1
9.3 Copy authorized_keys from node1 to all other nodes
# on node1
$ scp /home/hadoop/.ssh/authorized_keys hadoop@node2:/home/hadoop/.ssh
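To verify passwordless login, the following should print the remote host name without asking for a password (run as user hadoop, in both directions):
$ ssh hadoop@node2 hostname
$ ssh hadoop@node1 hostname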
10. Change the owner of the hadoop directory and /data
For security, do not start the Hadoop cluster as root; start it as the normal user hadoop.
# chown -R hadoop:hadoop /opt/software/hadoop
# chown -R hadoop:hadoop /data
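A quick check that the ownership change took effect:
$ ls -ld /opt/software/hadoop/hadoop-3.3.6 /data/nn /data/dn
Each line should show hadoop hadoop as owner and group.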
11. Format the NameNode
$ su hadoop
$ hdfs namenode -format
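A successful format writes the initial metadata into dfs.namenode.name.dir (here /data/nn); a quick check:
$ ls /data/nn/current
It should contain a VERSION file and an fsimage_* file.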
12. Start the Hadoop cluster
# on node1
$ su hadoop
$ start-dfs.sh
Starting namenodes on [maye-inspiron-5547]
Starting datanodes
maye-inspiron-5547: mv: cannot stat '/opt/software/hadoop/hadoop-3.3.6/logs/hadoop-hadoop-datanode-maye-Inspiron-5547.out.3': No such file or directory
maye-inspiron-5547: mv: cannot stat '/opt/software/hadoop/hadoop-3.3.6/logs/hadoop-hadoop-datanode-maye-Inspiron-5547.out.1': No such file or directory
maye-inspiron-5547: mv: cannot stat '/opt/software/hadoop/hadoop-3.3.6/logs/hadoop-hadoop-datanode-maye-Inspiron-5547.out': No such file or directory
Starting secondary namenodes [maye-Inspiron-5547]
# check java process on node1
$ jps
3002193 SecondaryNameNode
3001728 DataNode
3003027 Jps
3001379 NameNode
# check java process on node2
$ jps
3001666 DataNode
3007777 Jps
Note:
If the command is not found:
$ source /etc/profile
If that still does not work, use the absolute path:
$ /opt/software/hadoop/hadoop-3.3.6/sbin/start-dfs.sh
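Once the daemons are up (see the jps output above), the NameNode web UI is reachable at http://node1:9870 (the Hadoop 3.x default port), and a small end-to-end test of HDFS could look like this:
$ hdfs dfs -mkdir /test
$ hdfs dfs -put /etc/hosts /test
$ hdfs dfs -ls /test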
13. Stop the Hadoop cluster
# on node1
$ su hadoop
$ stop-dfs.sh
Note:
If the command is not found:
$ source /etc/profile
If that still does not work, use the absolute path:
$ /opt/software/hadoop/hadoop-3.3.6/sbin/stop-dfs.sh