Configuring Hadoop on Ubuntu (English Version) [Big Data Processing Technology]
The virtual machine image, Hadoop installation package, and Java installation package used in this article are available at:
Link: https://pan.baidu.com/s/1wDCUkjsk3OEJlcV_FQ3RBA?pwd=6q89
Extraction code: 6q89
Experimental environment
Component | Version |
---|---|
OS | Ubuntu 20.04.4 LTS |
JDK | 1.8.0_144 |
Hadoop | 2.7.2 |
1. Hadoop Installation and Configuration
1.1 Create user hadoop
Open the terminal and enter the following command to create a new user hadoop:
sudo useradd -m hadoop -s /bin/bash   # create user hadoop with /bin/bash as its login shell
sudo passwd hadoop                    # set the password for user hadoop
sudo adduser hadoop sudo              # grant user hadoop administrator (sudo) privileges
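To verify the new account (an optional check; id is a standard Linux command):
id hadoop    # should list the uid, gid, and groups of user hadoop, including sudo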
1.2 Switch the login user to hadoop
Log out and log back in as user hadoop, or switch to it directly in the terminal:
su - hadoop    # switch the current session to user hadoop
whoami         # should print hadoop
1.3 Update apt
sudo apt-get update # Update apt
sudo apt-get install vim # install vim
1.4 Install SSH and configure SSH to log in without password
① Install ssh server
SSH login is required for both cluster and single-node modes. Ubuntu installs the SSH client by default; the SSH server needs to be installed separately:
sudo apt-get install openssh-server # install ssh server
② Use ssh to log in
After installing ssh server, log in to this computer with the following command:
ssh localhost
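Logging in this way still asks for a password each time. To enable the passwordless login this section's title refers to, a common approach is to generate an SSH key pair and authorize it on the local machine; the following is the standard OpenSSH procedure:
exit                                   # log out of the ssh localhost session first
cd ~/.ssh/                             # this directory is created by the first ssh login
ssh-keygen -t rsa                      # press Enter at every prompt to accept the defaults
cat ./id_rsa.pub >> ./authorized_keys  # authorize the newly generated public key
ssh localhost                          # should now log in without prompting for a password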
1.5 Install Java environment
① Download the JDK archive to the local machine
② Extract the JDK archive
Execute the following shell commands in the terminal:
cd /usr/lib
sudo mkdir jvm        # create a directory to hold the JDK
cd
cd 下载                # enter the Downloads directory (named 下载 on a Chinese-locale system)
sudo tar -zxvf ./jdk-8u144-linux-x64.tar.gz -C /usr/lib/jvm   # extract the JDK into /usr/lib/jvm
After the JDK archive is extracted, run the following commands to check the contents of the /usr/lib/jvm directory; you should see a directory named jdk1.8.0_144, which must match the JAVA_HOME set in the next step:
cd /usr/lib/jvm
ls
③ Set environment variables
Next, set the environment variables. The commands below open the hadoop user's environment-variable configuration file in the vim editor:
cd
vim ~/.bashrc
Add the following lines at the beginning of the file (in vim, press i to enter insert mode; when finished, press ESC and type :wq to save and quit):
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_144
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
Save the .bashrc file and exit the vim editor. Then, continue to execute the following command to make the configuration of .bashrc file take effect immediately:
source ~/.bashrc
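As a quick optional check that the variables are visible in the current shell:
echo $JAVA_HOME    # should print /usr/lib/jvm/jdk1.8.0_144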
Use the following command to check whether the installation is successful:
java -version
1.6 Install Hadoop 2.7.2
Install Hadoop into /usr/local/:
sudo tar -zxf ~/下载/hadoop-2.7.2.tar.gz -C /usr/local   # extract the archive into /usr/local
cd /usr/local/
sudo mv ./hadoop-2.7.2/ ./hadoop    # rename the directory to hadoop
sudo chown -R hadoop ./hadoop       # give user hadoop ownership of the files
Hadoop is ready to use once extracted. Enter the following commands to check whether Hadoop is available; if it is, the Hadoop version information will be displayed:
cd /usr/local/hadoop
./bin/hadoop version
2. Hadoop local mode
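In local (standalone) mode Hadoop needs no further configuration: it runs as a single Java process and uses the local filesystem instead of HDFS. A quick way to try it is the grep example that ships with the distribution, sketched below following the standard Hadoop quick-start (note that the output directory must not exist before the job runs):
cd /usr/local/hadoop
mkdir ./input
cp ./etc/hadoop/*.xml ./input     # use the configuration files as sample input
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep ./input ./output 'dfs[a-z.]+'
cat ./output/*                    # print every line that matched the regular expression
rm -r ./output                    # remove the output directory so the job can be rerun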
3. Hadoop pseudo-distributed configuration
Hadoop can run in a pseudo-distributed way on a single node: each Hadoop daemon runs as a separate Java process, and the node acts as both NameNode and DataNode while reading files from HDFS.
3.1 Modify the configuration files core-site.xml and hdfs-site.xml
The Hadoop configuration files are located in /usr/local/hadoop/etc/hadoop/:
cd /usr/local/hadoop/etc/hadoop/
vim core-site.xml
File core-site.xml needs to be modified as follows (hadoop.tmp.dir sets the base directory for Hadoop's temporary files, and fs.defaultFS tells clients where to reach HDFS):
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
vim hdfs-site.xml
And file hdfs-site.xml needs to be modified as follows (dfs.replication is set to 1 because a pseudo-distributed cluster has only one DataNode):
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
3.2 Format the NameNode
After the configuration is complete, format the NameNode:
cd /usr/local/hadoop
./bin/hdfs namenode -format
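If formatting succeeds, the log should report success, typically with a line similar to "Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted." (exact wording varies by version). Avoid rerunning the format once HDFS holds data, since it erases the NameNode's metadata.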
3.3 Start the NameNode and DataNode daemons
Then start the NameNode and DataNode daemons:
cd /usr/local/hadoop
./sbin/start-dfs.sh
After startup completes, you can check whether it succeeded with the jps command. If successful, the following processes will be listed:
"NameNode", "DataNode", and "SecondaryNameNode".
jps
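Once the three processes are listed, a quick smoke test of HDFS is to create the hadoop user's home directory and list the filesystem root (standard hdfs dfs commands); in Hadoop 2.x the NameNode web interface is also reachable at http://localhost:50070:
cd /usr/local/hadoop
./bin/hdfs dfs -mkdir -p /user/hadoop   # create the user's home directory in HDFS
./bin/hdfs dfs -ls /                    # list the HDFS root; /user should now appear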
3.4 Stop the NameNode and DataNode daemons
To stop the daemons conveniently from any directory, first add Hadoop's bin and sbin directories to the PATH by editing ~/.bashrc (sudo is not needed for your own .bashrc):
vim ~/.bashrc
Add the following lines:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Then reload the configuration and stop the daemons:
source ~/.bashrc
stop-all.sh    # stops all Hadoop daemons (HDFS and YARN)
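With the PATH configured above, the HDFS daemons can later be restarted and stopped from any directory using the standard Hadoop 2.x scripts:
start-dfs.sh    # start the NameNode, DataNode, and SecondaryNameNode again
stop-dfs.sh     # stop only the HDFS daemons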