
Configuring Hadoop on CentOS and Basic Operation Experiments (English Version) [Big Data Processing Technology]

The virtual machine image, Hadoop installation package, and Java installation package used in this article are available at:
Link: https://pan.baidu.com/s/1qaA2DxPmwm8eN2qCl18jQQ?pwd=7zik
Extraction code: 7zik

Experimental environment

OS: CentOS Linux release 7.9.2009 (Core)
JDK: 1.8.0_144
Hadoop: 2.7.2


Experiment Steps

The following shows, step by step, how the experiment was conducted, together with the commands and their results.

1. Linux Commands Practice

1.1 cd

cd: change directory
(1) change to directory /usr/local

cd /usr/local

(2) move up/back one directory

cd ..

(3) move to your home directory

cd ~


1.2 ls

ls: list files
(4) list all of the files in the /usr directory

ls /usr


1.3 mkdir

mkdir: make a new directory
(5) change to /tmp directory, and make a new directory named ‘new’

cd /tmp
mkdir new

(6) make the nested directories a1/a2/a3/a4 (-p creates any missing parent directories)

mkdir -p a1/a2/a3/a4


1.4 rmdir

rmdir: remove empty directories
(7) remove the ‘new’ directory (created in 5)

cd /tmp
rmdir new

(8) remove a1/a2/a3/a4 (-p removes each directory in the chain as it becomes empty)

rmdir -p a1/a2/a3/a4


1.5 cp

cp: copy files or directories
(9) copy .bashrc file (under your home folder) to /usr, and name it as ‘bashrc1’

sudo cp ~/.bashrc /usr
sudo mv /usr/.bashrc /usr/bashrc1
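
As an aside, cp can copy and rename in one step, so the two commands above can be collapsed into:

sudo cp ~/.bashrc /usr/bashrc1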

(10) create a new directory ‘test’ under /tmp, and copy this directory to /usr

mkdir /tmp/test
sudo cp -r /tmp/test /usr


1.6 mv

mv: move or rename files
(11) move file bashrc1 (created in 9) to /usr/test directory

sudo mv /usr/bashrc1 /usr/test

(12) rename test directory (created in 10) to test2

sudo mv /usr/test /usr/test2


1.7 rm

rm: remove files or directories
(13) remove file bashrc1

sudo rm -rf /usr/test2/bashrc1

(14) remove directory test2

sudo rm -rf /usr/test2


1.8 cat

cat: display file content
(15) view the content of file .bashrc

cat ~/.bashrc


1.9 tac

tac: display file content in reverse
(16) print the content of file .bashrc in reverse

tac ~/.bashrc


1.10 more

more: displays output one screenful at a time
(17) use the more command to view the content of file .bashrc

more ~/.bashrc


1.11 head

head: view the top few lines of a file
(18) view the first 20 lines of file .bashrc

head -n 20 ~/.bashrc

(19) view the first few lines of file .bashrc, omitting the last 50 lines (a negative count tells head to drop that many lines from the end)

head -n -50 ~/.bashrc


1.12 tail

tail: view the last few lines of a file
(20) view the last 20 lines of file .bashrc

tail -n 20 ~/.bashrc

(21) view the content of file .bashrc, displaying only the content from line 50 onward

tail -n +50 ~/.bashrc


1.13 chown

chown: change ownership
(22) change the ownership of any file and view the permissions

vim hello.txt
sudo chown root hello.txt
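
The ownership and permissions can then be viewed with ls -l (presumably what the screenshots captured); the owner column should now show root:

ls -l hello.txt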


1.14 chmod

chmod: change the permissions of a file
(23) change the permissions of any file (here 777 grants read, write, and execute to owner, group, and others; -R recurses and is only needed for directories)

sudo chmod -R 777 hello.txt


1.15 find/locate

find/locate: search for files
(24) find file .bashrc, and state the difference between the find and locate commands

find ~ -name ".bashrc"

find walks the directory tree at query time, so its results are always current but the search can be slow; locate looks names up in a prebuilt database (refreshed by updatedb), so it is fast but can miss recently created files.
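
For comparison, a minimal locate sketch; on a fresh CentOS 7 install, the mlocate package may need to be installed and its database built first:

yum install -y mlocate   # only if the locate command is missing
updatedb                 # build/refresh the file-name database
locate .bashrc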


1.16 grep

grep: search through the text in a given file
(25) search for “examples” in the ~/.bashrc file

grep examples ~/.bashrc


2. Hadoop installation and configuration

Install CentOS Virtual Machine

Click Create Virtual Machine
Click Customize > Next > Install the operating system later > Next > set the virtual machine storage location > set the virtual machine name > Number of processors: 2 > allocate memory > select NAT for the network type > Finish
Edit the virtual machine settings > Use ISO image file > select the image file you have downloaded
Run the virtual machine > Install CentOS 7 > select language: Chinese > Continue
Date & Time > the network time settings in the upper right corner > untick all of the existing servers > add three servers from the list below > tick them and click OK

ntp1.aliyun.com
ntp2.aliyun.com
ntp3.aliyun.com
ntp4.aliyun.com
ntp5.aliyun.com
ntp6.aliyun.com
ntp7.aliyun.com

Software selection > select "Infrastructure Server" in "Basic Environment" > select "Debugging Tools" in "Additional Options for Selected Environment" > Finish
Installation Location > select "Auto-configure Partitions" > Finish
Network and Hostname > turn the network on > Finish
Start installation > ROOT password > set the ROOT password > Finish
Reboot after the installation is complete

Configuring a static network

Log in as root and enter the password.

ping www.baidu.com

Check whether the network is reachable; press Ctrl+C to stop pinging.

vi /etc/sysconfig/network-scripts/ifcfg-ens33

Press i to enter edit mode and change two main things:

  1. BOOTPROTO="dhcp" to BOOTPROTO="static"
  2. ONBOOT="no" to ONBOOT="yes" (if it is already yes, leave it alone)

Then go to the VMware interface > Edit > Virtual Network Editor > VMnet8 > NAT Settings and note the subnet IP, subnet mask, and gateway. Add the following lines at the end of the file; IPADDR must be in the same network segment as the subnet IP you noted, and GATEWAY must match the gateway you noted:

IPADDR="192.168.19.11"
NETMASK="255.255.255.0"
GATEWAY="192.168.19.2"

When you are done, press Esc to leave edit mode, then type :wq to save and exit.
After saving and exiting, restart the network service:

service network restart

Then ping again; if the ping goes through, the static network configuration is complete. If you have problems, see Problems and Solutions.

Turn off the firewall

systemctl stop firewalld
systemctl disable firewalld

If you are not sure, you can check it by entering the following command:

systemctl status firewalld

Seeing "inactive (dead)" in the output indicates the firewall was stopped successfully.

Configure the host name

hostnamectl set-hostname hadoop101

Enter reboot to restart, and you will find that the hostname has changed:

reboot


Connect to this virtual machine in Xshell

Create a new SSH session in Xshell using the static IP configured above (192.168.19.11) and log in as root.

Install JAVA

Create the working folders in the /opt directory; see the sketch below.
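
Judging from the paths used later in this article (/opt/module and /opt/software), the folders are presumably created like this:

mkdir /opt/module /opt/software   # assumed layout, based on the paths used below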
Transfer files with Xshell: open a new file transfer session and transfer the JDK and Hadoop installation packages to the virtual machine, then check in the /opt directory on Linux that the packages were imported successfully.

Extract the JDK to /opt/module:

tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module/

View the JDK path:

cd /opt/module/jdk1.8.0_144
pwd
/opt/module/jdk1.8.0_144

Open the /etc/profile file

vi /etc/profile

Add the JDK path at the end of the profile file

#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin

Save and exit

:wq

Make the modified file effective

source /etc/profile

Test if the JDK is installed successfully

java -version

If the java command does not work, then see the Problems and Solutions section.

Install Hadoop

Enter the directory holding the installation package:

cd /opt/software

Unzip the installation file under /opt/module

tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/

Check if the decompression is successful

ls /opt/module/

Add hadoop to the environment variables

[root@hadoop101 hadoop-2.7.2]# pwd
/opt/module/hadoop-2.7.2

Open the /etc/profile file

vi /etc/profile

Add the hadoop path to the end of the profile file

##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Make the modified file take effect and check the Hadoop version:

[root@hadoop101 hadoop-2.7.2]# source /etc/profile
[root@hadoop101 hadoop-2.7.2]# hadoop version
Hadoop 2.7.2
Subversion Unknown -r Unknown
Compiled by root on 2017-05-22T10:49Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar

If the hadoop command does not work, see the Problems and Solutions section.

1) Hadoop Local Mode

Official Grep Case

Create an input folder under the hadoop-2.7.2 directory:

mkdir input

Copy Hadoop's XML configuration files into input:

cp etc/hadoop/*.xml input

Run the example MapReduce program from the share directory:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
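
Note that the job fails if the output directory already exists; when re-running the example, remove the previous output first:

rm -r output   # local mode, so this is an ordinary directory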

View output results

cat output/*


Official WordCount Case

Create a wcinput folder under the hadoop-2.7.2 directory:

mkdir wcinput

Create a wc.input file inside the wcinput folder:

cd wcinput
touch wc.input

Edit the wc.input file

vi wc.input

Enter the following into the file

hadoop yarn
hadoop mapreduce
All I can tell you is it's all show biz.

Save and exit:

:wq

Go back to the Hadoop directory /opt/module/hadoop-2.7.2

cd ../

Run the program:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount wcinput wcoutput

View Results

cat wcoutput/part-r-00000


2) Hadoop Pseudo-Distributed Configuration

Configuring the cluster
To obtain the installation path of the JDK on a Linux system:

echo $JAVA_HOME

Configure hadoop-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144
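
This and the following configuration files all live under the Hadoop installation's etc/hadoop directory; hadoop-env.sh was presumably edited along these lines:

cd /opt/module/hadoop-2.7.2/etc/hadoop
vi hadoop-env.sh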

Configuration: core-site.xml (the <property> elements below go inside the file's <configuration> element)

<!-- 指定HDFS中NameNode的地址 -->
<property>
<name>fs.defaultFS</name>
    <value>hdfs://hadoop101:9000</value>
</property>

<!-- 指定Hadoop运行时产生文件的存储目录 -->
<property>
	<name>hadoop.tmp.dir</name>
	<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>

Configuration: hdfs-site.xml

<!-- 指定HDFS副本的数量 -->
<property>
	<name>dfs.replication</name>
	<value>1</value>
</property>

Configuring yarn-site.xml

<!-- Reducer获取数据的方式 -->
<property>
 		<name>yarn.nodemanager.aux-services</name>
 		<value>mapreduce_shuffle</value>
</property>

<!-- 指定YARN的ResourceManager的地址 -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop101</value>
</property>

Configuration: mapred-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144

Configuration: (rename mapred-site.xml.template to) mapred-site.xml

mv mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<!-- 指定MR运行在YARN上 -->
<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
</property>

Format the NameNode (format it only before the first start; reformatting later generates a new cluster ID that no longer matches the existing DataNode data)

bin/hdfs namenode -format

Start the NameNode and DataNode daemons:

sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode

Check that the daemons are running; see the jps sketch below.
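
A common way to check (and presumably what the screenshot showed) is jps, which lists the running Java processes; NameNode and DataNode should both appear:

jps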

You can also stop everything and then start all of the daemons (HDFS and YARN) at once. In Hadoop 2.x these scripts still work, although they are deprecated in favor of the separate start-dfs.sh/start-yarn.sh pair:

stop-all.sh
start-all.sh

Check again with jps that all of the daemons are running.

Example 1

In pseudo-distributed mode, data is read from HDFS. To use HDFS, first create a user directory in HDFS:

./bin/hdfs dfs -mkdir -p /user/hadoop

Then copy the XML files in ./etc/hadoop to the distributed file system as the input files,
that is, copy /opt/module/hadoop-2.7.2/etc/hadoop to the distributed file system directory /user/hadoop/input.

./bin/hdfs dfs -mkdir -p input
./bin/hdfs dfs -put ./etc/hadoop/*.xml input

After copying, view the file list through the following command:

./bin/hdfs dfs -ls input

A MapReduce job runs the same way in pseudo-distributed mode as in standalone mode; the difference is that in pseudo-distributed mode the job reads its files from HDFS:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'

View the results with the following command (this reads the output stored in HDFS):

./bin/hdfs dfs -cat output/*
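
As in local mode, the job fails if the output directory already exists; to re-run the example, delete it on HDFS first:

./bin/hdfs dfs -rm -r output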


Example 2

The procedure is the same as in Example 1. Example 2 runs the wordcount example: we take wc.input in the wcinput folder as input, count the number of occurrences of each word, and finally output the results to the wcoutput folder.

./bin/hdfs dfs -mkdir wcinput
./bin/hdfs dfs -put ./wcinput/wc.input wcinput
./bin/hdfs dfs -ls wcinput


./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount wcinput wcoutput


./bin/hdfs dfs -cat wcoutput/*


3) Web UI

http://hadoop101:50070/dfshealth.html#tab-overview (if it does not open, see Problems and Solutions)
http://hadoop101:50070/explorer.html#/
http://hadoop101:8088/cluster

3. Problems and Solutions

  1. Configuring a static network

When configuring the static network, if pinging keeps reporting the error "Name or service not known",
it is probably a DNS configuration problem:

vi /etc/resolv.conf

Add:

nameserver 8.8.8.8
nameserver 8.8.4.4

After that you should be able to ping again without any problems

  2. java and hadoop commands do not work after installation

Flush pending writes and reboot; the new environment variables take effect after the restart:

sync
reboot
  3. Can't visit the Web UI

Solution 1: check the IP and hostname mapping in your local Windows hosts file

C:\Windows\System32\drivers\etc

Find the hosts file in the above path, copy it out and add the following content

YourHadoop101IPAddress hadoop101

Then save it and paste it back.
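
The virtual machine's IP address can be checked on the CentOS side, for example:

ip addr show ens33   # ens33 is the interface configured earlier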
Solution 2:

vi /etc/selinux/config

and set

SELINUX=disabled
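
Editing /etc/selinux/config only takes effect after a reboot; to switch SELinux to permissive mode immediately as well, you can additionally run:

setenforce 0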

Solution 3:
Check whether core-site.xml and hdfs-site.xml under your $HADOOP_HOME/etc/hadoop are configured correctly.
Solution 4:
Make sure the absolute path to Java is set in the hadoop-env.sh file.
Solution 5:
Check whether the firewall of the Linux system has been turned off.
