Configuring Hadoop on CentOS and Basic Operation Exercises (English Version) [Big Data Processing Technology]
The virtual machine image, Hadoop installation package, and JDK installation package used in this article are available at:
Link: https://pan.baidu.com/s/1qaA2DxPmwm8eN2qCl18jQQ?pwd=7zik
Extraction code: 7zik
Experimental environment
| | Version |
|---|---|
| OS | CentOS Linux release 7.9.2009 (Core) |
| JDK | 1.8.0_144 |
| Hadoop | 2.7.2 |
Experiment Steps
Show how you conducted the experiment (step by step) and show your commands, results, and screenshots.
1 Linux Commands Practice
1.1 cd
cd: change directory
(1) change to directory /usr/local
cd /usr/local
(2) move up/back one directory
cd ..
(3) move to your home directory
cd ~
1.2 ls
ls: list files
(4) list all of the files in the /usr directory
ls /usr
1.3 mkdir
mkdir: make a new directory
(5) change to /tmp directory, and make a new directory named ‘new’
cd /tmp
mkdir new
(6) make a directory a1/a2/a3/a4
mkdir -p a1/a2/a3/a4
1.4 rmdir
rmdir: remove empty directories
(7) remove the ‘new’ directory (created in 5)
cd /tmp
rmdir new
(8) remove a1/a2/a3/a4
rmdir -p a1/a2/a3/a4
1.5 cp
cp: copy files or directories
(9) copy .bashrc file (under your home folder) to /usr, and name it as ‘bashrc1’
sudo cp ~/.bashrc /usr
sudo mv /usr/.bashrc /usr/bashrc1
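Note that the copy and rename can also be done in one step, since cp accepts a new name at the destination:
sudo cp ~/.bashrc /usr/bashrc1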
(10) create a new directory ‘test’ under /tmp, and copy this directory to /usr
mkdir /tmp/test
sudo cp -r /tmp/test /usr
1.6 mv
mv: move or rename files
(11) move file bashrc1 (created in 9) to /usr/test directory
sudo mv /usr/bashrc1 /usr/test
(12) rename test directory (created in 10) to test2
sudo mv /usr/test /usr/test2
1.7 rm
rm: remove files or directories
(13) remove file bashrc1
sudo rm -rf /usr/test2/bashrc1
(14) remove directory test2
sudo rm -rf /usr/test2
1.8 cat
cat: display file content
(15) view the content of file .bashrc
cat ~/.bashrc
1.9 tac
tac: display file content in reverse
(16) print the content of file .bashrc in reverse
tac ~/.bashrc
1.10 more
more: display output one screenful at a time
(17) use the more command to view the content of file .bashrc
more ~/.bashrc
1.11 head
head: view the top few lines of a file
(18) view the first 20 lines of file .bashrc
head -n 20 ~/.bashrc
(19) view the beginning of file .bashrc, omitting the last 50 lines
head -n -50 ~/.bashrc
1.12 tail
tail: view the last few lines of a file
(20) view the last 20 lines of file .bashrc
tail -n 20 ~/.bashrc
(21) view the content of file .bashrc, displaying only the content from line 50 onward
tail -n +50 ~/.bashrc
1.13 chown
chown: change ownership
(22) change the ownership of any file and view the permissions
vim hello.txt
sudo chown root hello.txt
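To view the resulting ownership and permissions, ls -l shows both:
ls -l hello.txt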
1.14 chmod
chmod: change the permissions of a file
(23) change the permissions of any file
sudo chmod 777 hello.txt
1.15 find/locate
find/locate: search for files
(24) find file .bashrc, and state the difference between the find and locate commands
find ~ -name .bashrc
find walks the directory tree in real time, so it is slower but always up to date; locate looks the name up in a prebuilt index database (refreshed by updatedb), so it is faster but may return stale results.
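For comparison, a minimal locate sketch (assuming the mlocate package, which provides locate on CentOS 7, and that the index has been built):
sudo yum install -y mlocate
sudo updatedb
locate .bashrc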
1.16 grep
grep: search through the text in a given file
(25) search for “examples” in the ~/.bashrc file
grep examples ~/.bashrc
2 Hadoop Installation and Configuration
Install CentOS Virtual Machine
Click Create Virtual Machine
Click Customize>Next>Install the operating system later>Next>Set the virtual machine storage location>Set the virtual machine name>Number of processors:2>Allocate memory>Select NAT for network type>Finish>
Edit the virtual machine settings>Use ISO image file>Select the image file you have downloaded
Run the virtual machine>Install CentOS 7>Select language: Chinese>Continue>
Date and Time>Settings in the upper right corner>uncheck all of the existing NTP servers>add three new ones from the list below>check them and click OK
ntp1.aliyun.com
ntp2.aliyun.com
ntp3.aliyun.com
ntp4.aliyun.com
ntp5.aliyun.com
ntp6.aliyun.com
ntp7.aliyun.com
Software selection > select "Infrastructure Server" in "Basic Environment" > select "Debugging Tools" in "Additional Options for Selected Environment" > Finish
Installation Location>Select "Auto-configure Partitions">Finish
Network and Hostname>Open Network>Finish
Start installation>ROOT password>Set ROOT password>Finish
Reboot after installation is complete
Configuring a static network
Log in as root and enter the password you set.
ping www.baidu.com
Check whether the network is reachable; press Ctrl+C to stop.
vi /etc/sysconfig/network-scripts/ifcfg-ens33
Press i to enter insert mode and change two main things:
- BOOTPROTO="dhcp" to BOOTPROTO="static"
- ONBOOT="no" to ONBOOT="yes" (if it is already yes, leave it alone)
Go to VMware interface > Edit > Virtual Network Editor > VMnet8 > NAT Settings > View Subnet IP, Subnet Mask and Gateway
Then append the following lines at the end of the file. IPADDR must be in the same network segment as the subnet IP you just viewed, and GATEWAY must match the gateway you viewed:
IPADDR="192.168.19.11"
NETMASK="255.255.255.0"
GATEWAY="192.168.19.2"
When you are done, press Esc to exit insert mode, then type :wq to save and exit.
After saving and exiting, restart the network service:
service network restart
Then ping again; if the ping goes through, the static network is configured correctly. If you have problems, go to Problems and Solutions.
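For example, to verify that the static address took effect (assuming the interface is named ens33, as above):
ip addr show ens33
ping -c 4 www.baidu.com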
Turn off the firewall
systemctl stop firewalld
systemctl disable firewalld
If you are not sure, you can check it by entering the following command:
systemctl status firewalld
If the status shows inactive (dead), the firewall has been shut down successfully.
Configure the host name
hostnamectl set-hostname hadoop101
Enter reboot to restart; after rebooting you will see that the hostname has changed:
reboot
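You can confirm the new hostname directly:
hostname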
Connect to this virtual machine in Xshell
Install JAVA
Create module and software folders in the /opt directory.
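For example (the following steps unpack software into /opt/module and upload packages into /opt/software):
mkdir -p /opt/module /opt/software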
Transferring files with Xshell
New file transfer > transfer the JDK installation package and the Hadoop installation package to the virtual machine
Check whether the packages were transferred successfully.
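For example, assuming the packages were uploaded to /opt/software (as used in the Hadoop step below):
cd /opt/software
ls
Then unpack the JDK to /opt/module: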
tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module/
View the JDK path:
cd /opt/module/jdk1.8.0_144
pwd
/opt/module/jdk1.8.0_144
Open the /etc/profile file
vi /etc/profile
Add the JDK path at the end of the profile file
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
Save and exit
:wq
Make the modified file effective
source /etc/profile
Test if the JDK is installed successfully
java -version
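If the JDK is installed correctly, the output should look roughly like this:
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)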
If the java command does not work, then see the Problems and Solutions section.
Install Hadoop
cd /opt/software
Unpack the installation file to /opt/module
tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
Check if the decompression is successful
ls /opt/module/
Add hadoop to the environment variables
[root@hadoop101 hadoop-2.7.2]# pwd
/opt/module/hadoop-2.7.2
Open the /etc/profile file
vi /etc/profile
Add the hadoop path to the end of the profile file
##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
[root@hadoop101 hadoop-2.7.2]# source /etc/profile
[root@hadoop101 hadoop-2.7.2]# hadoop version
Hadoop 2.7.2
Subversion Unknown -r Unknown
Compiled by root on 2017-05-22T10:49Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar
If the hadoop command does not work, see the Problems and Solutions section.
1) Hadoop Local Mode
Official Grep Case
Create an input folder in the hadoop-2.7.2 directory
mkdir input
Copy the Hadoop xml configuration file to input
cp etc/hadoop/*.xml input
Execute the MapReduce program in the share directory
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
View output results
cat output/*
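With the stock 2.7.2 configuration files as input, the output typically contains a single match:
1	dfsadmin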
Official WordCount Case
Create a wcinput folder in the hadoop-2.7.2 directory
mkdir wcinput
Create a wc.input file in the wcinput directory
cd wcinput
touch wc.input
Edit the wc.input file
vi wc.input
Enter the following into the file
hadoop yarn
hadoop mapreduce
All I can tell you is it's all show biz.
Save and exit
:wq
Go back to the Hadoop directory /opt/module/hadoop-2.7.2
cd ../
Run the program:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount wcinput wcoutput
View Results
cat wcoutput/part-r-00000
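Given the wc.input contents above, the counts should come out roughly as follows (WordCount is case- and punctuation-sensitive, and sorts keys byte-wise, so uppercase words come first):
All	1
I	1
all	1
biz.	1
can	1
hadoop	2
is	1
it's	1
mapreduce	1
show	1
tell	1
yarn	1
you	1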
2) Hadoop Pseudo-Distributed Configuration
Configuring the cluster
To obtain the installation path of the JDK on a Linux system:
echo $JAVA_HOME
Configure hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
Configuration: core-site.xml
<!-- Specify the address of the NameNode in HDFS -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop101:9000</value>
</property>
<!-- Specify the storage directory for files generated while Hadoop is running -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
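Note that in each of these XML files, the <property> elements must sit inside the file's existing <configuration> element:
<configuration>
    <!-- property elements go here -->
</configuration>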
Configuration: hdfs-site.xml
<!-- Specify the number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
Configuring yarn-site.xml
<!-- How the Reducer obtains data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Specify the address of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop101</value>
</property>
Configuration: mapred-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
Configuration: (rename mapred-site.xml.template to) mapred-site.xml
mv mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<!-- Specify that MapReduce runs on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Format the NameNode (format it only before the first start; do not format it again afterwards)
bin/hdfs namenode -format
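Then start the NameNode and DataNode daemons: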
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
Check to see if it works. You can also stop all of the daemons and restart everything at once:
stop-all.sh
start-all.sh
Then check again to see if it works.
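A quick way to check is jps, which lists the running JVM processes. After a successful start, the output should look along these lines (the process IDs are illustrative and will differ; with YARN started, ResourceManager and NodeManager appear as well):
jps
3472 NameNode
3561 DataNode
3729 Jps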
Example 1
In pseudo-distributed mode, data is read from HDFS. To use HDFS, you first need to create a user directory in HDFS:
./bin/hdfs dfs -mkdir -p /user/hadoop
Then copy the XML files in ./etc/hadoop to the distributed file system as input files, that is, copy /opt/module/hadoop-2.7.2/etc/hadoop to /user/hadoop/input on the distributed file system.
./bin/hdfs dfs -mkdir -p input
./bin/hdfs dfs -put ./etc/hadoop/*.xml input
After copying, view the file list through the following command:
./bin/hdfs dfs -ls input
Running a MapReduce job in pseudo-distributed mode works the same way as in stand-alone mode; the difference is that the input files are read from HDFS:
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
View the results with the following command (this reads the output stored in HDFS):
./bin/hdfs dfs -cat output/*
Example 2
Example 2 follows the same procedure as Example 1 but runs the WordCount example. We take wc.input in the wcinput folder as input, count the number of occurrences of each word, and output the results to the wcoutput folder.
./bin/hdfs dfs -mkdir wcinput
./bin/hdfs dfs -put ./wcinput/wc.input wcinput
./bin/hdfs dfs -ls wcinput
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount wcinput wcoutput
./bin/hdfs dfs -cat wcoutput/*
3) Web UI
NameNode overview: http://hadoop101:50070/dfshealth.html#tab-overview (if it does not open, see Problems and Solutions)
HDFS file browser: http://hadoop101:50070/explorer.html#/
YARN cluster page: http://hadoop101:8088/cluster
3 Problems and Solutions
- Configuring a static network
When configuring the static network, if ping keeps failing with the error: Name or service not known
It should be a DNS configuration problem:
vi /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4
After that, you should be able to ping without any problems.
- java commands and hadoop commands do not work after installation
sync
reboot
- Can't visit Web UI
Solution 1: Check the IP and hostname mapping in your local Windows hosts file
C:\Windows\System32\drivers\etc
Find the hosts file in the above path, copy it out and add the following content
YourHadoop101IPAddress hadoop101
Then save it and paste it back.
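For example, with the static IP configured earlier:
192.168.19.11 hadoop101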
Solution 2:
vi /etc/selinux/config
set
SELINUX=disabled
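This takes effect after a reboot; to put SELinux into permissive mode immediately for the current session, you can also run:
setenforce 0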
Solution 3:
Check whether core-site.xml and hdfs-site.xml under $HADOOP_HOME/etc/hadoop are configured correctly
Solution 4:
The absolute path to Java must be set in the hadoop-env.sh file
Solution 5:
Check whether the Linux system's firewall has been turned off