Configuring Hadoop on CentOS and Basic Operation Exercises (English Version) [Big Data Processing Technology]
The virtual machine image, Hadoop installation package, and JDK installation package used in this article are available at:
Link: https://pan.baidu.com/s/1qaA2DxPmwm8eN2qCl18jQQ?pwd=7zik
Extraction code: 7zik
Experimental environment
| | Version |
|---|---|
| OS | CentOS Linux release 7.9.2009 (Core) |
| JDK | 1.8.0_144 |
| Hadoop | 2.7.2 |
Experiment Steps
Show how you conducted the experiment (step by step) and show your commands, results, and screenshots.
1 Linux Commands Practice
1.1 cd
cd: change directory
(1) change to directory /usr/local
cd /usr/local
(2) move up/back one directory
cd ..
(3) move to your home directory
cd ~
1.2 ls
ls: list files
(4) list all of the files in the /usr directory
ls /usr
1.3 mkdir
mkdir: make a new directory
(5) change to /tmp directory, and make a new directory named ‘new’
cd /tmp
mkdir new
(6) make a directory a1/a2/a3/a4
mkdir -p a1/a2/a3/a4
1.4 rmdir
rmdir: remove empty directories
(7) remove the ‘new’ directory (created in 5)
cd /tmp
rmdir new
(8) remove a1/a2/a3/a4
rmdir -p a1/a2/a3/a4
1.5 cp
cp: copy files or directories
(9) copy .bashrc file (under your home folder) to /usr, and name it as ‘bashrc1’
sudo cp ~/.bashrc /usr
sudo mv /usr/.bashrc /usr/bashrc1
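Note that the copy and rename can also be done in one step, since cp accepts a new name at the destination:
sudo cp ~/.bashrc /usr/bashrc1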
(10) create a new directory ‘test’ under /tmp, and copy this directory to /usr
mkdir /tmp/test
sudo cp -r /tmp/test /usr
1.6 mv
mv: move or rename files
(11) move file bashrc1 (created in 9) to /usr/test directory
sudo mv /usr/bashrc1 /usr/test
(12) rename test directory (created in 10) to test2
sudo mv /usr/test /usr/test2
1.7 rm
rm: remove files or directories
(13) remove file bashrc1
sudo rm -rf /usr/test2/bashrc1
(14) remove directory test2
sudo rm -rf /usr/test2
1.8 cat
cat: display file content
(15) view the content of file .bashrc
cat ~/.bashrc
1.9 tac
tac: display file content in reverse
(16) print the content of file .bashrc in reverse
tac ~/.bashrc
1.10 more
more: display output one screenful at a time
(17) use the more command to view the content of file .bashrc
more ~/.bashrc
1.11 head
head: view the top few lines of a file
(18) view the first 20 lines of file .bashrc
head -n 20 ~/.bashrc
(19) view the beginning of file .bashrc, omitting the last 50 lines
head -n -50 ~/.bashrc
1.12 tail
tail: view the last few lines of a file
(20) view the last 20 lines of file .bashrc
tail -n 20 ~/.bashrc
(21) view the content of file .bashrc, displaying only the content from line 50 onward
tail -n +50 ~/.bashrc
1.13 chown
chown: change ownership
(22) change the ownership of any file and view the permissions
vim hello.txt
sudo chown root hello.txt
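To view the resulting ownership and permissions, ls -l shows both:
ls -l hello.txt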
1.14 chmod
chmod: change the permissions of a file
(23) change the permissions of any file
sudo chmod 777 hello.txt
1.15 find/locate
find/locate: search for files
(24) find file .bashrc, and state the difference between the find and locate commands
find ~ -name .bashrc
find walks the directory tree in real time, so it is slower but always up to date; locate looks the name up in a prebuilt index database (refreshed by updatedb), so it is faster but may return stale results.
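For comparison, a minimal locate sketch (assuming the mlocate package, which provides locate on CentOS 7, and that the index has been built):
sudo yum install -y mlocate
sudo updatedb
locate .bashrc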
1.16 grep
grep: search through the text in a given file
(25) search for “examples” in the ~/.bashrc file
grep examples ~/.bashrc
2 Hadoop Installation and Configuration
Install CentOS Virtual Machine
Click Create Virtual Machine
Click Customize>Next>Install the operating system later>Next>Set the virtual machine storage location>Set the virtual machine name>Number of processors:2>Allocate memory>Select NAT for network type>Finish>
Edit the virtual machine settings>Use ISO image file>Select the image file you have downloaded
Run the virtual machine>Install CentOS 7>Select language: Chinese>Continue>
Date and Time>Settings in the upper right corner>uncheck all of the existing NTP servers>add three new ones from the list below>check them and click OK
ntp1.aliyun.com
ntp2.aliyun.com
ntp3.aliyun.com
ntp4.aliyun.com
ntp5.aliyun.com
ntp6.aliyun.com
ntp7.aliyun.com
Software selection > select "Infrastructure Server" in "Basic Environment" > select "Debugging Tools" in "Additional Options for Selected Environment" > Finish
Installation Location>Select "Auto-configure Partitions">Finish
Network and Hostname>Open Network>Finish
Start installation>ROOT password>Set ROOT password>Finish
Reboot after installation is complete
Configuring a static network
Log in as root and enter the password you set.
ping www.baidu.com
Check whether the network is reachable; press Ctrl+C to stop.
vi /etc/sysconfig/network-scripts/ifcfg-ens33
Press i to enter insert mode and change two main things:
- BOOTPROTO="dhcp" to BOOTPROTO="static"
- ONBOOT="no" to ONBOOT="yes" (if it is already yes, leave it alone)
Go to VMware interface > Edit > Virtual Network Editor > VMnet8 > NAT Settings > View Subnet IP, Subnet Mask and Gateway
Then append the following lines at the end of the file. IPADDR must be in the same network segment as the subnet IP you just viewed, and GATEWAY must match the gateway you viewed:
IPADDR="192.168.19.11"
NETMASK="255.255.255.0"
GATEWAY="192.168.19.2"
When you are done, press Esc to exit insert mode, then type :wq to save and exit.
After saving and exiting, restart the network service:
service network restart
Then ping again; if the ping goes through, the static network is configured correctly. If you have problems, go to Problems and Solutions.
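For example, to verify that the static address took effect (assuming the interface is named ens33, as above):
ip addr show ens33
ping -c 4 www.baidu.com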
Turn off the firewall
systemctl stop firewalld
systemctl disable firewalld
If you are not sure, you can check it by entering the following command:
systemctl status firewalld
If the status shows inactive (dead), the firewall has been shut down successfully.
Configure the host name
hostnamectl set-hostname hadoop101
Enter reboot to restart; after rebooting you will see that the hostname has changed:
reboot
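You can confirm the new hostname directly:
hostname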
Connect to this virtual machine in Xshell
Install JAVA
Create module and software folders in the /opt directory.
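For example (the following steps unpack software into /opt/module and upload packages into /opt/software):
mkdir -p /opt/module /opt/software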
Transferring files with Xshell
New file transfer > transfer the JDK installation package and the Hadoop installation package to the virtual machine
Check whether the packages were transferred successfully.
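For example, assuming the packages were uploaded to /opt/software (as used in the Hadoop step below):
cd /opt/software
ls
Then unpack the JDK to /opt/module: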
tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module/
View the JDK path:
cd /opt/module/jdk1.8.0_144
pwd
/opt/module/jdk1.8.0_144
Open the /etc/profile file
vi /etc/profile
Add the JDK path at the end of the profile file
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
Save and exit
:wq
Make the modified file effective
source /etc/profile
Test if the JDK is installed successfully
java -version
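If the JDK is installed correctly, the output should look roughly like this:
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)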
If the java command does not work, then see the Problems and Solutions section.
Install Hadoop
cd /opt/software
Unpack the installation file to /opt/module
tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
Check if the decompression is successful
ls /opt/module/
Add hadoop to the environment variables
[root@hadoop101 hadoop-2.7.2]# pwd
/opt/module/hadoop-2.7.2
Open the /etc/profile file
vi /etc/profile
Add the hadoop path to the end of the profile file
##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
[root@hadoop101 hadoop-2.7.2]# source /etc/profile
[root@hadoop101 hadoop-2.7.2]# hadoop version
Hadoop 2.7.2
Subversion Unknown -r Unknown
Compiled by root on 2017-05-22T10:49Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar
If the hadoop command does not work, see the Problems and Solutions section.
1) Hadoop Local Mode
Official Grep Case
Create an input folder in the hadoop-2.7.2 directory
mkdir input
Copy the Hadoop xml configuration file to input
cp etc/hadoop/*.xml input
Execute the MapReduce program in the share directory
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
View output results
cat output/*
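With the stock 2.7.2 configuration files as input, the output typically contains a single match:
1	dfsadmin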
Official WordCount Case
Create a wcinput folder in the hadoop-2.7.2 directory
mkdir wcinput
Create a wc.input file in the wcinput directory
cd wcinput
touch wc.input
Edit the wc.input file
vi wc.input
Enter the following into the file
hadoop yarn
hadoop mapreduce
All I can tell you is it's all show biz.
Save and exit
:wq
Go back to the Hadoop directory /opt/module/hadoop-2.7.2
cd ../
Run the program:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount wcinput wcoutput
View Results
cat wcoutput/part-r-00000
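Given the wc.input contents above, the counts should come out roughly as follows (WordCount is case- and punctuation-sensitive, and sorts keys byte-wise, so uppercase words come first):
All	1
I	1
all	1
biz.	1
can	1
hadoop	2
is	1
it's	1
mapreduce	1
show	1
tell	1
yarn	1
you	1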
2) Hadoop Pseudo-Distributed Configuration
Configuring the cluster
To obtain the installation path of the JDK on a Linux system:
echo $JAVA_HOME
Configure hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
Configuration: core-site.xml
<!-- Specify the address of the NameNode in HDFS -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop101:9000</value>
</property>
<!-- Specify the storage directory for files generated while Hadoop is running -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
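Note that in each of these XML files, the <property> elements must sit inside the file's existing <configuration> element:
<configuration>
    <!-- property elements go here -->
</configuration>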
Configuration: hdfs-site.xml
<!-- Specify the number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
Configuring yarn-site.xml
<!-- How the Reducer obtains data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Specify the address of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop101</value>
</property>
Configuration: mapred-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
Configuration: (rename mapred-site.xml.template to) mapred-site.xml
mv mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<!-- Specify that MapReduce runs on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Format the NameNode (format it only before the first start; do not format it again afterwards)
bin/hdfs namenode -format
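Then start the NameNode and DataNode daemons: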
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
Check to see if it works. You can also stop all of the daemons and restart everything at once:
stop-all.sh
start-all.sh
Then check again to see if it works.
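A quick way to check is jps, which lists the running JVM processes. After a successful start, the output should look along these lines (the process IDs are illustrative and will differ; with YARN started, ResourceManager and NodeManager appear as well):
jps
3472 NameNode
3561 DataNode
3729 Jps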
Example 1
In pseudo-distributed mode, data is read from HDFS. To use HDFS, you first need to create a user directory in HDFS:
./bin/hdfs dfs -mkdir -p /user/hadoop
Then copy the XML files in ./etc/hadoop to the distributed file system as input files, that is, copy /opt/module/hadoop-2.7.2/etc/hadoop to /user/hadoop/input on the distributed file system.
./bin/hdfs dfs -mkdir -p input
./bin/hdfs dfs -put ./etc/hadoop/*.xml input
After copying, view the file list through the following command:
./bin/hdfs dfs -ls input
Running a MapReduce job in pseudo-distributed mode works the same way as in stand-alone mode; the difference is that the input files are read from HDFS:
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
View the results with the following command (this reads the output stored in HDFS):
./bin/hdfs dfs -cat output/*
Example 2
Example 2 follows the same procedure as Example 1 but runs the WordCount example. We take wc.input in the wcinput folder as input, count the number of occurrences of each word, and output the results to the wcoutput folder.
./bin/hdfs dfs -mkdir wcinput
./bin/hdfs dfs -put ./wcinput/wc.input wcinput
./bin/hdfs dfs -ls wcinput
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount wcinput wcoutput
./bin/hdfs dfs -cat wcoutput/*
3) Web UI
NameNode overview: http://hadoop101:50070/dfshealth.html#tab-overview (if it does not open, see Problems and Solutions)
HDFS file browser: http://hadoop101:50070/explorer.html#/
YARN cluster page: http://hadoop101:8088/cluster
3 Problems and Solutions
- Configuring a static network
When configuring the static network, if ping keeps failing with the error: Name or service not known
It should be a DNS configuration problem:
vi /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4
After that, you should be able to ping without any problems.
- java commands and hadoop commands do not work after installation
sync
reboot
- Can't visit Web UI
Solution 1: Check the IP and hostname mapping in your local Windows hosts file
C:\Windows\System32\drivers\etc
Find the hosts file in the above path, copy it out and add the following content
YourHadoop101IPAddress hadoop101
Then save it and paste it back.
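For example, with the static IP configured earlier:
192.168.19.11 hadoop101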
Solution 2:
vi /etc/selinux/config
set
SELINUX=disabled
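This takes effect after a reboot; to put SELinux into permissive mode immediately for the current session, you can also run:
setenforce 0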
Solution 3:
Check whether core-site.xml and hdfs-site.xml under $HADOOP_HOME/etc/hadoop are configured correctly
Solution 4:
The absolute path to Java must be set in the hadoop-env.sh file
Solution 5:
Check whether the Linux system's firewall has been turned off