#### 03-01-Hadoop Directory Structure and Local Mode
Extract the installation packages:
tar -zxvf hadoop-2.7.3.tar.gz -C /root/training
tar -zxvf jdk-8u144-linux-x64.tar.gz -C /root/training
tar -zxvf apache-hive-2.3.0-bin.tar.gz -C /root/training
tar -zxvf hbase-1.3.1-bin.tar.gz -C /root/training
Environment variables in /etc/profile:
JAVA_HOME=/root/training/jdk1.8.0_144
export JAVA_HOME
export PATH=$JAVA_HOME/bin:$PATH
HADOOP_HOME=/root/training/hadoop-2.7.3
export HADOOP_HOME
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
HBASE_HOME=/root/training/hbase-1.3.1
export HBASE_HOME
PATH=$HBASE_HOME/bin:$PATH
export PATH
HIVE_HOME=/root/training/apache-hive-2.3.0-bin
export HIVE_HOME
PATH=$HIVE_HOME/bin:$PATH
export PATH
Apply the environment variables:
source /etc/profile
View the directory tree:
[root@bigdata111 training]# tree -d -L 2
-d: list directories only
-L 2: limit the depth to two levels
Figure: Hadoop directory structure (Hadoop的目录结构.png)
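Local mode:
Characteristics: there is no HDFS, so it can only be used to test MapReduce programs (the job does not run on Yarn; it runs as a standalone Java program)
Setup: edit /root/training/hadoop-2.7.3/etc/hadoop/hadoop-env.sh and change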
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME=/root/training/jdk1.8.0_144
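The edit above can also be scripted. The following is a minimal sketch, assuming GNU sed (as on Linux); `set_java_home` is a hypothetical helper name, not part of Hadoop, and the demo runs on a scratch file rather than the real hadoop-env.sh.

```shell
#!/usr/bin/env bash
# Replace the JAVA_HOME line in a hadoop-env.sh-style file with a fixed path.
# set_java_home is a hypothetical helper, not a Hadoop command.
set_java_home() {
  local env_file="$1" jdk_dir="$2"
  # '|' is used as the sed delimiter so the slashes in the path need no escaping.
  sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${jdk_dir}|" "$env_file"
}

# Demo on a scratch file so the real configuration is left untouched.
demo_env="$(mktemp)"
echo 'export JAVA_HOME=${JAVA_HOME}' > "$demo_env"
set_java_home "$demo_env" /root/training/jdk1.8.0_144
cat "$demo_env"
```

To apply it for real, pass /root/training/hadoop-2.7.3/etc/hadoop/hadoop-env.sh as the first argument.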
Test a MapReduce program in local mode
Note: rm -rf * deletes all files in the current directory.
root@ubuntu:~/temp# pwd
/root/temp
root@ubuntu:~/temp# nano data.txt
root@ubuntu:~/temp# nano data.txt
root@ubuntu:~/temp# cd /root/training/hadoop-2.7.3/share/hadoop/mapreduce
root@ubuntu:~/training/hadoop-2.7.3/share/hadoop/mapreduce# ls hadoop-mapreduce-examples-2.7.3.jar
hadoop-mapreduce-examples-2.7.3.jar
root@ubuntu:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /root/temp/input/data.txt /root/temp/output/wc
View the result:
root@ubuntu:~/training/hadoop-2.7.3/share/hadoop/mapreduce# cd /root/temp/output/wc
root@ubuntu:~/temp/output/wc# ls -al
total 20
drwxr-xr-x 2 root root 4096 Oct 16 11:17 .
drwxr-xr-x 3 root root 4096 Oct 16 11:17 ..
-rw-r--r-- 1 root root 55 Oct 16 11:17 part-r-00000
-rw-r--r-- 1 root root 12 Oct 16 11:17 .part-r-00000.crc
-rw-r--r-- 1 root root 0 Oct 16 11:17 _SUCCESS
-rw-r--r-- 1 root root 8 Oct 16 11:17 ._SUCCESS.crc
root@ubuntu:~/temp/output/wc# ls
part-r-00000 _SUCCESS
root@ubuntu:~/temp/output/wc# nano part-r-00000
root@ubuntu:~/temp/output/wc# echo part-r-00000
part-r-00000
root@ubuntu:~/temp/output/wc# cat part-r-00000
Beijing 2
China 2
I 2
capital 1
is 1
love 2
of 1
the 1
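Figure: viewing the result (查看结果.png)
hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /root/temp/input/data.txt /root/temp/output/wc
In this command, the input /root/temp/input/data.txt may also be a directory. In local mode both paths are local Linux paths.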
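As a cross-check of the wordcount result, the same counts can be reproduced with standard Unix tools. This is only a sketch: `local_wordcount` is a hypothetical helper, and the sample input is an assumption reconstructed from the counts shown above, not the actual data.txt.

```shell
#!/usr/bin/env bash
# Print "word<TAB>count" for every whitespace-separated word in a file,
# sorted in byte order (LC_ALL=C), which places uppercase before lowercase.
# local_wordcount is a hypothetical helper, not part of Hadoop.
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ printf "%s\t%d\n", $2, $1 }'
}

# Assumed sample input (reconstructed from the counts, not taken from the source).
sample="$(mktemp)"
printf '%s\n' 'I love Beijing' 'I love China' 'Beijing is the capital of China' > "$sample"
local_wordcount "$sample"
```

With this assumed input, the pipeline prints the same eight word and count pairs as part-r-00000, in the same order.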
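#### 03-02-Configuring Hadoop Pseudo-Distributed Mode
Characteristics: simulates a distributed environment on a single machine and provides all of Hadoop's functionality
HDFS: NameNode + DataNode + SecondaryNameNode
Yarn: ResourceManager + NodeManager
Extract the installation packages
Same as above
Environment variables in /etc/profile
Same as above
Figure: configuration files (配置文件.png)
(1) Edit /root/training/hadoop-2.7.3/etc/hadoop/hadoop-env.sh and change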
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME=/root/training/jdk1.8.0_144
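(2) hdfs-site.xml
~~~
<!-- Block replication factor; the default is 3 -->
<!-- Rule of thumb: match the number of DataNodes, but do not exceed 3 -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Whether HDFS permission checking is enabled; the default is true -->
<!-- Keep the default for now; it will be changed to false later -->
<!--
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
-->
~~~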
(3) core-site.xml
~~~
<!-- Address of the HDFS master node, i.e. the NameNode -->
<!-- 9000 is the RPC port -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.16.143:9000</value>
</property>
<!-- Local OS directory where HDFS stores data blocks and metadata -->
<!-- Defaults to a directory under the Linux /tmp; be sure to change it -->
<property>
<name>hadoop.tmp.dir</name>
<value>/root/training/hadoop-2.7.3/tmp</value>
</property>
~~~
Create the /root/training/hadoop-2.7.3/tmp directory manually.
(4) mapred-site.xml (this file does not exist by default; copy it from mapred-site.xml.template)
~~~
<!-- Framework that MapReduce programs run on -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
~~~
(5) yarn-site.xml
~~~
<property>
<name>yarn.resourcemanager.hostname</name>
<value>192.168.16.143</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
~~~
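The fragments in steps (2) to (5) show only the `<property>` elements; in the actual files they sit inside the `<configuration>` root element. The sketch below writes a complete file from name/value pairs. `write_site_xml` is a hypothetical helper, not a Hadoop tool, and it assumes the values contain no characters that need XML escaping.

```shell
#!/usr/bin/env bash
# Write a complete Hadoop *-site.xml file from name/value pairs.
# write_site_xml is a hypothetical helper, not a Hadoop tool.
# Assumption: names and values need no XML escaping.
write_site_xml() {
  local out_file="$1"
  shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    # Consume the remaining arguments two at a time: property name, then value.
    while [ "$#" -ge 2 ]; do
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
      shift 2
    done
    echo '</configuration>'
  } > "$out_file"
}

# Demo: generate the yarn-site.xml shown above into a scratch directory.
conf_dir="$(mktemp -d)"
write_site_xml "$conf_dir/yarn-site.xml" \
  yarn.resourcemanager.hostname 192.168.16.143 \
  yarn.nodemanager.aux-services mapreduce_shuffle
cat "$conf_dir/yarn-site.xml"
```

Writing to a scratch directory first makes it easy to compare the result with the existing file before copying it into etc/hadoop.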
(6) Format the HDFS NameNode
~~~
Command: hdfs namenode -format
Log: Storage directory /root/training/hadoop-2.7.3/tmp/dfs/name has been successfully formatted.
~~~
(7) Start:
~~~
HDFS: start-dfs.sh
Yarn: start-yarn.sh
Both: start-all.sh (deprecated; its output recommends start-dfs.sh and start-yarn.sh instead)
~~~
~~~
root@bigdata00:~/training/hadoop-2.7.3# jps
2690 NameNode
3219 ResourceManager
3544 NodeManager
3582 Jps
2863 DataNode
3071 SecondaryNameNode
root@bigdata00:~/training/hadoop-2.7.3#
~~~
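A quick way to confirm that all five daemons are up is to scan the `jps` listing for their names. The sketch below reads a jps-style listing on standard input; `check_daemons` is a hypothetical helper, and the demo feeds it the listing shown above instead of calling `jps`.

```shell
#!/usr/bin/env bash
# Report which of the expected pseudo-distributed daemons are missing from a
# jps-style listing read on stdin. check_daemons is a hypothetical helper.
check_daemons() {
  local listing daemon missing=""
  listing="$(cat)"
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -w matches whole words, so "NameNode" does not match "SecondaryNameNode".
    if ! printf '%s\n' "$listing" | grep -qw "$daemon"; then
      missing="$missing $daemon"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all daemons running"
}

# Demo with the listing from above; on a live node use: jps | check_daemons
check_daemons <<'EOF'
2690 NameNode
3219 ResourceManager
3544 NodeManager
3582 Jps
2863 DataNode
3071 SecondaryNameNode
EOF
```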
(8) Web console access
* HDFS: http://192.168.16.143:50070
* Yarn: http://192.168.16.143:8088
##### HDFS web console on port 50070
![](0003.搭建Hadoop的环境.assets/50070.png)
##### Yarn web console on port 8088
![](0003.搭建Hadoop的环境.assets/8088.png)
-----------------------------------------------------------------
#### 03-03-How Passwordless SSH Login Works and How to Configure It
ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.143
~~~
root@bigdata00:~# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
9b:77:c7:5c:ef:b3:85:ac:61:24:4d:30:dc:1c:f3:ed root@bigdata00
The key's randomart image is:
+--[ RSA 2048]----+
| .ooo. |
| .ooo . |
| . . .|
| o . |
| S . o E|
| o o + o.|
| o . + * o|
| . o + o.|
| . .+|
+-----------------+
root@bigdata00:~# ls
tools training
root@bigdata00:~# ls -al
total 40
drwx------ 7 root root 4096 Oct 16 12:19 .
drwxr-xr-x 23 root root 4096 Oct 15 20:44 ..
-rw------- 1 root root 55 Oct 16 08:11 .Xauthority
-rw-r--r-- 1 root root 3106 Apr 19 2012 .bashrc
drwx------ 2 root root 4096 Oct 16 07:46 .cache
drwxr-xr-x 2 root root 4096 Oct 16 12:15 .oracle_jre_usage
-rw-r--r-- 1 root root 140 Apr 19 2012 .profile
drwx------ 2 root root 4096 Oct 16 12:42 .ssh
drwxr-xr-x 2 root root 4096 Oct 16 07:57 tools
drwxr-xr-x 6 root root 4096 Oct 16 11:39 training
root@bigdata00:~# cd .ssh
root@bigdata00:~/.ssh# ls -al
total 20
drwx------ 2 root root 4096 Oct 16 12:42 .
drwx------ 7 root root 4096 Oct 16 12:19 ..
-rw------- 1 root root 1675 Oct 16 12:42 id_rsa
-rw-r--r-- 1 root root 396 Oct 16 12:42 id_rsa.pub
-rw-r--r-- 1 root root 666 Oct 16 12:20 known_hosts
root@bigdata00:~/.ssh# cd ..
root@bigdata00:~# ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.143
root@192.168.16.143's password:
Now try logging into the machine, with "ssh 'root@192.168.16.143'", and check in:
~/.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
root@bigdata00:~# ls .ssh/
authorized_keys id_rsa id_rsa.pub known_hosts
root@bigdata00:~# more .ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/XCppmAEL6AnXoYXlmTr639AupthLny6JQ4zF9Jpg
S4mhycZCrHpVCxhERV9p+HzNFPRZBaWluseCOkzbAXbmMsXSucXcrbV+wyg0el+CHuDopJZ4JiAPjK8t
AnSPK1bdggCAVGaI138pU81YMgOntX3gV49CcIEGx9KFF4wLaPMq/PJrr9+omYhkTF50i+oHwl+bG2DL
GZFmJuk3nxF+rsGEHwdDCfBtcoa1f7Si4BA7gf0dEXBlydPMeYM48rgK0XAgNReBZJWBTooGkSXuxHy1
jccIiwH9G+mlZI38WI7YRIx6HZIwzfpG8yVTXahdPamC2MJ+w54dj0jKyVUL root@bigdata00
root@bigdata00:~# ssh 192.168.16.143
Welcome to Ubuntu 12.04.4 LTS (GNU/Linux 3.11.0-15-generic x86_64)
* Documentation: https://help.ubuntu.com/
System information as of Fri Oct 16 12:47:00 CST 2020
System load: 0.0 Processes: 395
Usage of /: 41.5% of 6.50GB Users logged in: 2
Memory usage: 74% IP address for eth0: 192.168.16.143
Swap usage: 0%
Graph this data and manage this system at:
https://landscape.canonical.com/
0 packages can be updated.
0 updates are security updates.
Last login: Fri Oct 16 08:11:46 2020 from 192.168.16.1
root@bigdata00:~# stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [192.168.16.143]
192.168.16.143: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop
root@bigdata00:~# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [192.168.16.143]
192.168.16.143: starting namenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-namenode-bigdata00.out
localhost: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata00.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-bigdata00.out
starting yarn daemons
starting resourcemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-resourcemanager-bigdata00.out
localhost: starting nodemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata00.out
~~~
##### How passwordless login works
![](0003.搭建Hadoop的环境.assets/免密码登录的原理.png)
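On the target machine, the visible effect of `ssh-copy-id` in the transcript above is that the public key ends up as a line in `~/.ssh/authorized_keys`. The sketch below imitates that step on local scratch directories only; `install_pubkey` is a hypothetical helper, the key string is a made-up placeholder, and the permission modes (700 for the directory, 600 for the file) are the conventional ones rather than something stated in these notes.

```shell
#!/usr/bin/env bash
# Append a public key to an authorized_keys file unless it is already present.
# install_pubkey is a hypothetical helper that imitates the end result of
# ssh-copy-id; it does not contact any remote machine.
install_pubkey() {
  local pub_file="$1" ssh_dir="$2"
  local auth_file="$ssh_dir/authorized_keys"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
  touch "$auth_file" && chmod 600 "$auth_file"
  # -x: whole-line match, -F: fixed strings, -f: read the pattern from the key file.
  if ! grep -qxFf "$pub_file" "$auth_file"; then
    cat "$pub_file" >> "$auth_file"
  fi
}

# Demo in a scratch directory with a placeholder key (not a real key).
demo_dir="$(mktemp -d)"
echo 'ssh-rsa AAAAexample root@bigdata00' > "$demo_dir/id_rsa.pub"
install_pubkey "$demo_dir/id_rsa.pub" "$demo_dir/remote_ssh"
install_pubkey "$demo_dir/id_rsa.pub" "$demo_dir/remote_ssh"   # second call adds nothing
cat "$demo_dir/remote_ssh/authorized_keys"
```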
##### WordCount in pseudo-distributed mode
Main commands:
~~~
hdfs dfs -mkdir /input
hdfs dfs -put data.txt /input
cd /root/training/hadoop-2.7.3/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /input/data.txt /output/wc2
# /input/data.txt and /output/wc2 are HDFS paths; /output/wc2 must not exist beforehand.
~~~
~~~
root@bigdata00:~# jps
1710 Jps
root@bigdata00:~# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [192.168.16.143]
192.168.16.143: starting namenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-namenode-bigdata00.out
localhost: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata00.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-bigdata00.out
starting yarn daemons
starting resourcemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-resourcemanager-bigdata00.out
localhost: starting nodemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata00.out
root@bigdata00:~# jps
2291 SecondaryNameNode
1894 NameNode
2887 Jps
2602 NodeManager
2447 ResourceManager
2047 DataNode
root@bigdata00:~# hdfs dfs -ls /
root@bigdata00:~# hdfs dfs -mkdir /input
root@bigdata00:~# hdfs dfs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2020-10-16 13:35 /input
root@bigdata00:~# cd /root/temp/input
root@bigdata00:~/temp/input# ls
data.txt
root@bigdata00:~/temp/input# hdfs dfs -put data.txt /input
root@bigdata00:~/temp/input# hdfs dfs -ls /input
Found 1 items
-rw-r--r-- 1 root supergroup 60 2020-10-16 13:36 /input/data.txt
root@bigdata00:~/temp/input# cd /root/training/hadoop-2.7.3/share/hadoop/mapreduce
root@bigdata00:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /input/data.txt /output/wc2
20/10/16 13:37:52 INFO client.RMProxy: Connecting to ResourceManager at /192.168.16.143:8032
20/10/16 13:37:54 INFO input.FileInputFormat: Total input paths to process : 1
20/10/16 13:37:54 INFO mapreduce.JobSubmitter: number of splits:1
20/10/16 13:37:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1602826392719_0001
20/10/16 13:37:55 INFO impl.YarnClientImpl: Submitted application application_1602826392719_0001
20/10/16 13:37:55 INFO mapreduce.Job: The url to track the job: http://192.168.16.143:8088/proxy/application_1602826392719_0001/
20/10/16 13:37:55 INFO mapreduce.Job: Running job: job_1602826392719_0001
20/10/16 13:38:13 INFO mapreduce.Job: Job job_1602826392719_0001 running in uber mode : false
20/10/16 13:38:13 INFO mapreduce.Job: map 0% reduce 0%
20/10/16 13:38:25 INFO mapreduce.Job: map 100% reduce 0%
20/10/16 13:38:34 INFO mapreduce.Job: map 100% reduce 100%
20/10/16 13:38:36 INFO mapreduce.Job: Job job_1602826392719_0001 completed successfully
20/10/16 13:38:36 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=93
FILE: Number of bytes written=237535
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=166
HDFS: Number of bytes written=55
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=9394
Total time spent by all reduces in occupied slots (ms)=6930
Total time spent by all map tasks (ms)=9394
Total time spent by all reduce tasks (ms)=6930
Total vcore-milliseconds taken by all map tasks=9394
Total vcore-milliseconds taken by all reduce tasks=6930
Total megabyte-milliseconds taken by all map tasks=9619456
Total megabyte-milliseconds taken by all reduce tasks=7096320
Map-Reduce Framework
Map input records=3
Map output records=12
Map output bytes=108
Map output materialized bytes=93
Input split bytes=106
Combine input records=12
Combine output records=8
Reduce input groups=8
Reduce shuffle bytes=93
Reduce input records=8
Reduce output records=8
Spilled Records=16
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=270
CPU time spent (ms)=3420
Physical memory (bytes) snapshot=286212096
Virtual memory (bytes) snapshot=4438401024
Total committed heap usage (bytes)=138043392
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=60
File Output Format Counters
Bytes Written=55
root@bigdata00:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hdfs dfs -ls /output
Found 1 items
drwxr-xr-x - root supergroup 0 2020-10-16 13:38 /output/wc2
root@bigdata00:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hdfs dfs -ls /output/wc2
Found 2 items
-rw-r--r-- 1 root supergroup 0 2020-10-16 13:38 /output/wc2/_SUCCESS
-rw-r--r-- 1 root supergroup 55 2020-10-16 13:38 /output/wc2/part-r-00000
root@bigdata00:~/training/hadoop-2.7.3/share/hadoop/mapreduce# hdfs dfs -cat /output/wc2/part-r-00000
Beijing 2
China 2
I 2
capital 1
is 1
love 2
of 1
the 1
~~~
-----------------------------------------------------------------
#### 03-04-Setting Up Hadoop Fully Distributed Mode
##### SecureCRT: sending commands to multiple sessions at once
![](0003.搭建Hadoop的环境.assets/SecureCRT同时给多个Session.png)
##### Set the hostnames and IPs: nano /etc/hosts
##### Cluster plan (at least 3 machines are required)
192.168.16.141 bigdata01 NameNode + SecondaryNameNode + ResourceManager
192.168.16.138 bigdata02 DataNode + NodeManager
192.168.16.139 bigdata03 DataNode + NodeManager
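The IP-to-hostname pairs in the plan above are what goes into /etc/hosts on every machine. A minimal sketch for adding them without creating duplicates follows; `add_host_entry` is a hypothetical helper, and the demo writes to a scratch file instead of /etc/hosts.

```shell
#!/usr/bin/env bash
# Append "IP hostname" to a hosts-style file unless the hostname already appears.
# add_host_entry is a hypothetical helper; the file must already exist.
add_host_entry() {
  local hosts_file="$1" ip="$2" name="$3"
  # -w: match the hostname as a whole word only.
  if ! grep -qw "$name" "$hosts_file"; then
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
  fi
}

# Demo on a scratch file; pass /etc/hosts instead to apply it for real.
demo_hosts="$(mktemp)"
add_host_entry "$demo_hosts" 192.168.16.141 bigdata01
add_host_entry "$demo_hosts" 192.168.16.138 bigdata02
add_host_entry "$demo_hosts" 192.168.16.139 bigdata03
cat "$demo_hosts"
```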
##### Configure passwordless login between every pair of machines
ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.141
ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.138
ssh-copy-id -i .ssh/id_rsa.pub root@192.168.16.139
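Because the login has to work between every pair of machines, the commands above are repeated on each of the three hosts. The dry-run sketch below only prints that full command matrix and does not open any SSH connection; `print_pairwise_cmds` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print (not run) the ssh-copy-id command for every source/target pair.
# print_pairwise_cmds is a hypothetical helper used for illustration.
print_pairwise_cmds() {
  local src dst
  for src in "$@"; do
    for dst in "$@"; do
      echo "on $src: ssh-copy-id -i ~/.ssh/id_rsa.pub root@$dst"
    done
  done
}

print_pairwise_cmds 192.168.16.141 192.168.16.138 192.168.16.139
```

Each machine also copies its key to itself, matching the three targets listed above.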
##### Configuration
###### Master node configuration in fully distributed mode
![](0003.搭建Hadoop的环境.assets/全分布模式的配置.png)
0. Extract Java and Hadoop
1. Set the Java and Hadoop environment variables on all 3 machines and apply them
2. hadoop-env.sh
3. hdfs-site.xml
4. core-site.xml
5. mapred-site.xml
6. yarn-site.xml
7. slaves: list the addresses of the worker nodes
    * 192.168.16.138
    * 192.168.16.139
8. Format the NameNode
    * hdfs namenode -format
9. Copy the installed directories from 192.168.16.141 to the worker nodes
    * scp -r /root/training/jdk1.8.0_144 root@192.168.16.138:/root/training
    * scp -r /root/training/jdk1.8.0_144 root@192.168.16.139:/root/training
    * scp -r /root/training/hadoop-2.7.3/ root@192.168.16.138:/root/training
    * scp -r /root/training/hadoop-2.7.3/ root@192.168.16.139:/root/training
10. Run start-all.sh on the master node
~~~
[root@bigdata112 training]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [bigdata112]
bigdata112: starting namenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-namenode-bigdata112.out
bigdata113: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata113.out
bigdata114: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata114.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-bigdata112.out
starting yarn daemons
starting resourcemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-resourcemanager-bigdata112.out
bigdata114: starting nodemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata114.out
bigdata113: starting nodemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata113.out
~~~
~~~
[root@bigdata112 training]# jps
13254 NameNode
13433 SecondaryNameNode
13578 ResourceManager
13835 Jps
~~~
~~~
[root@bigdata113 training]# jps
11847 DataNode
11943 NodeManager
12043 Jps
~~~
~~~
[root@bigdata114 training]# jps
11744 Jps
11548 DataNode
11644 NodeManager
~~~
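The copy in step 9 can be driven from the slaves file so that the worker list is kept in one place. The sketch below is a dry run that only prints the `scp` commands; `print_distribute_cmds` is a hypothetical helper, and the demo uses a scratch slaves file with the two worker addresses from the plan.

```shell
#!/usr/bin/env bash
# Print (not run) the scp commands that copy the JDK and Hadoop directories
# to every worker listed in a slaves-style file, one host per line.
# print_distribute_cmds is a hypothetical helper used for illustration.
print_distribute_cmds() {
  local slaves_file="$1" host dir
  while IFS= read -r host || [ -n "$host" ]; do
    # Skip blank lines and comment lines.
    case "$host" in
      ''|'#'*) continue ;;
    esac
    for dir in /root/training/jdk1.8.0_144 /root/training/hadoop-2.7.3; do
      echo "scp -r $dir root@$host:/root/training"
    done
  done < "$slaves_file"
}

# Demo with a scratch slaves file holding the two worker addresses.
slaves="$(mktemp)"
printf '%s\n' 192.168.16.138 192.168.16.139 > "$slaves"
print_distribute_cmds "$slaves"
```

Piping the output to `sh` would run the copies, provided passwordless login is already set up.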
-----------------------------------------------------------------
#### 03-05-Single Point of Failure in the Master-Slave Architecture