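Big Data: Chapters 4, 5, 6, 7, 8, 9, 10, and 13
Chapter 4 Hadoop Configuration File Parameters
Lab 1: Hadoop Fully Distributed Configuration
1.1 Objectives
After completing this lab, you should be able to:
- Configure Hadoop in fully distributed mode
- Install Hadoop in fully distributed mode
- Explain the meaning of the parameters in the Hadoop configuration files
1.2 Requirements
- Be familiar with installing Hadoop in fully distributed mode
- Understand the purpose of the Hadoop configuration files
1.3 Procedure
1.3.1 Task 1: Install Hadoop on the Master Node
1.3.1.1 Step 1: Extract the JDK and hadoop-2.7.1.tar.gz packages to /usr/local/src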
[root@master ~]# tar zvxf jdk-8u152-linux-x64.tar.gz -C /usr/local/src/
[root@master ~]# tar zvxf hadoop-2.7.1.tar.gz -C /usr/local/src/
1.3.1.2 Step 2: Rename the hadoop-2.7.1 folder to hadoop and the jdk1.8.0_152 folder to jdk
[root@master ~]# cd /usr/local/src/
[root@master src]# ls
hadoop-2.7.1 jdk1.8.0_152
[root@master src]# mv hadoop-2.7.1/ hadoop
[root@master src]# mv jdk1.8.0_152/ jdk
[root@master src]# ls
hadoop jdk
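1.3.1.3 Step 3: Configure the Hadoop environment variables
[root@master ~]# vi /etc/profile.d/hadoop.sh
Note: environment variables were already configured when the standalone Hadoop system was installed in Chapter 2. Delete that earlier configuration before adding the new one.
# Add the following lines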
export JAVA_HOME=/usr/local/src/jdk
export HADOOP_HOME=/usr/local/src/hadoop
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
1.3.1.4 Step 4: Apply the Hadoop environment variables
[root@master ~]# source /etc/profile.d/hadoop.sh
[root@master ~]# echo $PATH
/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
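Instead of reading the long PATH string by eye, each expected directory can be checked individually. The following is a minimal sketch; the function name path_contains is hypothetical and not part of Hadoop.

```shell
# path_contains DIR [PATH_STRING]
# Succeeds if DIR is one of the colon-separated entries of PATH_STRING
# (defaults to the current $PATH).
path_contains() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (after sourcing /etc/profile.d/hadoop.sh):
#   path_contains /usr/local/src/hadoop/bin  && echo "hadoop/bin is on PATH"
#   path_contains /usr/local/src/hadoop/sbin && echo "hadoop/sbin is on PATH"
```

Wrapping the value in colons lets the pattern match whole entries only, so /usr/local/src/hadoop/bin does not match a longer entry that merely starts with the same characters.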
1.3.1.5 Step 5: Edit the hadoop-env.sh configuration file
[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/hadoop-env.sh
# Add the following line
export JAVA_HOME=/usr/local/src/jdk
1.3.2 Task 2: Configure the hdfs-site.xml parameters
Run the following command to edit the hdfs-site.xml configuration file.
[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/hdfs-site.xml
# Add the following properties between the <configuration> and </configuration> tags
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/src/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/src/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
Create the directories:
[root@master ~]# mkdir -p /usr/local/src/hadoop/dfs/{name,data}
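HDFS, Hadoop's distributed file system, stores data redundantly. The default replication factor is 3, meaning each block is kept in three copies. This cluster has only two DataNodes (slave1 and slave2), so dfs.replication is set to 2 and HDFS keeps two copies of each block.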
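The effect of the replication factor on disk usage is simple multiplication: every block is stored once per replica. The sketch below illustrates the arithmetic only; the function name raw_usage_mb is hypothetical.

```shell
# raw_usage_mb FILE_SIZE_MB REPLICATION
# Prints the raw cluster storage (in MB) consumed by a file of the given
# logical size when every block is stored REPLICATION times.
raw_usage_mb() {
  echo $(( $1 * $2 ))
}

raw_usage_mb 128 2   # a 128 MB file with dfs.replication=2
raw_usage_mb 128 3   # the same file with the default factor of 3
```

On a running cluster, the value actually in effect can be read back with `hdfs getconf -confKey dfs.replication`.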
1.3.3 Task 3: Configure the core-site.xml parameters
[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/core-site.xml
# Add the following properties between the <configuration> and </configuration> tags
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/src/hadoop/tmp</value>
</property>
</configuration>
# After saving the configuration above, create the directory
[root@master ~]# mkdir -p /usr/local/src/hadoop/tmp
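If hadoop.tmp.dir is not set, Hadoop uses the default temporary directory /tmp/hadoop-${user.name}, which is /tmp/hadoop-hadoop when running as the hadoop user. The contents of /tmp may be removed when Linux restarts; if that happens, the HDFS format command must be run again, otherwise Hadoop fails at runtime.
1.3.4 Task 4: Configure mapred-site.xml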
[root@master ~]# cd /usr/local/src/hadoop/etc/hadoop/
[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
# Open mapred-site.xml in an editor and add the following properties between the <configuration> and </configuration> tags
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
1.3.5 Task 5: Configure yarn-site.xml
[root@master hadoop]# vi /usr/local/src/hadoop/etc/hadoop/yarn-site.xml
# Add the following properties between the <configuration> and </configuration> tags
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
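Hadoop does not reject unknown property names. A mistyped name is simply ignored and the default value stays in effect, so it is worth scanning the file for names that do not begin with yarn. before starting the cluster. The following is a minimal sketch that relies only on grep and sed; the function name suspicious_yarn_names is hypothetical, and the check assumes each <name> element sits on a single line, as in the listing above.

```shell
# suspicious_yarn_names FILE
# Prints every <name> value in FILE that does not start with "yarn.".
# Prints nothing (and returns non-zero) when all names look plausible.
suspicious_yarn_names() {
  grep -o '<name>[^<]*</name>' "$1" |
    sed -e 's|<name>||' -e 's|</name>||' |
    grep -v '^yarn\.'
}

# Usage:
#   suspicious_yarn_names /usr/local/src/hadoop/etc/hadoop/yarn-site.xml
```

This only catches a wrong prefix. A typo later in the name still has to be compared against the property list in yarn-default.xml.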
1.3.6 Task 6: Other Hadoop Configuration
1.3.6.1 Step 1: Configure the masters file
# Edit the masters configuration file
[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/masters
# Add the following entry
10.10.10.128
1.3.6.2 Step 2: Configure the slaves file
# Edit the slaves configuration file
[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/slaves
# Delete localhost and add the following entries
10.10.10.129
10.10.10.130
1.3.6.3 Step 3: Create a user and change directory ownership
# Create the user
[root@master ~]# useradd hadoop
[root@master ~]# echo 'hadoop' | passwd --stdin hadoop
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.
# Change directory ownership
[root@master ~]# chown -R hadoop.hadoop /usr/local/src/
[root@master ~]# cd /usr/local/src/
[root@master src]# ll
total 0
drwxr-xr-x 11 hadoop hadoop 171 Mar 27 01:51 hadoop
drwxr-xr-x 8 hadoop hadoop 255 Sep 14 2017 jdk
1.3.6.4 Step 4: Configure passwordless SSH login from master to all slave nodes
[root@master ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:Ibeslip4Bo9erREJP37u7qhlwaEeMOCg8DlJGSComhk root@master
The key's randomart image is:
+---[RSA 2048]----+
|B.oo |
|Oo.o |
|=o=. . o|
|E.=.o + o |
|.* BS|
|* o = o |
| * * o+ |
|o O *o |
|.=.+== |
+----[SHA256]-----+
[root@master ~]# ssh-copy-id root@slave1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'slave1 (10.10.10.129)' can't be established.
ECDSA key fingerprint is SHA256:Z643OMlGh0yMEc5i85oZ7c21NHdkzSZD9hY9K39xzP4.
ECDSA key fingerprint is MD5:e0:ef:47:5f:ad:75:9a:44:08:bc:f2:10:8e:d6:53:4a.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@slave1's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@slave1'"
and check to make sure that only the key(s) you wanted were added.
[root@master ~]# ssh-copy-id root@slave2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'slave2 (10.10.10.130)' can't be established.
ECDSA key fingerprint is SHA256:Z643OMlGh0yMEc5i85oZ7c21NHdkzSZD9hY9K39xzP4.
ECDSA key fingerprint is MD5:e0:ef:47:5f:ad:75:9a:44:08:bc:f2:10:8e:d6:53:4a.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@slave2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@slave2'"
and check to make sure that only the key(s) you wanted were added.
[root@master ~]# ssh slave1
Last login: Sun Mar 27 02:58:38 2022 from master
[root@slave1 ~]# exit
logout
Connection to slave1 closed.
[root@master ~]# ssh slave2
Last login: Sun Mar 27 00:26:12 2022 from 10.10.10.1
[root@slave2 ~]# exit
logout
Connection to slave2 closed.
1.3.6.5 Step 5: Copy everything under /usr/local/src/ to all slave nodes
[root@master ~]# scp -r /usr/local/src/* root@slave1:/usr/local/src/
[root@master ~]# scp -r /usr/local/src/* root@slave2:/usr/local/src/
[root@master ~]# scp /etc/profile.d/hadoop.sh root@slave1:/etc/profile.d/
hadoop.sh 100% 151 45.9KB/s 00:00
[root@master ~]# scp /etc/profile.d/hadoop.sh root@slave2:/etc/profile.d/
hadoop.sh 100% 151 93.9KB/s 00:00
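With more than a couple of slave nodes, typing the same two scp commands per host becomes error-prone. The sketch below is a dry run that only prints the commands for a list of hosts, so they can be reviewed before being executed. The function name print_sync_commands is hypothetical; the paths and host names are the ones used in this step.

```shell
# print_sync_commands HOST...
# Prints (does not run) the scp commands that copy the software directory
# and the environment script to each HOST.
print_sync_commands() {
  local host
  for host in "$@"; do
    printf 'scp -r /usr/local/src/* root@%s:/usr/local/src/\n' "$host"
    printf 'scp /etc/profile.d/hadoop.sh root@%s:/etc/profile.d/\n' "$host"
  done
}

print_sync_commands slave1 slave2
```

Once the printed commands look right, they can be pasted into the shell on master or piped to `sh`.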
1.3.6.6 Step 6: Run the following commands on all slave nodes
(1) On slave1
[root@slave1 ~]# useradd hadoop
[root@slave1 ~]# echo 'hadoop' | passwd --stdin hadoop
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.
[root@slave1 ~]# chown -R hadoop.hadoop /usr/local/src/
[root@slave1 ~]# ll /usr/local/src/
total 0
drwxr-xr-x 11 hadoop hadoop 171 Mar 27 03:07 hadoop
drwxr-xr-x 8 hadoop hadoop 255 Mar 27 03:07 jdk
[root@slave1 ~]# source /etc/profile.d/hadoop.sh
[root@slave1 ~]# echo $PATH
/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
(2) On slave2
[root@slave2 ~]# useradd hadoop
[root@slave2 ~]# echo 'hadoop' | passwd --stdin hadoop
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.
[root@slave2 ~]# chown -R hadoop.hadoop /usr/local/src/
[root@slave2 ~]# ll /usr/local/src/
total 0
drwxr-xr-x 11 hadoop hadoop 171 Mar 27 03:09 hadoop
drwxr-xr-x 8 hadoop hadoop 255 Mar 27 03:09 jdk
[root@slave2 ~]# source /etc/profile.d/hadoop.sh
[root@slave2 ~]# echo $PATH
/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
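Chapter 5 Running the Hadoop Cluster
Lab 1: Running the Hadoop Cluster
1.1 Objectives
After completing this lab, you should be able to:
- Check the running state of Hadoop
- Format the Hadoop file system
- View the running Hadoop Java processes
- View the HDFS report
- View the status of the Hadoop nodes
- Stop the Hadoop processes
1.2 Requirements
- Be familiar with how to check the running state of Hadoop
- Be familiar with how to stop the Hadoop processes
1.3 Procedure
1.3.1 Task 1: Format the Hadoop File System
1.3.1.1 Step 1: Format the NameNode
Formatting clears the data on the NameNode. HDFS must be formatted before it is started for the first time. Later starts do not need it, and formatting again can leave the DataNode processes unable to start. In addition, once HDFS has run, the Hadoop working directory (set to /usr/local/src/hadoop/tmp in this book) contains data. If you do need to reformat, delete the data in the working directory first; otherwise the format will run into problems.
Run the following commands to format the NameNode: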
[root@master ~]# su - hadoop
Last login: Fri Apr 1 23:34:46 CST 2022 on pts/1
[hadoop@master ~]$ cd /usr/local/src/hadoop/
[hadoop@master hadoop]$ ./bin/hdfs namenode -format
22/04/02 01:22:42 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
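Because an accidental second format causes trouble, it helps to check whether the NameNode directory has already been formatted before running the command. A successful format writes a VERSION file under current/ in the directory configured as dfs.namenode.name.dir. The following is a minimal sketch based on that fact; the function name namenode_is_formatted is hypothetical.

```shell
# namenode_is_formatted NAME_DIR
# Succeeds if NAME_DIR already contains the current/VERSION file that
# a NameNode format creates.
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Usage, with the name directory configured in hdfs-site.xml:
#   if namenode_is_formatted /usr/local/src/hadoop/dfs/name; then
#     echo "already formatted, skipping"
#   else
#     hdfs namenode -format
#   fi
```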
1.3.1.2 Step 2: Start the NameNode
[hadoop@master hadoop]$ hadoop-daemon.sh start namenode
namenode running as process 11868. Stop it first.
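1.3.2 Task 2: View the Java Processes
After startup completes, use the jps command to check whether it succeeded. jps is a tool shipped with the JDK that lists the process IDs (PIDs) of the running Java processes.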
[hadoop@master hadoop]$ jps
12122 Jps
11868 NameNode
1.3.2.1 Step 1: Switch to the hadoop user
[hadoop@master ~]$ su - hadoop
Password:
Last login: Sat Apr 2 01:22:13 CST 2022 on pts/1
Last failed login: Sat Apr 2 04:47:08 CST 2022 on pts/1
There was 1 failed login attempt since the last successful login.
1.3.3 Task 3: View the HDFS Report
[hadoop@master ~]$ hdfs dfsadmin -report
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
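Configured Capacity is the sum of the capacity reported by the registered DataNodes, so a value of 0 means that no DataNode has joined the cluster yet. The following sketch extracts that number from the report text so a script can test it; the function name configured_capacity_bytes is hypothetical.

```shell
# configured_capacity_bytes
# Reads the output of "hdfs dfsadmin -report" on standard input and prints
# the byte count from the first "Configured Capacity:" line.
configured_capacity_bytes() {
  awk '/^Configured Capacity:/ { print $3; exit }'
}

# Usage:
#   cap=$(hdfs dfsadmin -report | configured_capacity_bytes)
#   [ "$cap" -gt 0 ] || echo "no DataNode capacity registered yet"
```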
1.3.3.1 Step 1: Generate an SSH key pair
[hadoop@master ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:nW/cVxmRp5Ht9TKGT61OmGbhQtkBdpHyS5prGhx24pI hadoop@master.example.com
The key's randomart image is:
+---[RSA 2048]----+
| o.oo +.|
| ...o o.=|
| = o *+|
| .o.* * *|
|S.+= O =.|
| = ++oB.+ .|
| E + =+o. .|
| . .o. .. |
|.o |
+----[SHA256]-----+
[hadoop@master ~]$ ssh-copy-id slave1
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'slave1 (10.10.10.129)' can't be established.
ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave1's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'slave1'"
and check to make sure that only the key(s) you wanted were added.
[hadoop@master ~]$ ssh-copy-id slave2
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'slave2 (10.10.10.130)' can't be established.
ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'slave2'"
and check to make sure that only the key(s) you wanted were added.
[hadoop@master ~]$ ssh-copy-id master
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'master (10.10.10.128)' can't be established.
ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@master's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'master'"
and check to make sure that only the key(s) you wanted were added.
1.3.4 Task 4: Stop HDFS with stop-dfs.sh
[hadoop@master ~]$ stop-dfs.sh
Stopping namenodes on [master]
master: stopping namenode
10.10.10.129: no datanode to stop
10.10.10.130: no datanode to stop
Stopping secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: no secondarynamenode to stop
1.3.4.1 Restart and verify
[hadoop@master ~]$ start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.example.com.out
10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.example.com.out
[hadoop@master ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.example.com.out
10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
[hadoop@master ~]$ jps
12934 NameNode
13546 Jps
13131 SecondaryNameNode
13291 ResourceManager
If ResourceManager appears on master and NodeManager appears on the slaves, the startup succeeded.
[hadoop@master ~]$ jps
12934 NameNode
13546 Jps
13131 SecondaryNameNode
13291 ResourceManager
[root@slave1 ~]# jps
11906 NodeManager
11797 DataNode
12037 Jps
[root@slave2 ~]# jps
12758 NodeManager
12648 DataNode
12889 Jps
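The same check can be scripted by comparing the jps output of each node against the daemons expected there. The following is a minimal sketch; the function name missing_daemons is hypothetical, and the expected daemon lists are taken from the outputs above.

```shell
# missing_daemons JPS_OUTPUT DAEMON...
# Prints each DAEMON that does not appear as a process name (second column)
# in JPS_OUTPUT. Prints nothing when all of them are running.
missing_daemons() {
  local jps_output=$1
  shift
  local name
  for name in "$@"; do
    printf '%s\n' "$jps_output" |
      awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' ||
      printf '%s\n' "$name"
  done
}

# Usage on master:
#   missing_daemons "$(jps)" NameNode SecondaryNameNode ResourceManager
# Usage on a slave:
#   missing_daemons "$(jps)" DataNode NodeManager
```

The comparison uses exact equality on the second column, so NameNode is not mistaken for SecondaryNameNode.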
[hadoop@master ~]$ hdfs dfs -mkdir /input
[hadoop@master ~]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2022-04-02 05:18 /input
[hadoop@master ~]$ mkdir ~/input
[hadoop@master ~]$ vim ~/input/data.txt
Hello World
Hello Hadoop
Hello Huasan
~
[hadoop@master ~]$ hdfs dfs -put ~/input/data.txt /input
[hadoop@master ~]$ hdfs dfs -cat /input/data.txt
Hello World
Hello Hadoop
Hello Huasan
[hadoop@master ~]$ hadoop jar /usr/local/src/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input/data.txt /output
22/04/02 05:31:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/04/02 05:31:21 INFO input.FileInputFormat: Total input paths to process : 1
22/04/02 05:31:21 INFO mapreduce.JobSubmitter: number of splits:1
22/04/02 05:31:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1648846845675_0001
22/04/02 05:31:22 INFO impl.YarnClientImpl: Submitted application application_1648846845675_0001
22/04/02 05:31:22 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1648846845675_0001/
22/04/02 05:31:22 INFO mapreduce.Job: Running job: job_1648846845675_0001
22/04/02 05:31:30 INFO mapreduce.Job: Job job_1648846845675_0001 running in uber mode : false
22/04/02 05:31:30 INFO mapreduce.Job: map 0% reduce 0%
22/04/02 05:31:38 INFO mapreduce.Job: map 100% reduce 0%
22/04/02 05:31:42 INFO mapreduce.Job: map 100% reduce 100%
22/04/02 05:31:42 INFO mapreduce.Job: Job job_1648846845675_0001 completed successfully
22/04/02 05:31:42 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=56
FILE: Number of bytes written=230931
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=136
HDFS: Number of bytes written=34
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5501
Total time spent by all reduces in occupied slots (ms)=1621
Total time spent by all map tasks (ms)=5501
Total time spent by all reduce tasks (ms)=1621
Total vcore-seconds taken by all map tasks=5501
Total vcore-seconds taken by all reduce tasks=1621
Total megabyte-seconds taken by all map tasks=5633024
Total megabyte-seconds taken by all reduce tasks=1659904
Map-Reduce Framework
Map input records=3
Map output records=6
Map output bytes=62
Map output materialized bytes=56
Input split bytes=98
Combine input records=6
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=56
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=572
CPU time spent (ms)=1860
Physical memory (bytes) snapshot=428474368
Virtual memory (bytes) snapshot=4219695104
Total committed heap usage (bytes)=284164096
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=38
File Output Format Counters
Bytes Written=34
[hadoop@master ~]$ hdfs dfs -cat /output/part-r-00000
Hadoop 1
Hello 3
Huasan 1
World 1
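For a file this small, the result can be cross-checked locally without MapReduce. The sketch below is not how the Hadoop example is implemented. It is a plain pipeline that splits on whitespace, sorts, and counts, producing the same word-and-count pairs for the same input. The function name wordcount is hypothetical.

```shell
# wordcount FILE
# Prints "word<TAB>count" for every whitespace-separated word in FILE,
# sorted by word in byte order.
wordcount() {
  tr -s '[:space:]' '\n' < "$1" |
    sed '/^$/d' |
    LC_ALL=C sort |
    uniq -c |
    awk '{ print $2 "\t" $1 }'
}

# Usage:
#   wordcount ~/input/data.txt
```

Setting LC_ALL=C makes the sort order byte-based, which keeps capitalized words in a predictable order regardless of the system locale.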
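Chapter 6 Hive Component Installation and Configuration
Lab 1: Hive Component Installation and Configuration
1.1. Objectives
After completing this lab, you should be able to:
- Install and configure the Hive component
- Initialize and start the Hive component
1.2. Requirements
- Be familiar with installing and configuring the Hive component
- Understand how to initialize and start the Hive component
1.3. Procedure
1.3.1. Task 1: Download and Extract the Installation Files
1.3.1.1. Step 1: Base environment and installation preparation
Hive is installed on top of Hadoop, so make sure the Hadoop system runs normally before installing Hive. This chapter builds on the fully distributed Hadoop system deployed earlier and installs Hive on the master node.
The deployment plan and package paths for Hive are as follows:
(1) The fully distributed Hadoop system is already installed in the current environment.
(2) MySQL is installed locally (account root, password Password123$); the packages are under /opt/software/mysql-5.7.18.
(3) The MySQL port is 3306.
(4) The MySQL JDBC driver is /opt/software/mysql-connector-java-5.1.47.jar; the Hive metadata store is moved to MySQL using this driver.
(5) The Hive package is /opt/software/apache-hive-2.0.0-bin.tar.gz.
1.3.1.2. Step 2: Extract the installation files
(1) As the root user, extract the Hive package /opt/software/apache-hive-2.0.0-bin.tar.gz into /usr/local/src.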
[root@master ~]# tar -zxvf /opt/software/apache-hive-2.0.0-bin.tar.gz -C /usr/local/src/
(2) Rename the extracted apache-hive-2.0.0-bin folder to hive.
[root@master ~]# mv /usr/local/src/apache-hive-2.0.0-bin/ /usr/local/src/hive/
(3) Change the owner and group of the hive directory to hadoop.
[root@master ~]# chown -R hadoop:hadoop /usr/local/src/hive
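1.3.2. Task 2: Set Up the Hive Environment
1.3.2.1. Step 1: Remove the MariaDB database
Hive stores its metadata in MySQL, so before deploying Hive you must install MySQL on Linux and configure its character set, security initialization, and remote access privileges. Log in as root and perform the following steps:
(1) Stop the Linux firewall and disable it from starting at boot.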
[root@master ~]# systemctl stop firewalld
[root@master ~]# systemctl disable firewalld
(2) Remove the MariaDB packages that ship with Linux.
1) First check whether MariaDB is installed.
[root@master ~]# rpm -qa | grep mariadb
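2) Remove the MariaDB packages.
No MariaDB packages were found on this system, so nothing needs to be removed.
1.3.2.2. Step 2: Install the MySQL database
(1) Install the MySQL packages in this order: mysql common, mysql libs, mysql client.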
[root@master ~]# cd /opt/software/mysql-5.7.18/
[root@master mysql-5.7.18]# rpm -ivh mysql-community-common-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-common-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
package mysql-community-common-5.7.18-1.el7.x86_64 is already installed
[root@master mysql-5.7.18]# rpm -ivh mysql-community-libs-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-libs-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
package mysql-community-libs-5.7.18-1.el7.x86_64 is already installed
[root@master mysql-5.7.18]# rpm -ivh mysql-community-client-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-client-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
package mysql-community-client-5.7.18-1.el7.x86_64 is already installed
(2) Install the mysql server package.
[root@master mysql-5.7.18]# rpm -ivh mysql-community-server-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-server-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
package mysql-community-server-5.7.18-1.el7.x86_64 is already installed
(3) Edit the MySQL configuration: add the options shown in Table 6-1 to /etc/my.cnf.
Add the following options to /etc/my.cnf below the symbolic-links=0 line.
default-storage-engine=innodb
innodb_file_per_table
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'
character-set-server=utf8
(4) Start MySQL.
[root@master ~]# systemctl start mysqld
(5) Check the MySQL status. If mysqld shows active (running), MySQL is running normally.
If mysqld shows failed, MySQL did not start properly; check the /etc/my.cnf file.
[root@master ~]# systemctl status mysqld
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: active (running) since Sun 2022-04-10 22:54:39 CST; 1h 0min ago
Docs: man:mysqld(8)
http://dev.mysql.com/doc/refman/en/using-systemd.html
Main PID: 929 (mysqld)
CGroup: /system.slice/mysqld.service
└─929 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/my...
Apr 10 22:54:35 master systemd[1]: Starting MySQL Server...
Apr 10 22:54:39 master systemd[1]: Started MySQL Server.
(6) Look up the default MySQL password.
[root@master ~]# cat /var/log/mysqld.log | grep password
2022-04-08T16:20:04.456271Z 1 [Note] A temporary password is generated for root@localhost: 0yf>>yWdMd8_
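The default MySQL password is generated randomly during installation, so it differs for every installation.
(7) Initialize MySQL.
Run the mysql_secure_installation command to initialize MySQL. During initialization you set the login password for the database root user. The password must satisfy the security rules: it must contain uppercase and lowercase letters, digits, and special characters. Password123$ can be used.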
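A candidate password can be checked against these rules before it is typed into the prompt. The sketch below assumes the default settings of the MySQL 5.7 validate_password plugin (MEDIUM policy: at least 8 characters, with at least one uppercase letter, one lowercase letter, one digit, and one special character). The function name meets_medium_policy is hypothetical, and the server remains the final authority on what it accepts.

```shell
# meets_medium_policy PASSWORD
# Succeeds if PASSWORD has at least 8 characters and contains an uppercase
# letter, a lowercase letter, a digit, and a non-alphanumeric character.
meets_medium_policy() {
  local pw=$1
  [ "${#pw}" -ge 8 ] || return 1
  printf '%s' "$pw" | grep -q '[[:upper:]]' || return 1
  printf '%s' "$pw" | grep -q '[[:lower:]]' || return 1
  printf '%s' "$pw" | grep -q '[[:digit:]]' || return 1
  printf '%s' "$pw" | grep -q '[^[:alnum:]]' || return 1
}

# Usage (single quotes keep the shell from expanding "$"):
#   meets_medium_policy 'Password123$' && echo "looks acceptable"
```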
The following prompts appear during MySQL initialization:
1) Change the password for root ? ((Press y|Y for Yes, any other key for No) asks whether to change the root password; type y and press Enter.
2) Do you wish to continue with the password provided?(Press y|Y for Yes, any other key for No) asks whether to continue with the password you entered; type y and press Enter.
3) Remove anonymous users? (Press y|Y for Yes, any other key for No) asks whether to delete anonymous users; type y and press Enter.
4) Disallow root login remotely? (Press y|Y for Yes, any other key for No) asks whether to block remote login for root; type n and press Enter so that root can still log in remotely.
5) Remove test database and access to it? (Press y|Y for Yes, any other key for No) asks whether to delete the test database; type y and press Enter.
6) Reload privilege tables now? (Press y|Y for Yes, any other key for No) asks whether to reload the privilege tables; type y and press Enter.
The mysql_secure_installation session looks like this:
[root@master ~]# mysql_secure_installation
Securing the MySQL server deployment.
Enter password for user root:
The 'validate_password' plugin is installed on the server.
The subsequent steps will run with the existing configuration
of the plugin.
Using existing password for root.
Estimated strength of the password: 100
Change the password for root ? ((Press y|Y for Yes, any other key for No) : y
New password:
Re-enter new password:
Estimated strength of the password: 100
Do you wish to continue with the password provided?(Press y|Y for Yes, any other key for No) : y
By default, a MySQL installation has an anonymous user,
allowing anyone to log into MySQL without having to have
a user account created for them. This is intended only for
testing, and to make the installation go a bit smoother.
You should remove them before moving into a production
environment.
Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
Success.
Normally, root should only be allowed to connect from
'localhost'. This ensures that someone cannot guess at
the root password from the network.
Disallow root login remotely? (Press y|Y for Yes, any other key for No) : n
... skipping.
By default, MySQL comes with a database named 'test' that
anyone can access. This is also intended only for testing,
and should be removed before moving into a production
environment.
Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y
- Dropping test database...
Success.
- Removing privileges on test database...
Success.
Reloading the privilege tables will ensure that all changes
made so far will take effect immediately.
Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y
Success.
All done!
(8) Grant the root user access to the MySQL database tables from both the local host and remote hosts.
[root@master ~]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 5.7.18 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> grant all privileges on *.* to root@'localhost' identified by 'Password123$';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> grant all privileges on *.* to root@'%' identified by 'Password123$';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> select user,host from mysql.user where user='root';
+------+-----------+
| user | host |
+------+-----------+
| root | % |
| root | localhost |
+------+-----------+
2 rows in set (0.00 sec)
mysql> exit;
Bye
1.3.2.3. Step 3: Configure the Hive component
(1) Set the Hive environment variables and apply them.
[root@master ~]# vim /etc/profile
export HIVE_HOME=/usr/local/src/hive
export PATH=$PATH:$HIVE_HOME/bin
[root@master ~]# source /etc/profile
(2) Edit the Hive configuration file.
Switch to the hadoop user for the following Hive configuration steps.
Copy the hive-default.xml.template file in /usr/local/src/hive/conf to hive-site.xml.
[root@master ~]# su - hadoop
Last login: Sun Apr 10 23:27:25 CS
[hadoop@master ~]$ cp /usr/local/src/hive/conf/hive-default.xml.template /usr/local/src/hive/conf/hive-site.xml
(3) Edit hive-site.xml with vi so that Hive connects to MySQL, and set the path where Hive stores temporary files.
[hadoop@master ~]$ vi /usr/local/src/hive/conf/hive-site.xml
1) Set the MySQL connection.
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
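A JDBC URL with more than one parameter contains an ampersand, and inside an XML file a literal & has to be written as &amp;, otherwise the file is not well-formed XML. The following sketch escapes a value before it is pasted into hive-site.xml; the function name xml_escape is hypothetical.

```shell
# xml_escape STRING
# Prints STRING with the XML special characters &, < and > replaced by
# their entity references. The ampersand is handled first so that the
# entities produced by the later rules are not escaped again.
xml_escape() {
  printf '%s' "$1" |
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

xml_escape 'jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false'
```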
2) Set the password of the MySQL root user.
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Password123$</value>
<description>password to use against metastore database</description>
</property>
3) Metastore schema version verification. If the value is already false by default, leave it unchanged.
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
True: Verify that version information stored in is compatible with one from Hive jars. Also disable automatic
False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
</description>
</property>
4) Set the database driver.
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
5) Set the database user name javax.jdo.option.ConnectionUserName to root.
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
6) In the following properties, replace ${system:java.io.tmpdir}/${system:user.name} with the /usr/local/src/hive/tmp directory or one of its subdirectories.
Four properties need to be changed:
<name>hive.querylog.location</name>
<value>/usr/local/src/hive/tmp</value>
<description>Location of Hive run time structured log file</description>
<name>hive.exec.local.scratchdir</name>
<value>/usr/local/src/hive/tmp</value>
<name>hive.downloaded.resources.dir</name>
<value>/usr/local/src/hive/tmp/resources</value>
<name>hive.server2.logging.operation.log.location</name>
<value>/usr/local/src/hive/tmp/operation_logs</value>
7) Create the temporary folder tmp in the Hive installation directory.
[hadoop@master ~]$ mkdir /usr/local/src/hive/tmp
Hive installation and configuration are now complete.
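The placeholder replacement in step 6) can also be done with sed instead of editing each value by hand. The following is a minimal sketch that rewrites only the exact prefix ${system:java.io.tmpdir}/${system:user.name} and leaves anything after it (such as /operation_logs) in place. The function name use_hive_tmp is hypothetical, and values that use a different placeholder still need to be edited manually.

```shell
# use_hive_tmp
# Reads hive-site.xml text on standard input and replaces every
# "${system:java.io.tmpdir}/${system:user.name}" with /usr/local/src/hive/tmp.
use_hive_tmp() {
  sed 's|\${system:java\.io\.tmpdir}/\${system:user\.name}|/usr/local/src/hive/tmp|g'
}

# Usage (writes to a new file so the original can be compared first):
#   use_hive_tmp < /usr/local/src/hive/conf/hive-site.xml > /tmp/hive-site.xml.new
#   diff /usr/local/src/hive/conf/hive-site.xml /tmp/hive-site.xml.new
```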
1.3.2.4. Step 4: Initialize the Hive metadata
1) Copy the MySQL driver (/opt/software/mysql-connector-java-5.1.46.jar) into the lib directory of the Hive installation.
[hadoop@master ~]$ cp /opt/software/mysql-connector-java-5.1.46.jar /usr/local/src/hive/lib/
2) Restart Hadoop.
[hadoop@master ~]$ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
10.10.10.129: stopping datanode
10.10.10.130: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
10.10.10.129: stopping nodemanager
10.10.10.130: stopping nodemanager
no proxyserver to stop
[hadoop@master ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
3) Initialize the database.
[hadoop@master ~]$ schematool -initSchema -dbType mysql
which: no hbase in (/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/src/hive/bin:/home/hadoop/.local/bin:/home/hadoop/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false
Metastore Connection Driver :com.mysql.jdbc.Driver
Metastore connection User: root
Mon Apr 11 00:46:32 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Starting metastore schema initialization to 2.0.0
Initialization script hive-schema-2.0.0.mysql.sql
Password123$
Password123$
No current connection
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
4) Start Hive
[hadoop@master hive]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
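Chapter 7 ZooKeeper Component Installation and Configuration
Experiment 1: ZooKeeper Component Installation and Configuration
1.1. Objectives
After completing this experiment, you should be able to:
- Download and install ZooKeeper
- Configure the ZooKeeper options
- Start ZooKeeper
1.2. Requirements
- Understand the ZooKeeper configuration options
- Be familiar with starting ZooKeeper
1.3. Procedure
1.3.1 Task 1: Configure time synchronization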
[root@master ~]# yum -y install chrony
[root@master ~]# cat /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server time1.aliyun.com iburst
[root@master ~]# systemctl restart chronyd.service
[root@master ~]# systemctl enable chronyd.service
[root@master ~]# date
Fri Apr 15 15:40:14 CST 2022
[root@slave1 ~]# yum -y install chrony
[root@slave1 ~]# cat /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server time1.aliyun.com iburst
[root@slave1 ~]# systemctl restart chronyd.service
[root@slave1 ~]# systemctl enable chronyd.service
[root@slave1 ~]# date
Fri Apr 15 15:40:17 CST 2022
[root@slave2 ~]# yum -y install chrony
[root@slave2 ~]# cat /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server time1.aliyun.com iburst
[root@slave2 ~]# systemctl restart chronyd.service
[root@slave2 ~]# systemctl enable chronyd.service
[root@slave2 ~]# date
Fri Apr 15 15:40:20 CST 2022
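1.3.2 Task 2: Download and install ZooKeeper
The latest version of ZooKeeper can be obtained from the official site http://hadoop.apache.org/zookeeper/. The ZooKeeper version installed must be compatible with the Hadoop environment.
Note: the firewall on every node must be turned off; otherwise the nodes will have connection problems.
1. The ZooKeeper package zookeeper-3.4.8.tar.gz has already been placed in the /opt/software directory on the Linux system.
2. Extract the package to the target directory by running the following commands on the Master node.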
[root@master ~]# tar xf /opt/software/zookeeper-3.4.8.tar.gz -C /usr/local/src/
[root@master ~]# cd /usr/local/src/
[root@master src]# mv zookeeper-3.4.8/ zookeeper
1.3.3 Task 3: ZooKeeper configuration options
1.3.3.1 Step 1: Configure the Master node
(1) Create the data and logs folders under the ZooKeeper installation directory.
[root@master src]# cd /usr/local/src/zookeeper/
[root@master zookeeper]# mkdir data logs
(2) Write each node's identifier into its myid file. The identifier must be different on every node: 1 on master, 2 on slave1, and 3 on slave2.
[root@master zookeeper]# echo '1' > /usr/local/src/zookeeper/data/myid
(3) Edit the configuration file zoo.cfg.
[root@master zookeeper]# cd /usr/local/src/zookeeper/conf/
[root@master conf]# cp zoo_sample.cfg zoo.cfg
Set the dataDir parameter as follows:
[root@master conf]# vi zoo.cfg
dataDir=/usr/local/src/zookeeper/data
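(4) Append the following parameters to the end of zoo.cfg. They list the three ZooKeeper nodes; in each entry the first port (2888) is used by followers to connect to the leader, and the second port (3888) is used for leader election.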
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
(5) Change the owner of the ZooKeeper installation directory to the hadoop user.
[root@master conf]# chown -R hadoop:hadoop /usr/local/src/
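The server.N entries from step (4) follow a fixed pattern, so they can be generated from an ordered host list. The following is a minimal sketch rather than part of ZooKeeper: the function name gen_zk_servers is hypothetical, and it assumes the ports 2888 and 3888 used in this chapter.

```shell
#!/usr/bin/env bash
# Sketch: print one server.N line per host, numbering hosts from 1.
# gen_zk_servers is a hypothetical helper, not a ZooKeeper tool.
gen_zk_servers() {
  local id=1 host
  for host in "$@"; do
    # 2888: follower-to-leader port; 3888: leader-election port
    printf 'server.%d=%s:2888:3888\n' "$id" "$host"
    id=$((id + 1))
  done
}

# Prints the three lines that step (4) appends to zoo.cfg.
gen_zk_servers master slave1 slave2
```

The position of a host in the argument list becomes its N, which must match the number written to that host's myid file.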
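1.3.3.2 Step 2: Configure the Slave nodes
(1) Copy the ZooKeeper installation directory from the Master node to the two Slave nodes. In this environment the hostnames node1 and node2 refer to slave1 and slave2.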
[root@master ~]# scp -r /usr/local/src/zookeeper node1:/usr/local/src/
[root@master ~]# scp -r /usr/local/src/zookeeper node2:/usr/local/src/
(2) On slave1, change the owner of the zookeeper directory to the hadoop user.
[root@slave1 ~]# chown -R hadoop:hadoop /usr/local/src/
[root@slave1 ~]# ll /usr/local/src/
total 4
drwxr-xr-x. 12 hadoop hadoop 183 Apr 2 18:11 hadoop
drwxr-xr-x 9 hadoop hadoop 183 Apr 15 16:37 hbase
drwxr-xr-x. 8 hadoop hadoop 255 Apr 2 18:06 jdk
drwxr-xr-x 12 hadoop hadoop 4096 Apr 22 15:31 zookeeper
(3) On slave1, set this node's myid to 2.
[root@slave1 ~]# echo 2 > /usr/local/src/zookeeper/data/myid
(4) On slave2, change the owner of the zookeeper directory to the hadoop user.
[root@slave2 ~]# chown -R hadoop:hadoop /usr/local/src/
(5) On slave2, set this node's myid to 3.
[root@slave2 ~]# echo 3 > /usr/local/src/zookeeper/data/myid
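Each myid value has to agree with the server.N line for the same host in zoo.cfg. As a rough sketch of that relationship, the hypothetical helper myid_for_host below looks up the number for a given hostname; it assumes the server.N=host:port:port layout used above and plain hostnames without regex metacharacters.

```shell
#!/usr/bin/env bash
# Sketch: look up a node's id from the server.N lines in zoo.cfg.
# myid_for_host is a hypothetical helper; arguments: <zoo.cfg path> <hostname>.
myid_for_host() {
  local cfg=$1 host=$2
  # Lines look like server.2=slave1:2888:3888; print the number after "server."
  sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$cfg"
}

# Demo against a throwaway copy of the three entries used in this chapter.
cfg=$(mktemp)
printf 'server.1=master:2888:3888\nserver.2=slave1:2888:3888\nserver.3=slave2:2888:3888\n' > "$cfg"
myid_for_host "$cfg" slave1
rm -f "$cfg"
```

On a real node the result could be redirected into /usr/local/src/zookeeper/data/myid, provided the hostname passed in matches an entry in zoo.cfg.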
1.3.3.3 Step 3: Configure system environment variables
Add the environment variable configuration on all three nodes: master, slave1, and slave2.
[root@master conf]# vi /etc/profile.d/zookeeper.sh
export ZOOKEEPER_HOME=/usr/local/src/zookeeper
export PATH=${ZOOKEEPER_HOME}/bin:$PATH
[root@master ~]# scp /etc/profile.d/zookeeper.sh node1:/etc/profile.d/
zookeeper.sh 100% 87 42.3KB/s 00:00
[root@master ~]# scp /etc/profile.d/zookeeper.sh node2:/etc/profile.d/
zookeeper.sh 100% 87 50.8KB/s 00:00
1.3.4 Task 4: Start ZooKeeper
ZooKeeper must be started as the hadoop user.
(1) Run zkServer.sh start on each of the three nodes: master, slave1, and slave2.
[root@master ~]# su - hadoop
Last login: Fri Apr 15 21:54:17 CST 2022 on pts/0
[hadoop@master ~]$ jps
3922 Jps
[hadoop@master ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@master ~]$ jps
3969 Jps
3950 QuorumPeerMain
[root@slave1 ~]# su - hadoop
Last login: Fri Apr 15 22:06:47 CST 2022 on pts/0
[hadoop@slave1 ~]$ jps
1370 Jps
[hadoop@slave1 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@slave1 ~]$ jps
1395 QuorumPeerMain
1421 Jps
[root@slave2 ~]# su - hadoop
Last login: Fri Apr 15 16:25:52 CST 2022 on pts/1
[hadoop@slave2 ~]$ jps
1336 Jps
[hadoop@slave2 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@slave2 ~]$ jps
1361 QuorumPeerMain
1387 Jps
(2) After all three nodes have started, check the ZooKeeper running status on each node.
[hadoop@master conf]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[hadoop@slave1 ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[hadoop@slave2 conf]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: follower
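A healthy three-node ensemble reports exactly one leader and two followers; which node becomes the leader depends on the election, so slave1 being the leader is only the result of this particular run. The sketch below extracts the Mode value from status text so it can be compared across nodes. The helper name zk_mode is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pull the "Mode:" value out of zkServer.sh status output.
# zk_mode is a hypothetical helper; it reads the status text on stdin.
zk_mode() {
  sed -n 's/^Mode: //p'
}

# Demo with the status text shown above for slave1.
printf '%s\n' \
  'ZooKeeper JMX enabled by default' \
  'Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg' \
  'Mode: leader' | zk_mode

# On a live node (not run here): zkServer.sh status 2>&1 | zk_mode
```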
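Chapter 8 HBase Component Installation and Configuration
Experiment 1: HBase Component Installation and Configuration
1.1 Objectives
After completing this experiment, you should be able to:
- Install and configure HBase
- Use common HBase Shell commands
1.2 Requirements
- Understand how HBase works
- Be familiar with common HBase Shell commands
1.3 Procedure
1.3.1 Task 1: Configure time synchronization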
[root@master ~]# yum -y install chrony
[root@master ~]# cat /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server time1.aliyun.com iburst
[root@master ~]# systemctl restart chronyd.service
[root@master ~]# systemctl enable chronyd.service
[root@master ~]# date
Fri Apr 15 15:40:14 CST 2022
[root@slave1 ~]# yum -y install chrony
[root@slave1 ~]# cat /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server time1.aliyun.com iburst
[root@slave1 ~]# systemctl restart chronyd.service
[root@slave1 ~]# systemctl enable chronyd.service
[root@slave1 ~]# date
Fri Apr 15 15:40:17 CST 2022
[root@slave2 ~]# yum -y install chrony
[root@slave2 ~]# cat /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server time1.aliyun.com iburst
[root@slave2 ~]# systemctl restart chronyd.service
[root@slave2 ~]# systemctl enable chronyd.service
[root@slave2 ~]# date
Fri Apr 15 15:40:20 CST 2022
1.3.2 Task 2: Install and configure HBase
1.3.2.1 Step 1: Extract the HBase installation package
[root@master ~]# tar -zxvf hbase-1.2.1-bin.tar.gz -C /usr/local/src/
1.3.2.2 Step 2: Rename the HBase installation folder
[root@master ~]# cd /usr/local/src/
[root@master src]# mv hbase-1.2.1 hbase
1.3.2.3 Step 3: Add environment variables on all nodes
[root@master ~]# cat /etc/profile
# set hbase environment
export HBASE_HOME=/usr/local/src/hbase
export PATH=$HBASE_HOME/bin:$PATH
[root@slave1 ~]# cat /etc/profile
# set hbase environment
export HBASE_HOME=/usr/local/src/hbase
export PATH=$HBASE_HOME/bin:$PATH
[root@slave2 ~]# cat /etc/profile
# set hbase environment
export HBASE_HOME=/usr/local/src/hbase
export PATH=$HBASE_HOME/bin:$PATH
1.3.2.4 Step 4: Apply the environment variables on all nodes
[root@master ~]# source /etc/profile
[root@master ~]# echo $PATH
/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/src/hive/bin:/root/bin:/usr/local/src/hive/bin:/usr/local/src/hive/bin
[root@slave1 ~]# source /etc/profile
[root@slave1 ~]# echo $PATH
/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
[root@slave2 ~]# source /etc/profile
[root@slave2 ~]# echo $PATH
/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
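The PATH values above contain the same directories several times, because every time the profile is sourced each export prepends its directory again. This is harmless but hard to read. One possible guard, sketched here with a hypothetical helper named path_prepend, adds a directory only when it is not already present.

```shell
#!/usr/bin/env bash
# Sketch: prepend a directory to PATH only if it is not already there.
# path_prepend is a hypothetical helper, not something HBase provides.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;               # already present: leave PATH unchanged
    *) PATH="$1:$PATH" ;;
  esac
}

# Calling it twice adds the directory once.
path_prepend /usr/local/src/hbase/bin
path_prepend /usr/local/src/hbase/bin
echo "$PATH"
```

A profile script could call path_prepend "$HBASE_HOME/bin" in place of the plain export PATH line, so that sourcing it repeatedly leaves PATH unchanged.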
1.3.2.5 Step 5: On the master node, change to the configuration directory
[root@master ~]# cd /usr/local/src/hbase/conf/
1.3.2.6 Step 6: On the master node, configure hbase-env.sh
[root@master conf]# cat hbase-env.sh
export JAVA_HOME=/usr/local/src/jdk
export HBASE_MANAGES_ZK=true
export HBASE_CLASSPATH=/usr/local/src/hadoop/etc/hadoop/
1.3.2.7 Step 7: On the master node, configure hbase-site.xml
[root@master conf]# cat hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>120000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,node1,node2</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/usr/local/src/hbase/tmp</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
</configuration>
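After editing hbase-site.xml it can be useful to read a single value back, for example to confirm that hbase.zookeeper.quorum lists the intended hosts. The following is a minimal sketch under stated assumptions: hbase_prop is a hypothetical helper, and it relies on each name and value tag sitting on its own line as in the listing above. It is not a general XML parser.

```shell
#!/usr/bin/env bash
# Sketch: read one property value from a Hadoop-style *-site.xml file.
# hbase_prop is a hypothetical helper; arguments: <file> <property name>.
hbase_prop() {
  local file=$1 name=$2
  awk -v n="$name" '
    index($0, "<name>" n "</name>") { found = 1; next }
    found && /<value>/ {
      gsub(/.*<value>|<\/value>.*/, "")   # strip the surrounding tags
      print
      exit
    }' "$file"
}

# Demo on a throwaway file holding two of the properties from this step.
site=$(mktemp)
cat > "$site" <<'EOF'
<configuration>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,node1,node2</value>
</property>
</configuration>
EOF
hbase_prop "$site" hbase.zookeeper.quorum
rm -f "$site"
```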
1.3.2.8 Step 8: On the master node, edit the regionservers file
[root@master conf]# cat regionservers
node1
node2
1.3.2.9 Step 9: On the master node, create the hbase.tmp.dir directory
[root@master ~]# mkdir /usr/local/src/hbase/tmp
1.3.2.10 Step 10: Copy the HBase installation from master to node1 and node2
[root@master ~]# scp -r /usr/local/src/hbase/ root@node1:/usr/local/src/
[root@master ~]# scp -r /usr/local/src/hbase/ root@node2:/usr/local/src/
1.3.2.11 Step 11: Change the ownership of the hbase directory on all nodes
[root@master ~]# chown -R hadoop:hadoop /usr/local/src/hbase/
[root@slave1 ~]# chown -R hadoop:hadoop /usr/local/src/hbase/
[root@slave2 ~]# chown -R hadoop:hadoop /usr/local/src/hbase/
1.3.2.12 Step 12: Switch to the hadoop user on all nodes
[root@master ~]# su - hadoop
Last login: Mon Apr 11 00:42:46 CST 2022 on pts/0
[root@slave1 ~]# su - hadoop
Last login: Fri Apr 8 22:57:42 CST 2022 on pts/0
[root@slave2 ~]# su - hadoop
Last login: Fri Apr 8 22:57:54 CST 2022 on pts/0
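1.3.2.13 Step 13: Start Hadoop before HBase
Start Hadoop first, then ZooKeeper, and finally HBase. Because hbase-env.sh sets HBASE_MANAGES_ZK=true, start-hbase.sh launches ZooKeeper itself (the HQuorumPeer process), so no separate ZooKeeper start appears here.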
[hadoop@master ~]$ start-all.sh
[hadoop@master ~]$ jps
2130 SecondaryNameNode
1927 NameNode
2554 Jps
2301 ResourceManager
[hadoop@slave1 ~]$ jps
1845 NodeManager
1977 Jps
1725 DataNode
[hadoop@slave2 ~]$ jps
2080 Jps
1829 DataNode
1948 NodeManager
1.3.2.14 Step 14: Start HBase on the master node
[hadoop@master conf]$ start-hbase.sh
[hadoop@master conf]$ jps
2130 SecondaryNameNode
3572 HQuorumPeer
1927 NameNode
5932 HMaster
2301 ResourceManager
6157 Jps
[hadoop@slave1 ~]$ jps
2724 Jps
1845 NodeManager
1725 DataNode
2399 HQuorumPeer
2527 HRegionServer
[root@slave2 ~]# jps
3795 Jps
1829 DataNode
3529 HRegionServer
1948 NodeManager
3388 HQuorumPeer
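Reading jps output on three nodes by eye is error-prone. As a sketch, the hypothetical helper missing_daemons below takes jps output on stdin and prints every expected process name that is absent. The expected names are the ones shown in this step.

```shell
#!/usr/bin/env bash
# Sketch: report which expected daemons are absent from jps output.
# missing_daemons is a hypothetical helper; jps text on stdin, names as arguments.
missing_daemons() {
  local out name
  out=$(cat)
  for name in "$@"; do
    # jps prints "<pid> <name>"; compare the second column exactly
    printf '%s\n' "$out" | awk -v n="$name" '$2 == n { ok = 1 } END { exit !ok }' \
      || echo "$name"
  done
}

# Demo with the master output above: HRegionServer is reported,
# which is expected because region servers run only on node1 and node2.
printf '%s\n' '2130 SecondaryNameNode' '3572 HQuorumPeer' '1927 NameNode' \
  '5932 HMaster' '2301 ResourceManager' '6157 Jps' \
  | missing_daemons NameNode HMaster HQuorumPeer HRegionServer

# On a live node (not run here): jps | missing_daemons DataNode HRegionServer HQuorumPeer
```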
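1.3.2.15 Step 15: Edit the hosts file on Windows
(C:\Windows\System32\drivers\etc\hosts)
Copy the hosts file to the desktop, edit it to add the mapping between the master hostname and its IP address, and copy it back to its original location. Then enter http://master:60010 in a browser to open the HBase web UI.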
1.3.3 Task 3: Common HBase Shell commands
1.3.3.1 Step 1: Enter the HBase shell
[hadoop@master ~]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
hbase(main):001:0>
1.3.3.2 Step 2: Create the table scores with two column families: grade and course
hbase(main):001:0> create 'scores','grade','course'
0 row(s) in 1.4400 seconds
=> Hbase::Table - scores
1.3.3.3 Step 3: Check the database status
hbase(main):002:0> status
1 active master, 0 backup masters, 2 servers, 0 dead, 1.5000 average load
1.3.3.4 Step 4: Check the database version
hbase(main):003:0> version
1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
1.3.3.5 Step 5: List the tables
hbase(main):004:0> list
TABLE
scores
1 row(s) in 0.0150 seconds
=> ["scores"]
1.3.3.6 Step 6: Insert record 1: jie, grade:, 146cloud
hbase(main):005:0> put 'scores','jie','grade:','146cloud'
0 row(s) in 0.1060 seconds
1.3.3.7 Step 7: Insert record 2: jie, course:math, 86
hbase(main):006:0> put 'scores','jie','course:math','86'
0 row(s) in 0.0120 seconds
1.3.3.8 Step 8: Insert record 3: jie, course:cloud, 92
hbase(main):009:0> put 'scores','jie','course:cloud','92'
0 row(s) in 0.0070 seconds
1.3.3.9 Step 9: Insert record 4: shi, grade:, 133soft
hbase(main):010:0> put 'scores','shi','grade:','133soft'
0 row(s) in 0.0120 seconds
1.3.3.10 Step 10: Insert record 5: shi, course:math, 87
hbase(main):011:0> put 'scores','shi','course:math','87'
0 row(s) in 0.0090 seconds
1.3.3.11 Step 11: Insert record 6: shi, course:cloud, 96
hbase(main):012:0> put 'scores','shi','course:cloud','96'
0 row(s) in 0.0100 seconds
1.3.3.12 Step 12: Read the records for jie
hbase(main):013:0> get 'scores','jie'
COLUMN CELL
course:cloud timestamp=1650015032132, value=92
course:math timestamp=1650014925177, value=86
grade: timestamp=1650014896056, value=146cloud
3 row(s) in 0.0250 seconds
1.3.3.13 Step 13: Read the class (grade) of jie
hbase(main):014:0> get 'scores','jie','grade'
COLUMN CELL
grade: timestamp=1650014896056, value=146cloud
1 row(s) in 0.0110 seconds
1.3.3.14 Step 14: View all records in the table
hbase(main):001:0> scan 'scores'
ROW COLUMN+CELL
jie column=course:cloud, timestamp=1650015032132, value=92
jie column=course:math, timestamp=1650014925177, value=86
jie column=grade:, timestamp=1650014896056, value=146cloud
shi column=course:cloud, timestamp=1650015240873, value=96
shi column=course:math, timestamp=1650015183521, value=87
2 row(s) in 0.1490 seconds
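The scan summary says "2 row(s)" even though five lines are listed, because scan counts row keys (jie and shi) while each output line is one cell. The sketch below makes that distinction explicit by counting both from scan text; scan_summary is a hypothetical helper that assumes the "rowkey column=..." layout shown above.

```shell
#!/usr/bin/env bash
# Sketch: count cells and distinct row keys in HBase shell scan output.
# scan_summary is a hypothetical helper; it reads scan text on stdin.
scan_summary() {
  awk '$2 ~ /^column=/ {
         cells++
         if (!($1 in seen)) { seen[$1] = 1; rows++ }
       }
       END { printf "%d cells in %d rows\n", cells, rows }'
}

# Demo with the scan output from this step; prints "5 cells in 2 rows".
scan_summary <<'EOF'
ROW COLUMN+CELL
 jie column=course:cloud, timestamp=1650015032132, value=92
 jie column=course:math, timestamp=1650014925177, value=86
 jie column=grade:, timestamp=1650014896056, value=146cloud
 shi column=course:cloud, timestamp=1650015240873, value=96
 shi column=course:math, timestamp=1650015183521, value=87
EOF
```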
1.3.3.15 Step 15: View table records by column
hbase(main):002:0> scan 'scores',{COLUMNS=>'course'}
ROW COLUMN+CELL
jie column=course:cloud, timestamp=1650015032132, value=92
jie column=course:math, timestamp=1650014925177, value=86
shi column=course:cloud, timestamp=1650015240873, value=96
shi column=course:math, timestamp=1650015183521, value=87
2 row(s) in 0.0160 seconds
1.3.3.16 Step 16: Delete a specified record
hbase(main):003:0> delete 'scores','shi','grade'
0 row(s) in 0.0560 seconds
1.3.3.17 Step 17: Run scan again after the delete
hbase(main):004:0> scan 'scores'
ROW COLUMN+CELL
jie column=course:cloud, timestamp=1650015032132, value=92
jie column=course:math, timestamp=1650014925177, value=86
jie column=grade:, timestamp=1650014896056, value=146cloud
shi column=course:cloud, timestamp=1650015240873, value=96
shi column=course:math, timestamp=1650015183521, value=87
2 row(s) in 0.0130 seconds
1.3.3.18 Step 18: Add a new column family
hbase(main):005:0> alter 'scores',NAME=>'age'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.0110 seconds
1.3.3.19 Step 19: View the table structure
hbase(main):006:0> describe 'scores'
Table scores is ENABLED
scores
COLUMN FAMILIES DESCRIPTION
{NAME => 'age', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'course', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'grade', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
3 row(s) in 0.0230 seconds
1.3.3.20 Step 20: Delete a column family
hbase(main):007:0> alter 'scores',NAME=>'age',METHOD=>'delete'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.1990 seconds
1.3.3.21 Step 21: Disable the table (a table must be disabled before it can be deleted with drop 'scores')
hbase(main):008:0> disable 'scores'
0 row(s) in 2.3190 seconds
1.3.3.22 Step 22: Exit
hbase(main):009:0> quit
1.3.3.23 Step 23: Stop HBase
[hadoop@master ~]$ stop-hbase.sh
stopping hbase.................
master: stopping zookeeper.
node2: stopping zookeeper.
node1: stopping zookeeper.
Stop Hadoop on the master node.
[hadoop@master ~]$ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
10.10.10.130: stopping datanode
10.10.10.129: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
10.10.10.129: stopping nodemanager
10.10.10.130: stopping nodemanager
no proxyserver to stop
[hadoop@master ~]$ jps
3820 Jps
[hadoop@slave1 ~]$ jps
2220 Jps
[root@slave2 ~]# jps
2082 Jps
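Chapter 9 Sqoop Component Installation and Configuration
Experiment 1: Sqoop Component Installation and Configuration
1.1. Objectives
After completing this experiment, you should be able to:
- Download and extract Sqoop
- Configure the Sqoop environment
- Install Sqoop
- Use the Sqoop template commands
1.2. Requirements
- Be familiar with the Sqoop environment
- Be familiar with the Sqoop template commands
1.3. Procedure
1.3.1. Task 1: Download and extract Sqoop
The Sqoop version installed must be compatible with the Hadoop environment. As the root user on the Master node, extract the archive /opt/software/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz into the /usr/local/src directory.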
[root@master ~]# tar xf /opt/software/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/src/
Rename the extracted folder sqoop-1.4.7.bin__hadoop-2.6.0 to sqoop.
[root@master ~]# cd /usr/local/src/
[root@master src]# mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop
1.3.2. Task 2: Configure the Sqoop environment
1.3.2.1. Step 1: Create the Sqoop configuration file sqoop-env.sh.
Copy the template sqoop-env-template.sh and name the copy sqoop-env.sh.
[root@master src]# cd /usr/local/src/sqoop/conf/
[root@master conf]# cp sqoop-env-template.sh sqoop-env.sh
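1.3.2.2. Step 2: Edit sqoop-env.sh and add the installation paths of Hadoop, HBase, Hive, and the other components.
Note: the installation paths below must match the actual installation paths in your environment.
[root@master conf]# vim sqoop-env.sh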
export HADOOP_COMMON_HOME=/usr/local/src/hadoop
export HADOOP_MAPRED_HOME=/usr/local/src/hadoop
export HBASE_HOME=/usr/local/src/hbase
export HIVE_HOME=/usr/local/src/hive
1.3.2.3. Step 3: Configure the Linux system environment variables and add the Sqoop path.
[root@master conf]# vim /etc/profile.d/sqoop.sh
export SQOOP_HOME=/usr/local/src/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib
[root@master conf]# source /etc/profile.d/sqoop.sh
[root@master conf]# echo $PATH
/usr/local/src/sqoop/bin:/usr/local/src/hbase/bin:/usr/local/src/zookeeper/bin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/src/hive/bin:/root/bin
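1.3.2.4. Step 4: Connect to the database
To let Sqoop connect to MySQL, copy /opt/software/mysql-connector-java-5.1.46.jar into Sqoop's lib directory. The version of this jar must match the MySQL version; otherwise Sqoop reports errors when importing data (mysql-connector-java-5.1.46.jar corresponds to MySQL 5.7). If the jar is not in that directory, use the jar that was copied to the home directory in Chapter 6.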
[root@master conf]# cp /opt/software/mysql-connector-java-5.1.46.jar /usr/local/src/sqoop/lib/
1.3.3. Task 3: Start Sqoop
1.3.3.1. Step 1: The Hadoop cluster must be running before Sqoop is used.
On the master node, switch to the hadoop user and run start-all.sh to start the Hadoop cluster.
[root@master conf]# su - hadoop
Last login: Fri Apr 22 16:21:25 CST 2022 on pts/0
[hadoop@master ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
1.3.3.2. Step 2: Check the running status of the Hadoop cluster.
[hadoop@master ~]$ jps
1653 SecondaryNameNode
2086 Jps
1450 NameNode
1822 ResourceManager
[root@slave1 ~]# jps
1378 NodeManager
1268 DataNode
1519 Jps
[root@slave2 ~]# jps
1541 Jps
1290 DataNode
1405 NodeManager
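1.3.3.3. Step 3: Test whether Sqoop can connect to the MySQL database.
Connect Sqoop to MySQL. Note that the -P option is an uppercase P; when prompted, enter the password Password123$.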
[hadoop@master ~]$ sqoop list-databases --connect jdbc:mysql://master:3306 --username root -P
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/29 15:25:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
22/04/29 15:25:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Fri Apr 29 15:25:58 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
hive
mysql
performance_schema
sys
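Besides the interactive -P prompt, Sqoop 1.4.x also accepts a --password-file option, which avoids typing the password or placing it on the command line. The following is a rough sketch, assuming a local file referenced with a file:// prefix; the temporary file location is only illustrative, and a real setup would more likely keep the file in the hadoop user's home directory.

```shell
#!/usr/bin/env bash
# Sketch: prepare a password file for sqoop --password-file.
# The mktemp location is illustrative only; pick a permanent path in practice.
pwfile=$(mktemp)
# printf writes the password without the trailing newline that echo would add;
# a trailing newline would otherwise become part of the password.
printf '%s' 'Password123$' > "$pwfile"
chmod 400 "$pwfile"          # readable by the owner only
echo "wrote $(wc -c < "$pwfile" | tr -d ' ') bytes to $pwfile"

# On the cluster (not run here):
# sqoop list-databases --connect jdbc:mysql://master:3306 \
#   --username root --password-file "file://$pwfile"
```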
1.3.3.4. Step 4: Connect to Hive
To let Sqoop connect to Hive, also copy hive-common-2.0.0.jar from the Hive directory /usr/local/src/hive/lib into the lib directory of the Sqoop installation.
[hadoop@master ~]$ cp /usr/local/src/hive/lib/hive-common-2.0.0.jar /usr/local/src/sqoop/lib/
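1.3.4. Task 4: Sqoop template commands
1.3.4.1. Step 1: Create the MySQL database and table.
Create the sample database, create the student table in it, and insert 3 rows into the student table.
# Log in to the MySQL database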
[hadoop@master ~]$ mysql -uroot -pPassword123$
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.7.18 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
# Create the sample database
mysql> create database sample;
Query OK, 1 row affected (0.00 sec)
# Switch to the sample database
mysql> use sample;
Database changed
# Create the student table with two columns: number (student ID) and name
mysql> create table student(number char(9) primary key, name varchar(10));
Query OK, 0 rows affected (0.01 sec)
# Insert a few rows into the student table
mysql> insert into student values('01','zhangsan'),('02','lisi'),('03','wangwu');
Query OK, 3 rows affected (0.01 sec)
Records: 3 Duplicates: 0 Warnings: 0
# Query the student table
mysql> select * from student;
+--------+----------+
| number | name     |
+--------+----------+
| 01     | zhangsan |
| 02     | lisi     |
| 03     | wangwu   |
+--------+----------+
3 rows in set (0.00 sec)
mysql> quit
Bye
1.3.4.2. Step 2: Create the sample database and the student table in Hive.
hive>
> create database sample;
OK
Time taken: 0.528 seconds
hive> use sample;
OK
Time taken: 0.019 seconds
hive> create table student(number STRING,name STRING);
OK
Time taken: 0.2 seconds
hive> exit;
[hadoop@master conf]$
1.3.4.3. Step 3: Export the data from MySQL and import it into Hive.
[hadoop@master ~]$ sqoop import --connect jdbc:mysql://master:3306/sample --username root --password Password123$ --table student --fields-terminated-by '|' --delete-target-dir --num-mappers 1 --hive-import --hive-database sample --hive-table student
hive>
> select * from sample.student;
OK
01|zhangsan NULL
02|lisi NULL
03|wangwu NULL
Time taken: 1.238 seconds, Fetched: 3 row(s)
hive>
> exit;
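The query result above shows the whole line in the first column and NULL in the second, which is consistent with a delimiter mismatch: the import wrote fields separated by '|', while the Hive table from Step 2 was created with Hive's default field delimiter (Ctrl-A, \001). The sketch below only illustrates the effect on one line of text; count_fields is a hypothetical helper and does not call Hive or Sqoop.

```shell
#!/usr/bin/env bash
# Sketch: show how many fields one imported line yields under each delimiter.
# count_fields is a hypothetical helper; $1 is "ctrl-a" or "pipe", $2 is the line.
count_fields() {
  printf '%s\n' "$2" | awk -v mode="$1" '
    BEGIN { FS = (mode == "ctrl-a") ? "\001" : "|" }
    { print NF }'
}

count_fields ctrl-a '01|zhangsan'   # Hive default delimiter: one field, so name is NULL
count_fields pipe   '01|zhangsan'   # matching delimiter: two fields
```

Two possible ways to make the delimiters agree are to omit --fields-terminated-by so that the Hive import uses Hive's default delimiters, or to create the Hive table with ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'.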
1.3.4.4. Step 4: Common Sqoop commands
# List all databases
[hadoop@master ~]$ sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password Password123$
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/29 16:55:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/04/29 16:55:40 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/29 16:55:40 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Fri Apr 29 16:55:40 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
hive
mysql
performance_schema
sample
sys
# Connect to MySQL and list the tables in the sample database
[hadoop@master ~]$ sqoop list-tables --connect "jdbc:mysql://master:3306/sample?useSSL=false" --username root --password Password123$
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/29 16:56:45 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/04/29 16:56:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/29 16:56:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
student
# Copy a relational table's structure into Hive; only the table structure is copied, not its contents
[hadoop@master ~]$ sqoop create-hive-table --connect jdbc:mysql://master:3306/sample --table student --username root --password Password123$ --hive-table test
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/29 16:57:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/04/29 16:57:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/29 16:57:42 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
22/04/29 16:57:42 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
22/04/29 16:57:42 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Fri Apr 29 16:57:42 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 16:57:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/04/29 16:57:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/04/29 16:57:43 INFO hive.HiveImport: Loading uploaded data into Hive
22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
22/04/29 16:57:46 INFO hive.HiveImport:
22/04/29 16:57:46 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
22/04/29 16:57:47 INFO hive.HiveImport: Fri Apr 29 16:57:47 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 16:57:47 INFO hive.HiveImport: Fri Apr 29 16:57:47 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 16:57:47 INFO hive.HiveImport: Fri Apr 29 16:57:47 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 16:57:47 INFO hive.HiveImport: Fri Apr 29 16:57:47 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 16:57:48 INFO hive.HiveImport: Fri Apr 29 16:57:48 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 16:57:48 INFO hive.HiveImport: Fri Apr 29 16:57:48 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 16:57:48 INFO hive.HiveImport: Fri Apr 29 16:57:48 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 16:57:48 INFO hive.HiveImport: Fri Apr 29 16:57:48 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 16:57:50 INFO hive.HiveImport: OK
22/04/29 16:57:50 INFO hive.HiveImport: Time taken: 0.853 seconds
22/04/29 16:57:51 INFO hive.HiveImport: Hive import complete.
# If the output of the command above ends with "hive.HiveImport: Hive import complete.", the table was created successfully
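The list-tables command appends ?useSSL=false to the JDBC URL and prints no SSL warning from Sqoop's own connection, while the create-hive-table command omits it and does. A small helper can keep the URL consistent across commands. This is only a sketch; mysql_jdbc_url is a hypothetical name.

```shell
#!/usr/bin/env bash
# mysql_jdbc_url HOST PORT DATABASE
# Prints a MySQL JDBC URL with SSL explicitly disabled, matching the form
# used by the list-tables command. The helper name is hypothetical.
mysql_jdbc_url() {
  local host="$1" port="$2" database="$3"
  printf 'jdbc:mysql://%s:%s/%s?useSSL=false\n' "$host" "$port" "$database"
}

# Example use (not executed here); the double quotes keep the shell from
# treating the "?" in the URL as a glob character:
#   sqoop create-hive-table --connect "$(mysql_jdbc_url master 3306 sample)" \
#     --table student --username root -P --hive-table test
```

The warnings prefixed with hive.HiveImport come from Hive's own database connection and would presumably need the same parameter in Hive's connection URL instead.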
[hadoop@master ~]$ sqoop import --connect jdbc:mysql://master:3306/sample --username root --password Password123$ --table student --fields-terminated-by '|' --delete-target-dir --num-mappers 1 --hive-import --hive-database default --hive-table test
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/29 17:00:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/04/29 17:00:06 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/29 17:00:06 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/04/29 17:00:06 INFO tool.CodeGenTool: Beginning code generation
Fri Apr 29 17:00:06 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 17:00:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/04/29 17:00:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/04/29 17:00:06 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/src/hadoop
Note: /tmp/sqoop-hadoop/compile/556af862aa5bc04a542c14f0741f7dc6/student.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/04/29 17:00:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/556af862aa5bc04a542c14f0741f7dc6/student.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/04/29 17:00:07 INFO tool.ImportTool: Destination directory student is not present, hence not deleting.
22/04/29 17:00:07 WARN manager.MySQLManager: It looks like you are importing from mysql.
22/04/29 17:00:07 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
22/04/29 17:00:07 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
22/04/29 17:00:07 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
22/04/29 17:00:07 INFO mapreduce.ImportJobBase: Beginning import of student
22/04/29 17:00:07 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/04/29 17:00:07 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/04/29 17:00:07 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Fri Apr 29 17:00:09 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 17:00:09 INFO db.DBInputFormat: Using read commited transaction isolation
22/04/29 17:00:09 INFO mapreduce.JobSubmitter: number of splits:1
22/04/29 17:00:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1651221174197_0003
22/04/29 17:00:09 INFO impl.YarnClientImpl: Submitted application application_1651221174197_0003
22/04/29 17:00:09 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1651221174197_0003/
22/04/29 17:00:09 INFO mapreduce.Job: Running job: job_1651221174197_0003
22/04/29 17:00:13 INFO mapreduce.Job: Job job_1651221174197_0003 running in uber mode : false
22/04/29 17:00:13 INFO mapreduce.Job: map 0% reduce 0%
22/04/29 17:00:17 INFO mapreduce.Job: map 100% reduce 0%
22/04/29 17:00:17 INFO mapreduce.Job: Job job_1651221174197_0003 completed successfully
22/04/29 17:00:17 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=134261
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=30
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=1731
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=1731
Total vcore-seconds taken by all map tasks=1731
Total megabyte-seconds taken by all map tasks=1772544
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=35
CPU time spent (ms)=1010
Physical memory (bytes) snapshot=179433472
Virtual memory (bytes) snapshot=2137202688
Total committed heap usage (bytes)=88604672
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=30
22/04/29 17:00:17 INFO mapreduce.ImportJobBase: Transferred 30 bytes in 9.8777 seconds (3.0371 bytes/sec)
22/04/29 17:00:17 INFO mapreduce.ImportJobBase: Retrieved 3 records.
22/04/29 17:00:17 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table student
Fri Apr 29 17:00:17 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 17:00:17 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/04/29 17:00:17 INFO hive.HiveImport: Loading uploaded data into Hive
22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
22/04/29 17:00:20 INFO hive.HiveImport:
22/04/29 17:00:20 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
22/04/29 17:00:21 INFO hive.HiveImport: Fri Apr 29 17:00:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 17:00:21 INFO hive.HiveImport: Fri Apr 29 17:00:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 17:00:21 INFO hive.HiveImport: Fri Apr 29 17:00:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 17:00:21 INFO hive.HiveImport: Fri Apr 29 17:00:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 17:00:23 INFO hive.HiveImport: Fri Apr 29 17:00:23 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 17:00:23 INFO hive.HiveImport: Fri Apr 29 17:00:23 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 17:00:23 INFO hive.HiveImport: Fri Apr 29 17:00:23 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 17:00:23 INFO hive.HiveImport: Fri Apr 29 17:00:23 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 17:00:24 INFO hive.HiveImport: OK
22/04/29 17:00:24 INFO hive.HiveImport: Time taken: 0.713 seconds
22/04/29 17:00:24 INFO hive.HiveImport: Loading data to table default.test
22/04/29 17:00:25 INFO hive.HiveImport: OK
22/04/29 17:00:25 INFO hive.HiveImport: Time taken: 0.42 seconds
22/04/29 17:00:25 INFO hive.HiveImport: Hive import complete.
22/04/29 17:00:25 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
hive> show tables;
OK
test
Time taken: 0.558 seconds, Fetched: 1 row(s)
hive> exit;
# Import the contents of the MySQL table into files on HDFS
[hadoop@master ~]$ sqoop import --connect jdbc:mysql://master:3306/sample --username root --password Password123$ --table student --num-mappers 1 --target-dir /user/test
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/29 17:03:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/04/29 17:03:13 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/29 17:03:13 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/04/29 17:03:13 INFO tool.CodeGenTool: Beginning code generation
Fri Apr 29 17:03:14 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 17:03:14 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/04/29 17:03:14 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/04/29 17:03:14 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/src/hadoop
Note: /tmp/sqoop-hadoop/compile/eab748b8f3fb956072f4877fdf4bf23a/student.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/04/29 17:03:15 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/eab748b8f3fb956072f4877fdf4bf23a/student.jar
22/04/29 17:03:15 WARN manager.MySQLManager: It looks like you are importing from mysql.
22/04/29 17:03:15 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
22/04/29 17:03:15 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
22/04/29 17:03:15 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
22/04/29 17:03:15 INFO mapreduce.ImportJobBase: Beginning import of student
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/04/29 17:03:15 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/04/29 17:03:15 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/04/29 17:03:15 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Fri Apr 29 17:03:17 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/29 17:03:17 INFO db.DBInputFormat: Using read commited transaction isolation
22/04/29 17:03:17 INFO mapreduce.JobSubmitter: number of splits:1
22/04/29 17:03:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1651221174197_0004
22/04/29 17:03:17 INFO impl.YarnClientImpl: Submitted application application_1651221174197_0004
22/04/29 17:03:17 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1651221174197_0004/
22/04/29 17:03:17 INFO mapreduce.Job: Running job: job_1651221174197_0004
22/04/29 17:03:21 INFO mapreduce.Job: Job job_1651221174197_0004 running in uber mode : false
22/04/29 17:03:21 INFO mapreduce.Job: map 0% reduce 0%
22/04/29 17:03:25 INFO mapreduce.Job: map 100% reduce 0%
22/04/29 17:03:25 INFO mapreduce.Job: Job job_1651221174197_0004 completed successfully
22/04/29 17:03:25 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=134251
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=30
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=1945
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=1945
Total vcore-seconds taken by all map tasks=1945
Total megabyte-seconds taken by all map tasks=1991680
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=69
CPU time spent (ms)=1050
Physical memory (bytes) snapshot=179068928
Virtual memory (bytes) snapshot=2136522752
Total committed heap usage (bytes)=88604672
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=30
22/04/29 17:03:25 INFO mapreduce.ImportJobBase: Transferred 30 bytes in 10.2361 seconds (2.9308 bytes/sec)
22/04/29 17:03:25 INFO mapreduce.ImportJobBase: Retrieved 3 records.
# After running the command above, open master_ip:50070 in a browser and click "Browse the file system" under Utilities; if the user directory is listed, the import succeeded
[hadoop@master ~]$ hdfs dfs -ls /user/test
Found 2 items
-rw-r--r-- 2 hadoop supergroup 0 2022-04-29 17:03 /user/test/_SUCCESS
-rw-r--r-- 2 hadoop supergroup 30 2022-04-29 17:03 /user/test/part-m-00000
[hadoop@master ~]$ hdfs dfs -cat /user/test/part-m-00000
01,zhangsan
02,lisi
03,wangwu
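The imported file can be cross-checked against the counters in the job log ("Retrieved 3 records", "Bytes Written=30"). The following sketch counts records and bytes from standard input; summarize_records is a hypothetical helper name, and it assumes every record ends with a newline.

```shell
#!/usr/bin/env bash
# summarize_records
# Reads newline-terminated records on stdin and prints "<records> <bytes>".
# LC_ALL=C makes awk count bytes rather than multibyte characters.
summarize_records() {
  LC_ALL=C awk '{ bytes += length($0) + 1 } END { print NR + 0, bytes + 0 }'
}

# Example use on the cluster (not executed here):
#   hdfs dfs -cat /user/test/part-m-00000 | summarize_records
```

For the three rows shown above this prints 3 30, which agrees with the record and byte counts reported by the import job.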
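Chapter 10 Flume Component Installation and Configuration
Experiment 1: Installing and Configuring the Flume Component
1.1. Objectives
After completing this experiment, you should be able to:
- Download and unpack Flume
- Deploy the Flume component
- Send and receive messages with Flume
1.2. Requirements
- Understand the basics of Flume
- Be familiar with Flume's features and applications
- Be familiar with Flume component settings
1.3. Procedure
1.3.1. Task 1: Download and unpack Flume
As the root user, unpack the Flume installation package into "/usr/local/src" and rename the extracted directory to flume.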
[root@master ~]# tar xf /opt/software/apache-flume-1.6.0-bin.tar.gz -C /usr/local/src/
[root@master ~]# cd /usr/local/src/
[root@master src]# mv apache-flume-1.6.0-bin/ flume
[root@master src]# chown -R hadoop.hadoop /usr/local/src/
1.3.2. Task 2: Deploy the Flume component
1.3.2.1. Step 1: As the root user, set the Flume environment variables so that they take effect for all users.
[root@master src]# vim /etc/profile.d/flume.sh
export FLUME_HOME=/usr/local/src/flume
export PATH=${FLUME_HOME}/bin:$PATH
1.3.2.2. Step 2: Edit the Flume configuration files.
First, switch to the hadoop user and change the working directory to the Flume configuration directory.
[hadoop@master ~]$ echo $PATH
/usr/local/src/hbase/bin:/usr/local/src/zookeeper/bin:/usr/local/src/sqoop/bin:/usr/local/src/hive/bin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/flume/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/src/hive/bin:/home/hadoop/.local/bin:/home/hadoop/bin
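The PATH printed above contains several directories twice (for example /usr/local/src/hbase/bin and /usr/local/src/hive/bin), which tends to happen when profile scripts prepend unconditionally. One possible way to avoid this is sketched below; path_prepend is a hypothetical helper, not something the lab defines.

```shell
#!/usr/bin/env bash
# path_prepend DIR
# Prepends DIR to PATH only if it is not already present, so sourcing a
# profile script more than once does not duplicate entries.
path_prepend() {
  local dir="$1"
  case ":${PATH}:" in
    *":${dir}:"*) ;;                 # already on PATH: nothing to do
    *) PATH="${dir}:${PATH}" ;;
  esac
  export PATH
}

# Example use inside /etc/profile.d/flume.sh (not executed here):
#   export FLUME_HOME=/usr/local/src/flume
#   path_prepend "${FLUME_HOME}/bin"
```

Duplicate entries are harmless for command lookup, so this is a tidiness measure rather than a required fix.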
1.3.2.3. Step 3: Edit and configure the flume-env.sh file.
[hadoop@master ~]$ vim /usr/local/src/hbase/conf/hbase-env.sh
#export HBASE_CLASSPATH=/usr/local/src/hadoop/etc/hadoop/ # comment out this line
export JAVA_HOME=/usr/local/src/jdk
[hadoop@master conf]$ start-all.sh
[hadoop@master ~]$ flume-ng version
Flume 1.6.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
From source with checksum b29e416802ce9ece3269d34233baf43f
1.3.3. Task 3: Send and receive messages with Flume
Use Flume to transfer data from the web server into HDFS.
1.3.3.1. Step 1: Create the simple-hdfs-flume.conf file in the Flume installation directory.
[hadoop@master ~]$ cd /usr/local/src/flume/
[hadoop@master flume]$ vi /usr/local/src/flume/simple-hdfs-flume.conf
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/usr/local/src/hadoop/logs/
a1.sources.r1.fileHeader=true
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://master:9000/tmp/flume
a1.sinks.k1.hdfs.rollSize=1048760
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.rollInterval=900
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.channels.c1.type=file
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
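The last two lines of the configuration bind the source and the sink to the channel; a source uses the plural key channels, a sink the singular key channel. A quick check for missing bindings can be sketched as follows. The function check_flume_bindings is a hypothetical helper, and it assumes one property per line with the agent name at the start of the line.

```shell
#!/usr/bin/env bash
# check_flume_bindings FILE AGENT
# Prints one line per declared source or sink that has no channel binding.
# Returns 0 when every component is bound, 1 otherwise.
check_flume_bindings() {
  local file="$1" agent="$2" missing=0 name
  # "<agent>.sources" and "<agent>.sinks" hold space-separated name lists.
  for name in $(sed -n "s/^${agent}\.sources[[:space:]]*=[[:space:]]*//p" "$file"); do
    if ! grep -Eq "^${agent}\.sources\.${name}\.channels[[:space:]]*=" "$file"; then
      echo "source ${name}: no channels binding"
      missing=1
    fi
  done
  for name in $(sed -n "s/^${agent}\.sinks[[:space:]]*=[[:space:]]*//p" "$file"); do
    if ! grep -Eq "^${agent}\.sinks\.${name}\.channel[[:space:]]*=" "$file"; then
      echo "sink ${name}: no channel binding"
      missing=1
    fi
  done
  return "$missing"
}

# Example use (not executed here):
#   check_flume_bindings /usr/local/src/flume/simple-hdfs-flume.conf a1
```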
1.3.3.2. Step 2: Use the flume-ng agent command to load the simple-hdfs-flume.conf configuration and start Flume to transfer data.
[hadoop@master flume]$ flume-ng agent --conf-file simple-hdfs-flume.conf --name a1
Press Ctrl+C to stop the Flume transfer.
1.3.3.3. Step 3: Check the files that Flume transferred to HDFS. If the /tmp/flume directory on HDFS contains transferred data files, the transfer succeeded.
[hadoop@master ~]$ hdfs dfs -ls /
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2022-04-15 22:04 /hbase
drwxr-xr-x - hadoop supergroup 0 2022-04-02 18:24 /input
drwxr-xr-x - hadoop supergroup 0 2022-04-02 18:26 /output
drwxr-xr-x - hadoop supergroup 0 2022-05-06 17:24 /tmp
drwxr-xr-x - hadoop supergroup 0 2022-04-29 17:03 /user
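The listing above covers only the HDFS root, whereas the step asks whether /tmp/flume contains data files. A minimal sketch for counting the files in a listing is given below; count_hdfs_files is a hypothetical helper that reads the output of hdfs dfs -ls from standard input.

```shell
#!/usr/bin/env bash
# count_hdfs_files
# Reads "hdfs dfs -ls" output on stdin and prints how many regular files it
# lists. File entries start with "-", directories with "d"; the
# "Found N items" header is ignored.
count_hdfs_files() {
  awk '/^-/ { n++ } END { print n + 0 }'
}

# Example use on the cluster (not executed here):
#   hdfs dfs -ls -R /tmp/flume | count_hdfs_files
```

A result greater than zero indicates that at least one file has been written under the directory.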
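Chapter 13 Big Data Platform Monitoring Commands
Experiment 1: Monitoring the Running State of the Big Data Platform with Commands
1.1. Objectives
After completing this experiment, you should be able to:
- Understand the running state of the big data platform
- Use commands to check the running state of the big data platform
1.2. Requirements
- Be familiar with the ways to check the platform's running state
- Know the commands used to check the platform's running state
1.3. Procedure
1.3.1. Task 1: Check the platform state with commands
1.3.1.1. Step 1: View Linux system information (uname -a)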
[root@master ~]# uname -a
Linux master 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
1.3.1.2. Step 2: View disk information
(1) List all partitions (fdisk -l)
[root@master ~]# fdisk -l
Disk /dev/sda: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x00096169
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     2099199     1048576   83  Linux
/dev/sda2         2099200    41943039    19921920   8e  Linux LVM
Disk /dev/mapper/centos-root: 18.2 GB, 18249416704 bytes, 35643392 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/centos-swap: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
(2) List all swap partitions (swapon -s)
[root@master ~]# swapon -s
Filename Type Size Used Priority
/dev/dm-1 partition 2097148 0 -
(3) View file system usage (df -h)
[root@master ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   17G  4.8G   13G  28% /
devtmpfs                 980M     0  980M   0% /dev
tmpfs                    992M     0  992M   0% /dev/shm
tmpfs                    992M  9.5M  982M   1% /run
tmpfs                    992M     0  992M   0% /sys/fs/cgroup
/dev/sda1               1014M  130M  885M  13% /boot
tmpfs                    199M     0  199M   0% /run/user/0
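For routine monitoring, the Use% column is usually the value of interest. The sketch below filters df output for filesystems at or above a chosen usage level; fs_over_threshold is a hypothetical helper, and it assumes mount points without spaces.

```shell
#!/usr/bin/env bash
# fs_over_threshold PERCENT
# Reads df output on stdin and prints "<mount point> <use%>" for every
# filesystem whose usage is at or above PERCENT. The header line is skipped.
fs_over_threshold() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

# Example use (not executed here); -P keeps each filesystem on one line:
#   df -P | fs_over_threshold 80
```

Applied to the output above with a limit of 20, only the root filesystem at 28% would be reported.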
1.3.1.3. Step 3: View network IP addresses (ifconfig)
[root@master ~]# ifconfig
ens32: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.10.10.128 netmask 255.255.255.0 broadcast 10.10.10.255
inet6 fe80::af34:1702:3972:2b64 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:2e:33:83 txqueuelen 1000 (Ethernet)
RX packets 342 bytes 29820 (29.1 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 257 bytes 26394 (25.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 4 bytes 360 (360.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 4 bytes 360 (360.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
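When only the IPv4 addresses are needed, the ifconfig output can be filtered. This is a small sketch, assuming the net-tools output layout shown above (an "inet" keyword followed by the address); the function name ipv4_addresses is hypothetical.

```shell
# ipv4_addresses (hypothetical helper): read ifconfig output on stdin
# and print one IPv4 address per line. Assumes the "inet <address> ..."
# layout shown above; older ifconfig builds that print "inet addr:<address>"
# are not covered by this sketch.
ipv4_addresses() {
    awk '$1 == "inet" { print $2 }'
}
```

Usage: ifconfig | ipv4_addresses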
1.3.1.4. Step 4: List all listening ports (netstat -lntp)
[root@master ~]# netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 933/sshd
tcp6 0 0 :::3306 :::* LISTEN 1021/mysqld
tcp6 0 0 :::22 :::* LISTEN 933/sshd
1.3.1.5. Step 5: List all TCP connections, including established ones (netstat -antp)
[root@master ~]# netstat -antp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 933/sshd
tcp 0 52 10.10.10.128:22 10.10.10.1:59963 ESTABLISHED 1249/sshd: root@pts
tcp6 0 0 :::3306 :::* LISTEN 1021/mysqld
tcp6 0 0 :::22 :::* LISTEN 933/sshd
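On a busy node the connection list gets long, so a per-state count is often easier to read. The sketch below assumes the column layout shown above, where the sixth column is the TCP state; the function name tcp_state_counts is hypothetical.

```shell
# tcp_state_counts (hypothetical helper): read `netstat -ant` or
# `netstat -antp` output on stdin and print "<STATE> <count>" per TCP state.
tcp_state_counts() {
    awk '$1 ~ /^tcp/ { count[$6]++ }     # tcp and tcp6 rows only; column 6 is the state
         END { for (s in count) print s, count[s] }' | sort
}
```

Usage: netstat -ant | tcp_state_counts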
1.3.1.6. Step 6: Display process status in real time (top). This command shows, among other things, each process's share of CPU and memory.
[root@master ~]# top
top - 16:09:46 up 47 min, 2 users, load average: 0.00, 0.01, 0.05
Tasks: 115 total, 1 running, 114 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.0 sy, 0.0 ni, 99.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 2030172 total, 1575444 free, 281296 used, 173432 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 1571928 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1021 mysql 20 0 1258940 191544 6840 S 0.3 9.4 0:01.71 mysqld
1 root 20 0 125456 3896 2560 S 0.0 0.2 0:00.96 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root rt 0 0 0 0 S 0.0 0.0 0:00.02 migration/0
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root 20 0 0 0 0 S 0.0 0.0 0:00.15 rcu_sched
10 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 lru-add-drain
11 root rt 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
12 root rt 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
13 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
14 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1
16 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/1:0H
17 root rt 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/2
18 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/2
19 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/2
1.3.1.7. Step 7: View CPU information (cat /proc/cpuinfo)
[root@master ~]# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
stepping : 10
microcode : 0xb4
cpu MHz : 3191.998
cache size : 12288 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 ibpb ibrs stibp arat spec_ctrl intel_stibp arch_capabilities
bogomips : 6383.99
clflush size : 64
cache_alignment : 64
address sizes : 45 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
stepping : 10
microcode : 0xb4
cpu MHz : 3191.998
cache size : 12288 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 ibpb ibrs stibp arat spec_ctrl intel_stibp arch_capabilities
bogomips : 6383.99
clflush size : 64
cache_alignment : 64
address sizes : 45 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
stepping : 10
microcode : 0xb4
cpu MHz : 3191.998
cache size : 12288 KB
physical id : 1
siblings : 2
core id : 0
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 ibpb ibrs stibp arat spec_ctrl intel_stibp arch_capabilities
bogomips : 6383.99
clflush size : 64
cache_alignment : 64
address sizes : 45 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
stepping : 10
microcode : 0xb4
cpu MHz : 3191.998
cache size : 12288 KB
physical id : 1
siblings : 2
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 ibpb ibrs stibp arat spec_ctrl intel_stibp arch_capabilities
bogomips : 6383.99
clflush size : 64
cache_alignment : 64
address sizes : 45 bits physical, 48 bits virtual
power management:
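The full /proc/cpuinfo listing is verbose. A short summary of logical processors and physical sockets can be derived from the "processor" and "physical id" fields, as in the following sketch; the function name cpu_summary is hypothetical.

```shell
# cpu_summary (hypothetical helper): read /proc/cpuinfo-style text on stdin
# and print the number of logical processors and distinct physical ids.
cpu_summary() {
    awk -F':' '
        /^processor/   { logical++ }
        /^physical id/ { gsub(/[ \t]/, "", $2); sockets[$2] = 1 }
        END {
            n = 0
            for (id in sockets) n++
            printf "logical=%d sockets=%d\n", logical, n
        }'
}
```

Usage: cpu_summary < /proc/cpuinfo. For the listing above, which has four processor entries and physical ids 0 and 1, this prints logical=4 sockets=2.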
1.3.1.8. Step 8: View memory information (cat /proc/meminfo). This command shows total memory, free memory, and related figures.
[root@master ~]# cat /proc/meminfo
MemTotal: 2030172 kB
MemFree: 1575448 kB
MemAvailable: 1571932 kB
Buffers: 2112 kB
Cached: 126676 kB
SwapCached: 0 kB
Active: 251708 kB
Inactive: 100540 kB
Active(anon): 223876 kB
Inactive(anon): 9252 kB
Active(file): 27832 kB
Inactive(file): 91288 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 2097148 kB
SwapFree: 2097148 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 223648 kB
Mapped: 28876 kB
Shmem: 9668 kB
Slab: 44644 kB
SReclaimable: 18208 kB
SUnreclaim: 26436 kB
KernelStack: 4512 kB
PageTables: 4056 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 3112232 kB
Committed_AS: 782724 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 180220 kB
VmallocChunk: 34359310332 kB
HardwareCorrupted: 0 kB
AnonHugePages: 178176 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 63360 kB
DirectMap2M: 2033664 kB
DirectMap1G: 0 kB
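A single utilization figure can be computed from the MemTotal and MemAvailable fields. The following is a sketch under the assumption that "used" means MemTotal minus MemAvailable, which is one common definition among several; the function name mem_used_percent is hypothetical.

```shell
# mem_used_percent (hypothetical helper): read /proc/meminfo-style text on
# stdin and print (MemTotal - MemAvailable) / MemTotal as a percentage.
# Assumption: "used" is defined as total minus available memory.
mem_used_percent() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total > 0) printf "%.1f\n", (total - avail) * 100 / total
        }'
}
```

Usage: mem_used_percent < /proc/meminfo. With the values listed above (MemTotal 2030172 kB, MemAvailable 1571932 kB) the result is about 22.6.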
1.3.2. Task 2: Check Hadoop status from the command line
1.3.2.1. Step 1: Switch to the hadoop user
If the current user is root, switch to the hadoop user before continuing.
[root@master ~]# su - hadoop
Last login: Tue May 10 14:33:03 CST 2022 on pts/0
[hadoop@master ~]$
1.3.2.2. Step 2: Change to the Hadoop installation directory
[hadoop@master ~]$ cd /usr/local/src/hadoop/
[hadoop@master hadoop]$
1.3.2.3. Step 3: Start Hadoop
[hadoop@master hadoop]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
[hadoop@master hadoop]$ jps
1697 SecondaryNameNode
2115 Jps
1865 ResourceManager
1498 NameNode
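Rather than reading the jps list by eye, the expected daemons can be checked automatically. This is a minimal sketch; the function name missing_daemons is hypothetical, and the list of daemons to expect on each node is something you supply.

```shell
# missing_daemons (hypothetical helper): read `jps` output on stdin and print
# every daemon name given as an argument that does not appear in it.
missing_daemons() {
    awk -v wanted="$*" '
        { seen[$2] = 1 }                       # column 2 is the Java main class name
        END {
            n = split(wanted, names, " ")
            for (i = 1; i <= n; i++)
                if (!(names[i] in seen)) print names[i]
        }'
}
```

Usage: jps | missing_daemons NameNode SecondaryNameNode ResourceManager. No output means every listed daemon is present. Names are compared exactly, so NameNode does not match SecondaryNameNode.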
1.3.2.4. Step 4: Stop Hadoop
[hadoop@master hadoop]$ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
10.10.10.130: stopping datanode
10.10.10.129: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
10.10.10.129: stopping nodemanager
10.10.10.130: stopping nodemanager
no proxyserver to stop
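Experiment 2: Monitoring big data platform resource status from the command line
2.1. Objectives
After completing this experiment, you should be able to:
- Understand the running status of big data platform resources
- Use the commands that report the running status of big data platform resources
2.2. Requirements
- Be familiar with the ways to check the running status of big data platform resources
- Know the commands that report the running status of big data platform resources
2.3. Procedure
2.3.1. Task 1: Check YARN status from the command line
2.3.1.1. Step 1: Make sure you are in the /usr/local/src/hadoop directory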
[hadoop@master ~]$ cd /usr/local/src/hadoop/
[hadoop@master hadoop]$
2.3.1.2. Step 2: Return to the Master host and run start-all.sh
[hadoop@master ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slav1.out
10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slav1.out
10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
[hadoop@master ~]$
# Start ZooKeeper on the master node
[hadoop@master hadoop]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
# Start ZooKeeper on the slave1 node
[hadoop@slav1 hadoop]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
# Start ZooKeeper on the slave2 node
[hadoop@slave2 hadoop]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
2.3.1.3. Step 3: Run the jps command. If both the NodeManager and ResourceManager processes appear on Master, YARN has started successfully.
2817 NameNode
3681 ResourceManager
3477 NodeManager
3909 Jps
2990 SecondaryNameNode
2.3.2. Task 2: Check HDFS status from the command line
2.3.2.1. Step 1: Directory operations
Change to the Hadoop directory by running cd /usr/local/src/hadoop.
[hadoop@master ~]$ cd /usr/local/src/hadoop
[hadoop@master hadoop]$
List the HDFS root directory:
[hadoop@master hadoop]$ ./bin/hdfs dfs -ls /
2.3.2.2. Step 2: View the HDFS report by running bin/hdfs dfsadmin -report
[hadoop@master hadoop]$ bin/hdfs dfsadmin -report
Configured Capacity: 36477861888 (33.97 GB)
Present Capacity: 31767752704 (29.59 GB)
DFS Remaining: 31767146496 (29.59 GB)
DFS Used: 606208 (592 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (2):
Name: 10.10.10.129:50010 (node1)
Hostname: node1
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 303104 (296 KB)
Non DFS Used: 2379792384 (2.22 GB)
DFS Remaining: 15858835456 (14.77 GB)
DFS Used%: 0.00%
DFS Remaining%: 86.95%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri May 20 18:31:48 CST 2022
Name: 10.10.10.130:50010 (node2)
Hostname: node2
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 303104 (296 KB)
Non DFS Used: 2330316800 (2.17 GB)
DFS Remaining: 15908311040 (14.82 GB)
DFS Used%: 0.00%
DFS Remaining%: 87.22%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri May 20 18:31:48 CST 2022
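For routine monitoring, the health-related lines of the report can be reduced to one summary line. The sketch below assumes the report layout shown above; the function name hdfs_report_summary is hypothetical.

```shell
# hdfs_report_summary (hypothetical helper): read `hdfs dfsadmin -report`
# output on stdin and print the live DataNode count together with the
# under-replicated and missing block counts.
hdfs_report_summary() {
    awk '
        /^Live datanodes/ {
            n = $0
            gsub(/[^0-9]/, "", n)            # keep only the digits inside "(2):"
            live = n
        }
        /^Under replicated blocks:/ { under = $4 }
        /^Missing blocks:/          { missing = $3 }   # skips the "(with replication factor 1)" line
        END { printf "live=%d under_replicated=%d missing=%d\n", live, under, missing }'
}
```

Usage: bin/hdfs dfsadmin -report | hdfs_report_summary. For the report above this prints live=2 under_replicated=0 missing=0.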
2.3.2.3. Step 3: Check HDFS space usage by running hdfs dfs -df
[hadoop@master hadoop]$ hdfs dfs -df
Filesystem Size Used Available Use%
hdfs://master:9000 36477861888 606208 31767146496 0%
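The Use% column is rounded to a whole number, so a nearly empty cluster shows 0%. If a finer figure is useful, it can be recomputed from the Size and Used columns. This is a sketch assuming the four-column layout shown above; the function name hdfs_df_percent is hypothetical.

```shell
# hdfs_df_percent (hypothetical helper): read `hdfs dfs -df` output on stdin
# and print "<filesystem> <used percent>" with four decimal places.
hdfs_df_percent() {
    awk 'NR > 1 && $2 > 0 { printf "%s %.4f%%\n", $1, $3 * 100 / $2 }'
}
```

Usage: hdfs dfs -df | hdfs_df_percent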
2.3.3. Task 3: Check HBase status from the command line
2.3.3.1. Step 1: Start HBase
Change to the HBase installation directory, /usr/local/src/hbase, as follows:
[hadoop@master hadoop]$ cd /usr/local/src/hbase
[hadoop@master hbase]$ hbase version
HBase 1.2.1
Source code repository git://asf-dev/home/busbey/projects/hbase revision=8d8a7107dc4ccbf36a92f64675dc60392f85c015
Compiled by busbey on Wed Mar 30 11:19:21 CDT 2016
From source with checksum f4bb4a14bb4e0b72b46f729dae98a772
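The output shows HBase 1.2.1, which confirms that the installed HBase version is 1.2.1. Note that hbase version only reports the installed build; it does not by itself show that the HBase daemons are running. If HBase has not been started yet, start it with start-hbase.sh before using the shell.
2.3.3.2. Step 2: View HBase version information
Run hbase shell to enter the HBase interactive shell.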
[hadoop@master hbase]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
Enter version to query the HBase version:
hbase(main):001:0> version
1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
The output shows that the HBase version is 1.2.1.
2.3.3.3. Step 3: Query HBase status by running the status command in the HBase shell
1 active master, 0 backup masters, 3 servers, 0 dead, 0.6667 average load
For a per-server view of the HBase status, run status 'simple':
active master: master:16000 1589125905790
0 backup masters
3 live servers
master:16020 1589125908065
requestsPerSecond=0.0, numberOfOnlineRegions=1,
usedHeapMB=28, maxHeapMB=1918, numberOfStores=1,
numberOfStorefiles=1, storefileUncompressedSizeMB=0,
storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0,
readRequestsCount=5, writeRequestsCount=1, rootIndexSizeKB=0,
totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
totalCompactingKVs=0, currentCompactedKVs=0,
compactionProgressPct=NaN, coprocessors=[MultiRowMutationEndpoint]
slave1:16020 1589125915820
requestsPerSecond=0.0, numberOfOnlineRegions=0,
usedHeapMB=17, maxHeapMB=440, numberOfStores=0,
numberOfStorefiles=0, storefileUncompressedSizeMB=0,
storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0,
readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0,
totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
totalCompactingKVs=0, currentCompactedKVs=0,
compactionProgressPct=NaN, coprocessors=[]
slave2:16020 1589125917741
requestsPerSecond=0.0, numberOfOnlineRegions=1,
usedHeapMB=15, maxHeapMB=440, numberOfStores=1,
numberOfStorefiles=1, storefileUncompressedSizeMB=0,
storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0,
readRequestsCount=4, writeRequestsCount=0, rootIndexSizeKB=0,
totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
totalCompactingKVs=0, currentCompactedKVs=0,
compactionProgressPct=NaN, coprocessors=[]
0 dead servers
Aggregate load: 0, regions: 2
The output gives more detail for the Master, Slave1, and Slave2 hosts, such as their service ports and request statistics.
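The live and dead server counts in this output are the figures most worth alerting on. The following sketch extracts them; the function name hbase_server_counts is hypothetical, and the usage line assumes your HBase build supports the shell's non-interactive -n flag.

```shell
# hbase_server_counts (hypothetical helper): read the output of the HBase
# shell command status 'simple' on stdin and print the live and dead
# RegionServer counts.
hbase_server_counts() {
    awk '
        /live servers/ { live = $1 }
        /dead servers/ { dead = $1 }
        END { printf "live=%d dead=%d\n", live, dead }'
}
```

Usage (assumption: -n is available): echo "status 'simple'" | hbase shell -n | hbase_server_counts. For the output above this prints live=3 dead=0.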
To see the other ways of querying HBase status, run help 'status':
hbase(main):004:0> help 'status'
Show cluster status. Can be 'summary', 'simple', 'detailed', or 'replication'. The
default is 'summary'. Examples:
hbase> status
hbase> status 'simple'
hbase> status 'summary'
hbase> status 'detailed'
hbase> status 'replication'
hbase> status 'replication', 'source'
hbase> status 'replication', 'sink'
The output lists every form of the status command.
2.3.3.4. Step 4: Stop the HBase service
To stop the HBase service, run stop-hbase.sh.
[hadoop@master hbase]$ stop-hbase.sh
stopping hbasecat.........
2.3.4. Task 4: Check Hive status from the command line
2.3.4.1. Step 1: Start Hive
Change to the /usr/local/src/hive directory, type hive, and press Enter.
[hadoop@master ~]$ cd /usr/local/src/hive/
[hadoop@master hive]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
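When the hive> prompt appears, Hive has started successfully and you are in the Hive shell.
2.3.4.2. Step 2: Basic Hive commands
Note: every statement entered at the Hive command line must end with a semicolon.
(1) List the databases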
hive> show databases;
OK
default
sample
Time taken: 0.596 seconds, Fetched: 2 row(s)
hive>
The output lists the default database, default, along with the existing sample database.
(2) List all tables in the default database
hive> use default;
OK
Time taken: 0.018 seconds
hive> show tables;
OK
test
Time taken: 0.036 seconds, Fetched: 1 row(s)
hive>
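The output lists the tables in the default database; in this environment it contains one table, test.
(3) Create the table stu, with an integer id column and a string name column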
hive> create table stu(id int,name string);
OK
Time taken: 0.23 seconds
hive>
(4) Insert one row into stu, with id 1001 and name zhangsan
hive> insert into stu values (1001,"zhangsan");
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20220520185326_7c18630d-0690-4b35-8de8-423c9b901677
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1653042072571_0001, Tracking URL = http://master:8088/proxy/application_1653042072571_0001/
Kill Command = /usr/local/src/hadoop/bin/hadoop job -kill job_1653042072571_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-05-20 18:56:05,436 Stage-1 map = 0%, reduce = 0%
2022-05-20 18:56:11,699 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.47 sec
MapReduce Total cumulative CPU time: 3 seconds 470 msec
Ended Job = job_1653042072571_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://master:9000/user/hive/warehouse/stu/.hive-staging_hive_2022-05-20_18-55-52_567_2370673334190980235-1/-ext-10000
Loading data to table default.stu
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 3.47 sec HDFS Read: 4138 HDFS Write: 81 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 470 msec
OK
Time taken: 20.438 seconds
Following the same procedure, insert two more rows: ids 1002 and 1003, with names lisi and wangwu respectively.
(5) List the tables after inserting data
hive> show tables;
OK
stu
test
values__tmp__table__1
Time taken: 0.017 seconds, Fetched: 3 row(s)
hive>
(6) View the structure of table stu
hive> desc stu;
OK
id int
name string
Time taken: 0.031 seconds, Fetched: 2 row(s)
hive>
(7) View the contents of table stu
hive> select * from stu;
OK
1001 zhangsan
Time taken: 0.077 seconds, Fetched: 1 row(s)
hive>
2.3.4.3. Step 3: View the file systems and command history from the Hive command line
(1) View the local file system by running ! ls /usr/local/src;
hive> ! ls /usr/local/src;
apache-hive-2.0.0-bin
flume
hadoop
hbase
hive
jdk
sqoop
zookeeper
(2) View the HDFS file system by running dfs -ls /;
hive> dfs -ls /;
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2022-04-15 22:04 /hbase
drwxr-xr-x - hadoop supergroup 0 2022-04-02 18:24 /input
drwxr-xr-x - hadoop supergroup 0 2022-04-02 18:26 /output
drwxr-xr-x - hadoop supergroup 0 2022-05-20 18:55 /tmp
drwxr-xr-x - hadoop supergroup 0 2022-04-29 17:03 /user
(3) View the history of all commands entered in Hive
Go to the hadoop user's home directory, /home/hadoop, and view the .hivehistory file.
[hadoop@master ~]$ cd /home/hadoop
[hadoop@master ~]$ cat .hivehistory
create database sample;
use sample;
create table student(number STRING,name STRING);
exit;
select * from sample.student;
exit;
show tables;
exit;
show databases;
use default;
show tables;
create table stu(id int,name string);
insert into stu values (1001,"zhangsan");
show tables;
desc stu;
select * from stu;
! ls /usr/local/src;
dfs -ls /;
exit
;
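The output shows every command previously run at the Hive command line, including mistaken ones, which helps with maintenance and troubleshooting.
Experiment 3: Monitoring big data platform service status from the command line
3.1. Objectives
After completing this experiment, you should be able to:
- Understand the running status of big data platform services
- Use the commands that report the running status of big data platform services
3.2. Requirements
- Be familiar with the ways to check the running status of big data platform services
- Know the commands that report the running status of big data platform services
3.3. Procedure
3.3.1. Task 1: Check ZooKeeper status from the command line
3.3.1.1. Step 1: Check ZooKeeper status by running zkServer.sh status. The output is as follows: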
[hadoop@master ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: follower
In the output above, Mode: follower means that this node is a ZooKeeper follower.
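To check the role of a node from a script, the Mode line can be extracted on its own. This is a small sketch assuming the output format shown above; the function name zk_mode is hypothetical.

```shell
# zk_mode (hypothetical helper): read `zkServer.sh status` output on stdin
# and print only the server mode, for example follower or leader.
zk_mode() {
    awk -F': *' '/^Mode:/ { print $2 }'
}
```

Usage: zkServer.sh status 2>/dev/null | zk_mode. Running this on each node shows which one is currently the leader.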
3.3.1.2. Step 2: View the running processes
QuorumPeerMain is the startup entry class of a ZooKeeper ensemble; it loads the configuration and starts the QuorumPeer thread.
Run jps to view the processes.
[hadoop@master ~]$ jps
5029 Jps
3494 SecondaryNameNode
3947 QuorumPeerMain
3292 NameNode
3660 ResourceManager
3.3.1.3. Step 3: After the ZooKeeper service has started successfully, run zkCli.sh to connect to it.
[hadoop@master ~]$ zkCli.sh
Connecting to localhost:2181
2022-05-20 19:07:11,924 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.8--1, built on 02/06/2016 03:18 GMT
2022-05-20 19:07:11,927 [myid:] - INFO [main:Environment@100] - Client environment:host.name=master
2022-05-20 19:07:11,927 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.8.0_152
2022-05-20 19:07:11,929 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2022-05-20 19:07:11,929 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/local/src/jdk/jre
2022-05-20 19:07:11,929 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/usr/local/src/zookeeper/bin/../build/classes:/usr/local/src/zookeeper/bin/../build/lib/*.jar:/usr/local/src/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/local/src/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/local/src/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/usr/local/src/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/local/src/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/local/src/zookeeper/bin/../zookeeper-3.4.8.jar:/usr/local/src/zookeeper/bin/../src/java/lib/*.jar:/usr/local/src/zookeeper/bin/../conf::/usr/local/src/sqoop/lib
2022-05-20 19:07:11,929 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2022-05-20 19:07:11,929 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2022-05-20 19:07:11,929 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2022-05-20 19:07:11,929 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2022-05-20 19:07:11,929 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2022-05-20 19:07:11,929 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.10.0-862.el7.x86_64
2022-05-20 19:07:11,929 [myid:] - INFO [main:Environment@100] - Client environment:user.name=hadoop
2022-05-20 19:07:11,929 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/hadoop
2022-05-20 19:07:11,929 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/home/hadoop
2022-05-20 19:07:11,930 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@69d0a921
Welcome to ZooKeeper!
2022-05-20 19:07:11,946 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2022-05-20 19:07:11,984 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@876] - Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
2022-05-20 19:07:11,991 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x180e0fed4990001, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]
3.3.1.4. Step 4: Use a watch to monitor the /hbase node, so that a notification is shown whenever the content of /hbase changes. To set the watch, run get /hbase 1.
cZxid = 0x100000002
ctime = Thu Apr 23 16:02:29 CST 2022
mZxid = 0x100000002
mtime = Thu Apr 23 16:02:29 CST 2022
pZxid = 0x20000008d
cversion = 26
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 16
[zk: localhost:2181(CONNECTED) 1] set /hbase value-update
WATCHER::cZxid = 0x100000002
WatchedEvent state:SyncConnected type:NodeDataChanged
path:/hbase
ctime = Thu Apr 23 16:02:29 CST 2022
mZxid = 0x20000c6d3
mtime = Fri May 15 15:03:41 CST 2022
pZxid = 0x20000008d
cversion = 26
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 12
numChildren = 16
[zk: localhost:2181(CONNECTED) 2] get /hbase
value-update
cZxid = 0x100000002
ctime = Thu Apr 23 16:02:29 CST 2022
mZxid = 0x20000c6d3
mtime = Fri May 15 15:03:41 CST 2022
pZxid = 0x20000008d
cversion = 26
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 12
numChildren = 16
[zk: localhost:2181(CONNECTED) 3] quit
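The output shows that after running set /hbase value-update, the data version changed from 0 to 1 and a NodeDataChanged event was delivered, which confirms that /hbase was being watched.
3.3.2. Task 2: Check Sqoop status from the command line
3.3.2.1. Step 1: Query the Sqoop version to verify that Sqoop works.
First change to the /usr/local/src/sqoop directory, then run ./bin/sqoop-version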
[hadoop@master ~]$ cd /usr/local/src/sqoop
[hadoop@master sqoop]$ ./bin/sqoop-version
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/05/20 19:10:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
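The output shows Sqoop 1.4.7, which means the Sqoop version is 1.4.7 and the tool runs correctly.
3.3.2.2. Step 2: Test whether Sqoop can connect to the database
From the Sqoop directory, run the command bin/sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password Password123$, where master:3306 is the database host name and port.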
[hadoop@master sqoop]$ bin/sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password Password123$
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/05/20 19:13:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/05/20 19:13:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/05/20 19:13:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Fri May 20 19:13:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
hive
mysql
performance_schema
sample
sys
The output shows that Sqoop can connect to MySQL and list every database on the Master host: information_schema, hive, mysql, performance_schema, sample, and sys.
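To confirm from a script that a particular database is visible, the list can be filtered for an exact name. This is a minimal sketch, assuming each database name is printed on its own line as in the transcript; has_database is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit status 0) if the database name given as $1
# appears as a complete line in the `sqoop list-databases` output read from stdin.
has_database() {
  # -F: treat the name as a fixed string, -x: match the whole line, -q: no output
  grep -Fxq -- "$1"
}
```

For example, `bin/sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password 'Password123$' | has_database hive && echo "hive is visible"`. Because only whole-line matches count, the warning and log lines in the output do not cause false positives.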
3.3.2.3. Step 3: Run sqoop help; output like the following indicates that Sqoop starts successfully.
[hadoop@master sqoop]$ sqoop help
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/05/20 19:14:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
The output lists Sqoop's common commands together with a short description of what each one does.
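If only the command names are needed, they can be pulled out of the help text. The sketch below assumes the layout shown in the transcript: an "Available commands:" header, one command per line, and a closing line that begins with "See". The function name sqoop_commands is hypothetical.

```shell
# Hypothetical helper: read `sqoop help` output on stdin and print one
# command name per line.
sqoop_commands() {
  awk '
    /^Available commands:/ { in_list = 1; next }  # the list starts after this header
    /^See /                { in_list = 0 }        # the footer line ends the list
    in_list && NF          { print $1 }           # first field is the command name
  '
}
```

A possible use is `sqoop help 2>/dev/null | sqoop_commands`, for instance to check that import and export are both available before running a job.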
3.3.3. Task 3: Check the status of Flume from the command line
3.3.3.1. Step 1: Check that Flume is installed correctly by running flume-ng version to view the Flume version.
[hadoop@master ~]$ cd /usr/local/src/flume
[hadoop@master flume]$ flume-ng version
Flume 1.6.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
From source with checksum b29e416802ce9ece3269d34233baf43f
3.3.3.2. Step 2: Add example.conf to /usr/local/src/flume
[hadoop@master flume]$ cat /usr/local/src/flume/example.conf
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/usr/local/src/flume/
a1.sources.r1.fileHeader=true
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://master:9000/flume
a1.sinks.k1.hdfs.rollSize=1048760
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.rollInterval=900
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.channels.c1.type=file
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
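In this configuration the HDFS sink rolls files on a time basis: hdfs.rollInterval is given in seconds, so 900 means a new file every 15 minutes, and hdfs.rollCount=0 turns off rolling by event count. Flume property keys are case-sensitive, so it can help to read values back from the file before starting the agent. The following is a minimal sketch for doing so; flume_prop is a hypothetical helper, not a Flume tool, and it prints only the first match for a key.

```shell
# Hypothetical helper: print the value of one key from a Flume properties file.
# Usage: flume_prop FILE KEY
flume_prop() {
  awk -F= -v key="$2" '
    /^[[:space:]]*#/ { next }                        # skip comment lines
    {
      k = $1
      gsub(/^[[:space:]]+|[[:space:]]+$/, "", k)     # trim spaces around the key
      if (k == key) {
        v = substr($0, index($0, "=") + 1)           # everything after the first "="
        gsub(/^[[:space:]]+|[[:space:]]+$/, "", v)   # trim spaces around the value
        print v
        exit                                         # stop at the first match
      }
    }' "$1"
}
```

For example, `flume_prop /usr/local/src/flume/example.conf a1.sinks.k1.hdfs.rollInterval` should print 900 for the file above. An empty result suggests that the key is missing or spelled with different capitalization.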
3.3.3.3. Step 3: Start Flume agent a1 with logging to the console
[hadoop@master flume]$ /usr/local/src/flume/bin/flume-ng agent --conf ./conf --conf-file ./example.conf --name a1 -Dflume.root.logger=INFO,console
3.3.3.4. Step 4: View the results
[hadoop@master flume]$ hdfs dfs -ls -R /flume
drwxr-xr-x - hadoop supergroup 0 2022-05-20 15:16 /flume/20220520
-rw-r--r-- 2 hadoop supergroup 11 2022-05-20 15:16 /flume/20220520/events-.
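This article is from cnblogs (博客园), author: Cloudservice. When reposting, please cite the original link: https://www.cnblogs.com/whwh/p/16294629.html. As long as studying doesn't kill you, study as if it will!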