I. Hadoop Course

2.1 Initial Setup

The initial environment has already been set up on the lab platform, but you should still understand how it is configured.

1. Change the hostname (using the master node as an example)

[ec2-user@ip-172-31-32-47 ~]$ sudo vi /etc/hostname 
#Delete everything in the file and add master on the first line as the new hostname
#Reboot the virtual machine so the change takes effect
[ec2-user@ip-172-31-32-47 ~]$ sudo reboot
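
After the reboot, a quick way to confirm that the change took effect (a minimal check you can run on each node):

#The hostname command should now print the new name
[ec2-user@master ~]$ hostname
master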

2. Update the hosts mapping (using the master node as an example)

#Check the IP address of every node
[ec2-user@master ~]$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 172.31.32.47  netmask 255.255.240.0  broadcast 172.31.47.255
        inet6 fe80::8b2:80ff:fe01:e5c2  prefixlen 64  scopeid 0x20<link>
        ether 0a:b2:80:01:e5:c2  txqueuelen 1000  (Ethernet)
        RX packets 3461  bytes 687720 (671.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3262  bytes 544011 (531.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
[ec2-user@slave1 ~]$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 172.31.36.81  netmask 255.255.240.0  broadcast 172.31.47.255
        inet6 fe80::87d:36ff:fe72:bc0c  prefixlen 64  scopeid 0x20<link>
        ether 0a:7d:36:72:bc:0c  txqueuelen 1000  (Ethernet)
        RX packets 2195  bytes 543199 (530.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2178  bytes 361053 (352.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
[ec2-user@slave2 ~]$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 172.31.46.142  netmask 255.255.240.0  broadcast 172.31.47.255
        inet6 fe80::850:68ff:fe8c:6c5e  prefixlen 64  scopeid 0x20<link>
        ether 0a:50:68:8c:6c:5e  txqueuelen 1000  (Ethernet)
        RX packets 2284  bytes 547630 (534.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2241  bytes 375782 (366.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

#Write the entries into the hosts file in the format: IP hostname
[ec2-user@master ~]$ sudo vi /etc/hosts
#Check the result. Note: the hosts file must be updated on every node
[ec2-user@master ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost6 localhost6.localdomain6
172.31.32.47 master
172.31.36.81 slave1
172.31.46.142 slave2
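
With every node's hosts file updated, you can sanity-check the mappings with a quick ping (a hedged check: ICMP may be blocked by the security group, in which case name resolution alone can be verified with getent hosts slave1):

#Confirm that the hostnames resolve and the nodes are reachable
[ec2-user@master ~]$ ping -c 1 slave1
[ec2-user@master ~]$ ping -c 1 slave2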

2.2 Installing the Java Environment

First, why install a JDK? The JDK is the software development kit for the Java language and the core of Java development: it bundles the Java runtime environment (the JVM plus the Java class libraries) together with the Java development tools. Hadoop and the tools built on top of it run on the JVM, so a JDK must be installed on every node.

1. Extract JDK 1.8

#Extract the JDK into the target directory
[ec2-user@master ~]$ sudo tar -zxvf hadoop/jdk-8u144-linux-x64.tar.gz -C /usr/local/src/
#Confirm that the extracted JDK directory is present in the target directory
[ec2-user@master ~]$ ls /usr/local/src/
jdk1.8.0_144

2. Rename the directory to jdk

[ec2-user@master ~]$ cd /usr/local/src/
[ec2-user@master src]$ ls
jdk1.8.0_144
[ec2-user@master src]$ sudo mv jdk1.8.0_144/ jdk
[ec2-user@master src]$ ls
jdk

3. Add environment variables (on every node; master is shown as the example)

[ec2-user@master src]$ sudo vi /etc/profile
#Append the following lines to the end of the file
export JAVA_HOME=/usr/local/src/jdk
export PATH=$PATH:$JAVA_HOME/bin
#Reload the environment variables
[ec2-user@master src]$ source /etc/profile

4. Check the JDK version to verify the installation

[ec2-user@master src]$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

5. Change ownership (on every node; master is shown as the example)

The labs are carried out as a regular user, but /usr/local/src/ is owned by root. If its ownership is not changed, the later step that distributes files into this directory will fail with a permission error.

[ec2-user@master ~]$ ll /usr/local/
total 0
drwxr-xr-x 2 root root  6 Apr  9  2019 bin
drwxr-xr-x 2 root root  6 Apr  9  2019 etc
drwxr-xr-x 2 root root  6 Apr  9  2019 games
drwxr-xr-x 2 root root  6 Apr  9  2019 include
drwxr-xr-x 2 root root  6 Apr  9  2019 lib
drwxr-xr-x 2 root root  6 Apr  9  2019 lib64
drwxr-xr-x 2 root root  6 Apr  9  2019 libexec
drwxr-xr-x 2 root root  6 Apr  9  2019 sbin
drwxr-xr-x 5 root root 49 Mar  4 20:51 share
drwxr-xr-x 4 root root 31 Mar 19 06:54 src
#Set the owner and group of /usr/local/src/ and everything under it to ec2-user
[ec2-user@master ~]$ sudo chown -R ec2-user:ec2-user /usr/local/src/
#Check the owner and group of /usr/local/src/ again
[ec2-user@master ~]$ ll /usr/local/
total 0
drwxr-xr-x 2 root     root      6 Apr  9  2019 bin
drwxr-xr-x 2 root     root      6 Apr  9  2019 etc
drwxr-xr-x 2 root     root      6 Apr  9  2019 games
drwxr-xr-x 2 root     root      6 Apr  9  2019 include
drwxr-xr-x 2 root     root      6 Apr  9  2019 lib
drwxr-xr-x 2 root     root      6 Apr  9  2019 lib64
drwxr-xr-x 2 root     root      6 Apr  9  2019 libexec
drwxr-xr-x 2 root     root      6 Apr  9  2019 sbin
drwxr-xr-x 5 root     root     49 Mar  4 20:51 share
drwxr-xr-x 4 ec2-user ec2-user 31 Mar 19 06:54 src

6. Distribute to the other nodes

[ec2-user@master ~]$ scp -r /usr/local/src/jdk/ slave1:/usr/local/src/
[ec2-user@master ~]$ scp -r /usr/local/src/jdk/ slave2:/usr/local/src/
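
As a quick check that the copy succeeded, you can run java from the copied directory over ssh (the absolute path is used because a non-interactive ssh command does not source /etc/profile; remember that /etc/profile still has to be edited on each slave, as noted in step 3):

[ec2-user@master ~]$ ssh slave1 /usr/local/src/jdk/bin/java -version
[ec2-user@master ~]$ ssh slave2 /usr/local/src/jdk/bin/java -version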

2.3 Installing the Hadoop Cluster

1. Extract

[ec2-user@master src]$ tar -zxvf /home/ec2-user/hadoop/hadoop-2.9.1.tar.gz -C /usr/local/src/
[ec2-user@master src]$ ls
hadoop-2.9.1  jdk

2. Rename to hadoop

[ec2-user@master src]$ pwd
/usr/local/src
[ec2-user@master src]$ mv hadoop-2.9.1/ hadoop
[ec2-user@master src]$ ls
hadoop  jdk

3. Add environment variables (on every node; master is shown as the example)

[ec2-user@master ~]$ sudo vi /etc/profile
#Append the following lines to the end of the file
export HADOOP_HOME=/usr/local/src/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CLASSPATH=/usr/local/src/hadoop/lib/*
#Reload the environment variables
[ec2-user@master ~]$ source /etc/profile

4. Edit the core-site.xml configuration file

[ec2-user@master ~]$ cd /usr/local/src/hadoop/etc/hadoop/
[ec2-user@master hadoop]$ vi core-site.xml 

Add the following inside the <configuration></configuration> tags:

	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://master:9000</value>
	</property>

	<property>
		<name>hadoop.tmp.dir</name>
		<value>/usr/local/src/hadoop/tmp</value>
	</property>

5. Edit the hdfs-site.xml configuration file

[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ vi hdfs-site.xml 

Add the following inside the <configuration></configuration> tags:

<property>
	<name>dfs.replication</name>
	<value>3</value>
</property>

<!-- Host and port of the secondary NameNode -->
<property>
	<name>dfs.namenode.secondary.http-address</name>
	<value>slave1:50090</value>
</property>

<property>
	<name>dfs.namenode.name.dir</name>
	<value>/usr/local/src/hadoop/tmp/dfs/name</value>
</property>

<property>
	<name>dfs.datanode.data.dir</name>
	<value>/usr/local/src/hadoop/tmp/dfs/data</value>
</property>

<property>
	<name>dfs.webhdfs.enabled</name>
	<value>true</value>
</property>
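
Note: the slaves file configured in step 9 lists only two DataNodes (slave1 and slave2), so with dfs.replication left at 3 newly written blocks will be reported as under-replicated. You can either lower the value to 2 in the property above before distributing the configuration, or check the effective value later; a hedged check:

#Print the replication factor that the configuration currently resolves to
[ec2-user@master hadoop]$ hdfs getconf -confKey dfs.replication
3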

6. Edit the yarn-site.xml configuration file

[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ vi yarn-site.xml 

Add the following inside the <configuration></configuration> tags:

	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>

	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>master</value>
	</property>

	<property>
		<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>

7. Edit the mapred-site.xml configuration file

[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ cp mapred-site.xml.template mapred-site.xml
[ec2-user@master hadoop]$ vi mapred-site.xml

Add the following inside the <configuration></configuration> tags:

    <property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>

8. Edit the hadoop-env.sh configuration file

[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ vi hadoop-env.sh 

Set the JDK path:

export JAVA_HOME=/usr/local/src/jdk

Note: adjust the path to match your own installation.

9. Edit the slaves configuration file

[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ vi slaves 
[ec2-user@master hadoop]$ cat slaves 
slave1
slave2

10. Distribute to the other nodes

[ec2-user@master hadoop]$ cd /usr/local/src/
[ec2-user@master src]$ scp -r hadoop/ slave1:/usr/local/src/
[ec2-user@master src]$ scp -r hadoop/ slave2:/usr/local/src/

11. Format the NameNode (run on the NameNode host)

[ec2-user@master src]$ hdfs namenode -format

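A successful format creates the directory configured in dfs.namenode.name.dir. With the paths used above, a rough sanity check is to list its current/ subdirectory, which should contain an fsimage file and a VERSION file:

#Check that the NameNode metadata directory was created
[ec2-user@master src]$ ls /usr/local/src/hadoop/tmp/dfs/name/current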

12. Start the Hadoop cluster

#Start the Hadoop cluster from the NameNode host
[ec2-user@master src]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
The authenticity of host 'master (172.31.32.47)' can't be established.
ECDSA key fingerprint is SHA256:Tueyo4xR8lsxmdA11GlXAO3w44n6T75dYHe9flk8Y70.
ECDSA key fingerprint is MD5:22:9b:6d:f2:f3:11:a2:6d:4d:dd:ec:25:56:3b:2d:b2.
Are you sure you want to continue connecting (yes/no)? yes
master: Warning: Permanently added 'master,172.31.32.47' (ECDSA) to the list of known hosts.
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-namenode-master.out
slave2: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-datanode-slave2.out
slave1: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-datanode-slave1.out
Starting secondary namenodes [slave1]
slave1: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-secondarynamenode-slave1.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-ec2-user-resourcemanager-master.out
slave1: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-ec2-user-nodemanager-slave1.out
slave2: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-ec2-user-nodemanager-slave2.out
#Check the running Java processes with jps
[ec2-user@master src]$ jps
31522 Jps
31256 ResourceManager
30973 NameNode
[ec2-user@master src]$ ssh slave1
Last login: Fri Mar 19 06:15:47 2021 from 219.153.251.37

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
[ec2-user@slave1 ~]$ jps
29424 DataNode
29635 NodeManager
29544 SecondaryNameNode
29789 Jps
[ec2-user@slave1 ~]$ ssh slave2
Last login: Fri Mar 19 06:15:57 2021 from 219.153.251.37

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
[ec2-user@slave2 ~]$ jps
29633 Jps
29479 NodeManager
29354 DataNode

13. Check the Hadoop cluster status

[ec2-user@master ~]$ hdfs dfsadmin -report
Configured Capacity: 17154662400 (15.98 GB)
Present Capacity: 11389693952 (10.61 GB)
DFS Remaining: 11389685760 (10.61 GB)
DFS Used: 8192 (8 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 172.31.36.81:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 8577331200 (7.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2882510848 (2.68 GB)
DFS Remaining: 5694816256 (5.30 GB)
DFS Used%: 0.00%
DFS Remaining%: 66.39%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Mar 19 07:45:06 UTC 2021
Last Block Report: Fri Mar 19 07:41:00 UTC 2021


Name: 172.31.46.142:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 8577331200 (7.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2882457600 (2.68 GB)
DFS Remaining: 5694869504 (5.30 GB)
DFS Used%: 0.00%
DFS Remaining%: 66.39%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Mar 19 07:45:06 UTC 2021
Last Block Report: Fri Mar 19 07:41:00 UTC 2021
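
Besides hdfs dfsadmin, the NameNode web UI (port 50070 by default in Hadoop 2.x) shows the same cluster summary. A quick command-line probe, assuming the port is reachable from where you run it:

#An HTTP 200 response means the NameNode web UI is up
[ec2-user@master ~]$ curl -s -o /dev/null -w "%{http_code}\n" http://master:50070/
200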

2.4 Installing Hive

1. Install MySQL

Before installing Hive, we first install MySQL, which will store Hive's metadata (the metastore).

1) Download the MySQL yum repository package

[ec2-user@master ~]$ sudo wget http://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm

2) Install the MySQL repository

[ec2-user@master ~]$ sudo yum localinstall mysql57-community-release-el7-8.noarch.rpm

3) Check that the MySQL repository was installed

[ec2-user@master ~]$ sudo yum repolist enabled | grep "mysql.*.community.*"
mysql-connectors-community/x86_64     MySQL Connectors Community          146+39
mysql-tools-community/x86_64          MySQL Tools Community                  123
mysql57-community/x86_64              MySQL 5.7 Community Server             484

4) Install MySQL

[ec2-user@master ~]$ sudo yum install mysql-community-server

5) Start the MySQL service and check its status

[ec2-user@master ~]$ sudo systemctl start mysqld
[ec2-user@master ~]$ sudo systemctl status mysqld
● mysqld.service - MySQL Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-03-19 07:56:43 UTC; 1s ago
     Docs: man:mysqld(8)
           http://dev.mysql.com/doc/refman/en/using-systemd.html
  Process: 31978 ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid $MYSQLD_OPTS (code=exited, status=0/SUCCESS)
  Process: 31927 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
 Main PID: 31981 (mysqld)
   CGroup: /system.slice/mysqld.service
           └─31981 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid

Mar 19 07:56:39 master systemd[1]: Starting MySQL Server...
Mar 19 07:56:43 master systemd[1]: Started MySQL Server.

6) Look up the initial MySQL root password

[ec2-user@master ~]$ sudo grep "password" /var/log/mysqld.log
2021-03-19T07:56:41.030922Z 1 [Note] A temporary password is generated for root@localhost: v=OKXu0laSo;

7) Change the MySQL login password

Copy the temporary password found above, paste it at the password prompt, and press Enter to reach the MySQL command line.

[ec2-user@master ~]$ sudo mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 5.7.33

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> 

Change the password; here the MySQL login password is set to 1234:

mysql> set password for 'root'@'localhost'=password('1234');
ERROR 1819 (HY000): Your password does not satisfy the current policy requirements

As shown above, a new password that is too simple is rejected by the password policy.

To allow it, relax the password validation rules first:

mysql> set global validate_password_policy=0;
Query OK, 0 rows affected (0.00 sec)

mysql> set global validate_password_length=1;
Query OK, 0 rows affected (0.00 sec)

Set the password again:

mysql> set password for 'root'@'localhost'=password('1234');
Query OK, 0 rows affected, 1 warning (0.00 sec)

8) Enable remote login

Exit MySQL first, then log back in with the new password.

[ec2-user@master ~]$ mysql -uroot -p1234
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.7.33 MySQL Community Server (GPL)

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> 

Create the user:

mysql> create user 'root'@'172.%.%.%' identified by '1234';
Query OK, 0 rows affected (0.00 sec)

Allow remote connections:

mysql> grant all privileges on *.* to 'root'@'172.%.%.%' with grant option;
Query OK, 0 rows affected (0.00 sec)

Reload the privileges:

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

At this point, MySQL is installed successfully.
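
To confirm that remote access really works, you can connect from one of the other nodes. This assumes a MySQL client is installed on that node, and relies on the node's private IP matching the 172.%.%.% pattern granted above:

#Run on slave1: connect to the MySQL server on master with the new account
[ec2-user@slave1 ~]$ mysql -h master -uroot -p1234 -e "select version();"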

2. Extract Hive to the target location

[ec2-user@master ~]$ tar -zxvf hadoop/apache-hive-1.1.0-bin.tar.gz -C /usr/local/src/

3. Rename

[ec2-user@master ~]$ cd /usr/local/src/
[ec2-user@master src]$ ls
apache-hive-1.1.0-bin  hadoop  jdk
[ec2-user@master src]$ mv apache-hive-1.1.0-bin/ hive
[ec2-user@master src]$ ls
hadoop  hive  jdk

4. Add environment variables

[ec2-user@master src]$ sudo vi /etc/profile
#Append the following lines to the end of the file
export HIVE_HOME=/usr/local/src/hive
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/src/hive/lib/*
#Reload the environment variables
[ec2-user@master src]$ source /etc/profile

5. Create the hive-site.xml configuration file

[ec2-user@master src]$ cd hive/conf/
#Create the hive-site.xml file
[ec2-user@master conf]$ touch hive-site.xml
[ec2-user@master conf]$ vi hive-site.xml 

Add the following content to hive-site.xml:

<configuration>
<property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
</property>

<property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>

<property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
</property>

<property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
</property>

<property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>1234</value>
</property>
</configuration>

Note: replace the MySQL password with the one you set.

6. Edit the hive-env.sh configuration file

[ec2-user@master conf]$ pwd
/usr/local/src/hive/conf
[ec2-user@master conf]$ cp hive-env.sh.template hive-env.sh
[ec2-user@master conf]$ vi hive-env.sh
#Add the following settings
export HADOOP_HOME=/usr/local/src/hadoop
export HIVE_CONF_DIR=/usr/local/src/hive/conf

7. Add the MySQL connector JAR

Copy the MySQL JDBC driver into Hive's lib directory.

[ec2-user@master conf]$ cp /home/ec2-user/hadoop/mysql-connector-java-5.1.44-bin.jar $HIVE_HOME/lib
[ec2-user@master conf]$ ls $HIVE_HOME/lib/mysql-connector-java-5.1.44-bin.jar 
/usr/local/src/hive/lib/mysql-connector-java-5.1.44-bin.jar

8. Start the Hadoop cluster (Hive needs HDFS to store its data)

Skip this step if Hadoop is already running.

start-all.sh

9. Initialize the Hive metastore database in MySQL

[ec2-user@master conf]$ schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Metastore connection URL:	 jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 root
Starting metastore schema initialization to 1.1.0
Initialization script hive-schema-1.1.0.mysql.sql
Initialization script completed
schemaTool completed
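
A quick way to confirm the initialization is to look at the hive database that schematool just created in MySQL; it should now contain the metastore tables (DBS, TBLS, and so on):

#List the metastore tables created by schematool
[ec2-user@master conf]$ mysql -uroot -p1234 -e "use hive; show tables;"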

10. Start Hive and test it

[ec2-user@master conf]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-1.1.0.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.587 seconds, Fetched: 1 row(s)
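
Optionally, a slightly fuller smoke test is to create a throwaway table and drop it again (test_tbl is just an example name; output abridged):

hive> create table test_tbl (id int);
OK
hive> show tables;
OK
test_tbl
hive> drop table test_tbl;
OK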

At this point, Hive is installed successfully.

2.5 Installing Sqoop

1. Extract

[ec2-user@master ~]$ tar -zxvf hadoop/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/src/

2. Rename to sqoop

[ec2-user@master ~]$ cd /usr/local/src/
[ec2-user@master src]$ ls
hadoop  hive  jdk  sqoop-1.4.7.bin__hadoop-2.6.0
[ec2-user@master src]$ mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop
[ec2-user@master src]$ ls
hadoop  hive  jdk  sqoop

3. Add environment variables

[ec2-user@master src]$ sudo vi /etc/profile
#Add the following lines
export SQOOP_HOME=/usr/local/src/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
#Reload the environment variables
[ec2-user@master src]$ source /etc/profile

4. Edit the sqoop-env.sh configuration file

[ec2-user@master src]$ cd sqoop/conf/
[ec2-user@master conf]$ mv sqoop-env-template.sh sqoop-env.sh
[ec2-user@master conf]$ vi sqoop-env.sh 

Modify the following settings to match your own environment:

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/src/hadoop

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/src/hadoop

#Set the path to where bin/hive is available
export HIVE_HOME=/usr/local/src/hive

5. Copy the MySQL driver into Sqoop's lib directory

[ec2-user@master conf]$ cp /home/ec2-user/hadoop/mysql-connector-java-5.1.44-bin.jar $SQOOP_HOME/lib
[ec2-user@master conf]$ ls $SQOOP_HOME/lib/mysql-connector-java-5.1.44-bin.jar 
/usr/local/src/sqoop/lib/mysql-connector-java-5.1.44-bin.jar

6. Verify that Sqoop is configured correctly

[ec2-user@master conf]$ sqoop help
Warning: /usr/local/src/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/src/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
21/03/19 08:53:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
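
Beyond sqoop help, a more end-to-end check is to let Sqoop query the MySQL server configured earlier (the JDBC URL and credentials match the MySQL setup above; passing the password on the command line is insecure and only reasonable in a lab environment):

#List the databases on the MySQL server through Sqoop
[ec2-user@master conf]$ sqoop list-databases \
    --connect "jdbc:mysql://master:3306/?useSSL=false" \
    --username root --password 1234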