hive learing_1
1、什么是hive
hive是基于Hadoop的一种数据库仓库工具,将结构化数据映射为一张数据表,并提供SQL查询,分析等功能
2、hive的特点
- 将数据结构(shema)存储在数据库中,数据存储在HDFS中
- 适用于联机分析处理(OLAP:on-line analytic process)
- 支持HQL查询
3、hive的缺点
- 不支持关系数据库
- 不支持联机事务处理(OLTP:on-line transaction process):如银行系统
- 不支持实时查询和行级更新
4、hive的安装配置
- 下载安装包,解压到需要安装的目录
- 配置环境变量 vi /etc/profile,并立即生效 source /etc/profile
- 修改配置文件
-
修改hive-site.xml
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>True</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.109.150:3306/hive_metastore?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.metastore.ds.connection.url.hook</name>
<value/>
<description>Name of the hook to use for retrieving the JDO connection URL. If empty, the value in javax.jdo.option.ConnectionURL is used</description>
</property>
-
将${system的地方改成具体的地址
-
将MySQL的驱动包mysql-connector-java-5.1.46.jar复制到/soft/hive/lib下面
-
启动hive,初始化元数据
[xiaoqiu@s150 /soft/hive/conf]$ schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://192.168.109.150:3306/hive_metastore?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
-
创建数据库
hive> create database incubator;
OK
Time taken: 0.338 seconds
hive> show databases;
OK
default
incubator
mydb
Time taken: 0.014 seconds, Fetched: 3 row(s)
-
查看HDFS(可见hive创建的数据库直接同步于HDFS的hive数据仓库中了)
[xiaoqiu@s150 /soft/hive/conf]$ hadoop fs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwx-wx-wx - xiaoqiu supergroup 0 2018-07-02 02:21 /tmp
drwx-wx-wx - xiaoqiu supergroup 0 2018-07-02 10:20 /tmp/hive
drwx------ - root supergroup 0 2018-07-02 10:41 /tmp/hive/root
drwx------ - root supergroup 0 2018-07-02 10:41 /tmp/hive/root/f5b7d613-fb3b-4125-b2d7-ea643b199c2f
drwx------ - root supergroup 0 2018-07-02 10:41 /tmp/hive/root/f5b7d613-fb3b-4125-b2d7-ea643b199c2f/_tmp_space.db
drwx------ - xiaoqiu supergroup 0 2018-07-04 10:58 /tmp/hive/xiaoqiu
drwx------ - xiaoqiu supergroup 0 2018-07-04 10:58 /tmp/hive/xiaoqiu/9f4397b7-db00-4ad5-b5de-cc3fd0d72eb6
drwx------ - xiaoqiu supergroup 0 2018-07-04 10:58 /tmp/hive/xiaoqiu/9f4397b7-db00-4ad5-b5de-cc3fd0d72eb6/_tmp_space.db
drwx------ - xiaoqiu supergroup 0 2018-07-02 17:19 /tmp/hive/xiaoqiu/ae6a90bd-558d-482c-b408-dc0273bce93e
drwx------ - xiaoqiu supergroup 0 2018-07-02 17:19 /tmp/hive/xiaoqiu/ae6a90bd-558d-482c-b408-dc0273bce93e/_tmp_space.db
drwxr-xr-x - xiaoqiu supergroup 0 2018-07-04 11:32 /user
drwxr-xr-x - xiaoqiu supergroup 0 2018-07-04 11:32 /user/hive
drwxr-xr-x - xiaoqiu supergroup 0 2018-07-04 11:34 /user/hive/warehouse
drwxr-xr-x - xiaoqiu supergroup 0 2018-07-04 11:34 /user/hive/warehouse/incubator.db
drwxr-xr-x - xiaoqiu supergroup 0 2018-07-04 11:32 /user/hive/warehouse/mydb.db
-
创建table
hive> create table t(id int,name string,age int);
OK
Time taken: 1.641 seconds
hive> show tables;
OK
t
Time taken: 1.312 seconds, Fetched: 1 row(s)
-
在MySQL中查看(其中TBL_TYPE为MANAGED_TABLE即托管表的意思,也就是删除表的时候,表的数据也会被删除掉,还有一种类型的表为EXTERNAL即外部表,删除表的时候不会删除数据)
MariaDB [hive_metastore]> select * from TBLS;
+--------+-------------+-------+------------------+---------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+---------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
| 1 | 1530721536 | 3 | 0 | xiaoqiu | 0 | 1 | t | MANAGED_TABLE | NULL | NULL | |
+--------+-------------+-------+------------------+---------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
1 row in set (0.00 sec)
-
这个时候还可以在HDFS查看表
[xiaoqiu@s150 /soft/hive/conf]$ hadoop fs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwx-wx-wx - xiaoqiu supergroup 0 2018-07-02 02:21 /tmp
drwx-wx-wx - xiaoqiu supergroup 0 2018-07-02 10:20 /tmp/hive
drwx------ - root supergroup 0 2018-07-02 10:41 /tmp/hive/root
drwx------ - root supergroup 0 2018-07-02 10:41 /tmp/hive/root/f5b7d613-fb3b-4125-b2d7-ea643b199c2f
drwx------ - root supergroup 0 2018-07-02 10:41 /tmp/hive/root/f5b7d613-fb3b-4125-b2d7-ea643b199c2f/_tmp_space.db
drwx------ - xiaoqiu supergroup 0 2018-07-04 10:58 /tmp/hive/xiaoqiu
drwx------ - xiaoqiu supergroup 0 2018-07-04 10:58 /tmp/hive/xiaoqiu/9f4397b7-db00-4ad5-b5de-cc3fd0d72eb6
drwx------ - xiaoqiu supergroup 0 2018-07-04 10:58 /tmp/hive/xiaoqiu/9f4397b7-db00-4ad5-b5de-cc3fd0d72eb6/_tmp_space.db
drwx------ - xiaoqiu supergroup 0 2018-07-02 17:19 /tmp/hive/xiaoqiu/ae6a90bd-558d-482c-b408-dc0273bce93e
drwx------ - xiaoqiu supergroup 0 2018-07-02 17:19 /tmp/hive/xiaoqiu/ae6a90bd-558d-482c-b408-dc0273bce93e/_tmp_space.db
drwxr-xr-x - xiaoqiu supergroup 0 2018-07-04 11:32 /user
drwxr-xr-x - xiaoqiu supergroup 0 2018-07-04 11:32 /user/hive
drwxr-xr-x - xiaoqiu supergroup 0 2018-07-04 11:34 /user/hive/warehouse
drwxr-xr-x - xiaoqiu supergroup 0 2018-07-04 12:25 /user/hive/warehouse/incubator.db
drwxr-xr-x - xiaoqiu supergroup 0 2018-07-04 12:25 /user/hive/warehouse/incubator.db/t
drwxr-xr-x - xiaoqiu supergroup 0 2018-07-04 11:32 /user/hive/warehouse/mydb.db
-
向表t中插入数据(直接在hive中使用insert into语句进行插入,会先调用mapreduce)
欢迎关注我的公众号:小秋的博客
CSDN博客:https://blog.csdn.net/xiaoqiu_cr
github:https://github.com/crr121
联系邮箱:rongchen633@gmail.com
有什么问题可以给我留言噢~