Hive Basic Operations

Hive

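Hive is a data warehouse infrastructure built on top of Hadoop. It provides a set of tools for extract, transform, and load (ETL) work, and it gives you a way to store, query, and analyze large-scale data kept in Hadoop. Hive defines a simple SQL-like query language called HQL, which lets users who know SQL query the data. The same language also lets developers who know MapReduce plug in custom mappers and reducers. These handle complex analyses that the built-in mappers and reducers cannot.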

Basic Operations

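$ hive                      # start the Hive command-line interface (CLI); this command is run in the shell
hive> show databases;       # list the databases
hive> use test;             # switch to the test database
hive> show tables;          # list the tables in the current database
hive> show tables "user*";  # list tables whose names start with user, e.g. user_tmp and user_info
hive> describe user;        # show the structure of table user: its columns and their types (desc user; is equivalent)
hive> desc extended user;   # show more detailed information about table user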

View a table's CREATE TABLE statement

	hive> SHOW CREATE TABLE pdata.dw_t01_province_transcode;

Hive Storage Architecture and HQL Syntax

1 Common Hive commands

	hive> show databases;
	hive> show tables;
	hive> create database test_database;
	hive> create table  test_inner_table (user_id int, cid string, ckid string, username string) row format delimited fields terminated by '\t' lines terminated by '\n';
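The row format delimited fields terminated by '\t' clause tells Hive's default text SerDe to split each line on tabs. The fields are converted to the declared column types only when the data is read. The Python sketch below mimics that schema-on-read step for test_inner_table. It is an illustration, not Hive code, and it assumes the usual LazySimpleSerDe behavior:
- a missing trailing field reads as NULL (None here);
- extra fields are ignored;
- \N is the NULL marker;
- a value that does not parse as int becomes NULL instead of raising an error.

```python
from typing import List, Optional, Tuple, Union

# Columns of test_inner_table as declared in the CREATE TABLE statement above.
SCHEMA: List[Tuple[str, str]] = [
    ("user_id", "int"),
    ("cid", "string"),
    ("ckid", "string"),
    ("username", "string"),
]

Value = Optional[Union[int, str]]


def convert(raw: Optional[str], col_type: str) -> Value:
    """Convert one raw text field to its column type; None stands for NULL."""
    if raw is None or raw == "\\N":  # missing field, or Hive's default NULL marker
        return None
    if col_type == "int":
        try:
            return int(raw)
        except ValueError:
            return None  # unparseable numbers read as NULL rather than failing
    return raw


def parse_line(line: str, delimiter: str = "\t") -> List[Value]:
    """Split one line on the field delimiter and apply SCHEMA column by column."""
    fields = line.rstrip("\n").split(delimiter)
    row: List[Value] = []
    for i, (_, col_type) in enumerate(SCHEMA):
        # Missing trailing field -> None; fields beyond SCHEMA are never read.
        raw = fields[i] if i < len(fields) else None
        row.append(convert(raw, col_type))
    return row


if __name__ == "__main__":
    print(parse_line("1\tc01\tk01\talice\n"))  # [1, 'c01', 'k01', 'alice']
    print(parse_line("x\tc02"))                # [None, 'c02', None, None]
```

One practical consequence: if a data file is typed with spaces instead of tabs, each whole line lands in user_id and fails the int conversion. Every column of that row then reads as NULL, and parse_line("1 c01 k01 alice") returns [None, None, None, None].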

2 First, create a file load.txt in the /tmp/ directory

	$ sudo nano /tmp/load.txt
	hive> load data local inpath '/tmp/load.txt' into table test_external_table;
	hive> select * from test_external_table;
	hive> select count(*) from test_external_table;
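On the classic MapReduce engine, a query such as select count(*) is compiled into a two-phase job. Each map task counts the rows of its input split, and a single reducer adds up the partial counts. The sketch below models that plan in plain Python. It is a simplified mental model, not Hive's implementation: the splits and row strings are hypothetical, and scheduling, file formats, and map-side aggregation details are left out.

```python
from typing import Iterable, List


def map_count(split: Iterable[str]) -> int:
    """Map phase: count the rows in one input split."""
    return sum(1 for _ in split)


def reduce_sum(partial_counts: Iterable[int]) -> int:
    """Reduce phase: one reducer adds up the per-split partial counts."""
    return sum(partial_counts)


def count_star(splits: List[List[str]]) -> int:
    """Model SELECT count(*) over a table whose data is divided into splits."""
    return reduce_sum(map_count(split) for split in splits)


if __name__ == "__main__":
    # Hypothetical rows of load.txt, divided into two input splits.
    splits = [["1\tc01\tk01\talice", "2\tc02\tk02\tbob"], ["3\tc03\tk03\tcarol"]]
    print(count_star(splits))  # 3
```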

3 A simple partitioned-table example:

	hive> create table logs(ts bigint,line string) partitioned by(dt string,country string) row format delimited fields terminated by '$' lines terminated by '\n';
	
	  Contents of data.txt:
	hadoop@hadoopmaster:/tmp$ more data.txt
	1$1
	2$3

4 Load data:

	hive> load data local inpath '/tmp/data.txt' into table logs partition(dt='2015-01-01',country='zh');
	hive> load data local inpath '/tmp/data.txt' into table logs partition(dt='2015-04-05',country='jp');
	hive> load data local inpath '/tmp/data.txt' into table logs partition(dt='2015-04-05',country='zh');
	
	  Query the data:
	0: jdbc:hive2://localhost:10000/default> select * from logs;
	OK
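Each LOAD ... PARTITION(...) statement above copies data.txt into a separate partition directory. The directory path encodes the partition values. A select * returns the data columns from the file, followed by the partition columns dt and country taken from that path.

The sketch below models this with an in-memory dict. It assumes:
- the default warehouse directory /user/hive/warehouse;
- a table in the default database;
- well-formed lines with exactly two '$'-separated fields.

The row order it prints is not something Hive guarantees.

```python
from typing import Dict, List, Tuple

WAREHOUSE = "/user/hive/warehouse"  # assumed default warehouse directory
Spec = List[Tuple[str, str]]          # ordered (partition column, value) pairs


def partition_dir(table: str, spec: Spec) -> str:
    """Directory of one partition, e.g. <warehouse>/logs/dt=2015-01-01/country=zh."""
    return f"{WAREHOUSE}/{table}/" + "/".join(f"{col}={val}" for col, val in spec)


def load_partition(store: Dict[str, List[str]], table: str, spec: Spec, lines: List[str]) -> None:
    """Model LOAD DATA ... PARTITION(...): add the file's lines to that partition's directory."""
    store.setdefault(partition_dir(table, spec), []).extend(lines)


def select_all(store: Dict[str, List[str]], table: str, delimiter: str = "$") -> List[tuple]:
    """Model SELECT *: data columns (ts, line) first, then partition values parsed from the path."""
    prefix = f"{WAREHOUSE}/{table}/"
    rows = []
    for path in sorted(store):
        if not path.startswith(prefix):
            continue
        part_values = [kv.split("=", 1)[1] for kv in path[len(prefix):].split("/")]
        for text in store[path]:
            fields = text.split(delimiter)  # assumes exactly two fields per line
            rows.append((int(fields[0]), fields[1], *part_values))
    return rows


if __name__ == "__main__":
    data = ["1$1", "2$3"]  # contents of /tmp/data.txt
    store: Dict[str, List[str]] = {}
    for dt, country in [("2015-01-01", "zh"), ("2015-04-05", "jp"), ("2015-04-05", "zh")]:
        load_partition(store, "logs", [("dt", dt), ("country", country)], data)
    print(partition_dir("logs", [("dt", "2015-01-01"), ("country", "zh")]))
    for row in select_all(store, "logs"):  # 2 lines x 3 partitions = 6 rows
        print(row)
```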

5 Show all tables

	hive> show tables;

6 Show tables matching a regular expression

	hive> show tables '.*s';
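The pattern rules for SHOW TABLES differ between Hive versions:
- The Hive Getting Started guide describes '.*s' as a Java regular expression that lists the tables ending in s.
- The later DDL documentation says only '*' (any characters) and '|' (alternatives) act as wildcards.

The Python sketch below assumes the metastore-style rule: split the pattern on '|', rewrite '*' as '.*' in each alternative, and match the whole table name case-insensitively. Under that assumption both the '.*s' pattern here and a prefix pattern such as "user*" behave as described. Treat it as an approximation, not Hive's implementation. The table names are hypothetical.

```python
import re
from typing import Iterable, List


def show_tables(pattern: str, tables: Iterable[str]) -> List[str]:
    """Approximate SHOW TABLES '<pattern>' under the assumed metastore rules."""
    regexes = [
        # '*' becomes '.*'; any other regex syntax (such as '.') passes through unchanged.
        re.compile(alternative.replace("*", ".*"), re.IGNORECASE)
        for alternative in pattern.strip().split("|")
    ]
    # A table is listed if its whole name matches at least one alternative.
    return sorted(t for t in tables if any(rx.fullmatch(t) for rx in regexes))


if __name__ == "__main__":
    tables = ["user_tmp", "user_info", "logs", "pokes", "invites", "test_inner_table"]
    print(show_tables("user*", tables))  # ['user_info', 'user_tmp']
    print(show_tables(".*s", tables))    # ['invites', 'logs', 'pokes']
```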

7 Add a column to a table

	hive> ALTER TABLE pokes ADD COLUMNS (new_col INT);
	OK
	Time taken: 0.238 seconds
	hive> desc pokes;
	OK
	foo                 	int                 	                    
	bar                 	string              	                    
	new_col             	int                 	                    
	Time taken: 0.275 seconds, Fetched: 3 row(s)
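ALTER TABLE ... ADD COLUMNS only changes the table definition stored in the metastore. The data files already in the table's directory are not rewritten. Because Hive applies the schema when it reads, rows stored before the change have no value for new_col and come back as NULL. The sketch below illustrates that schema-on-read effect in Python. It assumes pokes is a plain text table using Hive's default Ctrl-A (\x01) field delimiter, and the sample line is hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

DEFAULT_DELIM = "\x01"  # Hive's default field delimiter (Ctrl-A)


def read_row(line: str, schema: List[Tuple[str, str]],
             delim: str = DEFAULT_DELIM) -> Dict[str, Optional[Union[int, str]]]:
    """Apply the table's current schema to a stored line; absent fields read as NULL (None)."""
    fields = line.split(delim)
    row: Dict[str, Optional[Union[int, str]]] = {}
    for i, (name, col_type) in enumerate(schema):
        raw = fields[i] if i < len(fields) else None
        if raw is None:
            row[name] = None
        elif col_type == "int":
            row[name] = int(raw)  # sketch: assumes stored ints are well formed
        else:
            row[name] = raw
    return row


if __name__ == "__main__":
    stored = "238\x01val_238"  # hypothetical line written before the ALTER TABLE
    schema = [("foo", "int"), ("bar", "string")]
    print(read_row(stored, schema))    # {'foo': 238, 'bar': 'val_238'}
    schema.append(("new_col", "int"))  # metadata-only change, as in ADD COLUMNS (new_col INT)
    print(read_row(stored, schema))    # {'foo': 238, 'bar': 'val_238', 'new_col': None}
```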

8 Add a column to a table together with a column comment

	hive> ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');
	OK
	Time taken: 0.151 seconds
	hive> desc invites;
	OK

9 Rename a table

	hive> ALTER TABLE invites RENAME TO 3koobecaf;
	OK
	Time taken: 0.189 seconds
	hive> show tables;
	OK

10 Drop a table

	hive> DROP TABLE pokes;

11 Create a database

	hive> CREATE DATABASE chu888chu888;
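When no LOCATION is given, CREATE DATABASE creates a directory named <database>.db under the warehouse directory. That directory is set by hive.metastore.warehouse.dir, commonly /user/hive/warehouse. Each managed table gets a subdirectory named after the table, and tables in the default database sit directly under the warehouse directory. The Python sketch below only computes these default paths. It assumes the common warehouse setting and does not contact HDFS or the metastore.

```python
WAREHOUSE_DIR = "/user/hive/warehouse"  # assumed value of hive.metastore.warehouse.dir


def database_dir(db: str) -> str:
    """Default directory of a database created without a LOCATION clause."""
    db = db.lower()  # Hive stores database names in lower case
    return WAREHOUSE_DIR if db == "default" else f"{WAREHOUSE_DIR}/{db}.db"


def table_dir(db: str, table: str) -> str:
    """Default directory of a managed table inside its database directory."""
    return f"{database_dir(db)}/{table.lower()}"


if __name__ == "__main__":
    print(database_dir("chu888chu888"))  # /user/hive/warehouse/chu888chu888.db
    print(table_dir("default", "pokes"))  # /user/hive/warehouse/pokes
    print(table_dir("test_database", "test_inner_table"))
```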


posted @ 2018-04-20 09:43 银河统计


哈尔滨商业大学银河统计工作室 (Galaxy Statistics Studio, Harbin University of Commerce)
