Learning Hive (Beginner)

Hive data types

tinyint
smallint
int
bigint
string //literals in single or double quotes
varchar //length 1 to 65535
char //max length 255
timestamp //format "YYYY-MM-DD HH:MM:SS.fffffffff"
date //format "YYYY-MM-DD"
decimal
UNIONTYPE //union type
NULL
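The four integer types above are signed two's-complement integers of 1, 2, 4 and 8 bytes (tinyint, smallint, int, bigint). As a quick check on the ranges this implies, here is a small Python sketch; the helper name int_range is purely illustrative and not part of Hive.

```python
# Byte widths of Hive's signed integer types (two's complement)
INT_TYPES = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}


def int_range(num_bytes):
    """Return (min, max) of a signed two's-complement integer of num_bytes bytes."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, size in INT_TYPES.items():
    lo, hi = int_range(size)
    print(f"{name:9} {size} bytes  {lo} .. {hi}")
```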

Basic Hive operations

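create database if not exists dbname; #create a database
alter database dbname set dbproperties('edited-by'='joe'); #modify a database's properties (existing properties cannot be deleted or "reset")
describe database extended dbname; #describe a database, including its extended properties
drop database [if exists] dbname; #drop a database (fails if it still contains tables)
desc database extended dbname; #same as above: desc is shorthand for describe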

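hive>create external table dat0204(filmname string, filmdate date, filmscore string)
>comment 'table comment'
>row format delimited
>fields terminated by '\t'
>lines terminated by '\n'
>stored as textfile; #create an external table. External tables are relatively safer (dropping one removes only its metadata, not the data files), organize data more flexibly, and make it easy to share the source data.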

hive>create table if not exists test1(id int, name string, age int)
>comment 'table comment'
>row format delimited
>fields terminated by '\t'
>lines terminated by '\n'
>stored as textfile; #create an internal (managed) table
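The practical difference between the external and internal tables above shows up on DROP TABLE: Hive always removes the table's metadata, but deletes the data directory only for an internal (managed) table. The Python sketch below models that rule with a hypothetical ToyMetastore class, with local temporary directories standing in for HDFS; it illustrates the behavior and is not how Hive is implemented.

```python
import os
import shutil
import tempfile


class ToyMetastore:
    """Hypothetical model: table name -> (data directory, is_external). Not Hive's API."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name, location, external):
        os.makedirs(location, exist_ok=True)
        self.tables[name] = (location, external)

    def drop_table(self, name):
        location, external = self.tables.pop(name)  # metadata is always removed
        if not external:
            shutil.rmtree(location)  # managed table: its data directory is deleted too


root = tempfile.mkdtemp()  # stands in for the HDFS warehouse
ms = ToyMetastore()
ext_dir = os.path.join(root, "ext_table")
managed_dir = os.path.join(root, "managed_table")
ms.create_table("ext_table", ext_dir, external=True)
ms.create_table("managed_table", managed_dir, external=False)
for d in (ext_dir, managed_dir):
    with open(os.path.join(d, "data.txt"), "w") as f:
        f.write("1\tzhangsan\t20\n")

ms.drop_table("ext_table")
ms.drop_table("managed_table")
print(os.path.exists(os.path.join(ext_dir, "data.txt")))  # True: external data survives
print(os.path.exists(managed_dir))                        # False: managed data is gone
```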

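desc tablename; #describe the table (column names and types)
desc formatted tablename; #show the table's detailed structure in a readable layout
desc extended tablename; #show the table's extended information
select * from tablename; #query the table's data
create table db1.table1 like db2.table2; #copy a table's structure only (no data); use create table ... as select ... to copy the data as well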

alter table hive1.test2 add partition(province='hebei',city='baoding'); #add a partition

show partitions hive1.test2; #list the table's partitions

insert overwrite table test2 partition(province='hebei',city='shijiazhuang') select id , name , age from test1; #write query results into a partition (replacing any existing data in it)
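Each partition is stored as its own subdirectory named key=value under the table's directory, nested in the order the partition columns are declared. A minimal Python sketch of the default layout, assuming the default warehouse directory /user/hive/warehouse (hive.metastore.warehouse.dir), that test2 lives in database hive1 as in the add partition example, and partition values that need no escaping; partition_path is a hypothetical helper.

```python
def partition_path(db, table, spec, warehouse="/user/hive/warehouse"):
    """Build the default HDFS directory of one partition of db.table (illustrative)."""
    # Tables in the default database sit directly under the warehouse directory;
    # other databases get their own <name>.db subdirectory.
    db_dir = warehouse if db == "default" else f"{warehouse}/{db}.db"
    # One key=value directory level per partition column, in declaration order
    parts = "/".join(f"{key}={value}" for key, value in spec)
    return f"{db_dir}/{table}/{parts}"


path = partition_path("hive1", "test2", [("province", "hebei"), ("city", "shijiazhuang")])
print(path)  # /user/hive/warehouse/hive1.db/test2/province=hebei/city=shijiazhuang
```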

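drop table tablename; #drop a table, empty or not (for an internal table the data is deleted too; for an external table only the metadata)
drop database dbname cascade; #drop a non-empty database together with its tables (cascade belongs to drop database, not drop table)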

show tables like '*name*'; #find tables whose names match a wildcard pattern

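Loading data (into HDFS)

hive>load data local inpath 'path/filename' overwrite into table tablename; #load a local file into a Hive table (the file is copied; overwrite replaces the existing data)

hive>load data inpath 'path/filename' into table tablename; #load a file already on HDFS into a Hive table (the file is moved into the table's directory; existing data is kept)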
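The two load statements differ in what happens to the source file: with LOCAL the file is copied from the local file system, without it a file already on HDFS is moved into the table's directory, and OVERWRITE first clears the table's existing files. The Python sketch below models these rules with local temporary directories in place of HDFS; load_data and all file names are hypothetical, and Hive's renaming of clashing file names is not modeled.

```python
import os
import shutil
import tempfile


def load_data(src_file, table_dir, local, overwrite):
    """Toy model of LOAD DATA [LOCAL] INPATH ... [OVERWRITE] INTO TABLE (not Hive's code)."""
    os.makedirs(table_dir, exist_ok=True)
    if overwrite:
        # OVERWRITE: existing files in the table directory are removed first
        for name in os.listdir(table_dir):
            os.remove(os.path.join(table_dir, name))
    dest = os.path.join(table_dir, os.path.basename(src_file))
    if local:
        shutil.copy(src_file, dest)  # LOCAL: the source file is copied
    else:
        shutil.move(src_file, dest)  # no LOCAL: the HDFS file is moved
    return dest


root = tempfile.mkdtemp()
table_dir = os.path.join(root, "warehouse", "t")
os.makedirs(table_dir)
local_file = os.path.join(root, "local.txt")
hdfs_file = os.path.join(root, "staging.txt")
for path in (os.path.join(table_dir, "old.txt"), local_file, hdfs_file):
    with open(path, "w") as f:
        f.write("1\tzhangsan\t20\n")

load_data(local_file, table_dir, local=True, overwrite=True)   # removes old.txt, copies
load_data(hdfs_file, table_dir, local=False, overwrite=False)  # appends, moves
print(os.path.exists(local_file))     # True: copied
print(os.path.exists(hdfs_file))      # False: moved
print(sorted(os.listdir(table_dir)))  # ['local.txt', 'staging.txt']
```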

hive> insert overwrite directory "hdfs_dir" select user, login_time from user_login; #write query results to an HDFS directory

$ hive -e "SQL statement" > /tmp/out.txt #run a query from the shell and save its output to a local file

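Hive CLI commands

hive>dfs -ls -R /; //list HDFS files recursively: path/database/table/file (-lsr is the deprecated form)
hive>dfs -rm -r /dir; //dfs command: delete a directory (-rmr is the deprecated form)
hive>!clear; //run a shell command from within Hive
hive>!hdfs dfs -ls -R /; //run an HDFS command through the shell from within Hive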

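Metadata is stored in MySQL (the metastore database; out of the box Hive uses embedded Derby instead)

use metastore_db_name; #switch to the MySQL database that holds Hive's metadata (use the name configured for your metastore)

select * from VERSION; #check the Hive (metastore schema) version
select * from TBLS \G; #list all tables; \G prints one field per line, which makes each table easy to tell apart
select * from SDS \G; #view storage metadata, including the HDFS directory of each table
select * from PARTITIONS where TBL_ID=1 \G; #view the partitions of one table
select * from COLUMNS_V2; #view table columns
select * from PARTITION_KEYS; #view the partition keys of tables
select * from DBS; #view the databases in the warehouse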
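Most useful metastore queries join these tables: TBLS.DB_ID points to DBS and TBLS.SD_ID points to SDS, whose LOCATION column holds the table's HDFS directory. The sketch below rebuilds a heavily simplified subset of those tables in an in-memory SQLite database so the join can be tried without a metastore; the column subset, the sample rows and the namenode:8020 address are assumptions, and the real schema varies between Hive versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified subset of the metastore schema (assumption: only a few real columns kept)
conn.executescript("""
CREATE TABLE DBS  (DB_ID INTEGER PRIMARY KEY, NAME TEXT, DB_LOCATION_URI TEXT);
CREATE TABLE SDS  (SD_ID INTEGER PRIMARY KEY, LOCATION TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, SD_ID INTEGER,
                   TBL_NAME TEXT, TBL_TYPE TEXT);
INSERT INTO DBS  VALUES (1, 'hive1', 'hdfs://namenode:8020/user/hive/warehouse/hive1.db');
INSERT INTO SDS  VALUES (1, 'hdfs://namenode:8020/user/hive/warehouse/hive1.db/test2');
INSERT INTO TBLS VALUES (1, 1, 1, 'test2', 'MANAGED_TABLE');
""")

# Table -> owning database, table type and HDFS location
rows = conn.execute("""
    SELECT d.NAME, t.TBL_NAME, t.TBL_TYPE, s.LOCATION
    FROM TBLS t
    JOIN DBS d ON t.DB_ID = d.DB_ID
    JOIN SDS s ON t.SD_ID = s.SD_ID
""").fetchall()
for row in rows:
    print(row)
```

The SELECT ... JOIN part is plain SQL, so the same join should also work in the MySQL client against a real metastore, subject to that version's column names.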

Tuning
1. explain: show the execution plan
explain select count(*) from test2;
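EXPLAIN prints the stages Hive compiles a query into. Roughly speaking, with the MapReduce engine a count(*) query becomes a map stage, in which each mapper counts the rows of its input split, and a reduce stage that adds up the partial counts. A minimal Python sketch of that two-phase plan, with hypothetical in-memory splits standing in for HDFS blocks:

```python
from itertools import groupby


def map_phase(split):
    """Mapper: count the rows of one input split and emit a single partial count."""
    return [("count", sum(1 for _ in split))]


def reduce_phase(pairs):
    """Shuffle/sort by key, then sum the partial counts of each key."""
    pairs = sorted(pairs)
    return {key: sum(v for _, v in group)
            for key, group in groupby(pairs, key=lambda kv: kv[0])}


# Hypothetical table data split into two input splits
splits = [["row1", "row2", "row3"], ["row4", "row5"]]
mapped = [pair for split in splits for pair in map_phase(split)]
print(mapped)                # [('count', 3), ('count', 2)]
print(reduce_phase(mapped))  # {'count': 5}
```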

Hive's origins and uses

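1. Origin: open-sourced by Facebook to compute statistics over massive volumes of structured logs.
2. Architecture: Hive is a data warehouse tool built on Hadoop. It maps structured data files to tables and provides SQL-like querying
(HQL as the query interface; HDFS for storage; MapReduce for computation).

In essence, Hive translates HQL into MapReduce programs.
3. Use: suited to offline (batch) data processing.
4. Schema: the schema (metadata) is stored in a database.
5. Databases and tables are directories (paths in HDFS).

6. Hive does not validate data on write; it validates on read.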
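Point 6 is often called schema on read: loading a file just places it in the table's directory, and the table's schema is applied only when the data is queried, so fields that cannot be converted to the column type, or are missing, come back as NULL. The Python sketch below mimics this for a tab-delimited table with columns (id int, name string, age int); read_row and the sample data are hypothetical, and None stands in for NULL.

```python
# Hypothetical table schema: (column name, Python converter)
SCHEMA = [("id", int), ("name", str), ("age", int)]


def read_row(line, schema=SCHEMA, sep="\t"):
    """Apply the schema to one raw text line at read time; bad or missing values -> None."""
    fields = line.rstrip("\n").split(sep)
    row = {}
    for i, (col, convert) in enumerate(schema):
        raw = fields[i] if i < len(fields) else None  # missing trailing field -> NULL
        try:
            row[col] = convert(raw) if raw is not None else None
        except ValueError:
            row[col] = None  # value that cannot be converted -> NULL
    return row


# Nothing checked these lines when the "file" was written
raw_file = "1\tzhangsan\t20\nabc\tlisi\t\n3\twangwu\n"
rows = [read_row(line) for line in raw_file.splitlines()]
for row in rows:
    print(row)
```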

posted @ 2017-06-13 10:23 兰昌
