hive

1. MapReduce

In a Mapper, setup() runs once before the first call to map().
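
The ordering can be pictured with a small simulation. The sketch below is not Hadoop code: SketchMapper and run_task are hypothetical names that only mimic how a map task calls setup() once, then map() for each record, then cleanup().

```python
class SketchMapper:
    """Hypothetical stand-in for a Hadoop Mapper; it only records call order."""

    def __init__(self):
        self.calls = []

    def setup(self):
        # Called once per task, before any record is processed.
        self.calls.append("setup")

    def map(self, key, value):
        # Called once for every input record.
        self.calls.append(f"map({key})")

    def cleanup(self):
        # Called once per task, after the last record.
        self.calls.append("cleanup")


def run_task(mapper, records):
    """Mimic the task driver loop: setup, map per record, cleanup."""
    mapper.setup()
    for key, value in records:
        mapper.map(key, value)
    mapper.cleanup()
    return mapper.calls


calls = run_task(SketchMapper(), [(0, "a,b"), (4, "c,d")])
print(calls)
```

Because setup() runs only once, it is the usual place for one-time work such as reading configuration, rather than repeating that work for every record in map().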

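2. Hive:
Hive is a data warehouse.
Hive's components include an interpreter, a compiler, an optimizer, and so on.
When Hive runs, its metadata is stored in a relational database.

??? netstat -nplt | grep 3306 _________
When MySQL is running, it listens on local port 3306.
??? MySQL's grant command?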

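3. Creating a table in Hive:
create table t_emp(
id int,
name string,
-- like array<string>,
-- tedian map<string,string>,
age int,
dept_name string
)
row format delimited
-- fields are separated by ','
fields terminated by ','
-- collection items terminated by '_'   (elements of a collection are separated by '_')
-- map keys terminated by ':'   (map keys and values are separated by ':'; for example, tedian data can look like sex:male_color:red)
-- stored as text in HDFS by default, so the next clause is optional
stored as textfile;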
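Notes:
1. For how to create tables, refer to the official documentation. Imported data must match the format declared for the table.
2. load data local inpath '/root/emp.txt' into table t_emp imports the local file into the Hive table t_emp. In effect the file is also stored in HDFS, under the path /user/hive/warehouse/t_emp/emp.txt.
3. Hive's data actually lives in HDFS. When data is read, the HiveQL script is translated into MapReduce jobs and run.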
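
To make the delimiter rules concrete, here is a minimal Python sketch of how one line of emp.txt could be split, assuming the two commented-out columns (like, tedian) are enabled. The function parse_emp_line and the sample line are made up for illustration; Hive does this parsing internally.

```python
def parse_emp_line(line):
    """Split one text row using ',' for fields, '_' for items, ':' for map keys."""
    id_, name, like, tedian, age, dept_name = line.split(",")
    return {
        "id": int(id_),
        "name": name,
        # array<string>: items separated by '_'
        "like": like.split("_"),
        # map<string,string>: entries separated by '_', key and value by ':'
        "tedian": dict(entry.split(":", 1) for entry in tedian.split("_")),
        "age": int(age),
        "dept_name": dept_name,
    }


row = parse_emp_line("1,Tom,reading_music,sex:male_color:red,30,Sales")
print(row["like"], row["tedian"])
```

A line whose layout does not match the declared format (for example a missing field) fails in this sketch, which mirrors why the data file has to agree with the table definition.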

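4. When files that are already in HDFS need to be queried through Hive, Hive does not copy the data. It moves the data file, so the file's path changes. If your own program also needs the file, decide in advance which files Hive will use, load those files directly into Hive, and have the program read them from the Hive path.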

5. desc table_name; shows the table structure

6. Hive DML (data manipulation language):
First create a table: create table dept_count(num int) partitioned by (dname string)
insert: insert into table dept_count partition (dname='销售部') select count(1) from t_emp where dept_name='销售部' group by dept_name; ('销售部' is the Sales department)
import/export: export table t_emp to '/user/input/emp.txt' exports the table's data to the specified path
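
As a rough model of what the insert above computes, this Python sketch counts the rows for one department and builds the partition directory name in the column=value form that Hive uses. The sample rows, the helper names, the English department names, and the warehouse path are all assumptions for illustration.

```python
import posixpath

# Hypothetical sample of t_emp rows: (id, name, age, dept_name)
T_EMP = [
    (1, "Tom", 30, "Sales"),
    (2, "Ann", 25, "Sales"),
    (3, "Bob", 41, "Finance"),
]

WAREHOUSE = "/user/hive/warehouse"  # assumed default warehouse directory


def count_for_dept(rows, dept):
    """Roughly: select count(1) from t_emp where dept_name = dept."""
    return sum(1 for row in rows if row[3] == dept)


def partition_dir(table, column, value):
    """Partitions are stored as column=value subdirectories of the table."""
    return posixpath.join(WAREHOUSE, table, f"{column}={value}")


num = count_for_dept(T_EMP, "Sales")
target = partition_dir("dept_count", "dname", "Sales")
print(num, target)
```

The partition column dname is not stored inside the data file; its value is carried by the directory name, which is why the insert has to name the partition explicitly.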

7. HiveServer2
There are three ways to execute an HQL script:
1. hive -e 'hql'
2. hive -f 'hql.file'
3. Run the script from code through Hive JDBC
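
The first two options are plain command lines, so a program can also launch them as subprocesses. The sketch below only builds the argument lists and does not run anything. It assumes a hive executable on the PATH, and the helper names and the sample query are hypothetical.

```python
import shlex


def hive_inline_cmd(hql):
    """Argument list for: hive -e '<hql>' (run a query string)."""
    return ["hive", "-e", hql]


def hive_file_cmd(path):
    """Argument list for: hive -f <path> (run a script file)."""
    return ["hive", "-f", path]


inline = hive_inline_cmd("select count(1) from t_emp")
from_file = hive_file_cmd("hql.file")

# shlex.join shows each command as it would be typed in a shell.
print(shlex.join(inline))
print(shlex.join(from_file))

# To execute for real (requires Hive to be installed):
#   import subprocess
#   subprocess.run(inline, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the HQL text itself contains quotes or parentheses.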
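8. How to execute a script from code through Hive JDBC
9. Hive has two kinds of functions, UDF and UDAF:
1. UDF: takes a single row as input and produces a single row as output.
2. UDAF: takes multiple rows as input and produces a single value as output, like the count aggregate function.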
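
The difference is easiest to see side by side. This Python sketch is only an analogy, since real Hive UDFs and UDAFs are typically written in Java against Hive's own interfaces. Here upper_udf plays the role of a UDF and count_udaf the role of a UDAF; both names are hypothetical.

```python
def upper_udf(value):
    """UDF-like: one input value in, one output value out."""
    return value.upper()


def count_udaf(values):
    """UDAF-like: many input rows in, a single aggregated value out."""
    total = 0
    for _ in values:
        total += 1
    return total


names = ["tom", "ann", "bob"]

per_row = [upper_udf(n) for n in names]  # one result per input row
aggregated = count_udaf(names)           # one result for all rows

print(per_row, aggregated)
```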

posted @ 2018-01-04 11:47 颜子
