Hive - Post category - tonglin0325

Hive study notes: metastore listener

Summary: Besides using Hive hooks to record user operations in Hive, you can also record them with a Hive metastore listener. Reference: https://towardsdatascience.com/apache-hive-hooks-and-metastore-listeners-a-tale-of- …

posted @ 2021-12-26 22:03 tonglin0325

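Hive study notes: fetch

Summary: An article by Meituan-Dianping describes how HiveSQL is converted into MapReduce: 1. Antlr defines the SQL grammar rules and performs lexical and syntactic analysis, turning the SQL into an abstract syntax tree (AST Tree); 2. the AST Tree is traversed to extract QueryBlocks, the basic building units of a query; 3. the QueryBlocks are traversed and translated into the execution operator tree Ope …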

posted @ 2021-01-21 00:19 tonglin0325

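Hive study notes: SerDe

Summary: SerDe is short for Serializer and Deserializer; it is how Hive interacts with different data formats. Amazon Athena can be thought of as Amazon's counterpart to Hive, and its documentation introduces SerDe as follows: https://docs.aws.amazon.com/zh_cn/athena/ …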

posted @ 2020-11-17 11:04 tonglin0325
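
To make the serializer/deserializer idea concrete, here is a minimal Python sketch of what a text SerDe does conceptually, assuming tab-delimited rows with the schema `id int, name string` and Hive's default `\N` text marker for NULL. The class `TabDelimitedSerDe` is hypothetical; real Hive SerDes are Java classes, and this only mirrors the read (deserialize) and write (serialize) directions.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

NULL_MARKER = "\\N"  # assumed: Hive's default text representation of NULL


class TabDelimitedSerDe:
    """Hypothetical toy SerDe for tab-delimited text rows."""

    # Hive-style type name -> Python converter (assumed subset of types).
    CONVERTERS: Dict[str, Callable[[str], Any]] = {"int": int, "string": str}

    def __init__(self, schema: List[Tuple[str, str]], delimiter: str = "\t") -> None:
        self.schema = schema  # [(column_name, type_name), ...]
        self.delimiter = delimiter

    def _convert(self, raw: Optional[str], type_name: str) -> Any:
        if raw is None or raw == NULL_MARKER:
            return None
        try:
            return self.CONVERTERS[type_name](raw)
        except ValueError:
            # Unparseable values become NULL instead of raising.
            return None

    def deserialize(self, line: str) -> Dict[str, Any]:
        """Stored text line -> row (column name -> typed value)."""
        fields = line.rstrip("\n").split(self.delimiter)
        return {
            name: self._convert(fields[i] if i < len(fields) else None, type_name)
            for i, (name, type_name) in enumerate(self.schema)
        }

    def serialize(self, row: Dict[str, Any]) -> str:
        """Row -> stored text line; None is written as the NULL marker."""
        return self.delimiter.join(
            NULL_MARKER if row.get(name) is None else str(row[name])
            for name, _ in self.schema
        )
```

Reading a stored line such as `"1\tAlice"` yields `{"id": 1, "name": "Alice"}`, and serializing that row produces the same line again; a missing trailing field comes back as None.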

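Hive study notes: hive hook

Summary: A Hive hook is a callback that can be embedded in the HQL execution process and run at certain points, for example in the cases described in https://www.slideshare.net/julingks/apache-hive-hooksminwookim130813 . With hooks you can implement features such as intercepting illegal SQL and collecting and auditing SQL …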

posted @ 2020-03-21 22:00 tonglin0325
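
The interception and auditing use case can be sketched as follows. In Hive itself, hooks are Java classes registered through properties such as `hive.exec.pre.hooks`; this Python version only illustrates the idea, and the `AuditingPreHook` class, its denylist rule, and the toy `execute` driver are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


class BlockedQueryError(Exception):
    """Raised by the toy driver when the hypothetical pre-hook rejects a statement."""


@dataclass
class AuditingPreHook:
    # Hypothetical denylist; a real policy would be site-specific.
    denied_patterns: List[str] = field(
        default_factory=lambda: [r"^\s*drop\s+(table|database)\b"]
    )
    # Audit trail of (user, statement, decision).
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def run(self, user: str, sql: str) -> bool:
        """Called before a statement executes: record it, return True if allowed."""
        allowed = not any(
            re.search(pattern, sql, flags=re.IGNORECASE)
            for pattern in self.denied_patterns
        )
        self.audit_log.append((user, sql, "ALLOWED" if allowed else "BLOCKED"))
        return allowed


def execute(hook: AuditingPreHook, user: str, sql: str) -> str:
    """Toy driver: run the pre-hook first, then 'execute' the statement."""
    if not hook.run(user, sql):
        raise BlockedQueryError(f"statement blocked for user {user}: {sql}")
    return f"executed: {sql}"
```

Every statement lands in `audit_log` whether or not it runs, which covers the SQL collection and auditing case, while the denylist check covers interception.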

Hive study notes: parser

Summary: How does Hive parse SQL? Start with a Hive CREATE TABLE statement as an example, such as the following: create table test(id int,name string)row format delimited fields terminated by '\t'; then use Hive's show create tab …

posted @ 2019-09-15 17:23 tonglin0325

Hive study notes: metadata

Summary: Hive architecture: https://blog.csdn.net/zhoudaxia/article/details/8855937 Dependencies: <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artif …

posted @ 2019-09-03 11:23 tonglin0325

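Configuring a Hive data source in Superset

Summary: 1. Set the URI to hive://localhost:10000/default 2. Run a query 3. If your Hive cluster uses Kerberos authentication, configure the Hive data source like this: hive://xxx:xxx/default?auth=KERBEROS&kerberos_service_name=hive …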

posted @ 2019-07-16 15:20 tonglin0325
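
The two URI forms from the Superset summary can be assembled with a small helper, for example when generating database entries from a script. This is a sketch that assumes only the parameters shown in the summary (`auth=KERBEROS`, `kerberos_service_name=hive`); the helper name `hive_sqlalchemy_uri` and the example host are hypothetical.

```python
from urllib.parse import urlencode


def hive_sqlalchemy_uri(host: str, port: int = 10000, database: str = "default",
                        kerberos: bool = False, service_name: str = "hive") -> str:
    """Build a hive:// URI for Superset; append Kerberos query params when requested."""
    uri = f"hive://{host}:{port}/{database}"
    if kerberos:
        # Same query parameters as the Kerberos example in the summary.
        params = {"auth": "KERBEROS", "kerberos_service_name": service_name}
        uri += "?" + urlencode(params)
    return uri
```

For instance, `hive_sqlalchemy_uri("localhost")` gives the plain URI from step 1, and `hive_sqlalchemy_uri("hive.example.com", kerberos=True)` gives the Kerberos form from step 3 with a concrete host.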

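Hive study notes: installation and CRUD on internal (managed) tables

Summary: 1. First install Hadoop and Hive, following http://blog.csdn.net/jdplus/article/details/46493553 during installation. The version installed is apache-hive-2.1.1-bin.tar.gz, extracted to /usr/local; then in the /etc/profile file …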

posted @ 2017-06-24 11:59 tonglin0325

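How Hive calculates the number of map and reduce tasks for a job

Summary: When using Hive, you need to pay attention to the resources a Hive job consumes; otherwise the job may run very inefficiently or even drag down the data source being queried. 1. Check the current Hive engine and configuration. The set statement shows the current Hive configuration: set; to see which engine Hive is currently using: set hive.execution.engine …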

posted @ 2016-12-31 11:21 tonglin0325
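
The usual estimation rules for map and reduce task counts can be sketched in Python. This assumes a plain FileInputFormat-style split calculation (split size = max(minSize, min(maxSize, blockSize)), with the last split allowed to be up to 10% larger) and Hive's reducer estimate min(hive.exec.reducers.max, ceil(input bytes / hive.exec.reducers.bytes.per.reducer)). The defaults used here (128 MB blocks, 256,000,000 bytes per reducer, at most 1009 reducers) are assumptions that differ across versions and clusters, so check them with `set`; CombineHiveInputFormat and Tez can also produce different map counts.

```python
import math
from typing import Iterable

MB = 1024 * 1024


def num_map_tasks(file_sizes: Iterable[int], block_size: int = 128 * MB,
                  min_size: int = 1, max_size: int = 2**63 - 1) -> int:
    """Count input splits per file, FileInputFormat style (assumes splittable files)."""
    split_size = max(min_size, min(max_size, block_size))
    total = 0
    for size in file_sizes:
        remaining = size
        # Cut full splits while more than 1.1 splits' worth remains,
        # so the last split may be up to 10% larger than split_size.
        while remaining / split_size > 1.1:
            total += 1
            remaining -= split_size
        if remaining > 0:
            total += 1
    return total


def num_reduce_tasks(total_input_bytes: int, bytes_per_reducer: int = 256_000_000,
                     max_reducers: int = 1009, explicit_reduces: int = -1) -> int:
    """Estimate reducers the way Hive does when no reducer count is set."""
    if explicit_reduces >= 0:
        # Stands in for an explicit mapreduce.job.reduces (formerly mapred.reduce.tasks).
        return explicit_reduces
    estimated = math.ceil(total_input_bytes / bytes_per_reducer)
    return min(max_reducers, max(1, estimated))
```

Under these assumptions a single 300 MB file yields 3 map tasks, a 130 MB file still yields just 1 (it is within the 10% slop), and 1,000,000,000 input bytes give 4 reducers.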

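Querying Kerberos-enabled Hive from DataGrip

Summary: 1. Add a driver. The Hive cluster runs version 1.1.0-cdh5.16.2, while the Hive drivers bundled with DataGrip are versions 3.1.1 and 3.1.2, so you need to add the driver yourself. Reference: kerberos-2. DataGrip (JDBC) connection to Hive with Kerberos; add custom JARs …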

posted @ 2016-11-08 15:23 tonglin0325

Hive study notes: beeline

Summary: Connect to Hive with beeline: first run kinit -kt xxx.keytab xxx, then beeline -u "jdbc:hive2://10.65.13.98:10000/default;principal=hive/_HOST@CLOUDERA.SITE". Reference: https://docs.cloud …

posted @ 2016-10-29 12:06 tonglin0325

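Hive study notes: functions

Summary: 1. cast: a data type conversion function, e.g. the value of date is … Reference: Usage of the CAST() function in Hive. 2. explode: explode() takes an array or map as input and outputs each element of the array or map as its own row. It can be used together with LATERAL VIEW. Reference …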

posted @ 2016-05-21 10:48 tonglin0325
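
The row-generating behavior of explode(), and how LATERAL VIEW joins the generated rows back to their source row, can be mimicked in Python. This is a semantic sketch only: the function names are hypothetical, and it assumes Hive's behavior that a NULL or empty input generates no rows, so a plain LATERAL VIEW drops such source rows (unlike LATERAL VIEW OUTER).

```python
from typing import Any, Dict, List, Sequence, Tuple, Union


def explode(value: Union[Sequence[Any], Dict[Any, Any], None]) -> List[Tuple[Any, ...]]:
    """One output row per array element (1 column) or map entry (key, value)."""
    if value is None:
        return []  # NULL input generates no rows
    if isinstance(value, dict):
        return [(k, v) for k, v in value.items()]
    return [(item,) for item in value]


def lateral_view_explode(rows: List[Dict[str, Any]], column: str,
                         aliases: List[str]) -> List[Dict[str, Any]]:
    """Join each source row with every row explode() generates from `column`.

    Source rows that generate nothing (NULL or empty input) are dropped,
    as with a plain LATERAL VIEW.
    """
    out: List[Dict[str, Any]] = []
    for row in rows:
        for generated in explode(row[column]):
            out.append({**row, **dict(zip(aliases, generated))})
    return out
```

For a source row `{"id": 1, "tags": ["x", "y"]}` the lateral view yields two rows, one with `tag` "x" and one with `tag` "y", and a row whose `tags` array is empty disappears, roughly what `SELECT id, tag FROM t LATERAL VIEW explode(tags) v AS tag` returns.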

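Hive study notes: common SQL

Summary: 1. Query the second-highest value. Input: the Salary table
+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| salary      | int  |
+-------------+------+
Use LIMIT with OFFSET to restrict the number of results; LIMIT N,1 is equivalent to LIMIT 1 OFFSET N …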

posted @ 2016-05-19 10:57 tonglin0325
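
The LIMIT/OFFSET approach to the second-highest salary can be checked against a plain Python equivalent. This sketch assumes the query deduplicates salaries before ordering (SELECT DISTINCT salary ... ORDER BY salary DESC LIMIT 1 OFFSET 1) and uses None to stand for the empty or NULL result when no second distinct value exists; the function name `nth_highest` is hypothetical.

```python
from typing import Iterable, Optional


def nth_highest(salaries: Iterable[int], n: int = 2) -> Optional[int]:
    """Python equivalent of
    SELECT DISTINCT salary FROM Salary ORDER BY salary DESC LIMIT 1 OFFSET n-1
    (i.e. LIMIT n-1,1). Returns None when there is no n-th distinct salary.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    distinct_desc = sorted(set(salaries), reverse=True)  # DISTINCT + ORDER BY DESC
    offset = n - 1                                       # skip the top n-1 values
    return distinct_desc[offset] if offset < len(distinct_desc) else None
```

Without the DISTINCT step, duplicated top salaries such as [300, 300, 200] would wrongly return 300 as the "second-highest" value.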

Hive study notes: UDF development

Summary: To implement a UDF, you can extend either org.apache.hadoop.hive.ql.exec.UDF or org.apache.hadoop.hive.ql.udf.generic.GenericUDF. 1. Extending UDF, see https://docs.microsoft.com/en-us/az …

posted @ 2016-03-26 23:40 tonglin0325

Hive study notes: using AvroSerde in Hive

Summary: Hive supports the Avro SerDe as a serialization format. References: https://cwiki.apache.org/confluence/display/hive/avroserde https://www.docs4dev.com/docs/zh/apache-hive/3.1.1/reference …

posted @ 2016-03-04 14:15 tonglin0325

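Basic concepts of data governance

Summary: 1. Problems that data governance solves: 1. Data usability (complexity and speed of retrieving data; this requires data modeling, since not everything can be queried from raw tables, and data warehouse design) 2. Data quality (log definitions and criteria, metric definitions, alerts on data fluctuations; blocking checks are generally used for money-related data) 3. R&D cost (development complexity and cycle time, legacy burden, data map) 4. Data security (encryption, masking, and auditing) …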

posted @ 2016-02-05 14:59 tonglin0325

Hive study notes: common syntax

Summary: 1. List tables: show tables 2. When creating a table with multiple columns, you need to specify the field delimiter: create table test(id int,name string)row format delimited fields terminated by '\t'; create table test(id …

posted @ 2015-07-23 21:29 tonglin0325

Hive study notes: installing the Hive client

Summary: Hive client installation guide: https://cwiki.apache.org/confluence/display/Hive/GettingStarted Official Hive configuration documentation: https://cwiki.apache.org/confluence/display/hive/configura …

posted @ 2015-07-12 16:29 tonglin0325

Hive study notes: execution plans

Summary: Hive study notes: execution plans

posted @ 2015-07-11 13:55 tonglin0325

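Some theory on data warehouse modeling

Summary: 1. Data layering: detail layer: DWD (Data Warehouse Detail); middle layer: DWM (Data Warehouse Middle); service layer: DWS (Data Warehouse Service); application layer: ADS (Application Data Service) 2. Data warehouse modeling methods …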

posted @ 2015-06-19 19:48 tonglin0325

tonglin0325.github.io
