Hive sql常用函数公式整理

一、日期时间函数

1.日期格式转化

date_format(active_date,'%Y-%m') ——2020-01
year(active_date)*100 + month(active_date) —— 202001
date_format(active_date,"yMM") ——202001
其他日期格式:
　　　　yyyymmdd，年月日；
　　　　yyyymm，年月；
　　　　mm，月
　　　　dd，日
　　　　yyyy-mm-dd
　　　　yyyy-mm
　　　　yyyymmddhh24miss，年月日时分秒
　　　　yyyy-mm-dd hh24:mi:ss
　　　　hh24miss
　　　　yyyymmddhh24missff3，年月日时分秒毫秒
年月日
to_date('2019-09-01 18:30:00') 返回2019-09-01

2.日期计算

2.1.间隔天数：
datediff('2019-09-01','2019-08-01') 返回31

2.2返回日期的时分秒

hour：返回日期中的小时
select hour('2020-04-20 14:20:03');
输出：14

minute：返回日期中的分钟
select minute('2020-04-20 14:20:03');
输出：20

second：返回日期中的秒
select second('2020-04-20 14:20:03');
输出：3

2.3日期在当前周数

weekofyear：返回日期在当前周数
select weekofyear('2020-04-20 14:20:03');
输出：17

2.4日期n小时后

8个小时后

——timestampadd函数：

timestampadd('hour', 8, event_time)

——addtime函数：

addtime(event_time','8:0:0');

2.5日期n小时前

subtime(event_time','8:0:0');

2.6.当月1号：
trunc(current_date,'MM') 返回本月1号
返回上月末：
last_day(add_months(current_date(),-1))

2.7.n天前：
date_sub(2019-09-01 18:30:00,31) 返回2019-08-01

date_sub('2016-08-01',1) 表示 2016-07-31

date_add('2016-08-01',-1) 输出：2016-07-31

2.8.n个月前：
add_months(current_date(),-1) 前移一个月

2.9.上月放款：
active_date>=trunc(add_months(current_date(),-1),"MM") and active_date<=last_day(add_months(current_date(),-1))

3.时间戳

from_unixtime：转化unix时间戳到当前时区的时间格式
select from_unixtime(1587361271,'yyyyMMdd') from ccs_acct
输出：20200420
时间戳转化：from_unixtime(time/1000,'yyyy-MM-dd HH:mm:ss')

unix_timestamp：获取当前unix时间戳
select unix_timestamp();
输出：1587361271
select unix_timestamp('2020-04-20 14:20:03') from ccs_acct
输出：1587363603
select unix_timestamp('2020-04-19 14:20:03') from ccs_acct
输出：1587277203

三、空值处理

COALESCE

select coalesce(a,b,c);

参数说明： 如果a 是null,则选择b；如果b 是null,则选择c；

a.NVL函数

nvl(x,y) Returns y if x is null else return x

NVL函数的格式如下：NVL(expr1,expr2)

含义是：如果第一个参数为空那么显示第二个参数的值，如果第一个参数的值不为空，则显示第一个参数本来的值。

b.NVL2函数

NVL2函数的格式如下：NVL2(expr1,expr2, expr3)

含义是：如果该函数的第一个参数为空那么显示第二个参数的值，如果第一个参数的值不为空，则显示第三个参数的值。

四、字符串函数

4.1拼接：
concat("a",m.loan_init_term) as `期数`
concat('2018','08') 返回201808
concat_ws(string SEP, string A, string B...)返回输入字符串连接后的结果，SEP表示各个字符串间的分隔符
hive> select concat_ws(',','ab','cd','e') from test1;
返回：ab,cd,e

4.2切片：
substr(string A, int start, int len)或者substring(string A, int start, int len)
hive> select substr(‘string’,3,3) from test;
rin
hive>select substring(‘string’,3,3) from test;
ing

4.3替换：
regexp_replace(string A, string B, string C)
返回值: string，说明：将字符串A中的符合Java正则表达式B的部分替换为C。注意，在有些情况下要使用转义字符
举例：hive> select regexp_replace('string','rin','ron') from ccs_acct
返回strong

4.4字符串截取：

substring_index(active_name, '-', 1)

示例：截取json存储里边active_name键对应的值

substring_index(substring_index(p_describe, 'active_name: ', -1), ', ', 1) active_name, -- 方法一

regexp_extract(p_describe,"active_name:(.{0,10}),",1) as active_name1, -- 方法二

regexp_extract(p_describe,"(active_name\\:)(.*?)(,)",2) as active_name2 -- 方法三

4.5长链接参数提取

--长字符串
示例1：utm_param：'utm_source=tf_weixinmp_01&utm_medium=mp_pengyouquan&utm_campaign=yike16'
函数写法：str_to_map(utm_param,'&','=')['utm_campaign'] as utm_campaign
返回结果 yike16

--url
示例2:p__url：https://fenqi.**.com/goods/ra**ard?entry=popup&_pid=paaa
函数写法1：str_to_map(substring_index(p__url, '?', -1), '&', '=') ['entry'] as entryflag1,
函数写法2：substring_index(substring_index(p__url, 'entry=', -1), '&', 1) as entryflag2
返回结果：popup

五、开窗函数

row_number() over(partition by acct_nbr order by create_time desc) as rn
where rn=1

六、json解析

1.get_json_object

字典型数据：

get_json_object(p_output_params, '$.data.fetchStatus') as p_output_params1

get_json_object(get_json_object(p_output_params, '$.data'), '$.isAuditPass' ) as p_output_params-- 解析两层

数组：get_json_object(output_params, '$.data[0].addAmountType')

2.json_tuple

函数的作用：用来解析json字符串中的多个字段，不使用'$'

select ylzc from rulengine_decision_rawdata_biz_28_adx as a lateral view json_tuple(kg3rd_data, 'yinlianzhice_xiaodai.js') b as ylzc

where partition_date = "2021-05-23"

limit 1

七、其他

1.等额本息（excel）

每月还款额=-PMT(月利率,月数,放款总额,0)

2.等额本金

每月还款额=月均本金+（放款金额-月均本金*i）*月利率

3.余数
mod（n1,n2)
返回n1 除以n2 的余数。返回值的正负和n1相关

4.余数绝对值
pmod（n1,n2)
返回n1 除以n2 的余数绝对值

5.通配符

where city like '[ALN]%'----筛选city 以‘A’,‘L’,‘N’开头的记录 where city like 'ne%'-----筛选city以'ne'开头的记录
where city like '%ne%'-----筛选city包含'ne'的记录
where city like '_enan'-----筛选city第一个字符之后是'enan'的记录
where city rlike 'an'----筛选city包含'an'的数据
where city rlike 'A|B'----筛选city包含'A'或者'B'的数据

6.连续登录

LAG 和 LEAD 函数(用于连续登录天数等场景)

1.LAG(col,n,DEFAULT)

用于统计窗口内往上第n行值

例子：LAG(time,1) OVER(parition by id order by time)

LEAD 函数则与 LAG 相反：

2.LEAD(col,n,DEFAULT)

用于统计窗口内往下第n行值

第一个参数为列名，第二个参数为往下第n行（默认为1），第三个参数为默认值（当往下第n行为NULL时候，取默认值，如不指定，则为NULL）

7.累计求和

累计求和：

　　sum(num) over (order by event_date ) as `num_累计`

总和：

　　sum(num) over ( ) as num_total

event_date	num	num_累计	num_total
2021/11/1	10	10	47
2021/11/2	15	25	47
2021/11/3	2	27	47
2021/11/4	20	47	47

数值取整

【四舍五入取整截取】
select round(54.56,0)
round至少保留一位小数。——55.0

【向下取整截取】
SELECT FLOOR(54.56) ——54

【向上取整截取】
SELECT CEILING(13.15) ——14

【舍弃小数取整】
select cast(123.5678 as int) ——123

补充：

查询今天

SELECT * FROM tablename where DATEDIFF(day,inputdate,GETDATE())=0

查询昨天

SELECT * FROM tablename where DATEDIFF(day,inputdate,GETDATE())=1

查询本周

SELECT * FROM tablename where datediff(week,inputdate,getdate())=0

查询上周

SELECT * FROM tablename where datediff(week,inputdate,getdate())=1

查询本月

SELECT * FROM tablename where DATEDIFF(month,inputdate,GETDATE())=0

查询上月

SELECT * FROM tablename where DATEDIFF(month,inputdate,GETDATE())=1

posted @ 2020-04-20 14:33 lvzw 阅读(3489) 评论(0) 编辑收藏举报

刷新页面返回顶部

lvzw