hivesql笔记

一、常用聚合函数

count():计数 count(distinct 字段) 去重统计

sum():求合

avg():平均

max():最大值

min():最小值

二、hivesql执行顺序

from --> where --> group by --> having --> select--> order by--> limit

三、常用函数

1.毫秒时间戳转日期精确到秒

select from_unixtime(cast(1636462527000/1000 as int),'yyyy-MM-dd hh:mm:ss')

2.计算日期的间隔

select datediff("2021-10-01", "2021-10-30") 或 select datediff("2021-10-01", to_date("create_time"))

3.条件函数case when

select

case when age<20 then '20以下'

when age>=20 and age <=30 then '20～30岁'

else '40岁以上' end as age_type

from user_info

4.多重case when

select
source,
userid,
case when nvl(cast(source as bigint), 0) =1 then(
case when nvl(cast(userid as string), '') <>'' then'小程序来源真实用户'
else '小程序来源非真实用户' end)
when nvl(cast(source as bigint), 0) =2 then(
case when nvl(cast(userid as string), '')<>'' then'h5来源真实用户'
else 'h5来源非真实用户' end)
else "其他来源未知用户"
end as source_user
from ods.user_base
where dd='2021-10-28'

5.条件语句 if判断

>0定义为正规渠道否则是非正规渠道

select

if(cast(source_leve as bigint) >0 ,'正规渠道','非正规渠道') as source_qd,
count(1)
from
ods.xcx_base
where dd='2021-10-28'
group by if(cast(source_leve as bigint) >0 ,'正规渠道','非正规渠道')

6.字符串函数-字符串截取

select substr("2021-10-27",1,7) as month --从第一位街区到7位

substr("2021-10-27",2) as month --从第二位开始一值截取到最后一位

7.字符串拼接

concat('haode',userid,'okok') as new_str --拿表里字段左右拼接

8.map数据类型和字符串类型map数据获取

　　第一种 字符串类型的map数据{"phone":"1234", "name":"z张三"}

　　select get_json_object('{"phone":"1234","name":"za"}', '$.phone') as phone

　　第二种 map数据类型

　　select 字段["phone"] as phone

四、表连接

1.inner join 缩写join 查询交集的数据

例如 user表1、user表2 需要查出在user表1和user表2同时存在的userid

select * from user表1 a

join user表2 b

on a.userid=b.userid

join示例2: 客户表和粉丝表查询在2021-09-19这天既成为客户也成为了粉丝的用户id

select

a.userid

from

(select distinct userid from 客户表 where create_time='2021-09-19') a

join

(select distinct userid from 粉丝表表 where create_time='2021-09-19') b

on a.userid=b.userid

2.left join /right join 进行左链接/右链接没有匹配上显示null

3.full join 全链接互补left join 和left join过滤掉的数据

4.union all上下合并两张表注意：两张表的字段名一致、字段顺序一致、没有连接on条件、不去重不排序效率快

5.union：两张表的字段名一致、字段顺序一致、没有连接on条件表连接时去掉重复记录，按照字段顺序进行排序效率比较慢

五、 row_number()、DENSE_RANK()、RANK()窗口函数

row_number() 相同的值也会分第一第二

DENSE_RANK() 相同的值排名是一样的 10个一样的值那么就有十个第一

RANK() 相同的值如果占用了一个序号还么会跳过比如取top3 100 100 80 那么排序名次则是第一第一第三

例：根据日期分组和店铺id分组取出每个天没个店铺 top3的应收金额

select
*
from(
select
dd,
mid,
should_pay_amt,
row_number() over(partition by dd,mid order by should_pay_amt desc) as rn
from ods.pay_order_detail
where dd between "2021-10-28" and "2021-10-30"
) as a
where rn<=3

posted @ 2021-11-11 11:59 t_zcx 阅读(186) 评论(0) 编辑收藏举报

刷新页面返回顶部

t_zcx的博客

hivesql笔记

公告