[Hive_10] Hive 的分析函数


0. 说明

  Hive 的分析函数 窗口函数  | 排名函数 | 最大值 | 分层次 | lead && lag 统计活跃用户 | cume_dist

 

 

 

 


1. 窗口函数(开窗函数) over

  1.1 说明

  1 preceding  //前一个
  1 following  //后一个
  current row  //当前行
  unbounded preceding  //无上限
  unbounded following  //无下限

  1.2 测试

# 以行定义窗口界限
select id, name, age , sum(age)over(order by id rows between current row and 2 following) from user_par;

# 以值定义窗口界限
select id, name, age , sum(age)over(order by age range between current row and 10 following) from user_par;

 


 

2. 排名函数

  2.1 并列跳跃

  113
  rank

select id, name, province, age , rank()over(partition by province order by age desc) from user_par;

 

  2.2 不跳跃

  112
  dense_rank

select id, name, province, age , dense_rank()over(partition by province order by age desc) from user_par;

 

  2.3 顺序

  123
  row_number

select id, name, province, age , row_number()over(partition by province order by age desc) from user_par;

 


3. 最大值

  first_value()

select id, name, province, age , first_value()over(partition by province order by age desc) from user_par;

 


 

4. 分层次

  按照三六九等进行平均分层

  ntile()

select id, name, age , ntile(3)over(order by age desc) from user_par;

 



5. lead && lag

  5.1 lead()

  将列向上提

select id, name, province, age , lead(age)over(partition by province order by age asc) from user_par;

 

  5.2 lag()

  将列向下沉

select id, name, province, age , lag(age)over(partition by province order by age asc) from user_par;

 

  5.3 统计连续活跃

  1. 准备数据

  

 

  2. 建表

create table active(id string, month int) 
row format delimited
fields terminated by '\t';

 

  3. 加载数据

load data local inpath '/home/centos/files/active.txt' into  table active;

 

  4. 统计连续两月活跃用户

select id from (select id, month, lead(month)over(partition by id order by month desc) as month2 from active)a where month=month2+1;

 


 

6. cume_dist()

  指定值占总数的百分比

 

  Demo

select id,name,age, cume_dist()over(order by age desc) from user_nopar;

   

 

 


 

posted @ 2019-01-21 14:02  山间一棵松  阅读(342)  评论(0编辑  收藏  举报