hive-同比环比
HIVE-同比环比
定义
详情:
(1)同比:本期与同期做对比。
(2)环比:本期与上期做对比。
同比:通常是指今年第n月与去年第n月比。同比发展速度主要是为了消除季节变动的影响,用以说明本期发展水平与去年同期发展水平对比而达到的相对发展速度。常用于分析数据的长期趋势。
环比:通常是指表示连续2个单位周期(比如连续两月)内的量的变化比。环比包括两种:环比增长速度和环比发展速度。常用于分析数据的短期趋势。
EG:
CREATE TABLE ods_saleorder (
order_id int ,
order_time date ,
order_num int
)ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
;
LOAD DATA LOCAL INPATH '/Users/liuwenqiang/workspace/hive/saleorder.txt' OVERWRITE INTO TABLE ods.ods_saleorder;
测试数据:
order_id |
order_time |
order_num |
1 |
2020-04-20 |
420 |
2 |
2020-04-04 |
800 |
3 |
2020-03-28 |
500 |
4 |
2020-03-13 |
100 |
5 |
2020-02-27 |
300 |
6 |
2020-01-07 |
450 |
7 |
2019-04-07 |
800 |
8 |
2019-03-15 |
1200 |
9 |
2019-02-17 |
200 |
10 |
2019-02-07 |
600 |
11 |
2019-01-13 |
300 |
销售量的月年占比:
select
a.m_num,a.cmonth,b.y_num,b.cyear,round( m_num / y_num, 2 ) AS ratio
from(
select
sum(order_num) as m_num,
DATE_FORMAT(order_time,'yyyy-MM') as cmonth
from
ods_saleorder
group by
DATE_FORMAT(order_time,'yyyy-MM')
) a
inner join
(
select
sum(order_num) as y_num,
DATE_FORMAT(order_time,'yyyy') as cyear
from
ods_saleorder
group by
DATE_FORMAT(order_time,'yyyy')
) b
on
substring(a.cmonth,1,4)=b.cyear
;
备注:
- date_format函数如果第一个参数是字符串,连接符只能是-,别的识别不了
窗口函数实现
SELECT
order_month,
num,
total,
round( num / total, 2 ) AS ratio
FROM
(
select
substr(order_time, 1, 7) as order_month,
sum(order_num) over (partition by substr(order_time, 1, 7)) as num,
sum(order_num) over (partition by substr( order_time, 1, 4 ) ) total,
row_number() over (partition by substr(order_time, 1, 7)) as rk
from ods_saleorder
) temp
where rk = 1;
结果:
同比环比
同比增长率计算公式
(当年值-上年值)/上年值x100%
环比增长率计算公式
(当月值-上月值)/上月值x100%
lead lag 的实现
---这里我们就用环比做个例子,同比类似
select
now_month,
now_num,
last_num,
round( (now_num-last_num) / last_num, 2 ) as ratio
FROM(
select
now_month,
now_num,
lag( t1.now_num, 1) over (order by t1.now_month ) as last_num
from
(
select
substr(order_time, 1, 7) as now_month,
sum(order_num) as now_num
from ods_saleorder
group by
substr(order_time, 1, 7)
) t1
) t2;
结果:
我们看到有null 值,这里我们可以使用,lag的默认值做一次优化
select
now_month,
now_num,
last_num,
-- 分母是0的话返回值是null
nvl(round( (now_num-last_num) / last_num, 2 ),0)as ratio
FROM(
select
now_month,
now_num,
lag( t1.now_num, 1,0) over (order by t1.now_month ) as last_num
from
(
select
substr(order_time, 1, 7) as now_month,
sum(order_num) as now_num
from ods_saleorder
group by
substr(order_time, 1, 7)
) t1
) t2;
结果:
其实到这里我们就处理完了,但是这样真的对吗,我们看到'2020-01' 的last_num 是800 也就是'2019-04',其实到这里我们就明白了,我们的数据是不连续的,所以我们这样计算是不行的,如果每个月都齐全,都有数据lag(num,12)就可以。那就只能做自关联了,这样的话我们可以对时间做精准的限制
自关联的实现
with a as (
select
now_month,
now_num,
substr(date(concat(now_month,'-','01')) - INTERVAL '1' month, 1, 7) as last_month
from(
select
substr(order_time, 1, 7) as now_month,
sum(order_num) as now_num
from ods_saleorder
group by
substr(order_time, 1, 7)
) tmp
)
select
a1.now_month,a1.now_num,a1.last_month,a2.now_num,
nvl(round( (a1.now_num-a2.now_num) / a2.now_num, 2 ),0) as ratio
from
a a1
inner join
a a2
on
a1.last_month=a2.now_month
;
结果:
这里的时间计算INTERVAL 你也可以换成其他函数
with a as (
select
now_month,
now_num,
substr(add_months(concat(now_month,'-','01'),-1), 1, 7) as last_month
from(
select
substr(order_time, 1, 7) as now_month,
sum(order_num) as now_num
from ods_saleorder
group by
substr(order_time, 1, 7)
) tmp
)
select
a1.now_month,a1.now_num,a1.last_month,nvl(a2.now_num,0),
nvl(round( (a1.now_num-a2.now_num) / a2.now_num, 2 ),0) as ratio
from
a a1
left join
a a2
on
a1.last_month=a2.now_month
;
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· Manus爆火,是硬核还是营销?
· 终于写完轮子一部分:tcp代理 了,记录一下
· 别再用vector<bool>了!Google高级工程师:这可能是STL最大的设计失误
· 单元测试从入门到精通
· 震惊!C++程序真的从main开始吗?99%的程序员都答错了