Hive面试题——累计求和
需求:
有如下访客访问次数统计表 t_access_times
访客 |
月份 |
访问次数 |
A |
2015-01 |
5 |
A |
2015-01 |
15 |
B |
2015-01 |
5 |
A |
2015-01 |
8 |
B |
2015-01 |
25 |
A |
2015-01 |
5 |
A |
2015-02 |
4 |
A |
2015-02 |
6 |
B |
2015-02 |
10 |
B |
2015-02 |
5 |
…… |
…… |
…… |
需要输出报表:t_access_times_accumulate
访客 |
月份 |
月访问总计 |
累计访问总计 |
A |
2015-01 |
33 |
33 |
A |
2015-02 |
10 |
43 |
……. |
……. |
……. |
……. |
B |
2015-01 |
30 |
30 |
B |
2015-02 |
15 |
45 |
……. |
……. |
……. |
……. |
思路:
1、第一步,先求个用户的月总金额
select username,month,sum(salary) salary from t_access_times group by username,month;
+-----------+----------+---------+--+
| username | month | salary |
+-----------+----------+---------+--+
| A | 2015-01 | 33 |
| A | 2015-02 | 10 |
| B | 2015-01 | 30 |
| B | 2015-02 | 15 |
+-----------+----------+---------+--+
2、第二步,将月总金额表 自己连接自己
select A.*,B.* from (select username,month,sum(salary) salary from t_access_times group by username,month) A join (select username,month,sum(salary) salary from t_access_times group by username,month) B on A.username=B.username;
+-------------+----------+-----------+-------------+----------+-----------+--+
| A.username | A.month | A.salary | B.username | B.month | B.salary |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-01 | 33 | A | 2015-02 | 10 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-01 | 30 | B | 2015-02 | 15 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
+-------------+----------+-----------+-------------+----------+-----------+--+
3、第三步,从上一步的结果中
进行分组查询,分组的字段是A.username,A.month
求月累计值: 将B.month <= A.month的所有B.salary求和即可
select A.username,A.month,max(A.salary) salary,sum(B.salary) accumulate from (select username,month,sum(salary) salary from t_access_times group by username,month) A join (select username,month,sum(salary) salary from t_access_times group by username,month) B on A.username=B.username where B.month <= A.month group by A.username,A.month order by A.username,A.month;