Mysql和Clickhouse数据查询-按照时间分组统计并且对无无数据的日期补0

  最近在做数据查询需求的时候,遇到按照时间分组查询统计指标的需求,比如说查询模块的最近 15 天访问数据量,没有数据的日期补 0,以前对于这种类似的需求都是通过代码来补数据,想试试 sql 实现这种查询,因此查询了不少文章,对于类似实现方法的文章网上也有很多,差异也很多,因此这篇文章只作为一个参考,提供一个思路.

 

对于 mysql

  比如刚才这个需求是是针对 MySQL 的,查询模块的最近 15 天访问数据量,没有数据的日期补 0。

  对于这个 sql 我们主要思路是按照table 种日期字段进行分组,然后用一个日期序列对结果进行 join,没有数据的日期直接补 0:

select 
	DATE_FORMAT(DATE_SUB(now(),INTERVAL xc day),'%Y-%m-%d') as date_str
from
(
-- 获取0-14 的序列  @xi:=@xi+1  每次执行都加一
select 
	@xi := @xi +1 as xc
	from
	(select 1 union select 2 union select 3) xc1,
	(select 1 union select 2 union select 3 union select 4 union select 5) xc2,
	(select @xi := -1) xc0
)xcxc 

执行结果如下:

然后使用结果 left join 查询结果,null 就补0 即可。

 

对于 clickhouse

  clickhouse提供的函数很多,也提供了类似需求的方案。对于 clickhouse 我接触不是很多,或许还有很多更好的方案。

  需求按照分钟或者小时聚合数据:

 

-- 按照分钟统计
select 
toStartOfInterval(start_t, INTERVAL 1 MINUTE) AS data_str,
count() AS records
FROM xxxx.xxxxxx
WHERE xxxxx
GROUP BY data_str
ORDER BY date_str 
WITH fill 
FROM toDateTime('2024-02-22 23:00:00')
TO toDateTime('2024-02-22 23:59:59')
step 1*60


-- 按照小时统计
select 
toStartOfInterval(start_t, INTERVAL 1 hour) AS data_str,
count() AS records
FROM xxxx.xxxxxx
WHERE xxxxx
GROUP BY data_str
ORDER BY date_str 
WITH fill 
FROM toDateTime('2024-02-22 23:00:00')
TO toDateTime('2024-02-21 23:59:59')
step 1*3600

 这里不贴图展示查询结果了。

主要是通过 2 个语法实现。

 toStartOfInterval()和 with fill 。建议查看官方文档了解

https://clickhouse.com/docs/en/sql-reference/statements/select/order-by#filling-grouped-by-sorting-prefix

 

注意事项:ck 我的需求是使用 java 连接查询,使用 druid 连接 ck 的时候,需要去掉 wall filter,因为 ck 的语法 有些不太符合标准 sql,druid 的 sqlparser 会报错拦截。

 

以下来自官方文档

toStartOfInterval(date_or_date_with_time, INTERVAL x unit [, time_zone])

This function generalizes other toStartOf*() functions. For example,

  • toStartOfInterval(t, INTERVAL 1 year) returns the same as toStartOfYear(t),
  • toStartOfInterval(t, INTERVAL 1 month) returns the same as toStartOfMonth(t),
  • toStartOfInterval(t, INTERVAL 1 day) returns the same as toStartOfDay(t),
  • toStartOfInterval(t, INTERVAL 15 minute) returns the same as toStartOfFifteenMinutes(t).

The calculation is performed relative to specific points in time:

Interval Start
year year 0
quarter 1900 Q1
month 1900 January
week 1970, 1st week (01-05)
day 1970-01-01
hour (*)
minute 1970-01-01 00:00:00
second 1970-01-01 00:00:00
millisecond 1970-01-01 00:00:00
microsecond 1970-01-01 00:00:00
nanosecond 1970-01-01 00:00:00

(*) hour intervals are special: the calculation is always performed relative to 00:00:00 (midnight) of the current day. As a result, only hour values between 1 and 23 are useful.

 

ORDER BY Expr WITH FILL Modifier

This modifier also can be combined with LIMIT … WITH TIES modifier.

WITH FILL modifier can be set after ORDER BY expr with optional FROM exprTO expr and STEP expr parameters. All missed values of expr column will be filled sequentially and other columns will be filled as defaults.

To fill multiple columns, add WITH FILL modifier with optional parameters after each field name in ORDER BY section.

ORDER BY expr [WITH FILL] [FROM const_expr] [TO const_expr] [STEP const_numeric_expr], ... exprN [WITH FILL] [FROM expr] [TO expr] [STEP numeric_expr]
[INTERPOLATE [(col [AS expr], ... colN [AS exprN])]]
 

WITH FILL can be applied for fields with Numeric (all kinds of float, decimal, int) or Date/DateTime types. When applied for String fields, missed values are filled with empty strings. When FROM const_expr not defined sequence of filling use minimal expr field value from ORDER BY. When TO const_expr not defined sequence of filling use maximum expr field value from ORDER BY. When STEP const_numeric_expr defined then const_numeric_expr interprets as is for numeric types, as days for Date type, as seconds for DateTime type. It also supports INTERVAL data type representing time and date intervals. When STEP const_numeric_expr omitted then sequence of filling use 1.0 for numeric type, 1 day for Date type and 1 second for DateTime type. INTERPOLATE can be applied to columns not participating in ORDER BY WITH FILL. Such columns are filled based on previous fields values by applying expr. If expr is not present will repeat previous value. Omitted list will result in including all allowed columns.

Example of a query without WITH FILL:

SELECT n, source FROM (
   SELECT toFloat32(number % 10) AS n, 'original' AS source
   FROM numbers(10) WHERE number % 3 = 1
) ORDER BY n;
 

Result:

┌─n─┬─source───┐
│ 1 │ original │
│ 4 │ original │
│ 7 │ original │
└───┴──────────┘

 

posted @   嘟嘟小宝贝  阅读(1031)  评论(0编辑  收藏  举报
相关博文:
阅读排行:
· 震惊!C++程序真的从main开始吗?99%的程序员都答错了
· 【硬核科普】Trae如何「偷看」你的代码?零基础破解AI编程运行原理
· 单元测试从入门到精通
· winform 绘制太阳,地球,月球 运作规律
· 上周热点回顾(3.3-3.9)
点击右上角即可分享
微信分享提示