9.ClickHouse系列之数据一致性保证

对于ReplacingMergeTree引擎,我们之前讲过在后台合并的时间不定,合并前查询会存在数据不一致问题。

CREATE TABLE study.house (
    id String,
    city String,
    region String,
    name String,
    price Float32,
    publish_date DateTime
) ENGINE=ReplacingMergeTree(publish_date) PARTITION BY toYYYYMMDD(publish_date) PRIMARY KEY(id) ORDER BY (id, city, region, name)
SETTINGS index_granularity=8192
INSERT INTO study.house VALUES (1, '上海', '静安', '场中小区', 60000, toDateTime('2022-05-01'));
INSERT INTO study.house VALUES (1, '上海', '静安', '场中小区', 60000, toDateTime('2022-05-01'));
INSERT INTO study.house VALUES (1, '上海', '静安', '场中小区', 61000, toDateTime('2022-06-01'));
INSERT INTO study.house VALUES (1, '上海', '静安', '场中小区', 61000, toDateTime('2022-06-01'));

查询结果如下:
1,上海,静安,场中小区,61000,2022-06-01 00:00:00
1,上海,静安,场中小区,61000,2022-06-01 00:00:00
1,上海,静安,场中小区,60000,2022-05-01 00:00:00
1,上海,静安,场中小区,60000,2022-05-01 00:00:00

为保证数据一致性,有以下解决方案

1. 手动执行分区合并

optimize table study.house final;

查询结果如下:
1,上海,静安,场中小区,61000,2022-06-01 00:00:00
1,上海,静安,场中小区,60000,2022-05-01 00:00:00

可以看到,虽然id, city, region, name为order by的唯一键,但是price价格不一致,仍然会作为两个值进行保留

2. 通过Group By去重

SELECT id, city, region, name, argMax(price, publish_date), max(publish_date) FROM study.house GROUP BY id, city, region, name;

查询结果如下:
1,上海,静安,场中小区,61000,2022-06-01 00:00:00

3. 通过final查询

在查询语句后增加FINAL修饰符,这样查询过程中将会执行Merge的特殊逻辑。FINAL在新版本中,支持多线程执行,可以通过max_threads设置

EXPLAIN SELECT id, city, region, name, price, publish_date FROM study.house FINAL SETTINGS max_threads=2;

查询结果如下:
1,上海,静安,场中小区,61000,2022-06-01 00:00:00

欢迎关注公众号算法小生沈健的技术博客

posted @ 2022-10-29 15:41  算法小生  阅读(93)  评论(0编辑  收藏  举报