代码改变世界

ClickHouse 的 Parts 和 Partitions

2024-10-16 11:07  abce  阅读(234)  评论(0编辑  收藏  举报

在 ClickHouse 中,磁盘上存储表的部分数据的物理文件称为 "part"。"partition "是使用分区键创建的表数据的逻辑划分。分区可用于提高大型表的性能和可管理性,允许用户更有效地存储、查询和操作数据子集。

Parts

如前所述,"Parts"是磁盘上的物理文件。默认情况下,所有数据相关文件都位于"/var/lib/clickhouse"目录下。ClickHouse 中的每个合并树表都有唯一的存储目录路径来存储 Parts。你可以从 "system.parts "表中获取parts的实际位置、parts名称、分区信息(如果有的话)以及其他一些有价值的信息。

以下是从 system.parts 表查询结果的示例。

SELECT
    substr(table, 1, 22),
    partition AS prt,
    name,
    part_type,
    path
FROM system.parts
WHERE database = 'mytest'
ORDER BY
    table ASC,
    partition ASC,
    name ASC

Query id: e2f55694-fa55-48f3-99f7-74bca59aca48

┌─substring(table, 1, 22)─┬─prt──────┬─name─────────────────┬─part_type─┬─path──────────────────────────────────────────────────────────────────────────────────────┐
│ bucket_daily_stats_v2   │ tuple()  │ all_1_1389_6_695     │ Wide      │ /var/lib/clickhouse/store/7b0/7b0e6f30-114d-472d-8ff4-f4c9a4cf7cc1/all_1_1389_6_695/      │
│ corr_pair_value         │ tuple()  │ all_208611_209409_4  │ Wide      │ /var/lib/clickhouse/store/5d7/5d732078-0f2e-4d04-9848-d216a1839b32/all_208611_209409_4/   │
│ corr_pair_value         │ tuple()  │ all_209410_209634_3  │ Wide      │ /var/lib/clickhouse/store/5d7/5d732078-0f2e-4d04-9848-d216a1839b32/all_209410_209634_3/   │
│ corr_pair_value         │ tuple()  │ all_209635_210331_4  │ Wide      │ /var/lib/clickhouse/store/5d7/5d732078-0f2e-4d04-9848-d216a1839b32/all_209635_210331_4/   │
│ mytest_of_ti            │ 202408   │ 202408_4241_5001_5   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202408_4241_5001_5/    │
│ mytest_of_ti            │ 202408   │ 202408_5002_5008_1   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202408_5002_5008_1/    │
│ mytest_of_ti            │ 202408   │ 202408_5009_5045_2   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202408_5009_5045_2/    │
│ mytest_of_ti            │ 202408   │ 202408_5046_5046_0   │ Compact   │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202408_5046_5046_0/    │
│ mytest_of_ti            │ 202409   │ 202409_5013_5624_5   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202409_5013_5624_5/    │
│ mytest_of_ti            │ 202409   │ 202409_5625_5661_2   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202409_5625_5661_2/    │
│ mytest_of_ti            │ 202409   │ 202409_5662_5699_2   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202409_5662_5699_2/    │
│ mytest_of_ti            │ 202409   │ 202409_5700_5708_1   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202409_5700_5708_1/    │
│ mytest_of_ti            │ 202409   │ 202409_5710_5793_4   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202409_5710_5793_4/    │
│ mytest_of_ti            │ 202409   │ 202409_5794_5794_0   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202409_5794_5794_0/    │
│ mytest_of_ti            │ 202409   │ 202409_5795_5795_0   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202409_5795_5795_0/    │
│ mytest_of_ti            │ 202410   │ 202410_5703_5810_7   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5703_5810_7/    │
│ mytest_of_ti            │ 202410   │ 202410_5811_5849_2   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5811_5849_2/    │
│ mytest_of_ti            │ 202410   │ 202410_5850_5882_2   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5850_5882_2/    │
│ mytest_of_ti            │ 202410   │ 202410_5883_5888_1   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5883_5888_1/    │
│ mytest_of_ti            │ 202410   │ 202410_5889_5894_1   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5889_5894_1/    │
│ mytest_of_ti            │ 202410   │ 202410_5895_5901_1   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5895_5901_1/    │
│ mytest_of_ti            │ 202410   │ 202410_5902_5902_0   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5902_5902_0/    │
│ mytest_of_ti            │ 202410   │ 202410_5903_5903_0   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5903_5903_0/    │
│ mytest_of_ti            │ 202410   │ 202410_5904_5904_0   │ Wide      │ /var/lib/clickhouse/store/32f/32ff94bb-43af-47c3-99b1-5c85e912483c/202410_5904_5904_0/    │
└─────────────────────────┴──────────┴──────────────────────┴───────────┴───────────────────────────────────────────────────────────────────────────────────────────┘

在 "part_type" 列中,"Wide "表示每一列都存储在文件系统中的一个单独文件中,而 "Compact "表示所有列都存储在文件系统的一个文件中。此外,"partition "列中的 "tuple() "表示该表未分区。

 

也可以在目录“/var/lib/clickhouse/data/<DBNAME>/<TABLENAME>”中查看表的 parts,会发现该目录下存放都是符号链接,通过链接可以查看表包含的parts。比如,进入表 mytest_of_ti 所在目录查看:

# ls -la
......
drwxr-x--- 2 clickhouse clickhouse 4096 Oct 10 15:10 202410_5811_5849_2
drwxr-x--- 2 clickhouse clickhouse 4096 Oct 11 15:10 202410_5850_5882_2
drwxr-x--- 2 clickhouse clickhouse 4096 Oct 11 15:10 202410_5883_5888_1
drwxr-x--- 2 clickhouse clickhouse 4096 Oct 11 15:10 202410_5889_5894_1
drwxr-x--- 2 clickhouse clickhouse 4096 Oct 13 15:10 202410_5895_5901_1
drwxr-x--- 2 clickhouse clickhouse 4096 Oct 13 15:10 202410_5902_5902_0
drwxr-x--- 2 clickhouse clickhouse 4096 Oct 13 15:10 202410_5903_5903_0
drwxr-x--- 2 clickhouse clickhouse 4096 Oct 13 15:10 202410_5904_5904_0
drwxr-x--- 2 clickhouse clickhouse   10 Nov 11  2023 detached
-rw-r----- 1 clickhouse clickhouse    1 Nov 11  2023 format_version.txt
-rw-r----- 1 clickhouse clickhouse  123 Dec 11  2023 mutation_252.txt
-rw-r----- 1 clickhouse clickhouse   96 Dec 11  2023 mutation_259.txt
-rw-r----- 1 clickhouse clickhouse  230 Dec 11  2023 mutation_264.txt

可以统计出表mytest_of_ti一共有个part:

# ls -la |grep 20 |wc -l
133

以列出的202410_5850_5882_2 该part为例,其中202410表示分区id;5850表示part中最小的块号;5882表示最大的块号;2表示块的级别(chunk level)。如果没有做分区,part 的名称以all_开头。

 

也可以从系统表 system.parts 中查看该part 的信息:

SELECT
    name,
    partition_id,
    min_block_number,
    max_block_number,
    level,
    data_version
FROM system.parts
WHERE (database = 'factor_data') AND (table = 'factor_value_of_trading') AND (name = '202410_5850_5882_2')

Query id: e6678b2e-31e2-4a9d-9c9b-16ff8bf9ecf0

┌─name───────────────┬─partition_id─┬─min_block_number─┬─max_block_number─┬─level─┬─data_version─┐
│ 202410_5850_5882_2 │ 202410       │             5850 │             5882 │     2 │         5850 │
└────────────────────┴──────────────┴──────────────────┴──────────────────┴───────┴──────────────┘

1 row in set. Elapsed: 0.006 sec. 

Partitions

与 parts 一样,也可以从 “system.parts ”表中访问合并树表的分区信息。不过,分区列现在不是用 “tuple() ”来表示了。要创建分区表,首先需要在创建表时使用 “PARTITION BY expr ”子句。例如,“PARTITION BY toYYYMMDD(start_time) ”子句创建了一个与 “start_time ”列相关的每日分区。在下面的示例中,可以看到Partitions名和parts名是不同的。分区类似于逻辑上的划分,但 parts是作为物理文件定位的。一个分区可以包含一个或多个 parts。

SELECT
    partition,
    name,
    active
FROM system.parts
WHERE (table = 'mytest_of_ti') AND (database = 'test')

Query id: 9c8c15bc-df95-4713-b001-39447010a1ab

┌─partition─┬─name─────────────────┬─active─┐
│ 201411    │ 201411_1_1_1_264     │      1 │
│ 201411    │ 201411_2_2_3_264     │      1 │
│ 201412    │ 201412_3_3_3_264     │      1 │
│ 201412    │ 201412_4_4_1_264     │      1 │
│ 201412    │ 201412_5_5_1_264     │      1 │
│ 202001    │ 202001_6_6_3_264     │      1 │
│ 202002    │ 202002_7_7_3_264     │      1 │
│ 202003    │ 202003_8_8_3_264     │      1 │
│ 202004    │ 202004_9_9_1_264     │      1 │
│ 202004    │ 202004_10_10_3_264   │      1 │
...
│ 202410    │ 202410_5934_5934_0   │      0 │
│ 202410    │ 202410_5935_5935_0   │      0 │
│ 202410    │ 202410_5936_5936_0   │      0 │
│ 202410    │ 202410_5937_5937_0   │      1 │
│ 202410    │ 202410_5938_5938_0   │      1 │
└───────────┴──────────────────────┴────────┘

167 rows in set. Elapsed: 0.005 sec. 

通常,分区是用来提升查询性能。便于用户灵活地管理数据子集。可以直接查询分区、删除分区等。

 

可以通过指定 where 子句或者使用隐藏列"_partition_id" 来查看指定的分区。当然最好使用官方推荐的方式,在where子句中添加分区条件比较好。 有些特殊情况下,需要使用隐藏列"_partition_id"。

 

现在,让我们看看分区表的查询示例。假设我们的表(test.mytest_of_ti)对 “created_at ”列进行了分区。我们可以使用分区键列和隐藏的“_partition_id ”列访问特定分区。此外,我们还可以使用“_partition_id ”查询前 10 个分区。

SELECT count()
FROM recoDB.mytest_of_ti
WHERE toDate(created_at) = '2023-01-21'

┌─count()─┐
│    9731 │
└─────────┘
#####################################

SELECT count()
FROM test.mytest_of_ti
WHERE _partition_id = '20230121'

┌─count()─┐
│   9731  │
└─────────┘
SELECT
    _partition_id,
    count()
FROM test.mytest_of_ti
GROUP BY _partition_id
ORDER BY 2 DESC
LIMIT 10

卸载或附加Parts/Partitions

detach 操作可以将指定的 parts/partitions 移动到 detached 目录。在重新附加之前,用户无法访问这些数据。缺省情况下,detached 目录位于"/var/lib/clickhouse/data/<DATABASE_NAME>/<TABLE_NAME>"中。

而 attach 操作可以将被卸载的parts/partitions 附加进来。语法如下所示:

#DETACH PART/PARTITION
ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] DETACH PARTITION|PART <PARTITION_EXPRESSION>

#ATTACH PART/PARTITION
ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] ATTACH PARTITION|PART <PARTITION_EXPRESSION>

这里的 PARTITION_EXPRESSION 是分区名、分区id或表达式自身。

 

这里,分区表达式应该是分区名称、分区 id 或表达式本身。

 

假设我们要从 “test.mytest_of_ti ”表中卸载任何指定日期的分区。

 

首先,找到指定日期的分区和parts名称。如图所示,该日期位于分区 “20231013 ”中,该分区有两个part。

SELECT
    partition,
    name,
    partition_id
FROM system.parts
WHERE (database = 'test') AND (table = 'mytest_of_ti') AND (partition_id IN (
    SELECT DISTINCT _partition_id
    FROM test.mytest_of_ti
    WHERE toDate(created_at) = '2023-10-13'
))


┌─partition─┬─name─────────────┬─partition_id─┐
│ 20231013  │ 20231013_62_62_0 │ 20231013     │
│ 20231013  │ 20231013_78_78_0 │ 20231013     │
└───────────┴──────────────────┴──────────────┘

开始卸载分区"20231013",然后再附加进来。

-- Count related date before detached
SELECT count()
FROM test.mytest_of_ti
WHERE toDate(created_at) = '2023-10-13'
┌─count()─┐
│   78536 │
└─────────┘

-- detach partition
ALTER TABLE test.mytest_of_ti DETACH PARTITION 20231013
Ok.

-- Count related date after detached
SELECT count()
FROM test.mytest_of_ti
WHERE toDate(created_at) = '2023-10-13'
┌─count()─┐
│       0 │
└─────────┘

然后再重新附加进来:

--attach partition
ALTER TABLE test.mytest_of_ti ATTACH PARTITION 20231013

Ok.
--Count related date after attach partition
SELECT count()
FROM test.mytest_of_ti
WHERE toDate(created_at) = '2023-10-13'
┌─count()─┐
│   78536 │
└─────────┘

 

同样对单个part做如上的操作。

--find parts for given date
SELECT
    partition,
    name,
    partition_id
FROM system.parts
WHERE (database = 'test') AND (table = 'mytest_of_ti') AND (partition_id IN (
    SELECT DISTINCT _partition_id
    FROM test.mytest_of_ti
    WHERE toDate(created_at) = '2023-10-13'
))
┌─partition─┬─name───────────────┬─partition_id─┐
│ 20231013  │ 20231013_142_142_0 │ 20231013     │
│ 20231013  │ 20231013_143_143_0 │ 20231013     │
└───────────┴────────────────────┴──────────────┘

-- Count before detach part operation
SELECT count()
FROM test.mytest_of_ti
WHERE toDate(created_at) = '2023-10-13'
┌─count()─┐
│   78536 │
└─────────┘

-- detach any of the parts
ALTER TABLE test.mytest_of_ti DETACH PART '20231013_142_142_0'
Ok.

-- Count after detach part operation
SELECT count()
FROM test.mytest_of_ti
WHERE toDate(created_at) = '2023-10-13'
┌─count()─┐
│   78406 │
└─────────┘

-- Attach related part
ALTER TABLE test.mytest_of_ti ATTACH PART '20231013_142_142_0'

-- Count after attach the part
SELECT count()
FROM test.mytest_of_ti
WHERE toDate(created_at) = '2023-10-13'
┌─count()─┐
│   78536 │
└─────────┘

 

也可以将源表的分区附加到目标表上,如下例所示,附加后数据不会从源表或目标表被删除:

ALTER TABLE <DESTINATION_TABLE> [ON CLUSTER <CLUSTER_NAME>] ATTACH PARTITION <PARTITION EXPRESSION> FROM <SOURCE_TABLE>

不过,该操作需要目标表和源表满足以下条件:

·结构相同

·分区键、排序键和主键相同

·存储策略相同

删除Parts/Partitions

删除的parts/Partitions 在系统表 system.parts 中被标记为不活跃的,且删除操作完成后,对应的parts/Partitions 会在 "/var/lib/clickhouse/data/<database_name>/<table_name>/" 中保留十分钟。

-- Drop from table itself
ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] DROP PARTITION|PART 'PART/PARTITON EXPRESSION'

-- Remove specified part/partition from detached folder
ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] DROP DETACHED PARTITION|PART 'PART/PARTITON EXPRESSION'

移动Parts/Partitions

可以将partitions移动到 Clickhouse 中的另一个表。在这种情况下,源表和目标表都具有相同的结构、分区键、按键排序、主键、存储策略和引擎系列。

对于合并树引擎表,移动的另一个选项是将parts或partitions移动到另一个磁盘或卷上。

-- Usage: ALTER TABLE <DATABASE_NAME.SOURCE_TABLE> [ON CLUSTER <CLUSTER_NAME>] MOVE PARTITION <PARTITION EXPRESSION> TO TABLE <DATABASE_NAME.DESTINATION_TABLE>

-- 1. Create empty mytest_of_ti_new table same as the mytest_of_ti

-- 2. Move 20231013 partition from mytest_of_ti to mytest_of_ti_new
ALTER TABLE test.mytest_of_ti
    MOVE PARTITION '20231013' TO TABLE test.mytest_of_ti_new
Ok.
--3. Take partition count for 20231013 in source table
SELECT count()
FROM test.mytest_of_ti
WHERE _partition_id = '20231013'
┌─count()─┐
│       0 │
└─────────┘

--4 Take partition count for 20231013 in destination table
SELECT count()
FROM test.mytest_of_ti_new
WHERE _partition_id = '20231013'

Query id: 49319c73-c84f-4fbf-838d-ab787971eaad

┌─count()─┐
│   78536 │
└─────────┘

 

移动 parts/partitions 到其它的磁盘或卷,需要设置一个存储策略,并使用创建的策略创建一个新表。这里不再讲存储策略。

-- Our table's(mytest_of_ti_with_storage_policy) policy is "vo1_to_vo2" 
-- and this volume contain 2 disks(vo1 and vo2)
SELECT *
FROM system.storage_policies
WHERE policy_name = (
    SELECT storage_policy
    FROM system.tables
    WHERE (database = 'test') AND (name = 'mytest_of_ti_with_storage_policy')

┌─policy_name─┬─volume_name─┬─volume_priority─┬─disks───┬─volume_type─┬─max_data_part_size─┬─move_factor─┬─prefer_not_to_merge─┐
│ vo1_to_vo2  │ vo1_volume  │               1 │ ['vo1'] │ JBOD        │                  0 │         0.1 │                   0 │
│ vo1_to_vo2  │ vo2_volume  │               2 │ ['vo2'] │ JBOD        │                  0 │         0.1 │                   0 │
└─────────────┴─────────────┴─────────────────┴─────────┴─────────────┴────────────────────┴─────────────┴─────────────────────┘

-- The 20231013 partition is stored under vo1 disk.
SELECT
    partition,
    name,
    path
FROM system.parts
WHERE (database = 'test') AND (table = 'mytest_of_ti_with_storage_policy') AND (partition = '20231013')


┌─partition─┬─name───────────────┬─path────────────────────────────────────────────────────────────────────┐
│ 20231013  │ 20231013_107_107_0 │ /vo1/store/346/3465ekb6-231d-493d-94bc-518234d79b45/20231013_107_107_0/ │
└───────────┴────────────────────┴─────────────────────────────────────────────────────────────────────────┘

-- Let's move it to the vo2 disk
ALTER TABLE test.mytest_of_ti_with_storage_policy
    MOVE PARTITION '20231013' TO DISK 'vo2'

-Check the new volume
SELECT
    partition,
    name,
    path
FROM system.parts
WHERE (database = 'test') AND (table = 'mytest_of_ti_with_storage_policy') AND (partition = '20231013')

┌─partition─┬─name───────────────┬─path────────────────────────────────────────────────────────────────────┐
│ 20231013  │ 20231013_107_107_0 │ /vo2/store/346/3465ekb6-231d-493d-94bc-518234d79b45/20231013_107_107_0/ │
└───────────┴────────────────────┴─────────────────────────────────────────────────────────────────────────┘

-- You can move parts and also you can move volume instead of disk