|NO.Z.00004|——————————|Deployment|——|Hadoop & OLAP Database Management System.v04|——|ClickHouse.v04|Table Engines|MergeTree|MergeTree Storage Policies|HOT/COLD Policy|

1. The HOT/COLD policy
### --- The policy is specified in the config.xml configuration file:

~~~     # Configure the storage policies in config.xml on a cluster host
[root@hadoop01 ~]# vim /etc/clickhouse-server/config.xml
~~~ Add the following parameters just above the closing </yandex> tag at the end of config.xml
<storage_configuration>
    <disks>
        <disk_hot1>
            <path>/var/lib/clickhouse/chbase/hotdata1/</path>
        </disk_hot1>
        <disk_hot2>
            <path>/var/lib/clickhouse/chbase/hotdata2/</path>
        </disk_hot2>
        <disk_cold>
            <path>/var/lib/clickhouse/chbase/colddata/</path>
        </disk_cold>
    </disks>
    <policies>
        <default_jbod>
            <volumes>
                <jbod>
                    <disk>disk_hot1</disk>
                    <disk>disk_hot2</disk>
                </jbod>
            </volumes>
            <move_factor>0.2</move_factor>
        </default_jbod>
        <moving_from_hot_to_cold>
            <volumes>
                <hot>
                    <disk>disk_hot1</disk>
                    <max_data_part_size_bytes>1073741824</max_data_part_size_bytes>
                </hot>
                <cold>
                    <disk>disk_cold</disk>
                </cold>
            </volumes>
            <move_factor>0.2</move_factor>
        </moving_from_hot_to_cold>
    </policies>
</storage_configuration>
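Once the service has reloaded the configuration, the disks themselves can be inspected through the system.disks table. A query sketch (column names as in recent ClickHouse versions; free_space and total_space are reported in bytes):

~~~sql
SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total
FROM system.disks;
~~~

Each configured disk should appear here with its path; if a path does not exist or is not writable by the clickhouse user, the server will fail to start.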
### --- Distribute the file to the other hosts and restart the service

~~~     # Send the file to all other hosts (the disk paths must already exist and be writable by the clickhouse user on every node)
[root@hadoop01 ~]# rsync-script /etc/clickhouse-server/config.xml
~~~     # Restart the service on every node
[root@hadoop01 ~]# systemctl restart clickhouse-server
### --- Verify that the policies took effect

~~~     # The system.storage_policies system table shows that the policies just configured are now active.
hadoop01 :) select policy_name, volume_name, volume_priority, disks, formatReadableSize(max_data_part_size) max_data_part_size, move_factor from system.storage_policies;

SELECT 
    policy_name,
    volume_name,
    volume_priority,
    disks,
    formatReadableSize(max_data_part_size) AS max_data_part_size,
    move_factor
FROM system.storage_policies

┌─policy_name─────────────────┬─volume_name─┬─volume_priority─┬─disks─────────────────────┬─max_data_part_size─┬─move_factor─┐
│ default                     │ default     │               1 │ ['default']               │ 0.00 B             │           0 │
│ default_jbod                │ jbod        │               1 │ ['disk_hot1','disk_hot2'] │ 0.00 B             │         0.2 │
│ moving_from_hot_to_cold     │ hot         │               1 │ ['disk_hot1']             │ 1.00 GiB           │         0.2 │
│ moving_from_hot_to_cold     │ cold        │               2 │ ['disk_cold']             │ 0.00 B             │         0.2 │
│ moving_from_hot_to_cold_new │ hot         │               1 │ ['disk_hot2']             │ 1.00 MiB           │         0.2 │
│ moving_from_hot_to_cold_new │ cold        │               2 │ ['disk_cold']             │ 0.00 B             │         0.2 │
└─────────────────────────────┴─────────────┴─────────────────┴───────────────────────────┴────────────────────┴─────────────┘
2. HOT/COLD policy example
### --- Create a table to test the moving_from_hot_to_cold_new storage policy (per the system.storage_policies output above, its hot volume is disk_hot2 with a 1 MiB max_data_part_size, and its cold volume is disk_cold):

hadoop01 :) create table hot_cold_table( id UInt64 )ENGINE=MergeTree() ORDER BY id SETTINGS storage_policy='moving_from_hot_to_cold_new';

CREATE TABLE hot_cold_table
(
    `id` UInt64
)
ENGINE = MergeTree()
ORDER BY id
SETTINGS storage_policy = 'moving_from_hot_to_cold_new'

Ok.
### --- Write a batch of roughly 500 KB of data, which creates one part directory:

~~~     # Write the first batch of ~500 KB (100,000 UInt64 rows)
hadoop01 :) insert into hot_cold_table select rand() from numbers(100000);

INSERT INTO hot_cold_table SELECT rand()
FROM numbers(100000)

Ok.
~~~     # The first part lands on disk_hot2, the hot disk of this policy.

hadoop01 :) select name,disk_name from system.parts where table = 'hot_cold_table';

SELECT 
    name,
    disk_name
FROM system.parts
WHERE table = 'hot_cold_table'

┌─name──────┬─disk_name─┐
│ all_1_1_0 │ disk_hot2 │
└───────────┴───────────┘
### --- Write a second batch of ~500 KB of data

~~~     # Write the second batch of ~500 KB
hadoop01 :) insert into hot_cold_table select rand() from numbers(100000);

INSERT INTO hot_cold_table SELECT rand()
FROM numbers(100000)

Ok.
hadoop01 :) select name,disk_name from system.parts where table = 'hot_cold_table';

SELECT 
    name,
    disk_name
FROM system.parts
WHERE table = 'hot_cold_table'

┌─name──────┬─disk_name─┐
│ all_1_1_0 │ disk_hot2 │
│ all_2_2_0 │ disk_hot2 │
└───────────┴───────────┘
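Before merging, it is worth confirming that each part on its own is still below the policy's 1 MiB threshold, which is why both remain on the hot disk. A query sketch using system.parts:

~~~sql
SELECT
    name,
    disk_name,
    formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE table = 'hot_cold_table' AND active;
~~~

Each of the two parts should report roughly 500 KB.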
3. Merging across disks
### --- Merge: each part is roughly 500 KB; next, trigger a merge:
~~~     A background merge would eventually run on its own; here we trigger one manually.
~~~     The merged part (~1 MB) exceeds the hot volume's 1 MiB max_data_part_size, so it is written to disk_cold;
~~~     the now-inactive source parts on disk_hot2 are then removed automatically (after 8 minutes by default).

~~~     # Trigger the merge manually
hadoop01 :)  OPTIMIZE TABLE hot_cold_table;

OPTIMIZE TABLE hot_cold_table

Ok.
hadoop01 :) select name,disk_name from system.parts where table = 'hot_cold_table';

SELECT 
    name,
    disk_name
FROM system.parts
WHERE table = 'hot_cold_table'

┌─name──────┬─disk_name─┐
│ all_1_1_0 │ disk_hot2 │
│ all_1_2_1 │ disk_cold │
│ all_2_2_0 │ disk_hot2 │
└───────────┴───────────┘
### --- Query the size:
~~~     Although a MergeTree table's storage policy cannot be changed after creation, part directories can still be moved between disks.

hadoop01 :) select disk_name, formatReadableSize(bytes_on_disk) as size from system.parts where (table = 'hot_cold_table') and active = 1;

SELECT 
    disk_name,
    formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE (table = 'hot_cold_table') AND (active = 1)

┌─disk_name─┬─size─────┐
│ disk_cold │ 1.01 MiB │
└───────────┴──────────┘
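As a sketch of moving parts by hand (the MOVE clause of ALTER TABLE; the part name is taken from the system.parts output above, and since this table has no PARTITION BY, its single partition is tuple()):

~~~sql
-- Move one named part back onto a specific disk
ALTER TABLE hot_cold_table MOVE PART 'all_1_2_1' TO DISK 'disk_hot2';

-- Or move the whole partition to a volume of the policy
ALTER TABLE hot_cold_table MOVE PARTITION tuple() TO VOLUME 'hot';
~~~

The move only succeeds if the target disk or volume belongs to the table's storage policy and has enough free space.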
posted on yanqi_vip