|NO.Z.00025|——————————|BigDataEnd|——|Hadoop&OLAP_ClickHouse.V06|——|ClickHouse.v06|ClickHouse: ClickHouse Replicas and Shards|ReplicatedMergeTree Principles|
1. ReplicatedMergeTree Principles
### --- Data structure
[zk: localhost:2181(CONNECTED) 8] ls /clickhouse/tables/01/replicated_sales_5
[metadata, temp, mutations, log, leader_election, columns, blocks, nonincrement_block_numbers, replicas, quorum, block_numbers]
### --- Data structure description
~~~ # Metadata
~~~ metadata: metadata information: the primary key, sampling expression, and partition key
~~~ columns: the column names and their data types
~~~ replicas: the names of the replicas
~~~ # Flags:
~~~ leader_election: the path used for electing the primary replica
~~~ blocks: hash values of written blocks (used to deduplicate repeated inserts) and the partition_id
~~~ max_insert_block_size: 1048576 rows (see the query sketch after this list)
~~~ block_numbers: the order of blocks within the same partition
~~~ quorum: the quorum status of writes (how many replicas a write must reach; see insert_quorum in section 3)
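~~~ # The 1048576-row threshold above is the default value of the max_insert_block_size setting; a minimal sketch of checking it from any ClickHouse node (query only, output omitted):
hadoop01 :) select name, value from system.settings where name = 'max_insert_block_size';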
### --- Operation nodes:
~~~ # log: log-0000000000, regular operations
~~~ mutations: DELETE and UPDATE operations
~~~ replicas: per-replica nodes (each replica's own queue, log_pointer, and so on)
~~~ # Entry:
~~~ LogEntry and MutationEntry
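~~~ # Each of these nodes can be inspected with zkCli's ls/get before looking at a concrete LogEntry; a minimal sketch (paths follow the replicated_sales_5 example above, prompt numbers illustrative, output omitted):
[zk: localhost:2181(CONNECTED) 9] get /clickhouse/tables/01/replicated_sales_5/metadata
[zk: localhost:2181(CONNECTED) 10] ls /clickhouse/tables/01/replicated_sales_5/replicas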
[zk: localhost:2181(CONNECTED) 14] get /clickhouse/tables/01/a1/log/log-0000000000
format version: 4
create_time: 2021-11-04 19:31:51
source replica: hadoop01
block_id: 202111_4775801442814045523_14663512626267065022
get
202111_0_0_0
~~~ get: the instruction (an instruction to fetch data)
~~~ Who will pick up this instruction? hadoop02 will pick it up and execute it.
~~~ 202111_0_0_0: the partition information, telling hadoop02 which partition's data it should fetch
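~~~ # What the 202111_0_0_0 part looks like from the ClickHouse side can be checked with the standard system.parts table; a minimal sketch against the a1 table from this example (output omitted):
hadoop02 :) select partition, name, active from system.parts where table = 'a1';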
2. Core Replica Coordination Flow
### --- INSERT: create a replica instance on the hadoop01 machine:
~~~ # Create the a1 table in ClickHouse on hadoop01
[root@hadoop01 ~]# clickhouse-client -m
hadoop01 :) create table a1(
id String,
price Float64,
create_time DateTime
)ENGINE=ReplicatedMergeTree('/clickhouse/tables/01/a1','hadoop01')
PARTITION BY toYYYYMM(create_time)
ORDER BY id;
~~~ Output:
CREATE TABLE a1
(
`id` String,
`price` Float64,
`create_time` DateTime
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/a1', 'hadoop01')
PARTITION BY toYYYYMM(create_time)
ORDER BY id
Ok.
~~~ # Explanation:
~~~ Initializes all ZooKeeper nodes according to zk_path
~~~ Registers its own replica instance hadoop01 under the replicas node
~~~ Starts a listener task that watches the /log node
~~~ Participates in the replica election to choose the primary replica.
~~~ The election works by inserting child nodes under leader_election/; the first replica to insert successfully becomes the primary (a verification sketch follows).
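~~~ # A quick verification sketch in zkCli (the replicas listing is implied by the registration step above; children of leader_election are left out because their names are assigned by ZooKeeper):
[zk: localhost:2181(CONNECTED) 2] ls /clickhouse/tables/01/a1/replicas
[hadoop01]
[zk: localhost:2181(CONNECTED) 3] ls /clickhouse/tables/01/a1/leader_election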
### --- Create the second replica instance:
~~~ # Create the a1 table in ClickHouse on hadoop02
[root@hadoop02 ~]# clickhouse-client -m
hadoop02 :) create table a1(
id String,
price Float64,
create_time DateTime
)ENGINE=ReplicatedMergeTree('/clickhouse/tables/01/a1','hadoop02')
PARTITION BY toYYYYMM(create_time)
ORDER BY id;
~~~ Output:
CREATE TABLE a1
(
`id` String,
`price` Float64,
`create_time` DateTime
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/a1', 'hadoop02')
PARTITION BY toYYYYMM(create_time)
ORDER BY id
Ok.
~~~ # Explanation:
~~~ Participates in the replica election; the hadoop01 replica remains the primary (see the sketch below).
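~~~ # After this step both instances appear under the replicas node; a sketch (child ordering is not guaranteed):
[zk: localhost:2181(CONNECTED) 5] ls /clickhouse/tables/01/a1/replicas
[hadoop02, hadoop01]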
### --- Insert data into the first replica instance:
~~~ # Insert data into the a1 table on hadoop01
hadoop01 :) insert into table a1 values('A001',100,'2021-11-02 08:00:00');
~~~ Output:
INSERT INTO a1 VALUES
Ok.
### --- View the inserted data on hadoop01 and hadoop02
~~~ # Query the a1 data on hadoop01
hadoop01 :) select * from a1;
┌─id───┬─price─┬─────────create_time─┐
│ A001 │ 100 │ 2021-11-02 08:00:00 │
└──────┴───────┴─────────────────────┘
~~~ # Query the a1 data on hadoop02
hadoop02 :) select * from a1;
┌─id───┬─price─┬─────────create_time─┐
│ A001 │ 100 │ 2021-11-02 08:00:00 │
└──────┴───────┴─────────────────────┘
### --- After the INSERT executes, the partition directory is written locally, and the partition's block_id is then written to the blocks node
~~~ # View the data in ZooKeeper: the three ZooKeeper nodes run in cluster mode, so the data can be viewed from any of them
[zk: localhost:2181(CONNECTED) 0] ls /clickhouse/tables/01/a1/blocks
[202111_4775801442814045523_14663512626267065022]
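~~~ # Because the block hash is recorded under /blocks, re-running the identical INSERT is deduplicated by default (insert_deduplicate = 1) and no new child node appears; a sketch:
hadoop01 :) insert into table a1 values('A001',100,'2021-11-02 08:00:00');
[zk: localhost:2181(CONNECTED) 1] ls /clickhouse/tables/01/a1/blocks
[202111_4775801442814045523_14663512626267065022]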
3. Configuration Parameters
### --- The insert_quorum parameter
~~~ If the insert_quorum parameter is set
~~~ and insert_quorum >= 2, hadoop01 additionally monitors the number of replicas that have completed the write,
~~~ and the whole write operation is considered complete only once the number of written replicas >= insert_quorum (see the sketch below).
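~~~ # A minimal sketch of enabling quorum writes for the session (the value 2 and the inserted row are illustrative):
hadoop01 :) set insert_quorum = 2;
hadoop01 :) insert into table a1 values('A002',200,'2021-11-02 09:00:00');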
~~~ Next, the hadoop01 replica pushes the operation log entry [log-0000000000] to the log node
[zk: localhost:2181(CONNECTED) 1] ls /clickhouse/tables/01/a1/log
[log-0000000000]
### --- The content of the operation log entry is:
[zk: localhost:2181(CONNECTED) 2] get /clickhouse/tables/01/a1/log/log-0000000000
format version: 4
create_time: 2021-11-04 19:56:29
source replica: hadoop01
block_id: 202111_4775801442814045523_14663512626267065022
get
202111_0_0_0
cZxid = 0x1100000036
ctime = Thu Nov 04 19:56:29 CST 2021
mZxid = 0x1100000036
mtime = Thu Nov 04 19:56:29 CST 2021
pZxid = 0x1100000036
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 151
numChildren = 0
### --- LogEntry:
~~~ source replica: the replica that issued this Log instruction, corresponding to replica_name
~~~ ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/table_name', '{replica_name}')
~~~ get: the operation instruction type
~~~ get: download a partition from a remote replica
~~~ merge: merge partitions
~~~ mutate: a MUTATION operation
~~~ block_id: the blockId of the current partition, matching the child node name under the /blocks path
~~~ 202111_0_0_0: the name of the current partition (part) directory
~~~ From the log content we can see that the operation type is a get download and the part to download is 202111_0_0_0;
~~~ all other replicas will execute the Log entries in the same order.
### --- Next: the second replica instance pulls the Log entry:
~~~ hadoop02 keeps watching the /log node for changes;
~~~ once hadoop01 has pushed /log/log-0000000000, hadoop02 triggers the log pull task and updates its log_pointer:
[zk: localhost:2181(CONNECTED) 4] get /clickhouse/tables/01/a1/replicas/hadoop02/log_pointer
1
cZxid = 0x1100000027
ctime = Thu Nov 04 19:56:13 CST 2021
mZxid = 0x1100000037
mtime = Thu Nov 04 19:56:29 CST 2021
pZxid = 0x1100000027
cversion = 0
dataVersion = 2
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 1
numChildren = 0
### --- After pulling the LogEntry, it is not executed immediately; instead it is converted into a task object and placed into the queue
[zk: localhost:2181(CONNECTED) 5] ls /clickhouse/tables/01/a1/replicas/hadoop02/queue
[queue-0000000000]
[zk: localhost:2181(CONNECTED) 6] get /clickhouse/tables/01/a1/replicas/hadoop02/queue/queue-0000000000
format version: 4
create_time: 2021-11-04 19:56:29
source replica: hadoop01
block_id: 202111_4775801442814045523_14663512626267065022
get
202111_0_0_0
cZxid = 0x1100000037
ctime = Thu Nov 04 19:56:29 CST 2021
mZxid = 0x1100000037
mtime = Thu Nov 04 19:56:29 CST 2021
pZxid = 0x1100000037
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 151
numChildren = 0
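~~~ # The same queue can also be observed from ClickHouse itself via the system.replication_queue table; a minimal sketch (column list abridged, output omitted):
hadoop02 :) select replica_name, position, type, new_part_name from system.replication_queue where table = 'a1';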
### --- The second replica instance sends a download request to the other replica.
~~~ When it sees that the type is get, ReplicatedMergeTree understands that another remote replica has already successfully written the data part,
~~~ and it downloads the data according to its log_pointer position.
~~~ The DataPartsExchange service on hadoop01 receives the call request and, after learning the caller's intent,
~~~ responds according to the parameters, sending its local 202111_0_0_0 part to hadoop02 in the DataPartsExchange service response.
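~~~ # End-to-end replication state (leadership, log position, queue size) can be checked from the system.replicas table; a minimal sketch (output omitted):
hadoop01 :) select table, is_leader, log_pointer, queue_size from system.replicas where table = 'a1';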