ClickHouse error: Metadata on replica is not up to date with common metadata in Zookeeper
The full error message:
Code: 517. DB::Exception: Received from localhost:9000. DB::Exception: Metadata on replica is not up to date with common metadata in Zookeeper. It means that this replica still not applied some of previous alters. Probably too many alters executing concurrently (highly not recommended). You can retry this error. (CANNOT_ASSIGN_ALTER)
I spent quite a while digging into this, then got sidetracked by other issues for more than ten days. What I tried was all over the place (I even went through the source code), and I didn't keep notes at the time, so now...
First, to be clear: I have not found a way to fully resolve this error from within ClickHouse itself, e.g. forcing the statement through, skipping the data_version check in ZooKeeper so the SQL can execute, or (once the root cause is known) clearing the pending action or session. In other words, there is no fix purely through ClickHouse (short of something like dropping and recreating the table).
First, create a test table and add a data-skipping index:
CK01 :) CREATE TABLE default.zkm_local ON CLUSTER ceb_cluster ( `NAME` String DEFAULT 'NULL' COMMENT '名字', `CREATE_TIME` DateTime DEFAULT '1971-01-01' COMMENT '数据建立时间' ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/ceb_cluster-{shard}/default/zkm_local', '{replica}') PARTITION BY toYYYYMMDD(CREATE_TIME) PRIMARY KEY NAME ORDER BY (NAME, CREATE_TIME) SETTINGS index_granularity = 8192;

CREATE TABLE default.zkm_local ON CLUSTER ceb_cluster ( `NAME` String DEFAULT 'NULL' COMMENT '名字', `CREATE_TIME` DateTime DEFAULT '1971-01-01' COMMENT '数据建立时间' ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/ceb_cluster-{shard}/default/zkm_local', '{replica}') PARTITION BY toYYYYMMDD(CREATE_TIME) PRIMARY KEY NAME ORDER BY (NAME, CREATE_TIME) SETTINGS index_granularity = 8192

Query id: 91681180-490b-488b-b94e-ecb0752b8912

┌─host──────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ dev-app78 │ 9000 │      0 │       │                   3 │                2 │
│ dev-app77 │ 9000 │      0 │       │                   2 │                2 │
└───────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host──────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ dev-app76 │ 9000 │      0 │       │                   1 │                0 │
│ dev-app79 │ 9000 │      0 │       │                   0 │                0 │
└───────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.175 sec.

CK01 :) ALTER TABLE default.zkm_local on cluster ceb_cluster ADD INDEX skipIdxNAME NAME TYPE minmax GRANULARITY 3;

ALTER TABLE default.zkm_local ON CLUSTER ceb_cluster ADD INDEX skipIdxNAME NAME TYPE minmax GRANULARITY 3

Query id: 402794ab-1aa5-4db1-9b82-67911a779996

┌─host──────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ dev-app78 │ 9000 │      0 │       │                   3 │                0 │
│ dev-app76 │ 9000 │      0 │       │                   2 │                0 │
│ dev-app79 │ 9000 │      0 │       │                   1 │                0 │
│ dev-app77 │ 9000 │      0 │       │                   0 │                0 │
└───────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.118 sec.
I won't create the corresponding distributed table.
First, you need to know that ReplicatedMergeTree creates a set of znodes for the table in ZooKeeper, rooted at zk_path.
For example, after I create the table, a whole bunch of new nodes appear in ZooKeeper; a screenshot shows this at a glance:
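If you don't have a GUI at hand, the same nodes can be listed from zkCli. The children shown below are what I would expect for a ReplicatedMergeTree table; the exact set varies between ClickHouse releases, so treat this listing as illustrative only:

[zk: localhost:2181(CONNECTED) 0] ls /clickhouse/tables/ceb_cluster-01/default/zkm_local
[alter_partition_version, block_numbers, blocks, columns, leader_election, log, metadata, mutations, nonincrement_block_numbers, part_moves_shard, pinned_part_uuids, quorum, replicas, temp]
[zk: localhost:2181(CONNECTED) 1] ls /clickhouse/tables/ceb_cluster-01/default/zkm_local/replicas/01
[columns, flags, host, is_active, is_lost, log_pointer, max_processed_insert_time, metadata, metadata_version, min_unprocessed_insert_time, mutation_pointer, parts, queue]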
I investigated many possible factors. Under the zk_path root (/clickhouse/tables/ceb_cluster-01/default/zkm_local):
- the dataVersion of the metadata node
- the dataVersion of the columns node
- the alter_version field inside the value of the newest entry under the log node
and under zk_path/replicas/{replica_name}/:
- the dataVersion of the metadata node
- the dataVersion of the columns node
- the value of the metadata_version node
- the value of the log_pointer node
The twists and turns along the way are hard to put into words...
In the end, the only thing that actually matters (in the context of the table I created) turns out to be the dataVersion attribute of the node /clickhouse/tables/ceb_cluster-01/default/zkm_local/metadata — the stat attribute, not the node's content.
You can check it like this:
[zk: localhost:2181(CONNECTED) 0] stat /clickhouse/tables/ceb_cluster-01/default/zkm_local/metadata
cZxid = 0x4e00000e3f
ctime = Fri Dec 16 17:05:46 CST 2022
mZxid = 0x4e00000f15
mtime = Fri Dec 16 17:15:16 CST 2022
pZxid = 0x4e00000e3f
cversion = 0
dataVersion = 2
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 248
numChildren = 0
An article online (http://events.jianshu.io/p/5420fc6d3db2) claims that the dataVersion of the replica's metadata node must be greater than the dataVersion of the top-level metadata node. That is actually wrong — there is no such relationship at all.
First, let's reproduce the error: bump the dataVersion of /clickhouse/tables/ceb_cluster-01/default/zkm_local/metadata by 1. A single write to the node is enough, and the content stays unchanged.
It is currently 1; change it to 2 (the prettyZoo tool makes this very convenient — one click does it).
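If you are not using a GUI, the same bump can be done from zkCli: every successful set increments the node's dataVersion, even if you write back exactly the same content (pasting the multi-line metadata text back correctly is the fiddly part, which is why prettyZoo is handier). A rough sketch:

[zk: localhost:2181(CONNECTED) 0] get /clickhouse/tables/ceb_cluster-01/default/zkm_local/metadata
... (copy the content printed here) ...
[zk: localhost:2181(CONNECTED) 1] set /clickhouse/tables/ceb_cluster-01/default/zkm_local/metadata "<the same content>"
[zk: localhost:2181(CONNECTED) 2] stat /clickhouse/tables/ceb_cluster-01/default/zkm_local/metadata
...
dataVersion = 2
...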
Now drop the index, and the two hosts corresponding to the modified replica path start reporting errors.
CK01 :) alter table zkm_local on cluster ceb_cluster drop index skipIdxNAME;

ALTER TABLE zkm_local ON CLUSTER ceb_cluster DROP INDEX skipIdxNAME

Query id: 28e31b45-89f2-4ecc-8b24-1f0bb6c3fddb

┌─host──────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ dev-app78 │ 9000 │      0 │       │                   3 │                2 │
│ dev-app79 │ 9000 │      0 │       │                   2 │                2 │
└───────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host──────┬─port─┬─status─┬─error─────────────────────────────────────────────────────────┬─num_hosts_remaining─┬─num_hosts_active─┐
│ dev-app76 │ 9000 │    341 │ Cannot execute replicated DDL query, maximum retries exceeded │                   1 │                0 │
│ dev-app77 │ 9000 │    341 │ Cannot execute replicated DDL query, maximum retries exceeded │                   0 │                0 │
└───────────┴──────┴────────┴───────────────────────────────────────────────────────────────┴─────────────────────┴──────────────────┘
↑ Progress: 2.00 rows, 106.00 B (0.69 rows/s., 36.74 B/s.)  49%
4 rows in set. Elapsed: 2.886 sec.

Received exception from server (version 22.3.2):
Code: 341. DB::Exception: Received from localhost:9000. DB::Exception: There was an error on [dev-app76:9000]: Cannot execute replicated DDL query, maximum retries exceeded. (UNFINISHED)

CK01 :) alter table zkm_local drop index skipIdxNAME;

ALTER TABLE zkm_local DROP INDEX skipIdxNAME

Query id: 76b7478b-80dc-463b-9c14-1ddd26db887e

0 rows in set. Elapsed: 0.009 sec.

Received exception from server (version 22.3.2):
Code: 517. DB::Exception: Received from localhost:9000. DB::Exception: Metadata on replica is not up to date with common metadata in Zookeeper. It means that this replica still not applied some of previous alters. Probably too many alters executing concurrently (highly not recommended). You can retry this error. (CANNOT_ASSIGN_ALTER)
Now let's mess with other node values or dataVersions; note the node paths (a quick zkCli look at them follows below):
/clickhouse/tables/ceb_cluster-01/default/zkm_local/replicas/01/metadata_version
/clickhouse/tables/ceb_cluster-01/default/zkm_local/replicas/01/metadata
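Both can be inspected from zkCli before and after the change (the numbers below are just placeholders for whatever your replica currently holds):

[zk: localhost:2181(CONNECTED) 0] get /clickhouse/tables/ceb_cluster-01/default/zkm_local/replicas/01/metadata_version
1
[zk: localhost:2181(CONNECTED) 1] stat /clickhouse/tables/ceb_cluster-01/default/zkm_local/replicas/01/metadata
...
dataVersion = 1
...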
Now the dataVersion of the replica's metadata node (11) is greater than the dataVersion of the top-level metadata node (2), and the query still fails:
CK01 :) alter table zkm_local drop index skipIdxNAME;

ALTER TABLE zkm_local DROP INDEX skipIdxNAME

Query id: afef92d0-a949-4212-a40b-f7ef91db5ed6

0 rows in set. Elapsed: 0.010 sec.

Received exception from server (version 22.3.2):
Code: 517. DB::Exception: Received from localhost:9000. DB::Exception: Metadata on replica is not up to date with common metadata in Zookeeper. It means that this replica still not applied some of previous alters. Probably too many alters executing concurrently (highly not recommended). You can retry this error. (CANNOT_ASSIGN_ALTER)
Greater doesn't work; I also tested smaller and even keeping them equal, and neither works. I'll skip the details of those tests.
In fact, when the DROP INDEX is executed, ClickHouse reads the dataVersion of /clickhouse/tables/ceb_cluster-01/default/zkm_local/metadata (currently 2) and compares it against the replica's metadata version; if they are not equal, this error is thrown.
One thing to note in particular: per my testing, without a restart, the replica metadata version used in that comparison is NOT
the value stored in /clickhouse/tables/ceb_cluster-01/default/zkm_local/replicas/01/metadata_version, i.e. the value I changed to 999 above,
nor is it the dataVersion attribute of /clickhouse/tables/ceb_cluster-01/default/zkm_local/replicas/01/metadata, i.e. 11.
PS: if you restart ClickHouse, then it does become the value of /clickhouse/tables/ceb_cluster-01/default/zkm_local/replicas/01/metadata_version, i.e. 999. In other words, the value is effectively "cached" in memory.
So how do you get at this cached value? Unfortunately, I couldn't find any ClickHouse system table that exposes it, and it is certainly not in ZooKeeper either.
At the moment I have only one method: change the alter_version of the last entry under /clickhouse/tables/ceb_cluster-01/default/zkm_local/log/ to a fairly small value (ClickHouse will log a warning that the update cannot be applied); if unsure, set it to 0.
In my case there is only one entry: /clickhouse/tables/ceb_cluster-01/default/zkm_local/log/log-0000000000.
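Before editing it, it is worth dumping the entry to see where the alter_version field sits. The abridged output below is a sketch of what the entry looks like for my ADD INDEX alter; the exact layout depends on your ClickHouse version, and the 1 shown is what I believe the entry contained before I changed it (your value will differ):

[zk: localhost:2181(CONNECTED) 0] get /clickhouse/tables/ceb_cluster-01/default/zkm_local/log/log-0000000000
format version: 4
create_time: 2022-12-19 10:15:44
source replica: 02
block_id: 
alter
alter_version
1
have_mutation
0
...

The number on the line after alter_version is the one to lower (to 0 here); keep everything else unchanged when writing the node back.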
Then open another session and keep an eye on the error log, which by default is /var/log/clickhouse-server/clickhouse-server.err.log:
tail -200f /var/log/clickhouse-server/clickhouse-server.err.log
Then change the value of /clickhouse/tables/ceb_cluster-01/default/zkm_local/replicas/01/log_pointer to the index of the last node under /clickhouse/tables/ceb_cluster-01/default/zkm_local/log. In my case I set it to 0.
PS: log_pointer is the log pointer node; it records the index of the last executed log entry. For example, log_pointer: 4 corresponds to /log/log-0000000003 (counting from 0).
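From zkCli this is a single write (the path matches my test table; the value should be the index of the newest entry under /log, which here is 0):

[zk: localhost:2181(CONNECTED) 0] set /clickhouse/tables/ceb_cluster-01/default/zkm_local/replicas/01/log_pointer 0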
As soon as it is set to 0, the following appears in the log:
2022.12.19 10:19:29.341706 [ 27095 ] {} <Warning> default.zkm_local (fed70640-0e23-4b92-8b39-dcf6c54a9304): Attempt to update metadata of version 1 to older version 0 when processing log entry queue-0000000001: format version: 4
create_time: 2022-12-19 10:15:44
source replica: 02
block_id: 
alter
alter_version
0
have_mutation
0
columns_str_size:
164
columns format version: 1
2 columns:
`NAME` String DEFAULT \'NULL\' COMMENT \'名字\'
`CREATE_TIME` DateTime DEFAULT \'1971-01-01\' COMMENT \'数据建立时间\'
metadata_str_size:
300
metadata format version: 1
date column: 
sampling expression: 
index granularity: 8192
mode: 0
sign column: 
primary key: NAME
data format version: 1
partition key: toYYYYMMDD(CREATE_TIME)
sorting key: NAME, CREATE_TIME
indices: skipIdxNAME NAME TYPE minmax GRANULARITY 3
granularity bytes: 10485760
The key part:
Attempt to update metadata of version 1 to older version 0 when processing log entry queue-0000000001
older version 0 — this 0 is the alter_version of /clickhouse/tables/ceb_cluster-01/default/zkm_local/log/log-0000000000 that I just changed.
And version 1 is the "cached" replica metadata version.
This message means that going from the larger version (1) down to the smaller one (0) failed; I have verified that going up to a larger version succeeds without error — that test is shown below.
At this point we finally have the two key values:
the dataVersion of /clickhouse/tables/ceb_cluster-01/default/zkm_local/metadata: 2 (referred to as a below)
the "cached" replica metadata version: 1 (referred to as b below)
Following the theory from http://events.jianshu.io/p/5420fc6d3db2, let's first try the case where b is greater than a.
Repeat the steps above, but this time set the alter_version of /clickhouse/tables/ceb_cluster-01/default/zkm_local/log/log-0000000000 (the newest log entry) to 3:
then set log_pointer back to 0; this time the update succeeds.
Then try dropping the index again — still no luck:
CK01 :) alter table zkm_local drop index skipIdxNAME;

ALTER TABLE zkm_local DROP INDEX skipIdxNAME

Query id: accf434b-4dff-4c79-ab33-f7c48b6ac23b

0 rows in set. Elapsed: 0.015 sec.

Received exception from server (version 22.3.2):
Code: 517. DB::Exception: Received from localhost:9000. DB::Exception: Metadata on replica is not up to date with common metadata in Zookeeper. It means that this replica still not applied some of previous alters. Probably too many alters executing concurrently (highly not recommended). You can retry this error. (CANNOT_ASSIGN_ALTER)
So the only remaining option is to make them equal. Right now:
the dataVersion of /clickhouse/tables/ceb_cluster-01/default/zkm_local/metadata: 2 (a)
the "cached" replica metadata version: 3 (b)
Bump a by 1:
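As before, any successful write to the node increments its dataVersion, so writing back the unchanged content from zkCli (or one click in prettyZoo) takes a from 2 to 3, matching b:

[zk: localhost:2181(CONNECTED) 0] set /clickhouse/tables/ceb_cluster-01/default/zkm_local/metadata "<same content as before>"
[zk: localhost:2181(CONNECTED) 1] stat /clickhouse/tables/ceb_cluster-01/default/zkm_local/metadata
...
dataVersion = 3
...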
Drop the index again — this time it succeeds:
CK01 :) alter table zkm_local drop index skipIdxNAME;

ALTER TABLE zkm_local DROP INDEX skipIdxNAME

Query id: 46673dd0-376e-4d18-9695-cd9333645af0

Ok.

0 rows in set. Elapsed: 0.038 sec.
At the same time, the value of /clickhouse/tables/ceb_cluster-01/default/zkm_local/replicas/01/metadata_version is incremented from 3 to 4:
which now matches the dataVersion attribute of the /clickhouse/tables/ceb_cluster-01/default/zkm_local/metadata node, also 4.
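A quick check that the two values are back in sync (illustrative zkCli output, with the values from this test):

[zk: localhost:2181(CONNECTED) 0] get /clickhouse/tables/ceb_cluster-01/default/zkm_local/replicas/01/metadata_version
4
[zk: localhost:2181(CONNECTED) 1] stat /clickhouse/tables/ceb_cluster-01/default/zkm_local/metadata
...
dataVersion = 4
...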
Because replica 2 and replica 1 share the same zk_path in ZooKeeper (/clickhouse/tables/ceb_cluster-01/default/zkm_local), and /clickhouse/tables/ceb_cluster-01/default/zkm_local/metadata no longer contains the index definition,
the index is gone on replica 2 as well, even though the DROP INDEX was only run against replica 1's database — at least as far as the metadata shows:
CK02 :) show create table zkm_local;

SHOW CREATE TABLE zkm_local

Query id: f1db0b6c-97ed-485c-bbb1-892c9fa512e5

┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE default.zkm_local ( `NAME` String DEFAULT 'NULL' COMMENT '名字', `CREATE_TIME` DateTime DEFAULT '1971-01-01' COMMENT '数据建立时间' ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/ceb_cluster-{shard}/default/zkm_local', '{replica}') PARTITION BY toYYYYMMDD(CREATE_TIME) PRIMARY KEY NAME ORDER BY (NAME, CREATE_TIME) SETTINGS index_granularity = 8192 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

1 rows in set. Elapsed: 0.002 sec.
As for whether the skp_idx_[Column].idx and skp_idx_[Column].mrk files on disk have actually been removed, that remains to be investigated.
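One way to check would be to look for the skip-index files inside the table's part directories on each replica. With a default install the data lives under /var/lib/clickhouse; that path is an assumption, so adjust it to your storage configuration:

find /var/lib/clickhouse -name 'skp_idx_*' 2>/dev/null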
Source code: https://github.dev/ClickHouse/ClickHouse
Search for the string "Metadata on replica is not up to date with common metadata in Zookeeper" to locate the relevant code.
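With a local clone of the repository, a plain grep over the source tree gets you there (the file layout may differ between versions):

grep -rn "Metadata on replica is not up to date with common metadata in Zookeeper" src/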
That's all for now.