Mysterious startup failure - metadata mismatch - cleaning metadata - rebuilding the table - resyncing data
Background
On November 27, 2024, for no obvious reason, node 5 of our ClickHouse cluster (6 nodes, 3 shards) refused to start. What followed was a full cycle of: find the cause → delete the ClickHouse metadata → clean up the ZooKeeper data → recreate the table in ClickHouse → resync the replica data.
Finding out why it would not start
[root@clickhouse5 ~]# tail -f /var/log/clickhouse-server/clickhouse-server.err.log
2024.11.27 19:57:06.536396 [ 2312 ] {} <Error> Application: DB::Exception: Existing table metadata in ZooKeeper differs in TTL. Stored in ZooKeeper: acquisition_time + toIntervalDay(30) TO VOLUME 'test', local: : Cannot attach table `default`.`monitordata` from metadata file /var/lib/clickhouse/metadata/default/monitordata.sql from query ATTACH TABLE default.monitordata (`id` UInt64, `device_code` Nullable(String), `device_type` Nullable(String), `create_time` DateTime DEFAULT now(), `acquisition_time` DateTime, `phase` Nullable(String), `is_alarm` Nullable(String), `monitor_data` Nullable(String), `linked_device` Nullable(String), `asset_code` Nullable(String), `device_no` Nullable(String), `sink_node` Nullable(String), `access_node` Nullable(String), `manufacturer_code` Nullable(String), `frame_sn` Nullable(String), `source` Nullable(String)) ENGINE = ReplicatedMergeTree('/clickhouse/tables/03/monitordata/', '192.168.25.15') PARTITION BY toYYYYMM(acquisition_time) ORDER BY acquisition_time SETTINGS index_granularity = 8192: while loading database `default` from path /var/lib/clickhouse/metadata/default
2024.11.27 19:57:18.690060 [ 2379 ] {} <Error> Application: Caught exception while loading metadata: Code: 342, e.displayText() = DB::Exception: Existing table metadata in ZooKeeper differs in TTL. Stored in ZooKeeper: acquisition_time + toIntervalDay(30) TO VOLUME 'test', local: : Cannot attach table `default`.`monitordata` from metadata file /var/lib/clickhouse/metadata/default/monitordata.sql from query ATTACH TABLE default.monitordata (`id` UInt64, `device_code` Nullable(String), `device_type` Nullable(String), `create_time` DateTime DEFAULT now(), `acquisition_time` DateTime, `phase` Nullable(String), `is_alarm` Nullable(String), `monitor_data` Nullable(String), `linked_device` Nullable(String), `asset_code` Nullable(String), `device_no` Nullable(String), `sink_node` Nullable(String), `access_node` Nullable(String), `manufacturer_code` Nullable(String), `frame_sn` Nullable(String), `source` Nullable(String)) ENGINE = ReplicatedMergeTree('/clickhouse/tables/03/monitordata/', '192.168.25.15') PARTITION BY toYYYYMM(acquisition_time) ORDER BY acquisition_time SETTINGS index_granularity = 8192: while loading database `default` from path /var/lib/clickhouse/metadata/default, Stack trace (when copying this message, always include the lines below):
0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x18e1b360 in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0xe736dad in /usr/bin/clickhouse
2. ? @ 0x1641f55c in /usr/bin/clickhouse
3. DB::StorageReplicatedMergeTree::checkTableStructure(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&) @ 0x1605a2ae in /usr/bin/clickhouse
4. DB::StorageReplicatedMergeTree::StorageReplicatedMergeTree(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, DB::StorageID const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::StorageInMemoryMetadata const&, DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::MergeTreeData::MergingParams const&, std::__1::unique_ptr<DB::MergeTreeSettings, std::__1::default_delete<DB::MergeTreeSettings> >, bool) @ 0x16081135 in /usr/bin/clickhouse
5. ? @ 0x1642a954 in /usr/bin/clickhouse
6. std::__1::__function::__func<std::__1::shared_ptr<DB::IStorage> (*)(DB::StorageFactory::Arguments const&), std::__1::allocator<std::__1::shared_ptr<DB::IStorage> (*)(DB::StorageFactory::Arguments const&)>, std::__1::shared_ptr<DB::IStorage> (DB::StorageFactory::Arguments const&)>::operator()(DB::StorageFactory::Arguments const&) @ 0x1642ecb7 in /usr/bin/clickhouse
7. DB::StorageFactory::get(DB::ASTCreateQuery const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, DB::Context&, DB::ColumnsDescription const&, DB::ConstraintsDescription const&, bool) const @ 0x15faad11 in /usr/bin/clickhouse
8. DB::createTableFromAST(DB::ASTCreateQuery, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool) @ 0x158e78f9 in /usr/bin/clickhouse
9. ? @ 0x158db401 in /usr/bin/clickhouse
10. ? @ 0x158dbda1 in /usr/bin/clickhouse
11. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0xe7650a7 in /usr/bin/clickhouse
12. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()&&...)::'lambda'()::operator()() const @ 0xe76581a in /usr/bin/clickhouse
13. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xe7645b7 in /usr/bin/clickhouse
14. ? @ 0xe762b03 in /usr/bin/clickhouse
15. start_thread @ 0x7e65 in /usr/lib64/libpthread-2.17.so
16. clone @ 0xfe88d in /usr/lib64/libc-2.17.so
(version 20.9.3.45 (official build))
The key line is this one:
DB::Exception: Existing table metadata in ZooKeeper differs in TTL. Stored in ZooKeeper: acquisition_time + toIntervalDay(30) TO VOLUME 'test', local:
This means the table's TTL stored in ZooKeeper does not match the definition in the local metadata file. We can either clean up the metadata in ZooKeeper or delete ClickHouse's local metadata; since only this one node fails to start, the ZooKeeper side is presumably the correct copy.
Checking this node's configuration file, I found that someone had changed it, so it no longer matched the other nodes. Fixing the config alone did not help, so I went ahead and deleted ClickHouse's local metadata for the table:
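The failing check compares the table definition serialized in ZooKeeper against the local metadata file. As an illustration only (the `ttl:` field name and the parsing below are made up, not ClickHouse's real serialization format), the comparison boils down to something like:

```python
def extract_ttl(metadata_text: str) -> str:
    """Pull the TTL expression out of a serialized table definition.
    The 'ttl:' prefix is a made-up stand-in for ClickHouse's real format."""
    for line in metadata_text.splitlines():
        if line.startswith("ttl:"):
            return line[len("ttl:"):].strip()
    return ""

def check_table_structure(zk_metadata: str, local_metadata: str) -> None:
    """Sketch of the spirit of checkTableStructure: refuse to attach the
    table when the TTL stored in ZooKeeper differs from the local one."""
    zk_ttl = extract_ttl(zk_metadata)
    local_ttl = extract_ttl(local_metadata)
    if zk_ttl != local_ttl:
        raise ValueError(
            "Existing table metadata in ZooKeeper differs in TTL. "
            f"Stored in ZooKeeper: {zk_ttl}, local: {local_ttl}"
        )

# ZooKeeper still carries the TTL clause; the local .sql file has none
zk_meta = "ttl: acquisition_time + toIntervalDay(30) TO VOLUME 'test'"
local_meta = ""
```

The log message ends with `local: :` precisely because the local definition's TTL was empty.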
sudo systemctl stop clickhouse-server
sudo rm /var/lib/clickhouse/metadata/default/monitordata.sql
sudo systemctl start clickhouse-server
After that the server started, but of course the local table monitordata was now missing.
Recreating the monitordata table
I assumed simply running the CREATE statement would be enough, but it failed:
CREATE TABLE default.monitordata
(
`id` UInt64,
`device_code` Nullable(String),
`device_type` Nullable(String),
`create_time` DateTime DEFAULT now(),
`acquisition_time` DateTime,
`phase` Nullable(String),
`is_alarm` Nullable(String),
`monitor_data` Nullable(String),
`linked_device` Nullable(String),
`asset_code` Nullable(String),
`device_no` Nullable(String),
`sink_node` Nullable(String),
`access_node` Nullable(String),
`manufacturer_code` Nullable(String),
`frame_sn` Nullable(String),
`source` Nullable(String)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/03/monitordata/', '192.168.25.15')
PARTITION BY toYYYYMM(acquisition_time)
ORDER BY acquisition_time
TTL acquisition_time + toIntervalDay(30) TO VOLUME 'test'
SETTINGS index_granularity = 8192,
storage_policy = 'ttl';
SQL 错误 [57] [07000]: Code: 57, e.displayText() = DB::Exception: Directory for table data data/default/monitordata/ already exists (version 20.9.3.45 (official build)), server ClickHouseNode [uri=http://192.168.25.15:8123/default, options={use_server_time_zone=false,use_time_zone=false}]@2035251460
This suggested stale state was left behind: the replica registration for this node still existed in ZooKeeper (and, as it turned out later, the on-disk data directory as well). The important constraint: in ZooKeeper I may only delete this node's own replica entry, never the other replica's.
Deleting this node's data from ZooKeeper
Start the interactive CLI:
[root@clickhouse5 monitordata]# cd /zookeeper/bin/
[root@clickhouse5 bin]# ./zkCli.sh -server 192.168.25.15:2181
[zk: 192.168.25.15:2181(CONNECTED) 5] ls /clickhouse/tables/03/monitordata
[block_numbers, blocks, columns, leader_election, log, metadata, mutations, nonincrement_block_numbers, quorum, replicas, temp]
[zk: 192.168.25.15:2181(CONNECTED) 6] ls /clickhouse/tables/03/monitordata/replicas
[192.168.25.15, 192.168.25.16]
Note that there are two znodes under this path; 192.168.25.15 is this node, and only its subtree should be deleted. The annoying part: the `delete` command in this CLI does not delete recursively, so I had to remove the children one by one, and this znode has quite a few of them (ZooKeeper 3.5+ also ships a `deleteall` command for exactly this, and older releases had `rmr`):
[zk: 192.168.25.15:2181(CONNECTED) 22] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15
[columns, flags, host, is_lost, log_pointer, max_processed_insert_time, metadata, metadata_version, min_unprocessed_insert_time, mutation_pointer, parts, queue]
[zk: 192.168.25.15:2181(CONNECTED) 28] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15/columns
[]
[zk: 192.168.25.15:2181(CONNECTED) 29] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/columns
[zk: 192.168.25.15:2181(CONNECTED) 30] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15/flags
[]
[zk: 192.168.25.15:2181(CONNECTED) 31] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/flags
[zk: 192.168.25.15:2181(CONNECTED) 32] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15/host
[]
[zk: 192.168.25.15:2181(CONNECTED) 33] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/host
[zk: 192.168.25.15:2181(CONNECTED) 34] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15/is_lost
[]
[zk: 192.168.25.15:2181(CONNECTED) 35] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/is_lost
[zk: 192.168.25.15:2181(CONNECTED) 36] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15/log_pointer
[]
[zk: 192.168.25.15:2181(CONNECTED) 37] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/log_pointer
[zk: 192.168.25.15:2181(CONNECTED) 38] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15
[max_processed_insert_time, metadata, metadata_version, min_unprocessed_insert_time, mutation_pointer, parts, queue]
[zk: 192.168.25.15:2181(CONNECTED) 39] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15/m
max_processed_insert_time metadata metadata_version min_unprocessed_insert_time mutation_pointer
[zk: 192.168.25.15:2181(CONNECTED) 39] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15/max_processed_insert_time
[]
[zk: 192.168.25.15:2181(CONNECTED) 40] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/max_processed_insert_time
[zk: 192.168.25.15:2181(CONNECTED) 41] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15/metadata
metadata metadata_version
[zk: 192.168.25.15:2181(CONNECTED) 41] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15/metadata
[]
[zk: 192.168.25.15:2181(CONNECTED) 42] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/metadata
[zk: 192.168.25.15:2181(CONNECTED) 43] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/metadata_version
[zk: 192.168.25.15:2181(CONNECTED) 44] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15
[min_unprocessed_insert_time, mutation_pointer, parts, queue]
[zk: 192.168.25.15:2181(CONNECTED) 45] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/min_unprocessed_insert_time
[zk: 192.168.25.15:2181(CONNECTED) 46] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/mutation_pointer
[zk: 192.168.25.15:2181(CONNECTED) 47] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts
Node not empty: /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts
[zk: 192.168.25.15:2181(CONNECTED) 48] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts
[202405_0_461_3_1952, 202405_1279_1591_4, 202405_1592_1891_4, 202405_1892_1909_2, 202405_1892_1942_3_1952, 202405_1910_1926_2, 202405_1927_1942_2, 202405_462_1891_5_1952, 202405_462_608_3, 202405_609_948_4, 202405_949_1278_4]
[zk: 192.168.25.15:2181(CONNECTED) 49] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_
202405_0_461_3_1952 202405_1279_1591_4 202405_1592_1891_4 202405_1892_1909_2 202405_1892_1942_3_1952 202405_1910_1926_2 202405_1927_1942_2 202405_462_1891_5_1952 202405_462_608_3 202405_609_948_4
202405_949_1278_4
[zk: 192.168.25.15:2181(CONNECTED) 49] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_0_461_3_1952
[]
[zk: 192.168.25.15:2181(CONNECTED) 50] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_0_461_3_1952
[zk: 192.168.25.15:2181(CONNECTED) 51] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_
202405_1279_1591_4 202405_1592_1891_4 202405_1892_1909_2 202405_1892_1942_3_1952 202405_1910_1926_2 202405_1927_1942_2 202405_462_1891_5_1952 202405_462_608_3 202405_609_948_4 202405_949_1278_4
[zk: 192.168.25.15:2181(CONNECTED) 51] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_1279_1591_4
[zk: 192.168.25.15:2181(CONNECTED) 52] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_
202405_1592_1891_4 202405_1892_1909_2 202405_1892_1942_3_1952 202405_1910_1926_2 202405_1927_1942_2 202405_462_1891_5_1952 202405_462_608_3 202405_609_948_4 202405_949_1278_4
[zk: 192.168.25.15:2181(CONNECTED) 52] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_1592_1891_4
[zk: 192.168.25.15:2181(CONNECTED) 53] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_1
202405_1892_1909_2 202405_1892_1942_3_1952 202405_1910_1926_2 202405_1927_1942_2
[zk: 192.168.25.15:2181(CONNECTED) 53] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_1892_19
202405_1892_1909_2 202405_1892_1942_3_1952
[zk: 192.168.25.15:2181(CONNECTED) 53] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_1892_19
Node does not exist: /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_1892_19
[zk: 192.168.25.15:2181(CONNECTED) 54] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_1892_19
202405_1892_1909_2 202405_1892_1942_3_1952
[zk: 192.168.25.15:2181(CONNECTED) 54] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_1892_1909_2
[zk: 192.168.25.15:2181(CONNECTED) 55] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_1892_1909_2
addWatch addauth close config connect create delete deleteall delquota get getAcl getAllChildrenNumber
getEphemerals history listquota ls printwatches quit reconfig redo removewatches set setAcl setquota
stat sync version
[zk: 192.168.25.15:2181(CONNECTED) 55] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_1892_1942_3_1952
[zk: 192.168.25.15:2181(CONNECTED) 56] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_
202405_1910_1926_2 202405_1927_1942_2 202405_462_1891_5_1952 202405_462_608_3 202405_609_948_4 202405_949_1278_4
[zk: 192.168.25.15:2181(CONNECTED) 56] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_19
202405_1910_1926_2 202405_1927_1942_2
[zk: 192.168.25.15:2181(CONNECTED) 56] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_1910_1926_2
[zk: 192.168.25.15:2181(CONNECTED) 57] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_1927_1942_2
[zk: 192.168.25.15:2181(CONNECTED) 58] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_
202405_462_1891_5_1952 202405_462_608_3 202405_609_948_4 202405_949_1278_4
[zk: 192.168.25.15:2181(CONNECTED) 58] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_462_
202405_462_1891_5_1952 202405_462_608_3
[zk: 192.168.25.15:2181(CONNECTED) 58] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_462_1891_5_1952
[zk: 192.168.25.15:2181(CONNECTED) 59] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_462_608_3
addWatch addauth close config connect create delete deleteall delquota get getAcl getAllChildrenNumber
getEphemerals history listquota ls printwatches quit reconfig redo removewatches set setAcl setquota
stat sync version
[zk: 192.168.25.15:2181(CONNECTED) 59] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_462_608_3
[zk: 192.168.25.15:2181(CONNECTED) 60] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_
202405_609_948_4 202405_949_1278_4
[zk: 192.168.25.15:2181(CONNECTED) 60] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_609_948_4
[zk: 192.168.25.15:2181(CONNECTED) 61] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts/202405_949_1278_4
[zk: 192.168.25.15:2181(CONNECTED) 62] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/parts
[zk: 192.168.25.15:2181(CONNECTED) 63] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/queue
addWatch addauth close config connect create delete deleteall delquota get getAcl getAllChildrenNumber
getEphemerals history listquota ls printwatches quit reconfig redo removewatches set setAcl setquota
stat sync version
[zk: 192.168.25.15:2181(CONNECTED) 63] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15/queue
[zk: 192.168.25.15:2181(CONNECTED) 64] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15/
Path must not end with / character
[zk: 192.168.25.15:2181(CONNECTED) 65] ls /clickhouse/tables/03/monitordata/replicas/192.168.25.15
[]
[zk: 192.168.25.15:2181(CONNECTED) 66] delete /clickhouse/tables/03/monitordata/replicas/192.168.25.15
[zk: 192.168.25.15:2181(CONNECTED) 67] ls /clickhouse/tables/03/monitordata/replicas
[192.168.25.16]
[zk: 192.168.25.15:2181(CONNECTED) 68]
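What the session above does by hand is a post-order traversal: plain `delete` refuses non-empty znodes (hence the `Node not empty` error on `parts`), so every child must go before its parent. That is exactly the logic `deleteall` implements; a minimal sketch of it against an in-memory dict standing in for the znode tree (no live ZooKeeper involved):

```python
def delete_recursive(tree: dict, path: str) -> None:
    """Post-order delete: remove all children of `path` first, then the node
    itself, mirroring what zkCli's `deleteall` does against a real ensemble."""
    parts = [p for p in path.split("/") if p]
    # walk down to the parent of the target node
    parent = tree
    for p in parts[:-1]:
        parent = parent[p]
    node = parent[parts[-1]]

    def wipe(n: dict) -> None:
        # depth-first: children go before their parent
        for child in list(n):
            wipe(n[child])
            del n[child]

    wipe(node)
    del parent[parts[-1]]

# toy znode tree shaped like the replica path in this post
znodes = {"clickhouse": {"tables": {"03": {"monitordata": {"replicas": {
    "192.168.25.15": {"parts": {"202405_0_461_3_1952": {}}, "queue": {}},
    "192.168.25.16": {}}}}}}}
delete_recursive(znodes,
                 "/clickhouse/tables/03/monitordata/replicas/192.168.25.15")
```

After the call, only the 192.168.25.16 replica entry remains, which matches the final `ls /clickhouse/tables/03/monitordata/replicas` output above.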
Once the deletion was done, I exited the CLI and went to create the table:
Re-creating the table fails again - data directory not deleted
SQL 错误 [57] [07000]: Code: 57, e.displayText() = DB::Exception: Directory for table data data/default/monitordata/ already exists (version 20.9.3.45 (official build)), server ClickHouseNode [uri=http://192.168.25.15:8123/default, options={use_server_time_zone=false,use_time_zone=false}]@2035251460
This tells us the monitordata directory under the data path was never removed, so I immediately located and deleted it:
rm /var/lib/clickhouse/data/default/monitordata -rf
Table created successfully
This time it finally worked: the table was created.
Resyncing the data
Honestly I am not sure the manual sync was necessary: after syncing by hand I found duplicate rows, and a ReplicatedMergeTree replica catches up from the other replica in its shard on its own (you can wait for that to finish with SYSTEM SYNC REPLICA).
The commands below are just for the record:
-- On node 16, list the partitions that exist
SELECT DISTINCT partition FROM system.parts WHERE table = 'monitordata';
-- On node 15, fetch a partition from the shard's ZooKeeper path
-- (FETCH PARTITION downloads into detached/; it still needs ALTER TABLE ... ATTACH PARTITION afterwards)
ALTER TABLE default.monitordata FETCH PARTITION '202405' FROM '/clickhouse/tables/03/monitordata/';
-- On node 15, copy node 16's data over; drop the user/password arguments if none are set
INSERT INTO default.monitordata SELECT * FROM remote('192.168.25.16', 'default.monitordata','default','Cmcc_chengdu2o23');
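On the duplicates: once the replica entry was recreated, automatic replication already copied the missing parts verbatim from 192.168.25.16; the manual `INSERT ... remote(...)` then wrote the same rows again, packaged as brand-new parts that replication cannot recognize as copies of the originals. A toy model of the effect (hypothetical part names and row counts):

```python
# Rows per part on the healthy replica (hypothetical names and counts)
replica16 = {"202405_0_461_3": 1000, "202405_462_608_3": 500}

replica15 = {}
# 1) automatic replication: parts are fetched verbatim, identity preserved
replica15.update(replica16)
# 2) manual INSERT ... SELECT from remote(): the same rows arrive again,
#    but as new parts that share nothing with the originals, so nothing
#    in the replication machinery deduplicates them
replica15["202405_all_manual"] = sum(replica16.values())

rows_expected = sum(replica16.values())  # what the replica should hold
rows_actual = sum(replica15.values())    # every row now exists twice
```

When in doubt, let replication catch up on its own, then verify with SYSTEM SYNC REPLICA and a row-count comparison between the two replicas.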