The ClickHouse remote table function caused a flood of error logs

The error messages look like this:

2022.09.27 10:44:34.372314 [ 18958 ] {} <Error> DNSResolver: Cannot resolve host (node20-west.contoso.com), error 0: node20-west.contoso.com.
2022.09.27 10:44:34.372607 [ 18958 ] {} <Error> bool DB::DNSResolver::updateCacheImpl(UpdateF &&, ElemsT &&, const DB::String &) [UpdateF = bool (DB::DNSResolver::*)(const std::string &), ElemsT = std::unordered_set<std::string> &]: Code: 198. DB::Exception: Not found address of host: node20-west.contoso.com. (DNS_ERROR), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xa4dde1a in /usr/bin/clickhouse
1. ? @ 0xa5c2162 in /usr/bin/clickhouse
2. ? @ 0xa5c2942 in /usr/bin/clickhouse
3. DB::DNSResolver::updateHost(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0xa5c6264 in /usr/bin/clickhouse
4. bool DB::DNSResolver::updateCacheImpl<bool (DB::DNSResolver::*)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&), std::__1::unordered_set<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >&>(bool (DB::DNSResolver::*&&)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&), std::__1::unordered_set<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0xa5c5d60 in /usr/bin/clickhouse
5. DB::DNSResolver::updateCache() @ 0xa5c56e4 in /usr/bin/clickhouse
6. DB::DNSCacheUpdater::run() @ 0x14137d8c in /usr/bin/clickhouse
7. DB::BackgroundSchedulePoolTaskInfo::execute() @ 0x13b2dbae in /usr/bin/clickhouse
8. DB::BackgroundSchedulePool::threadFunction() @ 0x13b30527 in /usr/bin/clickhouse
9. ? @ 0x13b31530 in /usr/bin/clickhouse
10. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xa584c97 in /usr/bin/clickhouse
11. ? @ 0xa58881d in /usr/bin/clickhouse
12. start_thread @ 0x7ea5 in /usr/lib64/libpthread-2.17.so
13. __clone @ 0xfe96d in /usr/lib64/libc-2.17.so
 (version 22.3.2.1)

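At this rate, half a day is enough to bloat the log file to an enormous size.

The trigger was a statement I ran while working out how to query system tables on remote servers: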

SELECT
    hostName() AS host,
    any(partition),
    count()
FROM remote('node{01..30}-west.contoso.com', system, parts)
GROUP BY host;

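The node{01..30}-west.contoso.com hosts do not exist, so DNS resolution fails every time and the server logs errors nonstop.

The pattern 01..30 expands to 30 hostnames, and each one fails repeatedly, so a large volume of log output builds up in a short time.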
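To see exactly which hostnames such a pattern produces, and to check them before passing the pattern to remote(), something like the following can help. It is only a rough sketch in Python, not ClickHouse's own expansion code: the helper names `expand_range_pattern` and `find_unresolvable` are made up for illustration, and it handles a single zero-padded numeric range only.

```python
import re
import socket

# Matches one numeric range such as {01..30}.
RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_range_pattern(pattern):
    """Expand a single {NN..MM} range into a list of names (hypothetical helper)."""
    match = RANGE.search(pattern)
    if match is None:
        return [pattern]
    first, last = match.group(1), match.group(2)
    width = len(first)  # keep the zero padding of the first bound, e.g. "01"
    return [
        pattern[:match.start()] + str(n).zfill(width) + pattern[match.end():]
        for n in range(int(first), int(last) + 1)
    ]


def find_unresolvable(hosts, resolve=socket.gethostbyname):
    """Return the hosts for which the resolver raises an error."""
    failed = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed


if __name__ == "__main__":
    hosts = expand_range_pattern("node{01..30}-west.contoso.com")
    print(len(hosts), "hostnames,", len(find_unresolvable(hosts)), "unresolvable")
```

If every name comes back as unresolvable, the pattern is probably wrong, and it is better to find that out before the server starts retrying the names in the background.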

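The DNS cache appears to be the cause. The stack trace shows DNSCacheUpdater::run() calling DNSResolver::updateCache() from a background schedule pool, which suggests that a periodic cache refresh keeps retrying these hostnames, but I have not confirmed the exact mechanism.

I searched online and found no effective fix.

There really is very little material about ClickHouse out there.

Later I searched the official documentation for the keyword DNS and found this: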

DROP DNS CACHE

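Resets ClickHouse's DNS cache. Sometimes (for old ClickHouse versions) this command is needed when the underlying infrastructure changes (the IP address of another ClickHouse server, or of a server used by dictionaries, is changed). For more automated cache management, see the disable_internal_dns_cache and dns_cache_update_period parameters.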

Since the error log also contains "DNSResolver::updateCacheImpl", which mentions both DNS and cache, I guessed that clearing the DNS cache with SYSTEM DROP DNS CACHE would be enough.

CK4:default@default> SYSTEM DROP DNS CACHE; 

SYSTEM DROP DNS CACHE

Query id: 19ca4bb3-c2ef-4509-8047-16979f68523a

Ok.

0 rows in set. Elapsed: 5.558 sec.

Checking the log again, no more related errors appeared.
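Rather than eyeballing the log, the check can be made a bit more concrete by counting the "Cannot resolve host" lines per hostname, optionally only those logged after the cache was dropped. The following is a minimal Python sketch that assumes the log line layout shown in the error output above; the function name `count_dns_failures` and the log path in the usage comment are assumptions, not something taken from ClickHouse.

```python
import re
from collections import Counter

# Timestamp (to the second), then the DNSResolver message with the host in parentheses.
DNS_ERROR = re.compile(
    r"^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2})\S* "
    r".*DNSResolver: Cannot resolve host \(([^)]+)\)"
)


def count_dns_failures(lines, since=""):
    """Count resolution failures per host, at or after `since` (hypothetical helper).

    `since` uses the log's own format, e.g. "2022.09.27 15:00:00"; timestamps in
    this format sort correctly as plain strings.
    """
    counts = Counter()
    for line in lines:
        match = DNS_ERROR.match(line)
        if match and match.group(1) >= since:
            counts[match.group(2)] += 1
    return counts


# Usage (the path is an assumption; adjust it to the server's logging config):
# with open("/var/log/clickhouse-server/clickhouse-server.err.log") as log:
#     print(count_dns_failures(log, since="2022.09.27 15:00:00"))
```

An empty result for the period after the SYSTEM DROP DNS CACHE command is consistent with the errors having stopped.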

Had I not found a fix, my fallback plan was to add temporary manual entries to /etc/hosts, like this:

127.0.0.1 node01-west.contoso.com

127.0.0.1 node02-west.contoso.com

...

127.0.0.1 node30-west.contoso.com

This points the otherwise unresolvable names at 127.0.0.1; the entries would be removed once the problem was dealt with.
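Typing those entries by hand is tedious, so they could be generated instead. Here is a small Python sketch, assuming the same 01..30 range as in the query; the function name `hosts_file_lines` and its parameters are made up for illustration. It only prints the lines and deliberately does not touch /etc/hosts itself.

```python
def hosts_file_lines(prefix, suffix, first, last, width=2, address="127.0.0.1"):
    """Build "address hostname" lines for a zero-padded numeric range (hypothetical helper)."""
    return [
        f"{address} {prefix}{n:0{width}d}{suffix}"
        for n in range(first, last + 1)
    ]


if __name__ == "__main__":
    # Assumption: the hosts are node01 .. node30, as in the remote() pattern.
    for line in hosts_file_lines("node", "-west.contoso.com", 1, 30):
        print(line)
```

The output can be reviewed and appended to /etc/hosts by hand, which also makes it easy to remove the same block of lines afterwards.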

posted @ 2022-09-27 15:05  PiscesCanon