ClickHouse remote table function floods the log with errors
The error messages look like this:
2022.09.27 10:44:34.372314 [ 18958 ] {} <Error> DNSResolver: Cannot resolve host (node20-west.contoso.com), error 0: node20-west.contoso.com.
2022.09.27 10:44:34.372607 [ 18958 ] {} <Error> bool DB::DNSResolver::updateCacheImpl(UpdateF &&, ElemsT &&, const DB::String &) [UpdateF = bool (DB::DNSResolver::*)(const std::string &), ElemsT = std::unordered_set<std::string> &]: Code: 198. DB::Exception: Not found address of host: node20-west.contoso.com. (DNS_ERROR), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xa4dde1a in /usr/bin/clickhouse
1. ? @ 0xa5c2162 in /usr/bin/clickhouse
2. ? @ 0xa5c2942 in /usr/bin/clickhouse
3. DB::DNSResolver::updateHost(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0xa5c6264 in /usr/bin/clickhouse
4. bool DB::DNSResolver::updateCacheImpl<bool (DB::DNSResolver::*)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&), std::__1::unordered_set<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >&>(bool (DB::DNSResolver::*&&)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&), std::__1::unordered_set<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0xa5c5d60 in /usr/bin/clickhouse
5. DB::DNSResolver::updateCache() @ 0xa5c56e4 in /usr/bin/clickhouse
6. DB::DNSCacheUpdater::run() @ 0x14137d8c in /usr/bin/clickhouse
7. DB::BackgroundSchedulePoolTaskInfo::execute() @ 0x13b2dbae in /usr/bin/clickhouse
8. DB::BackgroundSchedulePool::threadFunction() @ 0x13b30527 in /usr/bin/clickhouse
9. ? @ 0x13b31530 in /usr/bin/clickhouse
10. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xa584c97 in /usr/bin/clickhouse
11. ? @ 0xa58881d in /usr/bin/clickhouse
12. start_thread @ 0x7ea5 in /usr/lib64/libpthread-2.17.so
13. __clone @ 0xfe96d in /usr/lib64/libc-2.17.so
(version 22.3.2.1)
Within half a day, these errors can bloat the log file enormously.
The errors started while I was working out how to query system tables on remote servers. I had run the following statement:
SELECT hostName() AS host, any(partition), count() FROM remote('node{01..30}-west.contoso.com', system, parts) GROUP BY host;
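ClickHouse expands the {01..30} range in the remote() address argument into one address per number, keeping the leading zero, so this single query refers to 30 separate hosts. The minimal Python sketch below mimics that expansion for a single numeric {a..b} range. expand_numeric_range is a hypothetical helper written for illustration, not ClickHouse code, and it ignores the other pattern forms that remote() also accepts.

```python
import re

# Matches one numeric range such as "{01..30}".
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_numeric_range(pattern: str) -> list[str]:
    """Expand a single {a..b} numeric range in a host pattern.

    Hypothetical helper for illustration only: it handles exactly one
    numeric range and zero-pads numbers to the width of the lower bound,
    so that '{01..30}' yields '01', '02', ..., '30'.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # no range: the pattern names a single host
    start_text, end_text = match.group(1), match.group(2)
    width = len(start_text)
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(start_text), int(end_text) + 1)
    ]


hosts = expand_numeric_range("node{01..30}-west.contoso.com")
print(len(hosts), hosts[0], hosts[-1])
# → 30 node01-west.contoso.com node30-west.contoso.com
```

Every one of these 30 names is handed to the DNS resolver, which is why the error above shows up for node20 and its 29 siblings alike.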
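The hosts node{01..30}-west.contoso.com do not exist, so DNS resolution kept failing and the server logged errors nonstop.
The range 01 to 30 covers 30 hosts, and every one of them fails repeatedly, so a huge amount of log piles up in a short time.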
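A back-of-the-envelope estimate shows how fast this adds up. It rests on assumptions that were not measured here: that the background updater (DNSCacheUpdater::run in the stack trace) re-resolves every cached host once per dns_cache_update_period, whose documented default is 15 seconds, and that each failed host writes the two <Error> records shown in the log above. estimate_failed_lookups is a hypothetical helper.

```python
def estimate_failed_lookups(hosts: int, period_s: int, duration_s: int) -> int:
    """Failed DNS lookups if every cached host is retried once per period.

    Assumption: the updater runs every `period_s` seconds and each
    unresolvable host fails exactly once per run.
    """
    runs = duration_s // period_s
    return runs * hosts


HOSTS = 30               # node01..node30
PERIOD_S = 15            # assumed default dns_cache_update_period, in seconds
HALF_DAY_S = 12 * 3600   # "half a day"
RECORDS_PER_FAILURE = 2  # the two <Error> records per failure in the log above

failures = estimate_failed_lookups(HOSTS, PERIOD_S, HALF_DAY_S)
print(failures, failures * RECORDS_PER_FAILURE)  # → 86400 172800
```

With each record carrying a multi-kilobyte stack trace, roughly 170,000 records in half a day is consistent with the log growing as much as it did.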
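It looked like a DNS cache problem, though I did not know exactly how it worked.
Searching online turned up no effective fix.
There really is very little material about ClickHouse out there.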
Later I searched the official documentation for the keyword DNS and found this:
DROP DNS CACHE
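Resets ClickHouse's internal DNS cache. Sometimes (for old ClickHouse versions) it is necessary to use this command when the underlying infrastructure changes (the IP address of another ClickHouse server, or of the server used by dictionaries, changes). For more automatic cache management, see the disable_internal_dns_cache and dns_cache_update_period parameters.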
Since the error log also contains the keywords DNSResolver::updateCacheImpl, DNS, and cache, I guessed that clearing the DNS cache with SYSTEM DROP DNS CACHE would be enough.
CK4:default@default> SYSTEM DROP DNS CACHE;

SYSTEM DROP DNS CACHE

Query id: 19ca4bb3-c2ef-4509-8047-16979f68523a

Ok.

0 rows in set. Elapsed: 5.558 sec.
After that, the log showed no further related errors.
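The stack trace hints at why the errors kept coming: a background task (DB::DNSCacheUpdater::run) periodically calls DB::DNSResolver::updateCache, which walks a set of previously seen hosts and calls updateHost on each one. The toy Python model below is inferred only from those function names, not from ClickHouse source. In it, every host that was ever looked up stays in the set and fails again on every refresh until the set is cleared, which matches what SYSTEM DROP DNS CACHE appeared to do here. ToyDNSCache and fake_resolve are hypothetical names.

```python
from typing import Callable, Optional


class ToyDNSCache:
    """Toy model of a resolver that remembers every host it has looked up.

    Hypothetical: the method names loosely mirror DNSResolver::updateCache
    and dropCache-style behavior, but this is not ClickHouse's code.
    """

    def __init__(self, resolve: Callable[[str], Optional[str]]) -> None:
        self._resolve = resolve          # returns an IP, or None on failure
        self._hosts: set[str] = set()    # every host ever requested

    def lookup(self, host: str) -> Optional[str]:
        self._hosts.add(host)            # remembered even if resolution fails
        return self._resolve(host)

    def update_cache(self) -> list[str]:
        """Periodic refresh: re-resolve every remembered host, return failures."""
        return sorted(h for h in self._hosts if self._resolve(h) is None)

    def drop_cache(self) -> None:
        """Rough analogue of SYSTEM DROP DNS CACHE: forget all hosts."""
        self._hosts.clear()


def fake_resolve(host: str) -> Optional[str]:
    # Pretend only localhost resolves; node01..node30 never do.
    return "127.0.0.1" if host == "localhost" else None


cache = ToyDNSCache(fake_resolve)
for n in range(1, 31):
    cache.lookup(f"node{n:02d}-west.contoso.com")
cache.lookup("localhost")

print(len(cache.update_cache()))  # → 30 (all 30 fake hosts fail)
print(len(cache.update_cache()))  # → 30 (nothing is ever evicted)
cache.drop_cache()
print(len(cache.update_cache()))  # → 0 (nothing left to re-resolve)
```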
Had I not found a fix, my fallback plan was to temporarily add manual entries to /etc/hosts, like this:
127.0.0.1 node01-west.contoso.com
127.0.0.1 node02-west.contoso.com
...
127.0.0.1 node30-west.contoso.com
This points the otherwise unresolvable names at 127.0.0.1; the entries would be deleted again once the problem was dealt with.
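Typing 30 lines by hand is error-prone, so the entries could be generated instead. A minimal sketch, assuming the same node01..node30 naming as the query; hosts_entries and the marker comment are hypothetical, and the script only prints the lines without modifying /etc/hosts itself.

```python
def hosts_entries(first: int, last: int, ip: str = "127.0.0.1") -> list[str]:
    """Build temporary /etc/hosts lines for node<NN>-west.contoso.com."""
    return [f"{ip} node{n:02d}-west.contoso.com" for n in range(first, last + 1)]


# Hypothetical tag so the block is easy to find and delete later.
MARKER = "# temporary: silence ClickHouse DNS errors (remove later)"

print(MARKER)
print("\n".join(hosts_entries(1, 30)))
```

The output could then be appended with something like `python3 gen_hosts.py | sudo tee -a /etc/hosts` (gen_hosts.py being whatever the script is saved as), and the marked block removed once it is no longer needed.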