ClickHouse study notes
introduction
https://www.youtube.com/watch?v=fGG9dApIhDU
features at a glance
- shared nothing architecture
- column storage with vectorized query execution
- built-in sharding and replication
Further reading:
Replicas help with concurrency; shards add IOPS.
Tables are sharded across different nodes, and the data on each shard is replicated.
ZooKeeper is used to maintain the shared state and to handle leader election.
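As a rough illustration of the sharding and replication described above, the sketch below routes each row to a shard by hashing a key and then copies the row to every replica of that shard. The cluster layout, node names, `shard_for`, and `insert` are hypothetical names made up for this example; they are not ClickHouse APIs, and coordination through ZooKeeper is left out entirely.

```python
import zlib

# Hypothetical cluster layout (assumption): 2 shards, each with 2 replicas.
CLUSTER = {
    0: ["node-a1", "node-a2"],
    1: ["node-b1", "node-b2"],
}

def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard with a stable hash of the sharding key."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

def insert(storage: dict, key: str, row: dict) -> int:
    """Write the row to every replica of the shard that owns the key."""
    shard = shard_for(key, len(CLUSTER))
    for replica in CLUSTER[shard]:
        storage.setdefault(replica, []).append(row)
    return shard

# In-memory stand-in for the data held by each node.
storage = {}
for user in ["alice", "bob", "carol", "dave"]:
    insert(storage, user, {"user": user})
```

Each row lands on exactly one shard (more shards means more disks sharing the I/O), while both replicas of that shard hold the same rows (more replicas means more nodes able to answer the same query).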
clickhouse code is optimized for speed
bottom-up design: algorithms determine interface
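ClickHouse's design is unusual: the interface definitions are determined by how the algorithms are implemented, rather than by usage (or usage habits), as is more common.
specialized algorithms for common operations, selected by:
The following four factors determine which algorithm is used to execute a given operation.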
- Data type: 14 GROUP BY algorithms
- Data size: whether the data fits in memory
- Ordering: whether the data is already [partly] sorted or not
- Data distribution: e.g. using multi-armed bandits to optimize LZ4 decompression
Further reading:
Introduction to Multi-Armed Bandits [PDF download]
Multi-armed bandits are a simple but very powerful framework for algorithms that make decisions over time under uncertainty.
LZ4 (an extremely fast compression/decompression algorithm, but with a relatively poor compression ratio)
LZ4 is a lossless compression algorithm, providing compression speed > 500 MB/s per core, scalable with multi-core CPUs.
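To make the multi-armed bandit idea concrete, here is a minimal sketch in which a bandit learns which of several decompression variants is cheapest on the data it actually sees. It uses epsilon-greedy, a simple textbook strategy chosen for brevity; it is not claimed to be the strategy ClickHouse uses. The three variants and their costs in `TRUE_COST` are simulated, hypothetical values rather than measurements.

```python
import random

class EpsilonGreedy:
    """Epsilon-greedy bandit that minimizes the average observed cost."""

    def __init__(self, n_arms: int, epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.mean_cost = [0.0] * n_arms
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Try every arm once before exploiting.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore
        # Exploit: pick the arm with the lowest mean cost so far.
        return min(range(len(self.counts)), key=lambda a: self.mean_cost[a])

    def update(self, arm: int, cost: float) -> None:
        self.counts[arm] += 1
        # Incremental running mean of the observed cost.
        self.mean_cost[arm] += (cost - self.mean_cost[arm]) / self.counts[arm]

# Simulated per-block cost of three hypothetical decompression variants.
TRUE_COST = [1.0, 0.6, 0.8]

def simulate(rounds: int = 2000, seed: int = 0) -> EpsilonGreedy:
    bandit = EpsilonGreedy(len(TRUE_COST), epsilon=0.1, seed=seed)
    noise = random.Random(seed + 1)
    for _ in range(rounds):
        arm = bandit.select()
        # A real system would time the chosen variant here instead.
        cost = TRUE_COST[arm] + noise.uniform(-0.05, 0.05)
        bandit.update(arm, cost)
    return bandit

bandit = simulate()
best = max(range(len(TRUE_COST)), key=lambda a: bandit.counts[a])
```

After a short exploration phase, most blocks are handled by the variant with the lowest observed cost, while the occasional random choice keeps the estimates for the other variants up to date.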
vectorized query execution
- SIMD (SSE 4.2+)
- efficient dispatch on all available cores
Further reading:
CMU course: Vectorized Query Execution
Vectorized query execution batches multiple rows together in a columnar format, and each operator uses simple loops to iterate over the data within a batch. This greatly reduces CPU usage for reading, writing, and query operations such as scanning and filtering.
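The batch-at-a-time control flow described above can be sketched as follows. This only illustrates how operators pass batches of column values to each other and loop over them; plain Python gets none of the SIMD benefit, and the operator names and `BATCH_SIZE` are assumptions made for this example.

```python
from typing import Iterable, Iterator, List

BATCH_SIZE = 4  # illustrative only; real engines use much larger batches

def scan(column: List[int], batch_size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Scan operator: yield the column in fixed-size batches."""
    for start in range(0, len(column), batch_size):
        yield column[start:start + batch_size]

def filter_greater_than(batches: Iterable[List[int]], threshold: int) -> Iterator[List[int]]:
    """Filter operator: one simple loop over each batch."""
    for batch in batches:
        yield [value for value in batch if value > threshold]

def sum_operator(batches: Iterable[List[int]]) -> int:
    """Aggregate operator: consume batches and keep a running total."""
    total = 0
    for batch in batches:
        total += sum(batch)
    return total

prices = [5, 12, 7, 30, 18, 2, 25, 9, 14]
# Roughly: SELECT sum(price) FROM t WHERE price > 10
result = sum_operator(filter_greater_than(scan(prices), 10))
```

The per-operator call overhead is paid once per batch rather than once per row, and the tight inner loops over a single column are what a compiled engine can turn into SIMD instructions.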
how do distributed queries work?
The application connects to one ClickHouse node. This initiator node dispatches subqueries to the other nodes, and each of them computes its aggregate states locally. The partial states are then merged into the final aggregation on the initiator node, and the result is returned to the application.
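The flow above can be sketched for an average grouped by key. Each shard returns a partial state `(sum, count)` instead of a finished average, because averages of averages would be wrong; the initiator merges the states and only then finalizes. The data in `SHARDS` and the function names are hypothetical, and the network layer is replaced by plain function calls.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical data: each shard holds its own (key, value) rows locally.
SHARDS: List[List[Tuple[str, float]]] = [
    [("a", 1.0), ("b", 4.0)],
    [("a", 3.0)],
    [("b", 6.0), ("a", 5.0)],
]

State = Tuple[float, int]  # partial aggregate state for avg: (sum, count)

def local_aggregate(rows: Iterable[Tuple[str, float]]) -> Dict[str, State]:
    """Runs on each shard: build partial states from local rows only."""
    states: Dict[str, State] = {}
    for key, value in rows:
        s, c = states.get(key, (0.0, 0))
        states[key] = (s + value, c + 1)
    return states

def merge(partials: Iterable[Dict[str, State]]) -> Dict[str, State]:
    """Runs on the initiator: combine the partial states per key."""
    merged: Dict[str, State] = {}
    for states in partials:
        for key, (s, c) in states.items():
            ms, mc = merged.get(key, (0.0, 0))
            merged[key] = (ms + s, mc + c)
    return merged

def finalize(states: Dict[str, State]) -> Dict[str, float]:
    """Runs on the initiator: turn merged states into final averages."""
    return {key: s / c for key, (s, c) in states.items()}

# Roughly: SELECT key, avg(value) FROM distributed_table GROUP BY key
partials = [local_aggregate(rows) for rows in SHARDS]
result = finalize(merge(partials))
```

Only the small per-key states cross the network, not the raw rows, which is why the heavy part of the aggregation scales with the number of shards.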
Other
- https://en.wikipedia.org/wiki/Materialized_view
- Vectorization vs. Compilation in Query Execution (paper)
- TPC-DS
TPC-DS is an enterprise-class benchmark, published and maintained by the Transaction Processing Performance Council (TPC), to measure the performance of decision support systems running on SQL-based big data systems.
- ClickHouse SQL syntax https://clickhouse.tech/docs/zh/sql-reference/syntax/
- Architecture overview https://clickhouse.tech/docs/zh/development/architecture/