Apache Druid Notes

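1. Historical node query performance is proportional to the ratio of memory to the data it serves. The more memory a Historical has, the more of its segments stay in memory, and the fewer disk reads a query needs.

In short, the more memory a Historical has and the less total data it serves, the faster its queries run.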
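
As a rough back-of-the-envelope check of item 1, the sketch below estimates what fraction of a Historical's served segment data can stay in memory. It makes a hypothetical simplification: the memory left for the OS page cache, which holds the memory-mapped segment files, is total RAM minus JVM heap minus direct memory, and the OS and other processes are ignored. The function name and its inputs are illustrative only.

```python
def page_cache_ratio(total_ram_gb: float, heap_gb: float,
                     direct_memory_gb: float, segment_data_gb: float) -> float:
    """Hypothetical estimate of the fraction of served segment data that fits in memory.

    Assumes all RAM not used by the JVM heap or direct memory is available
    to the OS page cache, which holds the memory-mapped segment files.
    """
    if segment_data_gb <= 0:
        raise ValueError("segment_data_gb must be positive")
    page_cache_gb = total_ram_gb - heap_gb - direct_memory_gb
    if page_cache_gb <= 0:
        return 0.0
    # Cap at 1.0: once everything fits, more memory no longer saves disk reads
    return min(1.0, page_cache_gb / segment_data_gb)


# 64 GB node, 8 GB heap, 8 GB direct memory, serving 96 GB of segments
print(page_cache_ratio(64, 8, 8, 96))  # 0.5
```

Doubling the RAM or halving the data served moves the ratio toward 1.0, which is the effect item 1 describes.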

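2. Caching can use an external cache, such as memcached, or an internal one. The cache can be enabled on the query nodes (Brokers) or on the Historical nodes.

If the Broker cache is enabled, a query first checks the Broker's cache. Only on a cache miss does it go to the Historical and real-time nodes.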
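
The Broker-side lookup order described above is a cache-aside pattern. Below is a simplified, hypothetical model: the class and the `query_historicals` callback are illustrative and are not Druid APIs. Results are cached per (segment, query) key, and the Historicals are contacted only on a miss. In a real deployment this behavior is switched on with runtime properties such as `druid.broker.cache.useCache` and `druid.broker.cache.populateCache`, and `druid.cache.type` selects the cache implementation.

```python
class CachingBroker:
    """Hypothetical model of a Broker with a per-segment result cache."""

    def __init__(self, query_historicals):
        # query_historicals: callable (segment_id, query) -> result
        self._query_historicals = query_historicals
        self._cache = {}
        self.misses = 0

    def query_segment(self, segment_id, query):
        key = (segment_id, query)
        if key in self._cache:          # hit: the Historicals are not contacted
            return self._cache[key]
        self.misses += 1                # miss: fall through to the Historicals
        result = self._query_historicals(segment_id, query)
        self._cache[key] = result       # populate so the next identical query hits
        return result


calls = []

def fake_historicals(segment_id, query):
    calls.append(segment_id)
    return f"{query} on {segment_id}"

broker = CachingBroker(fake_historicals)
broker.query_segment("seg-1", "COUNT(*)")
broker.query_segment("seg-1", "COUNT(*)")   # answered from the cache
print(broker.misses, calls)  # 1 ['seg-1']
```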

3. A Druid cluster generally needs only one query node (Broker). For high availability, add a second Broker.
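
With two Brokers, something has to route around a failed one. Druid's Router process can sit in front of the Brokers for this, or a client can fail over on its own. The sketch below shows the client-side option. The Broker URLs are hypothetical, and a `send` callback stands in for the real HTTP call.

```python
def query_with_failover(brokers, send):
    """Try each Broker in order and return the first successful response.

    `send` is a hypothetical callable (url -> response) that raises
    ConnectionError when a Broker is unreachable.
    """
    errors = []
    for url in brokers:
        try:
            return send(url)
        except ConnectionError as exc:
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all brokers failed: " + "; ".join(errors))


BROKERS = ["http://broker-1.example.com:8082", "http://broker-2.example.com:8082"]

def fake_send(url):
    if "broker-1" in url:
        raise ConnectionError("connection refused")
    return f"ok from {url}"

print(query_with_failover(BROKERS, fake_send))  # ok from http://broker-2.example.com:8082
```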

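4. Replicas prevent data from becoming briefly unavailable when a Historical node fails.

The Coordinator reassigns the failed node's segments to other nodes, but only after a delay. Running multiple Coordinators keeps the Coordinator itself highly available. One Coordinator is elected leader, and the others take over if it fails.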
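
To see why replicas help, here is a toy placement model. The functions are hypothetical and are not Coordinator code. Each segment is assigned round-robin to `replicas` distinct Historicals, and the model lists which segments lose every copy when one node fails. In Druid itself, the replica count is configured per tier in load rules, e.g. `"tieredReplicants": {"_default_tier": 2}`.

```python
def assign_replicas(segments, nodes, replicas=2):
    """Toy placement: put each segment on `replicas` distinct nodes, round-robin."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        seg: {nodes[(i + r) % len(nodes)] for r in range(replicas)}
        for i, seg in enumerate(segments)
    }


def unavailable_after_failure(placement, failed_node):
    """Segments whose every copy was on the failed node."""
    return sorted(seg for seg, holders in placement.items() if holders <= {failed_node})


segments = ["s0", "s1", "s2", "s3"]
nodes = ["h1", "h2", "h3"]
print(unavailable_after_failure(assign_replicas(segments, nodes, 1), "h1"))  # ['s0', 's3']
print(unavailable_after_failure(assign_replicas(segments, nodes, 2), "h1"))  # []
```

With a single copy, those segments remain unqueryable until the Coordinator reloads them elsewhere. With two copies, queries are served from the surviving replica during that delay.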

5. The indexing service uses a master/worker architecture. The Overlord is the master node, and the MiddleManagers are the worker nodes.
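
In practice, ingestion tasks are submitted to the Overlord, which hands them to MiddleManagers to run. The minimal sketch below assumes an Overlord at a hypothetical host on its default port 8090. It builds the POST to the task endpoint `/druid/indexer/v1/task` but does not send it. The task spec shown is a placeholder, not a complete ingestion spec.

```python
import json
import urllib.request

OVERLORD = "http://overlord.example.com:8090"  # hypothetical Overlord address


def build_task_request(task_spec: dict) -> urllib.request.Request:
    """Build (without sending) a POST that submits an ingestion task to the Overlord."""
    return urllib.request.Request(
        OVERLORD + "/druid/indexer/v1/task",
        data=json.dumps(task_spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder spec: a real one needs dataSchema, ioConfig and tuningConfig
req = build_task_request({"type": "index_parallel", "spec": {}})
print(req.get_method(), req.full_url)
# To actually submit it: urllib.request.urlopen(req)
```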

6. On automatically deleting data: if you only want to retain recent data, add a dropBeforeByPeriod rule to drop data from before a specified period, followed by a loadForever rule. Rules are evaluated in order, and the first rule that matches a segment applies.

Note that dropBeforeByPeriod + loadForever is equivalent to loadByPeriod (includeFuture = true) + dropForever.
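
The equivalence can be checked with a simplified rule evaluator. The rule objects use the JSON shape that Druid load/drop rules take, and a chain like these could be posted to the Coordinator's `/druid/coordinator/v1/rules/{dataSource}` endpoint. The matching logic, however, is a hypothetical simplification. It only understands day periods such as `P30D` and treats a segment as a (start, end) pair. Under that model, both chains load exactly the segments whose end falls after "now minus the period".

```python
from datetime import datetime, timedelta

# Two rule chains that keep only the last 30 days (JSON shape of Druid rules)
CHAIN_A = [
    {"type": "dropBeforeByPeriod", "period": "P30D"},
    {"type": "loadForever", "tieredReplicants": {"_default_tier": 2}},
]
CHAIN_B = [
    {"type": "loadByPeriod", "period": "P30D", "includeFuture": True,
     "tieredReplicants": {"_default_tier": 2}},
    {"type": "dropForever"},
]


def period_days(period: str) -> int:
    """Simplification: only day periods such as "P30D" are supported."""
    if not (period.startswith("P") and period.endswith("D")):
        raise ValueError(f"unsupported period: {period}")
    return int(period[1:-1])


def matches(rule: dict, start: datetime, end: datetime, now: datetime) -> bool:
    kind = rule["type"]
    if kind in ("loadForever", "dropForever"):
        return True
    cutoff = now - timedelta(days=period_days(rule["period"]))
    if kind == "dropBeforeByPeriod":
        return end <= cutoff                 # segment lies entirely before the period
    if kind == "loadByPeriod":
        if rule["includeFuture"]:            # window is [cutoff, +infinity)
            return end > cutoff
        return end > cutoff and start < now  # window is [cutoff, now)
    raise ValueError(f"unsupported rule type: {kind}")


def decide(rules: list, start: datetime, end: datetime, now: datetime) -> str:
    """Rules are checked in order; the first match decides load vs drop."""
    for rule in rules:
        if matches(rule, start, end, now):
            return "load" if rule["type"].startswith("load") else "drop"
    return "no match"


now = datetime(2021, 4, 23)
old = (datetime(2021, 1, 1), datetime(2021, 1, 2))
recent = (datetime(2021, 4, 20), datetime(2021, 4, 21))
print(decide(CHAIN_A, *old, now), decide(CHAIN_B, *old, now))        # drop drop
print(decide(CHAIN_A, *recent, now), decide(CHAIN_B, *recent, now))  # load load
```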

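7. Druid supports HTTP request decompression and response compression through the headers Content-Encoding: gzip and Accept-Encoding: gzip. When queries return large result sets, enabling response compression saves bandwidth and can speed up queries.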
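
The minimal sketch below shows both headers using only the Python standard library. It assumes a Broker at a hypothetical host on port 8082 and uses the SQL endpoint `/druid/v2/sql`. The request is built but not sent. Because urllib does not decompress responses automatically, the helper checks the response's Content-Encoding itself.

```python
import gzip
import json
import urllib.request

BROKER = "http://broker.example.com:8082"  # hypothetical Broker address


def build_gzip_sql_request(sql):
    """Build (without sending) a gzip-compressed Druid SQL request."""
    body = gzip.compress(json.dumps({"query": sql}).encode("utf-8"))
    return urllib.request.Request(
        BROKER + "/druid/v2/sql",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",  # the request body is gzip-compressed
            "Accept-Encoding": "gzip",   # ask Druid to gzip the response
        },
        method="POST",
    )


def decode_body(raw, content_encoding):
    """urllib does not auto-decompress, so inflate gzip responses manually."""
    return gzip.decompress(raw) if content_encoding == "gzip" else raw


req = build_gzip_sql_request("SELECT COUNT(*) FROM sys.segments")
# with urllib.request.urlopen(req) as resp:
#     rows = json.loads(decode_body(resp.read(), resp.headers.get("Content-Encoding")))
```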

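8. On segment size: it's generally recommended for each segment to have around 5 million rows.

Segment byte size: the recommended range is 300 to 700 MB. If the row-count and byte-size recommendations conflict, tune the number of rows per segment rather than the byte size.

Query to check segment row counts and sizes: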

SELECT
  "start",
  "end",
  version,
  COUNT(*) AS num_segments,
  AVG("num_rows") AS avg_num_rows,
  SUM("num_rows") AS total_num_rows,
  AVG("size") AS avg_size,
  SUM("size") AS total_size
FROM
  sys.segments A
WHERE
  datasource = 'your_dataSource' AND
  is_published = 1
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3 DESC;
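
The query returns one row per (start, end, version). A small, hypothetical helper can compare such a row against the guidance in item 8. The 5-million-row target and the 300 to 700 MB range come from the text above, and treating MB as 1024 * 1024 bytes is an assumption. If rows per segment are far off target, the usual knob is the row limit in the ingestion or compaction partitionsSpec, for example maxRowsPerSegment for dynamic partitioning.

```python
import math

TARGET_ROWS = 5_000_000         # ~5 million rows per segment
MIN_BYTES = 300 * 1024 * 1024   # 300 MB (MB read as MiB: an assumption)
MAX_BYTES = 700 * 1024 * 1024   # 700 MB


def review_interval(num_segments: int, total_num_rows: int, avg_size: float) -> dict:
    """Hypothetical check of one result row from the sys.segments query."""
    return {
        "avg_rows_per_segment": total_num_rows / num_segments,
        # how many segments this interval would need at ~5M rows each
        "suggested_num_segments": max(1, math.ceil(total_num_rows / TARGET_ROWS)),
        "size_in_range": MIN_BYTES <= avg_size <= MAX_BYTES,
    }


# 10 segments of ~100 MB and 2M rows each: too small, about 4 would suffice
print(review_interval(10, 20_000_000, 100 * 1024 * 1024))
```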

9. On null handling: when druid.generic.useDefaultValueForNull is set to true, null values are stored as '' for string columns and 0 for numeric columns. Set it to false to store and query data in SQL-compatible mode.
The default is true (at the time of writing). In practice, this value should be set to false.

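
The toy model below (not Druid code) shows why this setting matters. It mimics how nulls are stored under each mode and how that changes an AVG. In default-value mode a missing number becomes 0 and is averaged in. In SQL-compatible mode it stays null and is skipped.

```python
def store(values, column_type, sql_compatible):
    """Toy model of how nulls end up stored under each mode."""
    if sql_compatible:
        return list(values)                    # nulls stay null
    default = "" if column_type == "string" else 0
    return [default if v is None else v for v in values]


def sql_avg(values):
    """AVG skips nulls, as in SQL."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


raw = [10, None, 20]
print(sql_avg(store(raw, "long", sql_compatible=True)))    # 15.0
print(sql_avg(store(raw, "long", sql_compatible=False)))   # 10.0
print(store(["a", None], "string", sql_compatible=False))  # ['a', '']
```
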
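10. Druid's internal components communicate with each other over HTTP. druid.global.http.numMaxThreads sets the maximum number of I/O worker threads and defaults to max(10, ((number of cores * 17) / 16 + 2) + 30). If Kubernetes assigns the pod a specific CPU allocation, calculate this value manually and set it explicitly, because the core count the JVM detects may not match the CPUs actually allotted to the container.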
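
The documented default is easy to compute for the CPU count you actually give the pod. The minimal sketch below assumes integer division, as in Java int arithmetic. You would then set the result explicitly, e.g. `druid.global.http.numMaxThreads=36` in the service's runtime.properties.

```python
def default_num_max_threads(cores: int) -> int:
    """Default of druid.global.http.numMaxThreads for a given core count.

    Uses integer division, as Java int arithmetic would (an assumption
    about how the documented formula is evaluated).
    """
    return max(10, ((cores * 17) // 16 + 2) + 30)


# e.g. a pod limited to 4 CPUs on a 32-core host
print(default_num_max_threads(4))   # 36
print(default_num_max_threads(32))  # 66
```

The gap between 36 and 66 shows how far the value can be oversized if the JVM sees the host's 32 cores instead of the pod's 4.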

posted on 2021-04-23 16:07 by 小SEI子
