Saiku Journey: Speed Optimization (Part 3)

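After the first two rounds of optimization, Saiku went from unusable to usable, but analyzing large volumes of log data still felt sluggish. Watching the SQL it runs behind the scenes, I decided to focus on indexes.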

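The main usage scenario for the logs is analysis over a fixed date: every WHERE clause contains an equality condition that pins the date to a single day. The dilemma is whether to index every column separately or to build composite indexes that pair each column with the date. In short, it comes down to comparing the efficiency of single-column indexes with that of composite indexes.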

PostgreSQL table: saiku_search_detail

Table structure:

CREATE TABLE test.saiku_search_detail
(
  rpt_date date,
  from_area_id bigint,
  from_value_id bigint,
  in_track_id bigint,
  gid character varying,
  current_city_id bigint,
  dist_city_id bigint,
  category_name_id bigint,
  page_id bigint,
  utmr_page_id bigint,
  num bigint,
  id bigint,
  partner smallint
)

Row count: 8,510,490 (about 8.51 million).

Test steps:

1. Bare table (no indexes)

Query a single date:

1.1 Single condition

select
  count(1)
from test.saiku_search_detail
where rpt_date = '2016-05-13'

Result: 1110 ms

"Aggregate  (cost=160934.85..160934.86 rows=1 width=0)"
"  ->  Seq Scan on saiku_search_detail  (cost=0.00..160816.78 rows=47230 width=0)"
"        Filter: (rpt_date = '2016-05-13'::date)"

1.2 Two conditions

select
  count(1)
from test.saiku_search_detail
where rpt_date = '2016-05-13'
and from_area_id = 135

Result: 1782 ms

"Aggregate  (cost=184432.32..184432.33 rows=1 width=0)"
"  ->  Seq Scan on saiku_search_detail  (cost=0.00..184431.73 rows=236 width=0)"
"        Filter: ((rpt_date = '2016-05-13'::date) AND (from_area_id = 135))"

No surprise: with zero indexes, both queries fall back to a sequential scan of the whole table.
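The "Seq Scan" plus "Filter" lines describe a specific amount of work. The minimal Python sketch below counts matches the same way: it reads every row and tests each predicate. The six rows and the name `seq_scan_count` are hypothetical stand-ins for saiku_search_detail, not anything PostgreSQL exposes. The point is that the rows read grow with the table size (8.5 million here), not with the number of matches.

```python
from datetime import date

# Hypothetical sample rows standing in for saiku_search_detail:
# (rpt_date, from_area_id). Row position plays the role of a row ID.
rows = [
    (date(2016, 5, 12), 135),
    (date(2016, 5, 13), 135),
    (date(2016, 5, 13), 7),
    (date(2016, 5, 13), 135),
    (date(2016, 5, 13), 42),
    (date(2016, 5, 14), 135),
]


def seq_scan_count(rows, rpt_date, from_area_id=None):
    """Count matches the way 'Seq Scan' + 'Filter' does: test every row."""
    count = 0
    for row_date, area in rows:  # every row is read, matching or not
        if row_date != rpt_date:
            continue
        if from_area_id is not None and area != from_area_id:
            continue
        count += 1
    return count


print(seq_scan_count(rows, date(2016, 5, 13)))       # 4
print(seq_scan_count(rows, date(2016, 5, 13), 135))  # 2
```

Adding the second predicate does not reduce the number of rows read; it only adds a test per row. That fits 1.2 (1782 ms) being no faster than 1.1 (1110 ms).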

2. Add a separate index on each of the two columns:

-- B-tree index
CREATE INDEX saiku_search_detail_from_area_id_idx
  ON saiku_search_detail
  USING btree
  (from_area_id);
-- hash index
CREATE INDEX saiku_search_detail_rpt_date_idx
  ON saiku_search_detail
  USING hash
  (rpt_date);

2.1 Single condition

select
  count(1)
from saiku_search_detail
where rpt_date = '2016-05-13'

Result: 83 ms

"Aggregate  (cost=8.02..8.03 rows=1 width=0)"
"  ->  Index Scan using saiku_search_detail_rpt_date_idx on saiku_search_detail  (cost=0.00..8.02 rows=1 width=0)"
"        Index Cond: (rpt_date = '2016-05-13'::date)"

The hash index on rpt_date is used.
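A hash index answers an equality condition by hashing the key and going straight to the matching bucket, instead of reading the whole table. The sketch below models that with a Python dict, which is itself a hash table. `hash_index` and `hash_index_count` are illustrative names, and row positions stand in for PostgreSQL's tuple IDs. The sketch also skips the heap visits that a real Index Scan still performs for each match. Like a PostgreSQL hash index, it supports only equality lookups, which matches the `rpt_date = '...'` pattern of this workload.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (rpt_date, from_area_id) rows standing in for the table.
rows = [
    (date(2016, 5, 12), 135),
    (date(2016, 5, 13), 135),
    (date(2016, 5, 13), 7),
    (date(2016, 5, 13), 135),
    (date(2016, 5, 13), 42),
    (date(2016, 5, 14), 135),
]

# Toy "hash index" on rpt_date: key -> list of row positions.
hash_index = defaultdict(list)
for row_id, (row_date, _area) in enumerate(rows):
    hash_index[row_date].append(row_id)


def hash_index_count(index, rpt_date):
    """Equality lookup: jump to the key's bucket instead of scanning all rows."""
    # .get() does not insert missing keys into the defaultdict.
    return len(index.get(rpt_date, []))


print(hash_index_count(hash_index, date(2016, 5, 13)))  # 4
print(hash_index_count(hash_index, date(2016, 5, 20)))  # 0
```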

2.2 Two conditions

select
  count(1)
from saiku_search_detail
where rpt_date = '2016-05-13'
and from_area_id = 135

Result: 149 ms

"Aggregate  (cost=8.02..8.03 rows=1 width=0)"
"  ->  Index Scan using saiku_search_detail_rpt_date_idx on saiku_search_detail  (cost=0.00..8.02 rows=1 width=0)"
"        Index Cond: (rpt_date = '2016-05-13'::date)"
"        Filter: (from_area_id = 135)"

Only one index is used; the second one does not take effect. Let's try changing the order of the conditions in the SQL:

select
  count(1)
from saiku_search_detail
where from_area_id = 135
and rpt_date = '2016-05-13'

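The result is the same: the order of conditions in the WHERE clause does not change the plan. In this test, PostgreSQL used only one of the two single-column indexes and applied the other condition as a row filter. PostgreSQL can combine several indexes through a BitmapAnd, but the planner does so only when it estimates that to be cheaper, and here it did not.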
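There are two ways to answer the two-condition query with two single-column indexes, and a small sketch can compare them. `index_then_filter` mirrors the 2.2 plan: one index serves as the Index Cond, and the other predicate is a Filter on the fetched rows. `bitmap_and` looks up both indexes and intersects the row sets, similar in spirit to PostgreSQL's BitmapAnd. Both functions and the toy data are hypothetical. Which strategy PostgreSQL actually picks is a cost-based planner decision, not something this sketch models.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (rpt_date, from_area_id) rows standing in for the table.
rows = [
    (date(2016, 5, 12), 135),
    (date(2016, 5, 13), 135),
    (date(2016, 5, 13), 7),
    (date(2016, 5, 13), 135),
    (date(2016, 5, 13), 42),
    (date(2016, 5, 14), 135),
]

# Two hypothetical single-column indexes: value -> set of row positions.
date_idx = defaultdict(set)
area_idx = defaultdict(set)
for row_id, (row_date, area) in enumerate(rows):
    date_idx[row_date].add(row_id)
    area_idx[area].add(row_id)


def index_then_filter(rpt_date, from_area_id):
    """Plan shape of 2.2: probe the date index (Index Cond),
    then test from_area_id on each fetched row (Filter)."""
    return sum(1 for row_id in date_idx.get(rpt_date, set())
               if rows[row_id][1] == from_area_id)


def bitmap_and(rpt_date, from_area_id):
    """Probe both indexes and intersect the row sets, like a BitmapAnd."""
    return len(date_idx.get(rpt_date, set()) & area_idx.get(from_area_id, set()))


print(index_then_filter(date(2016, 5, 13), 135))  # 2
print(bitmap_and(date(2016, 5, 13), 135))         # 2
```

Both strategies return the same answer. They differ only in how many index entries and rows are touched, which is what the planner's cost estimates weigh.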

3. Create a composite index

-- composite index covering both columns
CREATE INDEX saiku_search_detail_rpt_date_from_area_idx
  ON test.saiku_search_detail
  USING btree
  (rpt_date, from_area_id);
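A B-tree composite index keeps its entries sorted by (rpt_date, from_area_id). As a result, all entries that share a leading-column prefix sit in one contiguous run. The sketch below models the index as a sorted Python list of `(rpt_date, from_area_id, row_id)` tuples and finds a run with `bisect`. `prefix_range` and the data are hypothetical, and the sketch leaves out the tree pages of a real B-tree. The same lookup serves both a date-only key and a date-plus-area key, which is what the next two tests exercise.

```python
import bisect
from datetime import date

# Hypothetical (rpt_date, from_area_id) rows standing in for the table.
rows = [
    (date(2016, 5, 12), 135),
    (date(2016, 5, 13), 135),
    (date(2016, 5, 13), 7),
    (date(2016, 5, 13), 135),
    (date(2016, 5, 13), 42),
    (date(2016, 5, 14), 135),
]

# Toy composite index on (rpt_date, from_area_id): entries sorted
# lexicographically, with the row position appended as a tiebreaker.
composite = sorted((row_date, area, row_id)
                   for row_id, (row_date, area) in enumerate(rows))


def prefix_range(index, key):
    """Return the contiguous run of entries whose leading columns equal `key`.

    `key` may be (rpt_date,) or (rpt_date, from_area_id). A shorter tuple
    sorts before any longer tuple that starts with it, so bisect_left
    lands on the first matching entry.
    """
    n = len(key)
    lo = bisect.bisect_left(index, key)
    hi = lo
    while hi < len(index) and index[hi][:n] == key:  # walk the run, like a leaf scan
        hi += 1
    return index[lo:hi]


day = date(2016, 5, 13)
print(len(prefix_range(composite, (day,))))                   # 4
print([entry[2] for entry in prefix_range(composite, (day, 135))])  # [1, 3]
```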

3.1 Single condition on the first column of the index

select
  count(1)
from test.saiku_search_detail
where rpt_date = '2016-05-13'

Result: 66 ms

"Aggregate  (cost=47843.00..47843.01 rows=1 width=0)"
"  ->  Bitmap Heap Scan on saiku_search_detail  (cost=2220.63..47362.94 rows=192025 width=0)"
"        Recheck Cond: (rpt_date = '2016-05-13'::date)"
"        ->  Bitmap Index Scan on saiku_search_detail_rpt_date_from_area_idx  (cost=0.00..2172.62 rows=192025 width=0)"

So the composite index is used even though only its leading column (rpt_date) appears in the condition.
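The plan above has two stages:

- A Bitmap Index Scan collects the matching tuple locations from the index.
- A Bitmap Heap Scan then reads each heap page once, in physical order, rather than jumping around in index order.

The sketch below models only that regrouping and reordering. The `(page, offset)` pairs are invented, and `build_bitmap` and `heap_visit_order` are hypothetical names. In PostgreSQL, if the bitmap grows beyond `work_mem`, it keeps only page numbers for some pages (a lossy bitmap), and the Recheck Cond is then re-applied to every row on those pages. The sketch does not model that.

```python
from collections import defaultdict

# Hypothetical tuple IDs (page_number, line_pointer) in the order an
# index would return them, i.e. not in physical heap order.
tids_from_index = [(7, 2), (1, 5), (7, 1), (3, 4), (1, 2)]


def build_bitmap(tids):
    """Bitmap Index Scan stage: group matching tuple offsets by heap page."""
    bitmap = defaultdict(set)
    for page, offset in tids:
        bitmap[page].add(offset)
    return bitmap


def heap_visit_order(bitmap):
    """Bitmap Heap Scan stage: visit each page once, in ascending page order."""
    return [(page, sorted(bitmap[page])) for page in sorted(bitmap)]


print(heap_visit_order(build_bitmap(tids_from_index)))
# [(1, [2, 5]), (3, [4]), (7, [1, 2])]
```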

3.2 Two conditions

select
  count(1)
from test.saiku_search_detail
where rpt_date = '2016-05-13'
and from_area_id = 135

Result: 65 ms

"Aggregate  (cost=46124.99..46125.00 rows=1 width=0)"
"  ->  Bitmap Heap Scan on saiku_search_detail  (cost=1509.67..45857.37 rows=107047 width=0)"
"        Recheck Cond: ((rpt_date = '2016-05-13'::date) AND (from_area_id = 135))"
"        ->  Bitmap Index Scan on saiku_search_detail_rpt_date_from_area_idx  (cost=0.00..1482.90 rows=107047 width=0)"

The composite index is used.
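Why 3.2 (65 ms) beats 2.2 (149 ms) comes down to how many rows are fetched before the second condition is checked. This sketch counts that for two setups: a date-only index plus a Filter, and the composite index on `(rpt_date, from_area_id)`. The helper names and the six-row dataset are hypothetical. The counts illustrate the trend rather than reproduce the measured times.

```python
import bisect
from datetime import date

# Hypothetical (rpt_date, from_area_id) rows standing in for the table.
rows = [
    (date(2016, 5, 12), 135),
    (date(2016, 5, 13), 135),
    (date(2016, 5, 13), 7),
    (date(2016, 5, 13), 135),
    (date(2016, 5, 13), 42),
    (date(2016, 5, 14), 135),
]

# Hypothetical indexes over the toy rows, both modeled as sorted tuple lists.
date_only = sorted((row_date, row_id) for row_id, (row_date, _area) in enumerate(rows))
composite = sorted((row_date, area, row_id)
                   for row_id, (row_date, area) in enumerate(rows))


def matching_prefix(index, key):
    """Contiguous run of entries whose leading columns equal `key`."""
    n = len(key)
    lo = bisect.bisect_left(index, key)
    hi = lo
    while hi < len(index) and index[hi][:n] == key:
        hi += 1
    return index[lo:hi]


def date_index_plus_filter(rpt_date, from_area_id):
    """Like 2.2: fetch every row of the day, then filter on from_area_id.
    Returns (rows fetched, rows matched)."""
    fetched = matching_prefix(date_only, (rpt_date,))
    matched = [row_id for _d, row_id in fetched if rows[row_id][1] == from_area_id]
    return len(fetched), len(matched)


def composite_lookup(rpt_date, from_area_id):
    """Like 3.2: both columns narrow the index search directly.
    Returns (rows fetched, rows matched)."""
    fetched = matching_prefix(composite, (rpt_date, from_area_id))
    return len(fetched), len(fetched)


day = date(2016, 5, 13)
print(date_index_plus_filter(day, 135))  # (4, 2)
print(composite_lookup(day, 135))        # (2, 2)
```

With the date-only index, every row of the day is fetched and then discarded if the area does not match. With the composite index, the search lands directly on the matching entries.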

Summary

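Stating the obvious: when two columns are used as filter conditions, a composite index on both is the best option.
Payoff: in our log analysis, apart from the single-column index on the date, none of the other single-column indexes get used, so they should be dropped.
Open question: keep a single index on the date alone, or build several composite indexes that each include the date? Decide based on your own usage patterns.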

Posted @ 2016-05-15 11:58 李秋 Reads (1325) Comments (0)
