学习ORACLE的思维-索引结构分析

 

一、思考集群因子CLUSTERING_FACTOR
It is calculated as follows:
1. The index is scanned in order.
2. The block portion of the rowid pointed at by the current indexed value is compared with that pointed at by the previous indexed value.
This is achieved by comparing adjacent rowids in the index leaf block (the blocks themselves do not need to be visited).
3. If these rowids are pointing at different blocks then a counter is incremented.
4. This is continued throughout the whole index.
5. The resultant count is then stored.

索引集群因子的计算方法:
1. 用当前索引值的rowid中指向块部份的值和上一条索引值的rowid中指向块部份的值相比较。
2. 这一步通过在索引的树叶块(本身不用被访问的块)中比较相邻的rowid来完成。
3. 如果相邻rowid指向不同的块,那么就会使计数器增加。
4. 以上比较会持续进行直到扫描完整个索引。
5. 最后结果的数值就被记录下来。

测试思路:创建两张内容相同但索引字段是无序和升序的数据表,数据来源为dba_objects,从而测试对索引集群因子和其他索引统计数据的影响。

create table test_objects1 as select * from dba_objects;

create table test_objects2 as select * from dba_objects order by object_id;

create index idx1_test_objects1 on test_objects1(object_id);

create index idx1_test_objects2 on test_objects2(object_id);

select * from test_objects1 where rownum < 10;
test_objects1全表扫描可以看到object_id是无序的。

select * from test_objects2 where rownum < 10;
test_objects2全表扫描可以看到object_id是升序的。

通过分析表获得统计数据
DBA_INDEXES:
execute dbms_stats.gather_table_stats('SCOTT', 'TEST_OBJECTS1', cascade => true);

BLEVEL 1
LEAF_BLOCKS 68
DISTINCT_KEYS 30761
AVG_LEAF_BLOCKS_PER_KEY 1
AVG_DATA_BLOCKS_PER_KEY 1
CLUSTERING_FACTOR 25581
STATUS VALID
NUM_ROWS 30761
SAMPLE_SIZE 30761

execute dbms_stats.gather_table_stats('SCOTT', 'TEST_OBJECTS2', cascade => true);

BLEVEL 1
LEAF_BLOCKS 68
DISTINCT_KEYS 30762
AVG_LEAF_BLOCKS_PER_KEY 1
AVG_DATA_BLOCKS_PER_KEY 1
CLUSTERING_FACTOR 421
STATUS VALID
NUM_ROWS 30762
SAMPLE_SIZE 30762

 

INDEX_STATS:
ANALYZE INDEX IDX1_TEST_OBJECTS1 VALIDATE STRUCTURE;

HEIGHT 2
BLOCKS 80
NAME IDX1_TEST_OBJECTS1
PARTITION_NAME
LF_ROWS 30761
LF_BLKS 68
LF_ROWS_LEN 481877
LF_BLK_LEN 7996
BR_ROWS 67
BR_BLKS 1
BR_ROWS_LEN 784
BR_BLK_LEN 8028
DEL_LF_ROWS 0
DEL_LF_ROWS_LEN 0
DISTINCT_KEYS 30761
MOST_REPEATED_KEY 1
BTREE_SPACE 551756
USED_SPACE 482661
PCT_USED 88
ROWS_PER_KEY 1
BLKS_GETS_PER_ACCESS 3
PRE_ROWS 0
PRE_ROWS_LEN 0
OPT_CMPR_COUNT 0
OPT_CMPR_PCTSAVE 0

ANALYZE INDEX IDX1_TEST_OBJECTS2 VALIDATE STRUCTURE;

HEIGHT 2
BLOCKS 80
NAME IDX1_TEST_OBJECTS2
PARTITION_NAME
LF_ROWS 30762
LF_BLKS 68
LF_ROWS_LEN 481893
LF_BLK_LEN 7996
BR_ROWS 67
BR_BLKS 1
BR_ROWS_LEN 784
BR_BLK_LEN 8028
DEL_LF_ROWS 0
DEL_LF_ROWS_LEN 0
DISTINCT_KEYS 30762
MOST_REPEATED_KEY 1
BTREE_SPACE 551756
USED_SPACE 482677
PCT_USED 88
ROWS_PER_KEY 1
BLKS_GETS_PER_ACCESS 3
PRE_ROWS 0
PRE_ROWS_LEN 0
OPT_CMPR_COUNT 0
OPT_CMPR_PCTSAVE 0

分析统计数据:
根据索引的统计数据和结构数据可以看出,除了集群因子有较大差别外,其他统计信息都是相同。
那集群因子对SQL有什么影响呢?
对于一般的等值查询来说,集群因子是不会有任何影响的,而对于范围的查询则有很大的影响。
例如:
查询object_id = xxxx的等值查询
select * from test_objects1 where object_id = xxxx;
select * from test_objects2 where object_id = xxxx;
按索引的查询原理,
1. 首先查询索引根节点,
2. 然后查询分支节点,
3. 然后是叶子节点,
4. 最后回表查询数据。
所以对于高度相同的索引来说,查询的开销和效率应该是一致的,完全不会受到集群因子的影响。

查询objct_id >= xxxx and object_id <= yyyy的范围查询
select * from test_objects1 where objct_id >= xxxx and object_id <= yyyy;
select * from test_objects2 where objct_id >= xxxx and object_id <= yyyy;
按索引的查询原理,
1. 首先查询索引根节点,
2. 然后查询分支节点,
3. 然后是范围开始叶子节点,
4. 下一个叶子节点,回表查询数据,
5. 重复2 - 4直到范围结束的节点。
在1 - 3步骤是不会有差别的,差别发生在步骤4的回表查询,
如果集群因子接近与数据块的数量,则表明表的数据是有序的,回表查询需要更少的数据块,从而减少逻辑读和物理读。
而相反集群因子接近于行的数量,则表明表的数据是无序的甚至可能是存在碎片的,回表查询需要更多的数据块,从而需要更多的逻辑读和物理读。

以上观点可以通过统计数据证明:

SQL> select * from test_objects1 where object_id = 2000;
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1364 bytes sent via SQL*Net to client
655 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> select * from test_objects2 where object_id = 2000;
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1364 bytes sent via SQL*Net to client
655 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> select * from test_objects1 where object_id >= 8000 and object_id <= 9000
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
489 consistent gets
0 physical reads
0 redo size
73539 bytes sent via SQL*Net to client
1381 bytes received via SQL*Net from client
68 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1001 rows processed

SQL> select * from test_objects2 where object_id >= 8000 and object_id <= 9000;
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
153 consistent gets
0 physical reads
0 redo size
52350 bytes sent via SQL*Net to client
1381 bytes received via SQL*Net from client
68 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1001 rows processed

从统计数据可以看出等值查询的统计数据是完全相同,
而范围查询中,一致读相差3倍,网络数据增加50%。
所以根据上述的分析可以知道,如果索引只是等值查询的话,可以不考虑集群因子的影响,
如果是范围查询的话则需要考虑如果让数据块更加有序,从而提高SQL语句的效率。
引申到多表等值关联和范围关联等也会造成较大的影响。

二、思考索引rebuild
索引是否需要重建?这是一个困扰大多数DBA的问题,
想知道索引是否需要重建,首先需要了解BTREE索引的原理和实现。
简单的说BTREE索引就是将索引列组织成树形结构,索引列是按照顺序排列的,分支节点指示索引列的范围,叶子节点只是索引列的位置rowid。
Oracle的BTREE索引是会自动平衡叶子节点的高度,从根节点到任何一个叶子节点的长度是相等的。所以坊间流传对索引的操作会引起索引不平衡是一个错误的说法。
Oracle的BTREE索引处理DML操作时主要分为两种分裂方式:99/1和50/50。
99/1是插入最大索引值时(最小是不适用的,这点需要继续研究,我的猜想和索引的序列方向有关系),索引节点分裂后,新插入的值分裂到新的叶子节点,而原来的叶子节点不变。
50/50是插入当前索引值范围内的索引值时,原叶子节点的数据被分裂为相同大小的两个叶子节点并插入新的索引值到其中的节点,两个叶子节点都只有50%的利用率。
所以99/1方式一般不会产生索引碎片,而50/50方式就很容易产生碎片了。

根据上述原理我们来做个测试:
(1)99/1方式
drop table t1;
create table t1 (name varchar2(10),nr number) pctfree 0;
create index i1 on t1(nr);
declare
i number;
begin
for i in 1..50000
loop
insert into t1 values('XX',i);
end loop;
end;
/

SQL> analyze index i1 validate structure;

SQL> select blocks, lf_blks, pct_used from index_stats;
BLOCKS LF_BLKS PCT_USED
---------- ---------- ----------
112 99 99


(2)50/50方式
drop table t2;
create table t2 (name varchar2(10),nr number) pctfree 0;
create index i2 on t2(nr);
declare
i number;
begin
insert into t2 values('XX',50000);
for i in 1..49999
loop
insert into t2 values('XX',i);
end loop;
end;
/

SQL> analyze index i2 validate structure;

SQL> select blocks, lf_blks, pct_used from index_stats;

BLOCKS LF_BLKS PCT_USED
---------- ---------- ----------
384 193 52

(3)从大往小插50/50方式
drop table t1;
create table t1 (name varchar2(10),nr number) pctfree 0;
create index i1 on t1(nr);
declare
i number;
begin
for i in reverse 1..50000
loop
insert into t1 values('XX',i);
end loop;
end;
/

SQL> analyze index i1 validate structure;

SQL> select blocks, lf_blks, pct_used from index_stats;

BLOCKS LF_BLKS PCT_USED
---------- ---------- ----------
384 200 50

根据上述的分析显示,要让索引高效则应用应该尽可能适应这种索引分裂模式,
插入按日期,序列的正序列方式,删除按照FIFO方式则是很好的适应Oracle的索引设计模式。
按照上述设计模式进行设计的应用一般是不需要重建索引的。
而违反这种设计模式的应用则需要定期进行索引重建,使索引保持高效。

请看一下例子:
create table t1 (name varchar2(10),nr number) pctfree 0;
create index i1 on t1(nr);

declare
i number;
begin
for i in 1..50000
loop
insert into t1 values('XX',i);
end loop;
commit;
end;
/

declare
i number;
begin
for i in 1..50000
loop
delete from t1 where nr = i;
end loop;
commit;
end;
/

declare
i number;
begin
for i in 50000..100000
loop
insert into t1 values('XX',i);
end loop;
commit;
end;
/

declare
i number;
begin
for i in 50000..60000
loop
delete from t1 where nr = i;
end loop;
commit;
end;
/

declare
i number;
begin
for i in 100000..110000
loop
insert into t1 values('XX',i);
end loop;
commit;
end;
/

SQL> analyze index i1 validate structure;

Index analyzed.

SQL> select blocks, lf_blks, pct_used from index_stats;

BLOCKS LF_BLKS PCT_USED
---------- ---------- ----------
112 102 98


drop table t1;
create table t1 (name varchar2(10),nr number) pctfree 0;
create index i1 on t1(nr);

declare
i number;
begin
for i in 1..50000
loop
insert into t1 values('XX',i);
end loop;
commit;
end;
/

declare
i number;
begin
for i in 2..9999
loop
delete from t1 where nr = i;
end loop;
for i in 10001..19999
loop
delete from t1 where nr = i;
end loop;
for i in 20001..29999
loop
delete from t1 where nr = i;
end loop;
for i in 30001..39999
loop
delete from t1 where nr = i;
end loop;
for i in 40001..49999
loop
delete from t1 where nr = i;
end loop;
commit;
end;
/

declare
i number;
begin
for i in 50000..100000
loop
insert into t1 values('XX',i);
end loop;
commit;
end;
/


declare
i number;
begin
for i in 50001..59999
loop
delete from t1 where nr = i;
end loop;
commit;
end;
/

declare
i number;
begin
for i in 100000..110000
loop
insert into t1 values('XX',i);
end loop;
commit;
end;
/

SQL> analyze index i1 validate structure;

Index analyzed.

SQL> select blocks, lf_blks, pct_used from index_stats;

BLOCKS LF_BLKS PCT_USED
---------- ---------- ----------
120 109 92

SQL> alter index i1 rebuild online;

Index altered.

SQL> analyze index i1 validate structure;

Index analyzed.

SQL> select blocks, lf_blks, pct_used from index_stats;

BLOCKS LF_BLKS PCT_USED
---------- ---------- ----------
128 112 89

根据分析上述统计数据可以看出,叶子节点多了7个,基本上和6个剩余的记录相等,表示每个记录就要浪费一个数据块的空间。
如果这样的记录不断增加,那么索引就需要进行重建了。
有时候重建后由于需要平衡各方面的因素,重建后的索引的结构可能会显得更差,但其实索引的整体性能是相差不大的,因为数据块都变满了,同时分布也更平均

posted @ 2012-10-20 23:32  wuxi812  阅读(359)  评论(0编辑  收藏  举报