关于集簇因子和柱状图的理解
一、准备实验环境
1、建立表
SQL> create table emp2 as select * from emp;
表已创建。
2、在表上建立索引
SQL> create index emp2_empno_ind on emp2(empno);
索引已创建。
3、分析表
SQL> analyze table emp2 estimate statistics;
表已分析。
4、查看集簇因子
SQL> select t.table_name,t.NUM_ROWS,t.BLOCKS,t.AVG_ROW_LEN,i.index_name,i.CLUSTERING_FACTOR from dba_tables t,dba_indexes i
2 where t.table_name=i.table_name
3 and t.owner='SCOTT'
4 and t.table_name='EMP2';
TABLE_NAME NUM_ROWS BLOCKS AVG_ROW_LEN INDEX_NAME CLUSTERING_FACTOR
---------- ---------- ---------- ----------- --------------- -----------------
EMP2 14 4 40 EMP2_EMPNO_IND 1
5、查看执行计划
SQL> select * from emp2 where empno=7788;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
执行计划
----------------------------------------------------------
Plan hash value: 1554879445
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 32 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMP2 | 1 | 32 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | EMP2_EMPNO_IND | 1 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMPNO"=7788)
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
843 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
从执行计划中可看出,执行语句时为索引查询,集簇因子的值与块数接近:意味着列值相同的行存放得比较集中
二、使empno相同的行分散在不同的块
1、计算出14行数据在每个块中所占比例
SQL> show parameter block
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_block_buffers integer 0
db_block_checking string FALSE
db_block_checksum string TYPICAL
db_block_size integer 8192
db_file_multiblock_read_count integer 128
SQL>
SQL> select t.table_name,t.NUM_ROWS,t.BLOCKS,t.AVG_ROW_LEN,i.index_name,i.CLUSTERING_FACTOR from dba_tables t,dba_indexes i
2 where t.table_name=i.table_name
3 and t.owner='SCOTT'
4 and t.table_name='EMP2';
TABLE_NAME NUM_ROWS BLOCKS AVG_ROW_LEN INDEX_NAME CLUSTERING_FACTOR
---------- ---------- ---------- ----------- --------------- -----------------
EMP2 14 4 40 EMP2_EMPNO_IND 1
14*40/8192≈0.07
所以每个块的剩余空间约为97%
2、修改pctfree的值,改为93,插入多次数据
SQL> alter table emp2 pctfree 93;
表已更改。
SQL>
SQL> insert into emp2 select * from emp;
已创建14行。
/
/
/
SQL> select count(*) from emp2;
COUNT(*)
----------
140
3、分析表,查看集簇因子和执行计划
SQL> analyze table emp2 estimate statistics;
表已分析。
SQL>
SQL>select t.table_name,t.NUM_ROWS,t.BLOCKS,t.AVG_ROW_LEN,i.index_name,i.CLUSTERING_FACTOR
2 from dba_tables t,dba_indexes i
3 where t.table_name=i.table_name
4 and t.owner='SCOTT'
5 and t.table_name='EMP2';
TABLE_NAME NUM_ROWS BLOCKS AVG_ROW_LEN INDEX_NAME CLUSTERING_FACTOR
---------- ---------- ---------- ----------- --------------- -----------------
EMP2 140 16 40 EMP2_EMPNO_IND 140
SQL>
SQL> select * from emp2 where empno=7788;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
已选择10行。
执行计划
----------------------------------------------------------
Plan hash value: 2941272003
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 320 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| EMP2 | 10 | 320 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("EMPNO"=7788)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
17 consistent gets
0 physical reads
0 redo size
915 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
SQL>
从执行计划可以看出语句执行时为全表扫描,原因:需要查询的数据分散在各个块中,集簇因子的值与行数接近
三、使empno相同的值尽量在同一个块中
1、建立一个新表emp_test,其中的数据和emp2中的数据相同
SQL> create table emp_test as select * from emp2;
表已创建。
2、将表emp2截断
SQL>
SQL> truncate table emp2;
表被截断。
3、将数据按照empno排列后重新插入到enp2中
SQL> insert into emp2 select * from emp_test order by empno;
已创建140行。
SQL>
4、查看集簇因子和执行计划
SQL> analyze table emp2 estimate statistics;
表已分析。
SQL>
SQL> select t.table_name,t.NUM_ROWS,t.BLOCKS,t.AVG_ROW_LEN,i.index_name,i.CLUSTERING_FACTOR
2 from dba_tables t,dba_indexes i
3 where t.table_name=i.table_name
4 and t.owner='SCOTT'
5 and t.table_name='EMP2';
TABLE_NAME NUM_ROWS BLOCKS AVG_ROW_LEN INDEX_NAME CLUSTERING_FACTOR
---------- ---------- ---------- ----------- --------------- -----------------
EMP2 140 13 40 EMP2_EMPNO_IND 16
SQL>
SQL> select * from emp2 where empno=7788;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
已选择10行。
执行计划
----------------------------------------------------------
Plan hash value: 1554879445
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 320 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMP2 | 10 | 320 | 3 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | EMP2_EMPNO_IND | 10 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMPNO"=7788)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
1255 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
SQL>
从执行计划中可以看出查询语句为索引查询的方式、原因:emp2中的数据按照empno的数据插入的,empno相同的值集聚在相同的块中,集簇因子的值与块数接近
四、柱状图
1、在emp2表中插入数据
SQL> begin
2 for i in 1..1000 loop
3 insert into emp2 values (7788,'SCOTT','ANALYST',7566,sysdate,5000,null,20);
4 end loop;
5 commit;
6 end;
7 /
PL/SQL 过程已成功完成。
SQL> select count(*) from emp2;
COUNT(*)
----------
1140
SQL> select count(*) from emp2 where empno=7788;
COUNT(*)
----------
1010
SQL>
可以看出empno=7788的数据量在整个表中所占比重很大,如果执行select * from emp2 where empno=7788 这条语句理论上是全表扫描执行
2、查看集簇因子和执行计划
SQL> analyze table emp2 estimate statistics;
表已分析。
SQL> select t.table_name,t.NUM_ROWS,t.BLOCKS,t.AVG_ROW_LEN,i.index_name,i.CLUSTERING_FACTOR
2 from dba_tables t,dba_indexes i
3 where t.table_name=i.table_name
4 and t.owner='SCOTT'
5 and t.table_name='EMP2';
TABLE_NAME NUM_ROWS BLOCKS AVG_ROW_LEN INDEX_NAME CLUSTERING_FACTOR
---------- ---------- ---------- ----------- --------------- -----------------
EMP2 1140 95 40 EMP2_EMPNO_IND 95
SQL>
SQL> select * from emp2 where empno=7788;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
。
。
。
7788 SCOTT ANALYST 7566 2015-10-19 20:50:44 5000 20
7788 SCOTT ANALYST 7566 2015-10-19 20:50:44 5000 20
7788 SCOTT ANALYST 7566 2015-10-19 20:50:44 5000 20
7788 SCOTT ANALYST 7566 2015-10-19 20:50:44 5000 20
已选择1010行。
执行计划
----------------------------------------------------------
Plan hash value: 1554879445
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 81 | 2592 | 8 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMP2 | 81 | 2592 | 8 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | EMP2_EMPNO_IND | 81 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMPNO"=7788)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
215 consistent gets
0 physical reads
0 redo size
51965 bytes sent via SQL*Net to client
1153 bytes received via SQL*Net from client
69 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1010 rows processed
SQL>
可以看出集簇因子数与块数大小接近,但是没有考虑返回数量的大小问题,执行过程中仍是索引查询
3、建柱状图
SQL> analyze table emp2 estimate statistics for columns empno;
表已分析。
4、查看集簇因子和执行计划
SQL> select t.table_name,t.NUM_ROWS,t.BLOCKS,t.AVG_ROW_LEN,i.index_name,i.CLUSTERING_FACTOR
2 from dba_tables t,dba_indexes i
3 where t.table_name=i.table_name
4 and t.owner='SCOTT'
5 and t.table_name='EMP2';
TABLE_NAME NUM_ROWS BLOCKS AVG_ROW_LEN INDEX_NAME CLUSTERING_FACTOR
---------- ---------- ---------- ----------- --------------- -----------------
EMP2 1140 95 40 EMP2_EMPNO_IND 95
SQL>
SQL> select * from emp2 where empno=7788;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
。
。
。
7788 SCOTT ANALYST 7566 2015-10-19 20:50:44 5000 20
已选择1010行。
执行计划
----------------------------------------------------------
Plan hash value: 2941272003
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1010 | 32320 | 27 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| EMP2 | 1010 | 32320 | 27 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("EMPNO"=7788)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
162 consistent gets
0 physical reads
0 redo size
14446 bytes sent via SQL*Net to client
1153 bytes received via SQL*Net from client
69 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1010 rows processed
SQL>
从执行计划中看出为全表扫描查询
SQL> select * from emp2 where empno=7369;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
7369 SMITH CLERK 7902 1980-12-17 00:00:00 800 20
7369 SMITH CLERK 7902 1980-12-17 00:00:00 800 20
7369 SMITH CLERK 7902 1980-12-17 00:00:00 800 20
7369 SMITH CLERK 7902 1980-12-17 00:00:00 800 20
7369 SMITH CLERK 7902 1980-12-17 00:00:00 800 20
7369 SMITH CLERK 7902 1980-12-17 00:00:00 800 20
7369 SMITH CLERK 7902 1980-12-17 00:00:00 800 20
7369 SMITH CLERK 7902 1980-12-17 00:00:00 800 20
7369 SMITH CLERK 7902 1980-12-17 00:00:00 800 20
7369 SMITH CLERK 7902 1980-12-17 00:00:00 800 20
已选择10行。
执行计划
----------------------------------------------------------
Plan hash value: 1554879445
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 320 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMP2 | 10 | 320 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | EMP2_EMPNO_IND | 10 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMPNO"=7369)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
1235 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
SQL>
当查询结果返回值数据值小时为索引查询
从以上可以看出索引查询的代价大于全表扫面的代价,原因是数据量太小,当数据量增大时会发现索引查询的代价是会小于全表扫描的