转载自love wife & love life —Roger 提供oracle技术支持服务
本文链接地址: intra blcok chain
最近在一个优化项目中,通过awr报告发现table fetch continued row 指标很高,怀疑是行迁移/链接比较严重。
后来经过沟通发现,原来客户的数据库中存在几个table,其column 数目超过255. 针对超过255列的行数据.如下:
|
Statistic Total per Second per Trans dirty buffers inspected 7,532 4.17 0.25 free buffer inspected 17,409,018 9,633.69 575.56 free buffer requested 17,115,682 9,471.36 565.86 table fetch by rowid 98,848,588 54,700.18 3,268.05 table fetch continued row 97,797,107 54,118.32 3,233.28 |
oracle会将其每行数据都分成2个row piece. 实际上如果超过510个列,那么会被分成3个row piece存放在同一个 block中。 10gR2的官方文档是这样描述的:
|
"When a table has more than 255 columns, rows that have data after the 255th column are likely to be chained within the same block. This is called intra-block chaining. A chained row's pieces are chained together using the rowids of the pieces. With intra-block chaining, users receive all the data in the same block. If the row fits in the block, users do not see an effect in I/O performance, because no extra I/O operation is required to retrieve the rest of the row." |
创建超过255列的测试表:
----tab1 256 columns SQL> declare
2 v_sql varchar2(32767) ;
3 begin
4 v_sql := 'create table t_chain1 ( ' ;
5 for i in 1..256 loop
6 v_sql := v_sql || 'id' ||i|| ' number,' ;
7 end loop ;
8 v_sql := rtrim(v_sql, ',' ) || ')' ;
9 execute immediate v_sql;
10 end ;
11 / PL/SQL procedure successfully completed. SQL> select count (1) from user_tab_columns where table_name= 'T_CHAIN1' ;
COUNT (1) ----------
256 SQL> create sequence t_chain_seq
2 minvalue 1
3 nomaxvalue
4 start with 1
5 increment by 1
6 nocycle
7 cache 10000; Sequence created. SQL> SQL> insert into t_chain1(id256) values (t_chain_seq.NEXTVAL); 1 row created. SQL> / 1 row created. SQL> / 1 row created. SQL> commit ; Commit complete. SQL> select count (1) from t_chain1;
COUNT (1) ----------
3 SQL> |
再创建一个255列的测试表进行对比:
----tab2 255 columns SQL> declare
2 v_sql varchar2(32767) ;
3 begin
4 v_sql := 'create table t_chain2 ( ' ;
5 for i in 1..255 loop
6 v_sql := v_sql || 'id' ||i|| ' number,' ;
7 end loop ;
8 v_sql := rtrim(v_sql, ',' ) || ')' ;
9 execute immediate v_sql;
10 end ;
11 / PL/SQL procedure successfully completed. SQL> select count (1) from user_tab_columns where table_name= 'T_CHAIN2' ;
COUNT (1) ----------
255 SQL> create sequence t_chain_seq2
2 minvalue 1
3 nomaxvalue
4 start with 1
5 increment by 1
6 nocycle
7 cache 10000; Sequence created. SQL> insert into t_chain2(id255) values (t_chain_seq2.NEXTVAL); 1 row created. SQL> / 1 row created. SQL> / 1 row created. SQL> commit ; Commit complete. SQL> select count (1) from t_chain2;
COUNT (1) ----------
3 |
通过前面的测试表,我们来对比观察下这2个表的block信息的差别:
----使用dbms_rowid定位到block 号 SQL> select dbms_rowid.rowid_relative_fno(t.rowid) as "file#" ,
2 dbms_rowid.rowid_block_number(t.rowid) as "block#" from t_chain1 t ;
file# block# ---------- ----------
2 73688
2 73688
2 73688 SQL> select dbms_rowid.rowid_relative_fno(t.rowid) as "file#" ,
2 dbms_rowid.rowid_block_number(t.rowid) as "block#" from t_chain2 t ;
file# block# ---------- ----------
2 73696
2 73696
2 73696 |
我们先来看下这2种情况下,block内信息的差异.
---tab1 256 column dump data_block_dump,data header at 0xd0da87c =============== tsiz: 0x1f80 hsiz: 0x1e pbl: 0x0d0da87c bdba: 0x00811fd8
76543210 flag=-------- ntab=1 nrow=6 ---注意这里是6(实际上测试表的数据只有3条) frre=-1 fsbo=0x1e fseo=0x1c56 avsp=0x1c38 tosp=0x1c38 0xe:pti[0] nrow=6 offs=0 0x12:pri[0] offs=0x1e7c 0x14:pri[1] offs=0x1e72 0x16:pri[2] offs=0x1d6e 0x18:pri[3] offs=0x1d64 0x1a:pri[4] offs=0x1c60 0x1c:pri[5] offs=0x1c56 block_row_dump: tab 0, row 0, @0x1e7c 我们可以看到,oracle这样讲第1行数据分成了2个row piece.这里是第一个row piece的offset tl: 260 fb: -----L-- lb: 0x1 cc: 255 col 0: *NULL* col 1: *NULL* col 2: *NULL* col 3: *NULL* ....... col 253: *NULL* col 254: [ 2] c1 02 从这里我们可以看到,oracle将有数据的列存放到第一个row piece了. tab 0, row 1, @0x1e72 这里是第一行数据的第2个row piece. tl: 10 fb: --H-F--- lb: 0x1 cc: 1 nrid: 0x00811fd8.0 这里的nrid相当于rdba,转换后即为我们的file 2 block 73688地址. col 0: *NULL* tab 0, row 2, @0x1d6e tl: 260 fb: -----L-- lb: 0x1 cc: 255 col 0: *NULL* col 1: *NULL* col 2: *NULL* col 3: *NULL* ...... col 252: *NULL* col 253: *NULL* col 254: [ 2] c1 03 tab 0, row 3, @0x1d64 tl: 10 fb: --H-F--- lb: 0x1 cc: 1 nrid: 0x00811fd8.2 col 0: *NULL* tab 0, row 4, @0x1c60 tl: 260 fb: -----L-- lb: 0x1 cc: 255 col 0: *NULL* col 1: *NULL* col 2: *NULL* col 3: *NULL* ...... col 252: *NULL* col 253: *NULL* col 254: [ 2] c1 04 tab 0, row 5, @0x1c56 tl: 10 fb: --H-F--- lb: 0x1 cc: 1 nrid: 0x00811fd8.4 col 0: *NULL* end_of_block_dump |
对于超过255列的表,oracle会将其在block内的一行数据分成2个row piece来存放。但是会将有数据的列存放在 第一个row piece中,无数的列存放的在第2个row piece中.
如下是tab2 255 columns的dump:
data_block_dump,data header at 0xd0da864 =============== tsiz: 0x1f98 hsiz: 0x18 pbl: 0x0d0da864 bdba: 0x00811fe0
76543210 flag=-------- ntab=1 nrow=3 frre=-1 fsbo=0x18 fseo=0x1c8c avsp=0x1c74 tosp=0x1c74 0xe:pti[0] nrow=3 offs=0 0x12:pri[0] offs=0x1e94 0x14:pri[1] offs=0x1d90 0x16:pri[2] offs=0x1c8c block_row_dump: tab 0, row 0, @0x1e94 tl: 260 fb: --H-FL-- lb: 0x1 cc: 255 col 0: *NULL* col 1: *NULL* col 2: *NULL* ...... col 253: *NULL* col 254: [ 2] c1 02 tab 0, row 1, @0x1d90 tl: 260 fb: --H-FL-- lb: 0x1 cc: 255 col 0: *NULL* col 1: *NULL* col 2: *NULL* ....... col 253: *NULL* col 254: [ 2] c1 03 tab 0, row 2, @0x1c8c tl: 260 fb: --H-FL-- lb: 0x1 cc: 255 col 0: *NULL* col 1: *NULL* col 2: *NULL* col 3: *NULL* ........ col 253: *NULL* col 254: [ 2] c1 04 end_of_block_dump |
对于未超过255列的表.没有什么特别的,正是大家平时所看到的这样.
这里给大家补充一点,为什么oracle这里最大允许255个列呢 ? 如下:
BBED> x /rnnnnnnnnnnnnnnnnnnnn rowdata[6686] @7924 ------------- flag@7924: 0x04 (KDRHFL) lock@7925: 0x00 cols@7926: 255 ....... BBED> d /v offset 7924 count 10
File: /home/ora10g/oradata/roger01.dbf (2)
Block: 73688 Offsets: 7924 to 7933 Dba:0x00811fd8 -------------------------------------------------------
d80000ff 0401ffff ffff l ........
<16 bytes per line> |
可以看到oracle用了一个byte来存放column 的count值。简称cc.
我们知道,1个byte等于8个bit. 一个bit最大表示的数目是power(2,1).以此类推,那么一个byte 所能表示的最大数目都是power(2,8),即使256. 所以oracle这里一个row piece最大允许255个column.
超过255列即为分到另外一个row piece中. 虽然数据被划分到了其他的row piece,然而,一行数据仍然是存在同一个block中. 同时,每行都存在一个rowid.
SQL> select rowid from t_chain1; ROWID ------------------ AAAPJfAACAAAR/YAAB AAAPJfAACAAAR/YAAD AAAPJfAACAAAR/YAAF SQL> select dbms_rowid.rowid_object(rowid) obj#,
2 dbms_rowid.rowid_relative_fno(rowid) rfile#,
3 dbms_rowid.rowid_block_number(rowid) block#,
4 dbms_rowid.rowid_row_number(rowid) row#
5 from t_chain1 ;
OBJ# RFILE# BLOCK# ROW# ---------- ---------- ---------- ----------
62047 2 73688 1
62047 2 73688 3
62047 2 73688 5 |
我们可以看到,虽然该block存放了6行数据(3条数据,6个row piece),然而其rowid只有3个. 从这点也可以看出,其实虽然一条数据 被分成2个row piece,然而其rowid却只有一个.所有如果是所有通过rowid访问,那么是可以指定返回整条数据的,不需要产生额外的IO。 那么,针对这样的行内迁移(intra block chain),到底是否会产生多余的IO消耗呢 ?我们通过创建通过如下的SQL来简单测试下:
+++++++首先测试不存在行迁移的表,观察逻辑读消耗
SQL> conn roger/roger Connected. SQL> select a. NAME , b.VALUE
2 from v$statname a, v$mystat b
3 where a.STATISTIC# = b.STATISTIC#
4 and lower (a. NAME ) = 'table fetch continued row' ; NAME VALUE ---------------------------------------------------------------- ---------- table fetch continued row 0 SQL> alter system flush shared_pool; System altered. SQL> alter system flush BUFFER_CACHE; System altered. SQL> set autot on SQL> set lines 200 SQL> select count (1) from t_chain2 where id255=2;
COUNT (1) ----------
1 Execution Plan ---------------------------------------------------------- Plan hash value: 1667017148 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | TABLE ACCESS FULL | T_CHAIN2 | 1 | 3 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): ---------------------------------------------------
2 - filter( "ID255" =2) Statistics ----------------------------------------------------------
476 recursive calls
0 db block gets
41 consistent gets
17 physical reads
0 redo size
411 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to / from client
4 sorts (memory)
0 sorts (disk)
1 rows processed |
++++++++ 测试intra block chaining的表
|
SQL> select count (1) from t_chain1 where id256=2;
COUNT (1) ----------
1 SQL> select a. NAME , b.VALUE
2 from v$statname a, v$mystat b
3 where a.STATISTIC# = b.STATISTIC#
4 and lower (a. NAME ) = 'table fetch continued row' ; NAME VALUE ---------------------------------------------------------------- ---------- table fetch continued row 48 SQL> alter system flush buffer_cache; System altered. SQL> set autot on SQL> set autot off SQL> select a. NAME , b.VALUE
2 from v$statname a, v$mystat b
3 where a.STATISTIC# = b.STATISTIC#
4 and lower (a. NAME ) = 'table fetch continued row' ; NAME VALUE ---------------------------------------------------------------- ---------- table fetch continued row 48 SQL> set autot on SQL> select count (1) from t_chain1 where id256=2;
COUNT (1) ----------
1 Execution Plan ---------------------------------------------------------- Plan hash value: 432753352 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | TABLE ACCESS FULL | T_CHAIN1 | 1 | 3 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): ---------------------------------------------------
2 - filter( "ID256" =2) Statistics ----------------------------------------------------------
1 recursive calls
0 db block gets
10 consistent gets
6 physical reads
0 redo size
411 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to / from client
0 sorts (memory)
0 sorts (disk)
1 rows processed SQL> alter system flush buffer_cache; System altered. SQL> SQL> set autot off SQL> select a. NAME , b.VALUE
2 from v$statname a, v$mystat b
3 where a.STATISTIC# = b.STATISTIC#
4 and lower (a. NAME ) = 'table fetch continued row' ; NAME VALUE ---------------------------------------------------------------- ---------- table fetch continued row 63 SQL> alter system flush BUFFER_CACHE; System altered. SQL> set autot on SQL> select count (1) from t_chain1 where id256=2;
COUNT (1) ----------
1 Execution Plan ---------------------------------------------------------- Plan hash value: 432753352 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | TABLE ACCESS FULL | T_CHAIN1 | 1 | 3 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): ---------------------------------------------------
2 - filter( "ID256" =2) Statistics ----------------------------------------------------------
477 recursive calls
0 db block gets
38 consistent gets
18 physical reads
0 redo size
411 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to / from client
4 sorts (memory)
0 sorts (disk)
1 rows processed SQL> select a. NAME , b.VALUE
2 from v$statname a, v$mystat b
3 where a.STATISTIC# = b.STATISTIC#
4 and lower (a. NAME ) = 'table fetch continued row' ; NAME VALUE ---------------------------------------------------------------- ---------- table fetch continued row 87 |
我们可以看到虽然intra block chain,会由于对表的访问导致table fetch continued row 统计信息的增加, 然而通过测试我们可以发现,其本身并不会消耗额外的IO。
从我上面的2个测试来看:
t_chain1 256 列 t_chain2 255 列
这2个测试表均包含3条数据。 全表扫描的逻辑读几乎一致。
---------------------------------------------------------------
以上是roger的博文,我基本照做了一遍,没有问题,当中有一个概念
对于超过255列的表,oracle会将其在block内的一行数据分成2个row piece来存放。但是会将有数据的列存放在 第一个row piece中,无数的列存放的在第2个row piece中.
这个概念有误,oracle用逆向存储数据的方式,即最新的数据存在最前段,所以即使是行链,也不是这种情况。(老杨说)
导出块的方式 alter system dump datafile 2 block 73688
nrid: 0x00811fd8.0 是对应的rdba,那是怎么转变的呢?
这里先把nrid转换成数字
SQL> select to_number('00811fd8','xxxxxxxx') from dual;
TO_NUMBER('00811FD8','XXXXXXXX
------------------------------
8462296
执行dbms_utility.data_block_address_block(8462296)
dbms_utility.data_block_address_file(8462296)
得到相应的值
ps:我收藏这篇博文的目的是因为,我比较缺乏这种测试的想法,个人觉得这篇博文,知识点还有测试方法都不错,所以做一个记录