_partition_large_extents

Why you don't want to set _partition_large_extents = FALSE

I've seen some blogs recommending that _partition_large_extents be set to FALSE for a variety of space-conserving reasons, without the authors thinking about the negative impact this is going to have on Smart Scan. Large extents cause an INITIAL allocation of 8 MB and a NEXT allocation of 1 MB, and they have been the default for tablespaces on Exadata since 11.2.0.2. You can verify that large extents are in use by a given table or partition with:

SELECT segment_flags
  FROM sys_dba_segs
 WHERE segment_name = '<table_name>'
   AND owner = '<schema_name>';

The segment flag bit for large extents is 0x40000000.
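If you'd rather not decode the hex by eye, here is a minimal sketch that tests the bit with BITAND (the table and schema names are placeholders; 1073741824 is 0x40000000 in decimal):

SELECT segment_name,
       CASE WHEN BITAND(segment_flags, 1073741824) <> 0
            THEN 'LARGE EXTENTS'
            ELSE 'SMALL EXTENTS'
       END AS extent_style
  FROM sys_dba_segs
 WHERE segment_name = 'MY_TABLE'    -- placeholder: your table name
   AND owner        = 'MY_SCHEMA';  -- placeholder: your schema name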

This pattern of allocation is designed to work optimally with Smart Scan, because Smart Scan intrinsically works in 1 MB chunks. Reads of ASM allocation units are split into chunks of at most 1 MB to be passed to the filter processing library, which slices and dices their blocks to create the synthetic blocks that contain only the rows and columns of interest to return to the table scan driver. When less than 1 MB at a time gets allocated to a segment and the next contiguous blocks get allocated to a different segment, each separate run of blocks will be read by a different multi-block read (MBR). Each run is passed separately to Smart Scan, and we get sub-optimal chunks to work on, increasing both the overhead of processing and the number of round trips needed to process the table scan. The design of Smart Scan is predicated on scooping up contiguous runs of data from disk for efficient processing.

This matters particularly for HCC data and for chained rows.

Hybrid Columnar Compression 

We made two improvements to HCC row-based processing in 12.1.0.1. The first was the introduction of an Order By clause on index-access plans that guaranteed any Compression Unit would only be decompressed once. The second was 256 KB formatting of extents. Prior to 12c, a Fetch By Rowid would read the first block into the buffer cache in order to get the list of blocks in that Compression Unit, and then read the remaining blocks into the buffer cache using the exact read-ahead count. This guaranteed a minimum of two physical reads to retrieve one row. By formatting 256 KB at a time to the segment, we know a priori that we can issue the first read for all the blocks from the RDBA to the next 256 KB alignment, so the majority of Compress For Query reads can be done with a single physical read.
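To make the alignment arithmetic concrete (a worked example with illustrative numbers, assuming the minimum 8 KB block size): 256 KB is 32 blocks, so if the rowid points into block 100 of a data file, a single read covering blocks 100 through 127 (up to the next 32-block boundary) picks up the rest of the Compression Unit in one physical I/O.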

However, this formatting can only happen when large extents are used. You can verify that 256 KB formatting is being used with:

SELECT flag
  FROM GV$FILESPACE_USAGE
 WHERE tablespace_id = <tsnumber>
   AND rfno = <relative_fno>;

The flag bit for 256 KB formatting is 0x08. Note: 256 KB formatting requires a minimum 8 KB block size.
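Again, a minimal sketch that decodes the bit with BITAND (the tablespace number and relative file number are placeholders):

SELECT tablespace_id, rfno,
       CASE WHEN BITAND(flag, 8) <> 0
            THEN '256 KB FORMATTING'
            ELSE 'NOT 256 KB FORMATTED'
       END AS formatting
  FROM GV$FILESPACE_USAGE
 WHERE tablespace_id = 5   -- placeholder: your tablespace number
   AND rfno          = 1;  -- placeholder: your relative file number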

Chained Rows

When Smart Scan processes chained rows, all it can see is the 1 MB chunk it is given. It has no ability to initiate a disk I/O to retrieve chained row pieces that lie outside that 1 MB chunk (and even if it could, the chances that the required row piece is on the same cell are relatively low). We changed the direct load path in 11.2.0.3 to work with the natural 1 MB alignment of Smart Scan, so that direct-loaded rows would never cross a 1 MB boundary. However, as of 12.1.0.2 we have not yet changed Updates to keep all the row pieces inside a 1 MB chunk, nor have we taught Alter Table Shrink Space Compact to obey the 1 MB alignment rule.

Keeping each 1 MB chunk contiguous for the same segment maximizes the chances that, after non-bulk inserts and updates, Smart Scan is able to see all the row pieces needed to satisfy the column references and does not have to send the block back to the RDBMS for buffer cache chained row processing. Using 11.2.0.1-style 64 KB allocations maximizes the chances that Smart Scan will not be able to process those blocks. Just saying ...
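If you want a feel for how often this fallback is happening in your session, here is a sketch (assuming an Exadata-capable release; the exact statistic names can vary by version) that pulls the chained-row cell statistics:

SELECT n.name, s.value
  FROM v$statname n
  JOIN v$mystat s ON s.statistic# = n.statistic#
 WHERE n.name LIKE 'chained rows%cell';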

Conclusion

Getting Exadata for its table scan performance but then setting "_partition_large_extents" = FALSE is like buying a BMW M5 and insisting on filling it with 87 octane. The correct answer to conserving space on small tables is to over-ride the default by explicitly setting INITIAL and NEXT in the Create Table statement, and not by undoing this critical table scan optimization.
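For example, a minimal sketch (the table and sizes are illustrative) that overrides the default for one small lookup table while leaving large extents in place for everything else:

CREATE TABLE small_lookup (
  id    NUMBER,
  descr VARCHAR2(40)
) STORAGE (INITIAL 64K NEXT 64K);  -- per-table override, not a system-wide change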

Roger MacNicol
