OLAP 大表和小表并行hash join

一个表50MB 

一个表10GB 


50M表做驱动表,放在PGA里

这时候慢在对对 10g 的全表扫描


对10个G扫描块 需要开并行


我有这样一个算法 

一个进程 读 50mb 

8进程 来 扫描 10gb 

一个 进程扫描 1.25gb 


50MB 都分发到 8个进程

超大表和小表之间做HASH JOIN,一般会启用用并行,ORACLE在并行HASH JOIN的时候会用到很多技术,比如 HASH HASH, 或者BROADCAST

对于超大表和小表做HASH JOIN,一定要让小表进行广播(Broadcast),通常情况下CBO会选择正确,但是如果统计信息不准,或者基数计算错误CBO选择了 HASH HASH join,这个时

候就很慢,观察现象就是它在做direct path write temp,这个时候就可以用HINT PQ_DISTRIBUTE 进行调整

PQ_DISTRIBUTE(驱动表 None, Broadcast) 如果外层表很小(HASH_AJ),
这个时候可以用 PQ_DISTRIBUTE(驱动表 Broadcast,None)


下面就是一个具体的例子, F 是一个超大表 T 是一个小表 

SQL&get; explain plan for select /*+ parallel(f 8) parallel(t 8) use_hash(t,f) full(f) full(t) PQ_DISTRIBUTE(f HASH, HASH) */ *
  2    from crs_data_fct f
  3    JOIN crs_time_perd_fdim t ON t.TIME_PERD_ID = f.TIME_PERD_ID;

Explained.

Elapsed: 00:00:00.83
SQL&get; select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 353396990

---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                    |   239M|   696G|  8371  (20)| 00:01:58 |       |       |        |      |            |
|   1 |  PX COORDINATOR                 |                    |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)           | :TQ10002           |   239M|   696G|  8371  (20)| 00:01:58 |       |       |  Q1,02 | P-&get;S | QC (RAND)  |
|*  3 |    HASH JOIN BUFFERED           |                    |   239M|   696G|  8371  (20)| 00:01:58 |       |       |  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE                  |                    | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,02 | PCWP |            |
|   5 |      PX SEND HASH               | :TQ10000           | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,00 | P-&get;P | HASH       |
|   6 |       PX BLOCK ITERATOR         |                    | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS STORAGE FULL| CRS_TIME_PERD_FDIM | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
|   8 |     PX RECEIVE                  |                    |   239M|   673G|  8267  (19)| 00:01:56 |       |       |  Q1,02 | PCWP |            |
|   9 |      PX SEND HASH               | :TQ10001           |   239M|   673G|  8267  (19)| 00:01:56 |       |       |  Q1,01 | P-&get;P | HASH       |
|  10 |       PX PARTITION LIST ALL     |                    |   239M|   673G|  8267  (19)| 00:01:56 |     1 |   951 |  Q1,01 | PCWC |            |
|  11 |        TABLE ACCESS STORAGE FULL| CRS_DATA_FCT       |   239M|   673G|  8267  (19)| 00:01:56 |     1 |   951 |  Q1,01 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T"."TIME_PERD_ID"="F"."TIME_PERD_ID")

Note
-----
   - dynamic sampling used for this statement (level=6)

27 rows selected.

SQL&get; explain plan for select /*+ parallel(f,8) parallel(t 8) use_hash(t,f) full(f) full(t) PQ_DISTRIBUTE(t None, Broadcast) */ *
  2    from crs_data_fct f
  3    JOIN crs_time_perd_fdim t ON t.TIME_PERD_ID = f.TIME_PERD_ID;

Explained.

SQL&get; select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 271674260

---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                    |   239M|   696G|  8371  (20)| 00:01:58 |       |       |        |      |            |
|   1 |  PX COORDINATOR                 |                    |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)           | :TQ10001           |   239M|   696G|  8371  (20)| 00:01:58 |       |       |  Q1,01 | P-&get;S | QC (RAND)  |
|*  3 |    HASH JOIN                    |                    |   239M|   696G|  8371  (20)| 00:01:58 |       |       |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE                  |                    | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   5 |      PX SEND BROADCAST          | :TQ10000           | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,00 | P-&get;P | BROADCAST  |
|   6 |       PX BLOCK ITERATOR         |                    | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS STORAGE FULL| CRS_TIME_PERD_FDIM | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
|   8 |     PX PARTITION LIST ALL       |                    |   239M|   673G|  8267  (19)| 00:01:56 |     1 |   951 |  Q1,01 | PCWC |            |
|   9 |      TABLE ACCESS STORAGE FULL  | CRS_DATA_FCT       |   239M|   673G|  8267  (19)| 00:01:56 |     1 |   951 |  Q1,01 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T"."TIME_PERD_ID"="F"."TIME_PERD_ID")

Note
-----
   - dynamic sampling used for this statement (level=6)

25 rows selected.

hints:PQ_DISTRIBUTE(小表 None, Broadcast)




posted @ 2014-03-01 20:15  czcb  阅读(299)  评论(0编辑  收藏  举报