Oracle Golden Gate Series 18 -- GG Many-to-One Real-Time Data Warehousing: Explanation and Example
1. Notes from the Official Documentation
A data warehousing configuration is a many-to-one configuration. Multiple source databases send data to one target warehouse database. Oracle GoldenGate supports like-to-like or heterogeneous transfer of data, with capabilities for filtering and conversion on any system in the configuration (support varies by database platform).
-- Multiple source DBs send data to one target warehouse DB.
1.1 Considerations for a data warehousing configuration
1.1.1 Isolation of data records
This configuration assumes that each source database contributes different records to the target system. If the same record exists in the same table on two or more source systems and can be changed on any of those systems, conflict resolution routines are needed to resolve conflicts when changes to that record are made on both sources at the same time and replicated to the target table.
The assumption here is that every record each source DB sends to the target DB is different; if the same record can come from more than one source, conflicts may arise on the target DB.
1.1.2 Data storage
You can divide the data storage between the source systems and the target system to reduce the need for massive amounts of disk space on the target system. This is accomplished by using a data pump on each source, rather than sending data directly from each Extract across the network to the target.
(1) A primary Extract writes to a local trail on each source.
(2) A data-pump Extract on each source reads the local trail and sends it across TCP/IP to a dedicated Replicat group.
1.1.3 Filtering and conversion
If not all of the data from a source system will be sent to the data warehouse, you can use the data pump to perform the filtering. This removes that processing overhead from the primary Extract group, and it reduces the amount of data that is sent across the network.
1.1.3.1 To filter data, you can use:
(1) A FILTER or WHERE clause in a TABLE statement (Extract) or in a MAP statement (Replicat) (see the sketch after this list)
(2) A SQL query or procedure
(3) User exits
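A minimal sketch of option (1), filtering in the data pump's TABLE statement; the id column comes from the dave.pdba demo table used later in this post, and the cutoff value is purely illustrative:
-- In the data pump parameter file: only pass rows whose id is below 100000
TABLE dave.pdba, FILTER (id < 100000);
-- The same restriction written as a WHERE clause instead of FILTER:
-- TABLE dave.pdba, WHERE (id < 100000);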
1.1.3.2 To transform data, you can use:
(1) Native Oracle GoldenGate conversion functions (see the sketch after this list)
(2) A user exit from the Extract or Replicat process that applies rules from an external transformation solution, then returns the manipulated data to Oracle GoldenGate.
(3) Replicat to deliver data directly to an ETL solution or other transformation engine.
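A minimal sketch of option (1) on the Replicat side, using a native conversion function inside a COLMAP clause; the target table dave.pdba_dw and its load_time column are hypothetical names for illustration:
-- In the Replicat parameter file: map matching columns by name and populate an
-- extra target column with the built-in @DATE conversion function
MAP dave.pdba, TARGET dave.pdba_dw,
COLMAP (USEDEFAULTS,
load_time = @DATE ("YYYY-MM-DD HH:MI:SS", "JTS", @GETENV ("JULIANTIMESTAMP")));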
1.1.4 Data volume
The standard configuration is sufficient if:
(1) The transaction load is consistent and of moderate volume that is spread out more or less evenly among all of the objects to be replicated.
(2) There are none of the following: tables that are subject to long-running transactions, tables that have a very large number of columns that change, or tables that contain columns for which Oracle GoldenGate must fetch from the database (generally columns with LOBs, columns that are affected by SQL procedures executed by Oracle GoldenGate, and columns that are not logged to the transaction log).
If your environment does not satisfy those conditions, consider adding one or more sets of parallel processes.
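One common way to add parallel processes on the delivery side is to split a busy table across two Replicat groups with the @RANGE function; a hedged sketch, reusing the dave.pdba demo table and its id column:
-- Parameter file of the first parallel Replicat group
MAP dave.pdba, TARGET dave.pdba, FILTER (@RANGE (1, 2, id));
-- Parameter file of the second parallel Replicat group
MAP dave.pdba, TARGET dave.pdba, FILTER (@RANGE (2, 2, id));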
1.2 Creating a data warehousing configuration
Figure 13 Configuration for data warehousing
1.2.1 Source systems
--To configure the Manager process
1. On each source, configure the Manager process.
2. In each Manager parameter file, use the PURGEOLDEXTRACTS parameter to control the purging of files from the trail on the local system.
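A minimal Manager parameter file sketch for a source system, assuming the local trail lives under ./dirdat; the retention values are illustrative:
PORT 7809
-- Purge local trail files that every reader has finished with, keeping them at least 2 days
PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 2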
--To configure the primary Extract groups
3. On each source, use the ADD EXTRACT command to create a primary Extract group. For documentation purposes, these groups are called ext_1 and ext_2.
Extract_1
ADD EXTRACT <ext_1>, TRANLOG, BEGIN <time> [, THREADS <n>]
Extract_2
ADD EXTRACT <ext_2>, TRANLOG, BEGIN <time> [, THREADS <n>]
Note: Use TRANLOG as the data source option. For DB2 on z/OS, specify the bootstrap data set (BSDS) name following TRANLOG.
4. On each source, use the ADD EXTTRAIL command to create a local trail.
Extract_1
ADD EXTTRAIL <local_trail_1>, EXTRACT <ext_1>
Extract_2
ADD EXTTRAIL <local_trail_2>, EXTRACT <ext_2>
Note: Use the EXTRACT argument to link each Extract group to the local trail on the same system. The primary Extract writes to this trail, and the data pump reads it.
5. On each source, use the EDIT PARAMS command to create a parameter file for the primary Extract. Include the following parameters plus any others that apply to your database environment.
Extract_1
-- Identify the Extract group:
EXTRACT <ext_1>
-- Specify database login information as needed for the database:
[SOURCEDB <dsn_1>,] [USERID <user>[, PASSWORD <pw>]]
-- Specify the local trail that this Extract writes to:
EXTTRAIL <local_trail_1>
-- Specify tables to be captured:
TABLE <owner>.<table>;
Extract_2
-- Identify the Extract group:
EXTRACT <ext_2>
-- Specify database login information as needed for the database:
[SOURCEDB <dsn_2>,] [USERID <user>[, PASSWORD <pw>]]
-- Specify the local trail that this Extract writes to:
EXTTRAIL <local_trail_2>
-- Specify tables to be captured:
TABLE <owner>.<table>;
--To configure the data pumps
6. On each source, use the ADD EXTRACT command to create a data pump Extract group. For documentation purposes, these pumps are called pump_1 and pump_2.
Data pump_1
ADD EXTRACT <pump_1>, EXTTRAILSOURCE <local_trail_1>, BEGIN <time>
Data pump_2
ADD EXTRACT <pump_2>, EXTTRAILSOURCE <local_trail_2>, BEGIN <time>
Note: Use EXTTRAILSOURCE as the data source option, and specify the name of the trail on the local system.
7. On each source, use the ADD RMTTRAIL command to create a remote trail on the target.
Source_1
ADD RMTTRAIL <remote_trail_1>, EXTRACT <pump_1>
Source_2
ADD RMTTRAIL <remote_trail_2>, EXTRACT <pump_2>
Note: Use the EXTRACT argument to link each remote trail to a different data pump. The data pump writes to this trail over TCP/IP, and a Replicat reads from it.
8. On each source, use the EDIT PARAMS command to create a parameter file for the data pump group. Include the following parameters plus any others that apply to your database environment.
Data pump_1
-- Identify the data pump group:
EXTRACT <pump_1>
-- Specify database login information as needed for the database:
[SOURCEDB <dsn_1>,] [USERID <user>[, PASSWORD <pw>]]
-- Specify the name or IP address of the target system:
RMTHOST <target>, MGRPORT <portnumber>
-- Specify the remote trail on the target system:
RMTTRAIL <remote_trail_1>
-- Allow mapping, filtering, conversion, or pass data through as-is:
[PASSTHRU | NOPASSTHRU]
-- Specify tables to be captured:
TABLE <owner>.<table>;
Data pump_2
-- Identify the data pump group:
EXTRACT <pump_2>
-- Specify database login information as needed for the database:
[SOURCEDB <dsn_2>,] [USERID <user>[, PASSWORD <pw>]]
-- Specify the name or IP address of the target system:
RMTHOST <target>, MGRPORT <portnumber>
-- Specify the remote trail on the target system:
RMTTRAIL <remote_trail_2>
-- Allow mapping, filtering, conversion, or pass data through as-is:
[PASSTHRU | NOPASSTHRU]
-- Specify tables to be captured:
TABLE <owner>.<table>;
Note: Use NOPASSTHRU if the data pump will be filtering or converting data, and also use the SOURCEDB and USERID parameters as appropriate for the database, to enable definitions lookups. If the data pump will not be filtering or converting data, use PASSTHRU to bypass the lookups.
NOTE: To use PASSTHRU mode, the names of the source and target objects must be identical. No column mapping, filtering, SQLEXEC functions, transformation, or other functions that require data manipulation can be specified in the parameter file. You can combine normal processing with pass-through processing by pairing PASSTHRU and NOPASSTHRU with different TABLE statements.
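A hedged sketch of pairing the two modes in one data pump parameter file; dave.pdba is the demo table used later in this post, while dave.orders is an illustrative name:
-- Replicated as-is, no definitions lookups needed
PASSTHRU
TABLE dave.pdba;
-- The pump filters this table, so definitions lookups are required
NOPASSTHRU
TABLE dave.orders, FILTER (id < 100000);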
1.2.2 Target system
--To configure the Manager process
9. Configure the Manager process.
10. In the Manager parameter file, use the PURGEOLDEXTRACTS parameter to control the purging of files from the trail.
--To configure the Replicat groups
11. On the target, use the ADD REPLICAT command to create a Replicat group for each remote trail that you created. For documentation purposes, these groups are called rep_1 and rep_2.
Replicat_1
ADD REPLICAT <rep_1>, EXTTRAIL <remote_trail_1>, BEGIN <time>
Replicat_2
ADD REPLICAT <rep_2>, EXTTRAIL <remote_trail_2>, BEGIN <time>
Note: Use the EXTTRAIL argument to link the Replicat group to the trail.
Each data pump corresponds to one Replicat process.
12. On the target, use the EDIT PARAMS command to create a parameter file for each Replicat group. Include the following parameters plus any others that apply to your database environment.
Replicat_1
-- Identify the Replicat group:
REPLICAT <rep_1>
-- State whether or not source and target definitions are identical:
SOURCEDEFS <full_pathname> | ASSUMETARGETDEFS
-- Specify database login information as needed for the database:
[TARGETDB <dsn_3>,] [USERID <userid>[, PASSWORD <pw>]]
-- Specify error handling rules:
REPERROR (<error>, <response>)
-- Specify tables for delivery:
MAP <owner>.<table>, TARGET <owner>.<table>[, DEF <template name>];
Replicat_2
-- Identify the Replicat group:
REPLICAT <rep_2>
-- State whether or not source and target definitions are identical:
SOURCEDEFS <full_pathname> | ASSUMETARGETDEFS
-- Specify database login information as needed for the database:
[TARGETDB <dsn_3>,] [USERID <userid>[, PASSWORD <pw>]]
-- Specify error handling rules:
REPERROR (<error>, <response>)
-- Specify tables for delivery:
MAP <owner>.<table>, TARGET <owner>.<table>[, DEF <template name>];
Note: You can use any number of MAP statements for any given Replicat group. All MAP statements for a given Replicat group must specify the same objects that are contained in the trail that is linked to the group.
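For example, rep_1 could deliver several tables from remote_trail_1 with one MAP statement per table; a sketch, in which emp and dept are illustrative names that would have to be captured into that trail:
MAP dave.pdba, TARGET dave.pdba;
MAP dave.emp, TARGET dave.emp;
MAP dave.dept, TARGET dave.dept;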
2. Example
This is still the 3-node GG setup used earlier in this series; here nodes 2 and 3 send data to node 1.
DB: oracle 11.2.0.3
OS: redhat 5.4
2.1 Configure the Manager on all nodes
GGSCI (gg1) 2> edit params mgr
PORT 7809
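Only the port is set here. Following the PURGEOLDEXTRACTS recommendation from section 1.2, trail purging could also be added to each Manager; a sketch assuming this environment's trail directory:
PORT 7809
PURGEOLDEXTRACTS /u01/ggate/dirdat/*, USECHECKPOINTS, MINKEEPDAYS 2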
2.2 Add the Extract processes on GG2 and GG3
--GG2
GGSCI (gg2) 4> add extract ext1,tranlog,begin now
GGSCI (gg2) 5> add exttrail /u01/ggate/dirdat/lt, extract ext1
GGSCI (gg2) 7> view params ext1
extract ext1
userid ggate@gg2,password ggate
exttrail /u01/ggate/dirdat/lt
ddl include all objname dave.pdba;
table dave.pdba;
--GG3
GGSCI (gg3) 2> add extract ext1,tranlog,begin now
GGSCI (gg3) 4> add exttrail /u01/ggate/dirdat/lt, extract ext1
GGSCI (gg3) 6> view params ext1
extract ext1
userid ggate@gg3,password ggate
exttrail /u01/ggate/dirdat/lt
ddl include all objname dave.pdba;
table dave.pdba;
2.3 Add the Data Pump processes on GG2 and GG3
--GG2
GGSCI (gg2) 8> add extract dpump,exttrailsource /u01/ggate/dirdat/lt
GGSCI (gg2) 9> add rmttrail /u01/ggate/dirdat/d1, extract dpump
-- Note the target-side trail location specified here; we use the d1 prefix.
GGSCI (gg2) 11> view params dpump
extract dpump
userid ggate@gg2, password ggate
rmthost gg1, mgrport 7809, compress, compressthreshold 0
rmttrail /u01/ggate/dirdat/d1
passthru
table dave.pdba;
--GG3
GGSCI (gg3) 7> add extract dpump,exttrailsource /u01/ggate/dirdat/lt
GGSCI (gg3) 8> add rmttrail /u01/ggate/dirdat/d2, extract dpump
GGSCI (gg3) 10> view params dpump
extract dpump
userid ggate@gg3, password ggate
rmthost gg1, mgrport 7809, compress, compressthreshold 0
rmttrail /u01/ggate/dirdat/d2
passthru
table dave.pdba;
2.4 Configure two Replicat processes on GG1, one for each Data Pump process
-- Configure the checkpoint table
GGSCI (gg1) 3> view params ./GLOBALS
GGSCHEMA ggate
CHECKPOINTTABLE ggate.checkpoint
GGSCI (gg1) 5> dblogin userid ggate@gg1,password ggate
Successfully logged into database.
GGSCI (gg1) 7> add checkpointtable ggate.checkpoint
Successfully created checkpoint table GGATE.CHECKPOINT.
For this part, see:
Oracle Golden Gate Series 13 -- Configuring GG Process Checkpoints: Explanation
http://blog.csdn.net/tianlesoftware/article/details/6983928
-- Configure the two Replicats:
--rep1:
GGSCI (gg1) 8> add replicat rep1,exttrail /u01/ggate/dirdat/d1, checkpointtable ggate.checkpoint
REPLICAT added.
GGSCI (gg1) 11> view params rep1
replicat rep1
ASSUMETARGETDEFS
userid ggate@gg1,password ggate
discardfile /u01/ggate/dirdat/rep1_discard.txt, append, megabytes 10
--HANDLECOLLISIONS
ddl include all
ddlerror default ignore retryop
map dave.pdba, target dave.pdba;
--rep2:
GGSCI (gg1) 9> add replicat rep2,exttrail /u01/ggate/dirdat/d2, checkpointtable ggate.checkpoint
REPLICAT added.
GGSCI (gg1) 13> view params rep2
replicat rep2
ASSUMETARGETDEFS
userid ggate@gg1,password ggate
discardfile /u01/ggate/dirdat/rep2_discard.txt, append, megabytes 10
--HANDLECOLLISIONS
ddl include all
ddlerror default ignore retryop
map dave.pdba, target dave.pdba;
For the initial data load, see:
Oracle Golden Gate Series 9 -- GG Initial Data Load: Explanation and Example
http://blog.csdn.net/tianlesoftware/article/details/6976551
Oracle GoldenGate Series 12 -- GG Initial Data Load Part 2, SCN-Based Initialization: Explanation and Example
http://blog.csdn.net/tianlesoftware/article/details/6982908
2.5 Start the processes and test
--GG2:
GGSCI (gg2) 20> start ext1
Determine which trail file ext1 is writing to:
GGSCI (gg2) 25> view report ext1
…
2011-11-20 11:31:43  INFO     OGG-01056  Recovery initialization completed for target file /u01/ggate/dirdat/lt000018, at RBA 132350, CSN 1525929.
2011-11-20 11:31:43  INFO     OGG-01478  Output file /u01/ggate/dirdat/lt is using format RELEASE 10.4/11.1.
2011-11-20 11:31:43  WARNING  OGG-01438  Checkpoint marked as from graceful shutdown, but records found after checkpoint in trail /u01/ggate/dirdat/lt. Expected EOF Seqno 0, RBA 0. Found Seqno 18, RBA 132350.
2011-11-20 11:31:43  INFO     OGG-01026  Rolling over remote file /u01/ggate/dirdat/lt000018.
…
GGSCI (gg2) 24> info dpump
EXTRACT DPUMP Initialized 2011-11-20 11:11 Status STOPPED
Checkpoint Lag 00:00:00 (updated 00:20:16 ago)
Log Read Checkpoint File /u01/ggate/dirdat/lt000000
First Record RBA 0
GGSCI (gg2) 26> alter extract dpump,extseqno 18,extrba 0
EXTRACT altered.
GGSCI (gg2) 27> start dpump
Sending START request to MANAGER ...
EXTRACT DPUMP starting
GGSCI (gg2) 40> info all
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING DPUMP 00:00:00 00:02:05
EXTRACT RUNNING EXT1 00:00:00 00:00:02
REPLICAT ABENDED REP1 00:00:00 12:24:42
GGSCI (gg2) 41> info dpump
EXTRACT DPUMP Last Started 2011-11-20 11:37 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:02:24 ago)
Log Read Checkpoint File /u01/ggate/dirdat/lt000018
First Record RBA 0
Note here that a Data Pump by default starts reading from trail file lt000000. For a newly built GG replication everything starts from lt000000, so there is no problem; but if the data pump was added to an existing setup later, you must reposition it based on the error message in the Extract process report, using a command such as:
GGSCI (gg1) 82> alter extract dpump,extseqno 18,extrba 0
For details, see:
Oracle GoldenGate Series 10 -- Configuring the Data Pump Process: Explanation and Example
http://blog.csdn.net/tianlesoftware/article/details/6978501
--GG3:
GGSCI (gg3) 14> start ext1
Sending START request to MANAGER ...
EXTRACT EXT1 starting
GGSCI (gg3) 18> view report ext1
…
2011-11-20 11:39:50  INFO     OGG-01056  Recovery initialization completed for target file /u01/ggate/dirdat/lt000016, at RBA 132672, CSN 1525929.
2011-11-20 11:39:50  INFO     OGG-01478  Output file /u01/ggate/dirdat/lt is using format RELEASE 10.4/11.1.
2011-11-20 11:39:50  WARNING  OGG-01438  Checkpoint marked as from graceful shutdown, but records found after checkpoint in trail /u01/ggate/dirdat/lt. Expected EOF Seqno 0, RBA 0. Found Seqno 16, RBA 132672.
2011-11-20 11:39:50  INFO     OGG-01026  Rolling over remote file /u01/ggate/dirdat/lt000016.
…
GGSCI (gg3) 19> alter extract dpump,extseqno 16,extrba 0
EXTRACT altered.
GGSCI (gg3) 20> info dpump
EXTRACT DPUMP Initialized 2011-11-20 11:41 Status STOPPED
Checkpoint Lag 00:00:00 (updated 00:00:06 ago)
Log Read Checkpoint File /u01/ggate/dirdat/lt000016
First Record RBA 0
GGSCI (gg3) 21> start dpump
Sending START request to MANAGER ...
EXTRACT DPUMP starting
GGSCI (gg3) 22> info all
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING DPUMP 00:00:00 00:00:13
EXTRACT RUNNING EXT1 00:00:00 00:00:05
REPLICAT ABENDED REP1 00:00:00 12:28:42
--GG1:
GGSCI (gg1) 18> start rep1
Sending START request to MANAGER ...
REPLICAT REP1 starting
GGSCI (gg1) 21> start rep2
Sending START request to MANAGER ...
REPLICAT REP2 starting
GGSCI (gg1) 22> info all
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT ABENDED DPUMP 00:00:00 12:34:55
EXTRACT ABENDED DPUMP2 00:00:00 12:34:49
EXTRACT ABENDED EXT1 00:00:00 12:34:51
REPLICAT RUNNING REP1 00:00:00 00:17:37
REPLICAT RUNNING REP2 00:00:00 00:17:15
2.6 Verify the synchronization
-- Insert a row into the pdba table on GG2:
SQL> conn dave/dave;
Connected.
SQL> insert into pdba values(2222,sysdate);
1 row created.
SQL> commit;
Commit complete.
-- Insert a row into the pdba table on GG3:
SQL> conn dave/dave;
Connected.
SQL> insert into pdba values(3333,sysdate);
1 row created.
SQL> commit;
Commit complete.
-- Query for these rows in the pdba table on GG1:
SQL> select * from pdba where id in (2222,3333);
no rows selected
The rows were not replicated over. The most likely cause is that rep1 and rep2 are still positioned at the start of remote trail files d1000000 / d2000000, while the repositioned data pumps wrote the new records into the next trail sequence, so the Replicats also need to be repositioned, as shown below.
GGSCI (gg1) 86> info rep1
REPLICAT REP1 Last Started 2011-11-20 12:22 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:01 ago)
Log Read Checkpoint File /u01/ggate/dirdat/d1000000
First Record RBA 0
GGSCI (gg1) 87> stop rep1
Sending STOP request to REPLICAT REP1 ...
Request processed.
GGSCI (gg1) 88> stop rep2
Sending STOP request to REPLICAT REP2 ...
Request processed.
GGSCI (gg1) 89> alter replicat rep1 extseqno 1 extrba 0
REPLICAT altered.
GGSCI (gg1) 91> alter replicat rep2 extseqno 1 extrba 0
REPLICAT altered.
GGSCI (gg1) 93> start rep1
Sending START request to MANAGER ...
REPLICAT REP1 starting
GGSCI (gg1) 94> start rep2
Sending START request to MANAGER ...
REPLICAT REP2 starting
-- Final confirmation:
SQL> select * from pdba where id<4444;
ID TIME
---------- ------------
2222 20-NOV-11
3333 20-NOV-11
The many-to-one replication test succeeded.
Note two small tips here:
1. If the replication has a problem but the process report does not give enough information, you can use the Logdump tool to inspect the contents of the trails and pin down at which step the problem occurred (see the Logdump sketch after these tips). For Logdump, see:
Oracle Golden Gate Series 15 -- GG Trails Explained
http://blog.csdn.net/tianlesoftware/article/details/6990611
2. If the trails just cannot be sorted out, a simple workaround is to delete all the processes and all the trails and rebuild from scratch; this is often much faster than digging through the trail files.
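For tip 1, a minimal Logdump session sketch; the trail file name is an assumption based on this example's remote trail prefix:
Logdump 1> open /u01/ggate/dirdat/d1000001
Logdump 2> ghdr on
Logdump 3> detail data
Logdump 4> next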