datax oracle到mysql数据抽取
环境配置(官方推荐):
JDK(1.8以上,推荐1.8)
Python(推荐Python2.6.X)
下载datax工具:http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
解压后就能使用。
目录结构如下:
[root@rancher1 datax]# pwd /datax [root@rancher1 datax]# ls -lh total 4.0K drwxr-xr-x 2 62265 users 59 Jul 8 17:42 bin drwxr-xr-x 2 62265 users 68 Oct 12 2019 conf drwxr-xr-x 2 62265 users 85 Jul 8 18:39 job drwxr-xr-x 2 62265 users 4.0K Oct 12 2019 lib drwxr-xr-x 3 root root 24 Jul 8 17:22 log drwxr-xr-x 3 root root 24 Jul 8 17:22 log_perf drwxr-xr-x 4 62265 users 34 Oct 12 2019 plugin drwxr-xr-x 2 62265 users 23 Oct 12 2019 script drwxr-xr-x 2 62265 users 24 Oct 12 2019 tmp
ORACLE服务器信息:
监听信息:
[oracle@exam ~]$ lsnrctl status LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 08-JUL-2020 19:45:54 Copyright (c) 1991, 2013, Oracle. All rights reserved. Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=exam)(PORT=1521))) STATUS of the LISTENER ------------------------ Alias LISTENER Version TNSLSNR for Linux: Version 11.2.0.4.0 - Production Start Date 08-JUL-2020 16:51:50 Uptime 0 days 2 hr. 54 min. 4 sec Trace Level off Security ON: Local OS Authentication SNMP OFF Listener Parameter File /u01/app/oracle/product/11.2.0/db_1/network/admin/listener.ora Listener Log File /u01/app/oracle/diag/tnslsnr/exam/listener/alert/log.xml Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=exam)(PORT=1521))) (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521))) Services Summary... Service "exam" has 1 instance(s). Instance "exam", status READY, has 1 handler(s) for this service... Service "examXDB" has 1 instance(s). Instance "exam", status READY, has 1 handler(s) for this service... The command completed successfully
表结构:
SQL> desc issue.db_hosts; Name Null? Type ----------------------------------------- -------- ---------------------------- DB_ID NUMBER(16) HOST_NAME VARCHAR2(256 CHAR) IP VARCHAR2(30 CHAR) RAC_ID NUMBER(16)
mysql服务器信息:
mysql 2569 1276 0 16:47 ? 00:00:11 /u01/mysql-5.7.27/bin/mysqld --basedir=/u01/mysql-5.7.27 --datadir=/u01/mysql-5.7.27/data --plugin-dir=/u01/mysql-5.7.27/lib/plugin --user=mysql --log-error=/u01/mysql-5.7.27/log/mysql_error.log --open-files-limit=65535 --pid-file=/u01/mysql-5.7.27/mysql.pid --socket=/u01/mysql-5.7.27/mysql.sock --port=3306
mysql> desc issue.db_hosts; +-----------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------+-------------+------+-----+---------+-------+ | db_id | bigint(20) | YES | | NULL | | | host_name | varchar(50) | YES | | NULL | | | ip | varchar(20) | YES | | NULL | | | rac_id | int(11) | YES | | NULL | | +-----------+-------------+------+-----+---------+-------+ 4 rows in set (0.00 sec)
配置数据抽取的json文件:
[root@rancher1 job]# cat db_host.json { "job": { "content": [ { "reader": { "name": "oraclereader", "parameter": { "username": "issue", "password": "issue", "column": [ "*" ], "connection": [ { "table": [ "db_hosts" ], "jdbcUrl": [ "jdbc:oracle:thin:@192.168.0.210:1521:exam" ] } ] } }, "writer": { "name": "mysqlwriter", "parameter": { "writeMode": "insert", "username": "root", "password": "123456", "column": [ "db_id","host_name","ip","rac_id" ], "preSql": [ "truncate table db_hosts" ], "connection": [ { "jdbcUrl": "jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8", "table": [ "db_hosts" ] } ] } } } ], "setting": { "speed": { "channel": 5 } } } }
执行抽取:
[root@rancher1 bin]# python datax.py ../job/db_host.json DataX (DATAX-OPENSOURCE-3.0), From Alibaba ! Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved. 2020-07-08 19:56:11.019 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl 2020-07-08 19:56:11.029 [main] INFO Engine - the machine info => osInfo: Oracle Corporation 1.8 25.161-b14 jvmInfo: Linux amd64 5.5.10-1.el7.elrepo.x86_64 cpu num: 2 totalPhysicalMemory: -0.00G freePhysicalMemory: -0.00G maxFileDescriptorCount: -1 currentOpenFileDescriptorCount: -1 GC Names [PS MarkSweep, PS Scavenge] MEMORY_NAME | allocation_size | init_size PS Eden Space | 256.00MB | 256.00MB Code Cache | 240.00MB | 2.44MB Compressed Class Space | 1,024.00MB | 0.00MB PS Survivor Space | 42.50MB | 42.50MB PS Old Gen | 683.00MB | 683.00MB Metaspace | -0.00MB | 0.00MB 2020-07-08 19:56:11.054 [main] INFO Engine - { "content":[ { "reader":{ "name":"oraclereader", "parameter":{ "column":[ "*" ], "connection":[ { "jdbcUrl":[ "jdbc:oracle:thin:@192.168.0.210:1521:exam" ], "table":[ "db_hosts" ] } ], "password":"*****", "username":"issue" } }, "writer":{ "name":"mysqlwriter", "parameter":{ "column":[ "db_id", "host_name", "ip", "rac_id" ], "connection":[ { "jdbcUrl":"jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8", "table":[ "db_hosts" ] } ], "password":"******", "preSql":[ "truncate table db_hosts" ], "username":"root", "writeMode":"insert" } } } ], "setting":{ "speed":{ "channel":5 } } } 2020-07-08 19:56:11.080 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null 2020-07-08 19:56:11.083 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0 2020-07-08 19:56:11.083 [main] INFO JobContainer - DataX jobContainer starts job. 2020-07-08 19:56:11.086 [main] INFO JobContainer - Set jobId = 0 2020-07-08 19:56:11.498 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:oracle:thin:@192.168.0.210:1521:exam. 2020-07-08 19:56:11.500 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改. 2020-07-08 19:56:11.816 [job-0] INFO OriginalConfPretreatmentUtil - table:[db_hosts] all columns:[ db_id,host_name,ip,rac_id ]. 2020-07-08 19:56:11.833 [job-0] INFO OriginalConfPretreatmentUtil - Write data [ insert INTO %s (db_id,host_name,ip,rac_id) VALUES(?,?,?,?) ], which jdbcUrl like:[jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true] 2020-07-08 19:56:11.833 [job-0] INFO JobContainer - jobContainer starts to do prepare ... 2020-07-08 19:56:11.834 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] do prepare work . 2020-07-08 19:56:11.835 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work . 2020-07-08 19:56:11.847 [job-0] INFO CommonRdbmsWriter$Job - Begin to execute preSqls:[truncate table db_hosts]. context info:jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true. 2020-07-08 19:56:11.857 [job-0] INFO JobContainer - jobContainer starts to do split ... 2020-07-08 19:56:11.858 [job-0] INFO JobContainer - Job set Channel-Number to 5 channels. 2020-07-08 19:56:11.865 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] splits to [1] tasks. 2020-07-08 19:56:11.865 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks. 2020-07-08 19:56:11.887 [job-0] INFO JobContainer - jobContainer starts to do schedule ... 2020-07-08 19:56:11.891 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups. 2020-07-08 19:56:11.895 [job-0] INFO JobContainer - Running by standalone Mode. 2020-07-08 19:56:11.909 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks. 2020-07-08 19:56:11.923 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated. 2020-07-08 19:56:11.923 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated. 2020-07-08 19:56:11.939 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started 2020-07-08 19:56:11.947 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select * from db_hosts ] jdbcUrl:[jdbc:oracle:thin:@192.168.0.210:1521:exam]. 2020-07-08 19:56:12.117 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select * from db_hosts ] jdbcUrl:[jdbc:oracle:thin:@192.168.0.210:1521:exam]. 2020-07-08 19:56:12.376 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[442]ms 2020-07-08 19:56:12.377 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks. 2020-07-08 19:56:21.924 [job-0] INFO StandAloneJobContainerCommunicator - Total 667 records, 21266 bytes | Speed 2.08KB/s, 66 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.003s | All Task WaitReaderTime 0.140s | Percentage 100.00% 2020-07-08 19:56:21.925 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks. 2020-07-08 19:56:21.925 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work. 2020-07-08 19:56:21.925 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] do post work. 2020-07-08 19:56:21.926 [job-0] INFO JobContainer - DataX jobId [0] completed successfully. 2020-07-08 19:56:21.927 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /datax/hook 2020-07-08 19:56:21.929 [job-0] INFO JobContainer - [total cpu info] => averageCpu | maxDeltaCpu | minDeltaCpu -1.00% | -1.00% | -1.00% [total gc info] => NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime PS MarkSweep | 1 | 1 | 1 | 0.054s | 0.054s | 0.054s PS Scavenge | 1 | 1 | 1 | 0.032s | 0.032s | 0.032s 2020-07-08 19:56:21.930 [job-0] INFO JobContainer - PerfTrace not enable! 2020-07-08 19:56:21.931 [job-0] INFO StandAloneJobContainerCommunicator - Total 667 records, 21266 bytes | Speed 2.08KB/s, 66 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.003s | All Task WaitReaderTime 0.140s | Percentage 100.00% 2020-07-08 19:56:21.932 [job-0] INFO JobContainer - 任务启动时刻 : 2020-07-08 19:56:11 任务结束时刻 : 2020-07-08 19:56:21 任务总计耗时 : 10s 任务平均流量 : 2.08KB/s 记录写入速度 : 66rec/s 读出记录总数 : 667 读写失败总数 : 0 [root@rancher1 bin]#
查看结果:
mysql> select * from issue.db_hosts; +------------+------------------------+----------------+--------+ | db_id | host_name | ip | rac_id | +------------+------------------------+----------------+--------+ | 500011196 | dxxxxxxxxx3 | 10.xxx.xxx.174 | 28 | | 500014437 | dxxxx | 10.xxx.xxx.30 | 10 | | 500014437 | dxxxx4 | 10.1xx.xxx.33 | 10 | | 500014437 | dxxxx | 10.xxx.xxx.39 | 10 | | 500014437 | dbxx | 10.xxx.xxx.38 | 10 | | 500014437 | dbxx | 10.xxx.xxx.37 | 10 |
感觉配置还是比较简单的,抽取大量数据库没有测试不晓得性能咋样;
大批量表数据迁移时候需要些脚本生成对应的json(配置json文件很容易出错),具体每个类型数据库的json写法参考https://github.com/alibaba/DataX
oracleread:https://github.com/alibaba/DataX/blob/master/oraclereader/doc
mysqlwrite:https://github.com/alibaba/DataX/tree/master/mysqlwriter/doc
表结构可以借助powerdesign工具实行转换,个别表单独修改;
kettle工具也能实现异构数据库的数据同步,提供操作界面拖一拖就能完成(没有实际的项目操作过,性能啥的没对比过)。
java实现oracle数据库json配置文件拼接
package com.product; import org.springframework.context.ApplicationContext; import org.springframework.context.support.ClassPathXmlApplicationContext; import org.springframework.jdbc.core.JdbcTemplate; import org.springframework.jdbc.core.RowMapper; import org.json.JSONObject; import java.sql.ResultSet; import java.sql.SQLException; import java.util.ArrayList; import java.util.LinkedList; import java.util.List; import java.util.Map; public class Product_json { public static void main(String[] args) throws SQLException { String xmlpath = "bean.xml"; ApplicationContext context = new ClassPathXmlApplicationContext(xmlpath); JdbcTemplate query = (JdbcTemplate) context.getBean("jdbctemplate"); String sql="select table_name from user_tables where rownum=1"; List<String> table_name = query.query(sql, new RowMapper<String>(){ public String mapRow(ResultSet rs, int rowNum) throws SQLException { return rs.getString(1); } }); sql ="select COLUMN_NAME from user_tab_cols where table_name=?"; for(int j=0;j<table_name.size();j++) { List<String> column_name = query.query(sql, new RowMapper<String>(){ public String mapRow(ResultSet rs, int rowNum) throws SQLException { return rs.getString(1); } },table_name.get(j)); //reader数据库连接信息 List<JSONObject> list_re_con = new ArrayList<>(); JSONObject json_re_con = new JSONObject(); String[] re_table = new String[] { table_name.get(j)}; String[] re_jdbcUrl = new String[] { "jdbc:oracle:thin:@192.168.0.210:1521:exam"}; json_re_con.put("jdbcUrl",re_jdbcUrl); json_re_con.put("table",re_table); list_re_con.add(json_re_con); System.out.println(json_re_con); //write 数据库连接信息 List<JSONObject> list_wr_con = new ArrayList<>(); JSONObject json_wr_con = new JSONObject(); String[] wr_table = new String[] { table_name.get(j)}; String[] wr_jdbcUrl = new String[] { "jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8"}; json_wr_con.put("jdbcUrl",wr_jdbcUrl); json_wr_con.put("table",wr_table); list_wr_con.add(json_wr_con); System.out.println(json_wr_con); List<String> list_col = new ArrayList<String>(); for(int k=0;k<column_name.size();k++){ list_col.add(column_name.get(k)); } //reader para JSONObject json_re_para = new JSONObject(); json_re_para.put("username","scott"); json_re_para.put("password","tiger"); json_re_para.put("column",list_col); json_re_para.put("connection",list_re_con); System.out.println(json_re_para); //writer para JSONObject json_wr_para = new JSONObject(); json_wr_para.put("writeMode","insert"); json_wr_para.put("username","root"); json_wr_para.put("password","123456"); json_wr_para.put("column",list_col); json_wr_para.put("preSql", new String[] { "truncate table "+table_name.get(j)}); json_wr_para.put("connection",list_wr_con); System.out.println(json_wr_para); JSONObject json_re = new JSONObject(); json_re.put("name","oraclereader"); json_re.put("parameter",json_re_para); JSONObject json_wr = new JSONObject(); json_wr.put("name","mysqlwriter"); json_wr.put("parameter",json_wr_para); List<JSONObject> a = new ArrayList<JSONObject>(); JSONObject json_content = new JSONObject(); json_content.put("reader",json_re); json_content.put("writer",json_wr); a.add(json_content); JSONObject json_job = new JSONObject(); json_job.put("content",a); System.out.println(json_job); JSONObject json = new JSONObject(); json.put("job",json_job); System.out.println(json); } } }
信息: Initializing c3p0 pool... com.mchange.v2.c3p0.ComboPooledDataSource [ acquireIncrement -> 3, acquireRetryAttempts -> 3, acquireRetryDelay -> 1000, autoCommitOnClose -> false, automaticTestTable -> null, breakAfterAcquireFailure -> false, checkoutTimeout -> 10000, connectionCustomizerClassName -> null, connectionTesterClassName -> com.mchange.v2.c3p0.impl.DefaultConnectionTester, contextClassLoaderSource -> caller, dataSourceName -> 5kk2ywabz5no2a1174n2i|33723e30, debugUnreturnedConnectionStackTraces -> false, description -> null, driverClass -> oracle.jdbc.driver.OracleDriver, extensions -> {}, factoryClassLocation -> null, forceIgnoreUnresolvedTransactions -> false, forceSynchronousCheckins -> false, forceUseNamedDriverClass -> false, identityToken -> 5kk2ywabz5no2a1174n2i|33723e30, idleConnectionTestPeriod -> 30, initialPoolSize -> 3, jdbcUrl -> jdbc:oracle:thin:@192.168.0.15:1521/orcl, maxAdministrativeTaskTime -> 0, maxConnectionAge -> 0, maxIdleTime -> 0, maxIdleTimeExcessConnections -> 0, maxPoolSize -> 8, maxStatements -> 0, maxStatementsPerConnection -> 0, minPoolSize -> 3, numHelperThreads -> 3, preferredTestQuery -> null, privilegeSpawnedThreads -> false, properties -> {user=******, password=******}, propertyCycle -> 0, statementCacheNumDeferredCloseThreads -> 0, testConnectionOnCheckin -> true, testConnectionOnCheckout -> false, unreturnedConnectionTimeout -> 0, userOverrides -> {}, usesTraditionalReflectiveProxies -> false ] {"jdbcUrl":["jdbc:oracle:thin:@192.168.0.210:1521:exam"],"table":["DEPT"]} {"jdbcUrl":["jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8"],"table":["DEPT"]} {"password":"tiger","column":["LOC","DNAME","DEPTNO"],"connection":[{"jdbcUrl":["jdbc:oracle:thin:@192.168.0.210:1521:exam"],"table":["DEPT"]}],"username":"scott"} {"password":"123456","column":["LOC","DNAME","DEPTNO"],"connection":[{"jdbcUrl":["jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8"],"table":["DEPT"]}],"writeMode":"insert","username":"root","preSql":["truncate table DEPT"]} {"content":[{"reader":{"parameter":{"password":"tiger","column":["LOC","DNAME","DEPTNO"],"connection":[{"jdbcUrl":["jdbc:oracle:thin:@192.168.0.210:1521:exam"],"table":["DEPT"]}],"username":"scott"},"name":"oraclereader"},"writer":{"parameter":{"password":"123456","column":["LOC","DNAME","DEPTNO"],"connection":[{"jdbcUrl":["jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8"],"table":["DEPT"]}],"writeMode":"insert","username":"root","preSql":["truncate table DEPT"]},"name":"mysqlwriter"}}]} {"job":{"content":[{"reader":{"parameter":{"password":"tiger","column":["LOC","DNAME","DEPTNO"],"connection":[{"jdbcUrl":["jdbc:oracle:thin:@192.168.0.210:1521:exam"],"table":["DEPT"]}],"username":"scott"},"name":"oraclereader"},"writer":{"parameter":{"password":"123456","column":["LOC","DNAME","DEPTNO"],"connection":[{"jdbcUrl":["jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8"],"table":["DEPT"]}],"writeMode":"insert","username":"root","preSql":["truncate table DEPT"]},"name":"mysqlwriter"}}]}}
实现有点土。