datax同步数据java简单用法
1. 到github下载源码,自己编译。同步数据支持mysql8.0,如果直接用编译好的会遇到各种问题。
https://github.com/alibaba/DataX/blob/master/userGuid.md
idea导入项目,需要先安装好jdk1.8+,python,maven3.x,mysql-connmaven依赖版本号修改:
把mysql-connector-java-5.1.34.jar 修改为mysql-connector-java-8.0.19.jar,这些模块都要修改
<mysql.driver.version>8.0.19</mysql.driver.version>
DataBaseType.java修改:com.mysql.jdbc.Driver改为:com.mysql.cj.jdbc.Driver
<mysql.driver.version>8.0.19</mysql.driver.version>
MySql("mysql", "com.mysql.cj.jdbc.Driver"),
Tddl("mysql", "com.mysql.cj.jdbc.Driver"),
DRDS("drds", "com.mysql.cj.jdbc.Driver"),
Oracle("oracle", "oracle.jdbc.OracleDriver"),
SQLServer("sqlserver", "com.microsoft.sqlserver.jdbc.SQLServerDriver"),
PostgreSQL("postgresql", "org.postgresql.Driver"),
RDBMS("rdbms", "com.alibaba.datax.plugin.rdbms.util.DataBaseType"),
DB2("db2", "com.ibm.db2.jcc.DB2Driver"),
ADS("ads","com.mysql.cj.jdbc.Driver"),
ClickHouse("clickhouse", "ru.yandex.clickhouse.ClickHouseDriver"),
KingbaseES("kingbasees", "com.kingbase8.Driver"),
Oscar("oscar", "com.oscar.Driver"),
OceanBase("oceanbase", "com.alipay.oceanbase.jdbc.Driver"),
StarRocks("starrocks", "com.mysql.cj.jdbc.Driver");
将连接属性中的zeroDateTimeBehavior=convertToNull改为zeroDateTimeBehavior=CONVERT_TO_NULL
suffix = "yearIsDateType=false&zeroDateTimeBehavior=CONVERT_TO_NULL&tinyInt1isBit=false&rewriteBatchedStatements=true";
重新编译:mvn -U clean package assembly:assembly -Dmaven.test.skip=true
等待编译完成,实际大概花了18分钟,每个人可能不同。最后在target目录下生产文件。
2. 新建java springboot项目
datax:刚才编译的项目
导入依赖。datax-common-0.0.1-SNAPSHOT.jar,datax-core-0.0.1-SNAPSHOT.jar
把这2个jar导入本地maven仓库中。在idea,maven命令行窗口执行下面命令:
install:install-file -Dfile=D:\work\temp\datax-common-0.0.1-SNAPSHOT.jar -DgroupId=com.datax.common -DartifactId=common -Dversion=0.0.1 -Dpackaging=jar
install:install-file -Dfile=D:\work\temp\datax-core-0.0.1-SNAPSHOT.jar -DgroupId=com.datax.core -DartifactId=core -Dversion=0.0.1 -Dpackaging=jar
pom.xml文件添加依赖:
<dependency> <groupId>com.datax.core</groupId> <artifactId>core</artifactId> <version>0.0.1</version> </dependency> <dependency> <groupId>com.datax.common</groupId> <artifactId>common</artifactId> <version>0.0.1</version> </dependency> <dependency> <groupId>commons-cli</groupId> <artifactId>commons-cli</artifactId> <version>1.4</version> </dependency> <dependency> <groupId>org.apache.httpcomponents</groupId> <artifactId>httpclient</artifactId> <version>4.5.13</version> </dependency> <dependency> <groupId>org.apache.commons</groupId> <artifactId>commons-io</artifactId> <version>1.3.2</version> </dependency> <dependency> <groupId>org.apache.commons</groupId> <artifactId>commons-lang3</artifactId> <version>3.12.0</version> </dependency> <dependency> <groupId>commons-lang</groupId> <artifactId>commons-lang</artifactId> <version>2.6</version> </dependency> <dependency> <groupId>com.alibaba</groupId> <artifactId>fastjson</artifactId> <version>1.2.60</version> </dependency>
新建test.json文件
{ "job": { "setting": { "speed": { "channel": 4 } }, "content": [ { "reader": { "name": "mysqlreader", "parameter": { "username": "root", "password": "123456", "connection": [ { "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test1?serverTimezone=UTC&characterEncoding=utf8&useSSL=true&autoReconnect=true&failOverReadOnly=false"], "querySql": ["select t.id,t.name,t.status from user_test t where t.id>=${id}"] } ] } }, "writer": { "name": "mysqlwriter", "parameter": { "username": "root", "password": "123456", "writeMode": "insert", "column": ["id","name","status"], "connection": [ { "table": [ "user_test" ], "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/test2?serverTimezone=UTC&characterEncoding=utf8&useSSL=true&autoReconnect=true&failOverReadOnly=false" } ] } } } ] } }
java调用代码:
@Override public void syncDataUser() { System.setProperty("datax.home","D:\\work\\project\\code2\\demo1106\\datax"); System.setProperty("id","5"); String path= commonUtils.getCurrentClasspath(); String[] datxArgs2 = {"-job", path+"/datax/test.json", "-mode", "standalone", "-jobid", "-1"}; try { Engine.entry(datxArgs2); } catch (Throwable e) { e.printStackTrace(); } }
datax.home:源码编译后的路径。不能错否则报错。
从mysql test1.user_test 条件:id>=5数据同步到test2.user_test
目标:
源库表:
执行结果:
19万+数据同步测试,只用10秒。速度非常快。
package com.example.demo.service.impl;
import com.baomidou.mybatisplus.extension.service.impl.ServiceImpl; import com.example.demo.dao.UserTestDao; import com.example.demo.entity.UserTestEntity; import com.example.demo.service.IUserTestService; import lombok.extern.slf4j.Slf4j; import org.springframework.stereotype.Service; import org.springframework.util.CollectionUtils; import org.springframework.util.StopWatch; import java.util.ArrayList; import java.util.List; /** * TODO your comment * * @author xiaozw * @date 2022/11/12 19:18 */ @Service @Slf4j public class UserTestServiceImpl extends ServiceImpl<UserTestDao, UserTestEntity> implements IUserTestService { @Override public void insertUserTest() { StopWatch sw = new StopWatch(); sw.start(); List<UserTestEntity> list=new ArrayList<>(); //插入10万数据测试; for(int i=1;i<100000;i++){ //创建对象 UserTestEntity sans = new UserTestEntity(); sans.setName("xiaozw"+i); sans.setStatus(0); list.add(sans); if(!CollectionUtils.isEmpty(list) && list.size()>10000){ log.info("list数量超过1万,开始入库。。。"); this.saveBatch(list,500); list.clear(); log.info("入库完成,list清空。。。"); }else if(i>900000){ this.saveBatch(list); list.clear(); } } sw.stop(); System.out.println("测试耗时(秒):"+sw.getTotalTimeSeconds()); } }
完整项目代码下载:
链接:https://pan.baidu.com/s/1Ooj-jvqpwa1MSaq0eCJY8g
提取码:pzpj
参考资料:
alibaba/DataX github
https://github.com/alibaba/DataX/blob/master/userGuid.md
java整合datax最详细的教程
https://blog.csdn.net/Ting1king/article/details/120871137
DataX实践趟坑大全
https://blog.csdn.net/superman_wang/article/details/118635976
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 阿里最新开源QwQ-32B,效果媲美deepseek-r1满血版,部署成本又又又降低了!
· AI编程工具终极对决:字节Trae VS Cursor,谁才是开发者新宠?
· 开源Multi-agent AI智能体框架aevatar.ai,欢迎大家贡献代码
· Manus重磅发布:全球首款通用AI代理技术深度解析与实战指南
· 被坑几百块钱后,我竟然真的恢复了删除的微信聊天记录!