datax同步数据java简单用法

1. 到github下载源码,自己编译。同步数据支持mysql8.0,如果直接用编译好的会遇到各种问题。

https://github.com/alibaba/DataX/blob/master/userGuid.md

idea导入项目,需要先安装好jdk1.8+,python,maven3.x,mysql-connmaven依赖版本号修改:

把mysql-connector-java-5.1.34.jar 修改为mysql-connector-java-8.0.19.jar,这些模块都要修改

  <mysql.driver.version>8.0.19</mysql.driver.version>

DataBaseType.java修改:com.mysql.jdbc.Driver改为:com.mysql.cj.jdbc.Driver

<mysql.driver.version>8.0.19</mysql.driver.version>

MySql("mysql", "com.mysql.cj.jdbc.Driver"),
    Tddl("mysql", "com.mysql.cj.jdbc.Driver"),
    DRDS("drds", "com.mysql.cj.jdbc.Driver"),
    Oracle("oracle", "oracle.jdbc.OracleDriver"),
    SQLServer("sqlserver", "com.microsoft.sqlserver.jdbc.SQLServerDriver"),
    PostgreSQL("postgresql", "org.postgresql.Driver"),
    RDBMS("rdbms", "com.alibaba.datax.plugin.rdbms.util.DataBaseType"),
    DB2("db2", "com.ibm.db2.jcc.DB2Driver"),
    ADS("ads","com.mysql.cj.jdbc.Driver"),
    ClickHouse("clickhouse", "ru.yandex.clickhouse.ClickHouseDriver"),
    KingbaseES("kingbasees", "com.kingbase8.Driver"),
    Oscar("oscar", "com.oscar.Driver"),
    OceanBase("oceanbase", "com.alipay.oceanbase.jdbc.Driver"),
    StarRocks("starrocks", "com.mysql.cj.jdbc.Driver");

将连接属性中的zeroDateTimeBehavior=convertToNull改为zeroDateTimeBehavior=CONVERT_TO_NULL 

suffix = "yearIsDateType=false&zeroDateTimeBehavior=CONVERT_TO_NULL&tinyInt1isBit=false&rewriteBatchedStatements=true";

  重新编译:mvn -U clean package assembly:assembly -Dmaven.test.skip=true

  等待编译完成,实际大概花了18分钟,每个人可能不同。最后在target目录下生产文件。

 

  2. 新建java springboot项目

  datax:刚才编译的项目

导入依赖。datax-common-0.0.1-SNAPSHOT.jar,datax-core-0.0.1-SNAPSHOT.jar

把这2个jar导入本地maven仓库中。在idea,maven命令行窗口执行下面命令:

install:install-file -Dfile=D:\work\temp\datax-common-0.0.1-SNAPSHOT.jar -DgroupId=com.datax.common -DartifactId=common -Dversion=0.0.1 -Dpackaging=jar
install:install-file -Dfile=D:\work\temp\datax-core-0.0.1-SNAPSHOT.jar -DgroupId=com.datax.core -DartifactId=core -Dversion=0.0.1 -Dpackaging=jar

  pom.xml文件添加依赖:

<dependency>
            <groupId>com.datax.core</groupId>
            <artifactId>core</artifactId>
            <version>0.0.1</version>
        </dependency>
        <dependency>
            <groupId>com.datax.common</groupId>
            <artifactId>common</artifactId>
            <version>0.0.1</version>
        </dependency>
        <dependency>
            <groupId>commons-cli</groupId>
            <artifactId>commons-cli</artifactId>
            <version>1.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.13</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-io</artifactId>
            <version>1.3.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.12.0</version>
        </dependency>
        <dependency>
            <groupId>commons-lang</groupId>
            <artifactId>commons-lang</artifactId>
            <version>2.6</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.60</version>
        </dependency>

新建test.json文件

{
  "job": {
    "setting": {
      "speed": {
        "channel": 4
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "123456",
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test1?serverTimezone=UTC&characterEncoding=utf8&useSSL=true&autoReconnect=true&failOverReadOnly=false"],
                "querySql": ["select t.id,t.name,t.status from user_test t where t.id>=${id}"]
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "root",
            "password": "123456",
            "writeMode": "insert",
            "column": ["id","name","status"],
            "connection": [
              {
                "table": [
                  "user_test"
                ],
                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/test2?serverTimezone=UTC&characterEncoding=utf8&useSSL=true&autoReconnect=true&failOverReadOnly=false"
              }
            ]
          }
        }
      }
    ]
  }
}

java调用代码:

@Override
    public void syncDataUser() {
        System.setProperty("datax.home","D:\\work\\project\\code2\\demo1106\\datax");
        System.setProperty("id","5");
        String path= commonUtils.getCurrentClasspath();
        String[] datxArgs2 = {"-job", path+"/datax/test.json", "-mode", "standalone", "-jobid", "-1"};
        try {
            Engine.entry(datxArgs2);
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }

datax.home:源码编译后的路径。不能错否则报错。

从mysql test1.user_test 条件:id>=5数据同步到test2.user_test

目标:

  源库表:

  执行结果:

19万+数据同步测试,只用10秒。速度非常快。

 

 package com.example.demo.service.impl;

import com.baomidou.mybatisplus.extension.service.impl.ServiceImpl;
import com.example.demo.dao.UserTestDao;
import com.example.demo.entity.UserTestEntity;
import com.example.demo.service.IUserTestService;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import org.springframework.util.CollectionUtils;
import org.springframework.util.StopWatch;

import java.util.ArrayList;
import java.util.List;

/**
 * TODO your comment
 *
 * @author xiaozw
 * @date 2022/11/12 19:18
 */
@Service
@Slf4j
public class UserTestServiceImpl extends ServiceImpl<UserTestDao, UserTestEntity> implements IUserTestService {

    @Override
    public void insertUserTest() {
        StopWatch sw = new StopWatch();
        sw.start();
        List<UserTestEntity> list=new ArrayList<>();
        //插入10万数据测试;
        for(int i=1;i<100000;i++){
            //创建对象
            UserTestEntity sans = new UserTestEntity();
            sans.setName("xiaozw"+i);
            sans.setStatus(0);
            list.add(sans);
            if(!CollectionUtils.isEmpty(list) && list.size()>10000){
                log.info("list数量超过1万,开始入库。。。");
                this.saveBatch(list,500);
                list.clear();
                log.info("入库完成,list清空。。。");
            }else if(i>900000){
                this.saveBatch(list);
                list.clear();
            }
        }
        sw.stop();
        System.out.println("测试耗时(秒):"+sw.getTotalTimeSeconds());

    }
}

完整项目代码下载:

 链接:https://pan.baidu.com/s/1Ooj-jvqpwa1MSaq0eCJY8g 

提取码:pzpj

 

参考资料:

alibaba/DataX github
https://github.com/alibaba/DataX/blob/master/userGuid.md
java整合datax最详细的教程
https://blog.csdn.net/Ting1king/article/details/120871137
DataX实践趟坑大全
https://blog.csdn.net/superman_wang/article/details/118635976

 

posted @ 2022-11-12 22:05  放电的牛  阅读(1619)  评论(1编辑  收藏  举报