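DataX Secondary Development in Detail: the Kingbasees86Reader and Kingbasees86Writer Plugins
1. Background
Domestic (Chinese) databases are on the rise. The notable ones include Kingbase from Renmin University (人大金仓), GBase from Nanda General (南大通用), Dameng (DM), Huawei GaussDB, and Alibaba OceanBase. This article describes using DataX as the tool for synchronizing data with a Kingbase86 database. The DataX version currently on GitHub supports only the Kingbase82 series. Because the project uses Kingbase86 as its database, the DataX source has to be extended with custom Kingbasees86Reader and Kingbasees86Writer plugins.
2. Implementation
Kingbase's background is not repeated here. Like the other relational readers, the plugin connects to the remote database over JDBC and runs SQL statements that SELECT the data out of the KingbaseES database. If you have used DataX before, you can treat a Kingbase job as essentially the same as a MySQL sync script.
2.1 Developing the Kingbasees86Reader plugin
Kingbasees86Reader currently supports most KingbaseES types, but a few types are not supported, so check the types you use.
The table below lists how Kingbasees86Reader converts KingbaseES types: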
DataX internal type | KingbaseES data type |
Long | bigint, bigserial, integer, smallint, serial |
Double | double precision, money, numeric, real |
String | varchar, char, text, bit, inet |
Date | date, time, timestamp |
Boolean | bool |
Bytes | bytea |
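The table above can be restated as a small lookup, which is handy for checking a table's columns before writing a job. This is only a sketch: the class and method names (`KingbaseTypeTable`, `dataxTypeFor`) are hypothetical and not part of DataX, and the map contains nothing beyond the rows of the table.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of DataX: restates the conversion table as a lookup.
final class KingbaseTypeTable {
    private static final Map<String, String> TYPES = new LinkedHashMap<>();

    private static void register(String dataxType, String... kingbaseTypes) {
        for (String t : kingbaseTypes) {
            TYPES.put(t, dataxType);
        }
    }

    static {
        register("Long", "bigint", "bigserial", "integer", "smallint", "serial");
        register("Double", "double precision", "money", "numeric", "real");
        register("String", "varchar", "char", "text", "bit", "inet");
        register("Date", "date", "time", "timestamp");
        register("Boolean", "bool");
        register("Bytes", "bytea");
    }

    // Returns the DataX internal type for a KingbaseES type name,
    // or empty if the name does not appear in the table.
    static Optional<String> dataxTypeFor(String kingbaseType) {
        if (kingbaseType == null) {
            return Optional.empty();
        }
        String key = kingbaseType.trim().toLowerCase(Locale.ROOT);
        return Optional.ofNullable(TYPES.get(key));
    }

    public static void main(String[] args) {
        for (String t : new String[] {"bigint", "numeric", "bytea", "json"}) {
            System.out.println(t + " -> "
                    + dataxTypeFor(t).orElse("not in the table, check before syncing"));
        }
    }
}
```

A type that is missing from the lookup is not necessarily unsupported; it only means the table does not list it, so test it on a small sample first.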
2.1.1 Sample configuration
The following job extracts data from a KingbaseES database and prints it locally:
    {
        "job": {
            "setting": {
                "speed": {
                    // Transfer speed limit in bytes/s. DataX tries to get close to this rate without exceeding it.
                    "byte": 1048576
                },
                // Error limits
                "errorLimit": {
                    // Maximum number of failed records; the job fails once this is exceeded.
                    "record": 0,
                    // Maximum ratio of failed records: 1.0 means 100%, 0.02 means 2%.
                    "percentage": 0.02
                }
            },
            "content": [
                {
                    "reader": {
                        "name": "kingbasees86reader",
                        "parameter": {
                            // Database user name
                            "username": "xx",
                            // Database password
                            "password": "xx",
                            "column": [
                                "id", "name"
                            ],
                            // Primary key used to split the read into ranges
                            "splitPk": "id",
                            "connection": [
                                {
                                    "table": [
                                        "table"
                                    ],
                                    // Same jdbc:kingbase8:// scheme as the verified job below
                                    "jdbcUrl": [
                                        "jdbc:kingbase8://host:port/database"
                                    ]
                                }
                            ]
                        }
                    },
                    "writer": {
                        // Writer type
                        "name": "streamwriter",
                        // Whether to print the records
                        "parameter": {
                            "print": true
                        }
                    }
                }
            ]
        }
    }
Clean version without comments (verified to run):
    {
        "job": {
            "setting": {
                "speed": {
                    "byte": 1048576
                },
                "errorLimit": {
                    "record": 0,
                    "percentage": 0.02
                }
            },
            "content": [
                {
                    "reader": {
                        "name": "kingbasees86reader",
                        "parameter": {
                            "username": "root",
                            "password": "123456",
                            "column": [
                                "id", "name"
                            ],
                            "splitPk": "id",
                            "connection": [
                                {
                                    "table": [
                                        "t1"
                                    ],
                                    "jdbcUrl": [
                                        "jdbc:kingbase8://192.168.12.104:54321/test"
                                    ]
                                }
                            ]
                        }
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {
                            "print": true
                        }
                    }
                }
            ]
        }
    }
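The `splitPk` setting in the jobs above lets the reader cut one table into key ranges that separate tasks can read in parallel. The idea can be sketched as follows. This is a simplified illustration, not DataX's actual splitting code: `SplitPkSketch` and `splitRanges` are hypothetical names, the key is assumed to be an integer column whose minimum and maximum are already known, and the real implementation differs in details.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of range splitting on an integer key; not DataX code.
final class SplitPkSketch {

    // Cuts [min, max] into at most `parts` contiguous ranges and renders each
    // one as a WHERE condition. Assumes max - min + 1 fits in a long.
    static List<String> splitRanges(String pk, long min, long max, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be at least 1");
        }
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        List<String> clauses = new ArrayList<>();
        long span = max - min + 1;
        long step = span / parts;
        long remainder = span % parts;
        long lower = min;
        for (int i = 0; i < parts; i++) {
            // The first `remainder` ranges take one extra key each.
            long size = step + (i < remainder ? 1 : 0);
            if (size == 0) {
                break; // more parts requested than there are keys
            }
            long upper = lower + size - 1;
            clauses.add("(" + lower + " <= " + pk + " AND " + pk + " <= " + upper + ")");
            lower = upper + 1;
        }
        return clauses;
    }

    public static void main(String[] args) {
        for (String clause : splitRanges("id", 1, 10, 3)) {
            System.out.println("select id,name from t1 where " + clause);
        }
    }
}
```

Each condition would be appended to the task's own SELECT, so the ranges must not overlap and must cover the whole key space between the minimum and maximum.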
A job that uses a custom SQL query and writes the result to a local stream:
    {
        "job": {
            "setting": {
                "speed": {
                    "byte": 1048576
                }
            },
            "content": [
                {
                    "reader": {
                        "name": "kingbasees86reader",
                        "parameter": {
                            "username": "xx",
                            "password": "xx",
                            "where": "",
                            "connection": [
                                {
                                    "querySql": [
                                        "select db_id,on_line_flag from db_info where db_id < 10;"
                                    ],
                                    "jdbcUrl": [
                                        "jdbc:kingbase8://host:port/database", "jdbc:kingbase8://host:port/database"
                                    ]
                                }
                            ]
                        }
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {
                            "print": false,
                            "encoding": "UTF-8"
                        }
                    }
                }
            ]
        }
    }
2.1.2 Code implementation
Code structure
Code: package.xml
    <assembly
            xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
        <id></id>
        <formats>
            <format>dir</format>
        </formats>
        <includeBaseDirectory>false</includeBaseDirectory>
        <fileSets>
            <fileSet>
                <directory>src/main/resources</directory>
                <includes>
                    <include>plugin.json</include>
                    <include>plugin_job_template.json</include>
                </includes>
                <outputDirectory>plugin/reader/kingbasees86reader</outputDirectory>
            </fileSet>
            <fileSet>
                <directory>target/</directory>
                <includes>
                    <include>kingbasees86reader-0.0.1-SNAPSHOT.jar</include>
                </includes>
                <outputDirectory>plugin/reader/kingbasees86reader</outputDirectory>
            </fileSet>
            <fileSet>
                <directory>src/main/libs</directory>
                <includes>
                    <include>*.*</include>
                </includes>
                <outputDirectory>plugin/reader/kingbasees86reader/libs</outputDirectory>
            </fileSet>
        </fileSets>

        <dependencySets>
            <dependencySet>
                <useProjectArtifact>false</useProjectArtifact>
                <outputDirectory>plugin/reader/kingbasees86reader/libs</outputDirectory>
                <scope>runtime</scope>
            </dependencySet>
        </dependencySets>
    </assembly>
Code: Constant
    package com.alibaba.datax.plugin.reader.kingbasees86reader;

    public class Constant {

        // Default JDBC fetch size used when the job does not configure one
        public static final int DEFAULT_FETCH_SIZE = 1000;

    }
Code: Kingbasees86Reader
    package com.alibaba.datax.plugin.reader.kingbasees86reader;

    import com.alibaba.datax.common.exception.DataXException;
    import com.alibaba.datax.common.plugin.RecordSender;
    import com.alibaba.datax.common.spi.Reader;
    import com.alibaba.datax.common.util.Configuration;
    import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader;
    import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
    import com.alibaba.datax.plugin.rdbms.util.DataBaseType;

    import java.util.List;

    public class Kingbasees86Reader extends Reader {

        // Requires the KingbaseES86 constant registered in DataBaseType (see Note 3)
        private static final DataBaseType DATABASE_TYPE = DataBaseType.KingbaseES86;

        public static class Job extends Reader.Job {

            private Configuration originalConfig;
            private CommonRdbmsReader.Job commonRdbmsReaderMaster;

            @Override
            public void init() {
                this.originalConfig = super.getPluginJobConf();
                // Fall back to the default fetch size and reject values below 1
                int fetchSize = this.originalConfig.getInt(
                        com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE,
                        Constant.DEFAULT_FETCH_SIZE);
                if (fetchSize < 1) {
                    // Message: the configured fetchSize is invalid; it must not be less than 1
                    throw DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE,
                            String.format("您配置的fetchSize有误,根据DataX的设计,fetchSize : [%d] 设置值不能小于 1.", fetchSize));
                }
                this.originalConfig.set(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, fetchSize);

                // Delegate the actual work to the common RDBMS reader
                this.commonRdbmsReaderMaster = new CommonRdbmsReader.Job(DATABASE_TYPE);
                this.commonRdbmsReaderMaster.init(this.originalConfig);
            }

            @Override
            public List<Configuration> split(int adviceNumber) {
                return this.commonRdbmsReaderMaster.split(this.originalConfig, adviceNumber);
            }

            @Override
            public void post() {
                this.commonRdbmsReaderMaster.post(this.originalConfig);
            }

            @Override
            public void destroy() {
                this.commonRdbmsReaderMaster.destroy(this.originalConfig);
            }

        }

        public static class Task extends Reader.Task {

            private Configuration readerSliceConfig;
            private CommonRdbmsReader.Task commonRdbmsReaderSlave;

            @Override
            public void init() {
                this.readerSliceConfig = super.getPluginJobConf();
                this.commonRdbmsReaderSlave = new CommonRdbmsReader.Task(DATABASE_TYPE, super.getTaskGroupId(), super.getTaskId());
                this.commonRdbmsReaderSlave.init(this.readerSliceConfig);
            }

            @Override
            public void startRead(RecordSender recordSender) {
                int fetchSize = this.readerSliceConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE);

                this.commonRdbmsReaderSlave.startRead(this.readerSliceConfig, recordSender,
                        super.getTaskPluginCollector(), fetchSize);
            }

            @Override
            public void post() {
                this.commonRdbmsReaderSlave.post(this.readerSliceConfig);
            }

            @Override
            public void destroy() {
                this.commonRdbmsReaderSlave.destroy(this.readerSliceConfig);
            }

        }

    }
Code: plugin.json
    {
        "name": "kingbasees86reader",
        "class": "com.alibaba.datax.plugin.reader.kingbasees86reader.Kingbasees86Reader",
        "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.",
        "developer": "alibaba"
    }
Code: plugin_job_template.json
    {
        "name": "kingbasees86reader",
        "parameter": {
            "username": "",
            "password": "",
            "connection": [
                {
                    "table": [],
                    "jdbcUrl": []
                }
            ]
        }
    }
Note 1: add the following to the package.xml file in the project root:
    <fileSet>
        <directory>kingbasees86reader/target/datax/</directory>
        <includes>
            <include>**/*.*</include>
        </includes>
        <outputDirectory>datax</outputDirectory>
    </fileSet>
Note 2: add the following to the pom.xml file in the project root:
    <module>kingbasees86reader</module>
Note 3: register the reader's database type in DataBaseType.
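What the registration in Note 3 amounts to can be sketched with a cut-down stand-in for the enum. This is not the real `com.alibaba.datax.plugin.rdbms.util.DataBaseType`: the name `DataBaseTypeSketch` is hypothetical, only two constants are shown, and the driver class `com.kingbase8.Driver` is an assumption that should be checked against the Kingbase JDBC jar you ship in the plugin's libs directory.

```java
// Simplified stand-in for DataX's DataBaseType enum, for illustration only.
enum DataBaseTypeSketch {
    PostgreSQL("postgresql", "org.postgresql.Driver"),
    // New constant referenced by Kingbasees86Reader.
    // Assumption: the driver class name; verify it against your Kingbase JDBC jar.
    KingbaseES86("kingbasees86", "com.kingbase8.Driver");

    private final String typeName;
    private final String driverClassName;

    DataBaseTypeSketch(String typeName, String driverClassName) {
        this.typeName = typeName;
        this.driverClassName = driverClassName;
    }

    String getTypeName() {
        return typeName;
    }

    String getDriverClassName() {
        return driverClassName;
    }

    // Per-database hook: a constant without its own case is rejected,
    // so a new database type needs a case wherever such a switch exists.
    String appendJDBCSuffixForReader(String jdbcUrl) {
        switch (this) {
            case PostgreSQL:
            case KingbaseES86:
                return jdbcUrl; // nothing appended in this sketch
            default:
                throw new UnsupportedOperationException("unsupported database type: " + typeName);
        }
    }

    public static void main(String[] args) {
        DataBaseTypeSketch type = DataBaseTypeSketch.KingbaseES86;
        System.out.println(type.getTypeName() + " uses " + type.getDriverClassName());
        System.out.println(type.appendJDBCSuffixForReader("jdbc:kingbase8://host:port/database"));
    }
}
```

In the real enum, adding the constant alone is likely not enough: any method that switches over the database type needs a matching case for the new constant, otherwise the job fails at runtime with an unsupported-type error.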
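2.1.3 Packaging and upload
To speed up the build, you can comment out the modules you do not need in the root pom.xml.
Copy the following files into the corresponding plugin folder under the DataX installation directory.
2.1.4 Creating a test table in KingbaseES
Note: start the Kingbase server first, and check that the firewall is turned off.
Start the Kingbase server: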
cd /opt/Kingbase/ES/V8/Server/bin
./sys_ctl start -D /opt/Kingbase/ES/V8/data
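2.1.5 Running the DataX job to test
2.1.6 Possible problems
Description:[DataX引擎配置错误,该问题通常是由于DataX安装错误引起,请联系您的运维解决 .]. - 在有总bps限速条件下,单个channel的bps值不能为空,也不能为非正数
(In English: DataX engine configuration error, usually caused by a faulty DataX installation; contact your operations team. When a total bps limit is set, the per-channel bps value must not be empty or non-positive.)
Solution:
Go to the DataX installation directory and edit the file datax/conf/core.json.
Set core -> transport -> channel -> speed -> "byte" to 2000000, which gives each channel a limit of about 2 MB/s.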
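The rule behind the error above can be sketched as follows. This is a paraphrase written for illustration, not DataX's source: `ChannelSpeedSketch` and `channelsForByteLimit` are hypothetical names, and the value -1 used in the demo is assumed to stand for an unset per-channel limit in core.json.

```java
// Hypothetical paraphrase of the per-channel speed check; not DataX code.
final class ChannelSpeedSketch {

    // jobByteLimit:     job -> setting -> speed -> byte from the job file
    // channelByteLimit: core -> transport -> channel -> speed -> byte from core.json
    static int channelsForByteLimit(long jobByteLimit, Long channelByteLimit) {
        if (jobByteLimit <= 0) {
            throw new IllegalArgumentException("job byte limit must be positive");
        }
        if (channelByteLimit == null || channelByteLimit <= 0) {
            throw new IllegalStateException(
                    "with a total byte limit set, the per-channel byte limit must be a positive number");
        }
        // The job limit is shared among channels, with at least one channel.
        long channels = jobByteLimit / channelByteLimit;
        return (int) Math.min(Integer.MAX_VALUE, Math.max(1L, channels));
    }

    public static void main(String[] args) {
        // Job limit of 1 MiB/s with the fixed per-channel limit of 2000000 bytes/s
        System.out.println("channels: " + channelsForByteLimit(1048576L, 2000000L));
        try {
            // Assumed unset value: reproduces the kind of failure described above
            channelsForByteLimit(1048576L, -1L);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the job files in this article (a job limit of 1048576 bytes/s) and a per-channel limit of 2000000 bytes/s, the division rounds down to zero, so the sketch falls back to a single channel.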
Supplement
Reading data from Kingbase and writing it to MySQL:
    {
        "job": {
            "setting": {
                "speed": {
                    "byte": 1048576
                },
                "errorLimit": {
                    "record": 0,
                    "percentage": 0.02
                }
            },
            "content": [
                {
                    "reader": {
                        "name": "kingbasees86reader",
                        "parameter": {
                            "username": "root",
                            "password": "123456",
                            "column": [
                                "id", "name"
                            ],
                            "splitPk": "id",
                            "connection": [
                                {
                                    "table": [
                                        "t1"
                                    ],
                                    "jdbcUrl": [
                                        "jdbc:kingbase8://192.168.12.104:54321/prod"
                                    ]
                                }
                            ]
                        }
                    },
                    "writer": {
                        "parameter": {
                            "username": "root",
                            "password": "123456",
                            "writeMode": "insert",
                            "connection": [
                                {
                                    "table": [
                                        "t1"
                                    ],
                                    "jdbcUrl": "jdbc:mysql://hadoop101:3306/gmall_report?useUnicode=true&characterEncoding=utf-8"
                                }
                            ],
                            "column": [
                                "id",
                                "name"
                            ]
                        },
                        "name": "mysqlwriter"
                    }
                }
            ]
        }
    }
Related documentation
GitHub: https://github.com/alibaba/DataX/tree/master
Gitee mirror: https://gitee.com/mirrors/DataX/tree/master
Alibaba Cloud Maven repository: https://developer.aliyun.com/mvn/search