DataX

I. References

1. Git repository: https://github.com/alibaba/DataX

2. Detailed introduction: https://github.com/alibaba/DataX/blob/master/introduction.md

3. Build and download guide: https://github.com/alibaba/DataX/blob/master/userGuid.md

4. Data source reference guide: https://github.com/alibaba/DataX/wiki/DataX-all-data-channels

5. Plugin development tutorial: https://github.com/alibaba/DataX/blob/master/dataxPluginDev.md

II. Deployment

1. Download the source

https://github.com/alibaba/DataX/archive/refs/heads/master.zip

2. Build

Unpack the archive, change into the source directory, and run the build:
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
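
Before building, it is worth checking the toolchain; the upstream user guide calls for JDK 1.8 and Maven 3.x:

java -version
mvn -v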

3. Check the build output

After a successful build, the packaged DataX is located at:
./target/datax/datax/

4. Upload the built datax directory to /opt/module
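
One way to get the package onto the target server, sketched with a placeholder user and hostname:

# Package the build output and ship it (user@your-server is a placeholder)
cd ./target/datax
tar -czf datax.tar.gz datax
scp datax.tar.gz user@your-server:/opt/module/
ssh user@your-server 'cd /opt/module && tar -xzf datax.tar.gz && rm datax.tar.gz'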

5. View a config template

cd /opt/module/datax/bin
# General form: python datax.py -r {reader} -w {writer}
python datax.py -r streamreader -w streamwriter

6. Quick example from the official docs

(1) Create stream2stream.json

mkdir -p /opt/module/datax/data
cd /opt/module/datax/data
vim stream2stream.json

Add the following content:

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "sliceRecordCount": 10,
            "column": [
              {
                "type": "long",
                "value": "10"
              },
              {
                "type": "string",
                "value": "hello,你好,世界-DataX"
              }
            ]
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "encoding": "UTF-8",
            "print": true
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 5
       }
    }
  }
}

(2) Run DataX

cd /opt/module/datax/bin/
python datax.py /opt/module/datax/data/stream2stream.json

(3) Console log

Note that the job reports 50 records in total: channel is 5 and sliceRecordCount is 10, and each channel runs its own streamreader slice.

......
2022-10-17 11:21:26.003 [job-0] INFO  JobContainer - PerfTrace not enable!
2022-10-17 11:21:26.004 [job-0] INFO  StandAloneJobContainerCommunicator - Total 50 records, 950 bytes | Speed 95B/s, 5 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.002s | Percentage 100.00%
2022-10-17 11:21:26.004 [job-0] INFO  JobContainer - 
Job start time                  : 2022-10-17 11:21:15
Job end time                    : 2022-10-17 11:21:26
Total elapsed time              :                 10s
Average traffic                 :               95B/s
Record write speed              :              5rec/s
Total records read              :                  50
Total read/write failures       :                   0

III. Data Synchronization

1. MySQL-->MySQL

(1) View the JSON template

cd /opt/module/datax/bin/
python datax.py -r mysqlreader -w mysqlwriter

(2) Create the JSON job file

mkdir -p /opt/module/datax/data/mysqlTomysql
vim /opt/module/datax/data/mysqlTomysql/mysqlTomysql.json

Add the following content:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [
                            "name",
                            "sourceLabel",
                            "targetLabel",
                            "properties",
                            "nullableKeys"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://192.168.xxx.xxx:3306/库名?autoReconnect=true&useSSL=false"
                                ],
                                "table": [
                                    "表名"
                                ]
                            }
                        ],
                        "password": "密码",
                        "username": "用户",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [
                            "name",
                            "sourceLabel",
                            "targetLabel",
                            "properties",
                            "nullableKeys"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.xxx.xxx:3306/库名?autoReconnect=true&useSSL=false",
                                "table": [
                                    "表名"
                                ]
                            }
                        ],
                        "password": "密码",
                        "preSql": [],
                        "session": [],
                        "username": "用户",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
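
A note on writeMode: mysqlwriter maps insert to INSERT INTO, replace to REPLACE INTO, and update to INSERT ... ON DUPLICATE KEY UPDATE. If the target table has a primary key and the job may be re-run, replace avoids duplicate-key failures:

"writeMode": "replace"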

(3) Run the sync

cd /opt/module/datax/bin/
python datax.py /opt/module/datax/data/mysqlTomysql/mysqlTomysql.json

2. SQLServer-->SQLServer

(1) View the JSON template

cd /opt/module/datax/bin/
python datax.py -r sqlserverreader -w sqlserverwriter

(2) Create the JSON job file

mkdir -p /opt/module/datax/data/sqlserverTosqlserver
vim /opt/module/datax/data/sqlserverTosqlserver/sqlserverTosqlserver.json

Add the following content:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "sqlserverreader",
                    "parameter": {
                        "column": [
                            "CompanyCode",
                            "adate",
                            "gid",
                            "code",
                            "total",
                            "bcktotal",
                            "memo",
                            "outmemo"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:sqlserver://192.168.xxx.xxx:1433;DatabaseName=库名"
                                ],
                                "table": [
                                    "表名"
                                ]
                            }
                        ],
                        "password": "密码",
                        "username": "用户"
                    }
                },
                "writer": {
                    "name": "sqlserverwriter",
                    "parameter": {
                        "column": [
                            "CompanyCode",
                            "adate",
                            "gid",
                            "code",
                            "total",
                            "bcktotal",
                            "memo",
                            "outmemo"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:sqlserver://192.168.xxx.xxx:1433;DatabaseName=库名",
                                "table": [
                                    "表名"
                                ]
                            }
                        ],
                        "password": "密码",
                        "postSql": [],
                        "preSql": [],
                        "username": "用户"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
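
For full reloads, a common pattern is to empty the target table first; sqlserverwriter executes any preSql statements before writing (the table name is a placeholder):

"preSql": ["TRUNCATE TABLE your_table"]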

(3) Run the sync

cd /opt/module/datax/bin/
python datax.py /opt/module/datax/data/sqlserverTosqlserver/sqlserverTosqlserver.json

3. MySQL-->SQLServer

(1) View the JSON template

cd /opt/module/datax/bin/
python datax.py -r mysqlreader -w sqlserverwriter

(2) Create the JSON job file

mkdir -p /opt/module/datax/data/mysqlTosqlserver
vim /opt/module/datax/data/mysqlTosqlserver/mysqlTosqlserver.json

Add the following content:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [
                            "CompanyCode1",
                            "adate1",
                            "gid1",
                            "code1",
                            "total1",
                            "bcktotal1",
                            "outmemo1",
                            "memo1"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://192.168.xxx.xxx:3306/库名?autoReconnect=true&useSSL=false"
                                ],
                                "table": [
                                    "表名"
                                ]
                            }
                        ],
                        "password": "密码",
                        "username": "用户",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "sqlserverwriter",
                    "parameter": {
                        "column": [
                            "CompanyCode",
                            "adate",
                            "gid",
                            "code",
                            "total",
                            "bcktotal",
                            "memo",
                            "outmemo"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:sqlserver://192.168.xxx.xxx:1433;DatabaseName=库名",
                                "table": [
                                    "表名"
                                ]
                            }
                        ],
                        "password": "密码",
                        "postSql": [],
                        "preSql": [],
                        "username": "用户"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
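
The empty "where" above is passed through as the WHERE clause of the extraction query, which is the usual hook for incremental syncs. For example, to pull only rows on or after a cutoff date via the adate1 column (the date itself is just an example):

"where": "adate1 >= '2022-10-01'"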

(3) Run the sync

cd /opt/module/datax/bin/
python datax.py /opt/module/datax/data/mysqlTosqlserver/mysqlTosqlserver.json

4. SQLServer-->MySQL

(1) View the JSON template

cd /opt/module/datax/bin/
python datax.py -r sqlserverreader -w mysqlwriter

(2) Create the JSON job file

mkdir -p /opt/module/datax/data/sqlserverTomysql
vim /opt/module/datax/data/sqlserverTomysql/sqlserverTomysql.json

Add the following content:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "sqlserverreader",
                    "parameter": {
                        "column": [
                            "CompanyCode",
                            "adate",
                            "gid",
                            "code",
                            "total",
                            "bcktotal",
                            "memo",
                            "outmemo"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:sqlserver://192.168.xxx.xxx:1433;DatabaseName=库名"
                                ],
                                "table": [
                                    "表名"
                                ]
                            }
                        ],
                        "password": "密码",
                        "username": "用户"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [
                            "CompanyCode1",
                            "adate1",
                            "gid1",
                            "code1",
                            "total1",
                            "bcktotal1",
                            "memo1",
                            "outmemo1"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.xxx.xxx:3306/库名?autoReconnect=true&useSSL=false",
                                "table": [
                                    "表名"
                                ]
                            }
                        ],
                        "password": "密码",
                        "preSql": [],
                        "session": [],
                        "username": "用户",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
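
mysqlwriter's session list sets per-connection session variables before writing, useful when the target needs a specific sql_mode or timeouts; the example below is the one from the mysqlwriter docs:

"session": ["set session sql_mode='ANSI'"]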

(3) Run the sync

cd /opt/module/datax/bin/
python datax.py /opt/module/datax/data/sqlserverTomysql/sqlserverTomysql.json

5. MySQL-->HDFS

(1) View the JSON template

cd /opt/module/datax/bin/
python datax.py -r mysqlreader -w hdfswriter

(2) Create the JSON job file

mkdir -p /opt/module/datax/data/mysqlTohdfs
vim /opt/module/datax/data/mysqlTohdfs/mysqlTohdfs.json

Add the following content:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [
                            "CompanyCode1",
                            "adate1",
                            "gid1",
                            "code1",
                            "total1",
                            "bcktotal1",
                            "outmemo1",
                            "memo1"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://192.168.xxx.xxx:3306/库名?autoReconnect=true&useSSL=false"
                                ],
                                "table": [
                                    "表名"
                                ]
                            }
                        ],
                        "password": "密码",
                        "username": "用户",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {
                                "name": "CompanyCode1",
                                "type": "string"
                            },
                            {
                                "name": "adate1",
                                "type": "string"
                            },
                            {
                                "name": "gid1",
                                "type": "int"
                            },
                            {
                                "name": "code1",
                                "type": "string"
                            },
                            {
                                "name": "total1",
                                "type": "double"
                            },
                            {
                                "name": "bcktotal1",
                                "type": "double"
                            },
                            {
                                "name": "outmemo1",
                                "type": "string"
                            },
                            {
                                "name": "memo1",
                                "type": "string"
                            }
                        ],
                        "compress": "NONE",
                        "defaultFS": "hdfs://nametest",
                        "hadoopConfig": {
                            "dfs.nameservices": "nametest",
                            "dfs.client.failover.proxy.provider.nametest": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
                            "dfs.ha.automatic-failover.enabled.nametest": "true",
                            "ha.zookeeper.quorum": "192.168.xxx.xxx:2181,192.168.xxx.xxx:2181,192.168.xxx.xxx:2181",
                            "dfs.ha.namenodes.nametest": "namenode1,namenode2",
                            "dfs.namenode.rpc-address.nametest.namenode1": "192.168.xxx.xxx:8020",
                            "dfs.namenode.servicerpc-address.nametest.namenode1": "192.168.xxx.xxx:8022",
                            "dfs.namenode.http-address.nametest.namenode1": "192.168.xxx.xxx:9870",
                            "dfs.namenode.https-address.nametest.namenode1": "192.168.xxx.xxx:9871",
                            "dfs.namenode.rpc-address.nametest.namenode2": "192.168.xxx.xxx:8020",
                            "dfs.namenode.servicerpc-address.nametest.namenode2": "192.168.xxx.xxx:8022",
                            "dfs.namenode.http-address.nametest.namenode2": "192.168.xxx.xxx:9870",
                            "dfs.namenode.https-address.nametest.namenode2": "192.168.xxx.xxx:9871"
                    
                        },
                        "fieldDelimiter": "\t",
                        "fileName": "文件名",
                        "fileType": "ORC",
                        "path": "/",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
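
hdfswriter expects path to be an existing HDFS directory (for a Hive table, its storage or partition location) and writes files named fileName plus a random suffix into it, so create the directory first (the path below is a placeholder):

hdfs dfs -mkdir -p /your/target/dir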

(3) Run the sync

cd /opt/module/datax/bin/
python datax.py /opt/module/datax/data/mysqlTohdfs/mysqlTohdfs.json

6. SQLServer-->HDFS

(1) View the JSON template

cd /opt/module/datax/bin/
python datax.py -r sqlserverreader -w hdfswriter

(2) Create the JSON job file

mkdir -p /opt/module/datax/data/sqlserverTohdfs
vim /opt/module/datax/data/sqlserverTohdfs/sqlserverTohdfs.json

Add the following content. One caveat: the upstream hdfswriter documentation lists only text and orc as supported fileType values, so verify that your DataX build actually supports parquet (and snappy compression for it) before relying on this config:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "sqlserverreader",
                    "parameter": {
                        "column": [
                            "CompanyCode",
                            "adate",
                            "gid",
                            "code",
                            "total",
                            "bcktotal",
                            "memo",
                            "outmemo"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:sqlserver://192.168.xxx.xxx:1433;DatabaseName=库名"
                                ],
                                "table": [
                                    "表名"
                                ]
                            }
                        ],
                        "password": "密码",
                        "username": "用户"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {
                                "name": "CompanyCode1",
                                "type": "string"
                            },
                            {
                                "name": "adate1",
                                "type": "string"
                            },
                            {
                                "name": "gid1",
                                "type": "int"
                            },
                            {
                                "name": "code1",
                                "type": "string"
                            },
                            {
                                "name": "total1",
                                "type": "double"
                            },
                            {
                                "name": "bcktotal1",
                                "type": "double"
                            },
                            {
                                "name": "outmemo1",
                                "type": "string"
                            },
                            {
                                "name": "memo1",
                                "type": "string"
                            }
                        ],
                        "compress": "snappy",
                        "defaultFS": "hdfs://nametest",
                        "hadoopConfig": {
                            "dfs.nameservices": "nametest",
                            "dfs.client.failover.proxy.provider.nametest": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
                            "dfs.ha.automatic-failover.enabled.nametest": "true",
                            "ha.zookeeper.quorum": "192.168.xxx.xxx:2181,192.168.xxx.xxx:2181,192.168.xxx.xxx:2181",
                            "dfs.ha.namenodes.nametest": "namenode1,namenode2",
                            "dfs.namenode.rpc-address.nametest.namenode1": "192.168.xxx.xxx:8020",
                            "dfs.namenode.servicerpc-address.nametest.namenode1": "192.168.xxx.xxx:8022",
                            "dfs.namenode.http-address.nametest.namenode1": "192.168.xxx.xxx:9870",
                            "dfs.namenode.https-address.nametest.namenode1": "192.168.xxx.xxx:9871",
                            "dfs.namenode.rpc-address.nametest.namenode2": "192.168.xxx.xxx:8020",
                            "dfs.namenode.servicerpc-address.nametest.namenode2": "192.168.xxx.xxx:8022",
                            "dfs.namenode.http-address.nametest.namenode2": "192.168.xxx.xxx:9870",
                            "dfs.namenode.https-address.nametest.namenode2": "192.168.xxx.xxx:9871"

                        },
                        "fieldDelimiter": "\t",
                        "fileName": "文件名",
                        "fileType": "parquet",
                        "path": "/",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}

(3) Run the sync

cd /opt/module/datax/bin/
python datax.py /opt/module/datax/data/sqlserverTohdfs/sqlserverTohdfs.json

7. HDFS-->SQLServer

(1) View the JSON template

cd /opt/module/datax/bin/
python datax.py -r hdfsreader -w sqlserverwriter

(2) Create the JSON job file

mkdir -p /opt/module/datax/data/hdfsTosqlserver
vim /opt/module/datax/data/hdfsTosqlserver/hdfsTosqlserver.json

Add the following content. Note that hdfsreader has no notion of column names: fields are addressed positionally with index (or injected as constants with value), so the column list below is index-based, following the source file's column order (0 = CompanyCode, 1 = adate, 2 = gid, 3 = code, 4 = total, 5 = bcktotal, 6 = outmemo, 7 = memo):

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "column": [
                            {
                                "name": "CompanyCode1",
                                "type": "string"
                            },
                            {
                                "name": "adate1",
                                "type": "string"
                            },
                            {
                                "name": "gid1",
                                "type": "int"
                            },
                            {
                                "name": "code1",
                                "type": "string"
                            },
                            {
                                "name": "total1",
                                "type": "double"
                            },
                            {
                                "name": "bcktotal1",
                                "type": "double"
                            },
                            {
                                "name": "outmemo1",
                                "type": "string"
                            },
                            {
                                "name": "memo1",
                                "type": "string"
                            }
                        ],
                        "defaultFS": "hdfs://nametest",
                        "hadoopConfig": {
                            "dfs.nameservices": "nametest",
                            "dfs.client.failover.proxy.provider.nametest": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
                            "dfs.ha.automatic-failover.enabled.nametest": "true",
                            "ha.zookeeper.quorum": "192.168.xxx.xxx:2181,192.168.xxx.xxx:2181,192.168.xxx.xxx:2181",
                            "dfs.ha.namenodes.nametest": "namenode1,namenode2",
                            "dfs.namenode.rpc-address.nametest.namenode1": "192.168.xxx.xxx:8020",
                            "dfs.namenode.servicerpc-address.nametest.namenode1": "192.168.xxx.xxx:8022",
                            "dfs.namenode.http-address.nametest.namenode1": "192.168.xxx.xxx:9870",
                            "dfs.namenode.https-address.nametest.namenode1": "192.168.xxx.xxx:9871",
                            "dfs.namenode.rpc-address.nametest.namenode2": "192.168.xxx.xxx:8020",
                            "dfs.namenode.servicerpc-address.nametest.namenode2": "192.168.xxx.xxx:8022",
                            "dfs.namenode.http-address.nametest.namenode2": "192.168.xxx.xxx:9870",
                            "dfs.namenode.https-address.nametest.namenode2": "192.168.xxx.xxx:9871"
                        },
                        "encoding": "UTF-8",
                        "fieldDelimiter": ",",
                        "fileType": "orc",
                        "path": "/*"
                    }
                },
                "writer": {
                    "name": "sqlserverwriter",
                    "parameter": {
                        "column": [
                            "CompanyCode",
                            "adate",
                            "gid",
                            "code",
                            "total",
                            "bcktotal",
                            "memo",
                            "outmemo"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:sqlserver://192.168.xxx.xxx:1433;DatabaseName=库名",
                                "table": [
                                    "表名"
                                ]
                            }
                        ],
                        "password": "密码",
                        "postSql": [],
                        "preSql": [],
                        "username": "用户"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}

(3) Run the sync

cd /opt/module/datax/bin/
python datax.py /opt/module/datax/data/hdfsTosqlserver/hdfsTosqlserver.json

8. Hive-->MySQL

(1) View the JSON template

cd /opt/module/datax/bin/
python datax.py -r hdfsreader -w mysqlwriter

(2) Create the JSON job file

mkdir -p /opt/module/datax/data/hdfsTomysql
vim /opt/module/datax/data/hdfsTomysql/hdfsTomysql.json

Add the following content. As in the previous job, the hdfsreader columns are positional (indexes 0-6 follow the Hive table's column order); the last entry is a constant column whose value is filled from the ${dt} job parameter:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "column": [
                            {
                                "name": "CompanyCode1",
                                "type": "string"
                            },
                            {
                                "name": "adate1",
                                "type": "string"
                            },
                            {
                                "name": "gid1",
                                "type": "int"
                            },
                            {
                                "name": "code1",
                                "type": "string"
                            },
                            {
                                "name": "total1",
                                "type": "double"
                            },
                            {
                                "name": "bcktotal1",
                                "type": "double"
                            },
                            {
                                "name": "outmemo1",
                                "type": "string"
                            },
                            {
                                "name": "dt",
                                "type": "string",
                                "value": "${dt}"
                            }
                        ],
                        "defaultFS": "hdfs://nametest",
                        "hadoopConfig": {
                            "dfs.nameservices": "nametest",
                            "dfs.client.failover.proxy.provider.nametest": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
                            "dfs.ha.automatic-failover.enabled.nametest": "true",
                            "ha.zookeeper.quorum": "192.168.xxx.xxx:2181,192.168.xxx.xxx:2181,192.168.xxx.xxx:2181",
                            "dfs.ha.namenodes.nametest": "namenode1,namenode2",
                            "dfs.namenode.rpc-address.nametest.namenode1": "192.168.xxx.xxx:8020",
                            "dfs.namenode.servicerpc-address.nametest.namenode1": "192.168.xxx.xxx:8022",
                            "dfs.namenode.http-address.nametest.namenode1": "192.168.xxx.xxx:9870",
                            "dfs.namenode.https-address.nametest.namenode1": "192.168.xxx.xxx:9871",
                            "dfs.namenode.rpc-address.nametest.namenode2": "192.168.xxx.xxx:8020",
                            "dfs.namenode.servicerpc-address.nametest.namenode2": "192.168.xxx.xxx:8022",
                            "dfs.namenode.http-address.nametest.namenode2": "192.168.xxx.xxx:9870",
                            "dfs.namenode.https-address.nametest.namenode2": "192.168.xxx.xxx:9871"
                        },
                        "encoding": "UTF-8",
                        "fieldDelimiter": ",",
                        "fileType": "orc",
                        "path": "/apps/hive/warehouse/ods.db/student/dt=${dt}/*"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [
                            "CompanyCode1",
                            "adate1",
                            "gid1",
                            "code1",
                            "total1",
                            "bcktotal1",
                            "memo1",
                            "dt"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.xxx.xxx:3306/库名?autoReconnect=true&useSSL=false",
                                "table": [
                                    "表名"
                                ]
                            }
                        ],
                        "password": "密码",
                        "preSql": [],
                        "session": [],
                        "username": "用户",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}

(3) Run the sync, supplying the ${dt} parameter with -p

cd /opt/module/datax/bin/
python datax.py -p "-Ddt=2022-10-17" /opt/module/datax/data/hdfsTomysql/hdfsTomysql.json
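
In practice dt comes from a scheduler. A minimal crontab sketch for a daily 01:00 run, assuming GNU date (the schedule and paths are assumptions; note that % must be escaped as \% inside crontab entries):

0 1 * * * python /opt/module/datax/bin/datax.py -p "-Ddt=$(date -d yesterday +\%F)" /opt/module/datax/data/hdfsTomysql/hdfsTomysql.json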

9. MySQL-->Doris

(1) View the JSON template

cd /opt/module/datax/bin/
python datax.py -r mysqlreader -w doriswriter

(2) Create the JSON job file

mkdir -p /opt/module/datax/data/mysqlTodoris
vim /opt/module/datax/data/mysqlTodoris/mysqlTodoris.json

Add the following content:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [
                            "CompanyCode1",
                            "adate1",
                            "gid1",
                            "code1",
                            "total1",
                            "bcktotal1",
                            "outmemo1",
                            "memo1"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://192.168.xxx.xxx:3306/库名?autoReconnect=true&useSSL=false"
                                ],
                                "table": [
                                    "表名"
                                ]
                            }
                        ],
                        "password": "密码",
                        "username": "用户",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "doriswriter",
                    "parameter": {
                        "beLoadUrl": [
                            "192.168.xxx.xxx:8040",
                            "192.168.xxx.xxx:8040",
                            "192.168.xxx.xxx:8040",
                            "192.168.xxx.xxx:8040"
                        ],
                        "column": [
                            "CompanyCode",
                            "adate",
                            "gid",
                            "code",
                            "total",
                            "bcktotal",
                            "outmemo",
                            "memo"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.xxx.xxx:9030/",
                                "selectedDatabase": "库名",
                                "table": [
                                    "表名"
                                ]
                            }
                        ],
                        "loadProps": {},
                        "loadUrl": [
                            "192.168.xxx.xxx:8030",
                            "192.168.xxx.xxx:8030"
                        ],
                        "password": "密码",
                        "postSql": [],
                        "preSql": [],
                        "maxBatchRows" : 10000,
                        "maxBatchByteSize" : 104857600,
                        "labelPrefix": "datax_doris_writer_demo_",
                        "lineDelimiter": "\n",
                        "username": "用户"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
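
doriswriter ships rows to Doris via Stream Load: loadUrl typically lists FE HTTP addresses (port 8030 here) and beLoadUrl lists BE web-server addresses (port 8040), while loadProps forwards extra Stream Load properties. For example, to load rows as JSON instead of the default CSV-style format (an option from the doriswriter docs):

"loadProps": {
    "format": "json",
    "strip_outer_array": true
}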

(3) Run the sync

cd /opt/module/datax/bin/
python datax.py /opt/module/datax/data/mysqlTodoris/mysqlTodoris.json
