【YashanDB知识库】Kettle迁移MySQL到YashanDB

本文内容来自YashanDB官网,原文内容请见 https://www.yashandb.com/newsinfo/7396987.html?templateId=1718516

概述

由于YMP不支持Latin1字符集MySQL数据库,导致MySQL的中文数据迁移到YashanDB,变成乱码。因此使用Kettle迁移可以规避该问题。

因此本文介绍了两种环境的Kettle使用方式进行数据迁移。这两种环境分别是Windows环境和Linux环境:Windows环境可以使用图形界面,便于调试;Linux环境一般处于源或者目标数据库直接相连的网络,所以网络性能最佳。请根据自己的环境和业务需要,使用最佳执行方式进行数据迁移。

环境

Kettle版本:8.3

JAVA版本:1.8

源MySQL:版本5.7,字符集Latin1

目标YashanDB:23.2.1.100

执行方式

Kettle既可以在Windows执行,也可以在Linux执行,请根据自己的环境和业务需要,使用最佳执行方式进行数据迁移。

Windows执行Kettle

1、确保Windows当前JAVA环境是JAVA 1.8。如果当前Windows环境存在多个JAVA版本,而默认JAVA环境不是JAVA 1.8,则可以通过Windows环境变量保证Kettle使用JAVA 1.8,例如示例:

PENTAHO_JAVA设置为C:\Program Files\Java\jre-1.8\bin\java.exe

PENTAHO_JAVA_HOME设置为C:\Program Files\Java\jre-1.8

2、解压作业zip包 - job_kettle_MySQL2YashanDB.zip,放置在Kettle所在目录。

3、在Kettle所在目录运行Spoon.bat,启动图形操作界面。

4、打开syncData_MySQL_YashanDB.kjb,这个任务是总任务,它封装了多个子任务用于从mysql迁移数据到崖山。

5、打开getDatas_MySQL_YashanDB,这个任务是实际执行从mysql迁移数据到崖山的任务,封装了DB连接,需要根据实际环境进行调整和测试,确保后续配置DB连接MySQLInput和YashanOutput的步骤都测试成功。

6、修改getDatas_MySQL_YashanDB的DB连接MySQLInput,连接类型使用MySQL,然后输入MySQL的主机地址、数据库名字、端口号、用户名和密码,点击“测试”,测试通过后选择“确认”。

7、修改getDatas_MySQL_YashanDB的DB连接YashanOutput,连接类型使用Generic database,连接方式使用Native(JDBC),设置自定义JDBC URL(jdbc:yasdb://YashanDB_IP:YashanDB_PORT/YashanDB_User)和JDBC驱动类名称,并输入用户名和密码,点击“测试”,测试通过后选择“确认”。

8、修改DB连接MySQLInput和YashanOutput后,getDatas_MySQL_YashanDB配置处于未保存状态,请务必对该任务进行保存。

9、修改tablelist_MySQL_YashanDB.csv,这个配置文件用于指定mysql需要迁移的库表名,注意这里需要和mysql的库表大小写保持一致,例如:

owner,table_name

test,sys_log

注意:1) owner这一列对应于mysql的database;2) tablelist_MySQL_YashanDB.csv最后一行不能为空行。

10、切换回syncData_MySQL_YashanDB.kjb,执行运行即可。

11、确保数据迁移作业成功完成,如果迁移失败,可查看运行日志,定位原因并修复问题,然后重试迁移即可(注意:每次同步之前会将目标库中目标表truncate,所以保证重试不会存在数据重复或者冲突的问题)。

Linux执行Kettle

1、确保Linux当前JAVA环境是JAVA 1.8。

[yashan@mysql57 ~]$ java -version

java version "1.8.0_381"

Java(TM) SE Runtime Environment (build 1.8.0_381-b09)

Java HotSpot(TM) 64-Bit Server VM (build 25.381-b09, mixed mode)

2、解压作业zip包 - job_kettle_MySQL2YashanDB.zip,放置在Kettle所在目录。

3、syncData_MySQL_YashanDB.kjb是总任务,它封装了多个子任务用于从mysql迁移数据到崖山。

4、getDatas_MySQL_YashanDB.ktr是实际执行从mysql迁移数据到崖山的任务,封装了DB连接,需要根据实际环境调整DB连接MySQLInput和YashanOutput。

5、修改getDatas_MySQL_YashanDB.ktr的DB连接MySQLInput配置。

例如,根据MySQL的连接mysql -h 127.0.0.1 -P 3306 -u root -pwelcome1 -D test

[yashan@mysql57 ~]$ mysql -h 127.0.0.1 -P 3306 -u root -pwelcome1 -D test

mysql: [Warning] Using a password on the command line interface can be insecure.

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

 

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 18

Server version: 5.7.41 MySQL Community Server (GPL)

 

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

 

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

 

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

 

mysql> exit;

Bye

[yashan@mysql57 ~]$

调整MySQLInput的主机地址、数据库名字、端口号、用户名和密码:

XML

   <connection>

    <name>MySQLInput</name>

    <server>127.0.0.1</server>

    <type>MYSQL</type>

    <access>Native</access>

    <database>test</database>

    <port>3306</port>

    <username>root</username>

    <password>welcome1</password>

    <servername/>

    <data_tablespace/>

    <index_tablespace/>

    <attributes>

      <attribute>

        <code>FORCE_IDENTIFIERS_TO_LOWERCASE</code>

        <attribute>N</attribute>

      </attribute>

      <attribute>

        <code>FORCE_IDENTIFIERS_TO_UPPERCASE</code>

        <attribute>N</attribute>

      </attribute>

      <attribute>

        <code>IS_CLUSTERED</code>

        <attribute>N</attribute>

      </attribute>

      <attribute>

        <code>PORT_NUMBER</code>

        <attribute>3306</attribute>

      </attribute>

      <attribute>

        <code>PRESERVE_RESERVED_WORD_CASE</code>

        <attribute>N</attribute>

      </attribute>

      <attribute>

        <code>QUOTE_ALL_FIELDS</code>

        <attribute>N</attribute>

      </attribute>

      <attribute>

        <code>STREAM_RESULTS</code>

        <attribute>Y</attribute>

      </attribute>

      <attribute>

        <code>SUPPORTS_BOOLEAN_DATA_TYPE</code>

        <attribute>N</attribute>

      </attribute>

      <attribute>

        <code>SUPPORTS_TIMESTAMP_DATA_TYPE</code>

        <attribute>N</attribute>

      </attribute>

      <attribute>

        <code>USE_POOLING</code>

        <attribute>N</attribute>

      </attribute>

    </attributes>

  </connection>

![](https://img2024.cnblogs.com/blog/3434249/202502/3434249-20250208114803908-2005395037.png)

6、修改getDatas_MySQL_YashanDB.ktr的DB连接YashanOutput配置。

例如,根据YashanDB的连接yasql test/yasdb_123@127.0.0.1:1688

[yashan@mysql57 ~]$ yasql test/yasdb_123@127.0.0.1:1688

YashanDB SQL Enterprise Edition Release 23.2.1.100 x86_64

 

Connected to:

YashanDB Server Enterprise Edition Release 23.2.1.100 x86_64 - X86 64bit Linux

 

SQL> exit;

[yashan@mysql57 ~]$

调整YashanOutput的下列参数:

连接类型GENERIC

连接方式Native

端口号1521

用户名test

密码yasdb_123

属性-自定义JDBC连接串jdbc:yasdb://127.0.0.1:1688/test

属性-自定义JDBC驱动类名称com.yashandb.jdbc.Driver

属性-数据库方言Generic database

属性-端口1688

注意:JDBC连接串和类型请参考yashanDB官方文档:YashanDB JDBC驱动使用介绍

XML

  <connection>

    <name>YashanOutput</name>

    <server/>

    <type>GENERIC</type>

    <access>Native</access>

    <database/>

    <port>1521</port>

    <username>test</username>

    <password>yasdb_123</password>

    <servername/>

    <data_tablespace/>

    <index_tablespace/>

    <attributes>

      <attribute>

        <code>CUSTOM_DRIVER_CLASS</code>

        <attribute>com.yashandb.jdbc.Driver</attribute>

      </attribute>

      <attribute>

        <code>CUSTOM_URL</code>

        <attribute>jdbc:yasdb://127.0.0.1:1688/test</attribute>

      </attribute>

      <attribute>

        <code>DATABASE_DIALECT_ID</code>

        <attribute>Generic database</attribute>

      </attribute>

      <attribute>

        <code>FORCE_IDENTIFIERS_TO_LOWERCASE</code>

        <attribute>N</attribute>

      </attribute>

      <attribute>

        <code>FORCE_IDENTIFIERS_TO_UPPERCASE</code>

        <attribute>N</attribute>

      </attribute>

      <attribute>

        <code>INITIAL_POOL_SIZE</code>

        <attribute>20</attribute>

      </attribute>

      <attribute>

        <code>IS_CLUSTERED</code>

        <attribute>N</attribute>

      </attribute>

      <attribute>

        <code>MAXIMUM_POOL_SIZE</code>

        <attribute>20</attribute>

      </attribute>

      <attribute>

        <code>PORT_NUMBER</code>

        <attribute>1688</attribute>

      </attribute>

      <attribute>

        <code>PRESERVE_RESERVED_WORD_CASE</code>

        <attribute>N</attribute>

      </attribute>

      <attribute>

        <code>QUOTE_ALL_FIELDS</code>

        <attribute>N</attribute>

      </attribute>

      <attribute>

        <code>SUPPORTS_BOOLEAN_DATA_TYPE</code>

        <attribute>Y</attribute>

      </attribute>

      <attribute>

        <code>SUPPORTS_TIMESTAMP_DATA_TYPE</code>

        <attribute>Y</attribute>

      </attribute>

      <attribute>

        <code>USE_POOLING</code>

        <attribute>Y</attribute>

      </attribute>

    </attributes>

  </connection>

7、修改tablelist_MySQL_YashanDB.csv,这个配置文件用于指定mysql需要迁移的库表名,注意这里需要和mysql的库表大小写保持一致,例如:

owner,table_name

test,sys_log

注意:1) owner这一列对应于mysql的database;2) tablelist_MySQL_YashanDB.csv最后一行不能为空行。

8、运行以下命令即可。

Chmod 755 *.sh

./kitchen.sh -file=syncData_MySQL_YashanDB.kjb

9、确保数据迁移作业成功完成,如果迁移失败,可查看运行日志,定位原因并修复问题,然后重试迁移即可(注意:每次同步之前会将目标库中目标表truncate,所以保证重试不会存在数据重复或者冲突的问题)。

Bash

[yashan@mysql57 data-integration]$ ./kitchen.sh -file=syncData_MySQL_YashanDB.kjb

#######################################################################

WARNING: no libwebkitgtk-1.0 detected, some features will be unavailable

    Consider installing the package with apt-get or yum.

    e.g. 'sudo apt-get install libwebkitgtk-1.0-0'

#######################################################################

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0

12:34:56,842 INFO [KarafBoot] Checking to see if org.pentaho.clean.karaf.cache is enabled

12:34:56,898 INFO [KarafInstance]

*******************************************************************************

*** Karaf Instance Number: 1 at /home/yashan/Kettle8/data-integration/./sys ***

*** tem/karaf/caches/kitchen/data-1 ***

*** FastBin Provider Port:52901 ***

*** Karaf Port:8802 ***

*** OSGI Service Port:9051 ***

*******************************************************************************

Jun 01, 2024 12:34:57 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired

INFO: Lock acquired. Setting startlevel to 100

2024/06/01 12:34:57 - Kitchen - Start of run.

D:\poc\data-integration2\system\karaf/deploy does not exist, please create it.

Root path does not exist: /home/yashan/Kettle8/data-integration/D:\poc\data-integration2\system\karaf/deploy

*ERROR* [org.osgi.service.cm.ManagedService, id=255, bundle=52/mvn:org.apache.aries.transaction/org.apache.aries.transaction.manager/1.1.1]: Unexpected problem updating configuration org.apache.aries.transaction

java.lang.ExceptionInInitializerError

        at org.apache.aries.transaction.internal.TransactionManagerService.<init>(TransactionManagerService.java:114)

        at org.apache.aries.transaction.internal.Activator.updated(Activator.java:63)

        at org.apache.felix.cm.impl.helper.ManagedServiceTracker.updateService(ManagedServiceTracker.java:148)

        at org.apache.felix.cm.impl.helper.ManagedServiceTracker.provideConfiguration(ManagedServiceTracker.java:81)

        at org.apache.felix.cm.impl.ConfigurationManager$ManagedServiceUpdate.provide(ConfigurationManager.java:1448)

        at org.apache.felix.cm.impl.ConfigurationManager$ManagedServiceUpdate.run(ConfigurationManager.java:1404)

        at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)

        at java.lang.Thread.run(Thread.java:750)

Caused by: java.util.MissingResourceException: Can't find bundle for base name org.apache.aries.transaction.txManager, locale en_US

        at java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1581)

        at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1396)

        at java.util.ResourceBundle.getBundle(ResourceBundle.java:1091)

        at org.apache.aries.util.nls.MessageUtil.createMessageUtil(MessageUtil.java:152)

        at org.apache.aries.util.nls.MessageUtil.createMessageUtil(MessageUtil.java:107)

        at org.apache.aries.transaction.internal.NLS.<clinit>(NLS.java:25)

        ... 8 more

2024-06-01 12:34:59.079:INFO:oejs.Server:jetty-8.1.15.v20140411

2024-06-01 12:34:59.111:INFO:oejs.AbstractConnector:Started NIOSocketConnectorWrapper@0.0.0.0:9051

Jun 01, 2024 12:34:59 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener addExtensions

INFO: Adding the extensions from bundle org.apache.cxf.cxf-rt-management (182) [org.apache.cxf.management.InstrumentationManager]

Jun 01, 2024 12:34:59 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener addExtensions

INFO: Adding the extensions from bundle org.apache.cxf.cxf-rt-transports-http (183) [org.apache.cxf.transport.http.HTTPTransportFactory, org.apache.cxf.transport.http.HTTPWSDLExtensionLoader, org.apache.cxf.transport.http.policy.HTTPClientAssertionBuilder, org.apache.cxf.transport.http.policy.HTTPServerAssertionBuilder, org.apache.cxf.transport.http.policy.NoOpPolicyInterceptorProvider]

Jun 01, 2024 12:34:59 PM org.pentaho.caching.impl.PentahoCacheManagerFactory$RegistrationHandler$1 onSuccess

INFO: New Caching Service registered

Jun 01, 2024 12:34:59 PM org.pentaho.caching.impl.PentahoCacheManagerFactory$RegistrationHandler$1 onSuccess

INFO: New Caching Service registered

Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.

2024/06/01 12:35:01 - syncData_MySQL_YashanDB - Start of job execution

2024/06/01 12:35:01 - syncData_MySQL_YashanDB - Starting entry [获取同步表列表]

2024/06/01 12:35:01 - 获取同步表列表 - Using run configuration [Pentaho local]

2024/06/01 12:35:01 - 获取同步表列表 - Using legacy execution engine

2024/06/01 12:35:01 - getTables_MySQL_YashanDB - Dispatching started for transformation [getTables_MySQL_YashanDB]

2024/06/01 12:35:01 - CSV文件输入.0 - Header row skipped in file 'file:///home/yashan/Kettle8/data-integration\tablelist_MySQL_YashanDB.csv'

2024/06/01 12:35:01 - CSV文件输入.0 - Finished processing (I=2, O=0, R=0, W=1, U=0, E=0)

2024/06/01 12:35:01 - 字段选择.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)

2024/06/01 12:35:01 - 复制记录到结果.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)

2024/06/01 12:35:01 - syncData_MySQL_YashanDB - Starting entry [执行同步]

2024/06/01 12:35:01 - 执行同步 - Using run configuration [Pentaho local]

2024/06/01 12:35:01 - jobDatas_MySQL_YashanDB - Starting entry [获取表变量]

2024/06/01 12:35:01 - 获取表变量 - Using run configuration [Pentaho local]

2024/06/01 12:35:01 - 获取表变量 - Using legacy execution engine

2024/06/01 12:35:01 - getResults - Dispatching started for transformation [getResults]

2024/06/01 12:35:01 - 从结果获取记录.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)

2024/06/01 12:35:01 - 设置变量.0 - Setting environment variables...

2024/06/01 12:35:01 - 设置变量.0 - Set variable OWNER to value [test]

2024/06/01 12:35:01 - 设置变量.0 - Set variable TABLE_NAME to value [sys_log]

2024/06/01 12:35:01 - 设置变量.0 - Finished after 1 rows.

2024/06/01 12:35:01 - 设置变量.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)

2024/06/01 12:35:01 - jobDatas_MySQL_YashanDB - Starting entry [写入数据到目标表]

2024/06/01 12:35:02 - 写入数据到目标表 - Using run configuration [Pentaho local]

2024/06/01 12:35:02 - 写入数据到目标表 - Using legacy execution engine

2024/06/01 12:35:02 - getDatas_MySQL_YashanDB - Dispatching started for transformation [getDatas_MySQL_YashanDB]

2024/06/01 12:35:02 - YashanOutput - Creating database connection pool for 'YashanOutput'...

2024/06/01 12:35:02 - YashanOutput - Successfully created database connection pool for 'YashanOutput'

2024/06/01 12:35:02 - 表输出.0 - Connected to database [YashanOutput] (commit=20000)

2024/06/01 12:35:03 - 执行SQL脚本.0 - Finished reading query, closing connection.

2024/06/01 12:35:03 - 执行SQL脚本.0 - Finished processing (I=0, O=0, R=0, W=1, U=0, E=0)

2024/06/01 12:35:03 - 表输入.0 - Finished reading query, closing connection.

2024/06/01 12:35:03 - 表输入.0 - Finished processing (I=9, O=0, R=0, W=9, U=0, E=0)

2024/06/01 12:35:03 - 表输出.0 - Finished processing (I=0, O=9, R=9, W=9, U=0, E=0)

2024/06/01 12:35:03 - jobDatas_MySQL_YashanDB - Finished job entry [写入数据到目标表] (result=[true])

2024/06/01 12:35:03 - jobDatas_MySQL_YashanDB - Finished job entry [获取表变量] (result=[true])

2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Starting entry [成功]

2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Finished job entry [成功] (result=[true])

2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Finished job entry [执行同步] (result=[true])

2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Finished job entry [获取同步表列表] (result=[true])

2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Job execution finished

2024/06/01 12:35:03 - Kitchen - Finished!

2024/06/01 12:35:03 - Kitchen - Start=2024/06/01 12:34:57.418, Stop=2024/06/01 12:35:03.414

2024/06/01 12:35:03 - Kitchen - Processing ended after 5 seconds.

posted @   YashanDB  阅读(8)  评论(0编辑  收藏  举报
编辑推荐:
· 一个费力不讨好的项目,让我损失了近一半的绩效!
· .NET Core 托管堆内存泄露/CPU异常的常见思路
· PostgreSQL 和 SQL Server 在统计信息维护中的关键差异
· C++代码改造为UTF-8编码问题的总结
· DeepSeek 解答了困扰我五年的技术问题
阅读排行:
· 一个费力不讨好的项目,让我损失了近一半的绩效!
· 清华大学推出第四讲使用 DeepSeek + DeepResearch 让科研像聊天一样简单!
· 实操Deepseek接入个人知识库
· 易语言 —— 开山篇
· CSnakes vs Python.NET:高效嵌入与灵活互通的跨语言方案对比
点击右上角即可分享
微信分享提示