【YashanDB知识库】Kettle迁移MySQL到YashanDB
本文内容来自YashanDB官网,原文内容请见 https://www.yashandb.com/newsinfo/7396987.html?templateId=1718516
概述
由于YMP不支持Latin1字符集MySQL数据库,导致MySQL的中文数据迁移到YashanDB,变成乱码。因此使用Kettle迁移可以规避该问题。
因此本文介绍了两种环境的Kettle使用方式进行数据迁移。这两种环境分别是Windows环境和Linux环境:Windows环境可以使用图形界面,便于调试;Linux环境一般处于源或者目标数据库直接相连的网络,所以网络性能最佳。请根据自己的环境和业务需要,使用最佳执行方式进行数据迁移。
环境
Kettle版本:8.3
JAVA版本:1.8
源MySQL:版本5.7,字符集Latin1
目标YashanDB:23.2.1.100
执行方式
Kettle既可以在Windows执行,也可以在Linux执行,请根据自己的环境和业务需要,使用最佳执行方式进行数据迁移。
Windows执行Kettle
1、确保Windows当前JAVA环境是JAVA 1.8。如果当前Windows环境存在多个JAVA版本,而默认JAVA环境不是JAVA 1.8,则可以通过Windows环境变量保证Kettle使用JAVA 1.8,例如示例:
PENTAHO_JAVA设置为C:\Program Files\Java\jre-1.8\bin\java.exe
PENTAHO_JAVA_HOME设置为C:\Program Files\Java\jre-1.8
2、解压作业zip包 - job_kettle_MySQL2YashanDB.zip,放置在Kettle所在目录。
3、在Kettle所在目录运行Spoon.bat,启动图形操作界面。
4、打开syncData_MySQL_YashanDB.kjb,这个任务是总任务,它封装了多个子任务用于从mysql迁移数据到崖山。
5、打开getDatas_MySQL_YashanDB,这个任务是实际执行从mysql迁移数据到崖山的任务,封装了DB连接,需要根据实际环境进行调整和测试,确保后续配置DB连接MySQLInput和YashanOutput的步骤都测试成功。
6、修改getDatas_MySQL_YashanDB的DB连接MySQLInput,连接类型使用MySQL,然后输入MySQL的主机地址、数据库名字、端口号、用户名和密码,点击“测试”,测试通过后选择“确认”。
7、修改getDatas_MySQL_YashanDB的DB连接YashanOutput,连接类型使用Generic database,连接方式使用Native(JDBC),设置自定义JDBC URL(jdbc:yasdb://YashanDB_IP:YashanDB_PORT/YashanDB_User)和JDBC驱动类名称,并输入用户名和密码,点击“测试”,测试通过后选择“确认”。
8、修改DB连接MySQLInput和YashanOutput后,getDatas_MySQL_YashanDB配置处于未保存状态,请务必对该任务进行保存。
9、修改tablelist_MySQL_YashanDB.csv,这个配置文件用于指定mysql需要迁移的库表名,注意这里需要和mysql的库表大小写保持一致,例如:
owner,table_name
test,sys_log
注意:1) owner这一列对应于mysql的database;2) tablelist_MySQL_YashanDB.csv最后一行不能为空行。
10、切换回syncData_MySQL_YashanDB.kjb,执行运行即可。
11、确保数据迁移作业成功完成,如果迁移失败,可查看运行日志,定位原因并修复问题,然后重试迁移即可(注意:每次同步之前会将目标库中目标表truncate,所以保证重试不会存在数据重复或者冲突的问题)。
Linux执行Kettle
1、确保Linux当前JAVA环境是JAVA 1.8。
[yashan@mysql57 ~]$ java -version java version "1.8.0_381" Java(TM) SE Runtime Environment (build 1.8.0_381-b09) Java HotSpot(TM) 64-Bit Server VM (build 25.381-b09, mixed mode) |
2、解压作业zip包 - job_kettle_MySQL2YashanDB.zip,放置在Kettle所在目录。
3、syncData_MySQL_YashanDB.kjb是总任务,它封装了多个子任务用于从mysql迁移数据到崖山。
4、getDatas_MySQL_YashanDB.ktr是实际执行从mysql迁移数据到崖山的任务,封装了DB连接,需要根据实际环境调整DB连接MySQLInput和YashanOutput。
5、修改getDatas_MySQL_YashanDB.ktr的DB连接MySQLInput配置。
例如,根据MySQL的连接mysql -h 127.0.0.1 -P 3306 -u root -pwelcome1 -D test
[yashan@mysql57 ~]$ mysql -h 127.0.0.1 -P 3306 -u root -pwelcome1 -D test mysql: [Warning] Using a password on the command line interface can be insecure. Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 18 Server version: 5.7.41 MySQL Community Server (GPL)
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> exit; Bye [yashan@mysql57 ~]$ |
调整MySQLInput的主机地址、数据库名字、端口号、用户名和密码:
XML <connection> <name>MySQLInput</name> <server>127.0.0.1</server> <type>MYSQL</type> <access>Native</access> <database>test</database> <port>3306</port> <username>root</username> <password>welcome1</password> <servername/> <data_tablespace/> <index_tablespace/> <attributes> <attribute> <code>FORCE_IDENTIFIERS_TO_LOWERCASE</code> <attribute>N</attribute> </attribute> <attribute> <code>FORCE_IDENTIFIERS_TO_UPPERCASE</code> <attribute>N</attribute> </attribute> <attribute> <code>IS_CLUSTERED</code> <attribute>N</attribute> </attribute> <attribute> <code>PORT_NUMBER</code> <attribute>3306</attribute> </attribute> <attribute> <code>PRESERVE_RESERVED_WORD_CASE</code> <attribute>N</attribute> </attribute> <attribute> <code>QUOTE_ALL_FIELDS</code> <attribute>N</attribute> </attribute> <attribute> <code>STREAM_RESULTS</code> <attribute>Y</attribute> </attribute> <attribute> <code>SUPPORTS_BOOLEAN_DATA_TYPE</code> <attribute>N</attribute> </attribute> <attribute> <code>SUPPORTS_TIMESTAMP_DATA_TYPE</code> <attribute>N</attribute> </attribute> <attribute> <code>USE_POOLING</code> <attribute>N</attribute> </attribute> </attributes> </connection> |
6、修改getDatas_MySQL_YashanDB.ktr的DB连接YashanOutput配置。
例如,根据YashanDB的连接yasql test/yasdb_123@127.0.0.1:1688
[yashan@mysql57 ~]$ yasql test/yasdb_123@127.0.0.1:1688 YashanDB SQL Enterprise Edition Release 23.2.1.100 x86_64
Connected to: YashanDB Server Enterprise Edition Release 23.2.1.100 x86_64 - X86 64bit Linux
SQL> exit; [yashan@mysql57 ~]$ |
调整YashanOutput的下列参数:
连接类型
连接方式
端口号
用户名
密码
属性-自定义JDBC连接串
属性-自定义JDBC驱动类名称
属性-数据库方言
属性-端口
注意:JDBC连接串和类型请参考yashanDB官方文档:YashanDB JDBC驱动使用介绍
XML <connection> <name>YashanOutput</name> <server/> <type>GENERIC</type> <access>Native</access> <database/> <port>1521</port> <username>test</username> <password>yasdb_123</password> <servername/> <data_tablespace/> <index_tablespace/> <attributes> <attribute> <code>CUSTOM_DRIVER_CLASS</code> <attribute>com.yashandb.jdbc.Driver</attribute> </attribute> <attribute> <code>CUSTOM_URL</code> <attribute>jdbc:yasdb://127.0.0.1:1688/test</attribute> </attribute> <attribute> <code>DATABASE_DIALECT_ID</code> <attribute>Generic database</attribute> </attribute> <attribute> <code>FORCE_IDENTIFIERS_TO_LOWERCASE</code> <attribute>N</attribute> </attribute> <attribute> <code>FORCE_IDENTIFIERS_TO_UPPERCASE</code> <attribute>N</attribute> </attribute> <attribute> <code>INITIAL_POOL_SIZE</code> <attribute>20</attribute> </attribute> <attribute> <code>IS_CLUSTERED</code> <attribute>N</attribute> </attribute> <attribute> <code>MAXIMUM_POOL_SIZE</code> <attribute>20</attribute> </attribute> <attribute> <code>PORT_NUMBER</code> <attribute>1688</attribute> </attribute> <attribute> <code>PRESERVE_RESERVED_WORD_CASE</code> <attribute>N</attribute> </attribute> <attribute> <code>QUOTE_ALL_FIELDS</code> <attribute>N</attribute> </attribute> <attribute> <code>SUPPORTS_BOOLEAN_DATA_TYPE</code> <attribute>Y</attribute> </attribute> <attribute> <code>SUPPORTS_TIMESTAMP_DATA_TYPE</code> <attribute>Y</attribute> </attribute> <attribute> <code>USE_POOLING</code> <attribute>Y</attribute> </attribute> </attributes> </connection> |
7、修改tablelist_MySQL_YashanDB.csv,这个配置文件用于指定mysql需要迁移的库表名,注意这里需要和mysql的库表大小写保持一致,例如:
owner,table_name
test,sys_log
注意:1) owner这一列对应于mysql的database;2) tablelist_MySQL_YashanDB.csv最后一行不能为空行。
8、运行以下命令即可。
Chmod 755 *.sh
./kitchen.sh -file=syncData_MySQL_YashanDB.kjb
9、确保数据迁移作业成功完成,如果迁移失败,可查看运行日志,定位原因并修复问题,然后重试迁移即可(注意:每次同步之前会将目标库中目标表truncate,所以保证重试不会存在数据重复或者冲突的问题)。
Bash [yashan@mysql57 data-integration]$ ./kitchen.sh -file=syncData_MySQL_YashanDB.kjb ####################################################################### WARNING: no libwebkitgtk-1.0 detected, some features will be unavailable Consider installing the package with apt-get or yum. e.g. 'sudo apt-get install libwebkitgtk-1.0-0' ####################################################################### Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0 12:34:56,842 INFO [KarafBoot] Checking to see if org.pentaho.clean.karaf.cache is enabled 12:34:56,898 INFO [KarafInstance] ******************************************************************************* *** Karaf Instance Number: 1 at /home/yashan/Kettle8/data-integration/./sys *** *** tem/karaf/caches/kitchen/data-1 *** *** FastBin Provider Port:52901 *** *** Karaf Port:8802 *** *** OSGI Service Port:9051 *** ******************************************************************************* Jun 01, 2024 12:34:57 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired INFO: Lock acquired. Setting startlevel to 100 2024/06/01 12:34:57 - Kitchen - Start of run. D:\poc\data-integration2\system\karaf/deploy does not exist, please create it. Root path does not exist: /home/yashan/Kettle8/data-integration/D:\poc\data-integration2\system\karaf/deploy *ERROR* [org.osgi.service.cm.ManagedService, id=255, bundle=52/mvn:org.apache.aries.transaction/org.apache.aries.transaction.manager/1.1.1]: Unexpected problem updating configuration org.apache.aries.transaction java.lang.ExceptionInInitializerError at org.apache.aries.transaction.internal.TransactionManagerService.<init>(TransactionManagerService.java:114) at org.apache.aries.transaction.internal.Activator.updated(Activator.java:63) at org.apache.felix.cm.impl.helper.ManagedServiceTracker.updateService(ManagedServiceTracker.java:148) at org.apache.felix.cm.impl.helper.ManagedServiceTracker.provideConfiguration(ManagedServiceTracker.java:81) at org.apache.felix.cm.impl.ConfigurationManager$ManagedServiceUpdate.provide(ConfigurationManager.java:1448) at org.apache.felix.cm.impl.ConfigurationManager$ManagedServiceUpdate.run(ConfigurationManager.java:1404) at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103) at java.lang.Thread.run(Thread.java:750) Caused by: java.util.MissingResourceException: Can't find bundle for base name org.apache.aries.transaction.txManager, locale en_US at java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1581) at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1396) at java.util.ResourceBundle.getBundle(ResourceBundle.java:1091) at org.apache.aries.util.nls.MessageUtil.createMessageUtil(MessageUtil.java:152) at org.apache.aries.util.nls.MessageUtil.createMessageUtil(MessageUtil.java:107) at org.apache.aries.transaction.internal.NLS.<clinit>(NLS.java:25) ... 8 more 2024-06-01 12:34:59.079:INFO:oejs.Server:jetty-8.1.15.v20140411 2024-06-01 12:34:59.111:INFO:oejs.AbstractConnector:Started NIOSocketConnectorWrapper@0.0.0.0:9051 Jun 01, 2024 12:34:59 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener addExtensions INFO: Adding the extensions from bundle org.apache.cxf.cxf-rt-management (182) [org.apache.cxf.management.InstrumentationManager] Jun 01, 2024 12:34:59 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener addExtensions INFO: Adding the extensions from bundle org.apache.cxf.cxf-rt-transports-http (183) [org.apache.cxf.transport.http.HTTPTransportFactory, org.apache.cxf.transport.http.HTTPWSDLExtensionLoader, org.apache.cxf.transport.http.policy.HTTPClientAssertionBuilder, org.apache.cxf.transport.http.policy.HTTPServerAssertionBuilder, org.apache.cxf.transport.http.policy.NoOpPolicyInterceptorProvider] Jun 01, 2024 12:34:59 PM org.pentaho.caching.impl.PentahoCacheManagerFactory$RegistrationHandler$1 onSuccess INFO: New Caching Service registered Jun 01, 2024 12:34:59 PM org.pentaho.caching.impl.PentahoCacheManagerFactory$RegistrationHandler$1 onSuccess INFO: New Caching Service registered Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary. 2024/06/01 12:35:01 - syncData_MySQL_YashanDB - Start of job execution 2024/06/01 12:35:01 - syncData_MySQL_YashanDB - Starting entry [获取同步表列表] 2024/06/01 12:35:01 - 获取同步表列表 - Using run configuration [Pentaho local] 2024/06/01 12:35:01 - 获取同步表列表 - Using legacy execution engine 2024/06/01 12:35:01 - getTables_MySQL_YashanDB - Dispatching started for transformation [getTables_MySQL_YashanDB] 2024/06/01 12:35:01 - CSV文件输入.0 - Header row skipped in file 'file:///home/yashan/Kettle8/data-integration\tablelist_MySQL_YashanDB.csv' 2024/06/01 12:35:01 - CSV文件输入.0 - Finished processing (I=2, O=0, R=0, W=1, U=0, E=0) 2024/06/01 12:35:01 - 字段选择.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0) 2024/06/01 12:35:01 - 复制记录到结果.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0) 2024/06/01 12:35:01 - syncData_MySQL_YashanDB - Starting entry [执行同步] 2024/06/01 12:35:01 - 执行同步 - Using run configuration [Pentaho local] 2024/06/01 12:35:01 - jobDatas_MySQL_YashanDB - Starting entry [获取表变量] 2024/06/01 12:35:01 - 获取表变量 - Using run configuration [Pentaho local] 2024/06/01 12:35:01 - 获取表变量 - Using legacy execution engine 2024/06/01 12:35:01 - getResults - Dispatching started for transformation [getResults] 2024/06/01 12:35:01 - 从结果获取记录.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0) 2024/06/01 12:35:01 - 设置变量.0 - Setting environment variables... 2024/06/01 12:35:01 - 设置变量.0 - Set variable OWNER to value [test] 2024/06/01 12:35:01 - 设置变量.0 - Set variable TABLE_NAME to value [sys_log] 2024/06/01 12:35:01 - 设置变量.0 - Finished after 1 rows. 2024/06/01 12:35:01 - 设置变量.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0) 2024/06/01 12:35:01 - jobDatas_MySQL_YashanDB - Starting entry [写入数据到目标表] 2024/06/01 12:35:02 - 写入数据到目标表 - Using run configuration [Pentaho local] 2024/06/01 12:35:02 - 写入数据到目标表 - Using legacy execution engine 2024/06/01 12:35:02 - getDatas_MySQL_YashanDB - Dispatching started for transformation [getDatas_MySQL_YashanDB] 2024/06/01 12:35:02 - YashanOutput - Creating database connection pool for 'YashanOutput'... 2024/06/01 12:35:02 - YashanOutput - Successfully created database connection pool for 'YashanOutput' 2024/06/01 12:35:02 - 表输出.0 - Connected to database [YashanOutput] (commit=20000) 2024/06/01 12:35:03 - 执行SQL脚本.0 - Finished reading query, closing connection. 2024/06/01 12:35:03 - 执行SQL脚本.0 - Finished processing (I=0, O=0, R=0, W=1, U=0, E=0) 2024/06/01 12:35:03 - 表输入.0 - Finished reading query, closing connection. 2024/06/01 12:35:03 - 表输入.0 - Finished processing (I=9, O=0, R=0, W=9, U=0, E=0) 2024/06/01 12:35:03 - 表输出.0 - Finished processing (I=0, O=9, R=9, W=9, U=0, E=0) 2024/06/01 12:35:03 - jobDatas_MySQL_YashanDB - Finished job entry [写入数据到目标表] (result=[true]) 2024/06/01 12:35:03 - jobDatas_MySQL_YashanDB - Finished job entry [获取表变量] (result=[true]) 2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Starting entry [成功] 2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Finished job entry [成功] (result=[true]) 2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Finished job entry [执行同步] (result=[true]) 2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Finished job entry [获取同步表列表] (result=[true]) 2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Job execution finished 2024/06/01 12:35:03 - Kitchen - Finished! 2024/06/01 12:35:03 - Kitchen - Start=2024/06/01 12:34:57.418, Stop=2024/06/01 12:35:03.414 2024/06/01 12:35:03 - Kitchen - Processing ended after 5 seconds. |
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】凌霞软件回馈社区,博客园 & 1Panel & Halo 联合会员上线
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】博客园社区专享云产品让利特惠,阿里云新客6.5折上折
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 一个费力不讨好的项目,让我损失了近一半的绩效!
· .NET Core 托管堆内存泄露/CPU异常的常见思路
· PostgreSQL 和 SQL Server 在统计信息维护中的关键差异
· C++代码改造为UTF-8编码问题的总结
· DeepSeek 解答了困扰我五年的技术问题
· 一个费力不讨好的项目,让我损失了近一半的绩效!
· 清华大学推出第四讲使用 DeepSeek + DeepResearch 让科研像聊天一样简单!
· 实操Deepseek接入个人知识库
· 易语言 —— 开山篇
· CSnakes vs Python.NET:高效嵌入与灵活互通的跨语言方案对比