Kettle 抽取源表数据加载到目标表中

目录

 

前景

目的

步骤说明

1. 获取目标库信息

2. 创建输出步骤,将数据输出到目标表

3. 将输出步骤和上一步关联起来

精简版代码

完整代码


前景

初始化数据和环境等参考:https://blog.csdn.net/lw18751836671/article/details/103593363

目的

从mysql的源表[etl_src_table]中抽取数据到目标表[etl_dest_table] ,两个表的结构都是相同的。

CREATE TABLE `etl_dest_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL,
  `age` int(3) DEFAULT NULL,
  `mail` varchar(40) DEFAULT NULL,
  `phone` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;

 

步骤说明

在上一篇kettle介绍中是将替换步骤[replaceStringStepMeta]添加到了json步骤中,然后输出为json,如果要加载到目标表中,那么就不需要将替换步骤和json输出步骤连接起来,而是将替换步骤和表输出步骤连接。

1. 获取目标库信息

DatabaseConn destDB = extractBean.getDestDB();
		final String destDBName = "destDB";
		DatabaseMeta destDbMeta = new DatabaseMeta(destDBName, destDB.getType(),
				destDB.getAccess(), destDB.getServer(), destDB.getDatabase(),
				destDB.getPort(), destDB.getUsername(), destDB.getPassword());

2. 创建输出步骤,将数据输出到目标表

TableOutputMeta tableOutputMeta = new TableOutputMeta();
		tableOutputMeta.setDatabaseMeta(destDbMeta);
		
		//设置目标表的 schema和表名
		tableOutputMeta.setSchemaName(null);
		tableOutputMeta.setTablename(extractBean.getDestTable());
		
		//指定目标表数据库字段
		tableOutputMeta.setSpecifyFields(true);
		/**
		 * 在此处如果是oracle中含有大字段的有一个问题:
		 * ORA-24816: 在实际的 LONG 或 LOB 列之后提供了扩展的非 LONG 绑定数据
		解决方法:在实际的 LONG 或 LOB 列之后提供了扩展的非 LONG 绑定数据错误,
		这个错误是因为在绑定参数时把数据库中字段类型为LONG的字段放置在其他字段前设置了,
		只要将类型为Lob的字段绑定参数时放在最后设置即可解决。
		
		即设置  destFields为:  long ,clob,blob这样的顺序,long类型等不能放在clob后面
		 */
		tableOutputMeta.setFieldStream(extractBean.getDestFields());
		tableOutputMeta.setFieldDatabase(extractBean.getDestFields());
		
		String tableOutputPluginId = registry.getPluginId(StepPluginType.class, tableOutputMeta);
		StepMeta tableOutputStep = new StepMeta(tableOutputPluginId, "tableOutput" , (StepMetaInterface) tableOutputMeta);
		
		//将步骤添加进去
		transMeta.addStep(tableOutputStep);

3. 将输出步骤和上一步关联起来

//将步骤和上一步关联起来
transMeta.addTransHop(new TransHopMeta(replaceStringStepMeta, tableOutputStep));

执行方法后,就会将etl_src_table中数据抽取加载到etl_dest_table表中

打印日志如下: 

九月 18, 2020 11:31:41 上午 org.apache.commons.vfs.impl.StandardFileSystemManager info
信息: Using "C:\Users\ADMINI~1\AppData\Local\Temp\vfs_cache" as temporary files store.
select name,age,mail,phone from etl_src_table
2020/09/18 11:31:42 - 全量抽取数据 - 为了转换解除补丁开始  [全量抽取数据]
Fri Sep 18 11:31:42 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri Sep 18 11:31:42 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020/09/18 11:31:42 - tableOutput.0 - Connected to database [destDB] (commit=1000)
2020/09/18 11:31:55 - tableInput.0 - Finished reading query, closing connection.
2020/09/18 11:31:55 - tableInput.0 - 完成处理 (I=2, O=0, R=0, W=2, U=0, E=0
2020/09/18 11:31:55 - selectValues.0 - 完成处理 (I=0, O=0, R=2, W=2, U=0, E=0
2020/09/18 11:31:55 - replaceString.0 - 完成处理 (I=0, O=0, R=2, W=2, U=0, E=0
2020/09/18 11:31:55 - tableOutput.0 - 完成处理 (I=0, O=2, R=2, W=2, U=0, E=0
转换执行完成
抽取加载数据完成

 

 

 

 

 

 

 

 

 

 

精简版代码

没有替换变量,选择字段等,纯粹的关联 input和output步骤

//将步骤和上一步关联起来
transMeta.addTransHop(new TransHopMeta(tableInputStepMeta, tableOutputStep));

完整的方法如下 

/**
	 * 5. 将源表数据拷贝到目标表中
	 * @throws KettleXMLException
	 */
	@Test
	public void copySrc2Dest() throws KettleXMLException{
		ExtractBean extractBean = CopyTable.this.extractBean;

		TransMeta transMeta = new TransMeta();
		transMeta.setName("全量抽取数据");

		DatabaseConn srcDB = extractBean.getSrcDB();
		final String srcDBName = "srcDB";
		DatabaseMeta dbMeta = new DatabaseMeta(srcDBName, srcDB.getType(),
				srcDB.getAccess(), srcDB.getServer(), srcDB.getDatabase(),
				srcDB.getPort(), srcDB.getUsername(), srcDB.getPassword());

		dbMeta.setConnectSQL(MessageFormat.format("set names ''{0}'';",
				new Object[] { "utf8" }));

		transMeta.addDatabase(dbMeta);

		
		PluginRegistry registry = PluginRegistry.getInstance();
		TableInputMeta tableInputMeta = new TableInputMeta();
		String tableInputPluginId = registry.getPluginId(StepPluginType.class,
				tableInputMeta);
		/*
		 *  1. 给表输入添加一个DatabaseMeta连接数据库
		 */
		tableInputMeta.setDatabaseMeta(transMeta.findDatabase(srcDBName));
		/*
		 * 2.  构造查询SQL
		 */
		String selectSql = "select {0} from {1}";
		selectSql = MessageFormat.format(
				selectSql,
				new Object[] {
						StringUtils.join(extractBean.getSrcFields(), ","),
						extractBean.getSrcTable()[0] });
		tableInputMeta.setSQL(selectSql);

		// 打印查询SQL
		System.out.println(selectSql);

		/*
		 * 4. 将TableInputMeta添加到转换中
		 */
		StepMeta tableInputStepMeta = new StepMeta(tableInputPluginId,
				"tableInput", (StepMetaInterface) tableInputMeta);
		transMeta.addStep(tableInputStepMeta);

		/*
		 * 10. 获取目标库信息
		 */
		DatabaseConn destDB = extractBean.getDestDB();
		final String destDBName = "destDB";
		DatabaseMeta destDbMeta = new DatabaseMeta(destDBName, destDB.getType(),
				destDB.getAccess(), destDB.getServer(), destDB.getDatabase(),
				destDB.getPort(), destDB.getUsername(), destDB.getPassword());
		
		/*
		 * 12.输出到表中
		 */
		TableOutputMeta tableOutputMeta = new TableOutputMeta();
		tableOutputMeta.setDatabaseMeta(destDbMeta);
		
		//设置目标表的 schema和表名
		tableOutputMeta.setSchemaName(null);
		tableOutputMeta.setTablename(extractBean.getDestTable());
		
		//指定目标表数据库字段
		tableOutputMeta.setSpecifyFields(true);
		/**
		 * 在此处如果是oracle中含有大字段的有一个问题:
		 * ORA-24816: 在实际的 LONG 或 LOB 列之后提供了扩展的非 LONG 绑定数据
		解决方法:在实际的 LONG 或 LOB 列之后提供了扩展的非 LONG 绑定数据错误,
		这个错误是因为在绑定参数时把数据库中字段类型为LONG的字段放置在其他字段前设置了,
		只要将类型为Lob的字段绑定参数时放在最后设置即可解决。
		
		即设置  destFields为:  long ,clob,blob这样的顺序,long类型等不能放在clob后面
		 */
		tableOutputMeta.setFieldStream(extractBean.getDestFields());
		tableOutputMeta.setFieldDatabase(extractBean.getDestFields());
		
		String tableOutputPluginId = registry.getPluginId(StepPluginType.class, tableOutputMeta);
		StepMeta tableOutputStep = new StepMeta(tableOutputPluginId, "tableOutput" , (StepMetaInterface) tableOutputMeta);
		
		//将步骤添加进去
		transMeta.addStep(tableOutputStep);
		
		//将步骤和上一步关联起来
		transMeta.addTransHop(new TransHopMeta(tableInputStepMeta, tableOutputStep));

		Trans trans = new Trans(transMeta);
		trans.setMonitored(true);
		trans.setInitializing(true);
		trans.setPreparing(true);
		trans.setLogLevel(LogLevel.BASIC);
		trans.setRunning(true);
		trans.setSafeModeEnabled(true);

		trans.addTransListener(new TransAdapter() {
			@Override
			public void transFinished(Trans trans) {
				System.out.println("转换执行完成");
			}
		});

		try {
			// trans.startThreads();
			trans.execute(null);
		} catch (KettleException e) {
			e.printStackTrace();
		}
		trans.waitUntilFinished();
		if (trans.getErrors() > 0) {
			System.out.println("抽取数据出错.");
			return;
		}

		System.out.println("抽取加载数据完成");
		
	}

运行完效果一样,id是自动生成的,所以不一样

完整代码

/**
	 * 5. 将源表数据拷贝到目标表中
	 * @throws KettleXMLException
	 */
	@Test
	public void copySrc2Dest() throws KettleXMLException{
		ExtractBean extractBean = CopyTable.this.extractBean;

		TransMeta transMeta = new TransMeta();
		transMeta.setName("全量抽取数据");

		DatabaseConn srcDB = extractBean.getSrcDB();
		final String srcDBName = "srcDB";
		DatabaseMeta dbMeta = new DatabaseMeta(srcDBName, srcDB.getType(),
				srcDB.getAccess(), srcDB.getServer(), srcDB.getDatabase(),
				srcDB.getPort(), srcDB.getUsername(), srcDB.getPassword());

		dbMeta.setConnectSQL(MessageFormat.format("set names ''{0}'';",
				new Object[] { "utf8" }));

		transMeta.addDatabase(dbMeta);

		
		PluginRegistry registry = PluginRegistry.getInstance();
		TableInputMeta tableInputMeta = new TableInputMeta();
		String tableInputPluginId = registry.getPluginId(StepPluginType.class,
				tableInputMeta);
		/*
		 *  1. 给表输入添加一个DatabaseMeta连接数据库
		 */
		tableInputMeta.setDatabaseMeta(transMeta.findDatabase(srcDBName));
		/*
		 * 2.  构造查询SQL
		 */
		String selectSql = "select {0} from {1}";
		selectSql = MessageFormat.format(
				selectSql,
				new Object[] {
						StringUtils.join(extractBean.getSrcFields(), ","),
						extractBean.getSrcTable()[0] });
		tableInputMeta.setSQL(selectSql);

		// 打印查询SQL
		System.out.println(selectSql);

		/*
		 * 3. 替换SQL语句中变量
		 */
		tableInputMeta.setVariableReplacementActive(true);

		/*
		 * 4. 将TableInputMeta添加到转换中
		 */
		StepMeta tableInputStepMeta = new StepMeta(tableInputPluginId,
				"tableInput", (StepMetaInterface) tableInputMeta);
		transMeta.addStep(tableInputStepMeta);

		/*
		 * 5. 字段选择
		 */
		SelectValuesMeta selectValuesMeta = new SelectValuesMeta();
		String selectValuesPluginId = registry.getPluginId(
				StepPluginType.class, selectValuesMeta);
		selectValuesMeta.allocate(extractBean.getSrcFields().length, 0, 0);
		selectValuesMeta.setSelectName(extractBean.getSrcFields());

		StepMeta selectValuesStepMeta = new StepMeta(selectValuesPluginId,
				"selectValues", (StepMetaInterface) selectValuesMeta);
		transMeta.addStep(selectValuesStepMeta);

		/*
		 * 6. 添加hop把两个步骤关联起来
		 */
		transMeta.addTransHop(new TransHopMeta(tableInputStepMeta,
				selectValuesStepMeta));

		/*
		 * 7. 字符串替换
		 */
		ReplaceStringMeta replaceStringMeta = new ReplaceStringMeta();
		String replaceStringPluginId = registry.getPluginId(
				StepPluginType.class, replaceStringMeta);
		int fieldTransferLength = extractBean.getFieldTransfers().length;
		replaceStringMeta.allocate(fieldTransferLength);
		for (int i = 0; i < fieldTransferLength; i++) {
			FieldTransfer fieldTransfer = extractBean.getFieldTransfers()[i];
			replaceStringMeta.getFieldInStream()[i] = fieldTransfer.getField();
			replaceStringMeta.getReplaceString()[i] = fieldTransfer.getSrc();
			replaceStringMeta.getReplaceByString()[i] = fieldTransfer
					.getTarget();
			//是否使用正则表达式
			replaceStringMeta.getUseRegEx()[i] = fieldTransfer.isRegEx() ? 1
					: 0;
		}

		if (extractBean.getFieldTransfers() == null || fieldTransferLength <= 0) {
			replaceStringMeta.allocate(0);
		}

		/*
		 * 8. 添加步骤到转换中
		 */
		StepMeta replaceStringStepMeta = new StepMeta(replaceStringPluginId,
				"replaceString", (StepMetaInterface) replaceStringMeta);
		transMeta.addStep(replaceStringStepMeta);
		/*
		 * 9. 添加hop把两个步骤关联起来
		 */
		transMeta.addTransHop(new TransHopMeta(selectValuesStepMeta,
				replaceStringStepMeta));

		/*
		 * 10. 获取目标库信息
		 */
		DatabaseConn destDB = extractBean.getDestDB();
		final String destDBName = "destDB";
		DatabaseMeta destDbMeta = new DatabaseMeta(destDBName, destDB.getType(),
				destDB.getAccess(), destDB.getServer(), destDB.getDatabase(),
				destDB.getPort(), destDB.getUsername(), destDB.getPassword());
		
		/*
		 * 11.字段选择
		 */
//		SelectValuesMeta selectValuesMeta = new SelectValuesMeta();
//		StepMeta selectValuesStep = new StepMeta();

		
		/*
		 * 12.输出到表中
		 */
		TableOutputMeta tableOutputMeta = new TableOutputMeta();
		tableOutputMeta.setDatabaseMeta(destDbMeta);
		
		//设置目标表的 schema和表名
		tableOutputMeta.setSchemaName(null);
		tableOutputMeta.setTablename(extractBean.getDestTable());
		
		//指定目标表数据库字段
		tableOutputMeta.setSpecifyFields(true);
		/**
		 * 在此处如果是oracle中含有大字段的有一个问题:
		 * ORA-24816: 在实际的 LONG 或 LOB 列之后提供了扩展的非 LONG 绑定数据
		解决方法:在实际的 LONG 或 LOB 列之后提供了扩展的非 LONG 绑定数据错误,
		这个错误是因为在绑定参数时把数据库中字段类型为LONG的字段放置在其他字段前设置了,
		只要将类型为Lob的字段绑定参数时放在最后设置即可解决。
		
		即设置  destFields为:  long ,clob,blob这样的顺序,long类型等不能放在clob后面
		 */
		tableOutputMeta.setFieldStream(extractBean.getDestFields());
		tableOutputMeta.setFieldDatabase(extractBean.getDestFields());
		
		String tableOutputPluginId = registry.getPluginId(StepPluginType.class, tableOutputMeta);
		StepMeta tableOutputStep = new StepMeta(tableOutputPluginId, "tableOutput" , (StepMetaInterface) tableOutputMeta);
		
		//将步骤添加进去
		transMeta.addStep(tableOutputStep);
		
		//将步骤和上一步关联起来
		transMeta.addTransHop(new TransHopMeta(replaceStringStepMeta, tableOutputStep));
		

		//这个的作用是用来,设置select这个SQL中分页的
		transMeta.setVariable("VAR_FROM", "0");
		transMeta.setVariable("VAR_TO", "20");

		Trans trans = new Trans(transMeta);
		trans.setMonitored(true);
		trans.setInitializing(true);
		trans.setPreparing(true);
		trans.setLogLevel(LogLevel.BASIC);
		trans.setRunning(true);
		trans.setSafeModeEnabled(true);

		trans.addTransListener(new TransAdapter() {
			@Override
			public void transFinished(Trans trans) {
				System.out.println("转换执行完成");
			}
		});

		try {
			// trans.startThreads();
			trans.execute(null);
		} catch (KettleException e) {
			e.printStackTrace();
		}
		trans.waitUntilFinished();
		if (trans.getErrors() > 0) {
			System.out.println("抽取数据出错.");
			return;
		}

		System.out.println("抽取加载数据完成");
		
	}

 

 

 

posted @ 2020-09-18 12:35  伟衙内  阅读(217)  评论(0编辑  收藏  举报