Converting CSV to Excel with Kettle
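1. Preparation
Prepare a CSV file with a header row and six data rows (the exact rows are listed in the comment inside the complete code at the end).
In Spoon, configure a transformation that goes from a CSV file input step to an Excel output step, save it as a ktr file, and confirm that the transformation runs correctly.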
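If the sample file does not exist yet, a small helper can create it. The sketch below writes the rows quoted in the comment of the complete code; the class name SampleCsv and passing the target path as a parameter are illustrative assumptions, not part of the original project.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: writes the sample person.csv used in this article.
class SampleCsv {

    // Header plus six data rows, as listed in the comment of the complete code.
    static final List<String> ROWS = Arrays.asList(
            "id,name,age,set",
            "id1,name1,20,1",
            "id2,name2,21,1",
            "id3,name3,22,1",
            "id4,name4,23,0",
            "id5,name5,24,0",
            "id6,name6,25,0");

    // Creates missing parent folders, then writes one row per line.
    static Path write(Path target) {
        try {
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent());
            }
            return Files.write(target, ROWS, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass the path used in the article, or fall back to the working directory.
        Path target = Paths.get(args.length > 0 ? args[0] : "person.csv");
        System.out.println("Wrote " + write(target).toAbsolutePath());
    }
}
```

Running it with F:\kette_test\input\person.csv as the argument would produce the input file that the rest of the article reads.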
2. Analyzing the ktr
The previous article covered which elements to look at, so the repeated parts are skipped here and only described briefly.
Writing transformation code from a ktr file: https://blog.csdn.net/lw18751836671/article/details/121406210?spm=1001.2014.3001.5501
First, identify the two objects involved, CsvInputMeta and ExcelOutputMeta.
Then determine which properties need to be set on CsvInputMeta, such as filename, separator, enclosure, header, and buffer_size:
<filename>F:\kette_test\input\person.csv</filename>
<filename_field/>
<rownum_field/>
<include_filename>N</include_filename>
<separator>,</separator>
<enclosure>"</enclosure>
<header>Y</header>
<buffer_size>50000</buffer_size>
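To see at a glance which settings a step carries, the fragment can also be read programmatically. The following is a minimal sketch using only the JDK's DOM parser; the class StepXmlReader and the wrapping `<step>` root are assumptions made for the example, not Kettle API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Kettle: lists the settings of one step fragment.
class StepXmlReader {

    // Maps each direct child element of the root to its text content.
    static Map<String, String> readSettings(String stepXml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(stepXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Map<String, String> settings = new LinkedHashMap<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    settings.put(child.getNodeName(), child.getTextContent());
                }
            }
            return settings;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse step XML", e);
        }
    }

    public static void main(String[] args) {
        // The fragment shown above, wrapped in a <step> root so it is well-formed.
        String xml = "<step>"
                + "<filename>F:\\kette_test\\input\\person.csv</filename>"
                + "<filename_field/>"
                + "<rownum_field/>"
                + "<include_filename>N</include_filename>"
                + "<separator>,</separator>"
                + "<enclosure>\"</enclosure>"
                + "<header>Y</header>"
                + "<buffer_size>50000</buffer_size>"
                + "</step>";
        readSettings(xml).forEach((tag, value) -> System.out.println(tag + " = " + value));
    }
}
```

Each tag printed here corresponds to one setter call in the code later on: filename to setFilename, separator to setDelimiter, enclosure to setEnclosure, header to setHeaderPresent, and buffer_size to setBufferSize.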
Next come the fields of CsvInputMeta. To save space, the XML for some of the fields is omitted.
<fields>
<field>
<name>id</name>
<type>String</type>
<format/>
<currency>¥</currency>
<decimal>.</decimal>
<group>,</group>
<length>3</length>
<precision>-1</precision>
<trim_type>none</trim_type>
</field>
<field>
<name>set</name>
<type>Integer</type>
<format>#</format>
<currency>¥</currency>
<decimal>.</decimal>
<group>,</group>
<length>15</length>
<precision>0</precision>
<trim_type>none</trim_type>
</field>
</fields>
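The field entries can be pulled out the same way. The sketch below reads only the values that the Java code later needs (name, type, length, precision). FieldXmlReader and FieldDef are hypothetical names, and the method expects just the `<fields>` fragment of one step, because a complete ktr also contains unrelated `<field>` elements in its log tables.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Kettle.
class FieldXmlReader {

    // One <field> entry, reduced to the values used in the Java code.
    static final class FieldDef {
        final String name;
        final String type;
        final int length;
        final int precision;

        FieldDef(String name, String type, int length, int precision) {
            this.name = name;
            this.type = type;
            this.length = length;
            this.precision = precision;
        }
    }

    // Expects the <fields> fragment of a single step, not a whole ktr file.
    static List<FieldDef> readFields(String fieldsXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(fieldsXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("field");
            List<FieldDef> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element field = (Element) nodes.item(i);
                result.add(new FieldDef(
                        text(field, "name"),
                        text(field, "type"),
                        Integer.parseInt(text(field, "length")),
                        Integer.parseInt(text(field, "precision"))));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse fields XML", e);
        }
    }

    // Text of the first child element with the given tag name.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<fields>"
                + "<field><name>id</name><type>String</type><length>3</length><precision>-1</precision></field>"
                + "<field><name>set</name><type>Integer</type><length>15</length><precision>0</precision></field>"
                + "</fields>";
        for (FieldDef f : readFields(xml)) {
            System.out.println(f.name + " " + f.type + " length=" + f.length + " precision=" + f.precision);
        }
    }
}
```

The length read here is the value that appears as the third constructor argument of TextFileInputField in the code below (3 for id, 15 for set).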
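Note:
I did not define any fields on the Excel output step (its field list in Spoon is empty). That works fine in Spoon, but in the code later on, leaving ExcelField unset causes a NullPointerException.
The corresponding XML in the ktr is empty as well: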
<fields>
</fields>
3. Code walkthrough
Define a CsvInputMeta object:
/*
1. Input
*/
CsvInputMeta inputMeta = new CsvInputMeta();
Next, set its properties:
String filePath = "F:\\kette_test\\input\\person.csv";
inputMeta.setFilename(filePath);
// column delimiter
inputMeta.setDelimiter(",");
// enclosure character
inputMeta.setEnclosure("\"");
// the file has a header row
inputMeta.setHeaderPresent(true);
inputMeta.setBufferSize("50000");
About the delimiter, which is the separator node in the XML: CsvInputMeta has no separator property, but searching the class for "separator" turns up a getSeparator method that returns delimiter.
public String getSeparator() {
return delimiter;
}
I do not know what the enclosure setting is for. I set it to an arbitrary "\\" and the transformation still ran.
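For context, in CSV parsing generally the enclosure is the character that wraps a value so the value can contain the delimiter. The sketch below is a deliberately simplified splitter (an assumption for illustration, not Kettle's actual parser, and without support for escaped enclosures) that shows why an arbitrary enclosure makes no difference for the sample file, which has no enclosed values, while it would matter for a value such as "Smith, John".

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only; not how Kettle's CSV input is implemented.
class EnclosureDemo {

    // Splits one line on the delimiter, keeping delimiters that sit between
    // two enclosure characters as part of the value.
    static List<String> split(String line, char delimiter, char enclosure) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean enclosed = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == enclosure) {
                enclosed = !enclosed; // enter or leave an enclosed section
            } else if (c == delimiter && !enclosed) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String plain = "id1,name1,20,1";
        String quoted = "id1,\"Smith, John\",20,1";
        // No enclosed values: both enclosure choices give the same four fields.
        System.out.println(split(plain, ',', '"'));
        System.out.println(split(plain, ',', '\\'));
        // An enclosed value: only the matching enclosure keeps it in one field.
        System.out.println(split(quoted, ',', '"').size());
        System.out.println(split(quoted, ',', '\\').size());
    }
}
```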
Next is the field column setup. To save space, only one field is shown:
// field columns
String[] fieldsName = new String[]{"id","name","age","set"};
TextFileInputField[] inputFields = new TextFileInputField[fieldsName.length];
inputFields[0] = new TextFileInputField(fieldsName[0],-1,3);
inputFields[0].setType(ValueMetaInterface.TYPE_STRING);
inputFields[0].setDecimalSymbol(".");
inputFields[0].setGroupSymbol(",");
That completes the code for the CsvInputMeta input; the remaining parts are not repeated here.
4. Problems
Problem: bufferSize not set
Solution:
inputMeta.setBufferSize("50000");
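Problem: ExcelField not set on ExcelOutputMeta
This is the issue mentioned earlier: no fields were defined in Spoon, but without fields set in the code the transformation fails with: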
ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Error writing line :java.lang.NullPointerException
Solution: this time the allocate method is used, which differs from the previous article but has the same effect:
outputMeta.allocate(fieldsName.length);
ExcelField[] excelFields = outputMeta.getOutputFields();
excelFields[0] = new ExcelField();
excelFields[0].setName(fieldsName[0]);
excelFields[0].setType(ValueMetaInterface.TYPE_STRING);
excelFields[1] = new ExcelField();
excelFields[1].setName(fieldsName[1]);
excelFields[1].setType(ValueMetaInterface.TYPE_STRING);
excelFields[2] = new ExcelField();
excelFields[2].setName(fieldsName[2]);
excelFields[2].setType(ValueMetaInterface.TYPE_INTEGER);
excelFields[2].setFormat("0");
excelFields[3] = new ExcelField();
excelFields[3].setName(fieldsName[3]);
excelFields[3].setType(ValueMetaInterface.TYPE_INTEGER);
excelFields[3].setFormat("0");
outputMeta.setOutputFields(excelFields);
5. Running
Note:
The Spoon version used is pdi-ce-8.2.0.0-342.
Complete code
/**
 * CSV to Excel transformation.
 * Assumes the Kettle environment has already been initialized, as covered in the previous article.
 * @throws KettleException
 */
@Test
public void exchangeCsv2Excel() throws KettleException{
    /*
    1. Input
    */
    CsvInputMeta inputMeta = new CsvInputMeta();
    /**
     * File content:
     * id,name,age,set
     * id1,name1,20,1
     * id2,name2,21,1
     * id3,name3,22,1
     * id4,name4,23,0
     * id5,name5,24,0
     * id6,name6,25,0
     */
    String filePath = "F:\\kette_test\\input\\person.csv";
    inputMeta.setFilename(filePath);
    // column delimiter
    inputMeta.setDelimiter(",");
    // enclosure character
    inputMeta.setEnclosure("\"");
    // the file has a header row
    inputMeta.setHeaderPresent(true);
    inputMeta.setBufferSize("50000");
    // field columns
    String[] fieldsName = new String[]{"id","name","age","set"};
    TextFileInputField[] inputFields = new TextFileInputField[fieldsName.length];
    inputFields[0] = new TextFileInputField(fieldsName[0],-1,3);
    inputFields[0].setType(ValueMetaInterface.TYPE_STRING);
    inputFields[0].setDecimalSymbol(".");
    inputFields[0].setGroupSymbol(",");
    inputFields[1] = new TextFileInputField(fieldsName[1],-1,5);
    inputFields[1].setType(ValueMetaInterface.TYPE_STRING);
    inputFields[1].setDecimalSymbol(".");
    inputFields[1].setGroupSymbol(",");
    inputFields[2] = new TextFileInputField(fieldsName[2],-1,15);
    inputFields[2].setType(ValueMetaInterface.TYPE_INTEGER);
    inputFields[2].setFormat("#");
    inputFields[2].setDecimalSymbol(".");
    inputFields[2].setGroupSymbol(",");
    inputFields[2].setPrecision(0);
    inputFields[3] = new TextFileInputField(fieldsName[3],-1,15);
    inputFields[3].setType(ValueMetaInterface.TYPE_INTEGER);
    inputFields[3].setFormat("#");
    inputFields[3].setDecimalSymbol(".");
    inputFields[3].setGroupSymbol(",");
    inputFields[3].setPrecision(0);
    inputMeta.setInputFields(inputFields);
    /*
    2. Output
    */
    ExcelOutputMeta outputMeta = new ExcelOutputMeta();
    outputMeta.setAppend(false);
    outputMeta.setHeaderEnabled(true);
    outputMeta.setFooterEnabled(false);
    outputMeta.setFileName("F:\\kette_test\\output\\person_2");
    // file extension
    outputMeta.setExtension("xls");
    outputMeta.setDoNotOpenNewFileInit(false);
    outputMeta.setCreateParentFolder(false);
    outputMeta.allocate(fieldsName.length);
    ExcelField[] excelFields = outputMeta.getOutputFields();
    excelFields[0] = new ExcelField();
    excelFields[0].setName(fieldsName[0]);
    excelFields[0].setType(ValueMetaInterface.TYPE_STRING);
    excelFields[1] = new ExcelField();
    excelFields[1].setName(fieldsName[1]);
    excelFields[1].setType(ValueMetaInterface.TYPE_STRING);
    excelFields[2] = new ExcelField();
    excelFields[2].setName(fieldsName[2]);
    excelFields[2].setType(ValueMetaInterface.TYPE_INTEGER);
    excelFields[2].setFormat("0");
    excelFields[3] = new ExcelField();
    excelFields[3].setName(fieldsName[3]);
    excelFields[3].setType(ValueMetaInterface.TYPE_INTEGER);
    excelFields[3].setFormat("0");
    outputMeta.setOutputFields(excelFields);
    /*
    3. Add the steps
    */
    TransMeta transMeta = new TransMeta();
    transMeta.setName("csv2excel交换");
    PluginRegistry registry = PluginRegistry.getInstance();
    String inputPluginId = registry.getPluginId(StepPluginType.class, inputMeta);
    StepMeta inputStep = new StepMeta(inputPluginId, "csv-input", (StepMetaInterface) inputMeta);
    // position of the step on the Spoon canvas
    inputStep.setDraw(true);
    inputStep.setLocation(200, 200);
    // add the step to the transformation
    transMeta.addStep(inputStep);
    String outPluginId = registry.getPluginId(StepPluginType.class, outputMeta);
    StepMeta outputStep = new StepMeta(outPluginId, "excel-output", (StepMetaInterface) outputMeta);
    // position of the step on the Spoon canvas
    outputStep.setDraw(true);
    outputStep.setLocation(300, 200);
    transMeta.addStep(outputStep);
    /*
    4. Connect the steps
    */
    transMeta.addTransHop(new TransHopMeta(inputStep, outputStep));
    /*
    5. Run the transformation
    */
    Trans trans = new Trans(transMeta);
    // start the transformation
    trans.execute(null);
    // wait for it to finish
    trans.waitUntilFinished();
    if (trans.getErrors() > 0) {
        System.out.println("Transformation failed.");
        return;
    }
}
Complete ktr
<?xml version="1.0" encoding="UTF-8"?>
<transformation>
<info>
<name>csv2excel</name>
<description/>
<extended_description/>
<trans_version/>
<trans_type>Normal</trans_type>
<directory>/</directory>
<parameters>
</parameters>
<log>
<trans-log-table>
<connection/>
<schema/>
<table/>
<size_limit_lines/>
<interval/>
<timeout_days/>
<field>
<id>ID_BATCH</id>
<enabled>Y</enabled>
<name>ID_BATCH</name>
</field>
<field>
<id>CHANNEL_ID</id>
<enabled>Y</enabled>
<name>CHANNEL_ID</name>
</field>
<field>
<id>TRANSNAME</id>
<enabled>Y</enabled>
<name>TRANSNAME</name>
</field>
<field>
<id>STATUS</id>
<enabled>Y</enabled>
<name>STATUS</name>
</field>
<field>
<id>LINES_READ</id>
<enabled>Y</enabled>
<name>LINES_READ</name>
<subject/>
</field>
<field>
<id>LINES_WRITTEN</id>
<enabled>Y</enabled>
<name>LINES_WRITTEN</name>
<subject/>
</field>
<field>
<id>LINES_UPDATED</id>
<enabled>Y</enabled>
<name>LINES_UPDATED</name>
<subject/>
</field>
<field>
<id>LINES_INPUT</id>
<enabled>Y</enabled>
<name>LINES_INPUT</name>
<subject/>
</field>
<field>
<id>LINES_OUTPUT</id>
<enabled>Y</enabled>
<name>LINES_OUTPUT</name>
<subject/>
</field>
<field>
<id>LINES_REJECTED</id>
<enabled>Y</enabled>
<name>LINES_REJECTED</name>
<subject/>
</field>
<field>
<id>ERRORS</id>
<enabled>Y</enabled>
<name>ERRORS</name>
</field>
<field>
<id>STARTDATE</id>
<enabled>Y</enabled>
<name>STARTDATE</name>
</field>
<field>
<id>ENDDATE</id>
<enabled>Y</enabled>
<name>ENDDATE</name>
</field>
<field>
<id>LOGDATE</id>
<enabled>Y</enabled>
<name>LOGDATE</name>
</field>
<field>
<id>DEPDATE</id>
<enabled>Y</enabled>
<name>DEPDATE</name>
</field>
<field>
<id>REPLAYDATE</id>
<enabled>Y</enabled>
<name>REPLAYDATE</name>
</field>
<field>
<id>LOG_FIELD</id>
<enabled>Y</enabled>
<name>LOG_FIELD</name>
</field>
<field>
<id>EXECUTING_SERVER</id>
<enabled>N</enabled>
<name>EXECUTING_SERVER</name>
</field>
<field>
<id>EXECUTING_USER</id>
<enabled>N</enabled>
<name>EXECUTING_USER</name>
</field>
<field>
<id>CLIENT</id>
<enabled>N</enabled>
<name>CLIENT</name>
</field>
</trans-log-table>
<perf-log-table>
<connection/>
<schema/>
<table/>
<interval/>
<timeout_days/>
<field>
<id>ID_BATCH</id>
<enabled>Y</enabled>
<name>ID_BATCH</name>
</field>
<field>
<id>SEQ_NR</id>
<enabled>Y</enabled>
<name>SEQ_NR</name>
</field>
<field>
<id>LOGDATE</id>
<enabled>Y</enabled>
<name>LOGDATE</name>
</field>
<field>
<id>TRANSNAME</id>
<enabled>Y</enabled>
<name>TRANSNAME</name>
</field>
<field>
<id>STEPNAME</id>
<enabled>Y</enabled>
<name>STEPNAME</name>
</field>
<field>
<id>STEP_COPY</id>
<enabled>Y</enabled>
<name>STEP_COPY</name>
</field>
<field>
<id>LINES_READ</id>
<enabled>Y</enabled>
<name>LINES_READ</name>
</field>
<field>
<id>LINES_WRITTEN</id>
<enabled>Y</enabled>
<name>LINES_WRITTEN</name>
</field>
<field>
<id>LINES_UPDATED</id>
<enabled>Y</enabled>
<name>LINES_UPDATED</name>
</field>
<field>
<id>LINES_INPUT</id>
<enabled>Y</enabled>
<name>LINES_INPUT</name>
</field>
<field>
<id>LINES_OUTPUT</id>
<enabled>Y</enabled>
<name>LINES_OUTPUT</name>
</field>
<field>
<id>LINES_REJECTED</id>
<enabled>Y</enabled>
<name>LINES_REJECTED</name>
</field>
<field>
<id>ERRORS</id>
<enabled>Y</enabled>
<name>ERRORS</name>
</field>
<field>
<id>INPUT_BUFFER_ROWS</id>
<enabled>Y</enabled>
<name>INPUT_BUFFER_ROWS</name>
</field>
<field>
<id>OUTPUT_BUFFER_ROWS</id>
<enabled>Y</enabled>
<name>OUTPUT_BUFFER_ROWS</name>
</field>
</perf-log-table>
<channel-log-table>
<connection/>
<schema/>
<table/>
<timeout_days/>
<field>
<id>ID_BATCH</id>
<enabled>Y</enabled>
<name>ID_BATCH</name>
</field>
<field>
<id>CHANNEL_ID</id>
<enabled>Y</enabled>
<name>CHANNEL_ID</name>
</field>
<field>
<id>LOG_DATE</id>
<enabled>Y</enabled>
<name>LOG_DATE</name>
</field>
<field>
<id>LOGGING_OBJECT_TYPE</id>
<enabled>Y</enabled>
<name>LOGGING_OBJECT_TYPE</name>
</field>
<field>
<id>OBJECT_NAME</id>
<enabled>Y</enabled>
<name>OBJECT_NAME</name>
</field>
<field>
<id>OBJECT_COPY</id>
<enabled>Y</enabled>
<name>OBJECT_COPY</name>
</field>
<field>
<id>REPOSITORY_DIRECTORY</id>
<enabled>Y</enabled>
<name>REPOSITORY_DIRECTORY</name>
</field>
<field>
<id>FILENAME</id>
<enabled>Y</enabled>
<name>FILENAME</name>
</field>
<field>
<id>OBJECT_ID</id>
<enabled>Y</enabled>
<name>OBJECT_ID</name>
</field>
<field>
<id>OBJECT_REVISION</id>
<enabled>Y</enabled>
<name>OBJECT_REVISION</name>
</field>
<field>
<id>PARENT_CHANNEL_ID</id>
<enabled>Y</enabled>
<name>PARENT_CHANNEL_ID</name>
</field>
<field>
<id>ROOT_CHANNEL_ID</id>
<enabled>Y</enabled>
<name>ROOT_CHANNEL_ID</name>
</field>
</channel-log-table>
<step-log-table>
<connection/>
<schema/>
<table/>
<timeout_days/>
<field>
<id>ID_BATCH</id>
<enabled>Y</enabled>
<name>ID_BATCH</name>
</field>
<field>
<id>CHANNEL_ID</id>
<enabled>Y</enabled>
<name>CHANNEL_ID</name>
</field>
<field>
<id>LOG_DATE</id>
<enabled>Y</enabled>
<name>LOG_DATE</name>
</field>
<field>
<id>TRANSNAME</id>
<enabled>Y</enabled>
<name>TRANSNAME</name>
</field>
<field>
<id>STEPNAME</id>
<enabled>Y</enabled>
<name>STEPNAME</name>
</field>
<field>
<id>STEP_COPY</id>
<enabled>Y</enabled>
<name>STEP_COPY</name>
</field>
<field>
<id>LINES_READ</id>
<enabled>Y</enabled>
<name>LINES_READ</name>
</field>
<field>
<id>LINES_WRITTEN</id>
<enabled>Y</enabled>
<name>LINES_WRITTEN</name>
</field>
<field>
<id>LINES_UPDATED</id>
<enabled>Y</enabled>
<name>LINES_UPDATED</name>
</field>
<field>
<id>LINES_INPUT</id>
<enabled>Y</enabled>
<name>LINES_INPUT</name>
</field>
<field>
<id>LINES_OUTPUT</id>
<enabled>Y</enabled>
<name>LINES_OUTPUT</name>
</field>
<field>
<id>LINES_REJECTED</id>
<enabled>Y</enabled>
<name>LINES_REJECTED</name>
</field>
<field>
<id>ERRORS</id>
<enabled>Y</enabled>
<name>ERRORS</name>
</field>
<field>
<id>LOG_FIELD</id>
<enabled>N</enabled>
<name>LOG_FIELD</name>
</field>
</step-log-table>
<metrics-log-table>
<connection/>
<schema/>
<table/>
<timeout_days/>
<field>
<id>ID_BATCH</id>
<enabled>Y</enabled>
<name>ID_BATCH</name>
</field>
<field>
<id>CHANNEL_ID</id>
<enabled>Y</enabled>
<name>CHANNEL_ID</name>
</field>
<field>
<id>LOG_DATE</id>
<enabled>Y</enabled>
<name>LOG_DATE</name>
</field>
<field>
<id>METRICS_DATE</id>
<enabled>Y</enabled>
<name>METRICS_DATE</name>
</field>
<field>
<id>METRICS_CODE</id>
<enabled>Y</enabled>
<name>METRICS_CODE</name>
</field>
<field>
<id>METRICS_DESCRIPTION</id>
<enabled>Y</enabled>
<name>METRICS_DESCRIPTION</name>
</field>
<field>
<id>METRICS_SUBJECT</id>
<enabled>Y</enabled>
<name>METRICS_SUBJECT</name>
</field>
<field>
<id>METRICS_TYPE</id>
<enabled>Y</enabled>
<name>METRICS_TYPE</name>
</field>
<field>
<id>METRICS_VALUE</id>
<enabled>Y</enabled>
<name>METRICS_VALUE</name>
</field>
</metrics-log-table>
</log>
<maxdate>
<connection/>
<table/>
<field/>
<offset>0.0</offset>
<maxdiff>0.0</maxdiff>
</maxdate>
<size_rowset>10000</size_rowset>
<sleep_time_empty>50</sleep_time_empty>
<sleep_time_full>50</sleep_time_full>
<unique_connections>N</unique_connections>
<feedback_shown>Y</feedback_shown>
<feedback_size>50000</feedback_size>
<using_thread_priorities>Y</using_thread_priorities>
<shared_objects_file/>
<capture_step_performance>N</capture_step_performance>
<step_performance_capturing_delay>1000</step_performance_capturing_delay>
<step_performance_capturing_size_limit>100</step_performance_capturing_size_limit>
<dependencies>
</dependencies>
<partitionschemas>
</partitionschemas>
<slaveservers>
</slaveservers>
<clusterschemas>
</clusterschemas>
<created_user>-</created_user>
<created_date>2021/11/10 10:25:53.880</created_date>
<modified_user>-</modified_user>
<modified_date>2021/11/10 10:25:53.880</modified_date>
<key_for_session_key/>
<is_key_private>N</is_key_private>
</info>
<notepads>
</notepads>
<order>
<hop>
<from>CSV文件输入</from>
<to>Excel输出</to>
<enabled>Y</enabled>
</hop>
</order>
<step>
<name>CSV文件输入</name>
<type>CsvInput</type>
<description/>
<distribute>Y</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<filename>F:\kette_test\input\person.csv</filename>
<filename_field/>
<rownum_field/>
<include_filename>N</include_filename>
<separator>,</separator>
<enclosure>"</enclosure>
<header>Y</header>
<buffer_size>50000</buffer_size>
<lazy_conversion>Y</lazy_conversion>
<add_filename_result>N</add_filename_result>
<parallel>N</parallel>
<newline_possible>N</newline_possible>
<encoding/>
<fields>
<field>
<name>id</name>
<type>String</type>
<format/>
<currency>¥</currency>
<decimal>.</decimal>
<group>,</group>
<length>3</length>
<precision>-1</precision>
<trim_type>none</trim_type>
</field>
<field>
<name>name</name>
<type>String</type>
<format/>
<currency>¥</currency>
<decimal>.</decimal>
<group>,</group>
<length>5</length>
<precision>-1</precision>
<trim_type>none</trim_type>
</field>
<field>
<name>age</name>
<type>Integer</type>
<format>#</format>
<currency>¥</currency>
<decimal>.</decimal>
<group>,</group>
<length>15</length>
<precision>0</precision>
<trim_type>none</trim_type>
</field>
<field>
<name>set</name>
<type>Integer</type>
<format>#</format>
<currency>¥</currency>
<decimal>.</decimal>
<group>,</group>
<length>15</length>
<precision>0</precision>
<trim_type>none</trim_type>
</field>
</fields>
<attributes/>
<cluster_schema/>
<remotesteps>
<input>
</input>
<output>
</output>
</remotesteps>
<GUI>
<xloc>240</xloc>
<yloc>144</yloc>
<draw>Y</draw>
</GUI>
</step>
<step>
<name>Excel输出</name>
<type>ExcelOutput</type>
<description/>
<distribute>Y</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<header>Y</header>
<footer>N</footer>
<encoding/>
<append>N</append>
<add_to_result_filenames>Y</add_to_result_filenames>
<file>
<name>F:\kette_test\output\person</name>
<extention>xls</extention>
<do_not_open_newfile_init>N</do_not_open_newfile_init>
<create_parent_folder>N</create_parent_folder>
<split>N</split>
<add_date>N</add_date>
<add_time>N</add_time>
<SpecifyFormat>N</SpecifyFormat>
<date_time_format/>
<sheetname>Sheet1</sheetname>
<autosizecolums>N</autosizecolums>
<nullisblank>N</nullisblank>
<protect_sheet>N</protect_sheet>
<password>Encrypted </password>
<splitevery>0</splitevery>
<usetempfiles>N</usetempfiles>
<tempdirectory/>
</file>
<template>
<enabled>N</enabled>
<append>N</append>
<filename>template.xls</filename>
</template>
<fields>
</fields>
<custom>
<header_font_name>arial</header_font_name>
<header_font_size>10</header_font_size>
<header_font_bold>N</header_font_bold>
<header_font_italic>N</header_font_italic>
<header_font_underline>no</header_font_underline>
<header_font_orientation>horizontal</header_font_orientation>
<header_font_color>black</header_font_color>
<header_background_color>none</header_background_color>
<header_row_height>255</header_row_height>
<header_alignment>left</header_alignment>
<header_image/>
<row_font_name>arial</row_font_name>
<row_font_size>10</row_font_size>
<row_font_color>black</row_font_color>
<row_background_color>none</row_background_color>
</custom>
<attributes/>
<cluster_schema/>
<remotesteps>
<input>
</input>
<output>
</output>
</remotesteps>
<GUI>
<xloc>384</xloc>
<yloc>144</yloc>
<draw>Y</draw>
</GUI>
</step>
<step_error_handling>
</step_error_handling>
<slave-step-copy-partition-distribution>
</slave-step-copy-partition-distribution>
<slave_transformation>N</slave_transformation>
<attributes/>
</transformation>