Exporting a list of objects to a CSV file with opencsv: making the columns follow the fields' declaration order

Preface

If you have this requirement, you are probably already familiar with the basics of turning a list of beans into CSV with opencsv, so this post skips the introductory usage.
If you are not, you can refer to this post: Java之利用openCsv导出csv文件 (exporting CSV files with openCsv in Java).
If you want to dig deeper into opencsv, go wrestle with the official documentation.

P.S.: The approach below is what I came up with while reading the source code. It may contain mistakes, be unnecessarily convoluted, or duplicate something the official docs already offer more concisely. This post is mainly meant to start a discussion; if you know a better way, please leave a comment and point me in the right direction.

Requirement

Convert a list of objects into a CSV file, with the CSV columns in the same order as the fields are declared on the class.
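
As a concrete illustration, suppose we are exporting a bean like the following. The class name and fields are hypothetical, made up for this post:

    // Hypothetical export bean used as the running example in this post.
    // Desired CSV header order: USERNAME, EMAIL, LOGINCOUNT (declaration order).
    // opencsv's default header-name mapping would instead produce
    // EMAIL, LOGINCOUNT, USERNAME (alphabetical order).
    public class UserExport {
        private String userName;
        private String email;
        private Integer loginCount;
        // getters and setters omitted
    }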

环境

OpenJDK 11 + opencsv 5.0

The problem

  1. opencsv's default ordering sorts the columns alphabetically (ascending) by property name, which does not meet the requirement.
  2. The post mentioned in the preface does offer one solution: declare each field's position explicitly with the @CsvBindByPosition annotation. To meet the requirement that way, though, every single field has to be annotated, which is tedious; and with that approach opencsv no longer derives the CSV header from the property names automatically, so the header must be supplied separately. (A sketch of that approach follows below.)
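
For reference, this is roughly what the @CsvBindByPosition approach looks like on the hypothetical bean from earlier; every field needs its own position, and the header row has to be written by hand:

    // Position-binding variant of the hypothetical bean (the approach from the
    // linked post). Tedious for wide beans, and the header is no longer derived
    // from the property names automatically.
    public class UserExport {
        @CsvBindByPosition(position = 0)
        private String userName;

        @CsvBindByPosition(position = 1)
        private String email;

        @CsvBindByPosition(position = 2)
        private Integer loginCount;

        // getters and setters omitted
    }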

Solution

If you are in a hurry to ship, just read section 1.

1. The ready-to-use version

Full code

    @SneakyThrows
    public <T> String generateCsvFile(List<? extends T> exportResults, String fileName)
            throws IOException, CsvDataTypeMismatchException, CsvRequiredFieldEmptyException {
        String finalFileName = new File(nginxDownloadPath,
                fileName + System.currentTimeMillis() + ".csv").getPath();
        Writer writer = new FileWriter(finalFileName);
        // csvWriter is only needed if you want to write a header row yourself
        // (see the commented-out writeNext below); the bean rows are written
        // through `writer` by StatefulBeanToCsv.
        CSVWriter csvWriter = new CSVWriter(
                writer,
                CSVWriter.DEFAULT_SEPARATOR,
                CSVWriter.DEFAULT_QUOTE_CHARACTER,
                CSVWriter.NO_ESCAPE_CHARACTER,
                CSVWriter.DEFAULT_LINE_END);
//        csvWriter.writeNext(header);
        if (exportResults.size() > 0) {
            // Write the data rows; the custom mapping strategy below keeps the
            // columns in field declaration order.
            StatefulBeanToCsv beanToCsv = new StatefulBeanToCsvBuilder<T>(writer).
                    withMappingStrategy(new OrderColumnMappingStrategy(exportResults.get(0).getClass())).
                    // Note: findFirst() registers only the first @CsvIgnore-annotated field.
                    withIgnoreField(exportResults.get(0).getClass(), Arrays.stream(exportResults.get(0).getClass().getDeclaredFields()).filter(one -> {
                        one.setAccessible(true);
                        return one.isAnnotationPresent(CsvIgnore.class);
                    }).findFirst().orElse(null)).
                    build();
            beanToCsv.write(exportResults);
        }
        csvWriter.close();
        writer.close();
        return finalFileName;
    }

    public class OrderColumnMappingStrategy<T> extends HeaderColumnNameMappingStrategy<T> {
        private Locale errorLocale = Locale.getDefault();

        public OrderColumnMappingStrategy(Class<? extends T> type) {
            super();
            this.setErrorLocale(errorLocale);
            this.setType(type);
        }

        @Override
        public String[] generateHeader(T bean) throws CsvRequiredFieldEmptyException {
            if (type == null) {
                throw new IllegalStateException(ResourceBundle
                        .getBundle(ICSVParser.DEFAULT_BUNDLE_NAME, errorLocale)
                        .getString("type.before.header"));
            }

            if (headerIndex.isEmpty()) {
                List<String> realHeaderList = new ArrayList<>();
                /* getFieldNameForCsvHeader() reads the class's fields via reflection and
                   returns the field names in declaration order. Its code is omitted here. */
                getFieldNameForCsvHeader(type).forEach(one -> {
                    realHeaderList.add(one.toUpperCase());
                });
                String[] header = realHeaderList.toArray(new String[0]);
                headerIndex.initializeHeaderIndex(header);
                return header;
            }
            return headerIndex.getHeaderIndex();
        }
    }
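
A hypothetical call site might look like this. loadUsers() and the surrounding exporter class (with its nginxDownloadPath field) are assumptions made for the sake of the example:

    // Hypothetical usage of the method above.
    List<UserExport> rows = loadUsers();                     // hypothetical data source
    String csvPath = generateCsvFile(rows, "user-export-");
    System.out.println("CSV written to " + csvPath);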

Key code, pulled out and explained

  1. Extend HeaderColumnNameMappingStrategy and override String[] generateHeader(T bean); the idea is to change what the headerIndex object gets initialized with.
    public class OrderColumnMappingStrategy<T> extends HeaderColumnNameMappingStrategy<T> {
        private Locale errorLocale = Locale.getDefault();

        public OrderColumnMappingStrategy(Class<? extends T> type) {
            super();
            this.setErrorLocale(errorLocale);
            this.setType(type);
        }

        @Override
        public String[] generateHeader(T bean) throws CsvRequiredFieldEmptyException {
            if(type == null) {
                throw new IllegalStateException(ResourceBundle
                        .getBundle(ICSVParser.DEFAULT_BUNDLE_NAME, errorLocale)
                        .getString("type.before.header"));
            }
            if(headerIndex.isEmpty()) {
                List<String> realHeaderList = new ArrayList<>();
                // getFieldNameForCsvHeader() reads the class's fields via reflection and
                // returns the field names in declaration order; its code is omitted here.
                getFieldNameForCsvHeader(type).forEach(one -> {
                    realHeaderList.add(one.toUpperCase());
                });
                String[] header = realHeaderList.toArray(new String[0]);
                // The column order of the final CSV, and the per-row retrieval of values in that
                // order, are both driven by the headerIndex object. So it is enough to build
                // `header` in the order we want here: this array is what initializes headerIndex.
                headerIndex.initializeHeaderIndex(header);
                return header;
            }
            return headerIndex.getHeaderIndex();
        }
    }
  2. When building the StatefulBeanToCsv instance, register the custom mapping strategy. (A variant that registers every @CsvIgnore field is sketched after this list.)
//Note the .withMappingStrategy() call: this is where the custom mapping strategy is plugged in.
StatefulBeanToCsv beanToCsv = new StatefulBeanToCsvBuilder<T>(writer).
                    withMappingStrategy(new OrderColumnMappingStrategy(exportResults.get(0).getClass())).
                    withIgnoreField(exportResults.get(0).getClass(), Arrays.stream(exportResults.get(0).getClass().getDeclaredFields()).filter(one -> {
                        one.setAccessible(true);
                        return one.isAnnotationPresent(CsvIgnore.class);
                    }).findFirst().orElse(null)).
                    build();
  3. Write the result file and check the output.
beanToCsv.write(exportResults);
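
As noted in step 2, the findFirst() in the builder chain above registers only the first @CsvIgnore-annotated field. A sketch of a variant that registers every ignored field could look like this; it is a fragment meant to sit inside the same method, relies on the same unchecked cast assumption as the original code, and assumes java.lang.reflect.Field is imported:

    // Sketch: register every @CsvIgnore-annotated field, not just the first one.
    @SuppressWarnings("unchecked")
    Class<T> beanClass = (Class<T>) exportResults.get(0).getClass();
    StatefulBeanToCsvBuilder<T> builder = new StatefulBeanToCsvBuilder<T>(writer)
            .withMappingStrategy(new OrderColumnMappingStrategy<>(beanClass));
    for (Field field : beanClass.getDeclaredFields()) {
        if (field.isAnnotationPresent(CsvIgnore.class)) {
            builder = builder.withIgnoreField(beanClass, field);
        }
    }
    StatefulBeanToCsv<T> beanToCsv = builder.build();
    beanToCsv.write(exportResults);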

2. The TL;DR version

Under the hood, opencsv uses a SortedMap instance named simpleMap to store the mapping between CSV columns and the bean's fields: the key is the upper-cased field name, the value is the corresponding Field object. Because it is a SortedMap, its entries are kept in the natural order of the keys, i.e. alphabetically ascending. The keySet of simpleMap is then used to build a MultiValuedMap<String, Integer> named headerToPosition, whose key is the upper-cased field name (which is also the CSV header) and whose value is that column's position in the CSV file. Later, when the file is actually written, opencsv goes from a column index to the upper-cased field name via headerToPosition, and then back to simpleMap to fetch the corresponding Field instance.
That is roughly how opencsv writes a list of beans to a CSV file. Adjusting what ends up in headerToPosition is enough to satisfy the requirement. If you want to dig deeper, keep reading; I suggest stepping through the code with a debugger as you go, and if you find anything that contradicts what I describe, please point it out so we can discuss it.
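
The alphabetical default is easy to reproduce outside opencsv. Below is a minimal, self-contained sketch using the hypothetical UserExport bean from earlier; note that getDeclaredFields() returning declaration order is common JVM behaviour rather than a hard guarantee:

    import java.lang.reflect.Field;
    import java.util.Arrays;
    import java.util.TreeMap;
    import java.util.stream.Collectors;

    public class ColumnOrderDemo {
        static class UserExport { String userName; String email; Integer loginCount; }

        public static void main(String[] args) {
            // What opencsv's simpleMap effectively does: a SortedMap keyed by the
            // upper-cased field name, so the keys come out alphabetically.
            TreeMap<String, Field> sortedLikeSimpleMap = new TreeMap<>();
            for (Field f : UserExport.class.getDeclaredFields()) {
                sortedLikeSimpleMap.put(f.getName().toUpperCase(), f);
            }
            System.out.println(sortedLikeSimpleMap.keySet());
            // -> [EMAIL, LOGINCOUNT, USERNAME]  (what the default strategy gives us)

            // What we actually want: the declaration order from reflection.
            System.out.println(Arrays.stream(UserExport.class.getDeclaredFields())
                    .map(f -> f.getName().toUpperCase())
                    .collect(Collectors.toList()));
            // -> [USERNAME, EMAIL, LOGINCOUNT]
        }
    }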

3. The long-winded source-code deep dive

Everything from the call to StatefulBeanToCsv's write(List<T> beans) onward is opencsv territory.


    /**
     * Writes a list of beans out to the Writer provided to the constructor.
     *
     * @param beans A list of beans to be written to a CSV destination
     * @throws CsvDataTypeMismatchException If a field of the beans is annotated improperly or an unsupported data type is supposed to be written
     * @throws CsvRequiredFieldEmptyException If a field is marked as required, but the source is null
     */
    public void write(List<T> beans) throws CsvDataTypeMismatchException,
            CsvRequiredFieldEmptyException {
        if (CollectionUtils.isNotEmpty(beans)) {
            write(beans.iterator());
        }
    }

Following that write call one level further down brings us to the code below; I've added comments at the points that matter.

    public void write(Iterator<T> iBeans) throws CsvDataTypeMismatchException, CsvRequiredFieldEmptyException {

        PeekingIterator<T> beans = new PeekingIterator<>(iBeans);
        T firstBean = beans.peek();

        if (!beans.hasNext()) {
            return;
        }

        // Write header
        if (!headerWritten) {
            // This step prepares everything needed to write the CSV file, and the column order
            // is decided here too. In other words, finding the right hook inside this step is
            // enough to satisfy our requirement.
            beforeFirstWrite(firstBean);
        }

        executor = new BeanExecutor<>(orderedResults);
        executor.prepare();

        // Process the beans
        try {
            // The beans are converted into CSV lines on multiple threads here.
            submitAllLines(beans);
        } catch (RejectedExecutionException e) {
            // An exception in one of the bean writing threads prompted the
            // executor service to shutdown before we were done.
            if (executor.getTerminalException() instanceof RuntimeException) {
                throw (RuntimeException) executor.getTerminalException();
            }
            if (executor.getTerminalException() instanceof CsvDataTypeMismatchException) {
                throw (CsvDataTypeMismatchException) executor.getTerminalException();
            }
            if (executor.getTerminalException() instanceof CsvRequiredFieldEmptyException) {
                throw (CsvRequiredFieldEmptyException) executor
                        .getTerminalException();
            }
            throw new RuntimeException(ResourceBundle.getBundle(ICSVParser.DEFAULT_BUNDLE_NAME, errorLocale)
                    .getString("error.writing.beans"), executor.getTerminalException());
        } catch (Exception e) {
            // Exception during parsing. Always unrecoverable.
            // I can't find a way to create this condition in the current
            // code, but we must have a catch-all clause.
            executor.shutdownNow();
            if (executor.getTerminalException() instanceof RuntimeException) {
                throw (RuntimeException) executor.getTerminalException();
            }
            throw new RuntimeException(ResourceBundle.getBundle(ICSVParser.DEFAULT_BUNDLE_NAME, errorLocale)
                    .getString("error.writing.beans"), e);
        }

        capturedExceptions.addAll(executor.getCapturedExceptions());
        executor.resultStream().forEach(l -> csvwriter.writeNext(l, applyQuotesToAll));
    }

With that understanding, let's step into beforeFirstWrite(firstBean) to see what opencsv prepares before writing and to look for our hook.

    private void beforeFirstWrite(T bean) throws CsvRequiredFieldEmptyException {

        // If we don't register a mappingStrategy ourselves, opencsv picks one automatically.
        // Determine mapping strategy
        if (mappingStrategy == null) {
            mappingStrategy = OpencsvUtils.determineMappingStrategy((Class<T>) bean.getClass(), errorLocale);
        }

        // Ignore fields. It's possible the mapping strategy has already been
        // primed, so only pass on our data if the user actually gave us
        // something.
        if(!ignoredFields.isEmpty()) {
            mappingStrategy.ignoreFields(ignoredFields);
        }

        // Build CSVWriter
        if (csvwriter == null) {
            csvwriter = new CSVWriter(writer, separator, quotechar, escapechar, lineEnd);
        }

        // This is our hook. Why here? Let's first look at how the default mappingStrategy generates the header.
        // Write the header
        String[] header = mappingStrategy.generateHeader(bean);
        if (header.length > 0) {
            csvwriter.writeNext(header, applyQuotesToAll);
        }
        headerWritten = true;
    }

Looking at the OpencsvUtils.determineMappingStrategy method:

    static <T> MappingStrategy<T> determineMappingStrategy(Class<? extends T> type, Locale errorLocale) {
        // Check for annotations
        boolean positionAnnotationsPresent = Stream.of(FieldUtils.getAllFields(type)).anyMatch(
                f -> f.isAnnotationPresent(CsvBindByPosition.class)
                || f.isAnnotationPresent(CsvBindAndSplitByPosition.class)
                || f.isAnnotationPresent(CsvBindAndJoinByPosition.class)
                || f.isAnnotationPresent(CsvCustomBindByPosition.class));

        // Set the mapping strategy according to what we've found.
        MappingStrategy<T> mappingStrategy = positionAnnotationsPresent ?
                new ColumnPositionMappingStrategy<>() :
                new HeaderColumnNameMappingStrategy<>();
        mappingStrategy.setErrorLocale(errorLocale);
        mappingStrategy.setType(type);
        return mappingStrategy;
    }

As we can see, when none of the position-binding annotations are used, the default is an instance of HeaderColumnNameMappingStrategy. That class does not implement generateHeader(T bean) itself, so the call ends up in AbstractMappingStrategy.generateHeader(T bean), inherited from further up the hierarchy.

    public String[] generateHeader(T bean) throws CsvRequiredFieldEmptyException {
        if(type == null) {
            throw new IllegalStateException(ResourceBundle
                    .getBundle(ICSVParser.DEFAULT_BUNDLE_NAME, errorLocale)
                    .getString("type.before.header"));
        }
        
        // Always take what's been given or previously determined first.
        if(headerIndex.isEmpty()) {
            /* getFieldMap() returns the FieldMapByName object, which uses a SortedMap
               instance, simpleMap, to store the bean's property information and to generate
               a header in the corresponding order; a SortedMap keeps its entries in the natural
               order of the keys. FieldMapByName.generateHeader also handles mapping several
               properties onto one column, which my use case does not need at all, so the
               custom strategy shown earlier (in the ready-to-use version) simply leaves it out. */
            String[] header = getFieldMap().generateHeader(bean);
            /* The header passed in here determines the column order of the CSV file:
               headerIndex is initialized from this array, and that initialization fixes
               the final column order. */
            headerIndex.initializeHeaderIndex(header);
            return header;
        }
        
        // Otherwise, put headers in the right places.
        return headerIndex.getHeaderIndex();
    }

AbstractMappingStrategy.generateHeader(T bean) above mainly initializes the headerIndex object and returns a header array whose order matches the columns that will eventually be written to the CSV.
Which naturally raises the question: how does headerIndex decide the column order? A simplified mental model is sketched below.
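
The following is not opencsv's actual class, just a rough stand-in for the part of headerIndex that matters here (the real HeaderIndex does considerably more, e.g. for multi-valued headers and reading):

    // Simplified stand-in for the idea behind opencsv's internal headerIndex.
    // Method names mirror the ones we will see in the real code below.
    class HeaderIndexSketch {
        private String[] positionToHeader = new String[0];

        void initializeHeaderIndex(String[] header) {
            // The order of this array *is* the column order of the CSV.
            positionToHeader = header == null ? new String[0] : header.clone();
        }

        String getByPosition(int i) {
            // position -> upper-cased header / field name
            return i < positionToHeader.length ? positionToHeader[i] : null;
        }
    }
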
To see the answer in the real code, let's look at the write-out logic. The task submitted to the thread pool ultimately runs the following:

    @Override
    public void run() {
        try {
            /* mappingStrategy.transmuteBean(bean) is where the bean is turned into the
               CSV line (an array of column values) for this row. */
            OpencsvUtils.queueRefuseToAcceptDefeat(resultantLineQueue,
                    new OrderedObject<>(lineNumber, mappingStrategy.transmuteBean(bean)));
        }
        catch (CsvException e) {
            e.setLineNumber(lineNumber);
            if(throwExceptions) {
                throw new RuntimeException(e);
            }
            OpencsvUtils.queueRefuseToAcceptDefeat(thrownExceptionsQueue,
                    new OrderedObject<>(lineNumber, e));
        }
        catch(CsvRuntimeException csvre) {
            // Rethrowing exception here because I do not want the CsvRuntimeException caught and rewrapped in the catch below.
            throw csvre;
        }
        catch(Exception t) {
            throw new RuntimeException(t);
        }
    }

Stepping into mappingStrategy.transmuteBean(bean): this method ultimately returns an array of the bean's property values in the fixed column order:

   @Override
    public String[] transmuteBean(T bean) throws CsvDataTypeMismatchException, CsvRequiredFieldEmptyException {
        int numColumns = headerIndex.findMaxIndex()+1;
        BeanField<T, K> firstBeanField, subsequentBeanField;
        K firstIndex, subsequentIndex;
        List<String> contents = new ArrayList<>(Math.max(numColumns, 0));

        // Create a map of types to instances of subordinate beans
        Map<Class<?>, Object> instanceMap;
        try {
            instanceMap = indexBean(bean);
        }
        catch(IllegalAccessException | InvocationTargetException e) {
            // Our testing indicates these exceptions probably can't be thrown,
            // but they're declared, so we have to deal with them. It's an
            // alibi catch block.
            CsvBeanIntrospectionException csve = new CsvBeanIntrospectionException(
                    ResourceBundle.getBundle(
                            ICSVParser.DEFAULT_BUNDLE_NAME, errorLocale)
                            .getString("error.introspecting.beans"));
            csve.initCause(e);
            throw csve;
        }

        /* This for loop collects the bean's property values in the fixed column order. */
        for(int i = 0; i < numColumns;) {

            // Use the headerIndex instance to look up, for index i, the header and the
            // corresponding bean field (Field).
            // Determine the first value
            firstBeanField = findField(i);
            firstIndex = chooseMultivaluedFieldIndexFromHeaderIndex(i);
            String[] fields = firstBeanField != null
                    ? firstBeanField.write(instanceMap.get(firstBeanField.getType()), firstIndex)
                    : ArrayUtils.EMPTY_STRING_ARRAY;

            if(fields.length == 0) {

                // Write the only value
                contents.add(StringUtils.EMPTY);
                i++; // Advance the index
            }
            else {

                // Multiple values. Write the first.
                contents.add(StringUtils.defaultString(fields[0]));

                // Now write the rest.
                // We must make certain that we don't write more fields
                // than we have columns of the correct type to cover them.
                int j = 1;
                int displacedIndex = i+j;
                subsequentBeanField = findField(displacedIndex);
                subsequentIndex = chooseMultivaluedFieldIndexFromHeaderIndex(displacedIndex);
                while(j < fields.length
                        && displacedIndex < numColumns
                        && Objects.equals(firstBeanField, subsequentBeanField)
                        && Objects.equals(firstIndex, subsequentIndex)) {
                    // This field still has a header, so add it
                    contents.add(StringUtils.defaultString(fields[j]));

                    // Prepare for the next loop through
                    displacedIndex = i + (++j);
                    subsequentBeanField = findField(displacedIndex);
                    subsequentIndex = chooseMultivaluedFieldIndexFromHeaderIndex(displacedIndex);
                }

                i = displacedIndex; // Advance the index

                // And here's where we fill in any fields that are missing to
                // cover the number of columns of the same type
                if(i < numColumns) {
                    subsequentBeanField = findField(i);
                    subsequentIndex = chooseMultivaluedFieldIndexFromHeaderIndex(i);
                    while(Objects.equals(firstBeanField, subsequentBeanField)
                            && Objects.equals(firstIndex, subsequentIndex)
                            && i < numColumns) {
                        contents.add(StringUtils.EMPTY);
                        subsequentBeanField = findField(++i);
                        subsequentIndex = chooseMultivaluedFieldIndexFromHeaderIndex(i);
                    }
                }
            }
        }
        return contents.toArray(new String[0]);
    }

The key line inside that for loop is firstBeanField = findField(i): it uses the index i to locate the Field instance.

    @Override
    protected BeanField<T, String> findField(int col) throws CsvBadConverterException {
        BeanField<T, String> beanField = null;
        // Pass the index into this method to get the upper-cased property name for that
        // position; see the next code block.
        String columnName = getColumnName(col);
        if (columnName == null) {
            return null;
        }
        columnName = columnName.trim();
        if (!columnName.isEmpty()) {
            // Use the upper-cased field name to fetch the corresponding Field instance from
            // simpleMap (the map inside fieldMap that stores the column/field information).
            beanField = fieldMap.get(columnName.toUpperCase());
        }
        return beanField;
    }


    String getColumnName(int col) {
        // Delegates to headerIndex.getByPosition
        // headerIndex is never null because it's final
        return headerIndex.getByPosition(col);
    }

    public String getByPosition(int i) {
        if(i < positionToHeader.length) {
            // Here we can see the upper-cased field name being retrieved by index.
            return positionToHeader[i];
        }
        return null;
    }

At this point the column order of the final CSV is effectively settled; what remains is collecting these arrays and writing them out to the file.

Closing remarks

Given my limited skill and a rather hurried read of the source, there may well be mistakes or omissions; if you spot any, please point them out in the comments!

posted on 2021-02-08 15:48 by gaarakseven