How to Write a Custom Collector

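The Collectors class offers many convenient factory methods. When none of the built-in implementations fits the requirement, how do we define a Collector of our own?

The Collector interface provides a static of method; calling it is enough to build a custom Collector.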

    public static<T, A, R> Collector<T, A, R> of(Supplier<A> supplier,
                                                 BiConsumer<A, T> accumulator,
                                                 BinaryOperator<A> combiner,
                                                 Function<A, R> finisher,
                                                 Characteristics... characteristics) {
        ...
        Set<Characteristics> cs = Collectors.CH_NOID;
        if (characteristics.length > 0) {
            cs = EnumSet.noneOf(Characteristics.class);
            Collections.addAll(cs, characteristics);
            cs = Collections.unmodifiableSet(cs);
        }
        return new Collectors.CollectorImpl<>(supplier, accumulator, combiner, finisher, cs);
    }

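supplier: creates an object of type A, the container that holds the intermediate accumulation result;
accumulator: implements the accumulation step; it receives the A container and one input element of type T, and folds the element into the container;
combiner: merges two partial A containers into one. A sequential stream works with a single container, so the combiner is normally never invoked there (it must still be non-null); a parallel stream needs it to merge the containers filled by different threads;
finisher: converts the accumulation type A into the final result type R. When A and R are the same type no conversion is needed, and the overload of Collector.of that takes no finisher parameter can be used instead;
characteristics: optional hints (CONCURRENT, UNORDERED, IDENTITY_FINISH) that let the stream implementation optimize the reduction;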
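
To make the roles of the four functions concrete, here is a minimal sketch that drives a collector by hand instead of going through Stream.collect. The class name CollectorPartsDemo, the CONCAT collector, and the helpers runSequential and runSplit are hypothetical names introduced for illustration; the real stream implementation is more involved, and this only mirrors the order in which the functions are called.

```java
import java.util.List;
import java.util.stream.Collector;

class CollectorPartsDemo {

    // Hypothetical collector: T = String, A = StringBuilder, R = String.
    static final Collector<String, StringBuilder, String> CONCAT = Collector.of(
            () -> new StringBuilder(),            // supplier: fresh mutable container
            (sb, s) -> sb.append(s),              // accumulator: fold one element in
            (left, right) -> left.append(right),  // combiner: merge two partial containers
            sb -> sb.toString());                 // finisher: convert A to R

    // Roughly what a sequential collect does: one container, no combiner call.
    static <T, A, R> R runSequential(Collector<T, A, R> c, List<T> input) {
        A container = c.supplier().get();
        for (T item : input) {
            c.accumulator().accept(container, item);
        }
        return c.finisher().apply(container);
    }

    // Roughly what a two-way parallel split does: two containers, merged by the combiner.
    static <T, A, R> R runSplit(Collector<T, A, R> c, List<T> input) {
        int mid = input.size() / 2;
        A left = c.supplier().get();
        A right = c.supplier().get();
        for (T item : input.subList(0, mid)) {
            c.accumulator().accept(left, item);
        }
        for (T item : input.subList(mid, input.size())) {
            c.accumulator().accept(right, item);
        }
        return c.finisher().apply(c.combiner().apply(left, right));
    }
}
```

Both helpers return the same result for the same input, which is exactly the property a well-behaved combiner has to guarantee.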

Note that, according to the Javadoc, it is the accumulation type A (not the input type T) that must be mutable:

@param <A> the mutable accumulation type of the reduction operation (often hidden as an implementation detail)
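
The mutability requirement can be seen in a small experiment. The sketch below is an illustration only (the class and method names are hypothetical): it sums a list of Integer values twice, once with an immutable Integer container and once with a one-element int[] container.

```java
import java.util.List;
import java.util.stream.Collector;

class MutableContainerDemo {

    // Broken on purpose: Integer is immutable, so "acc += x" only rebinds the
    // lambda's own parameter. The object handed out by the supplier never changes.
    static int sumWithImmutableContainer(List<Integer> values) {
        Collector<Integer, Integer, Integer> broken = Collector.of(
                () -> 0,
                (acc, x) -> acc += x,
                (left, right) -> left + right);
        return values.stream().collect(broken);
    }

    // Works: the int[] is mutated in place, so every accumulation step is kept.
    static int sumWithMutableContainer(List<Integer> values) {
        Collector<Integer, int[], Integer> working = Collector.of(
                () -> new int[1],
                (acc, x) -> acc[0] += x,
                (left, right) -> { left[0] += right[0]; return left; },
                acc -> acc[0]);
        return values.stream().collect(working);
    }
}
```

For the input 1, 2, 3 the first method returns 0 (the supplier's initial value) and the second returns 6.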

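Before writing our own, let's look at how the Collectors class itself implements the Collector interface, taking the commonly used toList method as an example: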

static final Set<Collector.Characteristics> CH_ID = Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH));

public static <T> Collector<T, ?, List<T>> toList() {
    return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, 
                               List::add,
                               (left, right) -> { 
                                    left.addAll(right); 
                                    return left; 
                               },
                               CH_ID);
}

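In toList, the supplier is ArrayList::new, which creates the ArrayList used as the accumulation container; List::add serves as the accumulator; and the combiner calls List.addAll to merge two lists. Because the container already is the final List, toList passes no finisher and sets the IDENTITY_FINISH characteristic, which tells the stream that the container can be returned directly without a finishing conversion.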
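
CollectorImpl is not accessible outside the JDK, but the same collector can be approximated through the public API. The following is a sketch, not the JDK's code; the names ToListSketch and myToList are hypothetical. It relies on the overload of Collector.of that takes no finisher, which adds IDENTITY_FINISH to the characteristics on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

class ToListSketch {

    // Approximation of Collectors.toList() built with the public factory method.
    static <T> Collector<T, ?, List<T>> myToList() {
        return Collector.<T, List<T>>of(
                ArrayList::new,                      // supplier: empty list as container
                List::add,                           // accumulator: append one element
                (left, right) -> {                   // combiner: merge two partial lists
                    left.addAll(right);
                    return left;
                });
    }
}
```

Usage is the same as for the built-in collector, for example stream.collect(ToListSketch.myToList()).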

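Following the Collectors implementation, let's write a custom Collector. Suppose we have a list of objects of type A (a domain class, not to be confused with the type parameter A above) and need the sum of the count property over all of them.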

 public class A {
    
    private int count;

    public A(int count) {
        this.count = count;
    }

    public int getCount() {
        return count;
    }
}

For this requirement one would normally just use mapToInt or the summingInt method provided by Collectors:

var aList = new ArrayList<A>();
...
int totalUseMap = aList.stream().mapToInt(A::getCount).sum();
// or
int totalUseCollect = aList.stream().collect(Collectors.summingInt(A::getCount));

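But how would we do the same with a custom Collector? Like this:

var aList = new ArrayList<A>();
...
int total = aList.parallelStream().collect(Collector.of(() -> new int[1],
                                                (result, a) -> result[0] += a.getCount(),
                                                (a, b) -> {
                                                    a[0] += b[0];
                                                    return a;
                                                },
                                                result -> result[0]));

The supplier must provide a mutable object, so a plain () -> 0 will not do: the boxed Integer is immutable, and the accumulator is a BiConsumer that returns nothing, so the only way it can record progress is by mutating the container. With an Integer container the collector would simply return the supplier's initial value.
The required result type is int, while the accumulation container is an int[], so the finisher converts the int[] into an int.
No characteristics are passed. In particular, CONCURRENT must not be set here: it declares that the accumulator may be called from several threads on the same container, and a plain int[] is not thread-safe. A parallel stream does not need that flag anyway; each thread accumulates into its own container obtained from the supplier, and the combiner merges the partial results.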
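
As a self-contained sketch of the sum collector, the code below defines two variants. SUM_PER_THREAD uses an int[] container without any characteristics, so each thread of a parallel stream fills its own container. SUM_SHARED is an assumption-laden alternative showing a case where CONCURRENT is legitimate: it uses java.util.concurrent.atomic.LongAdder, which is safe to update from several threads, and adds UNORDERED so that the stream may actually share one container. The names SumCollectorDemo, Item, and sampleItems are hypothetical; Item stands in for the article's class A.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collector;

class SumCollectorDemo {

    // Same shape as class A in the article, renamed to avoid a clash.
    static final class Item {
        private final int count;
        Item(int count) { this.count = count; }
        int getCount() { return count; }
    }

    // Non-concurrent: one int[] per thread, merged by the combiner.
    static final Collector<Item, int[], Integer> SUM_PER_THREAD = Collector.of(
            () -> new int[1],
            (acc, item) -> acc[0] += item.getCount(),
            (left, right) -> { left[0] += right[0]; return left; },
            acc -> acc[0]);

    // Concurrent: a thread-safe LongAdder that all threads may update together.
    static final Collector<Item, LongAdder, Integer> SUM_SHARED = Collector.of(
            LongAdder::new,
            (acc, item) -> acc.add(item.getCount()),
            (left, right) -> { left.add(right.sum()); return left; },
            LongAdder::intValue,
            Collector.Characteristics.CONCURRENT,
            Collector.Characteristics.UNORDERED);

    // Builds items with counts 1..n for experiments.
    static List<Item> sampleItems(int n) {
        List<Item> items = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            items.add(new Item(i));
        }
        return items;
    }
}
```

Both collectors should agree with items.stream().mapToInt(Item::getCount).sum() on sequential and parallel streams. For a plain integer sum the per-thread variant is usually the simpler choice; the shared variant mainly illustrates what CONCURRENT asks of the container.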

posted on 2019-12-05 15:47 yeyu456
