Java 8 Collector interface: a source code walkthrough

Introduction

In Java 8, .collect() is a terminal operation on a stream. Its purpose is to combine all the elements of the stream into a single result.

<R, A> R collect(Collector<? super T, A, R> collector);

// Simple examples
// How the result is built is defined by the Collector implementation passed to collect()
menu.stream().collect(Collectors.counting());
menu.stream().collect(Collectors.toList());
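
To make the two calls above concrete, here is a minimal runnable sketch. The class name CollectBasics and the MENU list are assumptions made for illustration: a plain list of dish names stands in for the article's menu, whose element type is not shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CollectBasics {
    // Hypothetical stand-in for the article's `menu`: a plain list of dish names.
    static final List<String> MENU = Arrays.asList("pork", "beef", "rice", "pizza");

    // Reduces the stream to a single number: how many elements it contains.
    static long countDishes() {
        return MENU.stream().collect(Collectors.counting());
    }

    // Reduces the stream to a List holding only the four-letter names.
    static List<String> fourLetterNames() {
        return MENU.stream()
                   .filter(name -> name.length() == 4)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(countDishes());
        System.out.println(fourLetterNames());
    }
}
```

Both calls go through the same collect(Collector) method; only the Collector argument decides what kind of result comes out.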

The Collector interface

You can implement the Collector interface to define whatever reduction you need.

/**
*	<T> – the type of input elements to the reduction operation
*	<A> – the mutable accumulation type of the reduction operation (often hidden as an implementation detail), i.e. the accumulator type
*	<R> – the result type of the reduction operation
*/
public interface Collector<T, A, R> {

    Supplier<A> supplier();

    BiConsumer<A, T> accumulator();

    BinaryOperator<A> combiner();

    Function<A, R> finisher();

    Set<Characteristics> characteristics();
}

Taking Collectors.toList() as an example

public static <T> Collector<T, ?, List<T>> toList() {
    return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, 
                               		List::add,
                               		(left, right) -> { left.addAll(right); return left; },
                               		CH_ID);
}
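
CollectorImpl is an internal class of Collectors, so application code cannot call it directly. As a rough sketch, the same three functions can be handed to the public factory method Collector.of, whose overload without a finisher yields a collector with the IDENTITY_FINISH characteristic. The names CollectorOfSketch, toListLike, and collectNames are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class CollectorOfSketch {
    // Builds a toList-like collector from the same three functions that toList() passes to CollectorImpl.
    static <T> Collector<T, List<T>, List<T>> toListLike() {
        return Collector.of(
                ArrayList::new,                                          // supplier
                List::add,                                               // accumulator
                (left, right) -> { left.addAll(right); return left; }); // combiner
    }

    // Collects a small fixed stream with the collector above.
    static List<String> collectNames() {
        return Stream.of("a", "b", "c").collect(CollectorOfSketch.<String>toListLike());
    }

    public static void main(String[] args) {
        System.out.println(collectNames());
        System.out.println(toListLike().characteristics());
    }
}
```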

Step-by-step analysis

Creating a new result container: supplier()

/**
 * A function that creates and returns a new mutable result container.
 *
 * Note: creates a new result container.
 *
 * @return a function which returns a new, mutable result container
 */
Supplier<A> supplier();

@FunctionalInterface
public interface Supplier<T> {
    /**
     * Gets a result.
     *
     * @return a result
     */
    T get();
}

// Implementation in ToListCollector
public Supplier<List<T>> supplier() {
    return () -> new ArrayList<>();
    // Equivalent to:
    // return ArrayList::new;
}

Adding an element to the result container: accumulator()

/**
* A function that folds a value into a mutable result container.
* 
* Note: folds a value into the container.
* 
* @return a function which folds a value into a mutable result container
*/
BiConsumer<A, T> accumulator();

@FunctionalInterface
public interface BiConsumer<T, U> {

    /**
     * Performs this operation on the given arguments.
     *
     * @param t the first input argument
     * @param u the second input argument
     */
    void accept(T t, U u);
}

// Implementation in ToListCollector
public BiConsumer<List<T>, T> accumulator() {
    return (list, item) -> list.add(item);
    // Equivalent to:
    // return List::add;
}

Applying the final transformation to the result container: finisher()

/**
 * Perform the final transformation from the intermediate accumulation type
 * {@code A} to the final result type {@code R}.
 *
 * Note: converts the intermediate accumulation type into the final result type.
 *
 * <p>If the characteristic {@code IDENTITY_TRANSFORM} is
 * set, this function may be presumed to be an identity transform with an
 * unchecked cast from {@code A} to {@code R}.
 *
 * @return a function which transforms the intermediate result to the final
 * result
 */
Function<A, R> finisher();

@FunctionalInterface
public interface Function<T, R> {

    /**
     * Applies this function to the given argument.
     *
     * @param t the function argument
     * @return the function result
     */
    R apply(T t);
}

// Implementation in ToListCollector
// The accumulator object is already the expected final result, so no conversion is needed.
public Function<List<T>, List<T>> finisher() {
    return Function.identity();
}

The three methods supplier(), accumulator(), and finisher() are already enough to reduce a stream sequentially.
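
The sequential reduction can be sketched by driving a Collector by hand. This is only an illustration of the three steps, not how the stream library is actually implemented; the helper reduceSequentially and the class name are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class SequentialReductionSketch {

    // Hypothetical helper (not part of the JDK): applies a Collector to an Iterable, one element at a time.
    static <T, A, R> R reduceSequentially(Iterable<T> source,
                                          Collector<? super T, A, R> collector) {
        A container = collector.supplier().get();                        // 1. create the result container
        BiConsumer<A, ? super T> accumulator = collector.accumulator();
        for (T item : source) {
            accumulator.accept(container, item);                         // 2. fold each element into it
        }
        return collector.finisher().apply(container);                    // 3. apply the final transformation
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("Milan", "Cambridge", "Milan");
        List<String> asList = reduceSequentially(cities, Collectors.toList());
        String joined = reduceSequentially(cities, Collectors.joining(", "));
        System.out.println(asList);
        System.out.println(joined);
    }
}
```

Note that combiner() is never called here: with a single container there is nothing to merge.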


Merging two result containers: combiner()

This method exists so that a stream can be reduced in parallel.

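  • The original stream is split recursively into substreams until the condition that decides whether a stream needs further splitting no longer holds.
  • All substreams can be processed in parallel; each one is reduced with the sequential reduction described earlier.
  • Finally, the function returned by combiner() merges the partial results pairwise.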
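
The split-and-merge shape of these steps can be sketched as follows. To keep the example short and deterministic, the two halves are processed one after the other on a single thread; since the halves share no state, each could be handed to a separate thread. The class name, the THRESHOLD value, and both helper methods are assumptions made for illustration, not the stream library's actual splitting logic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class SplitAndMergeSketch {
    // Hypothetical splitting condition: stop splitting once a chunk has at most 2 elements.
    static final int THRESHOLD = 2;

    // Reduces one chunk to a partial result container of type A.
    static <T, A> A reducePartially(List<T> chunk, Collector<? super T, A, ?> collector) {
        if (chunk.size() <= THRESHOLD) {
            // Small enough: plain sequential accumulation into a fresh container.
            A container = collector.supplier().get();
            for (T item : chunk) {
                collector.accumulator().accept(container, item);
            }
            return container;
        }
        // Otherwise split in two, reduce each half independently, then merge the partial results.
        int mid = chunk.size() / 2;
        A left = reducePartially(chunk.subList(0, mid), collector);
        A right = reducePartially(chunk.subList(mid, chunk.size()), collector);
        return collector.combiner().apply(left, right);
    }

    // Full reduction: merge all partial containers, then apply the finisher once at the end.
    static <T, A, R> R reduce(List<T> source, Collector<? super T, A, R> collector) {
        A merged = reducePartially(source, collector);
        return collector.finisher().apply(merged);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> collected = reduce(numbers, Collectors.toList());
        System.out.println(collected);
    }
}
```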


/**
 * A function that accepts two partial results and merges them.  The
 * combiner function may fold state from one argument into the other and
 * return that, or may return a new result container.
 * 
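 * Note: merges two partial results, either by folding the state of one argument into the other and returning it, or by returning a new result container.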
 *
 * @return a function which combines two partial results into a combined
 * result
 */
BinaryOperator<A> combiner();

@FunctionalInterface
public interface BinaryOperator<T> extends BiFunction<T,T,T> { }

@FunctionalInterface
public interface BiFunction<T, U, R> {
    /**
     * Applies this function to the given arguments.
     *
     * @param t the first function argument
     * @param u the second function argument
     * @return the function result
     */
    R apply(T t, U u);
}

// Implementation in ToListCollector
public BinaryOperator<List<T>> combiner() {
    return (left, right) -> {
        left.addAll(right);
        return left;
    };
}

Defining the collector's behavior: characteristics()

/**
 * Returns a {@code Set} of {@code Collector.Characteristics} indicating
 * the characteristics of this Collector.  This set should be immutable.
 *
 * Note: returns an immutable set of Characteristics that defines the collector's behavior.
 *
 * @return an immutable set of collector characteristics
 */
Set<Characteristics> characteristics();

enum Characteristics {
    /**
     * Indicates that this collector is <em>concurrent</em>, meaning that
     * the result container can support the accumulator function being
     * called concurrently with the same result container from multiple
     * threads.
     *
     * Note: the collector is concurrent.
     *
     * <p>If a {@code CONCURRENT} collector is not also {@code UNORDERED},
     * then it should only be evaluated concurrently if applied to an
     * unordered data source.
     */
    CONCURRENT,

    /**
     * Indicates that the collection operation does not commit to preserving
     * the encounter order of input elements.  (This might be true if the
     * result container has no intrinsic order, such as a {@link Set}.)
     *
     * Note: the result is unordered.
     */
    UNORDERED,

    /**
     * Indicates that the finisher function is the identity function and
     * can be elided.  If set, it must be the case that an unchecked cast
     * from A to R will succeed.
     * 
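     * Note: the function returned by finisher() is an identity function and can be skipped.
     * In that case the accumulator object is used directly as the final result of the reduction.
     * It also means that an unchecked cast from the accumulator type A to the result type R is safe.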
     */
    IDENTITY_FINISH
}


// Implementation in ToListCollector
static final Set<Collector.Characteristics> CH_ID =
    Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH));
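
The characteristics of the built-in collectors can be inspected directly. The sketch below prints them for a few collectors from Collectors; the helper has and the class name are hypothetical. The exact sets are an implementation detail of the JDK, so the example relies only on two properties that match the documented behavior: toList() reports IDENTITY_FINISH and toSet() reports UNORDERED.

```java
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CharacteristicsSketch {
    // Hypothetical helper: does the collector advertise the given characteristic?
    static boolean has(Collector<?, ?, ?> collector, Collector.Characteristics flag) {
        return collector.characteristics().contains(flag);
    }

    public static void main(String[] args) {
        // Print the advertised characteristics of a few built-in collectors.
        System.out.println("toList:   " + Collectors.toList().characteristics());
        System.out.println("toSet:    " + Collectors.toSet().characteristics());
        System.out.println("counting: " + Collectors.counting().characteristics());

        System.out.println(has(Collectors.toList(), Collector.Characteristics.IDENTITY_FINISH));
        System.out.println(has(Collectors.toSet(), Collector.Characteristics.UNORDERED));
    }
}
```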

The underlying source of Collectors.toList()

// Source of toList
public static <T> Collector<T, ?, List<T>> toList() {
    return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, 
                               		List::add,
                               		(left, right) -> { left.addAll(right); return left; },
                               		CH_ID);
}

// CollectorImpl implementation (abridged: field declarations and accessor methods are omitted)
static class CollectorImpl<T, A, R> implements Collector<T, A, R> {
    CollectorImpl(Supplier<A> supplier,
                      BiConsumer<A, T> accumulator,
                      BinaryOperator<A> combiner,
                      Set<Characteristics> characteristics) {
            this(supplier, accumulator, combiner, castingIdentity(), characteristics);
    }

    CollectorImpl(Supplier<A> supplier,
                      BiConsumer<A, T> accumulator,
                      BinaryOperator<A> combiner,
                      Function<A,R> finisher,
                      Set<Characteristics> characteristics) {
            this.supplier = supplier;
            this.accumulator = accumulator;
            this.combiner = combiner;
            this.finisher = finisher;
            this.characteristics = characteristics;
    }
    
    // castingIdentity(), called by the first constructor above; in the JDK it is declared on the enclosing Collectors class
    private static <I, R> Function<I, R> castingIdentity() {
          return i -> (R) i;
    }

}

// The value behind the CH_ID argument
static final Set<Collector.Characteristics> CH_ID =
    Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH));

Writing your own collector

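import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class ToListCollector<T> implements Collector<T, List<T>, List<T>> {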
    @Override
    public Supplier<List<T>> supplier() {
        return () -> new ArrayList<>();
    }

    @Override
    public BiConsumer<List<T>, T> accumulator() {
        return (list, t) -> list.add(t);
    }

    @Override
    public BinaryOperator<List<T>> combiner() {
        return (list, list2) -> {
            list.addAll(list2);
            return list;
        };
    }

    @Override
    public Function<List<T>, List<T>> finisher() {
        // t -> t;
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH));
    }
}

// Usage
List<String> collect = transactions.stream()
	.map(transaction -> transaction.getTrader().getCity())
	.distinct()
	.collect(new ToListCollector<String>());

// Equivalent to
List<String> collect = transactions.stream()
	.map(transaction -> transaction.getTrader().getCity())
	.distinct()
	.collect(Collectors.toList());
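
ToListCollector has an identity finisher. For contrast, here is a minimal sketch of a collector whose finisher does real work: it accumulates into an ArrayList and wraps the result in an unmodifiable view at the end. Because the finisher is not the identity, IDENTITY_FINISH must not be declared. The class names UnmodifiableListCollector and UnmodifiableListDemo, and the sample city names, are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

// Hypothetical collector: same accumulation as ToListCollector, but with a non-identity finisher.
class UnmodifiableListCollector<T> implements Collector<T, List<T>, List<T>> {
    @Override
    public Supplier<List<T>> supplier() {
        return ArrayList::new;
    }

    @Override
    public BiConsumer<List<T>, T> accumulator() {
        return List::add;
    }

    @Override
    public BinaryOperator<List<T>> combiner() {
        return (left, right) -> {
            left.addAll(right);
            return left;
        };
    }

    @Override
    public Function<List<T>, List<T>> finisher() {
        // Wraps the mutable container in a read-only view.
        return Collections::unmodifiableList;
    }

    @Override
    public Set<Characteristics> characteristics() {
        // No IDENTITY_FINISH here, so the finisher is always applied.
        return Collections.emptySet();
    }
}

class UnmodifiableListDemo {
    // Collects the distinct city names into a read-only list.
    static List<String> collectCities() {
        return Stream.of("Cambridge", "Milan", "Cambridge")
                     .distinct()
                     .collect(new UnmodifiableListCollector<String>());
    }

    public static void main(String[] args) {
        System.out.println(collectCities());
    }
}
```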

Custom collection without implementing Collector

For collection operations that are IDENTITY_FINISH, Stream has an overloaded collect method that takes three arguments instead: supplier, accumulator, and combiner.

<R> R collect(Supplier<R> supplier,
              BiConsumer<R, ? super T> accumulator,
              BiConsumer<R, R> combiner);


List<String> collect = transactions.stream()
    .map(transaction -> transaction.getTrader().getCity())
    .distinct()
    .collect(
        ArrayList::new, // supplier
        List::add,      // accumulator
        List::addAll    // combiner
    );
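
The snippet above depends on the transactions list, which is not defined in this article. A self-contained variant, assuming a plain list of city names as input, might look like the sketch below (the class and method names are hypothetical). Note that in this overload the combiner is a BiConsumer<R, R>: it merges the second container into the first and returns nothing, which is why List::addAll fits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ThreeArgCollectSketch {
    // Collects the distinct city names without any Collector instance.
    static List<String> distinctCities(List<String> cities) {
        return cities.stream()
                     .distinct()
                     .collect(
                         ArrayList::new, // supplier: creates the result container
                         List::add,      // accumulator: adds one element to a container
                         List::addAll);  // combiner: merges the second container into the first
    }

    public static void main(String[] args) {
        System.out.println(distinctCities(Arrays.asList("Milan", "Cambridge", "Milan")));
    }
}
```

This form is more compact, but it cannot express a finisher or any characteristics, so a reusable reduction is usually better packaged as a Collector.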

References

Java 8 in Action (Chinese edition: 《Java 8 实战》)

posted @ 2021-12-09 11:07  平安QAQ