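Java 8 Collector Interface: A Source Code Walkthrough
Introduction
In Java 8, .collect() is a terminal operation on a stream. Its purpose is to combine all the elements of the stream into a single result.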
<R, A> R collect(Collector<? super T, A, R> collector);
// Simple examples
// What gets built is decided by the Collector passed in, i.e. by an implementation of the Collector interface
menu.stream().collect(Collectors.counting());
menu.stream().collect(Collectors.toList());
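To make the two calls above concrete, here is a minimal, self-contained sketch. The original does not show the type of menu, so the list of dish names below is a hypothetical stand-in.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CollectBasics {
    // Hypothetical stand-in for the menu used above: just dish names.
    static final List<String> MENU = Arrays.asList("pork", "beef", "salmon");

    // Counts the elements of the stream; counting() produces a Long.
    static long countDishes() {
        return MENU.stream().collect(Collectors.counting());
    }

    // Gathers the elements of the stream into a List.
    static List<String> dishesAsList() {
        return MENU.stream().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(countDishes());   // 3
        System.out.println(dishesAsList());  // [pork, beef, salmon]
    }
}
```

The same stream pipeline yields a number or a list depending only on which Collector is handed to collect().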
The Collector interface
You can implement the Collector interface to define whatever reduction you need.
/**
* <T> – the type of input elements to the reduction operation
* <A> – the mutable accumulation type of the reduction operation (often hidden as an implementation detail); this is the accumulator type
* <R> – the result type of the reduction operation
*/
public interface Collector<T, A, R> {
Supplier<A> supplier();
BiConsumer<A, T> accumulator();
BinaryOperator<A> combiner();
Function<A, R> finisher();
Set<Characteristics> characteristics();
}
Take Collectors.toList() as an example:
public static <T> Collector<T, ?, List<T>> toList() {
return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new,
List::add,
(left, right) -> { left.addAll(right); return left; },
CH_ID);
}
Step-by-step analysis
Creating a new result container: supplier()
/**
* A function that creates and returns a new mutable result container.
*
* In short: creates a new result container.
*
* @return a function which returns a new, mutable result container
*/
Supplier<A> supplier();
@FunctionalInterface
public interface Supplier<T> {
/**
* Gets a result.
*
* @return a result
*/
T get();
}
// Implementation in ToListCollector
public Supplier<List<T>> supplier() {
return () -> new ArrayList<T>();
// Equivalent to: return ArrayList::new;
}
Adding an element to the result container: accumulator()
/**
* A function that folds a value into a mutable result container.
*
* In short: folds a value into the container.
*
* @return a function which folds a value into a mutable result container
*/
BiConsumer<A, T> accumulator();
@FunctionalInterface
public interface BiConsumer<T, U> {
/**
* Performs this operation on the given arguments.
*
* @param t the first input argument
* @param u the second input argument
*/
void accept(T t, U u);
}
// Implementation in ToListCollector
public BiConsumer<List<T>, T> accumulator() {
return (list, item) -> list.add(item);
// Equivalent to: return List::add;
}
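The two functions covered so far can be tried out in isolation. The sketch below is only an illustration with made-up class and method names: it checks that the supplier hands out a fresh container on every call, and that the accumulator folds items into one container.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

class SupplierAccumulatorDemo {
    // Each call to get() must create a new, independent container.
    static boolean suppliesFreshContainers() {
        Supplier<List<String>> supplier = ArrayList::new;
        return supplier.get() != supplier.get();
    }

    // Folds every item into a single container obtained from the supplier.
    static List<String> fold(String... items) {
        Supplier<List<String>> supplier = ArrayList::new;
        BiConsumer<List<String>, String> accumulator = List::add;
        List<String> container = supplier.get();
        for (String item : items) {
            accumulator.accept(container, item);
        }
        return container;
    }

    public static void main(String[] args) {
        System.out.println(suppliesFreshContainers()); // true
        System.out.println(fold("a", "b"));            // [a, b]
    }
}
```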
Applying the final transformation to the result container: finisher()
/**
* Perform the final transformation from the intermediate accumulation type
* {@code A} to the final result type {@code R}.
*
* In short: converts the intermediate accumulation type into the final result.
*
* <p>If the characteristic {@code IDENTITY_TRANSFORM} is
* set, this function may be presumed to be an identity transform with an
* unchecked cast from {@code A} to {@code R}.
*
* @return a function which transforms the intermediate result to the final
* result
*/
Function<A, R> finisher();
@FunctionalInterface
public interface Function<T, R> {
/**
* Applies this function to the given argument.
*
* @param t the function argument
* @return the function result
*/
R apply(T t);
}
// Implementation in ToListCollector
// The accumulator object already is the expected final result, so no conversion is needed.
public Function<List<T>, List<T>> finisher() {
return Function.identity();
}
The three methods supplier(), accumulator(), and finisher() are already enough to reduce a stream sequentially.
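As a rough sketch of what a sequential reduction looks like when written by hand with only those three functions (this is an illustration, not the actual stream implementation, and the class and method names are made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class SequentialReduction {
    // Drives any Collector over a list using supplier, accumulator and finisher only.
    static <T, A, R> R reduce(Collector<T, A, R> collector, List<? extends T> items) {
        A container = collector.supplier().get();                 // 1. new container
        BiConsumer<A, T> accumulator = collector.accumulator();
        for (T item : items) {
            accumulator.accept(container, item);                  // 2. fold each item
        }
        return collector.finisher().apply(container);             // 3. final conversion
    }

    static List<String> toListDemo() {
        Collector<String, ?, List<String>> collector = Collectors.toList();
        return reduce(collector, Arrays.asList("a", "b", "c"));
    }

    static String joiningDemo() {
        Collector<CharSequence, ?, String> collector = Collectors.joining("-");
        return reduce(collector, Arrays.asList("a", "b", "c"));
    }

    public static void main(String[] args) {
        System.out.println(toListDemo());  // [a, b, c]
        System.out.println(joiningDemo()); // a-b-c
    }
}
```

With Collectors.joining the accumulation type differs from the String result, so the finisher does real work there, whereas for toList it is just the identity.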
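Merging two result containers: combiner()
This method was introduced so that a stream can be reduced in parallel.
- The original stream is split recursively into substreams until the condition that decides whether a stream needs further splitting is no longer met.
- All substreams can be processed in parallel; each substream is reduced with the corresponding sequential reduction.
- Finally, the function returned by combiner() merges the partial results pairwise.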
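The three steps above can be sketched as a recursive split-and-merge. The version below runs on a single thread and splits a List by index, whereas a real parallel stream splits through a Spliterator and runs on the fork/join framework. THRESHOLD and all names are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class SplitAndCombine {
    // Assumed cut-off: chunks of this size or smaller are reduced sequentially.
    static final int THRESHOLD = 2;

    // Splits recursively, reduces small chunks sequentially, merges with combiner().
    static <T, A> A accumulate(Collector<T, A, ?> collector, List<T> items) {
        if (items.size() <= THRESHOLD) {
            A container = collector.supplier().get();
            for (T item : items) {
                collector.accumulator().accept(container, item);
            }
            return container;
        }
        int mid = items.size() / 2;
        A left = accumulate(collector, items.subList(0, mid));
        A right = accumulate(collector, items.subList(mid, items.size()));
        return collector.combiner().apply(left, right); // pairwise merge
    }

    static <T, A, R> R reduce(Collector<T, A, R> collector, List<T> items) {
        return collector.finisher().apply(accumulate(collector, items));
    }

    static List<Integer> demo() {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        Collector<Integer, ?, List<Integer>> collector = Collectors.toList();
        return reduce(collector, numbers);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [1, 2, 3, 4, 5, 6, 7]
    }
}
```

Because the left partial result is always merged before the right one, the encounter order of the input survives the split.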
/**
* A function that accepts two partial results and merges them. The
* combiner function may fold state from one argument into the other and
* return that, or may return a new result container.
*
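* In short: merges two partial results, either by folding the state of one argument into the other and returning it, or by returning a new result container.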
*
* @return a function which combines two partial results into a combined
* result
*/
BinaryOperator<A> combiner();
@FunctionalInterface
public interface BinaryOperator<T> extends BiFunction<T,T,T> { }
@FunctionalInterface
public interface BiFunction<T, U, R> {
/**
* Applies this function to the given arguments.
*
* @param t the first function argument
* @param u the second function argument
* @return the function result
*/
R apply(T t, U u);
}
// Implementation in ToListCollector
public BinaryOperator<List<T>> combiner() {
return (left, right) -> {
left.addAll(right);
return left;
};
}
Defining the collector's behavior: characteristics()
/**
* Returns a {@code Set} of {@code Collector.Characteristics} indicating
* the characteristics of this Collector. This set should be immutable.
*
* In short: returns an immutable set of Characteristics that defines how the collector behaves.
*
* @return an immutable set of collector characteristics
*/
Set<Characteristics> characteristics();
enum Characteristics {
/**
* Indicates that this collector is <em>concurrent</em>, meaning that
* the result container can support the accumulator function being
* called concurrently with the same result container from multiple
* threads.
*
* In short: this collector is concurrent.
*
* <p>If a {@code CONCURRENT} collector is not also {@code UNORDERED},
* then it should only be evaluated concurrently if applied to an
* unordered data source.
*/
CONCURRENT,
/**
* Indicates that the collection operation does not commit to preserving
* the encounter order of input elements. (This might be true if the
* result container has no intrinsic order, such as a {@link Set}.)
*
* In short: the result is unordered.
*/
UNORDERED,
/**
* Indicates that the finisher function is the identity function and
* can be elided. If set, it must be the case that an unchecked cast
* from A to R will succeed.
*
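* This indicates that the function returned by finisher() is the identity function and can be skipped.
* In that case the accumulator object is used directly as the final result of the reduction.
* It also means that an unchecked cast from the accumulator type A to the result type R is safe.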
*/
IDENTITY_FINISH
}
// Implementation in ToListCollector
static final Set<Collector.Characteristics> CH_ID =
Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH));
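To see these flags on real collectors, the short sketch below prints the characteristics of a few built-in ones. The class and method names are made up for the illustration, and the values in the comments reflect the JDK implementations I am aware of; they are implementation details rather than documented guarantees.

```java
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CharacteristicsDemo {
    static Set<Collector.Characteristics> ofToList() {
        return Collectors.toList().characteristics();
    }

    static Set<Collector.Characteristics> ofToSet() {
        return Collectors.toSet().characteristics();
    }

    static Set<Collector.Characteristics> ofJoining() {
        return Collectors.joining().characteristics();
    }

    public static void main(String[] args) {
        // toList: the accumulator (a List) is already the result.
        System.out.println(ofToList());  // contains IDENTITY_FINISH
        // toSet: a Set has no intrinsic order.
        System.out.println(ofToSet());   // contains UNORDERED and IDENTITY_FINISH
        // joining: the accumulation container must be converted to a String.
        System.out.println(ofJoining()); // does not contain IDENTITY_FINISH
    }
}
```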
The source code behind Collectors.toList()
// Source of toList
public static <T> Collector<T, ?, List<T>> toList() {
return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new,
List::add,
(left, right) -> { left.addAll(right); return left; },
CH_ID);
}
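// Implementation of CollectorImpl (a nested class of Collectors; accessor methods omitted)
static class CollectorImpl<T, A, R> implements Collector<T, A, R> {
private final Supplier<A> supplier;
private final BiConsumer<A, T> accumulator;
private final BinaryOperator<A> combiner;
private final Function<A, R> finisher;
private final Set<Characteristics> characteristics;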
CollectorImpl(Supplier<A> supplier,
BiConsumer<A, T> accumulator,
BinaryOperator<A> combiner,
Set<Characteristics> characteristics) {
this(supplier, accumulator, combiner, castingIdentity(), characteristics);
}
CollectorImpl(Supplier<A> supplier,
BiConsumer<A, T> accumulator,
BinaryOperator<A> combiner,
Function<A,R> finisher,
Set<Characteristics> characteristics) {
this.supplier = supplier;
this.accumulator = accumulator;
this.combiner = combiner;
this.finisher = finisher;
this.characteristics = characteristics;
}
}
// castingIdentity(), used by the four-argument CollectorImpl constructor (declared in Collectors)
@SuppressWarnings("unchecked")
private static <I, R> Function<I, R> castingIdentity() {
return i -> (R) i;
}
// Value of the CH_ID argument
static final Set<Collector.Characteristics> CH_ID =
Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH));
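Writing your own collector
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
public class ToListCollector<T> implements Collector<T, List<T>, List<T>> {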
@Override
public Supplier<List<T>> supplier() {
return () -> new ArrayList<>();
}
@Override
public BiConsumer<List<T>, T> accumulator() {
return (list, t) -> list.add(t);
}
@Override
public BinaryOperator<List<T>> combiner() {
return (list, list2) -> {
list.addAll(list2);
return list;
};
}
@Override
public Function<List<T>, List<T>> finisher() {
// Equivalent to: return t -> t;
return Function.identity();
}
@Override
public Set<Characteristics> characteristics() {
return Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH));
}
}
// Usage
List<String> collect = transactions.stream()
.map(transaction -> transaction.getTrader().getCity())
.distinct()
.collect(new ToListCollector<String>());
// Equivalent to
List<String> collect = transactions.stream()
.map(transaction -> transaction.getTrader().getCity())
.distinct()
.collect(Collectors.toList());
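For a collector this simple, the JDK's static factory Collector.of can stand in for a hand-written class. The sketch below is one possible equivalent, not the approach taken in the original. The class name, method name, and sample city names are made up, since the transaction types used above are not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

class ToListViaCollectorOf {
    // Builds a list collector from three functions. This overload of Collector.of
    // takes no finisher and marks the collector as IDENTITY_FINISH itself.
    static <T> Collector<T, List<T>, List<T>> toListCollector() {
        return Collector.of(
                ArrayList::new,                                         // supplier
                List::add,                                              // accumulator
                (left, right) -> { left.addAll(right); return left; }); // combiner
    }

    // Hypothetical stand-in for the city names extracted from the transactions.
    static List<String> demo() {
        Collector<String, List<String>, List<String>> collector = toListCollector();
        return Arrays.asList("Cambridge", "Milan", "Cambridge").stream()
                .distinct()
                .collect(collector);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [Cambridge, Milan]
    }
}
```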
Custom collection without implementing Collector
For collection operations that are IDENTITY_FINISH, Stream has an overloaded collect method that takes three other arguments: supplier, accumulator, and combiner.
<R> R collect(Supplier<R> supplier,
BiConsumer<R, ? super T> accumulator,
BiConsumer<R, R> combiner);
List<String> collect = transactions.stream()
.map(transaction -> transaction.getTrader().getCity())
.distinct()
.collect(
ArrayList::new, // supplier
List::add, // accumulator
List::addAll // combiner
);
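Here is a self-contained version of the three-argument call, as a sketch. The input list of city names is a hypothetical stand-in for the values extracted from the transactions. Note that the combiner in this overload is a BiConsumer that merges the second container into the first, not a BinaryOperator that returns a result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ThreeArgCollect {
    // Collects the distinct cities without defining a Collector at all.
    static List<String> distinctCities(List<String> cities) {
        return cities.stream()
                .distinct()
                .collect(
                        ArrayList::new, // supplier: creates the result container
                        List::add,      // accumulator: adds one element
                        List::addAll);  // combiner: merges the second list into the first
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("Cambridge", "Milan", "Cambridge");
        System.out.println(distinctCities(cities)); // [Cambridge, Milan]
    }
}
```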
References
Java 8 in Action (Chinese edition: 《Java 8 实战》)