Java 8 new features: how a reference stream executes, and the filter, map, and collect operations

Example:

public class User implements Comparable<User> {
    private String name;
    private Integer age;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Integer getAge() {
        return age;
    }

    public void setAge(Integer age) {
        this.age = age;
    }

    public User(){}

    public User(String name, Integer age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public String toString() {
        return "User{" +
                "name='" + name + '\'' +
                ", age=" + age +
                '}';
    }


    @Override
    public int compareTo(User o) {
        return age.compareTo(o.getAge());
    }
}

Test code:

    // Requires java.util.ArrayList, java.util.List and java.util.stream.Collectors.
    List<User> users = new ArrayList<>();
    users.add(new User("张三", 30));
    users.add(new User("李四", 34));
    users.add(new User("王五", 20));

    List<String> list = users.stream()
            .filter(user -> user.getAge() != null && user.getAge() >= 30)
            .map(User::getName)
            .collect(Collectors.toList());
    System.out.println(list);

The code above selects the names of the users whose age is greater than or equal to 30, collects them into a List, and prints the result.

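In this example, collect is a terminal operation; once it has run, the stream is consumed and cannot be used again. users.stream() creates a ReferencePipeline.Head, the head of the pipeline, which mainly holds a reference to the Spliterator (the data) and the stream flags. Everything between users.stream() and collect is an intermediate operation. Each intermediate operation is represented as a ReferencePipeline: filter and map here are StatelessOp instances, and StatelessOp is a subclass of ReferencePipeline. Within a StatelessOp, a Sink carries out the per-element work, and no data is processed until the terminal operation runs. The stages are linked through fields of AbstractPipeline: previousStage points to the previous stage, nextStage points to the next one, and the sourceStage field of every intermediate operation points to the ReferencePipeline.Head. Each intermediate operation has a depth one greater than that of the stage before it.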
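
The linking described above can be sketched with a small stand-alone model. This is only an illustration: Stage and StageLinkDemo are hypothetical names, not JDK classes, and the model keeps just the four fields mentioned in the text (previousStage, nextStage, sourceStage, depth).

```java
// Simplified, hypothetical model of how pipeline stages are linked.
// Stage is an illustrative name; the real fields live in AbstractPipeline.
class Stage {
    final String name;
    final Stage previousStage; // upstream stage, null for the head
    final Stage sourceStage;   // always the head of the pipeline
    final int depth;           // 0 for the head, +1 per intermediate operation
    Stage nextStage;           // downstream stage, set when one is appended

    // Creates the head stage (plays the role of ReferencePipeline.Head).
    Stage(String name) {
        this.name = name;
        this.previousStage = null;
        this.sourceStage = this;
        this.depth = 0;
    }

    // Appends an intermediate stage after the given upstream stage.
    Stage(String name, Stage upstream) {
        this.name = name;
        this.previousStage = upstream;
        this.sourceStage = upstream.sourceStage;
        this.depth = upstream.depth + 1;
        upstream.nextStage = this;
    }
}

class StageLinkDemo {
    public static void main(String[] args) {
        Stage head = new Stage("head");
        Stage filter = new Stage("filter", head);
        Stage map = new Stage("map", filter);
        // Walk backwards from the last stage, as wrapSink later does.
        for (Stage s = map; s != null; s = s.previousStage) {
            System.out.println(s.name + " depth=" + s.depth
                    + " source=" + s.sourceStage.name);
        }
    }
}
```

In this model the map stage ends up with depth 2 and its sourceStage is the head, which matches the description of the example pipeline.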

Source code walkthrough:

ReferencePipeline#filter(Predicate<? super P_OUT> predicate)

@Override
public final Stream<P_OUT> filter(Predicate<? super P_OUT> predicate) {
    Objects.requireNonNull(predicate);
    return new StatelessOp<P_OUT, P_OUT>(this, StreamShape.REFERENCE,
                                 StreamOpFlag.NOT_SIZED) {
        @Override
        Sink<P_OUT> opWrapSink(int flags, Sink<P_OUT> sink) {
            return new Sink.ChainedReference<P_OUT, P_OUT>(sink) {
                @Override
                public void begin(long size) {
                    downstream.begin(-1);
                }

                @Override
                public void accept(P_OUT u) {
                    if (predicate.test(u))
                        downstream.accept(u);
                }
            };
        }
    };
}

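filter wraps the operation in a StatelessOp and returns it, passing this (the previous stream, which in this example is the ReferencePipeline.Head) as the upstream stage. StatelessOp is a stateless ReferencePipeline.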

 abstract static class StatelessOp<E_IN, E_OUT>
        extends ReferencePipeline<E_IN, E_OUT> {
   
    StatelessOp(AbstractPipeline<?, E_IN, ?> upstream,
                StreamShape inputShape,
                int opFlags) {
        super(upstream, opFlags);
        assert upstream.getOutputShape() == inputShape;
    }

    @Override
    final boolean opIsStateful() {
        return false;
    }
}

ReferencePipeline#map(Function<? super P_OUT, ? extends R> mapper)

  public final <R> Stream<R> map(Function<? super P_OUT, ? extends R> mapper) {
    Objects.requireNonNull(mapper);
    return new StatelessOp<P_OUT, R>(this, StreamShape.REFERENCE,
                                 StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT) {
        @Override
        Sink<P_OUT> opWrapSink(int flags, Sink<R> sink) {
            return new Sink.ChainedReference<P_OUT, R>(sink) {
                @Override
                public void accept(P_OUT u) {
                    downstream.accept(mapper.apply(u));
                }
            };
        }
    };
}

map likewise wraps the operation in a StatelessOp and returns it. Note that no data has been processed at this point.
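
The fact that filter and map only build stages can be observed through the public API. The following is a minimal sketch, assuming a list of plain Integer ages in place of the User objects; LazyPipelineDemo and countPredicateCalls are names made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipelineDemo {
    // Returns {predicate calls before collect, predicate calls after collect}.
    static int[] countPredicateCalls() {
        AtomicInteger calls = new AtomicInteger();
        List<Integer> ages = Arrays.asList(30, 34, 20);

        // Building the pipeline only links stages; the predicate is not run yet.
        Stream<String> pipeline = ages.stream()
                .filter(age -> { calls.incrementAndGet(); return age >= 30; })
                .map(age -> "age " + age);
        int before = calls.get();

        // The terminal operation pulls every element through the chain.
        pipeline.collect(Collectors.toList());
        int after = calls.get();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] counts = countPredicateCalls();
        System.out.println(counts[0] + " -> " + counts[1]); // prints "0 -> 3"
    }
}
```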

ReferencePipeline#collect(Collector<? super P_OUT, A, R> collector)

   public final <R, A> R collect(Collector<? super P_OUT, A, R> collector) {
    A container;
    if (isParallel()
            && (collector.characteristics().contains(Collector.Characteristics.CONCURRENT))
            && (!isOrdered() || collector.characteristics().contains(Collector.Characteristics.UNORDERED))) {
        container = collector.supplier().get();
        BiConsumer<A, ? super P_OUT> accumulator = collector.accumulator();
        forEach(u -> accumulator.accept(container, u));
    }
    else {
        container = evaluate(ReduceOps.makeRef(collector));
    }
    return collector.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)
           ? (R) container
           : collector.finisher().apply(container);
}

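collect is a terminal operation. The first branch handles a parallel stream whose collector is CONCURRENT (and where either the stream is unordered or the collector is UNORDERED). The else branch, evaluate(ReduceOps.makeRef(collector)), handles every other case, including the sequential stream in this example. ReduceOps.makeRef wraps the collector in a ReduceOp, and AbstractPipeline#evaluate(TerminalOp<E_OUT, R> terminalOp) performs the actual processing.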

ReduceOps#makeRef(Collector collector)

public static <T, I> TerminalOp<T, I>
makeRef(Collector<? super T, I, ?> collector) {
    Supplier<I> supplier = Objects.requireNonNull(collector).supplier();
    BiConsumer<I, ? super T> accumulator = collector.accumulator();
    BinaryOperator<I> combiner = collector.combiner();
    class ReducingSink extends Box<I>
            implements AccumulatingSink<T, I, ReducingSink> {
        @Override
        public void begin(long size) {
            state = supplier.get();
        }

        @Override
        public void accept(T t) {
            accumulator.accept(state, t);
        }

        @Override
        public void combine(ReducingSink other) {
            state = combiner.apply(state, other.state);
        }
    }
    return new ReduceOp<T, I, ReducingSink>(StreamShape.REFERENCE) {
        @Override
        public ReducingSink makeSink() {
            return new ReducingSink();
        }

        @Override
        public int getOpFlags() {
            return collector.characteristics().contains(Collector.Characteristics.UNORDERED)
                   ? StreamOpFlag.NOT_ORDERED
                   : 0;
        }
    };
}

ReduceOps#makeRef wraps the collector in a ReduceOp whose makeSink() returns a ReducingSink.

AbstractPipeline#evaluate(TerminalOp<E_OUT, R> terminalOp)

final <R> R evaluate(TerminalOp<E_OUT, R> terminalOp) {
    assert getOutputShape() == terminalOp.inputShape();
    if (linkedOrConsumed)
        throw new IllegalStateException(MSG_STREAM_LINKED);
    linkedOrConsumed = true;

    return isParallel()
           ? terminalOp.evaluateParallel(this, sourceSpliterator(terminalOp.getOpFlags()))
           : terminalOp.evaluateSequential(this, sourceSpliterator(terminalOp.getOpFlags()));
}

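Here terminalOp.evaluateSequential is called to process the sequential stream. sourceSpliterator returns the Spliterator of the stream's source, which holds the data to be processed. terminalOp is the ReduceOp. Note the linkedOrConsumed check: a stream that has already been linked to another stage or consumed by a terminal operation throws IllegalStateException.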
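
The linkedOrConsumed check in evaluate is visible from user code: running a second terminal operation on the same stream object fails. A minimal sketch using only the public API (StreamReuseDemo and secondTerminalOpFails are names invented for this example):

```java
import java.util.Arrays;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamReuseDemo {
    // Returns true if the second terminal operation is rejected.
    static boolean secondTerminalOpFails() {
        Stream<String> names = Arrays.asList("a", "b").stream();
        names.collect(Collectors.toList()); // first terminal op marks the stream consumed
        try {
            names.collect(Collectors.toList()); // second terminal op on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTerminalOpFails()); // prints "true"
    }
}
```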

ReduceOp#evaluateSequential(PipelineHelper helper,Spliterator<P_IN> spliterator)

@Override
public <P_IN> R evaluateSequential(PipelineHelper<T> helper,
                                   Spliterator<P_IN> spliterator) {
    return helper.wrapAndCopyInto(makeSink(), spliterator).get();
}

makeSink() is the implementation defined in ReduceOps.makeRef (called from collect above) and returns a ReducingSink. Next, look at wrapAndCopyInto.

AbstractPipeline#wrapAndCopyInto(S sink, Spliterator<P_IN> spliterator)

 final <P_IN, S extends Sink<E_OUT>> S wrapAndCopyInto(S sink, Spliterator<P_IN> spliterator) {
    copyInto(wrapSink(Objects.requireNonNull(sink)), spliterator);
    return sink;
}

wrapSink links the Sinks of the intermediate operations from back to front. copyInto then pushes the data through them.

AbstractPipeline#wrapSink(Sink<E_OUT> sink)

  final <P_IN> Sink<P_IN> wrapSink(Sink<E_OUT> sink) {
    Objects.requireNonNull(sink);

    for ( @SuppressWarnings("rawtypes") AbstractPipeline p=AbstractPipeline.this; p.depth > 0; p=p.previousStage) {
        sink = p.opWrapSink(p.previousStage.combinedFlags, sink);
    }
    return (Sink<P_IN>) sink;
}

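Starting from the current stage, wrapSink walks backwards through the previousStage field until depth reaches 0, calling opWrapSink on each stage to wrap the current Sink. In this example the stage before collect is map; the StatelessOp returned by map overrides opWrapSink to return a Sink.ChainedReference, which is an implementation of Sink. Before that comes filter, whose StatelessOp also overrides opWrapSink to return a Sink.ChainedReference. Note that wrapSink does not wrap the ReferencePipeline.Head. After wrapSink finishes, sink is the following chain: Sink.ChainedReference (a, created by filter) -> Sink.ChainedReference (b, created by map) -> ReducingSink (created for collect).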
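
The back-to-front wrapping can be sketched outside the JDK. java.util.stream.Sink is not a public type, so the sketch below uses a hypothetical MiniSink interface and a Chained base class that stands in for Sink.ChainedReference; all names in it are assumptions made for illustration, and the data is a list of Integer ages rather than User objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class SinkChainSketch {
    // Hypothetical stand-in for java.util.stream.Sink.
    interface MiniSink<T> {
        default void begin(long size) {}
        void accept(T t);
        default void end() {}
    }

    // Plays the role of Sink.ChainedReference: forwards begin/end downstream.
    abstract static class Chained<T, R> implements MiniSink<T> {
        final MiniSink<? super R> downstream;
        Chained(MiniSink<? super R> downstream) { this.downstream = downstream; }
        @Override public void begin(long size) { downstream.begin(size); }
        @Override public void end() { downstream.end(); }
    }

    // Like filter's opWrapSink: size becomes unknown, elements pass only if they match.
    static <T> MiniSink<T> filterSink(Predicate<? super T> p, MiniSink<? super T> next) {
        return new Chained<T, T>(next) {
            @Override public void begin(long size) { downstream.begin(-1); }
            @Override public void accept(T t) { if (p.test(t)) downstream.accept(t); }
        };
    }

    // Like map's opWrapSink: every element is transformed and forwarded.
    static <T, R> MiniSink<T> mapSink(Function<? super T, ? extends R> f,
                                      MiniSink<? super R> next) {
        return new Chained<T, R>(next) {
            @Override public void accept(T t) { downstream.accept(f.apply(t)); }
        };
    }

    static List<String> run(List<Integer> ages) {
        List<String> out = new ArrayList<>();
        Predicate<Integer> atLeast30 = age -> age >= 30;
        Function<Integer, String> label = age -> "age " + age;

        MiniSink<String> terminal = out::add;              // role of ReducingSink
        MiniSink<Integer> b = mapSink(label, terminal);    // wrapped first (last stage)
        MiniSink<Integer> a = filterSink(atLeast30, b);    // wrapped last, runs first

        // Same protocol as copyInto: begin, push every element, end.
        a.begin(ages.size());
        for (Integer age : ages) a.accept(age);
        a.end();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(30, 34, 20))); // prints "[age 30, age 34]"
    }
}
```

The wrapping order is the reverse of the execution order: the terminal sink is created first and the filter sink last, yet elements enter through the filter sink.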

Each Sink is linked to the next one through its downstream field. Next, look at AbstractPipeline#copyInto:

 final <P_IN> void copyInto(Sink<P_IN> wrappedSink, Spliterator<P_IN> spliterator) {
    Objects.requireNonNull(wrappedSink);

    if (!StreamOpFlag.SHORT_CIRCUIT.isKnown(getStreamAndOpFlags())) {
        wrappedSink.begin(spliterator.getExactSizeIfKnown());
        spliterator.forEachRemaining(wrappedSink);
        wrappedSink.end();
    }
    else {
        copyIntoWithCancel(wrappedSink, spliterator);
    }
}

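The if branch handles streams without short-circuiting operations; copyIntoWithCancel handles streams that have them. The stream in this example has none. wrappedSink.begin starts at the first Sink in the chain, the Sink.ChainedReference created by filter, whose begin(long size) is: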

public void begin(long size) {
    downstream.begin(-1);
}

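This calls begin on the Sink created by map. That Sink does not override begin, so the inherited Sink.ChainedReference#begin(long size) runs: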

public void begin(long size) {
    downstream.begin(size);
}
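
The two begin implementations differ because filter cannot know how many elements will survive (it passes -1 and declares NOT_SIZED), while map forwards the size unchanged. One way to observe the same distinction through the public API is to ask a pipeline's Spliterator for its exact size. This is a sketch under the assumption that the wrapping Spliterator reports sizes according to the pipeline's SIZED flag; SizedFlagDemo and its methods are names invented here.

```java
import java.util.Arrays;
import java.util.List;

class SizedFlagDemo {
    private static final List<Integer> SOURCE = Arrays.asList(1, 2, 3);

    // map keeps the SIZED flag, so the exact size should still be known.
    static long sizeAfterMap() {
        return SOURCE.stream().map(x -> x * 2).spliterator().getExactSizeIfKnown();
    }

    // filter clears the SIZED flag, so the size should be reported as unknown (-1).
    static long sizeAfterFilter() {
        return SOURCE.stream().filter(x -> true).spliterator().getExactSizeIfKnown();
    }

    public static void main(String[] args) {
        System.out.println("after map:    " + sizeAfterMap());
        System.out.println("after filter: " + sizeAfterFilter());
    }
}
```

Even a predicate that accepts everything loses the size, because the flag is decided by the operation, not by the data.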

The map Sink's begin then calls begin on the ReducingSink created for collect:

public void begin(long size) {
    state = supplier.get();
}

supplier comes from Collectors.toList(), the argument passed to collect. As the source below shows, supplier.get() invokes ArrayList::new to instantiate an ArrayList.

Collectors#toList()

  public static <T>
Collector<T, ?, List<T>> toList() {
    return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
                               (left, right) -> { left.addAll(right); return left; },
                               CH_ID);
}


CollectorImpl(Supplier<A> supplier,
              BiConsumer<A, T> accumulator,
              BinaryOperator<A> combiner,
              Set<Characteristics> characteristics) {
    this(supplier, accumulator, combiner, castingIdentity(), characteristics);
}

 private static <I, R> Function<I, R> castingIdentity() {
    return i -> (R) i;
}

So: supplier is ArrayList::new, accumulator is List::add, combiner is (left, right) -> { left.addAll(right); return left; }, finisher is i -> (R) i, and characteristics is Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH)).
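
To see how these Collector components cooperate, a Collector can be driven by hand through its public accessors. The sketch below mirrors only the sequential path and the IDENTITY_FINISH check at the end of collect; ManualCollectDemo and collectSequentially are hypothetical names, and the combiner is not exercised because nothing runs in parallel.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ManualCollectDemo {
    // Hand-rolled sequential evaluation of a Collector over an Iterable.
    static <T, A, R> R collectSequentially(Iterable<T> source,
                                           Collector<? super T, A, R> collector) {
        A container = collector.supplier().get();           // what ReducingSink.begin does
        BiConsumer<A, ? super T> accumulator = collector.accumulator();
        for (T t : source) {
            accumulator.accept(container, t);                // what ReducingSink.accept does
        }
        if (collector.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)) {
            @SuppressWarnings("unchecked")
            R result = (R) container;                        // container already is the result
            return result;
        }
        return collector.finisher().apply(container);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("zhangsan", "lisi");
        List<String> copy = collectSequentially(names, Collectors.toList());
        System.out.println(copy); // prints "[zhangsan, lisi]"
    }
}
```

The same helper also works for collectors that need a real finisher, for example Collectors.joining(","), because the finisher branch is taken whenever IDENTITY_FINISH is absent.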

Back in AbstractPipeline#copyInto, spliterator.forEachRemaining(wrappedSink) is called to process the data. spliterator is an ArrayListSpliterator, a nested class of ArrayList.

ArrayList#ArrayListSpliterator#forEachRemaining(Consumer<? super E> action)

public void forEachRemaining(Consumer<? super E> action) {
        int i, hi, mc; // hoist accesses and checks from loop
        ArrayList<E> lst; Object[] a;
        if (action == null)
            throw new NullPointerException();
        if ((lst = list) != null && (a = lst.elementData) != null) {
            if ((hi = fence) < 0) {
                mc = lst.modCount;
                hi = lst.size;
            }
            else
                mc = expectedModCount;
            if ((i = index) >= 0 && (index = hi) <= a.length) {
                for (; i < hi; ++i) {
                    @SuppressWarnings("unchecked") E e = (E) a[i];
                    action.accept(e);
                }
                if (lst.modCount == mc)
                    return;
            }
        }
        throw new ConcurrentModificationException();
    }
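
The traversal performed by forEachRemaining can be tried directly, since ArrayList#spliterator() is public. A minimal sketch, using ASCII stand-ins for the user names; SpliteratorTraversalDemo and its methods are names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class SpliteratorTraversalDemo {
    static List<String> source() {
        return new ArrayList<>(Arrays.asList("zhangsan", "lisi", "wangwu"));
    }

    // The value copyInto would pass to wrappedSink.begin(...).
    static long exactSize() {
        return source().spliterator().getExactSizeIfKnown();
    }

    // Pushes every element into a Consumer, as copyInto does with wrappedSink.
    static List<String> traverse() {
        Spliterator<String> spliterator = source().spliterator();
        List<String> seen = new ArrayList<>();
        spliterator.forEachRemaining(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(exactSize()); // prints "3"
        System.out.println(traverse());  // prints "[zhangsan, lisi, wangwu]"
    }
}
```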

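This iterates over all elements and calls action.accept on each one; action here is the wrappedSink from above. Each call first lands in the accept(P_OUT u) method of Sink.ChainedReference (a), the Sink created by filter: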

public void accept(P_OUT u) {
    if (predicate.test(u))
        downstream.accept(u);
}

predicate is user -> user.getAge() != null && user.getAge() >= 30. If predicate.test returns true, the accept method of Sink.ChainedReference (b) is called; otherwise the element is dropped. This is where the data is filtered.

The accept(P_OUT u) method of Sink.ChainedReference (b), the Sink created by map:

public void accept(P_OUT u) {
    downstream.accept(mapper.apply(u));
}

mapper is User::getName. getName is called on each User that passed the filter, and the result is handed to the accept method of ReducingSink:

public void accept(T t) {
    accumulator.accept(state, t);
}

As shown above, accumulator is List::add and state is the ArrayList created in begin, so the name is added to that ArrayList.

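After spliterator.forEachRemaining(wrappedSink) has traversed all the data, wrappedSink.end() is called. Each Sink.ChainedReference forwards end() downstream, and ReducingSink does nothing in end(). wrapAndCopyInto then returns the ReducingSink, evaluateSequential calls get() on it to obtain the ArrayList, and because the collector has IDENTITY_FINISH, collect returns that ArrayList directly. This completes the walkthrough.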
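
One consequence of this design is that each element travels through the whole Sink chain before the next element is touched; the stream does not filter everything first and map afterwards. A small trace makes the order visible. It is a sketch that uses Integer ages instead of User objects, and TraversalOrderDemo and trace are names invented for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TraversalOrderDemo {
    // Records the order in which the filter and map lambdas are invoked.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List<Integer> ages = Arrays.asList(30, 34, 20);
        ages.stream()
                .filter(age -> { log.add("filter " + age); return age >= 30; })
                .map(age -> { log.add("map " + age); return "age " + age; })
                .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        // prints "[filter 30, map 30, filter 34, map 34, filter 20]"
        System.out.println(trace());
    }
}
```

The element 20 reaches the filter Sink but never the map Sink, because the filter Sink does not call downstream.accept for it.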

posted by shigp1