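Reading the Flink Source: State Management
I. Where to Begin
For state to be useful, it has to be persisted to reliable storage. In Flink that persistence step is checkpointing, so we start with the checkpoint logic in StreamTask, the base class of the tasks executed on a TaskManager (TM).
1. StreamTask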
StreamTask

    protected OperatorChain<OUT, OP> operatorChain;

    CheckpointStreamFactory createCheckpointStreamFactory(StreamOperator<?> operator)

    <K> AbstractKeyedStateBackend<K> createKeyedStateBackend(
            TypeSerializer<K> keySerializer,
            int numberOfKeyGroups,
            KeyGroupRange keyGroupRange)

    OperatorStateBackend createOperatorStateBackend(
            StreamOperator<?> op, Collection<OperatorStateHandle> restoreStateHandles)

    CheckpointStreamFactory createSavepointStreamFactory(StreamOperator<?> operator, String targetLocation)

    StateBackend createStateBackend()

    boolean triggerCheckpoint(CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions)

    void triggerCheckpointOnBarrier(
            CheckpointMetaData checkpointMetaData,
            CheckpointOptions checkpointOptions,
            CheckpointMetrics checkpointMetrics)

    boolean performCheckpoint(
            CheckpointMetaData checkpointMetaData,
            CheckpointOptions checkpointOptions,
            CheckpointMetrics checkpointMetrics)

    void checkpointState(
            CheckpointMetaData checkpointMetaData,
            CheckpointOptions checkpointOptions,
            CheckpointMetrics checkpointMetrics)
The call chain is triggerCheckpoint -> performCheckpoint -> checkpointState, which finally arrives at CheckpointingOperation.
2. CheckpointingOperation
CheckpointingOperation

    void executeCheckpointing() {
        ...
        for (StreamOperator<?> op : allOperators) {
            checkpointStreamOperator(op);
        }
        ...
    }

    void checkpointStreamOperator(StreamOperator<?> op)
        ...
        op.snapshotState(
                checkpointMetaData.getCheckpointId(),
                checkpointMetaData.getTimestamp(),
                checkpointOptions)
        ...
This class simply calls snapshotState on every operator passed in from the StreamTask.
So next we look at the base type of the operators.
3. StreamOperator
StreamOperator

    OperatorSnapshotResult snapshotState(
            long checkpointId,
            long timestamp,
            CheckpointOptions checkpointOptions)

    void initializeState(OperatorSubtaskState stateHandles)

    void notifyOfCompletedCheckpoint(long checkpointId)
StreamOperator is an interface that declares these three methods, so every operator implementing it must provide them.
4. AbstractStreamOperator
AbstractStreamOperator

    final OperatorSnapshotResult snapshotState(long checkpointId, long timestamp, CheckpointOptions checkpointOptions)
        ...
        snapshotState(snapshotContext);
        ...
        if (null != operatorStateBackend) {
            snapshotInProgress.setOperatorStateManagedFuture(
                    operatorStateBackend.snapshot(checkpointId, timestamp, factory, checkpointOptions));
        }

        if (null != keyedStateBackend) {
            snapshotInProgress.setKeyedStateManagedFuture(
                    keyedStateBackend.snapshot(checkpointId, timestamp, factory, checkpointOptions));
        }
        ...

    void notifyOfCompletedCheckpoint(long checkpointId)
        if (keyedStateBackend != null) {
            keyedStateBackend.notifyCheckpointComplete(checkpointId);
        }

    void snapshotState(StateSnapshotContext context)

    void initializeState(StateInitializationContext context)
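AbstractStreamOperator is the base implementation of StreamOperator. Its snapshotState method calls the snapshot method of both the OperatorStateBackend and the KeyedStateBackend.
Note in particular the snapshotState(snapshotContext) call that precedes those two: it snapshots raw state, and it is also where the state of user-defined functions gets updated.
As for the last two methods, snapshotState and initializeState, both take a context parameter; they are the hooks that subclasses override to checkpoint and restore their own state.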
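The flow described above is a template method: a final method fixes the order of the steps and calls an overridable hook first. The following is a minimal sketch of that shape only. SketchOperator, SketchUdfOperator, snapshotHook and the string "steps" are hypothetical names invented for illustration; none of this is Flink code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an abstract operator base class; not Flink code. */
abstract class SketchOperator {
    // Records the order of the snapshot steps so the flow can be inspected.
    final List<String> steps = new ArrayList<>();
    boolean hasOperatorBackend = true;
    boolean hasKeyedBackend = false;

    /** Fixed flow: the overridable hook runs first, then the managed backends. */
    final List<String> snapshot(long checkpointId) {
        steps.clear();
        snapshotHook(checkpointId); // raw state and user-function state go here
        if (hasOperatorBackend) {
            steps.add("operator-backend:" + checkpointId);
        }
        if (hasKeyedBackend) {
            steps.add("keyed-backend:" + checkpointId);
        }
        return new ArrayList<>(steps);
    }

    /** Hook that subclasses may override; the default does nothing. */
    void snapshotHook(long checkpointId) {}
}

/** Subclass that adds user-function state through the hook. */
class SketchUdfOperator extends SketchOperator {
    @Override
    void snapshotHook(long checkpointId) {
        steps.add("user-function:" + checkpointId);
    }
}
```

Because the outer method is final, a subclass can add to the snapshot but cannot reorder or skip the backend calls.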
This class has one very important subclass, AbstractUdfStreamOperator, from which many operators inherit.
5. AbstractUdfStreamOperator
AbstractUdfStreamOperator

    void initializeState(StateInitializationContext context) throws Exception {
        super.initializeState(context);
        StreamingFunctionUtils.restoreFunctionState(context, userFunction);
    }

    void snapshotState(StateSnapshotContext context) throws Exception {
        super.snapshotState(context);
        StreamingFunctionUtils.snapshotFunctionState(context, getOperatorStateBackend(), userFunction);
    }
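It is easy to see what this class adds when overriding the parent's methods: the restore and the snapshot of the userFunction.
The subclasses of this class are the operators that actually get instantiated.
6. StreamingFunctionUtils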
StreamingFunctionUtils

    void snapshotFunctionState(
            StateSnapshotContext context,
            OperatorStateBackend backend,
            Function userFunction) throws Exception {
        ...
        while (true) {

            if (trySnapshotFunctionState(context, backend, userFunction)) {
                break;
            }

            // inspect if the user function is wrapped, then unwrap and try again if we can snapshot the inner function
            if (userFunction instanceof WrappingFunction) {
                userFunction = ((WrappingFunction<?>) userFunction).getWrappedFunction();
            } else {
                break;
            }
        }
    }

    boolean trySnapshotFunctionState(
            StateSnapshotContext context,
            OperatorStateBackend backend,
            Function userFunction) throws Exception {

        if (userFunction instanceof CheckpointedFunction) {
            ...
            return true;
        }

        if (userFunction instanceof ListCheckpointed) {
            ...
            return true;
        }

        return false;
    }
As shown above, this utility takes user functions that implement CheckpointedFunction or ListCheckpointed and drives their restore and snapshot.
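The unwrap-and-retry loop is worth isolating, since it is what lets a checkpointed function be found even when it sits inside one or more wrappers. Below is a minimal sketch under simplified assumptions: Fn, Snapshottable, Wrapper and UnwrapSketch are hypothetical types made up for this example, and the snapshot is just a string.

```java
/** Hypothetical marker types for the sketch; not the Flink interfaces. */
interface Fn {}

interface Snapshottable extends Fn {
    String snapshot();
}

interface Wrapper extends Fn {
    Fn wrapped();
}

final class UnwrapSketch {
    /** Returns the snapshot of the first snapshottable function in the wrapper chain, or null. */
    static String snapshotFirst(Fn fn) {
        while (true) {
            if (fn instanceof Snapshottable) {
                return ((Snapshottable) fn).snapshot();
            }
            if (fn instanceof Wrapper) {
                fn = ((Wrapper) fn).wrapped(); // peel off one layer and try again
            } else {
                return null; // end of the chain, nothing to snapshot
            }
        }
    }
}
```

The loop stops at the first function that can be snapshotted, so an outer wrapper that is itself snapshottable takes precedence over whatever it wraps.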
II. Factories
The sections above covered how state is saved at the task and operator level. Where it is saved is decided by the three factory classes below.
7. State backend
| | MemoryStateBackend | FsStateBackend | RocksDBStateBackend |
| CheckpointStream | MemCheckpointStreamFactory | FsCheckpointStreamFactory | FsCheckpointStreamFactory |
| SavepointStream | MemCheckpointStreamFactory | FsSavepointStreamFactory | FsSavepointStreamFactory |
| KeyedStateBackend | HeapKeyedStateBackend | HeapKeyedStateBackend | RocksDBKeyedStateBackend |
| OperatorStateBackend | DefaultOperatorStateBackend | DefaultOperatorStateBackend | DefaultOperatorStateBackend |
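The RocksDBStateBackend constructor accepts an AbstractStateBackend; if none is passed, it defaults to FsStateBackend.
For operator state, Flink currently has a single implementation, DefaultOperatorStateBackend, which keeps list-style state in memory.
For keyed state there are currently two implementations: HeapKeyedStateBackend keeps state in memory, while RocksDBKeyedStateBackend keeps state in a RocksDB instance local to the TM. The former is faster because it works directly on heap objects, but it limits the state size and can put pressure on the JVM's own memory. The latter stores state in local files, so every access involves serialization and deserialization and is slower, but the state it can hold is much larger.
For checkpoints and savepoints, the Memory factory keeps everything in memory and is clearly unsuitable for production, whereas the Fs and RocksDB factories both write to a file system such as HDFS.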
III. Houses
There are currently three concrete classes that store state. We look at DefaultOperatorStateBackend as the operator state example and at HeapKeyedStateBackend as the keyed state example.
8. DefaultOperatorStateBackend
DefaultOperatorStateBackend

    Map<String, PartitionableListState<?>> registeredStates;

    RunnableFuture<OperatorStateHandle> snapshot(
            final long checkpointId,
            final long timestamp,
            final CheckpointStreamFactory streamFactory,
            final CheckpointOptions checkpointOptions)
        ...
        if (registeredStates.isEmpty()) {
            return DoneFuture.nullValue();
        }
        ...
        for (Map.Entry<String, PartitionableListState<?>> entry : this.registeredStates.entrySet())
        ...

    ListState<S> getListState(ListStateDescriptor<S> stateDescriptor)
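The excerpt shows one field and two methods. registeredStates shows that state is still stored in a map. The snapshot method is what AbstractStreamOperator.snapshotState ends up calling, and getListState is the interface through which user code obtains a ListState instance.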
PartitionableListState<S>

    /**
     * The internal list that holds the elements of the state
     */
    private final ArrayList<S> internalList;
So operator state is kept entirely in memory and is, at bottom, an ArrayList.
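To make the "map of names to ArrayLists" picture concrete, here is a minimal sketch of such a backend. OperatorBackendSketch and its methods are hypothetical; in particular, the snapshot here is a plain in-memory copy, whereas the real backend writes to a checkpoint stream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical miniature of an operator state backend; names are illustrative only. */
final class OperatorBackendSketch {
    // state name -> in-memory list, in the spirit of the registeredStates map
    private final Map<String, List<Object>> registered = new HashMap<>();

    /** Returns the list registered under the name, creating and registering it on first access. */
    @SuppressWarnings("unchecked")
    <T> List<T> getListState(String name) {
        return (List<T>) (List<?>) registered.computeIfAbsent(name, n -> new ArrayList<>());
    }

    /** Copies every registered list; returns null when nothing has been registered. */
    Map<String, List<Object>> snapshot() {
        if (registered.isEmpty()) {
            return null;
        }
        Map<String, List<Object>> copy = new HashMap<>();
        registered.forEach((name, list) -> copy.put(name, new ArrayList<>(list)));
        return copy;
    }
}
```

Copying the lists at snapshot time means that elements added afterwards do not leak into the already-taken snapshot.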
9. HeapKeyedStateBackend
    /** @param <K> The key by which state is keyed. */
    HeapKeyedStateBackend<K>

    /**
     * Map of state tables that stores all state of key/value states. We store it centrally so
     * that we can easily checkpoint/restore it.
     *
     * <p>The actual parameters of StateTable are {@code StateTable<NamespaceT, Map<KeyT, StateT>>}
     * but we can't put them here because different key/value states with different types and
     * namespace types share this central list of tables.
     */
    private final HashMap<String, StateTable<K, ?, ?>> stateTables = new HashMap<>();

    <N, V> InternalValueState<N, V> createValueState(
            TypeSerializer<N> namespaceSerializer,
            ValueStateDescriptor<V> stateDesc) {
        StateTable<K, N, V> stateTable = tryRegisterStateTable(namespaceSerializer, stateDesc);
        return new HeapValueState<>(stateDesc, stateTable, keySerializer, namespaceSerializer);
    }

    <N, T> InternalListState<N, T> createListState(
            TypeSerializer<N> namespaceSerializer,
            ListStateDescriptor<T> stateDesc)
        // returns new HeapListState<>

    <N, T> InternalReducingState<N, T> createReducingState(
            TypeSerializer<N> namespaceSerializer,
            ReducingStateDescriptor<T> stateDesc)
        // returns new HeapReducingState<>

    <N, T, ACC, R> InternalAggregatingState<N, T, R> createAggregatingState(
            TypeSerializer<N> namespaceSerializer,
            AggregatingStateDescriptor<T, ACC, R> stateDesc)
        // returns new HeapAggregatingState<>

    <N, T, ACC> InternalFoldingState<N, T, ACC> createFoldingState(
            TypeSerializer<N> namespaceSerializer,
            FoldingStateDescriptor<T, ACC> stateDesc)
        // returns new HeapFoldingState<>

    <N, UK, UV> InternalMapState<N, UK, UV> createMapState(
            TypeSerializer<N> namespaceSerializer,
            MapStateDescriptor<UK, UV> stateDesc)
        // returns new HeapMapState<>

    RunnableFuture<KeyedStateHandle> snapshot(
            final long checkpointId,
            final long timestamp,
            final CheckpointStreamFactory streamFactory,
            CheckpointOptions checkpointOptions)
        ...
        if (!hasRegisteredState()) {
            return DoneFuture.nullValue();
        }
        ...
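The picture is similar here: the create methods are what user code ultimately reaches, each returning the matching state type, and snapshot is the concrete implementation called from AbstractStreamOperator.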
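The central idea, one table per state name whose entries are addressed by namespace and by the key currently being processed, can be sketched as follows. KeyedBackendSketch and ValueHandle are hypothetical, and nested HashMaps stand in for Flink's StateTable, whose real layout is more involved.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical miniature of a heap keyed backend; not Flink code. */
final class KeyedBackendSketch<K> {
    // state name -> (namespace -> (key -> value)), echoing the stateTables map
    private final Map<String, Map<Object, Map<K, Object>>> tables = new HashMap<>();
    private K currentKey;

    /** The runtime would set the key of the record being processed before user code runs. */
    void setCurrentKey(K key) {
        this.currentKey = key;
    }

    /** Handle bound to one state name and namespace; reads and writes follow the current key. */
    final class ValueHandle<V> {
        private final Map<K, Object> byKey;

        ValueHandle(String name, Object namespace) {
            byKey = tables.computeIfAbsent(name, n -> new HashMap<>())
                          .computeIfAbsent(namespace, ns -> new HashMap<>());
        }

        @SuppressWarnings("unchecked")
        V value() {
            return (V) byKey.get(currentKey);
        }

        void update(V value) {
            byKey.put(currentKey, value);
        }
    }

    <V> ValueHandle<V> createValueState(String name, Object namespace) {
        return new ValueHandle<>(name, namespace);
    }
}
```

User code holds a single handle, yet each key sees its own value, because the handle consults the backend's current key on every access.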
IV. Channels
By channels we mean how the state that user code works with gets connected to the DefaultOperatorStateBackend and HeapKeyedStateBackend described above. The first class user code meets is StreamingRuntimeContext.
10. StreamingRuntimeContext
StreamingRuntimeContext

    public <T> ValueState<T> getState(ValueStateDescriptor<T> stateProperties) {
        KeyedStateStore keyedStateStore = checkPreconditionsAndGetKeyedStateStore(stateProperties);
        stateProperties.initializeSerializerUnlessSet(getExecutionConfig());
        return keyedStateStore.getState(stateProperties);
    }
Only getState is shown; the methods for the other state types are similar. The logic is simple: check that a KeyedStateStore is available, then use it to create the state.
11. PerWindowStateStore
PerWindowStateStore

    @Override
    public <T> ListState<T> getListState(ListStateDescriptor<T> stateProperties) {
        try {
            return WindowOperator.this.getPartitionedState(window, windowSerializer, stateProperties);
        } catch (Exception e) {
            throw new RuntimeException("Could not retrieve state", e);
        }
    }
PerWindowStateStore is an implementation of KeyedStateStore that defines how the state is actually fetched. Its getPartitionedState call ultimately lands in AbstractStreamOperator.
12. AbstractStreamOperator
AbstractStreamOperator

    protected <S extends State, N> S getPartitionedState(
            N namespace,
            TypeSerializer<N> namespaceSerializer,
            StateDescriptor<S, ?> stateDescriptor) throws Exception {

        /*
            TODO: NOTE: This method does a lot of work caching / retrieving states just to update the namespace.
            This method should be removed for the sake of namespaces being lazily fetched from the keyed
            state backend, or being set on the state directly.
        */

        if (keyedStateStore != null) {
            return keyedStateBackend.getPartitionedState(namespace, namespaceSerializer, stateDescriptor);
        } else {
            throw new RuntimeException("Cannot create partitioned state. The keyed state " +
                "backend has not been set. This indicates that the operator is not " +
                "partitioned/keyed.");
        }
    }
This method merely passes the call along, back to the keyedStateBackend.
13. AbstractKeyedStateBackend
AbstractKeyedStateBackend

    <N, S extends State> S getPartitionedState(
            final N namespace,
            final TypeSerializer<N> namespaceSerializer,
            final StateDescriptor<S, ?> stateDescriptor)

    <N, S extends State, V> S getOrCreateKeyedState(
            final TypeSerializer<N> namespaceSerializer,
            StateDescriptor<S, V> stateDescriptor)

        // create a new blank key/value state
        S state = stateDescriptor.bind(new StateBinder() {
            @Override
            public <T> ValueState<T> createValueState(ValueStateDescriptor<T> stateDesc) throws Exception {
                return AbstractKeyedStateBackend.this.createValueState(namespaceSerializer, stateDesc);
            }

            @Override
            public <T> ListState<T> createListState(ListStateDescriptor<T> stateDesc) throws Exception {
                return AbstractKeyedStateBackend.this.createListState(namespaceSerializer, stateDesc);
            }

            @Override
            public <T> ReducingState<T> createReducingState(ReducingStateDescriptor<T> stateDesc) throws Exception {
                return AbstractKeyedStateBackend.this.createReducingState(namespaceSerializer, stateDesc);
            }

            @Override
            public <T, ACC, R> AggregatingState<T, R> createAggregatingState(
                    AggregatingStateDescriptor<T, ACC, R> stateDesc) throws Exception {
                return AbstractKeyedStateBackend.this.createAggregatingState(namespaceSerializer, stateDesc);
            }

            @Override
            public <T, ACC> FoldingState<T, ACC> createFoldingState(FoldingStateDescriptor<T, ACC> stateDesc) throws Exception {
                return AbstractKeyedStateBackend.this.createFoldingState(namespaceSerializer, stateDesc);
            }

            @Override
            public <UK, UV> MapState<UK, UV> createMapState(MapStateDescriptor<UK, UV> stateDesc) throws Exception {
                return AbstractKeyedStateBackend.this.createMapState(namespaceSerializer, stateDesc);
            }
        });
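This is where state is actually created. stateDescriptor.bind hands the descriptor a StateBinder whose methods call back into the enclosing backend, which is a rather subtle piece of design. At runtime, AbstractKeyedStateBackend.this is a HeapKeyedStateBackend or a RocksDBKeyedStateBackend, and those classes do the final creation.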
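The bind call is a form of double dispatch: the descriptor knows which kind of state it describes, the binder knows how to build each kind, and neither needs a type switch. The sketch below reduces this to two state kinds. Binder, Descriptor, ListDescriptor, HeapBinder and BindSketch are hypothetical names, not the Flink types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reduction of the descriptor/binder pattern; not the Flink interfaces. */
interface Binder {
    Object createValueState(String name);

    List<Object> createListState(String name);
}

abstract class Descriptor<S> {
    final String name;

    Descriptor(String name) {
        this.name = name;
    }

    /** Each descriptor picks the binder method that matches its own state kind. */
    abstract S bind(Binder binder);
}

class ListDescriptor extends Descriptor<List<Object>> {
    ListDescriptor(String name) {
        super(name);
    }

    @Override
    List<Object> bind(Binder binder) {
        return binder.createListState(name);
    }
}

/** A "backend" that decides the concrete in-memory representation. */
class HeapBinder implements Binder {
    final List<String> created = new ArrayList<>();

    @Override
    public Object createValueState(String name) {
        created.add("value:" + name);
        return new Object[1];
    }

    @Override
    public List<Object> createListState(String name) {
        created.add("list:" + name);
        return new ArrayList<>();
    }
}

final class BindSketch {
    /** Generic entry point: no switch on the descriptor type is needed. */
    static <S> S getOrCreate(Descriptor<S> descriptor, Binder backend) {
        return descriptor.bind(backend);
    }
}
```

Swapping HeapBinder for another Binder implementation changes the storage without touching descriptors or callers, which is the role the concrete keyed backends play in the excerpt above.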
14. StateInitializationContextImpl
StateInitializationContextImpl

    public OperatorStateStore getOperatorStateStore() {
        return operatorStateStore;
    }
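This is the operator state side. It eventually reaches DefaultOperatorStateBackend.getListState, which creates and registers the state.
V. State
Having covered what state is used for, where it is stored, and how it is wired up, we finally get to state itself. First, how to implement operator state.
15. CheckpointedFunction and ListCheckpointed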
    interface CheckpointedFunction {

        void snapshotState(FunctionSnapshotContext context) throws Exception;

        void initializeState(FunctionInitializationContext context) throws Exception;
    }

    public interface ListCheckpointed<T extends Serializable> {

        List<T> snapshotState(long checkpointId, long timestamp) throws Exception;

        void restoreState(List<T> state) throws Exception;
    }
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
    import org.apache.flink.streaming.api.functions.sink.SinkFunction;

    public class BufferingSink
            implements SinkFunction<Tuple2<String, Integer>>,
                       CheckpointedFunction {

        private final int threshold;

        // note: this is the definition of the operator state handle
        private transient ListState<Tuple2<String, Integer>> checkpointedState;

        private List<Tuple2<String, Integer>> bufferedElements;

        public BufferingSink(int threshold) {
            this.threshold = threshold;
            this.bufferedElements = new ArrayList<>();
        }

        @Override
        public void invoke(Tuple2<String, Integer> value) throws Exception {
            bufferedElements.add(value);
            if (bufferedElements.size() == threshold) {
                for (Tuple2<String, Integer> element : bufferedElements) {
                    // send it to the sink
                }
                bufferedElements.clear();
            }
        }

        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            // replace the checkpointed contents with the current buffer
            checkpointedState.clear();
            for (Tuple2<String, Integer> element : bufferedElements) {
                checkpointedState.add(element);
            }
        }

        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {
            // create a descriptor
            ListStateDescriptor<Tuple2<String, Integer>> descriptor =
                new ListStateDescriptor<>(
                    "buffered-elements",
                    TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {}));

            // get the state from the OperatorStateStore
            checkpointedState = context.getOperatorStateStore().getListState(descriptor);

            // for keyed state Flink does the restore itself;
            // for operator state the user has to copy the restored elements back
            if (context.isRestored()) {
                for (Tuple2<String, Integer> element : checkpointedState.get()) {
                    bufferedElements.add(element);
                }
            }
        }
    }
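The pattern is to create a ListStateDescriptor and then obtain the OperatorStateStore from the context (backed by the DefaultOperatorStateBackend seen earlier), which creates the state.
The key point is the isRestored check in initializeState: it is up to the user to decide how the state is restored.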
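The snapshot and restore cycle can be reduced to plain lists to show what the isRestored check guards. This is only a sketch: BufferSketch is a hypothetical class, the "checkpoint" is an ordinary List, and the isRestored flag is passed in by hand rather than read from a context.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffer with an explicit snapshot/restore cycle; not Flink code. */
final class BufferSketch {
    private final List<String> buffered = new ArrayList<>();

    void add(String element) {
        buffered.add(element);
    }

    int size() {
        return buffered.size();
    }

    /** Counterpart of snapshotState: the checkpoint holds a copy of the live buffer. */
    List<String> snapshot() {
        return new ArrayList<>(buffered);
    }

    /** Counterpart of initializeState: copy elements back only when starting from a checkpoint. */
    static BufferSketch initialize(List<String> checkpointed, boolean isRestored) {
        BufferSketch fresh = new BufferSketch();
        if (isRestored) {
            fresh.buffered.addAll(checkpointed);
        }
        return fresh;
    }
}
```

A fresh start and a recovery go through the same initialization path; the flag is the only thing that decides whether the old elements are copied back into the working buffer.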
16. RichFunction
Keyed state of any kind can only be obtained inside a subclass of RichFunction.
    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    public class CountWindowAverage extends RichFlatMapFunction<Tuple2<Long, Long>, Tuple2<Long, Long>> {

        /**
         * The ValueState handle. The first field is the count, the second field a running sum.
         */
        private transient ValueState<Tuple2<Long, Long>> sum; // the keyed state definition

        @Override
        public void flatMap(Tuple2<Long, Long> input, Collector<Tuple2<Long, Long>> out) throws Exception {

            // access the state value
            Tuple2<Long, Long> currentSum = sum.value();

            // update the count
            currentSum.f0 += 1;

            // add the second field of the input value
            currentSum.f1 += input.f1;

            // make sure to update the state
            sum.update(currentSum);

            // if the count reaches 2, emit the average and clear the state
            if (currentSum.f0 >= 2) {
                out.collect(new Tuple2<>(input.f0, currentSum.f1 / currentSum.f0));
                sum.clear();
            }
        }

        @Override
        public void open(Configuration config) {
            // create a descriptor matching the keyed state
            ValueStateDescriptor<Tuple2<Long, Long>> descriptor =
                new ValueStateDescriptor<>(
                    "average", // the state name
                    TypeInformation.of(new TypeHint<Tuple2<Long, Long>>() {}), // type information
                    Tuple2.of(0L, 0L)); // default value of the state, if nothing was set

            // use the runtime context to get the keyed state
            sum = getRuntimeContext().getState(descriptor);
        }
    }

    // this can be used in a streaming program like this (assuming we have a StreamExecutionEnvironment env)
    env.fromElements(Tuple2.of(1L, 3L), Tuple2.of(1L, 5L), Tuple2.of(1L, 7L), Tuple2.of(1L, 4L), Tuple2.of(1L, 2L))
        .keyBy(0)
        .flatMap(new CountWindowAverage())
        .print();

    // the printed output will be (1,4) and (1,5)
The open method follows the same pattern: define a descriptor, then fetch the corresponding state directly from the runtime context.
17. State type
| | Managed State | Raw State |
| Keyed State | RichFunction | 1, 2 |
| Operator State | CheckpointedFunction, ListCheckpointed | |
1. AbstractStreamOperator.initializeState(StateInitializationContext context)
2. AbstractStreamOperator.snapshotState(StateSnapshotContext context)
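Keyed State:
ValueState<T>: holds a single value that can be updated and read (one value per key); update(T) sets it and T value() reads it.
ListState<T>: holds a list of values; add(T) or addAll(List<T>) appends, and Iterable<T> get() reads.
ReducingState<T>: holds a single value that is the aggregate of all values added to the state. The interface is similar to ListState, but added elements are combined with the supplied ReduceFunction.
AggregatingState<IN, OUT>: holds a single value that is the aggregate of all values added. Unlike ReducingState, the aggregate type may differ from the element type; the aggregation is supplied as an AggregateFunction.
FoldingState<T, ACC>: similar to AggregatingState, except that the aggregation is done with a FoldFunction.
MapState<UK, UV>: holds a set of mappings; put(UK, UV) or putAll(Map<UK, UV>) adds key-value pairs, and get(UK) reads one.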
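The difference between ReducingState and AggregatingState is easiest to see side by side. In the sketch below, ReducingSketch combines values of one type into a value of the same type, while AverageSketch takes Long inputs, accumulates into a long[] and returns a Double. Both classes are hypothetical single-key stand-ins built on java.util.function, not the Flink state classes.

```java
import java.util.function.BinaryOperator;

/** Hypothetical single-key reducing state: input, accumulator and output share one type. */
final class ReducingSketch<T> {
    private final BinaryOperator<T> reduce;
    private T current; // null until the first add

    ReducingSketch(BinaryOperator<T> reduce) {
        this.reduce = reduce;
    }

    void add(T value) {
        current = (current == null) ? value : reduce.apply(current, value);
    }

    T get() {
        return current;
    }
}

/** Hypothetical aggregating state: Long in, long[]{sum, count} inside, Double out. */
final class AverageSketch {
    private final long[] acc = new long[2];

    void add(Long value) {
        acc[0] += value;
        acc[1]++;
    }

    Double get() {
        if (acc[1] == 0) {
            return null; // nothing added yet
        }
        return (double) acc[0] / acc[1];
    }
}
```

An average cannot be expressed as a ReducingSketch over Long without losing the count, which is exactly the case the separate accumulator type covers.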
18. FlinkKafkaConsumerBase
FlinkKafkaConsumerBase

    final void initializeState(FunctionInitializationContext context) throws Exception {
        OperatorStateStore stateStore = context.getOperatorStateStore();
        ListState<Tuple2<KafkaTopicPartition, Long>> oldRoundRobinListState =
            stateStore.getSerializableListState(DefaultOperatorStateBackend.DEFAULT_OPERATOR_STATE_NAME);
        this.unionOffsetStates = stateStore.getUnionListState(new ListStateDescriptor<>(
            OFFSETS_STATE_NAME,
            TypeInformation.of(new TypeHint<Tuple2<KafkaTopicPartition, Long>>() {})));
        ...

    final void snapshotState(FunctionSnapshotContext context) {
        ...
        unionOffsetStates.add(Tuple2.of(subscribedPartition.getKey(), subscribedPartition.getValue()));
        ...
This serves as an example of a source that uses operator state.
19. ElasticsearchSinkBase
abstract class ElasticsearchSinkBase

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        // no initialization needed
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        checkErrorAndRethrow();

        if (flushOnCheckpoint) {
            do {
                bulkProcessor.flush();
                checkErrorAndRethrow();
            } while (numPendingRequests.get() != 0);
        }
    }
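None of the subclasses of this class overrides these two methods.
This serves as an example of a sink that hooks into operator state checkpointing: it stores nothing and only flushes pending requests when a snapshot is taken.
VI. Restore
20. Restore
20.1 Introduction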
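Redistributing a stateless job only requires redistributing the data. With state, the state must first be saved, then split, and then redistributed according to some strategy.
20.2 OperatorState
Flink currently ships only the following redistribution scheme:
RoundRobinOperatorStateRepartitioner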
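As an illustration of what round-robin redistribution of list-style operator state means, the sketch below pools the list entries of all old subtasks and deals them out to the new subtasks in turn. RepartitionSketch is a hypothetical helper working on plain lists; the real repartitioner operates on state handles and offsets, and its exact assignment order may differ from this simplification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical round-robin redistribution over plain lists; not the Flink class. */
final class RepartitionSketch {
    /** Pools the entries of all old subtasks and deals them out to the new subtasks in turn. */
    static <T> List<List<T>> repartition(List<List<T>> oldSubtaskStates, int newParallelism) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>());
        }
        int next = 0;
        for (List<T> subtaskState : oldSubtaskStates) {
            for (T element : subtaskState) {
                result.get(next).add(element);
                next = (next + 1) % newParallelism;
            }
        }
        return result;
    }
}
```

This only works because the entries of a list state are independent of each other: any entry may end up on any subtask after rescaling, so user code must not rely on which entries it gets back.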
20.3 KeyedState
20.3.1 Key distribution
hash(key) mod parallelism
Keyed state simply has to follow the distribution of its keys. To make this efficient, Flink introduces the concept of the KeyGroup.
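A small experiment shows why plain hash(key) mod parallelism is awkward for rescaling: when the parallelism changes, most keys land on a different subtask, so their state has to be located individually. ModAssignSketch is a hypothetical helper that uses Object.hashCode as the hash, which is an assumption made for this example only.

```java
/** Hypothetical hash-mod assignment; uses hashCode() as the hash for illustration. */
final class ModAssignSketch {
    /** Subtask index for a key under plain hash-mod assignment. */
    static int subtask(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    /** Counts the integer keys in [0, n) whose subtask changes when the parallelism changes. */
    static int moved(int n, int oldParallelism, int newParallelism) {
        int moved = 0;
        for (int key = 0; key < n; key++) {
            if (subtask(key, oldParallelism) != subtask(key, newParallelism)) {
                moved++;
            }
        }
        return moved;
    }
}
```

For the twelve keys 0 to 11, going from parallelism 3 to 4 moves nine of them; only keys 0, 1 and 2 stay where they were.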
20.3.2 KeyGroup
20.3.2.1 Introduction to KeyGroup
Without KeyGroups, the keys of a subtask are written sequentially, which makes rescaling hard when the parallelism changes. A KeyGroup covers a range of keys and is the unit that is assigned to a subtask. During checkpointing, the keys of one KeyGroup are written together; during rescaling, the keyed state of all keys in the same KeyGroup is read sequentially. The number of KeyGroups is the upper limit for parallelism, and it must be fixed before the job starts; it cannot be changed afterwards.
20.3.2.2 Determining the number of KeyGroups
The number is set with setMaxParallelism; the value must satisfy 0 < maxParallelism <= 32768.
The number of KeyGroups equals the value of maxParallelism.
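The two-step mapping, key to KeyGroup and KeyGroup to subtask, can be sketched as below. KeyGroupSketch is a hypothetical helper; it uses the raw hashCode for the first step, whereas Flink additionally scrambles the hash before taking the modulo, so the concrete group numbers here are illustrative only.

```java
/** Hypothetical two-step key assignment; simplified, not the Flink implementation. */
final class KeyGroupSketch {
    /** Key group of a key: fixed for the lifetime of the job because maxParallelism is fixed. */
    static int keyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Each subtask owns a contiguous range of key groups. */
    static int subtaskForKeyGroup(int keyGroup, int parallelism, int maxParallelism) {
        return keyGroup * parallelism / maxParallelism;
    }
}
```

With maxParallelism 128 and parallelism 4, key groups 0 to 31 belong to subtask 0 and 32 to 63 to subtask 1; raising the parallelism to 8 only re-cuts the ranges, so whole groups move and can be read sequentially.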
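VII. Miscellaneous
21. Misc
1. Can a RichFunction that uses keyed state be attached directly after a statement that is not keyBy?
While the StreamGraph is being built, Flink checks whether the current transformation has a keySelector; if so, it sets a keySerializer on the StreamNode.
During operator initialization, Flink then checks whether a keySerializer is present, and only creates a KeyedStateBackend if there is one.
That KeyedStateBackend is later used to create the keyed state.
Without keyBy, for example with a bare RichMapFunction, there is no KeyedStateBackend, and an exception is thrown at runtime.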
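2. Is the ListState of keyed state really the same thing as the ListState of operator state?
First, what is ListState?

    public interface ListState<T> extends MergingState<T, Iterable<T>> {}

Clearly it is just an empty interface that uses its name to express a constraint. Its inheritance chain is ListState -> MergingState -> AppendingState -> State.
The base interface and the intermediate parents each add constraints through naming as well: State only defines clear, AppendingState defines get and add, and MergingState plays a role similar to ListState.
Next, DefaultOperatorStateBackend defines the interface that creates the state:

    <S> ListState<S> getListState(ListStateDescriptor<S> stateDescriptor)

It does return a ListState, but that is only an interface. What is actually returned? A PartitionableListState<S>, so let us look at its inheritance relationships.
It implements the ListState interface, and its code is fairly simple: internally, an ArrayList stores the state elements of generic type S.
Now back to keyed state. Look at how the outermost interface, KeyedStateStore, is declared:

    @PublicEvolving
    public interface KeyedStateStore {
        @PublicEvolving
        <T> ListState<T> getListState(ListStateDescriptor<T> stateProperties);
        ...
    }

The declared return type is the same as for operator state, but we already know it is only an empty interface. What happens in practice?
For that we have to go back to HeapKeyedStateBackend:

    @Override
    public <N, T> InternalListState<N, T> createListState(
            TypeSerializer<N> namespaceSerializer,
            ListStateDescriptor<T> stateDesc) throws Exception {
        ...
        return new HeapListState<>(stateDesc, stateTable, keySerializer, namespaceSerializer);
    }

Skipping the middle part, things do change here: the method's return type is InternalListState, which can be understood as a subtype of ListState, and what is finally returned is a HeapListState. Again, consider its inheritance hierarchy.
HeapListState implements InternalListState and therefore, indirectly, ListState. Both of these interfaces are empty, though; they are mere declarations and contain no behavior or methods of their own.
So, back to the question: are the ListState of keyed state and the ListState of operator state the same thing?
It is still hard to answer. Syntactically they are the same thing, because they are the same type; at runtime, as shown above, they are two very different classes.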
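The "same interface, two very different classes" situation can be sketched in a few lines. ListStateSketch, OperatorListSketch and KeyedListSketch are hypothetical stand-ins: the operator flavour keeps one list per subtask, while the keyed flavour keeps one list per key and picks it through the current key. The real HeapListState goes through a StateTable with namespaces, which this sketch flattens away.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal list-state contract shared by both flavours. */
interface ListStateSketch<T> {
    void add(T value);

    Iterable<T> get();

    void clear();
}

/** Operator flavour: a single list for the whole operator subtask. */
final class OperatorListSketch<T> implements ListStateSketch<T> {
    private final List<T> list = new ArrayList<>();

    public void add(T value) { list.add(value); }

    public Iterable<T> get() { return list; }

    public void clear() { list.clear(); }
}

/** Keyed flavour: one list per key, selected by the current key. */
final class KeyedListSketch<K, T> implements ListStateSketch<T> {
    private final Map<K, List<T>> byKey = new HashMap<>();
    private K currentKey;

    void setCurrentKey(K key) { currentKey = key; }

    public void add(T value) {
        byKey.computeIfAbsent(currentKey, k -> new ArrayList<>()).add(value);
    }

    public Iterable<T> get() {
        return byKey.getOrDefault(currentKey, new ArrayList<>());
    }

    public void clear() { byKey.remove(currentKey); }
}
```

Code written against ListStateSketch cannot tell the two apart, yet clear() on the keyed flavour only affects the current key, while on the operator flavour it empties everything.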
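No doubt someone will then ask what the difference between PartitionableListState and HeapListState is. If I simply answered that one is used for operator state and the other for keyed state, you would hardly be satisfied. As Confucius would say: that's nuts.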