Flink -- Keyed State
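A typical use of keyed state, adapted from a Javadoc example in the Flink source, counts the elements seen for each key:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;

DataStream<MyType> stream = ...;
KeyedStream<MyType, Tuple> keyedStream = stream.keyBy("id");

keyedStream.map(new RichMapFunction<MyType, Tuple2<MyType, Long>>() {

    // per-key counter, scoped to the key of the current record
    private ValueState<Long> state;

    @Override
    public void open(Configuration cfg) {
        state = getRuntimeContext().getState(
                new ValueStateDescriptor<Long>("count", LongSerializer.INSTANCE, 0L));
    }

    @Override
    public Tuple2<MyType, Long> map(MyType value) throws Exception {
        long count = state.value() + 1;
        state.update(count); // store the new count, not the element
        return new Tuple2<>(value, count);
    }
});
```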
Before keyed state can be used, it has to be initialized. Taking ValueState as an example:

```java
state = getRuntimeContext().getState(new ValueStateDescriptor<Long>("count", LongSerializer.INSTANCE, 0L));
```
1. Each state needs an identifier, its ValueStateDescriptor, which holds a unique name, the value type (Class), and a default value:

```java
public ValueStateDescriptor(String name, Class<T> typeClass, T defaultValue)
```
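The initialization snippet passes a serializer (LongSerializer.INSTANCE), whereas the constructor shown takes a Class. With only a Class, a serializer still has to be derived before the state can be used, which is what the `initializeSerializerUnlessSet` call in `StreamingRuntimeContext.getState` takes care of. The plain-Java sketch below models that idea under simplifying assumptions: `DescriptorSketch`, `SketchSerializer` and the two serializer classes are hypothetical stand-ins, not Flink classes, and the Class-to-serializer lookup is hard-coded for Long and String only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical serializer interface (a stand-in, not Flink's TypeSerializer).
interface SketchSerializer<T> {
    byte[] serialize(T value);
    T deserialize(byte[] bytes);
}

class LongSketchSerializer implements SketchSerializer<Long> {
    public byte[] serialize(Long value) { return ByteBuffer.allocate(Long.BYTES).putLong(value).array(); }
    public Long deserialize(byte[] bytes) { return ByteBuffer.wrap(bytes).getLong(); }
}

class StringSketchSerializer implements SketchSerializer<String> {
    public byte[] serialize(String value) { return value.getBytes(StandardCharsets.UTF_8); }
    public String deserialize(byte[] bytes) { return new String(bytes, StandardCharsets.UTF_8); }
}

// Hypothetical descriptor: unique name + default value, plus either an explicit
// serializer or just the value Class from which a serializer is derived later.
final class DescriptorSketch<T> {
    final String name;
    final Class<T> typeClass; // null when a serializer was passed explicitly
    final T defaultValue;
    private SketchSerializer<T> serializer;

    DescriptorSketch(String name, SketchSerializer<T> serializer, T defaultValue) {
        this.name = Objects.requireNonNull(name);
        this.typeClass = null;
        this.serializer = Objects.requireNonNull(serializer);
        this.defaultValue = defaultValue;
    }

    DescriptorSketch(String name, Class<T> typeClass, T defaultValue) {
        this.name = Objects.requireNonNull(name);
        this.typeClass = Objects.requireNonNull(typeClass);
        this.defaultValue = defaultValue;
    }

    // Same idea as initializeSerializerUnlessSet: keep an explicit serializer,
    // otherwise derive one from the Class (here via a tiny hard-coded lookup).
    @SuppressWarnings("unchecked")
    void initializeSerializerUnlessSet() {
        if (serializer != null) {
            return;
        }
        SketchSerializer<?> derived;
        if (Long.class.equals(typeClass)) {
            derived = new LongSketchSerializer();
        } else if (String.class.equals(typeClass)) {
            derived = new StringSketchSerializer();
        } else {
            throw new IllegalStateException("no serializer known for " + typeClass);
        }
        serializer = (SketchSerializer<T>) derived;
    }

    SketchSerializer<T> getSerializer() {
        if (serializer == null) {
            throw new IllegalStateException("serializer not initialized yet");
        }
        return serializer;
    }
}
```

The lazy derivation keeps the descriptor cheap to build in user code; the runtime fills in the serializer once it has the configuration it needs.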
2. getState registers the keyed state with the state backend.
StreamingRuntimeContext
```java
public <T> ValueState<T> getState(ValueStateDescriptor<T> stateProperties) {
    KeyedStateStore keyedStateStore = checkPreconditionsAndGetKeyedStateStore(stateProperties);
    stateProperties.initializeSerializerUnlessSet(getExecutionConfig());
    return keyedStateStore.getState(stateProperties);
}
```
This calls keyedStateStore.getState(stateProperties).
KeyedStateStore is essentially a wrapper around KeyedStateBackend:
```java
public class DefaultKeyedStateStore implements KeyedStateStore {

    private final KeyedStateBackend<?> keyedStateBackend;
    private final ExecutionConfig executionConfig;

    @Override
    public <T> ValueState<T> getState(ValueStateDescriptor<T> stateProperties) {
        try {
            stateProperties.initializeSerializerUnlessSet(executionConfig);
            return getPartitionedState(stateProperties);
        } catch (Exception e) {
            throw new RuntimeException("Error while getting state", e);
        }
    }

    // ... (constructor and the other state getters omitted)
}
```
In the end, the call reaches keyedStateBackend:
```java
private <S extends State> S getPartitionedState(StateDescriptor<S, ?> stateDescriptor) throws Exception {
    return keyedStateBackend.getPartitionedState(
            VoidNamespace.INSTANCE,
            VoidNamespaceSerializer.INSTANCE,
            stateDescriptor);
}
```
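The delegation chain can be pictured with a small plain-Java sketch. Every name in it (`BackendSketch`, `KeyedStateStoreSketch`, `VoidNamespaceSketch`, `RecordingBackend`) is hypothetical; the point it illustrates is that the store wrapper adds no logic of its own beyond always passing one fixed namespace, the role `VoidNamespace.INSTANCE` plays in the excerpt above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical singleton namespace, standing in for VoidNamespace.INSTANCE.
enum VoidNamespaceSketch { INSTANCE }

// Hypothetical backend interface: state is looked up by (namespace, state name).
interface BackendSketch {
    Object getPartitionedState(Object namespace, String stateName);
}

// Hypothetical wrapper in the role of DefaultKeyedStateStore: it only forwards
// the request and always supplies the fixed "void" namespace.
final class KeyedStateStoreSketch {
    private final BackendSketch backend;

    KeyedStateStoreSketch(BackendSketch backend) {
        this.backend = backend;
    }

    Object getState(String stateName) {
        return backend.getPartitionedState(VoidNamespaceSketch.INSTANCE, stateName);
    }
}

// Hypothetical backend that records each call so the delegation can be observed.
final class RecordingBackend implements BackendSketch {
    final List<Object> seenNamespaces = new ArrayList<>();
    final List<String> seenNames = new ArrayList<>();

    @Override
    public Object getPartitionedState(Object namespace, String stateName) {
        seenNamespaces.add(namespace);
        seenNames.add(stateName);
        return "state:" + stateName;
    }
}
```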
AbstractKeyedStateBackend
```java
public <N, S extends State> S getPartitionedState(
        final N namespace,
        final TypeSerializer<N> namespaceSerializer,
        final StateDescriptor<S, ?> stateDescriptor) throws Exception {

    // simplified excerpt: look up or create the state, then point it at the namespace
    final S state = getOrCreateKeyedState(namespaceSerializer, stateDescriptor);
    final InternalKvState<N> kvState = (InternalKvState<N>) state;
    kvState.setCurrentNamespace(namespace);
    return state;
}
```
getOrCreateKeyedState
```java
public <N, S extends State, V> S getOrCreateKeyedState(
        final TypeSerializer<N> namespaceSerializer,
        StateDescriptor<S, V> stateDescriptor) throws Exception {

    InternalKvState<?> existing = keyValueStatesByName.get(stateDescriptor.getName());
    if (existing != null) {
        @SuppressWarnings("unchecked")
        S typedState = (S) existing;
        return typedState; // already registered in keyValueStatesByName: return it directly
    }

    // create a new blank key/value state
    S state = stateDescriptor.bind(new StateBinder() {
        @Override
        public <T> ValueState<T> createValueState(ValueStateDescriptor<T> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createValueState(namespaceSerializer, stateDesc);
        }
        // ... the other create*State methods are omitted
    });

    @SuppressWarnings("unchecked")
    InternalKvState<N> kvState = (InternalKvState<N>) state;
    keyValueStatesByName.put(stateDescriptor.getName(), kvState); // register the new state under its name

    return state;
}
```
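The caching in getOrCreateKeyedState boils down to a get-or-create map keyed by the descriptor name. A minimal sketch, assuming a plain `HashMap` for the registry and a `Supplier` as the factory (`StateRegistrySketch` is a hypothetical name, not Flink API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry mirroring keyValueStatesByName: a state is created
// once per name and the same instance is returned on every later lookup.
final class StateRegistrySketch {
    private final Map<String, Object> statesByName = new HashMap<>();
    int created = 0; // counts factory invocations, for illustration only

    @SuppressWarnings("unchecked")
    <S> S getOrCreate(String name, Supplier<S> factory) {
        Object existing = statesByName.get(name);
        if (existing != null) {
            return (S) existing; // already registered: return it directly
        }
        S state = factory.get(); // create a new blank state
        created++;
        statesByName.put(name, state); // register it under its name
        return state;
    }
}
```

Because the lookup is by name, asking twice for a state with the same name yields the same object. `Map.computeIfAbsent` would express this more compactly; the explicit form mirrors the source above.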
3. Reading and writing ValueState: value() and update()
Let's look at how ValueState is implemented on the heap:
HeapValueState
```java
public class HeapValueState<K, N, V>
        extends AbstractHeapState<K, N, V, ValueState<V>, ValueStateDescriptor<V>>
        implements InternalValueState<N, V> {

    /**
     * Creates a new key/value state for the given hash map of key/value pairs.
     *
     * @param stateDesc The state identifier for the state. This contains name
     *                  and can create a default state value.
     * @param stateTable The state table to use in this key/value state. May contain initial state.
     */
    public HeapValueState(
            ValueStateDescriptor<V> stateDesc,
            StateTable<K, N, V> stateTable,
            TypeSerializer<K> keySerializer,
            TypeSerializer<N> namespaceSerializer) {
        super(stateDesc, stateTable, keySerializer, namespaceSerializer);
    }

    @Override
    public V value() {
        final V result = stateTable.get(currentNamespace);
        if (result == null) {
            return stateDesc.getDefaultValue(); // nothing stored yet: fall back to the default
        }
        return result;
    }

    @Override
    public void update(V value) {
        if (value == null) {
            clear(); // update(null) removes the entry
            return;
        }
        stateTable.put(currentNamespace, value);
    }
}
```
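The two rules in HeapValueState (a missing entry reads as the descriptor's default, and update(null) clears the entry) can be checked with a simplified plain-Java model. `ValueStateSketch` is hypothetical; it indexes its map by namespace only and deliberately leaves out the key dimension to keep the focus on value() and update().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified heap value state: the map is indexed by namespace
// only (no key dimension), to isolate the value()/update() semantics.
final class ValueStateSketch<V> {
    private final Map<Object, V> table = new HashMap<>();
    private final V defaultValue;
    private Object currentNamespace;

    ValueStateSketch(V defaultValue, Object initialNamespace) {
        this.defaultValue = defaultValue;
        this.currentNamespace = initialNamespace;
    }

    void setCurrentNamespace(Object namespace) {
        this.currentNamespace = namespace;
    }

    // Missing entry -> default value, as in HeapValueState.value().
    V value() {
        V result = table.get(currentNamespace);
        return result == null ? defaultValue : result;
    }

    // update(null) behaves like clear(), as in HeapValueState.update().
    void update(V value) {
        if (value == null) {
            clear();
            return;
        }
        table.put(currentNamespace, value);
    }

    void clear() {
        table.remove(currentNamespace);
    }

    boolean isEmpty() {
        return table.isEmpty();
    }
}
```

A consequence of these rules is that a stored null and "never written" cannot be told apart: both read back as the default.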
Both methods go through the StateTable:
CopyOnWriteStateTable
```java
@Override
public S get(N namespace) {
    return get(keyContext.getCurrentKey(), namespace);
}

@Override
public boolean containsKey(N namespace) {
    return containsKey(keyContext.getCurrentKey(), namespace);
}

@Override
public void put(N namespace, S state) {
    put(keyContext.getCurrentKey(), namespace, state);
}
```
As you can see, the table does not just store a single value: it records the mapping (key, namespace) → value.
The key is obtained from keyContext.getCurrentKey(),
and the keyContext is the KeyedStateBackend itself.
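A minimal model of this arrangement: a table from (key, namespace) to state whose namespace-only methods fill in the key from a shared key context. `KeyContextSketch` and `StateTableSketch` are hypothetical names, and the nested `HashMap` is a simplification, not how `CopyOnWriteStateTable` actually lays out its data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key context: the code that processes records sets the current key here.
final class KeyContextSketch<K> {
    private K currentKey;

    void setCurrentKey(K key) { this.currentKey = key; }

    K getCurrentKey() { return currentKey; }
}

// Hypothetical state table storing (key, namespace) -> state. The namespace-only
// methods take the key from the key context, like the excerpt above does.
final class StateTableSketch<K, N, S> {
    private final Map<K, Map<N, S>> data = new HashMap<>();
    private final KeyContextSketch<K> keyContext;

    StateTableSketch(KeyContextSketch<K> keyContext) {
        this.keyContext = keyContext;
    }

    S get(N namespace) {
        return get(keyContext.getCurrentKey(), namespace);
    }

    boolean containsKey(N namespace) {
        Map<N, S> byNamespace = data.get(keyContext.getCurrentKey());
        return byNamespace != null && byNamespace.containsKey(namespace);
    }

    void put(N namespace, S state) {
        put(keyContext.getCurrentKey(), namespace, state);
    }

    S get(K key, N namespace) {
        Map<N, S> byNamespace = data.get(key);
        return byNamespace == null ? null : byNamespace.get(namespace);
    }

    void put(K key, N namespace, S state) {
        data.computeIfAbsent(key, k -> new HashMap<>()).put(namespace, state);
    }
}
```

Switching the current key in the context is enough to make the same get("ns") call see a different entry.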
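In StreamInputProcessor.processInput, the call

```java
streamOperator.setKeyContextElement1(record);
```

extracts the key from the record with the operator's key selector and sets it as the current key on the KeyedStateBackend.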
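This is why every operation on keyed state is automatically isolated by key: it always applies to the key of the record currently being processed.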
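Putting the pieces together, the following sketch replays the per-key counter from the beginning of this post without Flink. `KeyedCounterSketch` is hypothetical: a key selector function stands in for the operator's key extraction, a field stands in for the backend's current key, and a `HashMap` stands in for the state table. Because the current key is set before the user logic runs, the user logic never mentions a key and still gets a separate count per key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical end-to-end sketch of "set the key context, then run user code".
final class KeyedCounterSketch<T, K> {
    private final Function<T, K> keySelector;
    private final Map<K, Long> countByKey = new HashMap<>(); // backing store: key -> count
    private K currentKey; // plays the role of the backend's current key

    KeyedCounterSketch(Function<T, K> keySelector) {
        this.keySelector = keySelector;
    }

    // "Value state" accessors: they never take a key, they use currentKey.
    private long value() {
        return countByKey.getOrDefault(currentKey, 0L); // default value 0
    }

    private void update(long count) {
        countByKey.put(currentKey, count);
    }

    // User logic from the counter example: count = state.value() + 1.
    private long userMap(T element) {
        long count = value() + 1;
        update(count);
        return count;
    }

    // Per-record processing: set the key context first, then run user code.
    long processElement(T element) {
        currentKey = keySelector.apply(element);
        return userMap(element);
    }

    List<Long> processAll(List<T> elements) {
        List<Long> out = new ArrayList<>();
        for (T element : elements) {
            out.add(processElement(element));
        }
        return out;
    }
}
```

For the input apple, banana, avocado, blueberry, apricot keyed by first letter, processAll returns 1, 1, 2, 2, 3: the counts for "a" and "b" advance independently.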