Flink 状态编程
流式计算分为无状态和有状态两种情况。无状态的计算观察每个独立事件,并根据最后一个事件输出结果。例如,流处理应用程序从传感器接收温度读数,并在温度超过 90 度时发出警告。有状态的计算则会基于多个事件输出结果。简单来说,有状态的计算不仅处理当前的数据,还要和以前接收到的数据进行比较、聚合等操作。所以需要一个状态来对之前的数据进行记录。
方式一:
在如下的代码中,数据先进行keyBy,然后进行process,在处理中记录了上一次数据的温度状态。这种进行分区后维护的状态也叫键控状态(keyed state)。
object StateTest { def main(args: Array[String]): Unit = { val env = StreamExecutionEnvironment.getExecutionEnvironment env.setParallelism(1) env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) val socketStream = env.socketTextStream("hadoop102", 7777) val dataStream: DataStream[SensorReading] = socketStream.map(d => { val arr = d.split(",") SensorReading(arr(0).trim, arr(1).trim.toLong, arr(2).toDouble) }) .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor[SensorReading](Time.seconds(1)) { override def extractTimestamp(t: SensorReading): Long = t.timestamp * 1000 }) //温度变动超过10度报警 val processStream = dataStream.keyBy(_.id) .process(new TempChangeAlert(10.0)) dataStream.print("data stream") processStream.print("alert stream") env.execute("test") } } class TempChangeAlert(threshold: Double) extends KeyedProcessFunction[String, SensorReading, String] { //维护一个状态 lazy val lastTemp = getRuntimeContext.getState(new ValueStateDescriptor[Double]("tempState", classOf[Double])) override def processElement(value: SensorReading, ctx: KeyedProcessFunction[String, SensorReading, String]#Context, out: Collector[String]): Unit = { //取出上一个温度 val lastTemperature = lastTemp.value() val diff = (lastTemperature - value.temperature).abs if (diff > threshold) { out.collect(value.id + "," + lastTemperature + "," + value.temperature) } lastTemp.update(value.temperature) } }
方式二:
如果用不到ProcessFunction中的时间服务等内容,可以简单使用富函数实现同样的功能。关键代码如下
//温度变动超过10度报警 val processStream = dataStream.keyBy(_.id) .flatMap(new TempChangeAlert2(10.0))
自定义类继承富函数类
class TempChangeAlert2(threshold:Double) extends RichFlatMapFunction[SensorReading,(String,Double,Double)]{ private var lastTemp: ValueState[Double] = _ override def open(parameters: Configuration): Unit = { lastTemp = getRuntimeContext.getState(new ValueStateDescriptor[Double]("tempState2", classOf[Double])) } override def flatMap(value: SensorReading, out: Collector[(String, Double, Double)]): Unit = { //取出上一个温度 val lastTemperature = lastTemp.value() val diff = (lastTemperature - value.temperature).abs if (diff > threshold) { out.collect((value.id,lastTemperature,value.temperature)) } lastTemp.update(value.temperature) } }
方式三:
直接使用带状态的flatMapWithState方法
val alertStream3 = dataStream.keyBy(_.id) .flatMapWithState[(String,Double,Double),Double]{ //入参1:stream中的数据 //入参2:上一次的状态 //出参1:输出的内容 //出参2:更新后的状态 case (input:SensorReading,None) => (List.empty,Some(input.temperature)) case (input:SensorReading,lastTemp:Some[Double])=>{ val diff = (input.temperature-lastTemp.get).abs if (diff>10.0){ (List((input.id,lastTemp.get,input.temperature)),Some(input.temperature)) }else{ (List.empty,Some(input.temperature)) } } }