Flink State Programming

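  Stream computations are either stateless or stateful. A stateless computation looks at each event on its own and produces its output from that single, most recent event. For example, a stream processing application receives temperature readings from a sensor and raises an alert whenever a reading exceeds 90 degrees. A stateful computation produces its output from multiple events. Put simply, it does not only process the current record; it also compares or aggregates it with records received earlier, so it needs state in which to remember that earlier data.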
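  The difference can be sketched in plain Scala without any Flink dependency. This is only an illustration under assumptions: the names `Reading`, `statelessAlerts` and `statefulAlerts` are hypothetical and not part of Flink. The stateless version judges each reading by itself, while the stateful version has to carry the previous temperature along.

```scala
// Plain Scala sketch, no Flink involved; all names are hypothetical.
object StatelessVsStateful {
  case class Reading(id: String, temperature: Double)

  // Stateless: each reading is judged on its own.
  def statelessAlerts(readings: Seq[Reading], limit: Double): Seq[Reading] =
    readings.filter(_.temperature > limit)

  // Stateful: the result depends on the previous reading, so the previous
  // temperature is carried through the fold as state.
  def statefulAlerts(readings: Seq[Reading], threshold: Double): Seq[(Double, Double)] = {
    val start = (Option.empty[Double], Vector.empty[(Double, Double)])
    val (_, alerts) = readings.foldLeft(start) {
      // First reading: nothing to compare with, just remember it.
      case ((None, acc), r) => (Some(r.temperature), acc)
      // Later readings: compare with the remembered temperature.
      case ((Some(prev), acc), r) =>
        val next =
          if ((r.temperature - prev).abs > threshold) acc :+ ((prev, r.temperature))
          else acc
        (Some(r.temperature), next)
    }
    alerts
  }

  def main(args: Array[String]): Unit = {
    val readings = Seq(Reading("s1", 85.0), Reading("s1", 97.0), Reading("s1", 95.0))
    println(statelessAlerts(readings, 90.0)) // readings above 90 degrees
    println(statefulAlerts(readings, 10.0))  // jumps larger than 10 degrees
  }
}
```

  In a real Flink job the remembered value cannot live in a local variable like this, because it must be kept per key and included in checkpoints. That is what the state APIs shown in the rest of this post provide.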
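Approach 1:
  In the code below, the stream is first partitioned with keyBy and then handled by process, and the process function keeps the previous temperature reading as state. State that is maintained per key after partitioning like this is called keyed state.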
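import org.apache.flink.api.common.state.ValueStateDescriptor
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector

// Assumed definition, inferred from how the fields are used below
// (the timestamp is taken to be in seconds because it is multiplied by 1000).
case class SensorReading(id: String, timestamp: Long, temperature: Double)

object StateTest {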

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val socketStream = env.socketTextStream("hadoop102", 7777)

    val dataStream: DataStream[SensorReading] = socketStream.map(d => {
      val arr = d.split(",")
      SensorReading(arr(0).trim, arr(1).trim.toLong, arr(2).toDouble)
    })
      .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor[SensorReading](Time.seconds(1)) {
        override def extractTimestamp(t: SensorReading): Long = t.timestamp * 1000
      })

    // Alert when the temperature changes by more than 10 degrees
    val processStream = dataStream.keyBy(_.id)
      .process(new TempChangeAlert(10.0))

    dataStream.print("data stream")

    processStream.print("alert stream")



    env.execute("test")
  }

}

class TempChangeAlert(threshold: Double) extends KeyedProcessFunction[String, SensorReading, String] {

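  // Keyed state holding the previous temperature of the current key.
  // Caveat: before the first update, value() on a ValueState[Double] yields 0.0 in Scala
  // (an unboxed null), so the first reading of each key is compared against 0.0.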
  lazy val lastTemp = getRuntimeContext.getState(new ValueStateDescriptor[Double]("tempState", classOf[Double]))

  override def processElement(value: SensorReading,
                              ctx: KeyedProcessFunction[String, SensorReading, String]#Context,
                              out: Collector[String]): Unit = {
    // Read the previous temperature
    val lastTemperature = lastTemp.value()

    val diff = (lastTemperature - value.temperature).abs
    if (diff > threshold) {
      out.collect(value.id + "," + lastTemperature + "," + value.temperature)
    }
    lastTemp.update(value.temperature)
  }
}
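
  One thing to keep in mind with the class above: in Scala, calling value() on a ValueState[Double] that has never been updated yields 0.0, so the first reading of every key is compared against 0.0 and can trigger a spurious alert. The plain Scala sketch below models keyed state as one slot per key and skips the comparison when no previous value exists. It is only a model: `TempChangeDetector` and the mutable map are hypothetical stand-ins for Flink's state backend, not Flink APIs.

```scala
import scala.collection.mutable

// Plain Scala model of keyed state; TempChangeDetector is a hypothetical name.
object KeyedStateSketch {
  case class SensorReading(id: String, timestamp: Long, temperature: Double)

  class TempChangeDetector(threshold: Double) {
    // Stand-in for keyed ValueState: one independent slot per sensor id.
    private val lastTempByKey = mutable.Map.empty[String, Double]

    // Returns an alert line if the change since the previous reading of the
    // same key exceeds the threshold; the first reading of a key never alerts.
    def process(r: SensorReading): Option[String] = {
      val alert = lastTempByKey.get(r.id).collect {
        case last if (last - r.temperature).abs > threshold =>
          s"${r.id},$last,${r.temperature}"
      }
      lastTempByKey.update(r.id, r.temperature)
      alert
    }
  }

  def main(args: Array[String]): Unit = {
    val detector = new TempChangeDetector(10.0)
    val readings = Seq(
      SensorReading("sensor_1", 1L, 35.0),
      SensorReading("sensor_2", 2L, 15.0),
      SensorReading("sensor_1", 3L, 50.0)
    )
    readings.flatMap(r => detector.process(r)).foreach(println)
  }
}
```

  In the Flink version the same effect could be achieved by tracking whether the key has been seen before, for example with a second piece of state; that variation is a suggestion and not part of the original code.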

 

Approach 2:

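  If you do not need the extras that ProcessFunction offers, such as the timer service, a rich function is enough to implement the same logic. The key code is as follows: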

// Alert when the temperature changes by more than 10 degrees
val processStream = dataStream.keyBy(_.id)
  .flatMap(new TempChangeAlert2(10.0))

  The custom class extends a rich function class:

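import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector

class TempChangeAlert2(threshold: Double) extends RichFlatMapFunction[SensorReading, (String, Double, Double)] {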

  private var lastTemp: ValueState[Double] = _

  override def open(parameters: Configuration): Unit = {
    lastTemp = getRuntimeContext.getState(new ValueStateDescriptor[Double]("tempState2", classOf[Double]))
  }

  override def flatMap(value: SensorReading, out: Collector[(String, Double, Double)]): Unit = {
    // Read the previous temperature
    val lastTemperature = lastTemp.value()

    val diff = (lastTemperature - value.temperature).abs
    if (diff > threshold) {
      out.collect((value.id,lastTemperature,value.temperature))
    }

    lastTemp.update(value.temperature)
  }
  
}

 

Approach 3:

  Use the flatMapWithState method, which carries the state for you:

val alertStream3 = dataStream.keyBy(_.id)
  .flatMapWithState[(String,Double,Double),Double]{
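    // Input 1: the element from the stream
    // Input 2: the previous state (None if the key has no state yet)
    // Output 1: the elements to emit
    // Output 2: the updated state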
    case (input:SensorReading,None) => (List.empty,Some(input.temperature))
    case (input:SensorReading,lastTemp:Some[Double])=>{
      val diff = (input.temperature-lastTemp.get).abs
      if (diff>10.0){
        (List((input.id,lastTemp.get,input.temperature)),Some(input.temperature))
      }else{
        (List.empty,Some(input.temperature))
      }
    }
  }
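
  To make the contract of that function easier to see, here is a plain Scala model of it, with no Flink dependency. The helper `runWithState` is hypothetical and only mimics the shape of the function passed to flatMapWithState: it receives the element and an optional previous state for the element's key, and returns the elements to emit together with the new state. Returning None as the new state is modelled here as clearing the state for that key.

```scala
// Plain Scala model of the flatMapWithState contract; runWithState is a
// hypothetical helper, not a Flink API.
object FlatMapWithStateSketch {
  case class SensorReading(id: String, timestamp: Long, temperature: Double)

  // Applies f to each element in order, threading one optional state per key.
  def runWithState[T, K, S, R](input: Seq[T], key: T => K)(
      f: (T, Option[S]) => (Seq[R], Option[S])): Seq[R] = {
    val start = (Map.empty[K, S], Vector.empty[R])
    val (_, out) = input.foldLeft(start) {
      case ((states, acc), elem) =>
        val k = key(elem)
        val (emitted, newState) = f(elem, states.get(k))
        val nextStates = newState match {
          case Some(s) => states.updated(k, s) // store the updated state
          case None    => states - k           // None clears the state
        }
        (nextStates, acc ++ emitted)
    }
    out
  }

  // Same logic as the listing above: emit when the change exceeds 10 degrees.
  val tempChange: (SensorReading, Option[Double]) => (Seq[(String, Double, Double)], Option[Double]) = {
    case (r, None) => (Seq.empty, Some(r.temperature))
    case (r, Some(last)) =>
      if ((r.temperature - last).abs > 10.0)
        (Seq((r.id, last, r.temperature)), Some(r.temperature))
      else
        (Seq.empty, Some(r.temperature))
  }

  def main(args: Array[String]): Unit = {
    val readings = Seq(
      SensorReading("sensor_1", 1L, 35.0),
      SensorReading("sensor_1", 2L, 50.0),
      SensorReading("sensor_2", 3L, 15.0)
    )
    val alerts = runWithState[SensorReading, String, Double, (String, Double, Double)](
      readings, _.id)(tempChange)
    alerts.foreach(println)
  }
}
```

  Because the first reading of a key arrives with None as its state, this style handles the "no previous value" case explicitly, which the first two approaches do not.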

  

  

 

 

posted @ 2020-05-17 15:05  地中有山