Output Operations on DStreams

Finally, the per-partition approach can be optimized further by reusing connection objects across multiple RDDs/batches. One can maintain a static pool of connection objects that can be reused as the RDDs of multiple batches are pushed to the external system, further reducing the overhead.

Scala
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // ConnectionPool is a static, lazily initialized pool of connections
    val connection = ConnectionPool.getConnection()
    try {
      partitionOfRecords.foreach(record => connection.send(record))
    } finally {
      // return to the pool for future reuse, even if a send fails
      ConnectionPool.returnConnection(connection)
    }
  }
}
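The snippet above leaves `ConnectionPool` undefined. Below is a minimal sketch of what such a pool could look like. It assumes a hypothetical in-memory `Connection` class in place of a real sink (socket, JDBC, etc.) and simulates `foreachPartition` with plain iterators, so it runs without Spark. `Connection`, `pushPartition`, and `connectionsCreated` are illustrative names, not Spark APIs.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable.ArrayBuffer

// Hypothetical connection type; a real one would wrap a socket, JDBC connection, etc.
class Connection(val id: Int) {
  val sent = ArrayBuffer.empty[String]   // records "pushed" over this connection
  def send(record: String): Unit = sent += record
}

// A Scala `object` is initialized lazily, once per JVM, so on a Spark executor
// every task running in that JVM would share this same pool.
object ConnectionPool {
  private val idle = new ConcurrentLinkedQueue[Connection]()
  private val created = new AtomicInteger(0)

  // Reuse an idle connection if one exists; otherwise open a new one on demand.
  def getConnection(): Connection = {
    val c = idle.poll()   // returns null when the queue is empty
    if (c != null) c else new Connection(created.incrementAndGet())
  }

  // Hand the connection back so a later partition or batch can reuse it.
  def returnConnection(c: Connection): Unit = idle.offer(c)

  def connectionsCreated: Int = created.get()
}

// Stand-in for the body of rdd.foreachPartition (no Spark needed here).
def pushPartition(partitionOfRecords: Iterator[String]): Connection = {
  val connection = ConnectionPool.getConnection()
  try partitionOfRecords.foreach(record => connection.send(record))
  finally ConnectionPool.returnConnection(connection)
  connection
}
```

Processing three partitions one after another (for example, two from one batch and one from the next) opens a single connection and reuses it for all of them. A production pool would also cap its size and, as the Spark guide recommends, create connections lazily and time them out when they sit unused for a while.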

http://spark.apache.org/docs/1.6.1/streaming-programming-guide.html#output-operations-on-dstreams

posted @ 2017-08-29 09:28  牵牛花