alpakka-kafka(1)-producer
The alpakka project is a Scala/Java open-source project built on top of the akka-streams stream-processing library: it provides connectors to various data sources so that their data can be processed inside akka-streams. alpakka-kafka is the kafka connector within the alpakka project. For us this means we can use alpakka-kafka to integrate with kafka and use the features kafka provides. Put another way, alpakka-kafka is a Scala development tool that implements kafka functionality on top of akka-streams.
alpakka-kafka covers kafka's two core roles, producer and consumer, which respectively write data from an akka-streams stream into kafka and read data out of kafka into an akka-streams stream. Integrating kafka with akka-streams typically comes up in business integration scenarios: business A produces operation commands and writes them to kafka, kafka delivers those commands to business B, and business B reads them and performs the corresponding operations. For example, take two business modules, goods receiving and inventory management: the receiving module writes receiving records to kafka, while the inventory module reads those records from kafka and updates the corresponding stock quantities. Note that the two businesses operate independently. In alpakka, the actual business logic is essentially data transformation inside akka-streams. This is a typical CQRS pattern: the writing side and the reading side are decoupled; the writer does not care who the audience is or how the data will be used, and the reader does not care who wrote it. Here the writing and reading sides correspond to kafka's producer and consumer.
In this article we introduce alpakka-kafka's producer functionality and how to use it. As mentioned above, alpakka implements the kafka producer on top of akka-streams, so the producers alpakka provides are themselves akka-streams components that can be composed with other akka-streams components into larger streams. Building a producer first requires a few supporting pieces:
1. producer settings: the akka.kafka.producer section of alpakka-kafka's reference.conf provides default producer settings that are sufficient for basic operation. Users can adjust them flexibly with the typesafe config tooling.
2. de/serializers: ready-made String serializers/deserializers (from org.apache.kafka.common.serialization) can be used directly.
3. bootstrap-server: a comma-separated list of kafka cluster node addresses (host:port).
下面是一个具体的例子:
implicit val system = ActorSystem("kafka_sys")
val bootstrapServers = "localhost:9092"
val config = system.settings.config.getConfig("akka.kafka.producer")
val producerSettings =
  ProducerSettings(config, new StringSerializer, new StringSerializer)
    .withBootstrapServers(bootstrapServers)
The ActorSystem here is only used to read the configuration from the .conf files; no akka-streams component is involved yet. The akka.kafka.producer section already has default values in alpakka-kafka's reference.conf, so it does not need to be redefined in application.conf.
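If a default does need adjusting, properties can also be overridden on the settings object itself. A minimal sketch, reusing the producerSettings built above (the property values are illustrative assumptions, not recommendations):

// requires scala.concurrent.duration._ in scope
val tunedSettings = producerSettings
  // pass any standard kafka producer property through to the underlying KafkaProducer
  .withProperty("acks", "all")
  // alpakka-specific options have typed setters
  .withCloseTimeout(30.seconds)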
alpakka-kafka also provides a most basic producer, SendProducer, which is not an akka-streams component. Below we demonstrate how SendProducer is used and what it does:
import akka.actor.ActorSystem
import akka.kafka._
import akka.kafka.scaladsl.SendProducer
import org.apache.kafka.clients.producer.{ProducerRecord, RecordMetadata}
import org.apache.kafka.common.serialization._
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}

object SendProducerDemo extends App {
  implicit val system = ActorSystem("kafka_sys")
  implicit val executionContext = system.dispatcher
  val bootstrapServers = "localhost:9092"
  val config = system.settings.config.getConfig("akka.kafka.producer")
  val producerSettings =
    ProducerSettings(config, new StringSerializer, new StringSerializer)
      .withBootstrapServers(bootstrapServers)
  val producer = SendProducer(producerSettings)
  val topic = "greatings"
  val lstfut: Seq[Future[RecordMetadata]] =
    (100 to 200).reverse
      .map(_.toString)
      .map(value => new ProducerRecord[String, String](topic, s"hello-$value"))
      .map(msg => producer.send(msg))
  val futlst = Future.sequence(lstfut)
  Await.result(futlst, 2.seconds)

  scala.io.StdIn.readLine()
  producer.close()
  system.terminate()
}
The example above uses SendProducer to write a batch of hello messages (the values 100 to 200) into kafka. It simply iterates over a collection; no akka-streams Source is involved. To verify the result we can use kafka's command-line tools, as follows:
\w> ./kafka-topics --create --topic greatings --bootstrap-server localhost:9092
Created topic greatings.
\w> ./kafka-console-consumer --topic greatings --bootstrap-server localhost:9092
hello-100
hello-101
hello-102
hello-103
hello-104
hello-105
hello-106
...
Since the producer represents the writing side, in akka-streams it takes the form of a Sink or Flow component. The following example demonstrates the producer Sink component Producer.plainSink:
import akka.Done
import akka.actor.ActorSystem
import akka.kafka._
import akka.kafka.scaladsl._
import akka.stream.scaladsl._
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization._
import scala.concurrent._
import scala.concurrent.duration._

object plain_sink extends App {
  implicit val system = ActorSystem("kafka_sys")
  val bootstrapServers = "localhost:9092"
  val config = system.settings.config.getConfig("akka.kafka.producer")
  val producerSettings =
    ProducerSettings(config, new StringSerializer, new StringSerializer)
      .withBootstrapServers(bootstrapServers)
  implicit val executionContext = system.dispatcher
  val topic = "greatings"
  val done: Future[Done] =
    Source(1 to 100)
      .map(_.toString)
      .map(value => new ProducerRecord[String, String](topic, s"hello-$value"))
      .runWith(Producer.plainSink(producerSettings))
  Await.ready(done, 3.seconds)

  scala.io.StdIn.readLine()
  system.terminate()
}
This is a typical akka-streams application; Producer.plainSink is an akka-streams Sink component.
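The signature of Producer.plainSink (paraphrased from the alpakka-kafka API) makes its role explicit: it consumes ProducerRecord elements and materializes a Future[Done] that completes when the stream has terminated:

def plainSink[K, V](settings: ProducerSettings[K, V]): Sink[ProducerRecord[K, V], Future[Done]]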
Both examples above build values of type ProducerRecord and write them to kafka. ProducerRecord is kafka's basic message type:
public ProducerRecord(String topic, K key, V value) {
    this(topic, null, null, key, value, null);
}
topic is a String, while key and value are of the generic types K and V. On top of ProducerRecord, alpakka-kafka adds a richer message type, ProducerMessage.Envelope:
sealed trait Envelope[K, V, +PassThrough] {
  def passThrough: PassThrough
  def withPassThrough[PassThrough2](value: PassThrough2): Envelope[K, V, PassThrough2]
}

final case class Message[K, V, +PassThrough](
    record: ProducerRecord[K, V],
    passThrough: PassThrough
) extends Envelope[K, V, PassThrough] {
  override def withPassThrough[PassThrough2](value: PassThrough2): Message[K, V, PassThrough2] =
    copy(passThrough = value)
}
ProducerMessage.Envelope adds a PassThrough type parameter, used to carry extra metadata along with the message. alpakka-kafka stream components use this type as their stream element and eventually turn each envelope into one or more ProducerRecords written to kafka. For example:
object EventMessages {
  // one Envelope maps to a single ProducerRecord
  def createMessage[KeyType, ValueType, PassThroughType](
      topic: String,
      key: KeyType,
      value: ValueType,
      passThrough: PassThroughType): ProducerMessage.Envelope[KeyType, ValueType, PassThroughType] = {
    val single = ProducerMessage.single(
      new ProducerRecord[KeyType, ValueType](topic, key, value),
      passThrough
    )
    single
  }

  // one Envelope maps to multiple ProducerRecords
  def createMultiMessage[KeyType, ValueType, PassThroughType](
      topics: List[String],
      key: KeyType,
      value: ValueType,
      passThrough: PassThroughType): ProducerMessage.Envelope[KeyType, ValueType, PassThroughType] = {
    val msgs = topics.map { topic =>
      new ProducerRecord(topic, key, value)
    }.toSeq
    val multi = ProducerMessage.multi(
      msgs,
      passThrough
    )
    multi
  }

  // pass through only the metadata; nothing is written to kafka
  def createPassThroughMessage[KeyType, ValueType, PassThroughType](
      topic: String,
      key: KeyType,
      value: ValueType,
      passThrough: PassThroughType): ProducerMessage.Envelope[KeyType, ValueType, PassThroughType] = {
    ProducerMessage.passThrough(passThrough)
  }
}
flexiFlow is an alpakka-kafka Flow component: it takes ProducerMessage.Envelope elements in and emits Results elements:
def flexiFlow[K, V, PassThrough](
    settings: ProducerSettings[K, V]
): Flow[Envelope[K, V, PassThrough], Results[K, V, PassThrough], NotUsed] = { ... }
The Result type, one of the Results subtypes, is defined as follows:
final case class Result[K, V, PassThrough] private (
    metadata: RecordMetadata,
    message: Message[K, V, PassThrough]
) extends Results[K, V, PassThrough] {
  def offset: Long = metadata.offset()
  def passThrough: PassThrough = message.passThrough
}
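Result describes a single written record. The other two Results subtypes, which appear in the pattern match of the example below, look roughly like this (paraphrased from the alpakka-kafka source):

final case class MultiResult[K, V, PassThrough] private (
    parts: scala.collection.immutable.Seq[MultiResultPart[K, V]],
    passThrough: PassThrough
) extends Results[K, V, PassThrough]

final case class PassThroughResult[K, V, PassThrough] private (
    passThrough: PassThrough
) extends Results[K, V, PassThrough]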
In other words, flexiFlow returns the status data that kafka reports back after the write. Let's look at a usage example of flexiFlow:
import akka.actor.ActorSystem
import akka.kafka.ProducerMessage._
import akka.kafka.scaladsl._
import akka.kafka.{ProducerMessage, ProducerSettings}
import akka.stream.scaladsl.{Sink, Source}
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.StringSerializer
import scala.concurrent._
import scala.concurrent.duration._

object flexi_flow extends App {
  implicit val system = ActorSystem("kafka_sys")
  val bootstrapServers = "localhost:9092"
  val config = system.settings.config.getConfig("akka.kafka.producer")
  val producerSettings =
    ProducerSettings(config, new StringSerializer, new StringSerializer)
      .withBootstrapServers(bootstrapServers)
  // needed for the Future combinators used below
  implicit val executionContext = system.dispatcher

  val topic = "greatings"

  val done = Source(1 to 100)
    .map { number =>
      val value = number.toString
      EventMessages.createMessage(topic, "key", value, number)
    }
    .via(Producer.flexiFlow(producerSettings))
    .map {
      case ProducerMessage.Result(metadata, ProducerMessage.Message(record, passThrough)) =>
        s"${metadata.topic}/${metadata.partition} ${metadata.offset}: ${record.value}"
      case ProducerMessage.MultiResult(parts, passThrough) =>
        parts
          .map {
            case MultiResultPart(metadata, record) =>
              s"${metadata.topic}/${metadata.partition} ${metadata.offset}: ${record.value}"
          }
          .mkString(", ")
      case ProducerMessage.PassThroughResult(passThrough) =>
        "passed through"
    }
    .runWith(Sink.foreach(println(_)))

  Await.ready(done, 3.seconds)

  scala.io.StdIn.readLine()
  system.terminate()
}

object EventMessages {
  // one Envelope maps to a single ProducerRecord
  def createMessage[KeyType, ValueType, PassThroughType](
      topic: String,
      key: KeyType,
      value: ValueType,
      passThrough: PassThroughType): ProducerMessage.Envelope[KeyType, ValueType, PassThroughType] = {
    ProducerMessage.single(
      new ProducerRecord[KeyType, ValueType](topic, key, value),
      passThrough
    )
  }

  // one Envelope maps to multiple ProducerRecords
  def createMultiMessage[KeyType, ValueType, PassThroughType](
      topics: List[String],
      key: KeyType,
      value: ValueType,
      passThrough: PassThroughType): ProducerMessage.Envelope[KeyType, ValueType, PassThroughType] = {
    val msgs = topics.map { topic =>
      new ProducerRecord(topic, key, value)
    }.toSeq
    ProducerMessage.multi(
      msgs,
      passThrough
    )
  }

  // pass through only the metadata; nothing is written to kafka
  def createPassThroughMessage[KeyType, ValueType, PassThroughType](
      topic: String,
      key: KeyType,
      value: ValueType,
      passThrough: PassThroughType): ProducerMessage.Envelope[KeyType, ValueType, PassThroughType] = {
    ProducerMessage.passThrough(passThrough)
  }
}
Besides writing business events or commands to kafka, a producer can also commit the offset of the message currently being consumed. alpakka-kafka producers therefore come in two kinds: plainSink and flexiFlow, shown above, only write business data to kafka; the other kind, such as committableSink, additionally commits the offset of the consumed message. For example:
val control = Consumer
  .committableSource(consumerSettings, Subscriptions.topics(topic1, topic2))
  .map { msg =>
    ProducerMessage.single(
      new ProducerRecord(targetTopic, msg.record.key, msg.record.value),
      msg.committableOffset
    )
  }
  .toMat(Producer.committableSink(producerSettings, committerSettings))(DrainingControl.apply)
  .run()

control.drainAndShutdown()
As shown above, committableSource reads business messages plus their committableOffset from kafka, and Producer.committableSink then writes the business message back to kafka and commits the offset.
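The snippet assumes that consumerSettings, committerSettings and the topic names are defined elsewhere. A minimal sketch of those settings, assuming String keys/values and an arbitrary consumer group id "group1":

// consumer settings: deserializers plus a group id are required
val consumerSettings =
  ConsumerSettings(system.settings.config.getConfig("akka.kafka.consumer"),
      new StringDeserializer, new StringDeserializer)
    .withBootstrapServers(bootstrapServers)
    .withGroupId("group1")
// committer defaults come from the akka.kafka.committer section of reference.conf
val committerSettings = CommitterSettings(system)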
In the next article we will look at the consumer in detail.