FlinkSQL - Table API Connector: Upsert Kafka
1. Introduction
The Upsert Kafka connector allows for reading data from and writing data into Kafka topics in the upsert fashion.
As a source, the upsert-kafka connector produces a changelog stream, where each data record represents an update or delete event. More precisely, the value in a data record is interpreted as an UPDATE of the last value for the same key, if any (if a corresponding key doesn’t exist yet, the update will be considered an INSERT). Using the table analogy, a data record in a changelog stream is interpreted as an UPSERT aka INSERT/UPDATE because any existing row with the same key is overwritten. Also, null values are interpreted in a special way: a record with a null value represents a “DELETE”.
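The upsert interpretation above can be sketched in plain Java (no Flink dependencies; the class and method names are illustrative, not Flink API):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of how an upsert-kafka source interprets each Kafka record:
// a non-null value upserts the key (INSERT if new, UPDATE if existing),
// a null value is a tombstone and deletes the key.
public class UpsertChangelog {
    // Apply one (key, value) record to the materialized state.
    static void apply(Map<String, String> state, String key, String value) {
        if (value == null) {
            state.remove(key);      // null value = tombstone = DELETE
        } else {
            state.put(key, value);  // same key overwritten = UPSERT
        }
    }

    public static void main(String[] args) {
        Map<String, String> state = new HashMap<>();
        apply(state, "id-1", "Alice");   // INSERT (key did not exist)
        apply(state, "id-1", "Alicia");  // UPDATE (existing row overwritten)
        apply(state, "id-2", "Bob");     // INSERT
        apply(state, "id-2", null);      // DELETE (tombstone record)
        System.out.println(state);       // prints {id-1=Alicia}
    }
}
```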
As a sink, the upsert-kafka connector can consume a changelog stream. It will write INSERT/UPDATE_AFTER data as normal Kafka message values, and write DELETE data as Kafka messages with null values (indicating a tombstone for the key). Flink guarantees message ordering on the primary key by partitioning data on the values of the primary key columns, so update/delete messages on the same key will land in the same partition.
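The ordering guarantee works because the sink keys messages by the primary-key columns, and same-key messages hash to the same partition. A hedged sketch of that idea (not Flink internals, but the same principle as Kafka's default key partitioning):

```java
// Illustrative sketch: records with the same primary key always map to the
// same partition, so updates/deletes for a key stay in order within it.
public class KeyPartitioner {
    static int partitionFor(String primaryKey, int numPartitions) {
        // Mask the sign bit so a negative hashCode still yields a valid index.
        return (primaryKey.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // An UPDATE and a later DELETE on "id-1" go to the same partition,
        // so a consumer sees them in the order they were written.
        System.out.println(partitionFor("id-1", 3) == partitionFor("id-1", 3));
    }
}
```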
2. Practice
Maven dependency
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>
When upsert-kafka is used as a dimension table, only the latest version of each dimension row is joined with the fact stream.
// Map the data in Kafka to a dimension source table - a dimension table that changes in real time
tableEnv.executeSql(
"CREATE TABLE dim_source (" +
" id STRING," +
" name STRING," +
" update_time TIMESTAMP(3) METADATA FROM 'timestamp' VIRTUAL, " +
" WATERMARK FOR update_time AS update_time, " +
" PRIMARY KEY (id) NOT ENFORCED" +
") WITH (" +
" 'connector' = 'upsert-kafka'," +
" 'topic' = 'flinksqldim'," +
" 'properties.bootstrap.servers' = 'ip:port'," +
" 'properties.group.id' = 'flinksqlDim'," +
" 'key.format' = 'json'," +
" 'value.format' = 'json')"
);
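For the sink side, a hedged sketch of the mirror-image DDL (the table name `dim_sink`, topic `flinksqlsink`, and server address are placeholders, not from the source). Writing an updating query result into such a table emits normal messages for INSERT/UPDATE_AFTER and tombstones for DELETE, as described above:

```sql
-- Hypothetical upsert-kafka sink table; the PRIMARY KEY is mandatory and
-- determines the Kafka message key (and therefore the target partition).
CREATE TABLE dim_sink (
  id STRING,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'flinksqlsink',
  'properties.bootstrap.servers' = 'ip:port',
  'key.format' = 'json',
  'value.format' = 'json'
);

-- Each upstream change for an id becomes an upsert (or tombstone) on the topic.
INSERT INTO dim_sink SELECT id, name FROM dim_source;
```

Note that, unlike the plain kafka connector, upsert-kafka requires both a primary key and separate `key.format` / `value.format` options.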