Converting a Protobuf schema to an Avro schema with Confluent Schema Registry
Confluent provides utilities for converting a Protobuf schema into an Avro schema, which makes it possible to sink Kafka messages serialized with Protobuf to disk as Avro files.
1. Add Maven dependencies
<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
    <repository>
        <id>confluent</id>
        <url>https://packages.confluent.io/maven/</url>
    </repository>
</repositories>

<dependencies>
    <!-- protobuf -->
    <dependency>
        <groupId>com.google.protobuf</groupId>
        <artifactId>protobuf-java</artifactId>
        <version>3.21.7</version>
    </dependency>
    <!-- confluent -->
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-schema-registry</artifactId>
        <version>7.1.1</version>
    </dependency>
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-protobuf-provider</artifactId>
        <version>7.1.1</version>
    </dependency>
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-connect-avro-data</artifactId>
        <version>7.1.1</version>
    </dependency>
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-connect-protobuf-converter</artifactId>
        <version>7.1.1</version>
    </dependency>
    <!-- kafka -->
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>connect-api</artifactId>
        <version>1.1.0</version>
    </dependency>
</dependencies>
2. Define a Protobuf schema
Define a Protobuf schema in src/main/proto/other.proto (the file name is what makes protoc name the generated outer Java class Other, hence the Other.MyRecord references later):
syntax = "proto3"; package com.acme; message MyRecord { string f1 = 1; OtherRecord f2 = 2; } message OtherRecord { int32 other_id = 1; }
Compile it into Java code:
protoc -I=./ --java_out=./src/main/java ./src/main/proto/other.proto
This produces the Java classes for the schema.
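As a quick sanity check, the generated classes can be used like any protobuf-java API. A minimal sketch (the setter names follow protoc's standard Java codegen for the schema above, and the values are illustrative):

// Build a populated MyRecord, nesting an OtherRecord
Other.MyRecord record = Other.MyRecord.newBuilder()
        .setF1("hello")
        .setF2(Other.OtherRecord.newBuilder()
                .setOtherId(42)
                .build())
        .build();

// Serialize to the Protobuf wire format, e.g. before producing to Kafka
byte[] payload = record.toByteArray();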
3. Convert the Protobuf schema into an Avro schema
When Confluent Schema Registry handles Protobuf, Avro, or JSON data, it first converts everything into the Connect schema format, and only then writes it out to a concrete file format such as Parquet or Avro.
import com.acme.Other;
import io.confluent.connect.avro.AvroData;
import io.confluent.connect.avro.AvroDataConfig;
import io.confluent.connect.protobuf.ProtobufData;
import io.confluent.kafka.schemaregistry.protobuf.ProtobufSchema;
import io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaUtils;
import org.apache.kafka.connect.data.SchemaAndValue;

public class ProtobufToAvro {

    public static void main(String[] args) {
        // Instantiate the protobuf-generated class
        Other.MyRecord obj = Other.MyRecord.newBuilder().build();

        // Derive the Protobuf schema from the message
        ProtobufSchema pbSchema = ProtobufSchemaUtils.getSchema(obj);

        ProtobufData protobufData = new ProtobufData();
        // SchemaAndValue result = protobufData.toConnectData(pbSchema, obj);
        // System.out.println(result);

        AvroDataConfig avroDataConfig = new AvroDataConfig.Builder()
                .with(AvroDataConfig.SCHEMAS_CACHE_SIZE_CONFIG, 1)
                .with(AvroDataConfig.CONNECT_META_DATA_CONFIG, false)
                .with(AvroDataConfig.ENHANCED_AVRO_SCHEMA_SUPPORT_CONFIG, true)
                .build();
        AvroData avroData = new AvroData(avroDataConfig);

        // Convert the Protobuf schema to a Connect schema first, then to an Avro schema
        org.apache.avro.Schema avroSchema =
                avroData.fromConnectSchema(protobufData.toConnectSchema(pbSchema));
        System.out.println(avroSchema);
    }
}
The converted Avro schema printed by the program is as follows:
{ "type":"record", "name":"MyRecord", "fields":[ { "name":"f1", "type":[ "null", "string" ], "default":null }, { "name":"f2", "type":[ "null", { "type":"record", "name":"OtherRecord", "fields":[ { "name":"other_id", "type":[ "null", "int" ], "default":null } ] } ], "default":null } ] }
Note that Confluent's implementation is careful here: a Protobuf uint32 (range 0 to 2^32 - 1) is always converted to a 64-bit long (range -2^63 to 2^63 - 1) rather than a 32-bit int, so unsigned values can never go out of range. See the source:
https://github.com/confluentinc/schema-registry/blob/v7.1.1/protobuf-converter/src/main/java/io/confluent/connect/protobuf/ProtobufData.java#L1485
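To illustrate, consider a hypothetical message with a uint32 field (not part of the example schema above):

message Counter {
    uint32 hits = 1;
}

Running it through the same conversion yields an Avro field of type "long" rather than "int", per the linked ProtobufData source, so the full unsigned 32-bit range fits without overflow.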
The conversion logic is also covered by this reference test in the source tree:
https://github.com/confluentinc/schema-registry/blob/v7.1.1/avro-data/src/test/java/io/confluent/connect/avro/AdditionalAvroDataTest.java
This article is published only on cnblogs and on tonglin0325's blog. Author: tonglin0325. Please cite the original link when reposting: https://www.cnblogs.com/tonglin0325/p/4642622.html