Flink解析kafka canal未压平数据为message报错
canal使用非flatmessage方式获取mysql bin log日志发至kafka比直接发送json效率要高很多,数据发到kafka后需要实时解析为json,这里可以使用strom或者flink,公司本来就是使用strom解析,但是在吞吐量上有瓶颈,优化空间不大。所以试一试通过flink来做。
非flatmessage需要使用特定的反序列化方式来处理为Message对象,所以这里需要自定义一个类
1 /** 2 * 反序列化canal binlog 3 * 4 * @author @ 2019-02-20 5 * @version 1.0.0 6 */ 7 @PublicEvolving 8 public class MessageDeserializationSchema implements KeyedDeserializationSchema<Message> { 9 10 private static final long serialVersionUID = -678988040385271953L; 11 private MessageDeserializer mesDesc; 12 13 @Override 14 public Message deserialize(byte[] messageKey, byte[] message, String topic, int partition, long offset) throws IOException { 15 try { 16 if (mesDesc == null) { 17 mesDesc = new MessageDeserializer(); 18 } 19 Message result = mesDesc.deserialize(topic, message); 20 //result.setMetaData(topic, partition, offset); 21 return result; 22 } catch (Exception e) { 23 System.out.println(e); 24 } 25 return null; 26 } 27 28 @Override 29 public boolean isEndOfStream(Message nextElement) { 30 return false; 31 } 32 33 @Override 34 public TypeInformation<Message> getProducedType() { 35 return getForClass(Message.class); 36 } 37 }
然后就可以获取到DataStream[Message],但是在做算子操作的时候就报错了,意思是不支持kryo序列化
com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: props_ (com.alibaba.otter.canal.protocol.CanalEntry$Header) header_ (com.alibaba.otter.canal.protocol.CanalEntry$Entry) entries (com.alibaba.otter.canal.protocol.Message) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528) at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:730) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528) at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:730) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:657) at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:231) at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:577) at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554) at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534) at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718) at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696) at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104) at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111) at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398) at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.emitRecord(Kafka010Fetcher.java:89) at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:154) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:665) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:94) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:58) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.UnsupportedOperationException at java.util.Collections$UnmodifiableCollection.add(Collections.java:1055) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) ... 29 more
参考官方文档,需要注册类的序列化方式:https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/custom_serializers.html
//message 不支持kryo序列化 不然在map flatmap的时候报错
env.getConfig.addDefaultKryoSerializer(classOf[Message], classOf[StringSerializer])
如果在算子之间会有其他对象传输的话,也同样需要注册。最后通过测试,flink解析的量大概在单个solt 1W+/s 左右。