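Avro Data Serialization and Deserialization
Avro offers two ways to serialize and deserialize data: one generates Java classes from a schema file, and the other is a generic approach that needs no generated code.
The simple example below demonstrates both:
1. Configure the pom file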
<dependencies>
    <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
        <version>1.9.1</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-maven-plugin</artifactId>
            <version>1.9.1</version>
            <executions>
                <execution>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>schema</goal>
                    </goals>
                    <configuration>
                        <sourceDirectory>${project.basedir}/src/main/resources/</sourceDirectory>
                        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>
2. Define a schema file, person.avsc, that describes the structure of the data to be serialized
{
    "namespace": "com.zpark",
    "type": "record",
    "name": "Person",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "name", "type": "string"},
        {"name": "age", "type": ["int", "null"]}
    ]
}
Schema files are written with the data types that Avro defines. For the full list, see the specification on the official site: http://avro.apache.org/docs/current/spec.html
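To make the types in person.avsc a little more concrete, here is a rough sketch of how the binary encoding described in the Avro specification lays out a single Person datum. It uses plain Java and no Avro library. The class name AvroBinarySketch, its helper methods, and the sample values are made up for illustration. It covers only the raw datum encoding (zig-zag variable-length integers, length-prefixed UTF-8 strings, a branch index in front of a union value, fields written in schema order), not the container file format with its header and sync markers. In real code you would let Avro's own writers do this.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: hand-rolled encoding of one Person datum, following the
// binary encoding rules in the Avro specification. Not part of the Avro API.
public class AvroBinarySketch {

    // int and long values are zig-zag coded, then written 7 bits per byte,
    // with the high bit set on every byte except the last.
    public static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);            // zig-zag: small magnitudes get small codes
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits plus continuation bit
            z >>>= 7;
        }
        out.write((int) z);
    }

    // A string is its UTF-8 byte length (as a long) followed by the bytes.
    public static void writeString(String s, ByteArrayOutputStream out) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(utf8.length, out);
        out.write(utf8, 0, utf8.length);
    }

    // Fields are written in schema order (id, name, age) with no field names.
    // age is the union ["int","null"]: branch index first (0 = int, 1 = null).
    public static byte[] encodePerson(String id, String name, Integer age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(id, out);
        writeString(name, out);
        if (age != null) {
            writeLong(0, out);
            writeLong(age.intValue(), out);
        } else {
            writeLong(1, out);                    // the null value itself takes zero bytes
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodePerson("001", "zhangsan", 23);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b));
        }
        // prints: 15 bytes: 06 30 30 31 10 7a 68 61 6e 67 73 61 6e 00 2e
        System.out.println(bytes.length + " bytes: " + hex.toString().trim());
    }
}
```

Because the bytes carry neither field names nor type tags, a reader can only make sense of them with the schema at hand, which is why both approaches in this article work from person.avsc.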
3. Use the Avro Maven plugin to generate the Person class from person.avsc
4. Test serialization and deserialization with the generated code
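import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import org.junit.Test;
import com.zpark.Person;

@Test
public void testSerializing() throws Exception {
    Person person = new Person("001", "zhangsan", 23);
    DatumWriter<Person> dw = new SpecificDatumWriter<>(Person.class);
    // try-with-resources closes the writer even if append() throws.
    // The d:/tmp directory must already exist.
    try (DataFileWriter<Person> dfw = new DataFileWriter<>(dw)) {
        dfw.create(person.getSchema(), new File("d:/tmp/person.avro"));
        dfw.append(person);
    }
}

@Test
public void testDeSerializing() throws Exception {
    DatumReader<Person> dr = new SpecificDatumReader<>(Person.class);
    // Read back every record stored in the file and print it
    try (DataFileReader<Person> dfr = new DataFileReader<>(new File("d:/tmp/person.avro"), dr)) {
        while (dfr.hasNext()) {
            Person person = dfr.next();
            System.out.println(person);
        }
    }
}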
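The tests above serialize and deserialize through generated code. Next we do the same with the generic approach, which is more flexible because it needs no generated classes: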
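import java.io.File;
import java.io.InputStream;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.junit.Test;

@Test
public void testGenericSerializing() throws Exception {
    // Load the schema from the classpath instead of relying on a generated class
    InputStream in = Thread.currentThread().getContextClassLoader().getResourceAsStream("person.avsc");
    Schema schema = new Schema.Parser().parse(in);
    GenericRecord person = new GenericData.Record(schema);
    person.put("id", "001");
    person.put("name", "zhangsan");
    person.put("age", 44);
    DatumWriter<GenericRecord> dw = new GenericDatumWriter<>(schema);
    // The d:/tmp directory must already exist
    try (DataFileWriter<GenericRecord> df = new DataFileWriter<>(dw)) {
        df.create(schema, new File("d:/tmp/person1.avro"));
        df.append(person);
    }
}

@Test
public void testGenericDeSerializing() throws Exception {
    InputStream in = Thread.currentThread().getContextClassLoader().getResourceAsStream("person.avsc");
    Schema schema = new Schema.Parser().parse(in);
    DatumReader<GenericRecord> dr = new GenericDatumReader<>(schema);
    // Read back every record stored in the file and print it
    try (DataFileReader<GenericRecord> dfr = new DataFileReader<>(new File("d:/tmp/person1.avro"), dr)) {
        while (dfr.hasNext()) {
            GenericRecord person = dfr.next();
            System.out.println(person);
        }
    }
}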
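A quick sanity check after running the serialization tests is to confirm that the output really is an Avro object container file. According to the Avro specification, such a file begins with the four bytes 'O', 'b', 'j', 1. The sketch below checks only that prefix using plain Java. The class name AvroMagicCheck and the method looksLikeAvroFile are hypothetical helpers, not part of the Avro API, and a matching prefix does not prove the rest of the file is valid.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Illustrative helper: checks only the 4-byte magic at the start of a file.
public class AvroMagicCheck {

    // Magic bytes of an Avro object container file, per the Avro specification
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    public static boolean looksLikeAvroFile(Path file) throws IOException {
        byte[] head = new byte[MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    return false;      // file is shorter than the magic
                }
                read += n;
            }
        }
        return Arrays.equals(head, MAGIC);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java AvroMagicCheck <path-to-file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + " starts with the Avro magic: " + looksLikeAvroFile(file));
    }
}
```

Pass it the path of one of the files written above, for example the person.avro produced by the first test.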