Avro Data Serialization and Deserialization

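Avro offers two ways to serialize and deserialize data: one generates Java classes from a schema file, and the other is a generic approach that requires no code generation.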

The following simple example demonstrates both:

1. Configure the pom file

    <dependencies>
        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro</artifactId>
            <version>1.9.1</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.avro</groupId>
                <artifactId>avro-maven-plugin</artifactId>
                <version>1.9.1</version>
                <executions>
                    <execution>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>schema</goal>
                        </goals>
                        <configuration>
                            <sourceDirectory>${project.basedir}/src/main/resources/</sourceDirectory>
                            <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

 

2. Define a schema file, person.avsc, which describes the structure of the data to be serialized

{
    "namespace":"com.zpark",
    "type":"record",
    "name":"Person",
    "fields":[
        {"name":"id","type":"string"},
        {"name":"name","type":"string"},
        {"name":"age","type":["int","null"]}
    ]
}

The schema file uses the data types provided by Avro; see the specification on the official site for details: http://avro.apache.org/docs/current/spec.html
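
To make the schema a little more concrete, here is a rough sketch of how one Person value is laid out according to the binary encoding rules in the specification linked above: fields are written in declaration order, a string is a length followed by UTF-8 bytes, int and long use zig-zag variable-length coding, and a union is the zero-based branch index followed by the value. This is a hand-written illustration using only the JDK, not the Avro library itself; the class AvroEncodingSketch and its method names are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: hand-rolls the Avro binary layout for the Person schema.
class AvroEncodingSketch {

    // Zig-zag maps signed values to unsigned ones (0, -1, 1, -2 become 0, 1, 2, 3);
    // the result is then written 7 bits at a time, low-order group first,
    // with the high bit set on every byte except the last.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // A string is its UTF-8 byte length (as a long) followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Fields in schema order: id, name, age. The age union is ["int","null"],
    // so branch 0 is int and branch 1 is null (which has no payload).
    static byte[] encodePerson(String id, String name, Integer age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, id);
        writeString(out, name);
        if (age != null) {
            writeLong(out, 0);
            writeLong(out, age);
        } else {
            writeLong(out, 1);
        }
        return out.toByteArray();
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(encodePerson("001", "zhangsan", 23)));
    }
}
```

For the values ("001", "zhangsan", 23) this prints 06303031107a68616e6773616e002e: 06 is the length 3, 10 is the length 8, 00 selects the int branch, and 2e is 23 after zig-zag coding. This covers only the record body; an .avro container file additionally stores a header with the schema and sync markers around blocks of records.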

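3. Use the Avro Maven plugin to generate the Person class from person.avsc. With the configuration above, the plugin runs in the generate-sources phase (for example via mvn generate-sources or mvn compile) and writes com/zpark/Person.java under src/main/java.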

4. Test serialization and deserialization with the generated code

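import java.io.File;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import org.junit.Test;

import com.zpark.Person;

public class SpecificPersonTest {   // class name chosen here for illustration

    @Test
    public void testSerializing() throws Exception {
        Person person = new Person("001", "zhangsan", 23);
        DatumWriter<Person> dw = new SpecificDatumWriter<>(Person.class);

        // try-with-resources flushes and closes the file even if append fails
        try (DataFileWriter<Person> dfw = new DataFileWriter<>(dw)) {
            // The schema goes into the file header, followed by the records
            dfw.create(person.getSchema(), new File("d://tmp/person.avro"));
            dfw.append(person);
        }
    }

    // Reads the file written by testSerializing, so run that test first
    @Test
    public void testDeSerializing() throws Exception {
        DatumReader<Person> dr = new SpecificDatumReader<>(Person.class);
        try (DataFileReader<Person> dfr =
                 new DataFileReader<>(new File("d://tmp/person.avro"), dr)) {
            while (dfr.hasNext()) {
                Person person = dfr.next();
                System.out.println(person);
            }
        }
    }
}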

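The example above uses generated code for serialization and deserialization. The following uses the generic approach instead, which is more flexible because the schema is loaded at run time and no generated class is required: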

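import java.io.File;
import java.io.InputStream;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.junit.Test;

public class GenericPersonTest {   // class name chosen here for illustration

    // Loads person.avsc from the classpath (src/main/resources)
    private static Schema loadSchema() throws Exception {
        try (InputStream in = Thread.currentThread().getContextClassLoader()
                .getResourceAsStream("person.avsc")) {
            return new Schema.Parser().parse(in);
        }
    }

    @Test
    public void testGenericSerializing() throws Exception {
        Schema schema = loadSchema();

        // Fields are set by name; no generated Person class is involved
        GenericRecord person = new GenericData.Record(schema);
        person.put("id", "001");
        person.put("name", "zhangsan");
        person.put("age", 44);

        DatumWriter<GenericRecord> dw = new GenericDatumWriter<>(schema);
        try (DataFileWriter<GenericRecord> df = new DataFileWriter<>(dw)) {
            df.create(schema, new File("d://tmp/person1.avro"));
            df.append(person);
        }
    }

    // Reads the file written by testGenericSerializing, so run that test first
    @Test
    public void testGenericDeSerializing() throws Exception {
        Schema schema = loadSchema();

        DatumReader<GenericRecord> dr = new GenericDatumReader<>(schema);
        try (DataFileReader<GenericRecord> dfr =
                 new DataFileReader<>(new File("d://tmp/person1.avro"), dr)) {
            while (dfr.hasNext()) {
                GenericRecord person = dfr.next();
                System.out.println(person);
            }
        }
    }
}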
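
The tests above write to d: paths, which only exist on Windows. As a rough sketch of the same generic approach without touching the file system, the following round-trips records through a byte array. It assumes the Avro 1.9.1 dependency from the pom is on the classpath; the class GenericRoundTripSketch and its methods person, write and read are hypothetical names introduced for this example, and the schema string is an inlined copy of person.avsc.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

// Hypothetical example class; requires the Avro library on the classpath.
class GenericRoundTripSketch {

    // Same content as person.avsc, inlined so no resource file is needed.
    static final String SCHEMA_JSON =
        "{\"namespace\":\"com.zpark\",\"type\":\"record\",\"name\":\"Person\","
      + "\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"},"
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":[\"int\",\"null\"]}]}";

    static GenericRecord person(Schema schema, String id, String name, Integer age) {
        GenericRecord r = new GenericData.Record(schema);
        r.put("id", id);
        r.put("name", name);
        r.put("age", age);   // null is allowed because the type is ["int","null"]
        return r;
    }

    // Writes the records in Avro container format into memory.
    static byte[] write(Schema schema, List<GenericRecord> records) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, out);
            for (GenericRecord r : records) {
                writer.append(r);
            }
        }   // closing the writer flushes the last block before the bytes are copied
        return out.toByteArray();
    }

    // Reads the records back; no schema is passed in because the
    // writer's schema is stored in the container header.
    static List<GenericRecord> read(byte[] bytes) throws IOException {
        List<GenericRecord> result = new ArrayList<>();
        try (DataFileStream<GenericRecord> in = new DataFileStream<>(
                 new ByteArrayInputStream(bytes), new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord r : in) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        byte[] bytes = write(schema, Arrays.asList(
            person(schema, "001", "zhangsan", 44),
            person(schema, "002", "lisi", null)));
        for (GenericRecord r : read(bytes)) {
            System.out.println(r);
        }
    }
}
```

String fields come back as org.apache.avro.util.Utf8 rather than java.lang.String, so call toString() before comparing them with String values.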

 

posted @ 2020-02-06 23:21  杭州胡欣