Avro之二:入门demo
一、使用avro-maven插件为avsc文件生成对应的java类:
在项目的pom.xml中增加依赖及插件如下:
<dependency> <groupId>org.apache.avro</groupId> <artifactId>avro</artifactId> <version>1.8.1</version> </dependency> ... <build> <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-compiler-plugin</artifactId> <configuration> <source>1.6</source> <target>1.6</target> </configuration> </plugin> <plugin> <groupId>org.apache.avro</groupId> <artifactId>avro-maven-plugin</artifactId> <version>1.8.1</version> <executions> <execution> <phase>generate-sources</phase> <goals> <goal>schema</goal> </goals> <configuration> <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory> <outputDirectory>${project.basedir}/src/main/java/</outputDirectory> </configuration> </execution> </executions> </plugin> </plugins> </build>
执行mvn的install命令后,提示:
[INFO] Final Memory: 16M/217M [INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal org.apache.avro:avro-maven-plugin:1.8.1:schema (default) on project study: neither sourceDirectory: D:\fvp-workspace\study\src\main\avro or testSourceDirectory: D:\fvp-workspace\study\src\test\avro are directories -> [Help 1] [ERROR]
需要注意下,需要手动在${project.basedir}/src/main和${project.basedir}/src/test下建立avro文件夹。avro文件夹就是后面存放Avro的schema文件了(*.avsc)。
1.1、定义schema
使用JSON为Avro定义schema。schema由基本类型(null,boolean, int, long, float, double, bytes 和string)和复杂类型(record, enum, array, map, union, 和fixed)组成。例如,以下定义一个user的schema,在main目录下创建一个avro目录,然后在avro目录下新建文件 user.avsc :
{"namespace": "com.sf.study.avro", "type": "record", "name": "User", "fields": [ {"name": "name", "type": "string"}, {"name": "favorite_number", "type": ["int", "null"]}, {"name": "favorite_color", "type": ["string", "null"]} ] }
如IDE的截图所示:
1.2、用schema生成类文件
在这里,因为使用avro插件,所以,直接输入以下命令,maven插件会自动帮我们生成类文件:
mvn clean install
然后在刚才配置的目录下就会生成相应的类,如下:
如果不使用插件,也可以使用avro-tools来生成:
1.3、使用前面生成的类
在前面,类文件已经创建好了,接下来,可以使用刚才自动生成的类来创建用户了:
package com.sf.study.avro; public class CreateUserTest { public static void main(String[] args) { User user1 = new User(); user1.setName("zhangsan"); user1.setFavoriteNumber(256); // Leave favorite color null // Alternate constructor User user2 = new User("lisi", 7, "red"); // Construct via builder User user3 = User.newBuilder() .setName("wangwu") .setFavoriteColor("blue") .setFavoriteNumber(null) .build(); } }
1.4、序列化
把前面创建的用户序列化并存储到磁盘文件:
// Serialize user1, user2 and user3 to disk DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class); DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter); try { dataFileWriter.create(user1.getSchema(), new File("users.avro")); dataFileWriter.append(user1); dataFileWriter.append(user2); dataFileWriter.append(user3); dataFileWriter.close(); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); }
这里,我们是序列化user到文件users.avro
1.5、反序列化
接下来,我们对序列化后的数据进行反序列化:
public static void unserialize() { try { // Deserialize Users from disk DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class); DataFileReader<User> dataFileReader; dataFileReader = new DataFileReader<User>(new File("users.avro"), userDatumReader); User user = null; while (dataFileReader.hasNext()) { // Reuse user object by passing it to next(). This saves us from // allocating and garbage collecting many objects for files with // many items. user = dataFileReader.next(user); System.out.println(user); } } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } }
输出结果为:
{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
{"name": "Charlie", "favorite_number": null, "favorite_color": "blue"}