Avro学习笔记——avro-tools工具

1.下载avro-tools.jar

https://archive.apache.org/dist/avro/avro-1.10.1/java/

avro-tools.jar常用命令:Working with Apache Avro files in Amazon S3  

也可以查看help

java -jar ./avro-tools-1.10.1.jar help
Version 1.10.1 of Apache Avro
Copyright 2010-2015 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (https://www.apache.org/).
----------------
Available tools:
    canonical  Converts an Avro Schema to its canonical form
          cat  Extracts samples from files
      compile  Generates Java code for the given schema.
       concat  Concatenates avro files without re-compressing.
        count  Counts the records in avro files or folders
  fingerprint  Returns the fingerprint for the schemas.
   fragtojson  Renders a binary-encoded Avro datum as JSON.
     fromjson  Reads JSON records and writes an Avro data file.
     fromtext  Imports a text file into an avro data file.
      getmeta  Prints out the metadata of an Avro data file.
    getschema  Prints out schema of an Avro data file.
          idl  Generates a JSON schema from an Avro IDL file
 idl2schemata  Extract JSON schemata of the types from an Avro IDL file
       induce  Induce schema/protocol from Java class/interface via reflection.
   jsontofrag  Renders a JSON-encoded Avro datum as binary.
       random  Creates a file with randomly generated instances of a schema.
      recodec  Alters the codec of a data file.
       repair  Recovers data from a corrupt Avro Data file
  rpcprotocol  Output the protocol of a RPC service
   rpcreceive  Opens an RPC Server and listens for one message.
      rpcsend  Sends a single RPC message.
       tether  Run a tethered mapreduce job.
       tojson  Dumps an Avro data file as JSON, record per line or pretty.
       totext  Converts an Avro data file to a text file.
     totrevni  Converts an Avro data file to a Trevni file.
  trevni_meta  Dumps a Trevni file's metadata as JSON.
trevni_random  Create a Trevni file filled with random instances of a schema.
trevni_tojson  Dumps a Trevni file as JSON.

2.查看avro文件的schema

java -jar ./avro-tools-1.10.1.jar getschema ./xxxx.avro

3.查看avro文件内容的json格式

java -jar ./avro-tools-1.10.1.jar tojson ./nova_ads_access_log-0-0008589084.avro | less

4.使用avro-tools编译java代码

编译avro IDL文件,参考

https://avro.apache.org/docs/current/gettingstartedjava.html
https://yanbin.blog/convert-apache-avro-to-parquet-format-in-java/

定义schema文件kst.avsc

{
  "namespace": "com.linkedin.haivvreo",
  "name": "test_serializer",
  "type": "record",
  "fields": [
    { "name":"string1", "type":"string" },
    { "name":"int1", "type":"int" },
    { "name":"tinyint1", "type":"int" },
    { "name":"smallint1", "type":"int" },
    { "name":"bigint1", "type":"long" },
    { "name":"boolean1", "type":"boolean" },
    { "name":"float1", "type":"float" },
    { "name":"double1", "type":"double" },
    { "name":"list1", "type":{"type":"array", "items":"string"} },
    { "name":"map1", "type":{"type":"map", "values":"int"} },
    { "name":"struct1", "type":{"type":"record", "name":"struct1_name", "fields": [
          { "name":"sInt", "type":"int" }, { "name":"sBoolean", "type":"boolean" }, { "name":"sString", "type":"string" } ] } },
    { "name":"union1", "type":["float", "boolean", "string"] },
    { "name":"enum1", "type":{"type":"enum", "name":"enum1_values", "symbols":["BLUE","RED", "GREEN"]} },
    { "name":"nullableint", "type":["int", "null"] },
    { "name":"bytes1", "type":"bytes" },
    { "name":"fixed1", "type":{"type":"fixed", "name":"threebytes", "size":3} }
  ] }

编译avro IDL文件

java -jar ./src/main/resources/avro-tools-1.10.1.jar compile schema ./src/main/avro/kst.avsc ./src/main/java

 

这时编译出来的java代码中,IDL的string类型实际上是CharSequence

如果想编译成string,则可以添加-string参数

java -jar ./src/main/resources/avro-tools-1.10.1.jar compile -string schema ./src/main/avro/kst.avsc ./src/main/java

 

posted @ 2016-03-20 17:12  tonglin0325  阅读(1204)  评论(0编辑  收藏  举报