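Serialization in Hadoop

Hadoop does not use Java's built-in serialization (java.io.Serializable). Instead, it introduces its own, more compact serialization system.

Hadoop defines two interfaces related to serialization: Writable (from Hadoop) and Comparable (from Java). WritableComparable combines the two into a single interface by extending both.

Writable: every class that implements Writable can be serialized and deserialized.

Comparable: lets objects of the same type be ordered, which MapReduce needs when it sorts keys. To make these comparisons fast, Hadoop also provides RawComparator, with WritableComparator as a general-purpose implementation. A RawComparator compares serialized objects directly as byte streams, without first deserializing them into objects.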

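package org.apache.hadoop.io;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public interface Writable {
   void write(DataOutput out) throws IOException;     // serialize: write this object's fields to out

   void readFields(DataInput in) throws IOException;  // deserialize: read this object's fields back from in
}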
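The following is a minimal sketch of a custom key type that shows the two methods in action. It is meant to compile without the Hadoop jars, so it declares local stand-ins named Writable and WritableComparable with the same methods as the Hadoop interfaces (this is an assumption made only for the sketch). PointWritable, serialize and deserialize are hypothetical names chosen for illustration. Unlike Java serialization, only the field values are written; no class name or other metadata goes into the byte stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableSketch {
    // Local stand-in mirroring org.apache.hadoop.io.Writable (declared here so no Hadoop jar is needed).
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Local stand-in mirroring org.apache.hadoop.io.WritableComparable.
    interface WritableComparable<T> extends Writable, Comparable<T> {}

    // Hypothetical key type: a 2D point ordered by x, then by y.
    static class PointWritable implements WritableComparable<PointWritable> {
        int x;
        int y;

        PointWritable() {}  // no-arg constructor so an empty instance can be filled by readFields
        PointWritable(int x, int y) { this.x = x; this.y = y; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(x);  // fields are written in a fixed order...
            out.writeInt(y);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            x = in.readInt();  // ...and must be read back in the same order
            y = in.readInt();
        }

        @Override
        public int compareTo(PointWritable o) {
            int c = Integer.compare(x, o.x);
            return c != 0 ? c : Integer.compare(y, o.y);
        }
    }

    // Serializes any Writable into a byte array.
    static byte[] serialize(Writable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            w.write(new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Fills an existing Writable from a byte array.
    static void deserialize(Writable w, byte[] data) {
        try {
            w.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new PointWritable(3, 7));
        System.out.println(data.length);                              // 8: two 4-byte ints, no metadata
        PointWritable copy = new PointWritable();
        deserialize(copy, data);
        System.out.println(copy.x + "," + copy.y);                    // 3,7
        System.out.println(copy.compareTo(new PointWritable(3, 9)));  // -1: same x, smaller y
    }
}
```

The order of writes in write() must match the order of reads in readFields(). Hadoop also instantiates such classes reflectively before calling readFields(), which is why a no-argument constructor is kept.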

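// Fragment of DoubleWritable: its raw Comparator

public class DoubleWritable implements WritableComparable<DoubleWritable> {
  // ... value field, write(), readFields(), compareTo() and other members omitted

  // Comparator optimized for DoubleWritable: compares the serialized bytes directly
  public static class Comparator extends WritableComparator {
    public Comparator() {
      super(DoubleWritable.class);
    }

    // s1 and s2 are the start offsets of the two serialized values in b1 and b2;
    // l1 and l2 are the lengths of the serialized data starting at those offsets.
    @Override
    public int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
      double thisValue = readDouble(b1, s1);  // decode 8 bytes in place; no DoubleWritable objects are created
      double thatValue = readDouble(b2, s2);
      return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
  }

  static {  // register this Comparator as the default one for DoubleWritable
    WritableComparator.define(DoubleWritable.class, new Comparator());
  }
}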
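You can try the byte-level comparison without Hadoop. The sketch below reimplements readDouble locally. It assumes the layout produced by DataOutput.writeDouble, which stores the IEEE 754 bit pattern of a double as 8 big-endian bytes. It then reuses the compare logic above on two values written back to back into one buffer, so s1 and s2 are simply offsets 0 and 8. RawDoubleCompare and its helpers are hypothetical names for illustration, not Hadoop's own classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RawDoubleCompare {
    // Big-endian int at bytes[start..start+3], as written by DataOutput.writeInt.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
             | ((bytes[start + 1] & 0xff) << 16)
             | ((bytes[start + 2] & 0xff) << 8)
             | (bytes[start + 3] & 0xff);
    }

    // Big-endian long assembled from two consecutive ints.
    static long readLong(byte[] bytes, int start) {
        return ((long) readInt(bytes, start) << 32) | (readInt(bytes, start + 4) & 0xFFFFFFFFL);
    }

    // writeDouble stores the double's bits as a long, so decode them the same way.
    static double readDouble(byte[] bytes, int start) {
        return Double.longBitsToDouble(readLong(bytes, start));
    }

    // Same logic as DoubleWritable.Comparator.compare; l1 and l2 go unused
    // because a serialized double always occupies exactly 8 bytes.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        double thisValue = readDouble(b1, s1);
        double thatValue = readDouble(b2, s2);
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }

    // Writes several doubles back to back into a single buffer.
    static byte[] serialize(double... values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            DataOutputStream out = new DataOutputStream(bytes);
            for (double v : values) {
                out.writeDouble(v);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] buf = serialize(2.5, -1.0);
        // Compare the value at offset 0 with the value at offset 8, straight from the bytes.
        System.out.println(compare(buf, 0, 8, buf, 8, 8));  // 1, since 2.5 > -1.0
    }
}
```

Because the comparison works on offsets into existing byte arrays, no key objects need to be allocated or deserialized while sorting.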

 

Serialization types in Hadoop:

Classes that implement WritableComparable:

  • Basic: BooleanWritable | ByteWritable

  • Numeric: IntWritable | VIntWritable | FloatWritable | LongWritable | VLongWritable | DoubleWritable

  • Advanced: NullWritable | Text | BytesWritable | MD5Hash

Classes that implement only Writable:

  • Arrays: ArrayWritable | TwoDArrayWritable

  • Maps: AbstractMapWritable | MapWritable | SortedMapWritable

  • Generic wrappers: ObjectWritable | GenericWritable

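  • Note: VIntWritable and VLongWritable share the same implementation. A number is converted into a variable-length byte sequence, so numbers closer to zero produce shorter encodings (any value from -112 to 127 takes a single byte).

    Text is used very frequently. It is serialized as the byte length of the string's UTF-8 encoding (written as a variable-length int), followed by the UTF-8 bytes themselves. The maximum size is 2 GB.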
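The variable-length scheme and the Text layout can be sketched in plain Java. The encoder below is modeled on the approach used by Hadoop's WritableUtils.writeVLong. A value in [-112, 127] is stored in one byte. Any other value gets a marker byte that carries its sign and payload length, followed by the payload in big-endian order. Treat it as an illustration, not a byte-for-byte reference. writeText shows the length-prefix-plus-UTF-8 layout described above. VarLenSketch and all helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class VarLenSketch {
    // [-112, 127] fits in one byte; otherwise a marker byte (sign + payload length) precedes the payload.
    static void writeVLong(DataOutput out, long i) throws IOException {
        if (i >= -112 && i <= 127) {
            out.writeByte((byte) i);
            return;
        }
        int len = -112;                   // marker base for positive values
        if (i < 0) {
            i ^= -1L;                     // one's complement: i is now non-negative
            len = -120;                   // marker base for negative values
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--;                        // one step per payload byte
        }
        out.writeByte((byte) len);
        int payload = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = payload; idx != 0; idx--) {
            out.writeByte((byte) (i >> ((idx - 1) * 8)));  // most significant byte first
        }
    }

    static long readVLong(DataInput in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;                 // single-byte value
        }
        boolean negative = first < -120;
        int payload = negative ? -(first + 120) : -(first + 112);
        long i = 0;
        for (int idx = 0; idx < payload; idx++) {
            i = (i << 8) | (in.readByte() & 0xFF);
        }
        return negative ? (i ^ -1L) : i;
    }

    // Text-style layout: variable-length byte count, then the UTF-8 bytes.
    static void writeText(DataOutput out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVLong(out, utf8.length);
        out.write(utf8);
    }

    static String readText(DataInput in) throws IOException {
        byte[] utf8 = new byte[(int) readVLong(in)];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Small helpers that hide the checked IOException of in-memory streams.
    interface Encoder { void write(DataOutput out) throws IOException; }
    interface Decoder<T> { T read(DataInput in) throws IOException; }

    static byte[] encode(Encoder e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try { e.write(new DataOutputStream(bytes)); } catch (IOException x) { throw new UncheckedIOException(x); }
        return bytes.toByteArray();
    }

    static <T> T decode(byte[] data, Decoder<T> d) {
        try { return d.read(new DataInputStream(new ByteArrayInputStream(data))); }
        catch (IOException x) { throw new UncheckedIOException(x); }
    }

    static byte[] encodeVLong(long v) { return encode(out -> writeVLong(out, v)); }
    static long decodeVLong(byte[] b) { return decode(b, VarLenSketch::readVLong); }
    static byte[] encodeText(String s) { return encode(out -> writeText(out, s)); }
    static String decodeText(byte[] b) { return decode(b, VarLenSketch::readText); }

    public static void main(String[] args) {
        System.out.println(encodeVLong(100).length);             // 1
        System.out.println(encodeVLong(300).length);             // 3: marker byte + 2 payload bytes
        System.out.println(encodeVLong(Long.MAX_VALUE).length);  // 9
        System.out.println(encodeText("Hadoop").length);         // 7: 1 length byte + 6 UTF-8 bytes
        System.out.println(decodeVLong(encodeVLong(-113)));      // -113
    }
}
```

With this scheme, 300 needs three bytes, while a fixed-width LongWritable always uses eight. That difference is why the variable-length types pay off when most values are small.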

posted @ 2012-12-17 17:12  shileiw