Serialization in Hadoop

Hadoop does not use Java's built-in serialization (java.io.Serializable); instead, it introduces its own, more compact system.

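Hadoop defines two serialization-related interfaces: Writable (from Hadoop) and Comparable (from Java). WritableComparable combines the two into a single interface by extending both.

Writable: any class that implements Writable can be serialized and deserialized.

Comparable: gives objects (typically MapReduce keys) a natural ordering so they can be sorted. For efficiency, Hadoop also provides RawComparator, implemented by WritableComparator, which compares objects directly in their serialized byte form so they do not have to be deserialized first.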
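The way WritableComparable merges the two interfaces can be sketched in plain Java. This is a minimal sketch, assuming stand-in copies of Writable and WritableComparable (redeclared so it compiles without Hadoop on the classpath); YearTempKey and SortDemo are hypothetical names used only for illustration. The key serializes its fields with write/readFields and defines its ordering with compareTo, which is what a sort step relies on.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-ins with the same shape as Hadoop's org.apache.hadoop.io interfaces.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

interface WritableComparable<T> extends Writable, Comparable<T> {}

// Hypothetical key: ordered by year first, then by temperature.
class YearTempKey implements WritableComparable<YearTempKey> {
    int year;
    int temp;

    // Hadoop creates Writables reflectively and then calls readFields,
    // so a no-arg constructor is needed.
    YearTempKey() {}

    YearTempKey(int year, int temp) {
        this.year = year;
        this.temp = temp;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeInt(temp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        year = in.readInt();
        temp = in.readInt();
    }

    @Override
    public int compareTo(YearTempKey o) {
        int c = Integer.compare(year, o.year);
        return c != 0 ? c : Integer.compare(temp, o.temp);
    }

    @Override
    public String toString() {
        return year + ":" + temp;
    }
}

class SortDemo {
    // Sorts a copy of the keys using compareTo and returns their string forms.
    static List<String> sorted(List<YearTempKey> keys) {
        List<YearTempKey> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        List<String> out = new ArrayList<>();
        for (YearTempKey k : copy) {
            out.add(k.toString());
        }
        return out;
    }
}
```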

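import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public interface Writable {
   void write(DataOutput out) throws IOException;      // serialize: write this object's fields to out

   void readFields(DataInput in) throws IOException;   // deserialize: read this object's fields from in
}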
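A round trip through write and readFields shows what the interface actually does. This is a minimal sketch, assuming a local copy of the Writable interface above and a hypothetical PointWritable record; the byte streams are the standard java.io DataOutputStream/DataInputStream.

```java
import java.io.*;

// Same shape as Hadoop's Writable, redeclared so this compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record used only for illustration.
class PointWritable implements Writable {
    int x;
    long y;

    PointWritable() {}

    PointWritable(int x, long y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);   // 4 bytes, big-endian
        out.writeLong(y);  // 8 bytes, big-endian
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read back in exactly the order they were written.
        x = in.readInt();
        y = in.readLong();
    }
}

class WritableRoundTrip {
    static byte[] serialize(Writable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Fills an existing (reusable) instance from the bytes, as Hadoop does.
    static <T extends Writable> T deserialize(byte[] data, T target) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            target.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }
}
```

The serialized PointWritable is exactly 12 bytes: only the field values, with no class name or type metadata, which is what makes Writables more compact than Java serialization.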

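// Fragment of DoubleWritable showing its raw Comparator

public class DoubleWritable implements WritableComparable<DoubleWritable> {
    // ... value field, write/readFields, compareTo, etc. omitted ...

    public static class Comparator extends WritableComparator {
        public Comparator() {
            super(DoubleWritable.class);
        }

        // s1 and s2 are the start offsets of each value in its byte array;
        // l1 and l2 are the number of bytes each value occupies from that offset
        @Override
        public int compare(byte[] b1, int s1, int l1,
                           byte[] b2, int s2, int l2) {
            double thisValue = readDouble(b1, s1);
            double thatValue = readDouble(b2, s2);
            return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
        }
    }

    static {                                        // register the Comparator used for this class
        WritableComparator.define(DoubleWritable.class, new Comparator());
    }
}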
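The raw-comparison idea can be tried without Hadoop by decoding doubles straight from byte arrays. This is a minimal sketch, assuming local re-implementations of readLong/readDouble (WritableComparator has static helpers with these names, but the versions here are written for illustration) and a hypothetical RawDoubleCompare class; serialize uses DataOutput.writeDouble, the same call DoubleWritable.write makes.

```java
import java.io.*;

class RawDoubleCompare {
    // Big-endian read of 8 bytes starting at offset s, matching DataOutput.writeLong.
    static long readLong(byte[] b, int s) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[s + i] & 0xFFL);
        }
        return v;
    }

    static double readDouble(byte[] b, int s) {
        return Double.longBitsToDouble(readLong(b, s));
    }

    // Compares two serialized doubles without creating DoubleWritable objects.
    // l1 and l2 are unused because a double always occupies 8 bytes.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        double thisValue = readDouble(b1, s1);
        double thatValue = readDouble(b2, s2);
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }

    // Produces the same 8 bytes that DoubleWritable.write emits.
    static byte[] serialize(double d) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeDouble(d);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Decoding is needed here because a plain unsigned byte-by-byte comparison would misorder doubles: -2.0 starts with byte 0xC0 (sign bit set) and 1.5 with 0x3F, so -2.0 would sort after 1.5.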

Serialization types in Hadoop:

Classes that implement WritableComparable:

Basic: BooleanWritable | ByteWritable

Numeric: IntWritable | VIntWritable | FloatWritable | LongWritable | VLongWritable | DoubleWritable

Advanced: NullWritable | Text | BytesWritable | MD5Hash

Classes that implement only Writable:

Generic wrappers: ObjectWritable | GenericWritable

Arrays: ArrayWritable | TwoDArrayWritable

Maps: AbstractMapWritable | MapWritable | SortedMapWritable

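Note: VIntWritable and VLongWritable share the same encoding (VIntWritable is written with WritableUtils.writeVInt, which delegates to writeVLong). A number is converted to a variable-length byte sequence: values from -112 to 127 take a single byte, and the smaller a number's magnitude, the fewer bytes it needs (at most 5 bytes for an int and 9 for a long).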
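The variable-length scheme can be sketched as follows. This is a local re-implementation that follows the approach of WritableUtils.writeVLong/readVLong, meant to illustrate the idea; VLongCodec is a hypothetical name, and byte-for-byte compatibility should be checked against the Hadoop source before relying on it. The first byte either holds a small value directly or encodes the sign and the number of payload bytes that follow.

```java
import java.io.*;

class VLongCodec {
    static void writeVLong(DataOutput out, long i) throws IOException {
        if (i >= -112 && i <= 127) {  // small values fit in the marker byte itself
            out.writeByte((byte) i);
            return;
        }
        int len = -112;
        if (i < 0) {
            i ^= -1L;  // one's complement, so only the magnitude bytes are stored
            len = -120;
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--;  // one step per payload byte
        }
        out.writeByte((byte) len);  // marker: sign and payload length
        int n = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = n; idx != 0; idx--) {  // payload, most significant byte first
            int shift = (idx - 1) * 8;
            out.writeByte((byte) ((i >> shift) & 0xFF));
        }
    }

    static long readVLong(DataInput in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;  // single-byte value
        }
        boolean negative = first < -120;
        int n = negative ? -(first + 120) : -(first + 112);
        long i = 0;
        for (int idx = 0; idx < n; idx++) {
            i = (i << 8) | (in.readByte() & 0xFF);
        }
        return negative ? (i ^ -1L) : i;
    }

    static byte[] encode(long v) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeVLong(out, v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static int encodedSize(long v) {
        return encode(v).length;
    }

    static long roundTrip(long v) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encode(v)))) {
            return readVLong(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this scheme 127 takes 1 byte, 128 takes 2, 1,000,000 takes 4, and Long.MAX_VALUE takes 9, compared with a fixed 4 bytes for IntWritable and 8 for LongWritable.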

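Text is used very frequently. It is serialized as the length of its UTF-8 bytes (written as a variable-length int) followed by the UTF-8 encoding of the string, so a single Text can hold at most about 2 GB.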
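The Text layout (variable-length byte count, then UTF-8 bytes) can be sketched without Hadoop. This is a minimal sketch, assuming a hypothetical TextLayout class whose length prefix follows the same variable-length scheme as VIntWritable, restricted to non-negative lengths; for contrast it also measures Java's DataOutput.writeUTF, which uses a fixed 2-byte length prefix and is limited to 65,535 bytes.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

class TextLayout {
    // Variable-length encoding of a non-negative length.
    static void writeVIntLength(DataOutput out, int n) throws IOException {
        if (n <= 127) {  // lengths 0..127 take a single byte
            out.writeByte(n);
            return;
        }
        int len = -112;
        for (int tmp = n; tmp != 0; tmp >>>= 8) {
            len--;  // one step per payload byte
        }
        out.writeByte(len);  // marker byte
        for (int idx = -(len + 112); idx != 0; idx--) {  // most significant byte first
            out.writeByte((n >>> ((idx - 1) * 8)) & 0xFF);
        }
    }

    static byte[] serialize(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeVIntLength(out, utf8.length);  // byte count, not character count
            out.write(utf8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Size of the same string written with Java's own DataOutput.writeUTF.
    static int writeUtfSize(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

For "hi" this layout produces 3 bytes (length byte 2, then 'h' and 'i'), while writeUTF produces 4; a 200-byte string needs a 2-byte length prefix.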

posted @ 2012-12-17 17:12 shileiw
