Writable Interface

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * Created by user on 16/3/17.
 */
public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

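The Writable interface defines two methods: one writes the object's state to a DataOutput binary stream, and the other reads its state back from a DataInput binary stream. (At first I did not understand why the word "state" is used here; as far as I can tell it simply means the object's current field values.)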
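To make "state" concrete, here is a minimal sketch of a type whose state is two int fields. SketchWritable is a local stand-in that mirrors the interface above so the example compiles without Hadoop, and PointSketch and roundTrip are hypothetical names made up for illustration; none of them are part of Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring the Writable interface (assumption: lets the sketch compile without Hadoop).
interface SketchWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical type: its "state" is just the two int fields x and y.
class PointSketch implements SketchWritable {
    int x;
    int y;

    PointSketch() {}

    PointSketch(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fields are written in a fixed order...
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // ...and must be read back in exactly the same order.
        x = in.readInt();
        y = in.readInt();
    }

    // Writes the state of source to a byte array, then loads it into a fresh object.
    static PointSketch roundTrip(PointSketch source) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                source.write(out);
            }
            PointSketch copy = new PointSketch();
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling PointSketch.roundTrip(new PointSketch(3, 5)) should give back a new object with x = 3 and y = 5. Note that readFields() overwrites the fields of an existing object instead of returning a new one.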

  
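import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import org.apache.hadoop.io.IntWritable;

IntWritable writable = new IntWritable(163);

// Serializes any Writable into a byte array by letting it write itself to an in-memory stream.
public static byte[] serialize(Writable writable) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DataOutputStream dataOut = new DataOutputStream(out);
    writable.write(dataOut);
    dataOut.close();
    return out.toByteArray();
}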

This helper is mainly for testing how an IntWritable is serialized.
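To see what the serialized bytes look like without a Hadoop dependency, here is a small sketch. It assumes that IntWritable.write() just calls out.writeInt(value), so writing the int directly to a DataOutputStream should give the same four bytes. IntBytesSketch and hexOfInt are hypothetical names for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class IntBytesSketch {
    // Writes value as a 4-byte big-endian int and returns the bytes as lowercase hex.
    static String hexOfInt(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value); // assumption: same call IntWritable.write() makes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : buffer.toByteArray()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

IntBytesSketch.hexOfInt(163) returns "000000a3": four bytes, most significant byte first.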

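import java.io.ByteArrayInputStream;
import java.io.DataInputStream;

// Fills the given Writable from a byte array; the input bytes are returned unchanged.
public static byte[] deserialize(Writable writable, byte[] bytes) throws IOException {
    ByteArrayInputStream in = new ByteArrayInputStream(bytes);
    DataInputStream dataIn = new DataInputStream(in);
    writable.readFields(dataIn);
    dataIn.close();
    return bytes;
}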

This helper deserializes an IntWritable: it reads the bytes back into the Writable that is passed in.

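public interface WritableComparable<T> extends Writable, Comparable<T> {}

// Requires java.util.Comparator.
// Compares two serialized records in place: (buffer, start offset, length) for each side.
public interface RawComparator<T> extends Comparator<T> {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}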

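IntWritable implements the WritableComparable interface, which is a subinterface of both Writable and Comparable. Comparing types matters a lot in MapReduce, because during the sort phase keys have to be compared with one another. Hadoop provides RawComparator (which extends Comparator). An implementation of this interface can compare records read from a stream without deserializing them into objects, which avoids the overhead of object creation.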
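A rough sketch of the idea for 4-byte int keys, using only the JDK. RawIntCompareSketch is a hypothetical class, not Hadoop's own comparator; it only assumes that the keys were written with writeInt (big-endian) and reuses the parameter shape of RawComparator.compare.

```java
class RawIntCompareSketch {
    // Decodes a big-endian 4-byte int starting at offset start, without creating any object.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
             | ((bytes[start + 1] & 0xff) << 16)
             | ((bytes[start + 2] & 0xff) << 8)
             | (bytes[start + 3] & 0xff);
    }

    // Same parameter shape as RawComparator.compare: (buffer, start, length) per record.
    // The lengths are unused here because an int key always occupies 4 bytes.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }
}
```

The two keys are compared straight from the byte buffers, so no IntWritable (or any other object) is allocated per comparison. Decoding to int first, instead of comparing byte by byte, keeps negative values in the right order.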

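The Text class stores the number of bytes in the string's UTF-8 encoding as a variable-length int, so a Text value can hold at most 2 GB. (Why? As far as I can tell, because the length has to fit in a Java int, whose maximum is 2^31 - 1, just under 2 GB worth of bytes.) Text uses standard UTF-8, so it differs from the Java String class in a few ways: Text's charAt() returns an int holding a Unicode code point (String's returns a char) and is indexed by byte offset into the encoded bytes, and find() is similar to String's indexOf() but returns a byte offset.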
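The difference between char indexes and UTF-8 byte offsets can be shown with the JDK alone. The sketch below does not use Text itself; Utf8IndexSketch, utf8Length and utf8Find are hypothetical helpers that only imitate the byte-offset view described above.

```java
import java.nio.charset.StandardCharsets;

class Utf8IndexSketch {
    // Four code points that need 1, 2, 3 and 4 bytes in UTF-8.
    // The last one is stored as a surrogate pair (two chars) in a Java String.
    static final String SAMPLE = "\u0041\u00DF\u6771\uD801\uDC00";

    // Number of bytes in the UTF-8 encoding of s.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Byte offset of the first occurrence of target in the UTF-8 encoding of s, or -1.
    static int utf8Find(String s, String target) {
        int charIndex = s.indexOf(target);
        if (charIndex < 0) {
            return -1;
        }
        // Bytes taken by everything before the match give the byte offset.
        return utf8Length(s.substring(0, charIndex));
    }
}
```

For SAMPLE, String.length() is 5 chars while utf8Length is 10 bytes, and the third character sits at char index 2 but at byte offset 3. SAMPLE.charAt(3) is only the first half of the surrogate pair, whereas SAMPLE.codePointAt(3) gives the full code point 0x10400.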

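BytesWritable wraps an array of binary data. Its serialized format is a 4-byte int field giving the number of bytes, followed by the bytes themselves. For example: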

BytesWritable b = new BytesWritable(new byte[]{3,5});
byte[] bytes = serialize(b);
assertThat(StringUtils.byteToHexString(bytes),is("000000020305"));
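The same layout can be reproduced by hand with the JDK, which makes it easier to see where each hex digit comes from. This is only a sketch of the format as described above, not BytesWritable's actual code; LengthPrefixedSketch and hexOfLengthPrefixed are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class LengthPrefixedSketch {
    // Writes a 4-byte big-endian length followed by the payload, and returns it all as hex.
    static String hexOfLengthPrefixed(byte[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(payload.length);          // 4-byte length field
            out.write(payload, 0, payload.length); // the raw bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : buffer.toByteArray()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

For new byte[]{3, 5} this gives "000000020305": 00000002 is the length (2), then 03 and 05 are the two payload bytes.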

posted @ 2016-03-17 11:45 dalu610
