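Hadoop Serialization
Hadoop provides serialization through the Writable interface. The write method serializes an object's fields to a DataOutput stream, and the readFields method reads them back from a DataInput stream: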
public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
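To show the contract end to end, here is a minimal sketch under two assumptions. First, Writable is redeclared locally with the same two methods, so the sketch runs on a plain JDK without Hadoop jars. Second, PointWritable is a hypothetical record type. The sketch serializes an object to bytes and then fills an empty instance back in. The one rule that matters is that readFields must consume fields in exactly the order write produced them, because no field names or types are stored.

```java
import java.io.*;

class WritableDemo {
    // Serialize any Writable into a byte array (DataOutputStream writes straight through, no flush needed).
    static byte[] serialize(Writable w) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            w.write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fill an existing (typically empty) instance from bytes.
    static void deserialize(byte[] data, Writable w) {
        try {
            w.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new PointWritable(3, -7, "p"));
        PointWritable copy = new PointWritable();
        deserialize(data, copy);
        System.out.println(copy.x + "," + copy.y + "," + copy.label + " (" + data.length + " bytes)");
    }
}

// Local copy of the Writable contract, so this sketch does not depend on Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record type used only for illustration.
class PointWritable implements Writable {
    int x;
    int y;
    String label;

    PointWritable() {} // empty instance, filled later by readFields

    PointWritable(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);      // 4 bytes, big-endian
        out.writeInt(y);      // 4 bytes
        out.writeUTF(label);  // 2-byte length prefix + modified UTF-8 bytes
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Same order as write(): the byte stream carries no field names or types.
        x = in.readInt();
        y = in.readInt();
        label = in.readUTF();
    }
}
```

Running it prints `3,-7,p (11 bytes)`: 4 + 4 bytes for the ints plus 2 + 1 bytes for the string. This compactness is the reason Hadoop uses Writable rather than java.io.Serializable, which would also record class metadata.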
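Hadoop ships Writable implementations for the common Java types. Take BooleanWritable as an example: for a primitive type, serialization is just a direct write and read of the value: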
@Override
public void readFields(DataInput in) throws IOException {
    value = in.readBoolean();
}

@Override
public void write(DataOutput out) throws IOException {
    out.writeBoolean(value);
}
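Because these wrappers delegate straight to DataOutput, their on-the-wire size is just that of the primitive. The following standalone sketch prints the bytes DataOutputStream emits for a few primitives. The helper names (bytesOf, DataWriter) are hypothetical; the byte layouts come from the java.io.DataOutput specification: a boolean is one byte (1 or 0), and int and long are fixed-width big-endian.

```java
import java.io.*;
import java.util.Arrays;

class PrimitiveLayoutDemo {
    // Run a write callback against a fresh in-memory DataOutput and return the bytes it produced.
    static byte[] bytesOf(DataWriter writer) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            writer.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(bytesOf(out -> out.writeBoolean(true))));  // [1]
        System.out.println(Arrays.toString(bytesOf(out -> out.writeBoolean(false)))); // [0]
        System.out.println(Arrays.toString(bytesOf(out -> out.writeInt(1))));         // [0, 0, 0, 1]
        System.out.println(bytesOf(out -> out.writeLong(1L)).length);                 // 8

        // Reading back mirrors BooleanWritable.readFields: one readBoolean call.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(new byte[] {1}));
        System.out.println(in.readBoolean()); // true
    }
}

// Functional interface for a write step that may throw IOException
// (the java.util.function types cannot throw checked exceptions).
interface DataWriter {
    void write(DataOutput out) throws IOException;
}
```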
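BooleanWritable also implements the WritableComparable interface, which gives it the ability to be compared.
In addition, Hadoop defines the RawComparator interface, which compares records directly in their serialized byte form without first deserializing them into objects. This makes comparison cheaper, for example when MapReduce sorts keys. The arguments b1/b2 are the byte arrays, s1/s2 the start offsets of each record, and l1/l2 their lengths: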
public interface RawComparator<T> extends Comparator<T> {
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}
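As an illustration of what "comparing without deserializing" can look like, here is a standalone sketch for int keys. RawComparator is redeclared locally with the same shape, and IntRawComparator and RawCompareDemo are hypothetical names. The comparator decodes the big-endian int that DataOutput.writeInt produced straight from the byte array at the given offset, so no key object is created and readFields is never called. It also shows how s1/s2 let two records that live in the same buffer be compared.

```java
import java.io.*;
import java.util.Comparator;

class RawCompareDemo {
    // Serialize an int the way an IntWritable-style write() would: DataOutput.writeInt, 4 bytes big-endian.
    static byte[] intBytes(int v) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new DataOutputStream(buf).writeInt(v);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        IntRawComparator cmp = new IntRawComparator();
        // Two serialized records stored back to back in one buffer:
        // record 1 at offset 0, record 2 at offset 4, each 4 bytes long.
        byte[] buf = new byte[8];
        System.arraycopy(intBytes(-5), 0, buf, 0, 4);
        System.arraycopy(intBytes(3), 0, buf, 4, 4);
        System.out.println(cmp.compare(buf, 0, 4, buf, 4, 4) < 0); // true: -5 < 3
    }
}

// Same shape as Hadoop's RawComparator, redeclared so the sketch needs no Hadoop jar.
interface RawComparator<T> extends Comparator<T> {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

class IntRawComparator implements RawComparator<Integer> {
    // Decode a big-endian int starting at offset s, directly from the buffer.
    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xff) << 24) | ((b[s + 1] & 0xff) << 16)
                | ((b[s + 2] & 0xff) << 8) | (b[s + 3] & 0xff);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Works on the raw bytes: no object allocation, no readFields.
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    @Override
    public int compare(Integer a, Integer b) {
        return Integer.compare(a, b);
    }
}
```

The int is decoded as a signed value on purpose. Comparing the four bytes as unsigned would order -5 (FF FF FF FB) after 3 (00 00 00 03).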
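WritableComparator is Hadoop's basic implementation of RawComparator. Its default byte-level compare still deserializes both keys into objects and then compares those objects. The efficiency gain therefore only comes when a subclass overrides this method to work on the bytes directly: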
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    try {
        buffer.reset(b1, s1, l1);   // parse key1
        key1.readFields(buffer);

        buffer.reset(b2, s2, l2);   // parse key2
        key2.readFields(buffer);

        buffer.reset(null, 0, 0);   // clean up reference
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    return compare(key1, key2);     // compare them
}
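The following standalone sketch follows the same deserialize-then-compare steps on a plain JDK. KeyWritable (a local stand-in for WritableComparable), IntKey, and DeserializingComparator are hypothetical names. As in the code above, the two key instances are created once and reused on every call. One simplification: this sketch wraps each byte range in a new DataInputStream rather than resetting a shared buffer. Reusing keys avoids per-comparison key allocation, but it also means one comparator instance should not be shared across threads.

```java
import java.io.*;
import java.util.function.Supplier;

class DeserializingCompareDemo {
    public static void main(String[] args) {
        DeserializingComparator<IntKey> cmp = new DeserializingComparator<>(IntKey::new);
        byte[] a = IntKey.toBytes(10);
        byte[] b = IntKey.toBytes(-2);
        System.out.println(cmp.compare(a, 0, a.length, b, 0, b.length) > 0); // true: 10 > -2
    }
}

// Local stand-in for WritableComparable: Writable's two methods plus Comparable.
interface KeyWritable<T> extends Comparable<T> {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical int key.
class IntKey implements KeyWritable<IntKey> {
    int value;

    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
    public int compareTo(IntKey other) { return Integer.compare(value, other.value); }

    static byte[] toBytes(int v) {
        try {
            IntKey k = new IntKey();
            k.value = v;
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            k.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Same steps as the default compare(byte[]...) above: deserialize both keys, then compare objects.
class DeserializingComparator<K extends KeyWritable<K>> {
    // Created once and reused for every comparison.
    private final K key1;
    private final K key2;

    DeserializingComparator(Supplier<K> factory) {
        key1 = factory.get();
        key2 = factory.get();
    }

    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
            key1.readFields(new DataInputStream(new ByteArrayInputStream(b1, s1, l1)));
            key2.readFields(new DataInputStream(new ByteArrayInputStream(b2, s2, l2)));
        } catch (IOException e) {
            throw new RuntimeException(e); // same choice as the Hadoop code above
        }
        return key1.compareTo(key2);
    }
}
```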
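BooleanWritable defines a nested Comparator class that extends WritableComparator and overrides the byte-level compare. It then registers this comparator with WritableComparator in a static initializer block: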
public static class Comparator extends WritableComparator {
    public Comparator() {
        super(BooleanWritable.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return compareBytes(b1, s1, l1, b2, s2, l2);
    }
}

// registration
static {
    WritableComparator.define(BooleanWritable.class, new Comparator());
}
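Comparing a BooleanWritable by its raw bytes works because writeBoolean stores false as 0 and true as 1, so byte order equals value order. The sketch below is a standalone, hypothetical compareBytes that does an unsigned lexicographic comparison of two byte ranges. My understanding is that this matches the contract of Hadoop's WritableComparator.compareBytes, but treat it as an illustration, not Hadoop's code. It also shows why the same shortcut would be wrong for signed ints.

```java
class ByteCompareDemo {
    // Unsigned lexicographic comparison of b1[s1..s1+l1) and b2[s2..s2+l2).
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff; // treat each byte as 0..255
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2; // a strict prefix sorts first
    }

    public static void main(String[] args) {
        byte[] falseBytes = {0}; // what DataOutput.writeBoolean(false) produces
        byte[] trueBytes = {1};  // what DataOutput.writeBoolean(true) produces
        // Same result as Boolean.compare(false, true) < 0.
        System.out.println(compareBytes(falseBytes, 0, 1, trueBytes, 0, 1) < 0); // true

        // For signed ints the shortcut breaks: -1 is FF FF FF FF and sorts after 1.
        byte[] minusOne = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        byte[] one = {0, 0, 0, 1};
        System.out.println(compareBytes(minusOne, 0, 4, one, 0, 4) > 0); // true
    }
}
```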
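The comparator for a given class can then be looked up with WritableComparator.get. If no comparator is registered yet, get forces the class to initialize (which runs its static registration block) and looks again. If there is still no match, it falls back to a generic WritableComparator: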
public static WritableComparator get(
        Class<? extends WritableComparable> c, Configuration conf) {
    WritableComparator comparator = comparators.get(c);
    if (comparator == null) {
        forceInit(c);
        comparator = comparators.get(c);
        if (comparator == null) {
            comparator = new WritableComparator(c, conf, true);
        }
    }
    ReflectionUtils.setConf(comparator, conf);
    return comparator;
}
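The lookup only works because of the forceInit step. A class literal such as BooleanWritable.class does not run the class's static block, so nothing may be registered yet at the time of the first lookup. The standalone sketch below reproduces that registry pattern with hypothetical names (ComparatorRegistry, BoolKey). It forces initialization via Class.forName(name, true, loader), which triggers static initializers. As a simplification, it returns a shared placeholder DEFAULT comparator where Hadoop builds a generic WritableComparator, and it omits the Configuration handling.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RegistryDemo {
    public static void main(String[] args) {
        // BoolKey is not initialized yet; get() forces initialization, whose static block registers a comparator.
        System.out.println(ComparatorRegistry.get(BoolKey.class) == BoolKey.COMPARATOR); // true
        // Nothing registers for String, so the placeholder default comes back.
        System.out.println(ComparatorRegistry.get(String.class) == ComparatorRegistry.DEFAULT); // true
    }
}

class ComparatorRegistry {
    private static final Map<Class<?>, Comparator<?>> COMPARATORS = new ConcurrentHashMap<>();

    // Hypothetical placeholder for the generic fallback comparator.
    static final Comparator<Object> DEFAULT = Comparator.comparing(Object::toString);

    static void define(Class<?> c, Comparator<?> comparator) {
        COMPARATORS.put(c, comparator);
    }

    static Comparator<?> get(Class<?> c) {
        Comparator<?> comparator = COMPARATORS.get(c);
        if (comparator == null) {
            forceInit(c); // runs c's static initializer, which may call define()
            comparator = COMPARATORS.get(c);
            if (comparator == null) {
                comparator = DEFAULT;
            }
        }
        return comparator;
    }

    // Loading with initialize=true runs static blocks; a no-op if the class is already initialized.
    private static void forceInit(Class<?> c) {
        try {
            Class.forName(c.getName(), true, c.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Can't initialize class " + c, e);
        }
    }
}

// Hypothetical key class that registers its comparator in a static block, like BooleanWritable.
class BoolKey {
    static final Comparator<Boolean> COMPARATOR = Boolean::compare;

    static {
        ComparatorRegistry.define(BoolKey.class, COMPARATOR);
    }
}
```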
Hadoop also provides a general-purpose ObjectWritable class with the following fields:
private Class declaredClass;
private Object instance;
private Configuration conf;
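declaredClass: the declared class of the object
instance: the object instance
conf: the configuration used when the object is (de)serialized at runtime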
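The core idea behind a general-purpose wrapper like ObjectWritable is that the class name travels with the data, so the reader can instantiate the right type reflectively. The standalone sketch below shows just that idea under stated simplifications. It writes only the concrete class name followed by the object's fields, whereas Hadoop's ObjectWritable also deals with the declared class, primitives, arrays, and null. Writable is redeclared locally, and IntBox is a hypothetical payload type.

```java
import java.io.*;

class ObjectWritableDemo {
    // Simplified encoding: class name first, then the object's own fields.
    static byte[] writeObject(Writable instance) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(instance.getClass().getName()); // repeated for every single record
            instance.write(out);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the class name, instantiate that class reflectively, then let it read its own fields.
    static Writable readObject(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            Class<?> cls = Class.forName(in.readUTF());
            Writable w = (Writable) cls.getDeclaredConstructor().newInstance();
            w.readFields(in);
            return w;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = writeObject(new IntBox(7));
        IntBox back = (IntBox) readObject(data);
        // Payload is 4 bytes; the class name adds 2 + name length on top of that.
        System.out.println(back.value + " in " + data.length + " bytes");
    }
}

// Local copy of the Writable contract.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical payload type with a no-arg constructor for reflective creation.
class IntBox implements Writable {
    int value;

    IntBox() {}

    IntBox(int value) { this.value = value; }

    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}
```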
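ObjectWritable is used in Hadoop's remote procedure calls (RPC) and for serializing objects of different types into the same field. As a general-purpose implementation, however, it writes the class name as a string alongside every serialized value, so every key-value pair carries that extra data. This costs noticeable space and performance, which is why Hadoop also provides GenericWritable.
When only a small set of types is involved, GenericWritable can be used instead. All possible types are declared up front, and each value is written with a numeric type index that is used to look up its class, which reduces the amount of data sent over the network. GenericWritable has an obvious drawback, though: it is not suitable when the types are not known in advance or when there are very many of them.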
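To make the trade-off concrete, here is a standalone sketch loosely modeled on the GenericWritable idea. Each subclass lists its allowed types in a fixed table, and a single index byte replaces the class name. The names GenericWritableSketch, MyGeneric, IntBox, and TextBox are hypothetical, and Writable is redeclared locally. An IntBox payload costs 1 + 4 = 5 bytes here. Carrying the class name as a string would cost 2 extra bytes plus the length of the name. The fixed table is also where the drawback shows: a type missing from getTypes() cannot be written at all, and this sketch's one signed index byte caps the table at 128 entries.

```java
import java.io.*;

class GenericWritableDemo {
    static byte[] serialize(Writable w) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            w.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static MyGeneric deserialize(byte[] data) {
        try {
            MyGeneric g = new MyGeneric();
            g.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return g;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MyGeneric g = new MyGeneric();
        g.set(new IntBox(7));
        byte[] data = serialize(g);
        System.out.println(((IntBox) deserialize(data).get()).value + " in " + data.length + " bytes"); // 7 in 5 bytes
    }
}

interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// One index byte into a fixed type table replaces the class name.
abstract class GenericWritableSketch implements Writable {
    private Writable instance;

    // Writer and reader must agree on this table and its order.
    protected abstract Class<? extends Writable>[] getTypes();

    void set(Writable w) { instance = w; }
    Writable get() { return instance; }

    @Override
    public void write(DataOutput out) throws IOException {
        Class<? extends Writable>[] types = getTypes();
        for (int i = 0; i < types.length; i++) {
            if (types[i] == instance.getClass()) {
                out.writeByte(i);    // type index instead of the class name
                instance.write(out);
                return;
            }
        }
        throw new IOException("type not in table: " + instance.getClass().getName());
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        int index = in.readByte();
        try {
            instance = getTypes()[index].getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IOException(e);
        }
        instance.readFields(in);
    }
}

class MyGeneric extends GenericWritableSketch {
    @SuppressWarnings("unchecked")
    private static final Class<? extends Writable>[] TYPES = new Class[] {IntBox.class, TextBox.class};

    @Override
    protected Class<? extends Writable>[] getTypes() { return TYPES; }
}

class IntBox implements Writable {
    int value;
    IntBox() {}
    IntBox(int value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

class TextBox implements Writable {
    String text;
    TextBox() {}
    TextBox(String text) { this.text = text; }
    public void write(DataOutput out) throws IOException { out.writeUTF(text); }
    public void readFields(DataInput in) throws IOException { text = in.readUTF(); }
}
```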