Hadoop Serialization

Hadoop provides the org.apache.hadoop.io.serializer.Serialization interface as its serialization abstraction:

public interface Serialization<T> {

  // Checks whether this serialization supports the given class
  boolean accept(Class<?> c);

  // Returns a serializer for the class
  Serializer<T> getSerializer(Class<T> c);

  // Returns a deserializer for the class
  Deserializer<T> getDeserializer(Class<T> c);
}

Hadoop ships with several implementations out of the box: JavaSerialization, WritableSerialization, and the Avro-based serializations (AvroSpecificSerialization and AvroReflectSerialization). Their accept methods all decide support from the class itself, but the checks differ: JavaSerialization accepts classes implementing java.io.Serializable, WritableSerialization accepts Writable subclasses, and the Avro implementations check for Avro-generated or annotated types. JavaSerialization's accept, for example, is:

  public boolean accept(Class<?> c) {
    return Serializable.class.isAssignableFrom(c);
  }
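Because the check is isAssignableFrom, any class that implements Serializable, directly or through a superclass, is accepted. A self-contained sketch of the same check (the AcceptCheck class and accepts helper are illustrative names, not Hadoop code):

```java
import java.io.Serializable;

public class AcceptCheck {
    // Mirrors the body of JavaSerialization.accept shown above
    static boolean accepts(Class<?> c) {
        return Serializable.class.isAssignableFrom(c);
    }

    public static void main(String[] args) {
        System.out.println(accepts(String.class)); // true: String implements Serializable
        System.out.println(accepts(Thread.class)); // false: Thread does not
    }
}
```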

Serializer and Deserializer each declare open (to prepare resources), close (to release them), and the serialization/deserialization method itself:

public interface Serializer<T> {
  void open(OutputStream out) throws IOException;
  void serialize(T t) throws IOException;
  void close() throws IOException;
}
public interface Deserializer<T> {
  void open(InputStream in) throws IOException;
  T deserialize(T t) throws IOException;
  void close() throws IOException;
}
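To see the open/serialize/close and open/deserialize/close lifecycle in action, here is a self-contained sketch that re-declares the two interfaces locally and implements them with plain Java object serialization, roughly what Hadoop's JavaSerialization does internally (class names such as JavaObjectSerializer and RoundTrip are illustrative, not Hadoop's):

```java
import java.io.*;

// Local copies of the contracts shown above, so this sketch compiles standalone
interface Serializer<T> {
    void open(OutputStream out) throws IOException;
    void serialize(T t) throws IOException;
    void close() throws IOException;
}

interface Deserializer<T> {
    void open(InputStream in) throws IOException;
    T deserialize(T t) throws IOException;
    void close() throws IOException;
}

class JavaObjectSerializer<T extends Serializable> implements Serializer<T> {
    private ObjectOutputStream oos;
    public void open(OutputStream out) throws IOException {
        oos = new ObjectOutputStream(out); // prepare the resource
    }
    public void serialize(T t) throws IOException {
        oos.writeObject(t);
        oos.reset(); // avoid back-references between successive records
    }
    public void close() throws IOException { oos.close(); }
}

class JavaObjectDeserializer<T extends Serializable> implements Deserializer<T> {
    private ObjectInputStream ois;
    public void open(InputStream in) throws IOException {
        ois = new ObjectInputStream(in);
    }
    @SuppressWarnings("unchecked")
    public T deserialize(T t) throws IOException {
        try {
            // Java serialization always allocates a fresh object,
            // so the reusable instance t is ignored here
            return (T) ois.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }
    public void close() throws IOException { ois.close(); }
}

public class RoundTrip {
    // Serialize a value to bytes, then read it back through the Deserializer
    public static String roundTrip(String value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Serializer<String> ser = new JavaObjectSerializer<>();
        ser.open(buf);
        ser.serialize(value);
        ser.close();

        Deserializer<String> de = new JavaObjectDeserializer<>();
        de.open(new ByteArrayInputStream(buf.toByteArray()));
        String result = de.deserialize(null);
        de.close();
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello")); // hello
    }
}
```

Note that deserialize takes a reusable instance parameter: Writable-based deserializers can fill in the passed object to avoid allocation, while Java serialization must ignore it and return a new object.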

Hadoop provides a SerializationFactory that produces the concrete implementations:

  public SerializationFactory(Configuration conf) {
    super(conf);
    // Instantiate the serializations listed under io.serializations; if the
    // key is not set, fall back to Hadoop's three default implementations
    for (String serializerName : conf.getTrimmedStrings(
        CommonConfigurationKeys.IO_SERIALIZATIONS_KEY,
        new String[]{WritableSerialization.class.getName(),
            AvroSpecificSerialization.class.getName(),
            AvroReflectSerialization.class.getName()})) {
      add(conf, serializerName);
    }
  }

  // Create the concrete implementation via reflection and configure it
  private void add(Configuration conf, String serializationName) {
    try {
      Class<? extends Serialization> serializionClass =
          (Class<? extends Serialization>) conf.getClassByName(serializationName);
      serializations.add((Serialization)
          ReflectionUtils.newInstance(serializionClass, getConf()));
    } catch (ClassNotFoundException e) {
      LOG.warn("Serialization class not found: ", e);
    }
  }
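The lookup side of the factory (getSerialization) is not shown above; conceptually it walks the registered list and returns the first implementation whose accept matches the requested class. A minimal self-contained sketch of that first-match dispatch (MiniSerializationFactory is an illustrative name, not Hadoop's class):

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the Serialization contract, reduced to accept()
interface Serialization<T> {
    boolean accept(Class<?> c);
}

// Accepts anything implementing Serializable, like Hadoop's JavaSerialization
class JavaSerialization implements Serialization<Serializable> {
    public boolean accept(Class<?> c) {
        return Serializable.class.isAssignableFrom(c);
    }
}

public class MiniSerializationFactory {
    private final List<Serialization<?>> serializations = new ArrayList<>();

    void add(Serialization<?> s) {
        serializations.add(s);
    }

    // First-match lookup over the registered implementations
    Serialization<?> getSerialization(Class<?> c) {
        for (Serialization<?> s : serializations) {
            if (s.accept(c)) {
                return s;
            }
        }
        return null; // no registered serialization supports this class
    }

    public static void main(String[] args) {
        MiniSerializationFactory factory = new MiniSerializationFactory();
        factory.add(new JavaSerialization());
        System.out.println(factory.getSerialization(String.class) != null); // true
        System.out.println(factory.getSerialization(Thread.class) == null); // true
    }
}
```

Registration order matters with this scheme: if several implementations accept the same class, the one listed first in io.serializations wins.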
posted @ 2019-02-25 18:20  TylerJin