Hadoop Serialization
Hadoop provides the org.apache.hadoop.io.serializer.Serialization interface as its abstraction for serialization:

public interface Serialization<T> {
  boolean accept(Class<?> c);                   // whether this implementation supports the given class
  Serializer<T> getSerializer(Class<T> c);      // returns a serializer
  Deserializer<T> getDeserializer(Class<T> c);  // returns a deserializer
}
Hadoop ships with JavaSerialization, WritableSerialization, and Avro-based implementations. Their accept methods all follow the same pattern, but each one checks for its own supported type rather than an annotation: JavaSerialization accepts classes assignable to Serializable, WritableSerialization accepts Writable subtypes, and the Avro implementations accept Avro record types. For example, JavaSerialization's accept is:

public boolean accept(Class<?> c) {
  return Serializable.class.isAssignableFrom(c);
}
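The isAssignableFrom check above can be tried in isolation. Below is a small JDK-only demo of JavaSerialization-style acceptance; AcceptDemo is a hypothetical class name, and the assumption that the other implementations apply the same pattern to their own marker types is mine:

```java
// Demonstrates the accept() check used by JavaSerialization (AcceptDemo is a
// hypothetical name; only the isAssignableFrom logic comes from the source).
class AcceptDemo {
    static boolean accept(Class<?> c) {
        // Supported iff the class (or one of its supertypes) implements Serializable.
        return java.io.Serializable.class.isAssignableFrom(c);
    }

    public static void main(String[] args) {
        System.out.println(accept(String.class)); // true: String implements Serializable
        System.out.println(accept(Thread.class)); // false: Thread does not
    }
}
```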
Serializer and Deserializer each define open (to prepare resources), close (to release resources), and the serialization/deserialization method itself:

public interface Serializer<T> {
  void open(OutputStream out) throws IOException;
  void serialize(T t) throws IOException;
  void close() throws IOException;
}

public interface Deserializer<T> {
  void open(InputStream in) throws IOException;
  T deserialize(T t) throws IOException;  // may reuse the passed-in instance
  void close() throws IOException;
}
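The open/serialize/close contract can be sketched without Hadoop on the classpath. The pair below mirrors what JavaSerialization does internally with Java object streams; the class names (JavaStyleSerializer, JavaStyleDeserializer, RoundTripDemo) are hypothetical and the code is a simplified sketch, not Hadoop's implementation:

```java
import java.io.*;

// JDK-only sketch of the Serializer side of the contract (hypothetical names;
// modeled on JavaSerialization's use of ObjectOutputStream).
class JavaStyleSerializer<T extends Serializable> {
    private ObjectOutputStream oos;

    void open(OutputStream out) throws IOException {  // prepare the underlying stream
        oos = new ObjectOutputStream(out);
    }

    void serialize(T t) throws IOException {
        oos.writeObject(t);
        oos.reset();  // drop the handle cache so each record is written fully
    }

    void close() throws IOException {
        oos.close();
    }
}

// Matching Deserializer sketch.
class JavaStyleDeserializer<T extends Serializable> {
    private ObjectInputStream ois;

    void open(InputStream in) throws IOException {
        ois = new ObjectInputStream(in);
    }

    @SuppressWarnings("unchecked")
    T deserialize(T ignored) throws IOException {  // the contract allows reusing `ignored`
        try {
            return (T) ois.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }

    void close() throws IOException {
        ois.close();
    }
}

class RoundTripDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        JavaStyleSerializer<String> ser = new JavaStyleSerializer<>();
        ser.open(buf);
        ser.serialize("hello");
        ser.close();

        JavaStyleDeserializer<String> de = new JavaStyleDeserializer<>();
        de.open(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(de.deserialize(null)); // hello
        de.close();
    }
}
```

Note the split into open/serialize/close: it lets one Serializer instance write many records to a single stream, which is exactly how Hadoop writes sequences of keys and values.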
Hadoop provides the SerializationFactory class to produce the concrete implementations:

public SerializationFactory(Configuration conf) {
  super(conf);
  // Create the serialization implementations listed under io.serializations;
  // if the key is unset, fall back to Hadoop's three defaults.
  for (String serializerName : conf.getTrimmedStrings(
      CommonConfigurationKeys.IO_SERIALIZATIONS_KEY,
      new String[]{WritableSerialization.class.getName(),
          AvroSpecificSerialization.class.getName(),
          AvroReflectSerialization.class.getName()})) {
    add(conf, serializerName);
  }
}

// Instantiate the implementation class via reflection and inject the configuration.
private void add(Configuration conf, String serializationName) {
  try {
    Class<? extends Serialization> serializionClass =
        (Class<? extends Serialization>) conf.getClassByName(serializationName);
    serializations.add((Serialization) ReflectionUtils.newInstance(serializionClass, getConf()));
  } catch (ClassNotFoundException e) {
    LOG.warn("Serialization class not found: ", e);
  }
}
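Once the factory holds its list of Serialization instances, looking one up amounts to scanning the list and returning the first implementation whose accept matches. The sketch below mirrors that lookup in plain JDK code; MiniSerialization and MiniFactory are hypothetical names, and the real factory additionally reads io.serializations and instantiates classes reflectively as shown above:

```java
import java.util.*;

// JDK-only sketch of SerializationFactory's lookup logic (hypothetical names;
// simplified: no Configuration, no reflective instantiation).
interface MiniSerialization {
    boolean accept(Class<?> c);
}

class MiniFactory {
    private final List<MiniSerialization> serializations = new ArrayList<>();

    void add(MiniSerialization s) {  // the real add() instantiates by class name
        serializations.add(s);
    }

    // First registered implementation whose accept() returns true wins;
    // null if none of them supports the class.
    MiniSerialization getSerialization(Class<?> c) {
        for (MiniSerialization s : serializations) {
            if (s.accept(c)) {
                return s;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MiniFactory factory = new MiniFactory();
        // Register a JavaSerialization-style implementation.
        MiniSerialization javaStyle = c -> java.io.Serializable.class.isAssignableFrom(c);
        factory.add(javaStyle);

        System.out.println(factory.getSerialization(String.class) == javaStyle); // true
        System.out.println(factory.getSerialization(Thread.class));              // null
    }
}
```

Because the scan is first-match-wins, the order of entries in io.serializations matters when more than one implementation could accept the same class.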