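Hadoop Source Code: Configuration

Configuration is one of Hadoop's foundational components. It supplies configuration parameters to YARN, HDFS, MapReduce, NFS, the schedulers and other modules, and because it implements the Writable interface, configuration data can also be serialized and shipped between nodes.

Hadoop does not load its configuration files with Java's own java.util.Properties, nor with Commons Configuration from Apache Jakarta Commons. Instead it implements its own class, org.apache.hadoop.conf.Configuration, which lives in the hadoop-common-project subproject. Its subclasses include HdfsConfiguration, YarnConfiguration, JobConf, NfsConfiguration and FairSchedulerConfiguration.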

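I. Important fields of the Configuration class

A. quietmode: a boolean indicating whether configuration loading runs in quiet mode, in which some messages are not logged. The default is true.

B. resources: an ArrayList<Resource>. Resource is an inner class of Configuration with two fields, Object resource and String name. The list stores the objects that contain configuration information.

C. finalParameters: a Set<String> holding the names of all properties declared final. A final property cannot be overridden by resources loaded later.

D. loadDefaults: a boolean indicating whether the default resources are loaded.

E. REGISTRY: a WeakHashMap<Configuration, Object> that records every Configuration instance, so that all of them can be managed together.

F. defaultResources: a CopyOnWriteArrayList<String> storing the names or paths of the default configuration resources.

G. properties: a java.util.Properties holding all configuration entries as key-value pairs.

H. overlay: a Properties holding the entries that were set explicitly by the user rather than obtained by parsing resources.

I. classLoader: a ClassLoader used to load specified classes and to locate resources.

J. updatingResource: a HashMap<String, String[]> that records, for each property, the resources that most recently loaded or modified it.

K. VAR_PATTERN: a static Pattern used for regular-expression matching, defined as Pattern.compile("\\$\\{[^\\}\\$\u0020]+\\}"). In a regular expression $, { and } are metacharacters, so each is escaped with "\" (written "\\" inside a Java string literal). The leading "\\$\\{" matches the "${" in front of the key in ${key}, and the trailing "\\}" matches the "}" after it. The middle part "[^\\}\\$\u0020]+" matches the key itself: one or more ("+") characters other than "$", "}" and the space character (\u0020).

L. MAX_SUBST: a static int whose value is 20. It caps the number of substitution passes performed on a value that contains variable references; a value that needs more passes than this cannot be resolved.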
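To make the behaviour of VAR_PATTERN concrete, here is a minimal standalone sketch that compiles the same regular expression and extracts the keys it matches. The class name VarPatternDemo, the helper findKeys and the sample strings are made up for this illustration; only the pattern itself comes from the text above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VarPatternDemo {
    // Same regular expression as quoted above for VAR_PATTERN.
    static final Pattern VAR_PATTERN =
            Pattern.compile("\\$\\{[^\\}\\$\u0020]+\\}");

    // Hypothetical helper: collects the key of every ${key} reference in expr.
    static List<String> findKeys(String expr) {
        List<String> keys = new ArrayList<>();
        Matcher m = VAR_PATTERN.matcher(expr);
        while (m.find()) {
            String ref = m.group();                        // e.g. "${user.name}"
            keys.add(ref.substring(2, ref.length() - 1));  // strip "${" and "}"
        }
        return keys;
    }

    public static void main(String[] args) {
        // "${job id}" contains a space, so the pattern does not match it.
        System.out.println(findKeys("/tmp/hadoop-${user.name}/${job id}/${x}"));
    }
}
```

Running main prints [user.name, x]: the reference containing a space is skipped, which is exactly what the "[^\\}\\$\u0020]+" part is for.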

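II. Initializing Configuration

A. The static initializer block, which loads the default configuration resources

    // A static initializer block that loads the default configuration resources.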
    static {
        //print deprecation warning if hadoop-site.xml is found in classpath
        ClassLoader cL = Thread.currentThread().getContextClassLoader();
        if (cL == null) {
            cL = Configuration.class.getClassLoader();
        }
        if (cL.getResource("hadoop-site.xml") != null) {
            LOG.warn("DEPRECATED: hadoop-site.xml found in the classpath. " +
                    "Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, "
                    + "mapred-site.xml and hdfs-site.xml to override properties of " +
                    "core-default.xml, mapred-default.xml and hdfs-default.xml " +
                    "respectively");
        }
        addDefaultResource("core-default.xml");
        addDefaultResource("core-site.xml");
    }

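The code above runs when the class is initialized, before any constructor is called, and registers core-default.xml and core-site.xml as default resources. Subclasses of Configuration load these two files as well.

B. There are three constructors: Configuration(), Configuration(boolean loadDefaults) and Configuration(Configuration other). The first calls the second with true; the second decides whether the default resources are loaded; the third creates a copy of the given Configuration object.

III. Loading resources

Resources can be added to a Configuration with the addResource() methods or with the static method addDefaultResource() (which affects instances whose loadDefaults flag is set). A resource is not loaded at the moment it is added: Hadoop's Configuration is designed for lazy loading, so resources are parsed only when a value is needed. After a resource is added, reloadConfiguration() is called, which discards properties and clears finalParameters.

A. addDefaultResource is a static method for adding the system's default resources. HDFS uses it to load hdfs-default.xml and hdfs-site.xml, and MapReduce uses it to load mapred-default.xml and mapred-site.xml. The corresponding static initializer blocks can be found in the relevant Configuration subclasses.

    /**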
     * Add a default resource. Resources are loaded in the order of the resources
     * added.
     *
     * @param name file name. File should be present in the classpath.
     */
    public static synchronized void addDefaultResource(String name) {
        if (!defaultResources.contains(name)) {
            defaultResources.add(name);
            for (Configuration conf : REGISTRY.keySet()) {
                if (conf.loadDefaults) {
                    conf.reloadConfiguration();
                }
            }
        }
    }

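The method iterates over the entries in REGISTRY and calls reloadConfiguration on each Configuration object whose loadDefaults flag is true, which causes that object's resources to be loaded again the next time a value is read.

B. The addResource method, which has six overloads:

    public void addResource(Path file)
    public void addResource(String name)
    public void addResource(URL url)
    public void addResource(InputStream in)
    public void addResource(InputStream in, String name)
    public void addResource(Configuration conf)

In other words, a resource can be an input stream, a file path (Path), a URL, a classpath resource name, or another Configuration object. Each overload wraps its argument in a Resource object and passes it to addResourceObject, which appends the Resource to resources and calls reloadConfiguration. The code is as follows:

    public synchronized void reloadConfiguration() {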
        properties = null;                            // trigger reload
        finalParameters.clear();                      // clear site-limits
    }

    private synchronized void addResourceObject(Resource resource) {
        resources.add(resource);                      // add to resources
        reloadConfiguration();
    }
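The lazy-loading idea can be illustrated with a small standalone model. This is a simplified sketch and not Hadoop's implementation: the class LazyConfSketch, its loadCount counter and the use of plain maps as "resources" are assumptions made for the example. It shows that adding a resource only invalidates the cached properties, and that the first read afterwards performs the load.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of lazy loading; NOT Hadoop's Configuration class.
class LazyConfSketch {
    private final List<Map<String, String>> resources = new ArrayList<>();
    private Map<String, String> properties;   // null means "needs (re)load"
    int loadCount = 0;                         // counts how often a load ran

    // Adding a resource only drops the cache; nothing is loaded here.
    synchronized void addResource(Map<String, String> resource) {
        resources.add(resource);
        properties = null;
    }

    // The first read after an invalidation performs the actual load.
    private synchronized Map<String, String> getProps() {
        if (properties == null) {
            properties = new HashMap<>();
            for (Map<String, String> r : resources) {
                properties.putAll(r);          // later resources override earlier ones
            }
            loadCount++;
        }
        return properties;
    }

    synchronized String get(String name) {
        return getProps().get(name);
    }

    public static void main(String[] args) {
        LazyConfSketch conf = new LazyConfSketch();
        conf.addResource(Collections.singletonMap("a", "1"));
        System.out.println("loads after add: " + conf.loadCount);
        System.out.println("a = " + conf.get("a"));
        System.out.println("loads after get: " + conf.loadCount);
    }
}
```

The counter stays at zero until get is called, mirroring how reloadConfiguration only sets properties to null and leaves the real work to the next read.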

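IV. Reading values with get*

A. The get* methods usually take two parameters: the name of the property and a default value that is returned when the property is not found. The typed getters first call getTrimmed(name), which obtains the value through get(String name) and strips the whitespace at both ends. After handling deprecated keys, get passes the raw value to substituteVars to expand any variable references. Before substituteVars is called, getProps is invoked; when it finds that properties is null, it loads the configuration resources through loadResources.

    protected synchronized Properties getProps() {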
        if (properties == null) {
            properties = new Properties();
            HashMap<String, String[]> backup =
                    new HashMap<String, String[]>(updatingResource);
            loadResources(properties, resources, quietmode);
            if (overlay != null) {
                properties.putAll(overlay);
                for (Map.Entry<Object, Object> item : overlay.entrySet()) {
                    String key = (String) item.getKey();
                    updatingResource.put(key, backup.get(key));
                }
            }
        }
        return properties;
    }

The code for loadResources and the method it calls is as follows:

    private void loadResources(Properties properties,
                               ArrayList<Resource> resources,
                               boolean quiet) {
        if (loadDefaults) { // load the default configuration resources
            for (String resource : defaultResources) {
                loadResource(properties, new Resource(resource), quiet);
            }

            //support the hadoop-site.xml as a deprecated case
            if (getResource("hadoop-site.xml") != null) {
                loadResource(properties, new Resource("hadoop-site.xml"), quiet);
            }
        }

        for (int i = 0; i < resources.size(); i++) {    // the other configuration resources
            Resource ret = loadResource(properties, resources.get(i), quiet);
            if (ret != null) {
                resources.set(i, ret);
            }
        }
    }

    private Resource loadResource(Properties properties, Resource wrapper, boolean quiet) {
        String name = UNKNOWN_RESOURCE;
        try {
            Object resource = wrapper.getResource();
            name = wrapper.getName();
            // obtain the factory used to create the DOM parser
            DocumentBuilderFactory docBuilderFactory
                    = DocumentBuilderFactory.newInstance();
            // ignore all comments inside the xml file
            docBuilderFactory.setIgnoringComments(true);

            // allow includes in the xml file; enable XML namespace support
            docBuilderFactory.setNamespaceAware(true);
            try {
                // turn XInclude processing on, i.e. allow the XInclude mechanism
                docBuilderFactory.setXIncludeAware(true);
            } catch (UnsupportedOperationException e) {
                LOG.error("Failed to set setXIncludeAware(true) for parser "
                                + docBuilderFactory
                                + ":" + e,
                        e);
            }
            // obtain the DocumentBuilder that parses the XML
            DocumentBuilder builder = docBuilderFactory.newDocumentBuilder();
            Document doc = null;
            Element root = null;
            boolean returnCachedProperties = false;
            // pre-process each kind of resource and call the matching form of parse
            if (resource instanceof URL) {                  // an URL resource
                doc = parse(builder, (URL) resource);
            } else if (resource instanceof String) {        // a CLASSPATH resource
                URL url = getResource((String) resource);
                doc = parse(builder, url);
            } else if (resource instanceof Path) {          // a file resource
                // Can't use FileSystem API or we get an infinite loop
                // since FileSystem uses Configuration API.  Use java.io.File instead.
                File file = new File(((Path) resource).toUri().getPath())
                        .getAbsoluteFile();
                if (file.exists()) {
                    if (!quiet) {
                        LOG.debug("parsing File " + file);
                    }
                    doc = parse(builder, new BufferedInputStream(
                            new FileInputStream(file)), ((Path) resource).toString());
                }
            } else if (resource instanceof InputStream) {
                doc = parse(builder, (InputStream) resource, null);
                returnCachedProperties = true;
            } else if (resource instanceof Properties) {
                overlay(properties, (Properties) resource);
            } else if (resource instanceof Element) {
                root = (Element) resource;
            }

            if (root == null) {
                if (doc == null) {
                    if (quiet) {
                        return null;
                    }
                    throw new RuntimeException(resource + " not found");
                }
                root = doc.getDocumentElement();
            }
            Properties toAddTo = properties;
            if (returnCachedProperties) {
                toAddTo = new Properties();
            }
            // the root element should be <configuration>
            if (!"configuration".equals(root.getTagName()))
                LOG.fatal("bad conf file: top-level element not <configuration>");
            // get all child nodes of the root element
            NodeList props = root.getChildNodes();
            DeprecationContext deprecations = deprecationContext.get();
            for (int i = 0; i < props.getLength(); i++) {
                Node propNode = props.item(i);
                if (!(propNode instanceof Element))
                    continue;   // skip child nodes that are not Elements
                Element prop = (Element) propNode;
                if ("configuration".equals(prop.getTagName())) {
                    // a nested <configuration> is handled by a recursive loadResource call,
                    // which means <configuration> elements may contain <configuration> children
                    loadResource(toAddTo, new Resource(prop, name), quiet);
                    continue;
                }
                // otherwise the child is expected to be a <property>
                if (!"property".equals(prop.getTagName()))
                    LOG.warn("bad conf file: element not <property>");
                NodeList fields = prop.getChildNodes();
                String attr = null;
                String value = null;
                boolean finalParameter = false;
                LinkedList<String> source = new LinkedList<String>();
                // look up the name, value, final and source fields
                for (int j = 0; j < fields.getLength(); j++) {
                    Node fieldNode = fields.item(j);
                    if (!(fieldNode instanceof Element))
                        continue;
                    Element field = (Element) fieldNode;
                    if ("name".equals(field.getTagName()) && field.hasChildNodes())
                        attr = StringInterner.weakIntern(
                                ((Text) field.getFirstChild()).getData().trim());
                    if ("value".equals(field.getTagName()) && field.hasChildNodes())
                        value = StringInterner.weakIntern(
                                ((Text) field.getFirstChild()).getData());
                    if ("final".equals(field.getTagName()) && field.hasChildNodes())
                        finalParameter = "true".equals(((Text) field.getFirstChild()).getData());
                    if ("source".equals(field.getTagName()) && field.hasChildNodes())
                        source.add(StringInterner.weakIntern(
                                ((Text) field.getFirstChild()).getData()));
                }
                source.add(name);

                // Ignore this parameter if it has already been marked as 'final'
                if (attr != null) {
                    if (deprecations.getDeprecatedKeyMap().containsKey(attr)) {
                        DeprecatedKeyInfo keyInfo =
                                deprecations.getDeprecatedKeyMap().get(attr);
                        keyInfo.clearAccessed();
                        for (String key : keyInfo.newKeys) {
                            // update new keys with deprecated key's value
                            loadProperty(toAddTo, name, key, value, finalParameter,
                                    source.toArray(new String[source.size()]));
                        }
                    } else {
                        loadProperty(toAddTo, name, attr, value, finalParameter,
                                source.toArray(new String[source.size()]));
                    }
                }
            }

            if (returnCachedProperties) {
                overlay(properties, toAddTo);
                return new Resource(toAddTo, name);
            }
            return null;
        } catch (IOException e) {
            LOG.fatal("error parsing conf " + name, e);
            throw new RuntimeException(e);
        } catch (DOMException e) {
            LOG.fatal("error parsing conf " + name, e);
            throw new RuntimeException(e);
        } catch (SAXException e) {
            LOG.fatal("error parsing conf " + name, e);
            throw new RuntimeException(e);
        } catch (ParserConfigurationException e) {
            LOG.fatal("error parsing conf " + name, e);
            throw new RuntimeException(e);
        }
    }


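As shown above, if loading defaults is allowed (loadDefaults == true), the resources in defaultResources are loaded first, and hadoop-site.xml is loaded too if it is found on the classpath. The explicitly added resources are loaded last. Because loading is ordered, a property defined more than once takes the value from the resource loaded last, unless an earlier resource marked it final.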
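The interaction between load order and final can be sketched in a few lines. This is a deliberately reduced model, not the body of Hadoop's loadProperty: the class FinalOverrideSketch and its three-argument loadProperty are assumptions for the example, and the property values are invented.

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Reduced model of "last one wins, unless final"; NOT Hadoop's code.
class FinalOverrideSketch {
    private final Properties properties = new Properties();
    private final Set<String> finalParameters = new HashSet<>();

    // Applies one property entry in load order.
    void loadProperty(String name, String value, boolean isFinal) {
        if (!finalParameters.contains(name)) {
            properties.setProperty(name, value);   // normal case: overwrite
        }
        // else: an earlier resource declared the property final, keep its value
        if (isFinal) {
            finalParameters.add(name);
        }
    }

    String get(String name) {
        return properties.getProperty(name);
    }

    public static void main(String[] args) {
        FinalOverrideSketch conf = new FinalOverrideSketch();
        conf.loadProperty("fs.defaultFS", "file:///", false);          // default resource
        conf.loadProperty("fs.defaultFS", "hdfs://nn:8020", true);     // site file, final
        conf.loadProperty("fs.defaultFS", "hdfs://other:8020", false); // later resource
        System.out.println(conf.get("fs.defaultFS"));
    }
}
```

In this model the third call has no effect because the second one marked the property final, while a non-final property would simply take the last value loaded.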

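Once getProps has returned, all configuration entries are available and the value of a property can be read with getProperty. That value is then passed to substituteVars for variable expansion. The code is as follows:

    // Works with the regular expression object to resolve ${...} references in a value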
    private String substituteVars(String expr) {
        if (expr == null) {
            return null;
        }
        Matcher match = VAR_PATTERN.matcher("");
        String eval = expr;
        // loop: perform at most MAX_SUBST substitutions
        for (int s = 0; s < MAX_SUBST; s++) {
            match.reset(eval);
            if (!match.find()) {
                return eval;    // no reference found, return
            }
            String var = match.group();
            var = var.substring(2, var.length() - 1); // remove ${ .. } to get the referenced key
            String val = null;
            try {
                // look in the JVM system properties first, so they take precedence
                val = System.getProperty(var);
            } catch (SecurityException se) {
                LOG.warn("Unexpected SecurityException in Configuration", se);
            }
            if (val == null) {
                val = getRaw(var);  // then the properties in this Configuration object
            }
            if (val == null) {
                // var is not bound to any value: do not expand, return
                return eval; // return literal ${var}: var is unbound
            }
            // substitute: replace ${ ... } with the value to complete the expansion
            eval = eval.substring(0, match.start()) + val + eval.substring(match.end());
        }
        // too many substitutions: throw an exception
        throw new IllegalStateException("Variable substitution depth too large: "
                + MAX_SUBST + " " + expr);
    }

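The number of substitutions is capped. Java system properties take precedence, followed by the properties defined in the Configuration. A Java system property can be set with -DXXX=YYY in the JVM options or on the launch command line.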
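To try these rules without a Hadoop installation, the loop can be reproduced in a standalone sketch. A plain Map stands in for the Configuration's own properties (an assumption for this example), and the class name SubstituteSketch and the demo.* keys are invented. The loop body follows substituteVars as listed above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone sketch of the expansion loop; a Map replaces getRaw(var).
class SubstituteSketch {
    static final Pattern VAR_PATTERN =
            Pattern.compile("\\$\\{[^\\}\\$\u0020]+\\}");
    static final int MAX_SUBST = 20;

    static String substitute(String expr, Map<String, String> conf) {
        if (expr == null) {
            return null;
        }
        Matcher match = VAR_PATTERN.matcher("");
        String eval = expr;
        for (int s = 0; s < MAX_SUBST; s++) {
            match.reset(eval);
            if (!match.find()) {
                return eval;                       // nothing left to expand
            }
            String var = match.group();
            var = var.substring(2, var.length() - 1);
            String val = System.getProperty(var);  // system properties win
            if (val == null) {
                val = conf.get(var);               // then the configuration
            }
            if (val == null) {
                return eval;                       // unbound: keep literal ${var}
            }
            eval = eval.substring(0, match.start()) + val
                    + eval.substring(match.end());
        }
        throw new IllegalStateException(
                "Variable substitution depth too large: " + MAX_SUBST + " " + expr);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("demo.base", "/data");
        conf.put("demo.dir", "${demo.base}/tmp");
        // Two passes are needed: ${demo.dir} first, then ${demo.base}.
        System.out.println(substitute("${demo.dir}/part-0", conf));
    }
}
```

A self-referencing entry such as demo.loop = ${demo.loop} never runs out of references, so the loop uses up all MAX_SUBST passes and ends in the IllegalStateException.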

At this point the get* method has the value as a String and can convert it to the type it returns.
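The conversion step can be pictured with the following simplified sketch. It is an illustration only: the class TypedGetSketch is invented, and Hadoop's real getters handle more cases than shown here. The point is the pattern of trimming the string, falling back to the default when nothing is set, and parsing otherwise.

```java
import java.util.Properties;

// Simplified picture of typed getters; NOT Hadoop's implementation.
class TypedGetSketch {
    private final Properties props = new Properties();

    void set(String name, String value) {
        props.setProperty(name, value);
    }

    // Returns the stored string without surrounding whitespace, or null if unset.
    String getTrimmed(String name) {
        String value = props.getProperty(name);
        return value == null ? null : value.trim();
    }

    // Falls back to defaultValue when the property is not set.
    int getInt(String name, int defaultValue) {
        String value = getTrimmed(name);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    // Anything other than "true"/"false" (ignoring case) yields the default.
    boolean getBoolean(String name, boolean defaultValue) {
        String value = getTrimmed(name);
        if ("true".equalsIgnoreCase(value)) {
            return true;
        }
        if ("false".equalsIgnoreCase(value)) {
            return false;
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        TypedGetSketch conf = new TypedGetSketch();
        conf.set("demo.retries", " 3 ");
        System.out.println(conf.getInt("demo.retries", 1));
        System.out.println(conf.getInt("demo.unset", 1));
    }
}
```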

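V. Setting values with set*

Setting is simpler than getting. Every set* method ends up calling set(String name, String value, String source). The source parameter describes where the setting came from and is usually null. The method calls setProperty() on both properties and overlay to store the key-value pair, and it also updates updatingResource.

When writing a MapReduce job, tasks often need to share some data. Such values can be stored with the Configuration set* methods and read back from the context in the setup method of the mapper or reducer.

To summarize: (1) creating the object registers the default resources (provided loadDefaults is true); (2) adding resources is optional (without it only Hadoop's default resources are used), and it clears the previously loaded data without loading anything immediately; (3) the get* methods trigger resource loading (getProps), handle variable expansion and return the value of the property; (4) the set* methods store your own parameters.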

posted on 2021-01-19 11:25 by 情陌人灬已不在
