HashMap Source Code Analysis in Java 8

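Overview of HashMap:

HashMap implements the Map interface (note: none of the Map containers implement the Collection interface; only containers such as Set and List do). Its basic operations (put, get, containsKey) run in constant time on average, provided the hash function spreads the keys evenly across the buckets. However, HashMap does not maintain any ordering of its <key, value> pairs, nor does it guarantee that the order stays the same over time.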

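Two parameters affect HashMap performance: initial capacity and load factor. The former is the number of buckets (abstractly, lists) at creation time, i.e. the length of the underlying array, 16 by default. The latter controls how full the table may get: when NUMS(Entry) > load factor * capacity, the array is automatically doubled and the entries are rehashed. The default load factor is 0.75.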
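As a small illustration of how the two parameters interact, the sketch below computes the resize threshold (capacity * load factor) and the table size that repeated doubling would reach for a given number of entries. The class `ThresholdDemo` and its two helper methods are hypothetical names for this illustration, not part of the JDK, and the starting capacity of 16 is assumed.

```java
class ThresholdDemo {
    // Number of entries a table of the given capacity can hold
    // before the next insertion triggers a resize.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Smallest capacity, starting from the default 16 and doubling,
    // whose threshold is large enough for expectedEntries.
    static int capacityFor(int expectedEntries, float loadFactor) {
        int capacity = 16;
        while (threshold(capacity, loadFactor) < expectedEntries) {
            capacity <<= 1;   // the table doubles on each resize
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));     // 12
        System.out.println(capacityFor(100, 0.75f));  // 256
    }
}
```

With the defaults, the 13th entry is the one that causes the first resize, from 16 to 32 buckets.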

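In addition, HashMap is not synchronized. The utility class Collections provides a wrapper, Map m = Collections.synchronizedMap(new HashMap(...));, which returns a synchronized map. If the map is structurally modified while it is being iterated (other than through the iterator's own remove method), whether by another thread or by the iterating thread itself, the iterator throws ConcurrentModificationException. This is the so-called fail-fast strategy. It works on a best-effort basis and is not a guarantee.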
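The fail-fast behaviour does not need a second thread to show up. The sketch below, with the hypothetical class name `FailFastDemo`, modifies a map inside a for-each loop and catches the resulting exception. It also shows the synchronized wrapper, whose iteration still has to be guarded manually by locking on the wrapper.

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class FailFastDemo {
    // Returns true if a structural modification inside a for-each loop
    // makes the iterator throw ConcurrentModificationException.
    static boolean modifyWhileIterating() {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                map.put("new-key", 0);   // adds a mapping during iteration
            }
        } catch (ConcurrentModificationException ex) {
            return true;
        }
        return false;
    }

    // Sums the values of a synchronized map. Iteration over the wrapper
    // must be done inside a synchronized block on the wrapper itself.
    static int sumSynchronized() {
        Map<String, Integer> syncMap =
                Collections.synchronizedMap(new HashMap<String, Integer>());
        syncMap.put("a", 1);
        syncMap.put("b", 2);
        int sum = 0;
        synchronized (syncMap) {
            for (int v : syncMap.values()) {
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating());  // true
        System.out.println(sumSynchronized());       // 3
    }
}
```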

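Overview of the implementation:

In the actual implementation, the underlying structure is an array of buckets (bucket = bin). To spread Nodes more evenly, so that they do not pile up in the same bin, key.hashCode() is passed through a bit operation to produce a value h, and h is then used to compute the array index. That index locates the bucket, and the lookup or insertion then takes place inside the bucket. A bucket comes in two forms, list and tree. List nodes are smaller, but the bucket may grow deep and take longer to traverse. A TreeNode is larger, but it effectively reduces the depth of the bucket, so the bucket can be searched faster. In practice, a bucket is converted to tree form only when it holds more than TREEIFY_THRESHOLD (8) nodes and the table has at least MIN_TREEIFY_CAPACITY (64) bins, trading space for time.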

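Detailed source code analysis:

The analysis below covers only the array-of-lists form and does not consider tree nodes (a TreeNode is roughly twice the size of a regular Node). It focuses on three methods: get, put, and containsKey.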

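static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;  //default number of buckets (array size) is 16
static final int MAXIMUM_CAPACITY = 1 << 30;         //maximum capacity is 2^30
static final float DEFAULT_LOAD_FACTOR = 0.75f;      //default load factor is 0.75

static final int TREEIFY_THRESHOLD = 8;        //a bin (bucket) is converted to a tree once an insertion makes it hold more than 8 nodes
static final int UNTREEIFY_THRESHOLD = 6;      //a tree bin left with 6 or fewer nodes after being split by resize() is converted back to a list
static final int MIN_TREEIFY_CAPACITY = 64;    //bins are treeified only when the table has at least 64 bins; a smaller table is resized instead
//Important fields
transient Node<K,V>[] table;     //the array of buckets, i.e. the underlying array of lists
transient int size;              //number of <key, value> pairs; this is the value returned by size()
transient int modCount;          //number of structural modifications to the map; iterators use it to fail fast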

Node of a list-style bucket:

//The static nested class Node wraps a <key, value> pair; its next field links nodes into the list-style bucket
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;                                               //points to the next Node, forming the list; Nodes with the same index all go into the same bucket

        Node(int hash, K key, V value, Node<K,V> next) {              //constructor
            ... ...
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            ... ...
        }

        public final boolean equals(Object o) {                      //checks whether two <key, value> pairs are equal in both key and value
            ... ...
        }
    }

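Constructors:

//Note: the table size (array length) is always a power of two. For example, HashMap(15) yields a table size of 2^4=16
public HashMap(int initialCapacity, float loadFactor) {   //caller specifies the initial array size and the load factor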
    ... ...
}
public HashMap(int initialCapacity) { ... ... }
public HashMap() { ... ... }
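
The rounding to a power of two can be sketched as follows. This follows the bit-smearing approach of JDK 8's tableSizeFor helper as I understand it; the wrapper class `TableSizeDemo` is a hypothetical name for this illustration.

```java
class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Rounds cap up to the nearest power of two (at least 1, at most 2^30).
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // subtract 1 so an exact power of two maps to itself
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // every bit below the highest set bit is now 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(15));  // 16
        System.out.println(tableSizeFor(16));  // 16
        System.out.println(tableSizeFor(17));  // 32
    }
}
```

Keeping the length a power of two is what allows the index to be computed as (table.length-1) & h instead of a slower modulo operation.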
 

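//Static utility method:
    static final int hash(Object key) {                  //computes a value h from the key's hashCode
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
//Analysis: when looking up or inserting a Node, key.hashCode() is not used directly. It is first XORed with its own upper 16 bits to give h, and h then determines the array index.
//Reason: the index is (table.length-1) & h, which on a small table uses only the low bits of h. Mixing in the high bits spreads Nodes more evenly across the buckets and helps avoid pile-ups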
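
To see the effect of the bit operation, the sketch below takes two hash codes that differ only in their upper 16 bits and computes their index in a 16-bucket table with and without the XOR step. The class `HashSpreadDemo` and the method names `spread` and `indexFor` are hypothetical, chosen for this illustration.

```java
class HashSpreadDemo {
    // Same mixing step as HashMap.hash(): fold the upper 16 bits into the lower 16.
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // Bucket index for a table whose length is a power of two.
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int h1 = 0x10000;
        int h2 = 0x20000;
        // Without mixing, both land in bucket 0 because their low 4 bits are zero.
        System.out.println(indexFor(h1, 16) + " " + indexFor(h2, 16));                  // 0 0
        // With mixing, the high bits influence the index and the keys separate.
        System.out.println(indexFor(spread(h1), 16) + " " + indexFor(spread(h2), 16));  // 1 2
    }
}
```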

Implementation of get() and containsKey():

public boolean containsKey(Object key) {
        return getNode(hash(key), key) != null;                //uses h to locate the bucket
    }
    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }


    final Node<K,V> getNode(int hash, Object key) {            //helper called by get and containsKey; the hash argument is not key.hashCode() but the h value described above
        Node<K,V>[] tab; 
        Node<K,V> first, e; 
        int n; K k;
                                                               //index = (table.length-1) & h locates the bucket
        if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) {

            if (first.hash == hash && ((k = first.key) == key || (key != null && key.equals(k))))  //if the first node in the bin is the one we want, return it directly
                return first;

            if ((e = first.next) != null) {                                   //otherwise the rest of the bucket has to be searched
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);     //tree-style bin

                do {                                                          //list-style bin
                    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;                                             //equals() is called only for nodes whose hash matches and whose key is not the same reference
                } while ((e = e.next) != null);                               
            }

        }
        return null;
    }
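
The comparison in getNode (hash first, then == or equals()) explains why a key class needs consistent hashCode() and equals(). In the sketch below, `HashOnlyKey` and `ProperKey` are hypothetical key classes written for this illustration: the first overrides only hashCode(), so a distinct but equal-looking instance reaches the right bucket yet fails the equals() check.

```java
import java.util.HashMap;
import java.util.Map;

class KeyLookupDemo {
    // Hypothetical key that overrides hashCode() but NOT equals().
    static final class HashOnlyKey {
        final String id;
        HashOnlyKey(String id) { this.id = id; }
        @Override public int hashCode() { return id.hashCode(); }
    }

    // Hypothetical key that overrides both methods consistently.
    static final class ProperKey {
        final String id;
        ProperKey(String id) { this.id = id; }
        @Override public int hashCode() { return id.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof ProperKey && ((ProperKey) o).id.equals(id);
        }
    }

    // Same bucket and same hash, but the inherited equals() compares references.
    static boolean foundWithNewHashOnlyKey() {
        Map<HashOnlyKey, String> map = new HashMap<HashOnlyKey, String>();
        map.put(new HashOnlyKey("k"), "v");
        return map.containsKey(new HashOnlyKey("k"));
    }

    // The very same instance is found through the == shortcut.
    static boolean foundWithSameInstance() {
        HashOnlyKey key = new HashOnlyKey("k");
        Map<HashOnlyKey, String> map = new HashMap<HashOnlyKey, String>();
        map.put(key, "v");
        return map.containsKey(key);
    }

    // A distinct instance is found because equals() compares the id.
    static boolean foundWithNewProperKey() {
        Map<ProperKey, String> map = new HashMap<ProperKey, String>();
        map.put(new ProperKey("k"), "v");
        return map.containsKey(new ProperKey("k"));
    }

    public static void main(String[] args) {
        System.out.println(foundWithNewHashOnlyKey());  // false
        System.out.println(foundWithSameInstance());    // true
        System.out.println(foundWithNewProperKey());    // true
    }
}
```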

Implementation of put(key, value):

 

    public V put(K key, V value) {                              //if the key already exists, update its value; otherwise create a new Node holding the value
        return putVal(hash(key), key, value, false, true);
    }

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {   //helper called by put()
        Node<K,V>[] tab; 
        Node<K,V> p; 
        int n, i;

        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;                     //the table is not allocated yet, so resize() creates it here (on a non-empty table resize() doubles it)

        if ((p = tab[i = (n - 1) & hash]) == null)           //index = (table.length-1) & h locates the bucket
            tab[i] = newNode(hash, key, value, null);        //the bucket is empty: create a new Node, which becomes the first node of this bin
        else {                                               //the bucket is not empty: replace the value if the key exists, otherwise insert a new Node
            Node<K,V> e; K k;                                //e ends up as the existing Node for this key, or null if a new Node was inserted

            if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))   //the first node of the bucket has the same key as the one being put
                e = p;                                                                      //remember this Node; its value is replaced below
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);             //tree-style bucket
            else {                                                                          //list-style bucket
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {                                //p starts as the first Node of the bucket, already known not to match
                        p.next = newNode(hash, key, value, null);              //reached the tail without a match: append a new Node (e stays null); binCount counts the Nodes walked so far
                        if (binCount >= TREEIFY_THRESHOLD - 1)                 //decide from binCount whether the bucket should be treeified
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))  //found a node with the same key: stop, its value is updated below
                        break;                                                                     //e now refers to the matching Node
                    p = e;
                }
            }
            if (e != null) {                  //e != null means a mapping for the key already exists; only its value needs updating
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;                           //count the structural modification so that iterators can fail fast
        if (++size > threshold)               //update size and check whether the table needs to grow
            resize();
        afterNodeInsertion(evict);
        return null;
    }
//These two methods have empty bodies in HashMap. They are callbacks invoked after an access or an insertion, which subclasses such as LinkedHashMap override to do extra work
    void afterNodeAccess(Node<K,V> p) { }
    void afterNodeInsertion(boolean evict) { }
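
To tie the lookup and insertion steps together, here is a heavily simplified chained hash map. It is a sketch for illustration only, not the JDK code: the class `MiniChainedMap` is hypothetical, it has no tree bins, no modCount, and no removal, and its resize() pushes nodes onto the head of the new lists rather than preserving their order as the JDK does.

```java
class MiniChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    private Node<K, V>[] table;
    private int size;
    private int threshold = 12;            // 16 * 0.75

    @SuppressWarnings("unchecked")
    MiniChainedMap() {
        table = (Node<K, V>[]) new Node[16];
    }

    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    private static boolean sameKey(Node<?, ?> e, int h, Object key) {
        return e.hash == h && (e.key == key || (key != null && key.equals(e.key)));
    }

    V get(Object key) {
        int h = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & h]; e != null; e = e.next) {
            if (sameKey(e, h, key)) return e.value;
        }
        return null;
    }

    // Returns the previous value for the key, or null if there was none.
    V put(K key, V value) {
        int h = hash(key);
        int i = (table.length - 1) & h;
        Node<K, V> last = null;
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (sameKey(e, h, key)) {      // existing mapping: replace the value
                V old = e.value;
                e.value = value;
                return old;
            }
            last = e;
        }
        Node<K, V> node = new Node<K, V>(h, key, value);
        if (last == null) table[i] = node; else last.next = node;   // append at the tail
        if (++size > threshold) resize();
        return null;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] old = table;
        Node<K, V>[] bigger = (Node<K, V>[]) new Node[old.length * 2];
        for (Node<K, V> head : old) {
            Node<K, V> e = head;
            while (e != null) {
                Node<K, V> next = e.next;
                int i = (bigger.length - 1) & e.hash;   // stored hash is reused, not recomputed
                e.next = bigger[i];
                bigger[i] = e;
                e = next;
            }
        }
        table = bigger;
        threshold *= 2;
    }

    int size() { return size; }
    int capacity() { return table.length; }
}
```

Storing the hash in each node, as the JDK's Node does, lets resize() redistribute entries without calling hashCode() again.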
 

//Finally, look at a forEach method from the map (the one over its values). It uses modCount to detect a modification made during the traversal and throws an exception
        public final void forEach(Consumer<? super V> action) {
            Node<K,V>[] tab;
            if (action == null)
                throw new NullPointerException();
            if (size > 0 && (tab = table) != null) {
                int mc = modCount;
                for (int i = 0; i < tab.length; ++i) {
                    for (Node<K,V> e = tab[i]; e != null; e = e.next)
                        action.accept(e.value);
                }
                if (modCount != mc)                      //checked once after the loop: if the map was structurally modified during the traversal (by another thread or by action itself), throw
                    throw new ConcurrentModificationException();
            }
        }
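
The modCount check can be triggered from a single thread by letting the action itself modify the map. The sketch below uses the hypothetical class name `ForEachModCountDemo` and assumes the forEach of the values() view behaves as in the listing above.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class ForEachModCountDemo {
    // Returns true if forEach over the values view throws because
    // the action added a new mapping during the traversal.
    static boolean throwsWhenActionModifiesMap() {
        final Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            // The first call adds the key "extra" (a structural modification);
            // later calls only overwrite its value.
            map.values().forEach(v -> map.put("extra", 0));
        } catch (ConcurrentModificationException ex) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(throwsWhenActionModifiesMap());  // true
    }
}
```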

Posted 2015-07-12 20:51 by Mr.do
