Differences between HashMap in JDK 1.7 and 1.8 (with ConcurrentHashMap; a personal-understanding write-up)
Preface: this post is only a summary; the specifics should be understood alongside the source code. The goal is to follow the JDK's iterations, get into the designers' mindset, and think about why things were designed the way they were (talk is cheap, show me the code).
For common HashMap interview questions, see this excellent summary: https://blog.csdn.net/v123411739/article/details/106324537
Part 1: HashMap optimizations in JDK 1.8
Let's start from a diagram (summarized by another author) listing the differences, and use the source code to understand each one.
1. Table initialization is folded into resize() (1.7 has a separate method, inflateTable). Tip: not obvious to me why this matters — does having one fewer method really improve efficiency?
//JDK1.7
private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}
//JDK1.8: folded into the resize method shown further below
2. When a capacity is specified via the constructor, 1.8 computes it with five |= operations, five shifts, one addition and one subtraction (1.7 left-shifts (number - 1) once and then calls Integer.highestOneBit(), for a total of five |= operations, seven shifts and two subtractions).
Tip: 1.8 simply drops the two extra bit operations left over from calling Integer.highestOneBit() directly; it is not much of an optimization. Addition/subtraction and bit operations cost about the same, whereas multiplication and division are tens of times slower than bit operations — that is where the real optimization value lies.
//JDK1.7
private static int roundUpToPowerOf2(int number) {
    // assert number >= 0 : "number must be non-negative";
    return number >= MAXIMUM_CAPACITY
            ? MAXIMUM_CAPACITY
            : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
}

public static int highestOneBit(int i) {
    // HD, Figure 3-1
    i |= (i >> 1);
    i |= (i >> 2);
    i |= (i >> 4);
    i |= (i >> 8);
    i |= (i >> 16);
    return i - (i >>> 1);
}

//JDK1.8
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
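As a sanity check, the 1.8 rounding can be exercised standalone. A minimal sketch — the class name and the local copy of MAXIMUM_CAPACITY are mine; the method body is the JDK's:

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of JDK 1.8's tableSizeFor: smallest power of two >= cap
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(17)); // 32
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(1));  // 1
    }
}
```

Note the initial `cap - 1`: without it, a capacity that is already a power of two would be doubled unnecessarily.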
3. The hash perturbation uses 1 shift and 1 XOR (1.7 uses 4 shifts and 5 XORs). Tip: same as above, the gain is not obvious to me.
//JDK1.7
final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    h ^= k.hashCode();
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

//JDK1.8
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
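The point of the 1.8 perturbation is to let the high bits of hashCode() influence the bucket index, which otherwise only sees the low bits of the hash. A minimal sketch — the class name and the sample hash values are mine for illustration:

```java
public class HashSpreadDemo {
    // Copy of JDK 1.8's perturbation: XOR the high 16 bits into the low 16
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int mask = 15; // table of length 16: the index uses only the low 4 bits
        int h1 = 0x00010001, h2 = 0x00020001; // differ only in their high bits

        // Without perturbation both keys collide in bucket 1
        System.out.println((h1 & mask) + " vs " + (h2 & mask)); // 1 vs 1

        // With perturbation the high bits reach the index: buckets 0 and 3
        System.out.println((spread(h1) & mask) + " vs " + (spread(h2) & mask));
    }
}
```

One cheap XOR is enough here because, with power-of-two tables, collisions among keys that differ only in high bits are the main failure mode being defended against.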
4. Insertion uses tail insertion, avoiding the rare circular linked list that head insertion could produce during concurrent resizing (head insertion is fast but reverses the list's order; tail insertion keeps insertion order and fixes the circular-list bug, at the cost of traversing the chain on every insert).
Tip: HashMap never really exploited the linked list's fast-insert advantage anyway, since put always has to traverse the chain to compare keys, and list traversal is quite slow. But transferring data after a resize is a pure list-insert operation, so an efficiency problem does arise there — which the designers addressed with the optimization described in point 6 below.
//JDK1.7
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    // The new Entry points at e, i.e. head insertion
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

Entry(int h, K k, V v, Entry<K,V> n) {
    value = v;
    next = n;
    key = k;
    hash = h;
}
//JDK1.8: see the putVal method further below
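The ordering difference between the two strategies can be shown with a toy singly linked list. This is a sketch of my own, not JDK code:

```java
public class InsertOrderDemo {
    static class Node {
        final int val;
        Node next;
        Node(int val, Node next) { this.val = val; this.next = next; }
    }

    // Head insertion (JDK 1.7 style): each new node becomes the head
    static String headInsert(int[] vals) {
        Node head = null;
        for (int v : vals) head = new Node(v, head);
        return toStr(head);
    }

    // Tail insertion (JDK 1.8 style): append after the current tail
    static String tailInsert(int[] vals) {
        Node head = null, tail = null;
        for (int v : vals) {
            Node n = new Node(v, null);
            if (tail == null) head = n; else tail.next = n;
            tail = n;
        }
        return toStr(head);
    }

    static String toStr(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) sb.append(e.val);
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] vals = {1, 2, 3};
        System.out.println(headInsert(vals)); // 321 — order reversed
        System.out.println(tailInsert(vals)); // 123 — order preserved
    }
}
```

The 1.7 circular-list bug comes precisely from that reversal: two threads resizing concurrently can each re-reverse the same chain and end up with a cycle.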
5. 1.8 inserts first, then resizes, performing a unified transfer afterwards (1.7 resizes first, then inserts).
Tip: in 1.7 only key equality needs checking, and head insertion is fast with no traversal, so the resize check can happen before the insert. In 1.8, tail insertion requires a traversal anyway, so the key-equality check is folded into it, and the node count after insertion is also compared against the treeify threshold (which may trigger a resize rather than treeification). So this is probably just a difference in flow, ultimately caused by the switch to tail insertion.
//JDK1.7
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}

//JDK1.8
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    ……
    // Is the head node null?
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // Key-equality check against the head node
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Is the head a tree node?
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Head non-null, key not equal, not a tree node: traverse the chain
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // Tail insertion
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        ……
    }
    ++modCount;
    ……
    return null;
}
6. During the resize transfer, only the extra high-order bit is examined: if it is 0 the node keeps its original index; if it is 1 it moves to original index + old capacity. This is a clever use of the & operation (see the comments below), and it relies on the capacity being a power of two. (1.7 re-indexes every node by re-traversal; if 1.8 did the same, it would be slow under heavy hash collisions, since tail insertion must walk the chain while head insertion does not.) Tip: the high/low-bit trick works because hash & (n - 1) is equivalent to hash mod n, which holds when n is a power of two.
//JDK1.7
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            // Head insertion
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

//JDK1.8
final Node<K,V>[] resize() {
    ……
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            // Head node is non-null
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // Single-node chain: place it directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // Tree node: split into two trees, analogous to the
                // low/high split of the list below
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order: tail insertion keeps the order
                    // lo = low bit, hi = high bit; the four variables are the
                    // heads and tails of the low and high lists
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // The index is hash & (capacity - 1), and capacity is a
                        // power of two. Take 16 as an example: after resizing it
                        // becomes 32, and for any hash, & with 01111 (16-1) and
                        // with 11111 (32-1) differ only in the highest bit.
                        // If hash & 10000 (16) is 0, the highest bit of
                        // hash & 11111 (32-1) is 0, so the index is unchanged
                        // after the resize; loHead/loTail hold this list.
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // If hash & 10000 (16) is 1, the highest bit of
                        // hash & 11111 (32-1) is 1, so the index shifts by 16
                        // (the old capacity); hiHead/hiTail hold this list.
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Attach the low list's head and tail
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Attach the high list's head and tail
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
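The invariant behind the low/high split can be checked numerically: with a power-of-two capacity, hash & (newCap - 1) is always either the old index or the old index plus oldCap, so examining the single bit hash & oldCap is enough. A small sketch with names of my own:

```java
public class ResizeSplitDemo {
    // Predict the post-resize bucket JDK 1.8 style: look only at the
    // one bit that the doubled capacity adds to the index mask
    static int predict(int hash, int oldCap) {
        int oldIdx = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIdx : oldIdx + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        // All four hashes map to bucket 5 while the capacity is 16
        for (int h : new int[]{5, 21, 37, 53}) {
            int real = h & (newCap - 1); // full recomputation
            int fast = predict(h, oldCap); // high-bit shortcut
            System.out.println(h + " -> " + real + " (predicted " + fast + ")");
        }
    }
}
```

So one old bucket splits into exactly two new buckets (j and j + oldCap), which is why two head/tail pairs suffice in the resize code above.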
7. Structurally, red-black trees were added: a bucket is treeified when its chain length reaches 8 and the array length is at least 64; if the array is shorter, the map resizes instead. (Red-black trees are faster to search but use more space — a tree node is about twice the size of a list node.)
Tip: another piece I don't fully grasp yet; I still need to brush up on data structures.
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Capacity initialization
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Bucket located from the perturbed hash is empty
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // Head node has an equal key: replace
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Tree node
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // Tail insertion
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        // Chain length reached 8: treeify
                        // (treeifyBin does a further check)
                        treeifyBin(tab, hash);
                    break;
                }
                // Equal key found while traversing: replace
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
    }
    ……
    return null;
}

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // Array length below 64: resize instead of treeifying
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}
Part 2: ConcurrentHashMap
1. JDK 1.7 implements it as a Segment array + HashEntry arrays (each Segment is effectively a HashMap). Segment extends ReentrantLock, which is what makes ConcurrentHashMap thread-safe; the lock granularity is one Segment, and the rehash during a resize also happens per Segment.
public V put(K key, V value) {
    Segment<K,V> s;
    if (value == null)
        throw new NullPointerException();
    // The first hash locates the Segment; a second hash locates the
    // index within its HashEntry array.
    int hash = hash(key);
    int j = (hash >>> segmentShift) & segmentMask;
    if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
         (segments, (j << SSHIFT) + SBASE)) == null) // in ensureSegment
        s = ensureSegment(j);
    return s.put(key, hash, value, false);
}

// Segment's put method
final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    // Scan and acquire the lock
    HashEntry<K,V> node = tryLock() ? null :
        scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        HashEntry<K,V>[] tab = table;
        // Locate the chain's head node
        int index = (tab.length - 1) & hash;
        HashEntry<K,V> first = entryAt(tab, index);
        for (HashEntry<K,V> e = first;;) {
            // Head non-null: traverse the chain, comparing keys
            if (e != null) {
                K k;
                if ((k = e.key) == key ||
                    (e.hash == hash && key.equals(k))) {
                    oldValue = e.value;
                    if (!onlyIfAbsent) {
                        e.value = value;
                        ++modCount;
                    }
                    break;
                }
                e = e.next;
            }
            else {
                // Head insertion, step 1: the new node points at the old head
                if (node != null)
                    node.setNext(first);
                // Head was null: create the node from key and value
                else
                    node = new HashEntry<K,V>(hash, key, value, first);
                int c = count + 1;
                // Count exceeds the threshold: resize (rehash)
                if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                    rehash(node);
                // Head insertion, step 2: tab[index] = node
                else
                    setEntryAt(tab, index, node);
                ++modCount;
                count = c;
                oldValue = null;
                break;
            }
        }
    } finally {
        unlock();
    }
    return oldValue;
}
Summary: much like the 1.7 HashMap — each HashMap-like structure is wrapped into a lockable Segment, a Segment array makes up the ConcurrentHashMap, and put takes the lock.
private void rehash(HashEntry<K,V> node) {
    HashEntry<K,V>[] oldTable = table;
    int oldCapacity = oldTable.length;
    int newCapacity = oldCapacity << 1;
    threshold = (int)(newCapacity * loadFactor);
    HashEntry<K,V>[] newTable =
        (HashEntry<K,V>[]) new HashEntry[newCapacity];
    int sizeMask = newCapacity - 1;
    for (int i = 0; i < oldCapacity; i++) {
        HashEntry<K,V> e = oldTable[i];
        if (e != null) {
            HashEntry<K,V> next = e.next;
            int idx = e.hash & sizeMask;
            // Single node
            if (next == null) // Single node on list
                newTable[idx] = e;
            else { // Reuse consecutive sequence at same slot
                HashEntry<K,V> lastRun = e;
                int lastIdx = idx;
                // Traverse the chain computing each node's new index, and
                // remember the node and index at the last point where two
                // adjacent new indices differ
                for (HashEntry<K,V> last = next;
                     last != null;
                     last = last.next) {
                    int k = last.hash & sizeMask;
                    if (k != lastIdx) {
                        lastIdx = k;
                        lastRun = last;
                    }
                }
                // Move the recorded tail run to the head of its new bucket
                newTable[lastIdx] = lastRun;
                // Clone remaining nodes:
                // traverse from the head up to the recorded node, rehashing
                // each into newTable by head insertion
                for (HashEntry<K,V> p = e; p != lastRun; p = p.next) {
                    V v = p.value;
                    int h = p.hash;
                    int k = h & sizeMask;
                    HashEntry<K,V> n = newTable[k];
                    newTable[k] = new HashEntry<K,V>(h, p.key, v, n);
                }
            }
        }
    }
    int nodeIndex = node.hash & sizeMask; // add the new node
    node.setNext(newTable[nodeIndex]);
    newTable[nodeIndex] = node;
    table = newTable;
}
Summary: because rehashed indices repeat (in fact, as I later realized, there are only two possible values), the tail run of the old chain whose nodes all rehash to the same index is cut off and moved in one step, and only the remaining front part of the chain is traversed and copied.
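The lastRun idea can be isolated from the surrounding code. A sketch of my own — it operates on an array of hashes instead of a HashEntry chain:

```java
public class LastRunDemo {
    // Find the start of the longest tail run whose new index is constant,
    // the way ConcurrentHashMap 1.7's rehash tracks lastRun/lastIdx
    static int lastRun(int[] chainHashes, int sizeMask) {
        int lastIdx = chainHashes[0] & sizeMask;
        int run = 0;
        for (int i = 1; i < chainHashes.length; i++) {
            int k = chainHashes[i] & sizeMask;
            if (k != lastIdx) {
                lastIdx = k;
                run = i;
            }
        }
        return run;
    }

    public static void main(String[] args) {
        // New indices along the chain: 5, 21, 5, 21, 21 (sizeMask = 31)
        int[] chain = {5, 21, 5, 21, 21};
        int run = lastRun(chain, 31);
        // Nodes from position `run` onward move as one block;
        // only nodes 0..run-1 are cloned one by one
        System.out.println("lastRun starts at " + run); // 3
    }
}
```

Since a chain's nodes land in only two new buckets, such a constant tail run is common, and reusing it saves allocating new HashEntry objects for those nodes.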
2. JDK 1.8 uses the same Node array as HashMap. For thread safety it completely abandons the 1.7 design in favor of synchronized blocks + CAS, with the lock granularity reduced to a single Node (bucket head).
final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            // Optimistic CAS: if tab[i] is null, install the new Node
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break; // no lock when adding to empty bin
        }
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        else {
            V oldVal = null;
            // Lock the head node before traversing the bin
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    if (fh >= 0) {
                        binCount = 1;
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            // Traverse, comparing keys
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node<K,V> pred = e;
                            // Tail insertion
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key,
                                                          value, null);
                                break;
                            }
                        }
                    }
                    else if (f instanceof TreeBin) {
                        Node<K,V> p;
                        binCount = 2;
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                              value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                }
            }
            if (binCount != 0) {
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    addCount(1L, binCount);
    return null;
}
Summary: not far from the 1.8 HashMap — when the bucket the node maps to is empty, insert with CAS; when it is not and the chain must be traversed, use synchronized.
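The "CAS into an empty bin" step can be mimicked with java.util.concurrent.atomic.AtomicReferenceArray, whose compareAndSet plays the role of the internal casTabAt. A sketch — the real code stores Node objects, not strings:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class CasBinDemo {
    public static void main(String[] args) {
        // Stand-in for the Node[] table
        AtomicReferenceArray<String> tab = new AtomicReferenceArray<>(16);

        // The first writer finds the bin empty and wins the CAS
        boolean first = tab.compareAndSet(3, null, "nodeA");

        // A racing writer sees the CAS fail; in the real putVal it would
        // loop around and take the synchronized path on the now non-null head
        boolean second = tab.compareAndSet(3, null, "nodeB");

        System.out.println(first + " " + second + " " + tab.get(3));
        // true false nodeA
    }
}
```

The win over 1.7 is that an insert into an empty bin takes no lock at all, and a contended bin only locks its own head node rather than a whole Segment.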
private final void treeifyBin(Node<K,V>[] tab, int index) {
    Node<K,V> b; int n, sc;
    if (tab != null) {
        // Same pre-treeify check as HashMap: if the array length is
        // below 64, resize instead of treeifying
        if ((n = tab.length) < MIN_TREEIFY_CAPACITY)
            tryPresize(n << 1);
        else if ((b = tabAt(tab, index)) != null && b.hash >= 0) {
            // Lock while treeifying
            synchronized (b) {
                if (tabAt(tab, index) == b) {
                    TreeNode<K,V> hd = null, tl = null;
                    for (Node<K,V> e = b; e != null; e = e.next) {
                        TreeNode<K,V> p =
                            new TreeNode<K,V>(e.hash, e.key, e.val,
                                              null, null);
                        if ((p.prev = tl) == null)
                            hd = p;
                        else
                            tl.next = p;
                        tl = p;
                    }
                    setTabAt(tab, index, new TreeBin<K,V>(hd));
                }
            }
        }
    }
}
Resizing in JDK 1.8's ConcurrentHashMap is tough; I'll cover it in a later update.