Differences between HashMap in JDK 1.7 and 1.8 (with ConcurrentHashMap; a personal understanding)

Preface: this post is only a summary; the details need to be worked out against the source code. The goal is to follow the JDK's iterations, understand the designers' way of thinking, and ask why things were designed this way (talk is cheap, show me the code).

For common HashMap interview questions, see this excellent summary: https://blog.csdn.net/v123411739/article/details/106324537

一、HashMap optimizations in JDK 1.8

 First, a comparison chart from another author's summary; the points below walk through the differences against the source code.

  1. Initialization is folded into resize() (1.7 uses a separate method, inflateTable). tips: not obvious to me that one fewer method improves efficiency; more likely it just unifies the lazy-init and grow code paths.

//JDK1.7
private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}
//In JDK 1.8 the equivalent logic lives in resize(), shown later

  2. When a capacity is passed to the constructor, 1.8 computes the table capacity with five |= operations, five shifts, one addition and one subtraction (1.7 calls Integer.highestOneBit() on a left-shifted value, for a total of five |= operations, seven shifts and two subtractions).

tips: 1.8 simply inlines the rounding and drops the two extra shift operations hidden inside the Integer.highestOneBit() call. The saving is marginal: additions, subtractions and bit operations cost about the same. It is multiplication and especially division that are many times slower than shifts and masks, and avoiding those is where the real optimization lies.

//JDK1.7
private static int roundUpToPowerOf2(int number) {
    // assert number >= 0 : "number must be non-negative";
    return number >= MAXIMUM_CAPACITY
            ? MAXIMUM_CAPACITY
            : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
}
public static int highestOneBit(int i) {
    // HD, Figure 3-1
    i |= (i >>  1);
    i |= (i >>  2);
    i |= (i >>  4);
    i |= (i >>  8);
    i |= (i >> 16);
    return i - (i >>> 1);
}
//JDK1.8
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
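To see the rounding in action, the 1.8 tableSizeFor can be lifted into a standalone class. The method body is verbatim from the JDK; the wrapper class name TableSizeDemo is mine:

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Verbatim JDK 1.8 logic: smear the highest set bit of (cap - 1)
    // downward so all lower bits become 1, then add 1 to land on the
    // next power of two >= cap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (the cap - 1 keeps exact powers from doubling)
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

The initial `cap - 1` is why an exact power of two maps to itself rather than the next power up.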

  3. The hash perturbation uses 1 shift and 1 XOR (1.7 uses 4 shifts and 5 XORs). tips: same idea as above, a small per-call saving; 1.8 can also afford a weaker spread because red-black trees now cap the damage of a pathological bucket.

//JDK1.7
final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    h ^= k.hashCode();
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
//JDK1.8
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
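Why fold the high 16 bits in at all? Because the index is computed as hash & (n - 1), a small table only ever looks at the low bits. A quick sketch (the class and the `wrap` helper are mine, purely for the demo; `hash` is the 1.8 method verbatim):

```java
public class HashSpreadDemo {
    // JDK 1.8's perturbation: XOR the high 16 bits into the low 16, so keys
    // whose hash codes differ only in high bits still reach different buckets.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Hypothetical helper whose hashCode is whatever int we choose.
    static Object wrap(int h) {
        return new Object() { @Override public int hashCode() { return h; } };
    }

    public static void main(String[] args) {
        int n = 16;              // small table: index = hash & (n - 1) uses only 4 bits
        int a = 0x10000, b = 0x20000; // differ only above bit 16
        // Without perturbation both land in bucket 0:
        System.out.println((a & (n - 1)) + " " + (b & (n - 1))); // 0 0
        // With perturbation the high bits influence the index:
        System.out.println((hash(wrap(a)) & (n - 1)) + " " + (hash(wrap(b)) & (n - 1))); // 1 2
    }
}
```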

  4. Insertion uses tail insertion, which avoids the rare circular linked list that head insertion could produce under concurrent resizing (head insertion is cheap but reverses the chain; tail insertion preserves insertion order and fixes the cycle bug, at the cost of walking the chain on every insert).

tips: HashMap never really benefited from a linked list's fast insertion anyway, because put has to traverse the chain to compare keys, and list traversal is slow. But the data transfer after a resize is a pure insertion pass, so there the change did cost efficiency, which the designers won back with the optimization described in point 6 below.

//JDK1.7
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
  //the new Entry points at e, i.e. head insertion
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
Entry(int h, K k, V v, Entry<K,V> n) {
    value = v;
    next = n;
    key = k;
    hash = h;
}
//In JDK 1.8 the equivalent is handled in putVal(), shown later
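The order difference between the two insertion styles is easy to see with a toy list. This is my own minimal Node class, not JDK code:

```java
import java.util.ArrayList;
import java.util.List;

public class InsertOrderDemo {
    static class Node {
        int v; Node next;
        Node(int v, Node next) { this.v = v; this.next = next; }
    }

    // 1.7-style head insertion: O(1), but reverses arrival order.
    static Node headInsert(Node head, int v) { return new Node(v, head); }

    // 1.8-style tail insertion: walks to the end, preserves arrival order.
    static Node tailInsert(Node head, int v) {
        Node n = new Node(v, null);
        if (head == null) return n;
        Node p = head;
        while (p.next != null) p = p.next;
        p.next = n;
        return head;
    }

    static List<Integer> toList(Node head) {
        List<Integer> out = new ArrayList<>();
        for (Node p = head; p != null; p = p.next) out.add(p.v);
        return out;
    }

    public static void main(String[] args) {
        Node h = null, t = null;
        for (int v : new int[]{1, 2, 3}) { h = headInsert(h, v); t = tailInsert(t, v); }
        System.out.println(toList(h)); // [3, 2, 1] - reversed
        System.out.println(toList(t)); // [1, 2, 3] - arrival order
    }
}
```

The reversal itself is harmless for a Map; the danger in 1.7 was that a reversed chain being relinked by two racing resizes could form a cycle.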

  5. 1.8 inserts first and resizes afterwards, doing one unified transfer after the resize (1.7 resizes first, then inserts).

tips: in 1.7, head insertion needs no traversal, so the map can decide whether to resize before inserting and only needs the key comparison. In 1.8, tail insertion must traverse anyway, so key comparison is folded into the same walk, and only after inserting does the code know whether the bin has reached the treeify count (it may also resize instead of treeifying). So this is probably just a consequence of the changed flow, and ultimately of the switch to tail insertion.

//JDK1.7
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}
//JDK1.8
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    ……
  //check whether the head slot is empty
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
     //compare keys
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
     //is it a tree node?
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
       //slot non-empty, key differs from the head, not a tree node: traverse the chain
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
            //tail insertion
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
    ……
    }
    ++modCount;
    ……
    return null;
}

  6. During the transfer in resize, only the new high bit is examined: if it is 0 the node keeps its original index, if it is 1 it moves to original index + oldCap, a clever use of & (details in the comments below), valid only because the capacity is a power of two. (1.7 re-indexes every node from scratch; doing that in 1.8 would be slow under hash collisions, since each tail insertion walks the chain while head insertion does not.) tips: the high/low-bit trick rests on hash & (n - 1) being equivalent to hash mod n, which holds exactly when n is a power of two.

//JDK1.7

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
       //head insertion
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
//JDK1.8
final Node<K,V>[] resize() {
    ……
  
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
       //head of the bin is non-null
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
          //only one node in the bin: place it directly at its new index
                if (e.next == null) 
                    newTab[e.hash & (newCap - 1)] = e;
          //a tree bin splits into two trees, analogous to the low/high split for lists below
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order: tail insertion keeps the relative order
            //lo = low bit, hi = high bit; the four variables are the heads and tails of the two sub-lists
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
              //index = hash & (capacity - 1), and capacity is a power of two. Take 16 doubling to 32: for any hash, (hash & 01111) and (hash & 11111) can differ only in the highest bit
              //if (hash & 10000) == 0, that highest bit is 0, so the index is unchanged after the resize; loHead/loTail collect this sub-list
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
              //if (hash & 10000) != 0, the highest bit is 1, so the index shifts by 16 (the old capacity); hiHead/hiTail collect this sub-list
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
            //link the low sub-list into the old index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
            //link the high sub-list into old index + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
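The claim that testing the oldCap bit alone is equivalent to a full re-index against the new mask can be checked directly. A minimal sketch; the class name and the `predict` helper are mine, not JDK code:

```java
public class ResizeSplitDemo {
    // Predict a node's slot after doubling by testing only the oldCap bit,
    // the single bit by which the old mask and the new mask differ.
    static int predict(int hash, int oldCap) {
        int oldIdx = hash & (oldCap - 1);
        return ((hash & oldCap) == 0) ? oldIdx : oldIdx + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash : new int[]{5, 21, 100, 117}) {
            // the prediction always matches re-indexing against the new mask
            System.out.println(hash + ": predicted=" + predict(hash, oldCap)
                    + " actual=" + (hash & (newCap - 1)));
        }
    }
}
```

This is exactly why the resize loop only needs (e.hash & oldCap) == 0 to route each node to one of two lists.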

  7. Structurally, a red-black tree was added: a bin is treeified when its chain reaches length 8 and the table length is at least 64; with a shorter table the map resizes instead (a red-black tree searches faster but costs space: a tree node is roughly twice the size of a list node).

tips: I still need to brush up on the data-structure details, but the point is that the tree bounds a degenerate bin at O(log n) lookups instead of O(n).

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
  //lazy capacity initialization: the table is allocated on first put
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    //the slot located via (n - 1) & hash is empty: create the node directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        //head node's key matches: the value will be replaced
     if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
     //tree node
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
            //tail insertion
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
               //chain reached TREEIFY_THRESHOLD (8): treeify (treeifyBin re-checks the table length)
                        treeifyBin(tab, hash);
                    break;
                }
          //key match found during traversal: the value will be replaced
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
    ……
    return null;
}
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
   //table shorter than 64: resize instead of treeifying
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}
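The treeify-or-resize decision reduces to comparing the table length against MIN_TREEIFY_CAPACITY. A tiny sketch of just that policy (the constants are the JDK's values; the `onLongChain` helper is my own framing, not JDK code):

```java
public class TreeifyPolicyDemo {
    static final int TREEIFY_THRESHOLD = 8;      // chain length that triggers the check
    static final int MIN_TREEIFY_CAPACITY = 64;  // table length required to actually treeify

    // What happens once a bin's chain reaches TREEIFY_THRESHOLD?
    // A short table suggests the collisions come from the table being
    // too small, so growing it is preferred over building a tree.
    static String onLongChain(int tableLength) {
        return (tableLength < MIN_TREEIFY_CAPACITY) ? "resize" : "treeify";
    }

    public static void main(String[] args) {
        System.out.println(onLongChain(16)); // resize
        System.out.println(onLongChain(64)); // treeify
    }
}
```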

 

二、ConcurrentHashMap

  1. The JDK 1.7 implementation is Segment + HashEntry arrays (each Segment is essentially a HashMap of its own). Segment extends ReentrantLock, which is what makes ConcurrentHashMap thread-safe; the lock granularity is a single Segment, and the rehash during resizing likewise happens within one Segment at a time.

public V put(K key, V value) {
    Segment<K,V> s;
    if (value == null)
        throw new NullPointerException();
   //the first hash picks the Segment; a second hash locates the index in its HashEntry array
    int hash = hash(key);
    int j = (hash >>> segmentShift) & segmentMask;
    if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
         (segments, (j << SSHIFT) + SBASE)) == null) //  in ensureSegment
        s = ensureSegment(j);
    return s.put(key, hash, value, false);
}
//Segment's put method
final V put(K key, int hash, V value, boolean onlyIfAbsent) {
  //try the lock, otherwise scan the chain while acquiring it (pre-creating the node)
    HashEntry<K,V> node = tryLock() ? null :
        scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        HashEntry<K,V>[] tab = table;
     //mask to locate the head of the chain
        int index = (tab.length - 1) & hash;
        HashEntry<K,V> first = entryAt(tab, index);
        for (HashEntry<K,V> e = first;;) {
            //head non-null: traverse the chain, comparing keys
        if (e != null) {
                K k;
                if ((k = e.key) == key ||
                    (e.hash == hash && key.equals(k))) {
                    oldValue = e.value;
                    if (!onlyIfAbsent) {
                        e.value = value;
                        ++modCount;
                    }
                    break;
                }
                e = e.next;
            }
            else {
         //head insertion, step 1: the pre-created node's next points at the old head
                if (node != null)
                    node.setNext(first);
         //otherwise create the node now, also pointing at the old head
                else
                    node = new HashEntry<K,V>(hash, key, value, first);
                int c = count + 1;
         //count exceeds the threshold: resize (rehash)
                if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                    rehash(node);
         //head insertion, step 2: tab[index] = node
                else
                    setEntryAt(tab, index, node);
                ++modCount;
                count = c;
                oldValue = null;
                break;
            }
        }
    } finally {
        unlock();
    }
    return oldValue;
}

Summary: much like the 1.7 HashMap. The HashMap structure is wrapped into a lockable Segment, an array of Segments makes up the ConcurrentHashMap, and put takes the Segment's lock.
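The Segment idea, hash once to pick a lock stripe and then lock only that stripe, can be imitated with plain ReentrantLocks guarding independent HashMaps. This is a toy sketch under my own names and structure, not the JDK's implementation (it skips per-stripe resizing, lock-free reads, and much else):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class StripedMapSketch<K, V> {
    private final int stripes = 16; // like the default concurrencyLevel
    private final ReentrantLock[] locks = new ReentrantLock[stripes];
    private final Map<K, V>[] maps;

    @SuppressWarnings("unchecked")
    public StripedMapSketch() {
        maps = new Map[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new ReentrantLock();
            maps[i] = new HashMap<>();
        }
    }

    // first hash chooses the stripe; the inner HashMap hashes again
    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % stripes;
    }

    public V put(K key, V value) {
        int s = stripeFor(key);
        locks[s].lock(); // writers to different stripes never contend
        try {
            return maps[s].put(key, value);
        } finally {
            locks[s].unlock();
        }
    }

    public V get(K key) {
        int s = stripeFor(key);
        locks[s].lock();
        try {
            return maps[s].get(key);
        } finally {
            locks[s].unlock();
        }
    }
}
```

With 16 stripes, up to 16 writers can proceed in parallel, which is exactly the concurrency bound of the 1.7 design.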

private void rehash(HashEntry<K,V> node) {
    HashEntry<K,V>[] oldTable = table;
    int oldCapacity = oldTable.length;
    int newCapacity = oldCapacity << 1;
    threshold = (int)(newCapacity * loadFactor);
    HashEntry<K,V>[] newTable =
        (HashEntry<K,V>[]) new HashEntry[newCapacity];
    int sizeMask = newCapacity - 1;
    for (int i = 0; i < oldCapacity ; i++) {
        HashEntry<K,V> e = oldTable[i];
        if (e != null) {
            HashEntry<K,V> next = e.next;
            int idx = e.hash & sizeMask;
       //single node: place it directly
            if (next == null)   //  Single node on list
                newTable[idx] = e;
            else { // Reuse consecutive sequence at same slot
                HashEntry<K,V> lastRun = e;
                int lastIdx = idx;
          //walk the chain computing each node's new index; remember the last node at which the new index changed
                for (HashEntry<K,V> last = next;
                     last != null;
                     last = last.next) {
                    int k = last.hash & sizeMask;
                    if (k != lastIdx) {
                        lastIdx = k;
                        lastRun = last;
                    }
                }
          //the suffix from lastRun onward shares one new index: install it in a single step
                newTable[lastIdx] = lastRun;
                // Clone remaining nodes
          //re-walk from the head up to lastRun, cloning each node into newTable by head insertion
                for (HashEntry<K,V> p = e; p != lastRun; p = p.next) {
                    V v = p.value;
                    int h = p.hash;
                    int k = h & sizeMask;
                    HashEntry<K,V> n = newTable[k];
                    newTable[k] = new HashEntry<K,V>(h, p.key, v, n);
                }
            }
        }
    }
    int nodeIndex = node.hash & sizeMask; // add the new node
    node.setNext(newTable[nodeIndex]);
    newTable[nodeIndex] = node;
    table = newTable;
}

Summary: a rehash maps each old bin to only two possible new indexes, so the trailing run of consecutive nodes that share one new index is cut off and moved in a single step, and only the nodes before that run are cloned and re-inserted individually.
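The lastRun scan can be isolated and run on a sample chain. The toy `Node` class and `chain` builder are mine; `findLastRun` mirrors the loop inside rehash above:

```java
public class LastRunDemo {
    static class Node {
        final int hash; Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    // Build a chain from an array of hash values (demo helper).
    static Node chain(int... hashes) {
        Node head = null;
        for (int i = hashes.length - 1; i >= 0; i--) head = new Node(hashes[i], head);
        return head;
    }

    // Same scan as the 1.7 rehash: return the first node of the longest
    // suffix whose (hash & sizeMask) is uniform; that whole tail can be
    // relinked into the new table with one assignment.
    static Node findLastRun(Node head, int sizeMask) {
        Node lastRun = head;
        int lastIdx = head.hash & sizeMask;
        for (Node p = head.next; p != null; p = p.next) {
            int k = p.hash & sizeMask;
            if (k != lastIdx) { lastIdx = k; lastRun = p; }
        }
        return lastRun;
    }

    public static void main(String[] args) {
        // hashes 1, 17, 33, 65, 97 with sizeMask 31 map to indexes 1, 17, 1, 1, 1:
        // the suffix starting at hash 33 is uniform, so only 1 and 17 get cloned.
        Node run = findLastRun(chain(1, 17, 33, 65, 97), 31);
        System.out.println(run.hash); // 33
    }
}
```

The trade-off: an extra pass over the chain in exchange for not allocating new HashEntry objects for the shared suffix.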

  2. The JDK 1.8 implementation is a Node array, the same as HashMap. The thread-safety scheme of 1.7 is abandoned entirely in favor of synchronized blocks + CAS, with the lock granularity reduced to a single bin (its head Node).

final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
       //optimistic CAS: if tab[i] is null, publish the new Node into it
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break;                   // no lock when adding to empty bin
        }
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        else {
            V oldVal = null;
       //lock the bin's head node before traversing
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    if (fh >= 0) {
                        binCount = 1;
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                //traverse, comparing keys
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node<K,V> pred = e;
                 //tail insertion
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key,
                                                          value, null);
                                break;
                            }
                        }
                    }
                    else if (f instanceof TreeBin) {
                        Node<K,V> p;
                        binCount = 2;
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                       value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                }
            }
            if (binCount != 0) {
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    addCount(1L, binCount);
    return null;
}

Summary: not far from the 1.8 HashMap. When the target bin is empty the node is inserted with a lock-free CAS; when the chain must be traversed, it happens inside synchronized on the bin's head.
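That CAS-for-empty-bin, lock-the-head-otherwise pattern can be sketched with AtomicReferenceArray instead of the JDK's Unsafe-based tabAt/casTabAt. The class and structure below are my own toy version, skipping resizing, treeification and the MOVED forwarding state entirely:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class CasBinSketch {
    static final class Node {
        final Object key;
        volatile Object val;
        volatile Node next;
        Node(Object key, Object val, Node next) { this.key = key; this.val = val; this.next = next; }
    }

    // toy fixed-size table; the real map also resizes, which this sketch skips
    final AtomicReferenceArray<Node> tab = new AtomicReferenceArray<>(16);

    private int indexFor(Object key) {
        return (key.hashCode() & 0x7fffffff) & (tab.length() - 1);
    }

    public void put(Object key, Object val) {
        int i = indexFor(key);
        for (;;) {
            Node f = tab.get(i);
            if (f == null) {
                // empty bin: publish lock-free with CAS; retry if another thread won the race
                if (tab.compareAndSet(i, null, new Node(key, val, null))) return;
            } else {
                // non-empty bin: lock only this bin's head, like 1.8's synchronized (f)
                synchronized (f) {
                    if (tab.get(i) != f) continue; // head changed under us: retry
                    for (Node e = f;; e = e.next) {
                        if (e.key.equals(key)) { e.val = val; return; }
                        if (e.next == null) { e.next = new Node(key, val, null); return; } // tail insertion
                    }
                }
            }
        }
    }

    public Object get(Object key) {
        // reads are lock-free thanks to the volatile fields
        for (Node e = tab.get(indexFor(key)); e != null; e = e.next)
            if (e.key.equals(key)) return e.val;
        return null;
    }
}
```

The re-check `tab.get(i) != f` after acquiring the lock matters: another thread may have replaced the head between our read and our lock, so we must retry rather than mutate a stale chain.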

private final void treeifyBin(Node<K,V>[] tab, int index) {
    Node<K,V> b; int n, sc;
    if (tab != null) {
     //same pre-treeify check: if the table is shorter than 64, resize instead
        if ((n = tab.length) < MIN_TREEIFY_CAPACITY)
            tryPresize(n << 1);
        else if ((b = tabAt(tab, index)) != null && b.hash >= 0) {
       //lock the bin while converting it to a tree
            synchronized (b) {
                if (tabAt(tab, index) == b) {
                    TreeNode<K,V> hd = null, tl = null;
                    for (Node<K,V> e = b; e != null; e = e.next) {
                        TreeNode<K,V> p =
                            new TreeNode<K,V>(e.hash, e.key, e.val,
                                              null, null);
                        if ((p.prev = tl) == null)
                            hd = p;
                        else
                            tl.next = p;
                        tl = p;
                    }
                    setTabAt(tab, index, new TreeBin<K,V>(hd));
                }
            }
        }
    }
}
Resizing in JDK 1.8's ConcurrentHashMap is hard; to be covered in a later update.
posted @ 小皮睡不醒