集合框架-HashMap

1.7版本分析

数据结构实现

数组(基本)+链表(扩展)

属性

//默认hash表容量大小 2^4=16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; 
//最大hash表容量大小，必须是2的整数幂
static final int MAXIMUM_CAPACITY = 1 << 30;
//默认加载因子
static final float DEFAULT_LOAD_FACTOR = 0.75f;
//表未膨胀时共享的空表实例。
static final Entry<?,?>[] EMPTY_TABLE = {};
//hash表resize，大小必须是2的整数幂
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
//表中存在的key-value对实例数量
transient int size;
//resize的阈值 (capacity * load factor)
int threshold;
//hash表的加载因子
final float loadFactor;
//修改的次数，更多在于迭代时使用， fail-fast机制
transient int modCount;
//map容量的默认阈值，高于该阈值的替代散列是用于字符串键。
//替代哈希降低了发生率由于字符串键的弱哈希码计算而导致的冲突。
static final int ALTERNATIVE_HASHING_THRESHOLD_DEFAULT = Integer.MAX_VALUE;
//hash种子初始值
transient int hashSeed = 0;

构造方法

 1 //基本基于这个方法，如：HashMap()，默认16，0.75f
 2 public HashMap(int initialCapacity, float loadFactor) {
 3     if (initialCapacity < 0)
 4         throw new IllegalArgumentException("Illegal initial capacity: " +
 5                                            initialCapacity);
 6     //最大也只有2^30
 7     if (initialCapacity > MAXIMUM_CAPACITY)
 8         initialCapacity = MAXIMUM_CAPACITY;
 9     if (loadFactor <= 0 || Float.isNaN(loadFactor))
10         throw new IllegalArgumentException("Illegal load factor: " +
11                                            loadFactor);
12     this.loadFactor = loadFactor;
13     //resize的阈值 就是给定的初始容量
14     threshold = initialCapacity;
15     //钩子方法
16     init();
17 }

重要方法

put

public V put(K key, V value) {
    //判断是否没存放元素
    if (table == EMPTY_TABLE) {
        //初始化数组表，阈值，hash种子
        inflateTable(threshold);
    }
    if (key == null)
        //可存储key为null的，如果没存则会专门创建一个entry在bucket中且index为0.
        //有了则替换value
        return putForNullKey(value);
    //通过位运算(计算机底层运算，远比%速度快) 得到hash散列值
    int hash = hash(key);
    //通过hash值 与 表长度-1(底层方法) 进行与运算，控制下标一定会在长度范围内
    //如key='a'时，hash=103，length-1=15；0110 0111&0000 1111 = 0000 0111=7
    int i = indexFor(hash, table.length);
    //拿到当前数组下标位上的链表，没有则跳过，否则以它是否有下一节点来进行遍历
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        //先判断key是否一致，是则用新值替换旧值
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    //在此hash新增一个entry，底层createEntry方法，如果当前index处已经有了entry
    //那Entry<K,V> e = table[bucketIndex]; 会先拿到当前index的entry，即上一次添加的
    //table[bucketIndex] = new Entry<>(hash, key, value, e);再将上一次的entry设置为
    //新增的entry的next，且将新增的entry设置为当前index的首个元素(头插法)
    //(即每次新增的都会成为链表的第一个元素，位于数组下标处)
    addEntry(hash, key, value, i);
    return null;
}

private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize 根据当前的阈值得到2的整数幂
    int capacity = roundUpToPowerOf2(toSize);
    //设置新的阈值 capacity * loadFactor
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    //设置新的容量大小
    table = new Entry[capacity];
    //初始化hash种子
    initHashSeedAsNeeded(capacity);
}

final int hash(Object k) {
    //初始值为0，put、扩容会计算有值
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
//获取当前容量下的hash 索引
static int indexFor(int h, int length) {
    return h & (length-1);
}

put流程图

get

public V get(Object key) {
    //特殊的：key为null
    if (key == null)
        return getForNullKey();
    //
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }
    //同样的先根据key得到hash值
    int hash = (key == null) ? 0 : hash(key);
    //再计算出index，去数组中对应的下标拿 entry数组遍历
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        //拿到相同hash值的 entry，没有则返回null
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

入队数组&扩容实现

上述put源码时，分析了当hash碰撞时，链表的插入为头部插入法，即每次新增的key都是链表头节点。

扩容源码

void resize(int newCapacity) {
    //保存旧数组
    Entry[] oldTable = table;
    //保存旧容量
    int oldCapacity = oldTable.length;
    //作下容量判断处理，如果旧容量已经是最大了，则设置阈值为最大且直接返回
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    //用新容量创建的新数组
    Entry[] newTable = new Entry[newCapacity];
    //initHashSeedAsNeeded 用新容量得到新的hash种子
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    //将新数组替换旧的数组
    table = newTable;
    //阈值设置为2*旧容量*0.75 与 最大取小
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
//真正的从旧到新的转换
void transfer(Entry[] newTable, boolean rehash) {
    //2*旧容量
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        //遍历hash表中每个位置(下标)上的链表
        while(null != e) {
            //记录旧的next
            Entry<K,V> next = e.next;
            if (rehash) {
                //重新计算hash值
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            //根据上面计算的hash值 & 新容量 计算新hash桶中的下标位置
            int i = indexFor(e.hash, newCapacity);
            //头插法，所以需要将之前可能有的entry放到e的next去
            e.next = newTable[i];
            //将当前e放到hash表中i位置头部
            newTable[i] = e;
            //继续遍历
            e = next;
        }
    }
}

单线程put&扩容图

假设：有一个长度为2的hash表，已有的数据如图1.1

扩容：继续put数据(k4,v4)，按hashmap的扩容(即旧容量*2)

图1.1

扩容&转移数据到新table中

第一步转移k1：

第二步转移k2：

第三步转移k3：

第四步put：

多线程put&扩容图

场景：在图1.1的基础上，有两个线程T1和T2同时进行put，然后需要扩容的操作。

结合代码(主要就是while循环里)

while(null != e) {
    Entry<K,V> next = e.next;//No.1
    if (rehash) {
        e.hash = null == e.key ? 0 : hash(e.key);
    }
    int i = indexFor(e.hash, newCapacity);
    e.next = newTable[i];
    newTable[i] = e;
    e = next;
}

1.当T1在No.1行挂起了的时候，此时的T1和T2如下图2.1和图2.2

由于T1中是旧的hash表，e=k1,next=k2,而T2已经转换过去新hash表了。

图2.1

图2.2

2.T1在此时被唤醒(e=k1,next=k2)

2.1它已经新建了一个hash数组，当计算了hash并执行到e.next = newTable[i]时，k1的next指向的是T1线程中的新hash表(全部为空，如图2.1)，index=3处的头部元素为null，即e.next=null

2.2当执行到newTable[i] = e; 将index=3处设置为k1。

2.3当执行到e = next; 即e=k2。此时状态如图2.3

图2.3

2.4继续循环，e=k2,e.next=k1, 执行e.next = newTable[i],即e.next=k1

2.5当执行到newTable[i] = e; 将index=3处头元素设置为k2。

2.6当执行到e = next; 即e=k1。此时状态如图2.4

图2.4

2.7继续循环，e=k1,e.next=null,

执行e.next = newTable[i],即e.next=k2(即k1又会指向k2，此时🉑k2的next指向的是k1)

2.8当执行到newTable[i] = e; 将index=3处头元素设置为k1。

2.9当执行到e = next; e=null 。此时状态如图2.5

图2.5

看最终图结果得知，k1和k2之间形成了相互的next节点指向，导致死环了。这也是1.7版本严重的问题所在。

1.8版本分析

引入了bucket，位桶的概念，来表示每个hash表index位置处的节点数。

数据结构实现

数组(基本)+链表(扩展)+红黑树(扩展)

链表转红黑树问题

为什么规定长度>=8时，转红黑树？8是怎么来的？

理想情况下使用随机的哈希码，容器中节点分布在hash桶中的频率遵循泊松分布。按照泊松分布的计算公式计算出了桶中元素个数和概率的对照表，可以看到链表中元素个数为8时的概率已经非常小，再多的就更少了。

属性

//默认hash表容量大小 2^4=16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; 
//最大hash表容量大小，必须是2的整数幂
static final int MAXIMUM_CAPACITY = 1 << 30;
//默认加载因子
static final float DEFAULT_LOAD_FACTOR = 0.75f;
//链表转红黑树阈值
static final int TREEIFY_THRESHOLD = 8;
//红黑树转回链表阈值
static final int UNTREEIFY_THRESHOLD = 6;
//链表转红黑树时，hash表容量需到达的最小容量阈值，否则优先扩容
static final int MIN_TREEIFY_CAPACITY = 64;
//表未膨胀时共享的空表实例。1.8用内部类Node实现
transient Node<K,V>[] table;
//内部类EntrySet实现，用来存放不重复keys
transient Set<Map.Entry<K,V>> entrySet;
//表中存在的key-value对实例数量
transient int size;
//resize的阈值 (capacity * load factor)
int threshold;
//hash表的加载因子
final float loadFactor;
//修改的次数，更多在于迭代时使用， fail-fast机制
transient int modCount;

构造方法

//只给了默认的加载因子
public HashMap() {
     this.loadFactor = DEFAULT_LOAD_FACTOR;  
 }
//基本基于这个方法，如：HashMap()，默认16，0.75f
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    //最大也只有2^30
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    //初始化容量且设定阈值
    this.threshold = tableSizeFor(initialCapacity);
}
//通过多次按位右移，来获得一个2的整数次幂
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

重要方法

put

public V put(K key, V value) {
    //先计算得到hash值
    return putVal(hash(key), key, value, false, true);
}
//采用低16位与高16位进行异或处理获取hash值，更加平均分布
//参考：https://blog.csdn.net/qq_42034205/article/details/90384772
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    //判断table是否没初始化过，是则进行阈值、容量等的初始化
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    //求indexof，与1.7类似
    if ((p = tab[i = (n - 1) & hash]) == null)
        //hash表index处没元素，直接new一个node存放
        tab[i] = newNode(hash, key, value, null);
    else {
        //否则，追加链表(红黑树)处理
        Node<K,V> e; K k;
        //判断插入的key是否与头部key相等，是就替换值，后方判断
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        //不相同，判断是否是红黑树
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        //链表处理
        else {
            //binCount 用来计数链表的长度，判断转红黑树的阈值，是当前下标处的链表转
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    //判断转的阈值，不过底层还会判断整个hash表的容量是否大于64才会进行转
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                //用来判断链表中的key是否与当前插入的key相等，是就替换值，后方判断
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        //key相等时，新值替换旧值
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    //再判断下当前hash表的容量是否超过了扩容阈值
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

get

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    //table不为空、table长度大于0、且第一个元素不为null 否则直接返回null
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        //判断hash值和key
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        //判断下一节点是否为空
        if ((e = first.next) != null) {
            //如果已经是红黑树了，用它自己的获取节点的方法
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            //否则走链表遍历判断是否有key
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

入队数组&扩容实现

1.7的是头插法，而1.8采用的是尾插法。具体看代码和图形分析。

扩容&转移源码

//1.8+中都放在一个方法了
//作用：1初始化阈值、容量；2扩容；3扩容后的转移
final Node<K,V>[] resize() {
    //记录旧的hash表
    Node<K,V>[] oldTab = table;
    //记录旧的表容量
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    //记录旧的扩容阈值
    int oldThr = threshold;
    int newCap, newThr = 0;
    //初始化过了
    if (oldCap > 0) {
        //判断容量是否到达最大了
        if (oldCap >= MAXIMUM_CAPACITY) {
            //是的话，直接设置阈值也为最大
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }//否则判断 newCap=2*oldCap是否大于最大容量，oldCap是否大于默认的容量16，都是则扩容
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            //newThr=2*oldThr(如12*2=24)
            newThr = oldThr << 1; // double threshold
    }
    //如前面调用了hashmap的带参构造方法，第一次放map时，oldCap会=null，而oldThr会大于0
    //把容量设置为阈值 
    else if (oldThr > 0) 
        newCap = oldThr;
    else {//使用默认的初始化              
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    //用新容量创建新的hash空表，并给到table
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        //遍历旧hash表上 每个bucket
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                //判断下一节点
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                //判断红黑树
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                //链表
                else {
                    //这里扩容，将同一个桶中的元素分为了两个链表来处理，低16位和高16位判断
                    //即在新的hash表中，根据当前index划分了两个index位置，
                    //见最后两个判断高低位是否有链表的方法，
                    //低位是直接在index处，
                    //而高位是index+oldCap，避免重新计算hash位置，缩短时间
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        //低16位
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                //直接放到链表首部
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        //高16位
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

扩容示意图

假设：hash表中，i=3处已有一个红黑树的数据，以及后面某index处有一个链表不超过8的数据(表容量没超过64)。

如图：

按resize方法的扩容来，无法达到转红黑树的条件，即走的是链表的处理：

Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
    next = e.next;
    if ((e.hash & oldCap) == 0) {
        if (loTail == null)
            loHead = e;
        else
            loTail.next = e;
        loTail = e;
    }
    else {
        if (hiTail == null)
            hiHead = e;
        else
            hiTail.next = e;
        hiTail = e;
    }
} while ((e = next) != null);
if (loTail != null) {
    loTail.next = null;
    newTab[j] = loHead;
}
if (hiTail != null) {
    hiTail.next = null;
    newTab[j + oldCap] = hiHead;
}

争对J处的数据，假定k4,k5在一个hash，k6,k7在一个hash上。

很明显，当采用了高低位拆分转移方式时，避免了链表死环的产生。

参考

参考链接：

keyset方面、1.7的分析，hash说明清晰、1.8的hahscode说明

posted @ 2021-07-29 23:08 Flyinglion 阅读(40) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

广秀

这是最好的时代

集合框架-HashMap

1.7版本分析

数据结构实现

属性

构造方法

重要方法

put

put流程图

get

入队数组&扩容实现

扩容源码

单线程put&扩容图

多线程put&扩容图

1.8版本分析

数据结构实现

链表转红黑树问题

属性

构造方法

重要方法

put

get

入队数组&扩容实现

扩容&转移源码

扩容示意图

参考

公告