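HashMap's Underlying Array Length and Bit Operations

The HashMap data structure

Anyone who has read the HashMap source in the JDK knows that its underlying data structure is an array plus linked lists.
JDK 1.8 added an optimization: when a bucket's linked list grows beyond 8 nodes, it is converted to a red-black tree, provided the array has at least 64 slots (otherwise the table is resized instead).
The two diagrams below show the structure.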
[Figure: HashMap structure before JDK 1.8]

[Figure: HashMap structure in JDK 1.8]

Of course, that is not the focus of this article. Let's analyze the bit operations in the underlying source code.

Array capacity initialization

The array is actually allocated on the first put, inside resize().

1. Without an initialCapacity

// Sets the default load factor: the map resizes once its size exceeds 3/4 of the capacity
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Default initial capacity
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16


final Node<K,V>[] resize() {
    ...
    else {
        // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    ...
    return newTab;
}
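
The arithmetic in the default branch of resize() can be checked in isolation. The sketch below only copies the two constants quoted above into a hypothetical class named DefaultThresholdDemo; it does not touch HashMap itself.

```java
public class DefaultThresholdDemo {
    // Values copied from the constants quoted above.
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16

    // Same expression as newThr in the default branch of resize().
    static int defaultThreshold() {
        return (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }

    public static void main(String[] args) {
        System.out.println("capacity  = " + DEFAULT_INITIAL_CAPACITY);
        System.out.println("threshold = " + defaultThreshold());
    }
}
```

With the defaults, the threshold works out to 12, so a map created with new HashMap<>() is expected to grow when the 13th entry is added.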

2. With an initialCapacity


public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

// Core method behind the initialization
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}


final Node<K,V>[] resize() {
    ...
    int oldThr = threshold;
    ...
    
    else if (oldThr > 0)
        // initial capacity was placed in threshold
        newCap = oldThr;
    ...  
    return newTab;
}

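Analyzing the tableSizeFor method

cap = 0: n = -1, which is still -1 after the bit operations (all bits are already 1), so the method returns 1
cap = 1: n = 0, which is still 0 after the bit operations, so the method returns n + 1 = 1
cap > 1: n = cap - 1 > 0, so the binary form of n has at least one bit set to 1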

Take cap = 17 as an example:

cap = 17
n = cap - 1 = 16 = 0001 0000 (n is positive, so its sign-magnitude, ones' complement, and two's complement forms are identical)

n |= n >>> 1
n >>> 1 = 0000 1000
n = 0001 0000 | 0000 1000 = 0001 1000     24

n |= n >>> 2
n >>> 2 = 0000 0110
n = 0001 1000 | 0000 0110 = 0001 1110     30

n |= n >>> 4
n >>> 4 = 0000 0001
n = 0001 1110 | 0000 0001 = 0001 1111     31

n |= n >>> 8
n >>> 8 = 0000 0000
n = 0001 1111 | 0000 0000 = 0001 1111     31

n |= n >>> 16
n >>> 16 = 0000 0000
n = 0001 1111 | 0000 0000 = 0001 1111     31

Return n + 1 = 31 + 1 = 2^4 + 2^3 + 2^2 + 2^1 + 2^0 + 1 = 2^5 = 32

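The purpose of the bit operations is to set every bit below the highest 1 bit to 1, so that in the end

n = (leading zeros omitted) 1...111

Returning n + 1 then follows from the identity

2^k = 2^(k-1) + 2^(k-2) + ... + 2^0 + 1

so the result is the smallest power of two that is >= cap. Subtracting 1 at the start ensures that a cap which is already a power of two (for example 16) is returned unchanged rather than doubled.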
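
To try the walkthrough on other inputs, the method can be copied into a standalone class. This is a minimal sketch: the class name TableSizeForDemo is made up for illustration, and MAXIMUM_CAPACITY = 1 << 30 is the value from the JDK source, which the excerpt above references but does not show.

```java
public class TableSizeForDemo {
    // Value of HashMap.MAXIMUM_CAPACITY in the JDK source (not shown in the excerpt).
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same body as the tableSizeFor method quoted above.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        // Print the rounded-up capacity for a few sample requests.
        for (int cap : new int[] {0, 1, 2, 16, 17, 1000}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

For these inputs the results should be 1, 1, 2, 16, 32 and 1024: values that are already powers of two come back unchanged, and everything else is rounded up to the next one.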

In addition, when resize() actually grows the table, it doubles the capacity:

final Node<K,V>[] resize() {
    ...
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        //newCap = oldCap << 1  = oldCap x 2
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    ...

}

So whether the underlying array of a HashMap is being initialized or grown, its length is always a power of two.
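
The doubling can be pictured with a small simulation. This is a simplified model, not the JDK implementation: the class CapacityGrowthDemo and the method capacityAfter are hypothetical, it assumes the default capacity 16 and load factor 0.75, and it ignores MAXIMUM_CAPACITY as well as the fact that the real table is not allocated until the first put.

```java
public class CapacityGrowthDemo {
    // Simplified model of the growth policy: start from the defaults
    // (capacity 16, threshold 12) and double both whenever the number
    // of entries exceeds the current threshold.
    static int capacityAfter(int entries) {
        int cap = 16;
        int thr = 12;
        for (int size = 1; size <= entries; size++) {
            if (size > thr) {
                cap <<= 1; // newCap = oldCap << 1
                thr <<= 1; // newThr = oldThr << 1
            }
        }
        return cap;
    }

    public static void main(String[] args) {
        for (int entries : new int[] {12, 13, 24, 25, 100}) {
            System.out.println(entries + " entries -> capacity " + capacityAfter(entries));
        }
    }
}
```

Under these assumptions the capacity moves through 16, 32, 64, and so on, and every value it takes is a power of two.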

The underlying bit operation (n - 1) & hash

Several methods inside HashMap use this bit operation:

final Node<K,V> getNode(int hash, Object key)

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict)
                   
final void treeifyBin(Node<K,V>[] tab, int hash)

final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable)
                               
public V computeIfAbsent(K key,
                             Function<? super K, ? extends V> mappingFunction)
                             
public V compute(K key,
                     BiFunction<? super K, ? super V, ? extends V> remappingFunction)
                     
public V merge(K key, V value,
                   BiFunction<? super V, ? super V, ? extends V> remappingFunction)
                   
final void removeTreeNode(HashMap<K,V> map, Node<K,V>[] tab,
                                  boolean movable)
                                  
// Each of them contains something like
int n;
n = tab.length;
int index = (n - 1) & hash;
// or
n = tab.length;
tab[i = (n - 1) & hash]

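In every case the goal is to determine the element's position in the array. Let's analyze (n - 1) & hash.

From the previous section, the array length is
n = tab.length = 2^m
n - 1 = 2^m - 1 = 2^(m-1) + 2^(m-2) + ... + 2^0
In binary:
n - 1 = (leading zeros omitted) 1...1, with m one-bits

(n - 1) & hash
By the properties of bitwise AND, this keeps the low m bits of the binary hash value.

For example, n = 16 = 2^4
n - 1 = 16 - 1 = 2^3 + 2^2 + 2^1 + 2^0 = 0000 1111
(n - 1) & hash = 0000 1111 & hash, which keeps the low 4 bits of the binary hash value

Whatever the hash value is,
the result can only fall in the range [0000 0000, 0000 1111] = [0, 15],
which corresponds exactly to the valid array indexes.

So because the array length of a HashMap is a power of two, an element's position in the array can be computed from the key's hash with a single bitwise AND, which is quite clever.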
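
The masking can be compared against an ordinary modulo. In the sketch below, the class IndexDemo and the helper indexFor are illustrative names, and the hash values are arbitrary integers rather than hashes produced by HashMap. For a power-of-two n, the mask gives the same result as Math.floorMod(hash, n), including for negative hashes.

```java
public class IndexDemo {
    // Bucket index as computed in the snippets above; n must be a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        for (int hash : new int[] {5, 37, 1_000_003, -1, Integer.MIN_VALUE}) {
            // The masked index and the non-negative remainder should agree.
            System.out.println(hash + " -> " + indexFor(hash, n)
                    + " (floorMod: " + Math.floorMod(hash, n) + ")");
        }
    }
}
```

The equivalence holds only when n is a power of two. With a length such as 10, n - 1 = 1001 in binary, so the mask could never produce indexes like 2 or 4 and those slots would stay empty.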

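Also, on resize:

newCap = oldCap x 2, i.e. 2^m -> 2^(m+1)
In the binary form of n - 1, the low m bits stay the same and the next higher bit (the (m+1)-th bit counting from the right) changes from 0 to 1
For example, oldCap = 16, newCap = 16 x 2 = 32
oldCap - 1 = 15, binary 0000 1111
newCap - 1 = 31, binary 0001 1111

Now look at (n - 1) & hash again:
If the (m+1)-th bit of the key's hash is 0, the result is unchanged and the element stays at the same index
If the (m+1)-th bit of the key's hash is 1, then clearly
the result = the old result + 2^m, that is,
the new index = the old index + the old array length

So migrating elements during a resize is also very convenient and efficient.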
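
The "stay or move by oldCap" rule can be verified without a real HashMap. The following sketch uses hypothetical names (ResizeSplitDemo, indexFor, predictedNewIndex) and arbitrary hash values. It tests the single bit whose value equals oldCap, which is the (m+1)-th bit discussed above; as far as I know, JDK 8's resize() uses the same kind of check, (e.hash & oldCap) == 0, when it splits a bucket.

```java
public class ResizeSplitDemo {
    // Bucket index for a table of length n (n must be a power of two).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    // Predicts the index after doubling by testing the one bit that the
    // larger mask newly covers; that bit's value equals oldCap.
    static int predictedNewIndex(int hash, int oldCap) {
        int oldIndex = indexFor(hash, oldCap);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;
        for (int hash : new int[] {5, 21, 37, 53, -1}) {
            System.out.println("hash " + hash
                    + ": old index " + indexFor(hash, oldCap)
                    + ", new index " + indexFor(hash, newCap)
                    + ", predicted " + predictedNewIndex(hash, oldCap));
        }
    }
}
```

Hashes 5, 21, 37 and 53 all share old index 5. After doubling, 5 and 37 stay at 5 while 21 and 53 move to 5 + 16 = 21, so one bucket splits into at most two without recomputing any hash.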

Posted 2021-01-07 20:23 by 刘66
