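Principles I: How HashMap Is Implemented in Java

This article walks through the implementation of HashMap in JDK 1.7 and JDK 1.8, together with the interview questions it commonly raises. (The biggest difference between the two versions: the 1.7 HashMap is an array plus linked lists; the 1.8 HashMap is an array plus linked lists and red-black trees.)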

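I. Hash tables first

A hash table, also called a scatter table, speeds up lookups in an array. Each key to be stored is run through a hash function, and the result is used as the array index at which the entry is stored. A later lookup only has to hash the key again to find the index and read the value. The mapping is not guaranteed to be unique: two different keys can hash to the same index, which is called a hash collision.

So, as long as hash collisions are left out of the picture, insert, delete, update and lookup on a hash table are all O(1)
Ways to resolve hash collisions: 1. open addressing; 2. rehashing; 3. a common overflow area; 4. separate chaining (the one HashMap uses)

Open addressing: on a collision, probe forward for an empty slot and insert there

Rehashing: apply a different hash function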
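
To make separate chaining concrete, here is a minimal sketch of a chained table with int keys. The class name ChainedTable and its methods are made up for illustration; this is not the JDK code, and it uses a plain modulo instead of HashMap's bit tricks.

```java
import java.util.LinkedList;

/** Minimal separate-chaining table with int keys; illustration only, not the JDK code. */
final class ChainedTable {
    private static final class Pair {
        final int key;
        String value;
        Pair(int key, String value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Pair>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int capacity) {
        buckets = new LinkedList[capacity];
    }

    // floorMod keeps the index non-negative even for negative keys.
    private int indexOf(int key) {
        return Math.floorMod(key, buckets.length);
    }

    void put(int key, String value) {
        int i = indexOf(key);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        for (Pair p : buckets[i]) {
            if (p.key == key) { p.value = value; return; } // existing key: overwrite
        }
        buckets[i].add(new Pair(key, value)); // new key: extend the chain
    }

    String get(int key) {
        LinkedList<Pair> chain = buckets[indexOf(key)];
        if (chain == null) return null;
        for (Pair p : chain) {
            if (p.key == key) return p.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(4);
        t.put(1, "a");
        t.put(5, "b"); // 5 % 4 == 1: collides with key 1 and joins the same chain
        System.out.println(t.get(1) + " " + t.get(5)); // prints a b
    }
}
```

Both keys live in bucket 1, and the chain walk tells them apart by comparing keys, which is the same job equals does in HashMap.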

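II. JDK 1.7

1. How it works

Underneath, HashMap is an array, with linked lists added to resolve hash collisions; new nodes are inserted at the head of the list

The rationale for head insertion: values inserted later are assumed more likely to be looked up, so they are reached sooner
Head insertion reverses the order of a list during a resize, and under concurrent resizing this can leave the list circular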
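
The order reversal is easy to reproduce on a bare linked list. The sketch below is a simplification: it assumes every node of the old chain lands in the same new bucket, and the Node class and method names are invented for the example rather than taken from the JDK.

```java
final class HeadInsertTransfer {
    static final class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    /** Moves every node of the old chain onto a new chain using head insertion. */
    static Node transfer(Node oldHead) {
        Node newHead = null;
        for (Node e = oldHead; e != null; ) {
            Node next = e.next; // remember the rest of the old chain
            e.next = newHead;   // head insertion: the node goes in front
            newHead = e;
            e = next;
        }
        return newHead;
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) sb.append(e.key);
        return sb.toString();
    }

    public static void main(String[] args) {
        Node chain = new Node("A", new Node("B", new Node("C", null)));
        System.out.println(render(transfer(chain))); // prints CBA
    }
}
```

When two threads run such a transfer on the same chain at once, one can re-link nodes the other has already reversed, which is how the circular list arises.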

2. Initialization

/**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)(capacity * loadFactor);
        table = new Entry[capacity];
        init();
    }

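That is how HashMap is initialized. A few parameters deserve attention

capacity: the length of the array, 16 by default. The index is computed by a bitwise AND with 15 (binary 1111), which keeps exactly the low four bits of the hash

The capacity of the map is always 2^n (a power of 2). This lets hash & (capacity - 1) use every low bit, so the indexes differ as much as the hashes do, with fewer collisions and a more even spread
capacity <<= 1 is a bit shift: it moves the binary value left by one bit, doubling it on each pass of the loop, so the capacity after initialization is the smallest 2^n that is >= initialCapacity

loadFactor: the load factor (0.75 by default)

threshold: capacity * loadFactor. When the number of stored entries (size) reaches the threshold, the array is resized
0.75 is a trade-off: a lower value stores too few entries and wastes space, a higher value lengthens the chains and slows lookups

table: the bucket array itself; each slot of new Entry[capacity] holds the head of a chain of Entry key-value pairs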
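
The rounding loop and the index mask can be tried in isolation. In this sketch roundUp repeats the loop from the constructor above; the class name is made up, and the input is assumed to be at most 1 << 30 (the real constructor clamps to MAXIMUM_CAPACITY first).

```java
final class PowerOfTwoCapacity {
    /** Same loop as the constructor above: smallest power of two >= initialCapacity. */
    static int roundUp(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        int capacity = roundUp(1000);
        int threshold = (int) (capacity * 0.75f);
        System.out.println(capacity + " " + threshold); // prints 1024 768

        // For a power-of-two length and a non-negative h,
        // h & (length - 1) gives the same result as h % length.
        int h = 0x7A3F;
        System.out.println((h & 15) + " " + (h % 16)); // prints 15 15
    }
}
```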

3. Inserting an element

    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
      /**
     * Returns index for hash code h.
     */
     static int indexFor(int h, int length) {
        return h & (length-1); // bitwise AND, faster than modulo
    }
      /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        if (key == null)
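            return putForNullKey(value);  // a null key is stored in bucket 0 of the array
        int hash = hash(key.hashCode());  // hash() XORs right-shifted copies into the hashCode so the high bits take part, reducing collisions
        int i = indexFor(hash, table.length); // locate the array index with a bitwise AND, which is faster than modulo
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {  // the bucket is not empty: walk its chain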
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue; // key already present: the value was replaced, return the old one
            }
        }

        modCount++;  
        addEntry(hash, key, value, i); // no entry with this key in the bucket: add a new one
        return null;
    }
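
To see why the extra hash() step matters, the sketch below feeds two hashCodes that differ only in their high bits through indexFor, with and without it. The two functions are copied from the listing above; the class name and the sample values 0x10000 and 0x20000 are chosen for the example.

```java
final class Jdk7Index {
    // Copied from the JDK 1.7 listing above.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000, b = 0x20000; // sample hashCodes that differ only in high bits

        // Without spreading, the low four bits are identical, so both land in bucket 0.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));

        // With spreading, the high bits reach the low four bits and the buckets differ.
        System.out.println(indexFor(hash(a), 16) + " " + indexFor(hash(b), 16));
    }
}
```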

4. Getting an element

/**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) // equals() comes last, so the cheap hash and reference checks run first
                return e.value;
        }
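        return null;
    }

When removing elements while traversing the map, do not call the map's remove() inside a for-each loop: the fail-fast iterator notices the changed modCount and throws ConcurrentModificationException. Remove through the iterator's own remove() instead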
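
A minimal sketch of removal through the iterator follows. The class name, the helper removeOddValues and the sample data are made up for the example; Iterator.remove() itself is the standard java.util API.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class IteratorRemoval {
    /** Removes every entry whose value is odd, using the iterator's own remove(). */
    static void removeOddValues(Map<String, Integer> map) {
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 != 0) {
                it.remove(); // safe: the iterator updates its expected modCount
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        removeOddValues(map);
        System.out.println(map); // prints {b=2}
    }
}
```

On Java 8 and later, map.values().removeIf(v -> v % 2 != 0) does the same thing in one line.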

5. Resizing

/**
     * Adds a new entry with the specified key, value and hash code to
     * the specified bucket.  It is the responsibility of this
     * method to resize the table if appropriate.
     *
     * Subclass overrides this to alter the behavior of put method.
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        if (size++ >= threshold)
            resize(2 * table.length); // entry count has reached the threshold: double the array (later JDK 7 updates also require the target bucket to be non-empty)
    }

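During a resize, every entry of the old array has its index recomputed (the array length has changed) and is inserted into the new array. Because of head insertion, the order within a chain is reversed: the entry that was inserted last ends up where the first one used to be. The whole operation is fairly expensive

How does a HashMap resize when 1000 elements are added? If it is created with new HashMap(1000), the array is initialized with 1024 slots, and a resize to 2048 is triggered once the size passes the threshold 1024*0.75=768
So when creating a HashMap it is best to set the initial array length yourself, large enough that the expected number of entries divided by the load factor still fits, to avoid the cost of resizing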
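
The arithmetic can be checked with a small model. This sketch assumes a simplified rule, doubling whenever the entry count exceeds capacity * 0.75, and does not call into HashMap at all; the class and method names are invented for the example.

```java
final class Presizing {
    static int roundUp(int n) {
        int c = 1;
        while (c < n) c <<= 1;
        return c;
    }

    /** Counts the doublings needed before `entries` entries fit under the 0.75 threshold. */
    static int resizesNeeded(int initialCapacity, int entries) {
        int capacity = roundUp(initialCapacity);
        int resizes = 0;
        while (entries > (int) (capacity * 0.75f)) {
            capacity <<= 1;
            resizes++;
        }
        return resizes;
    }

    /** Smallest requested capacity whose threshold covers expectedEntries. */
    static int capacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / 0.75);
    }

    public static void main(String[] args) {
        System.out.println(resizesNeeded(16, 1000));                // prints 7
        System.out.println(resizesNeeded(1000, 1000));              // prints 1
        System.out.println(resizesNeeded(capacityFor(1000), 1000)); // prints 0
    }
}
```

Under this model, holding 1000 entries without any resize means asking for at least 1334 slots, which rounds up to 2048.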

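III. JDK 1.8

1. How it works

From JDK 1.8 on, HashMap is an array whose buckets hold either a linked list or a red-black tree, and new nodes are appended at the tail of the list

Traversing a linked list is O(n) while searching a red-black tree is O(log n), so lookups in crowded buckets get faster
When a node is added to a bucket that already holds 8 nodes (TREEIFY_THRESHOLD = 8) and the array length is at least 64 (MIN_TREEIFY_CAPACITY), the list is converted into a red-black tree; if the array is shorter than 64, the table is resized instead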

2. Inserting an element

 /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) //  TREEIFY_THRESHOLD=8
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

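3. Resizing

In JDK 8 the HashMap still doubles in size as before; the main difference lies in how the nodes of each bucket are moved into the new array

(The source is too involved to reproduce here.) When the nodes of a list or red-black tree are migrated, their index does not have to be recomputed the way 1.7 does
Because the array length has exactly doubled, the nodes of a bucket can be split directly into two groups by testing hash & oldCapacity: one group keeps its original index, the other moves to original index + oldCapacity in the newly added half

After migration, if a split red-black tree is left with 6 or fewer nodes in its bucket (UNTREEIFY_THRESHOLD = 6), it is converted back into a linked list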
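
The split rule can be sketched in a few lines. The helper newIndex below is a made-up name that mirrors the bit test described above; it is not the JDK's resize() code, only the index arithmetic behind it.

```java
final class SplitOnResize {
    /** Index of a hash after the table doubles, decided by a single bit test. */
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        // The bit at position oldCap is the only new bit the larger mask looks at.
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53}; // all four sit in bucket 5 of a 16-slot table
        for (int h : hashes) {
            // Compare the bit-test result with a full recomputation against the new mask.
            System.out.println(h + " -> " + newIndex(h, oldCap) + " / " + (h & (2 * oldCap - 1)));
        }
    }
}
```

Hashes 5 and 37 stay in bucket 5, while 21 and 53 move to bucket 5 + 16 = 21, matching what a full recomputation against the 32-slot mask gives.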

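IV. Differences between JDK 7 and 8

In JDK 8 a long list turns into a red-black tree, which speeds up lookups
New nodes are inserted into the list differently: JDK 7 inserts at the head, JDK 8 at the tail, since the list has to be walked to its end anyway to count nodes for the conversion to a red-black tree
The hash function is simplified in JDK 8
The resize logic changed; in JDK 7, concurrent resizing can create a circular list and with it an infinite loop
The key-value pair is called Entry in JDK 7, and the array is created in the constructor (in the version quoted above); in JDK 8 it is called Node, and the array is created when the first element is put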
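
For comparison with the four-shift function of 1.7, the simplified JDK 8 hash is a single XOR of the hashCode with its own high 16 bits. The sketch below mirrors that formula in a made-up helper class; the sample keys are chosen for the example.

```java
final class Jdk8Spread {
    /** Mirrors the JDK 8 formula: XOR the hashCode with its own high 16 bits. */
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    static int index(Object key, int length) {
        return spread(key) & (length - 1);
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself, so these keys differ only in high bits.
        System.out.println(index(0x10000, 16) + " " + index(0x20000, 16)); // prints 1 2
    }
}
```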

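V. equals and hashCode

equals is inherited from Object; the default implementation compares references, that is, whether the two are the same object in memory.
hashCode is used to compute the array index at which an entry is stored, and both get and put then call equals to confirm the key
So a class used as a key has to guarantee that objects equal under equals have equal hashCodes; objects that are not equal should have different hashCodes as far as possible, although this is not strictly required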
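
A key class that keeps the two methods consistent might look like the sketch below. PointKey is a made-up class for illustration; Objects.hash is the standard java.util helper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class PointKey {
    final int x, y;

    PointKey(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PointKey)) return false;
        PointKey p = (PointKey) o;
        return x == p.x && y == p.y;
    }

    // Built from the same fields as equals, so equal points share a hashCode.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Map<PointKey, String> map = new HashMap<>();
        map.put(new PointKey(1, 2), "found");
        // A different but equal instance locates the same bucket and passes equals.
        System.out.println(map.get(new PointKey(1, 2))); // prints found
    }
}
```

If hashCode were left at the Object default, the second instance would usually hash to a different bucket and the lookup would return null.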

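VI. Thread-safe HashMaps

Collections.synchronizedMap
- The wrapper holds a mutex object (the wrapper itself by default) as its lock; underneath, every method runs inside a synchronized block on that mutex
Hashtable
- get and put are both synchronized methods
- Initial capacity is 11; a resize grows it to twice the old capacity plus 1
- Neither null keys nor null values can be stored. The usual explanation is that in concurrent use a null returned by get would be ambiguous between a missing key and a null value, and a follow-up containsKey cannot settle the question atomically. (The fail-fast mechanism of the collection classes is a separate matter: it uses the modCount flag to detect modification during traversal and does not make a collection safe for multiple threads)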
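ConcurrentHashMap
- JDK 7
  - Made up of a Segment array and HashEntry nodes; each Segment holds its own HashEntry array plus linked lists
  - HashEntry declares its value and next fields volatile
  - Segment extends ReentrantLock, so a thread writing to one segment locks only that segment and leaves the others unaffected, which gives high concurrency. In theory the concurrency level equals the length of the Segment array.
  - put: locate the segment and try to lock it; under contention, scanAndLockForPut() spins on tryLock and, after a bounded number of retries, falls back to a blocking lock().
  - get: HashEntry's value is volatile, so every read sees the latest value; no lock is needed and reads are fast
- JDK 8
  - Like HashMap, it also uses red-black trees. The val and next fields of Node are volatile, which guarantees visibility
  - Segment locking is dropped in favor of CAS + synchronized
  - put:
    - Compute the hash, then check whether the table is empty and needs initializing
    - Locate the bucket; if it is null, write the new node with CAS
    - If the bucket already holds nodes, check whether a resize is in progress and help with it if so; otherwise lock the bucket's head node with synchronized and write
  - get: lock-free, as in JDK 7
- Why the switch to CAS + synchronized
  - Every Segment, being a ReentrantLock, carries an AQS-based synchronizer, which adds memory overhead; synchronized is implemented at the JVM level, so its performance improves as the JVM is upgraded
  - In JDK 7 a resize never grows the Segment array, only the HashEntry arrays, so as elements accumulate each lock covers more and more entries and the locking gets coarser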
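
As a usage-level illustration of the thread-safe options above, the sketch below has several threads increment one shared counter through Map.merge. The class name, the key "hits" and the thread counts are made up for the example; merge is atomic on ConcurrentHashMap and is also synchronized in the Collections.synchronizedMap wrapper on Java 8 and later.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ConcurrentCounting {
    /** Runs `threads` threads that each increment the same key `perThread` times. */
    static int countWith(Map<String, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.merge("hits", 1, Integer::sum); // one atomic read-modify-write
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.get("hits");
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        System.out.println(countWith(counts, 4, 10_000)); // prints 40000
    }
}
```

Passing a plain HashMap to the same method can lose updates or worse, which is the data-overwrite problem described for JDK 8.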

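VII. Common interview questions

What is HashMap's underlying data structure
How does HashMap store and retrieve data
- It implements the Map interface and stores data as key-value pairs. On a store, the key's hashCode is hashed and the result is used as the index into the array; if two keys end up at the same index, the new node is attached to the linked list or red-black tree at that slot. On a read, a single hash computation determines the index, and walking the nodes at that index finds the entry.
Why is it not thread-safe
- JDK 7: a resize can produce a circular linked list, because the order of a list is reversed when it is transferred
- JDK 8: writes from multiple threads can overwrite each other's data
Which thread-safe classes can replace it
- ConcurrentHashMap. Hashtable is thread-safe too but performs poorly, because it simply puts a synchronized lock on every method
What is the default initial size? Why that value? Why is the size always a power of 2?
- 16, so that the results of the hash computation are distributed more evenly. The index is a bitwise AND of the hash with the array length minus 1; 15 in binary is exactly 1111, so as long as the hashes are evenly distributed, the results of the AND are too
How does HashMap resize? What is the load factor? Why that value?
- In 1.7 (later updates), a resize is triggered only when the number of stored entries has reached the threshold and the new value happens to land in a non-empty bucket. The new length is twice the old one; the old Entry array is traversed and every node is re-indexed and moved into the new array
- 0.75, so that neither too few entries are stored, wasting space, nor too many, slowing traversal
How is the hash computed
- In JDK 8, the hashCode is shifted right by 16 bits and XORed with the original value, so the high 16 bits are mixed into the low 16 bits and the result is spread evenly enough; that result is then ANDed with the array length minus 1 to get the index (XOR: equal bits give 0, different bits give 1)
Why is the length 2^n
- To distribute the computed indexes more evenly and reduce hash collisions. As far as I remember, the source computes the index as a bitwise AND of the key's hash with the array length minus 1. When the length is 2^n, length - 1 is all 1s in binary (for 16 it is 15, four 1s), so after the AND the indexes are as evenly distributed as the hashes are
Why does a list turn into a red-black tree once it grows past 8
- By the Poisson distribution, with a load factor of 0.75 the probability of 8 nodes ending up in a single bucket is already tiny (about 0.00000006 according to the comment in the JDK source), so the comparatively expensive conversion from list to red-black tree rarely happens.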
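
The Poisson figures can be recomputed directly. The sketch below assumes an average of 0.5 nodes per bucket, the parameter used in the JDK source comment for the default load factor; the class and method names are made up for the example.

```java
final class BinSizeProbability {
    /** Poisson probability of exactly k nodes in one bucket: e^(-lambda) * lambda^k / k! */
    static double poisson(double lambda, int k) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) {
            p *= lambda / i; // builds lambda^k / k! one factor at a time
        }
        return p;
    }

    public static void main(String[] args) {
        double lambda = 0.5; // assumed average bucket occupancy
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(lambda, k));
        }
    }
}
```

The value for k = 8 comes out just under 0.00000006, well below one in ten million.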

Summary

HashMap is a composite data structure that uses a key's hashCode to decide where each entry is stored

Posted 2020-05-25 17:43 by gg12138
