Map---HashMap---jdk8
Summary
Why does taking a modulus only need the low bits of the binary value?
For a % b, the result is what is left after subtracting b*n, i.e. a value less than b.
In binary, when b is a power of two (b = 2^k), every multiple of b has zeros in its lowest k bits, so the remainder is exactly the lowest k bits of a. For example, 37 % 8 keeps the low 3 bits of 100101, i.e. 101 = 5.
What are the requirements for a HashMap key?
It must override both equals() and hashCode(), consistently with each other (equal keys must produce equal hash codes); see the sketch below.
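A minimal sketch of a well-behaved key, using a hypothetical Point class (the class name and fields are made up for illustration); without the two overrides, the second new Point(1, 2) would hash to a different bucket and get() would return null:

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    // equals and hashCode agree: equal points produce equal hash codes
    @Override public boolean equals(Object o) {
        return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
    }
    @Override public int hashCode() { return Objects.hash(x, y); }
}

public class KeyDemo {
    public static void main(String[] args) {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "a");
        System.out.println(map.get(new Point(1, 2))); // "a"; null if the overrides were missing
    }
}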
How is the hash function implemented?
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
The key's hashCode is XORed with itself unsigned-right-shifted by 16 bits
(the low 16 bits are XORed with the high 16 bits).
Why mix the high bits into the low bits?
Because the index computation below keeps only the low bits of the hash; folding the high bits in lets them influence the bucket choice and lowers the probability of collisions. A sketch follows.
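A minimal demonstration of the effect, assuming the default 16 buckets (HashDemo is a made-up name): two hashes that differ only in their high 16 bits collide without the spread and separate with it.

public class HashDemo {
    // Same spreading step as java.util.HashMap.hash, minus the null check
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                     // table length
        int h1 = 0x10000, h2 = 0x20000; // differ only in the high bits
        // Without spreading: both mask to bucket 0
        System.out.println((h1 & (n - 1)) + ", " + (h2 & (n - 1)));                 // 0, 0
        // With spreading: buckets 1 and 2
        System.out.println((spread(h1) & (n - 1)) + ", " + (spread(h2) & (n - 1))); // 1, 2
    }
}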
How is the hash value converted into an array index?

int n = tab.length;
tab[i = (n - 1) & hash]

The hash is ANDed with (array length - 1).
Why use &?
1. Bitwise operations are faster than division.
2. A hash table normally computes the index as hash % n; when n is a power of two, n - 1 is all ones in binary, so (n - 1) & hash is equivalent to hash % n (for non-negative hashes). A quick check follows.
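A quick verification of the equivalence (ModDemo is a made-up name; it holds for non-negative hashes and power-of-two n):

public class ModDemo {
    public static void main(String[] args) {
        int n = 16; // power of two, so n - 1 == 0b1111
        for (int hash : new int[]{5, 21, 1_000_003}) {
            System.out.println(hash % n == (hash & (n - 1))); // true, true, true
        }
    }
}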
Is the initialCapacity passed to the HashMap constructor the value actually used?
/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
No. The argument is rounded up by tableSizeFor to the next power of two, and that becomes the actual table size; a walkthrough follows.
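A sketch of the bit-smearing, with the method copied from the JDK and a made-up class name (TableSizeDemo): each shift-or copies the highest set bit into the bits below it until everything below is 1, and the final + 1 lands on the next power of two.

public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copied from java.util.HashMap.tableSizeFor
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        // 17 - 1 = 10000b, smeared to 11111b (31), then 31 + 1 = 32
        System.out.println(tableSizeFor(17)); // 32
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(0));  // 1
    }
}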
Why is the default capacity 16?
DEFAULT_INITIAL_CAPACITY
1. Why a power of two? The index can then be computed with a bitwise AND, which is faster than %.
2. Why 16? Much smaller and hash collisions and resizes become frequent; 16 is a reasonable starting point.
Load factor
loadFactor
Resize cost:
Every resize allocates a new array of twice the old size, and every element's bucket is recomputed and the element moved to the new table, which is relatively expensive.
If the load factor is too high, collisions increase, chains grow longer and trees grow taller, slowing lookups; if it is too low, resizing happens too often and adds its own overhead.
The load factor defaults to 0.75.
The load factor determines the resize threshold: when the number of elements exceeds capacity * loadFactor, the HashMap resizes. The sketch below makes the threshold visible.
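A rough way to watch the threshold in action, peeking at the internal table via reflection (ResizeDemo is a made-up name; on Java 9+ this may need --add-opens java.base/java.util=ALL-UNNAMED):

import java.lang.reflect.Field;
import java.util.HashMap;

public class ResizeDemo {
    static int tableLength(HashMap<?, ?> map) throws Exception {
        Field f = HashMap.class.getDeclaredField("table");
        f.setAccessible(true);
        Object[] table = (Object[]) f.get(map);
        return table == null ? 0 : table.length;
    }

    public static void main(String[] args) throws Exception {
        HashMap<Integer, Integer> map = new HashMap<>(); // capacity 16, threshold 16 * 0.75 = 12
        for (int i = 1; i <= 13; i++) {
            map.put(i, i);
            System.out.println("size=" + i + " tableLength=" + tableLength(map));
        }
        // The table stays at 16 until size exceeds 12, then doubles to 32 on the 13th put
    }
}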
Why is the treeify threshold 8?
TREEIFY_THRESHOLD
Performance considerations
Linked list vs. red-black tree: under heavy collisions a linked-list lookup is O(n) while a red-black-tree lookup is O(log n), so the tree wins clearly once the chain is long.
Threshold choice: around length 8 the list's lookup cost starts to hurt noticeably, and the tree's advantage begins to outweigh its overhead.
Memory overhead
Tree overhead: every tree node carries extra references (parent, left and right children, plus a color bit) to maintain the tree structure, which costs memory. For short chains, converting would cost more than the lookup time it saves.
Balance point: 8 is a reasonable compromise that speeds up long chains without paying tree-node overhead for frequent conversions.
Statistics: with the default load factor, bucket sizes roughly follow a Poisson distribution, and the JDK source notes the probability of a bucket reaching 8 nodes is about 0.00000006, so treeification is rare in practice.
Why is the minimum treeify capacity 64?
When a chain grows past 8 nodes, converting it to a red-black tree is considered,
but the conversion only happens when the table capacity is at least 64; in a smaller table the bucket stays a linked list (and the table is resized instead), which saves memory.
Linked-list-to-red-black-tree conversion
// java.util.HashMap#treeifyBin
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}
final void treeify(Node<K,V>[] tab) {
    TreeNode<K,V> root = null;
    for (TreeNode<K,V> x = this, next; x != null; x = next) {
        next = (TreeNode<K,V>)x.next;
        x.left = x.right = null;
        if (root == null) {
            x.parent = null;
            x.red = false;
            root = x;
        }
        else {
            K k = x.key;
            int h = x.hash;
            Class<?> kc = null;
            for (TreeNode<K,V> p = root;;) {
                int dir, ph;
                K pk = p.key;
                if ((ph = p.hash) > h)
                    dir = -1;
                else if (ph < h)
                    dir = 1;
                else if ((kc == null &&
                          (kc = comparableClassFor(k)) == null) ||
                         (dir = compareComparables(kc, k, pk)) == 0)
                    dir = tieBreakOrder(k, pk);

                TreeNode<K,V> xp = p;
                if ((p = (dir <= 0) ? p.left : p.right) == null) {
                    x.parent = xp;
                    if (dir <= 0)
                        xp.left = x;
                    else
                        xp.right = x;
                    root = balanceInsertion(root, x);
                    break;
                }
            }
        }
    }
    moveRootToFront(tab, root);
}
1. If the table is null or its length is below MIN_TREEIFY_CAPACITY (64), resize() instead of treeifying.
2. Otherwise, if the bucket at the index is non-empty, walk the chain replacing each Node with a TreeNode and linking them into a doubly linked list.
3. treeify() then walks that list and builds the red-black tree.
Red-black tree node structure
TreeNode<K,V> parent;
TreeNode<K,V> left;
TreeNode<K,V> right;
TreeNode<K,V> prev;
TreeNode<K,V> next;
boolean red;
Resize logic
1. If the old capacity > 0, newCap = oldCap << 1: a new array of twice the old size is created.
2. Redistribution: iterate over the old array; for each non-null bucket:
   - If the node has no next, place it at hash & (newCap - 1) in the new array.
   - If it heads a linked list (next != null, not a TreeNode), walk the list:
     nodes with (hash & oldCap) == 0 go to a low list, the others to a high list;
     the low list is placed at the original index, the high list at original index + oldCap.
   - If it heads a red-black tree (TreeNode), split() partitions the tree's node chain the same way into low and high lists;
     a part with at most UNTREEIFY_THRESHOLD (6) nodes is converted back to a plain linked list before being placed at the index (low) or index + oldCap (high).
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
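Why (hash & oldCap) decides stay-or-move: doubling the capacity exposes exactly one new index bit, and hash & oldCap tests that bit. A small check (SplitDemo is a made-up name):

public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        // All four hashes land in bucket 5 while the table length is 16
        for (int hash : new int[]{5, 21, 37, 53}) {
            int oldIndex = hash & (oldCap - 1);
            int newIndex = hash & (newCap - 1);
            boolean stays = (hash & oldCap) == 0; // tests the newly exposed bit
            System.out.printf("hash=%d old=%d new=%d stays=%b%n",
                    hash, oldIndex, newIndex, stays);
        }
        // Output: hashes 5 and 37 stay at index 5; 21 and 53 move to 5 + 16 = 21
    }
}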
Overview
Hash table based implementation of the Map interface.
This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.)
This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
A hash-table-based implementation of the Map interface;
HashMap supports all optional Map operations and permits a null key and null values;
HashMap makes no ordering guarantees, nor that the order stays constant over time.
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.
Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings).
Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
Assuming the hash function spreads elements across the buckets, HashMap's get and put run in O(1);
iteration time is proportional to its capacity plus its size;
so when iteration performance matters, don't set the capacity too high (or the load factor too low).
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor.
The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.
The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.
When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
A HashMap has two parameters that affect its performance: initial capacity and load factor.
capacity: the number of buckets in the table; initial capacity is the capacity when the table is created.
load factor: how full the table may get before its capacity is automatically increased.
When the entry count exceeds loadFactor * capacity, the table is rehashed into roughly twice as many buckets.
As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs.
Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put).
As a general rule, the default load factor of 0.75 offers a good trade-off between time and space;
a higher load factor reduces space overhead but increases lookup cost.
The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations.
If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
When setting the initial capacity, take the expected number of entries and the load factor into account, so as to minimize rehashing;
if the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur.
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table.
If a HashMap is to store many mappings, create it with a sufficiently large capacity up front rather than letting it grow through repeated automatic rehashing; a common sizing idiom is sketched below.
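One common way to pre-size, dividing the expected entry count by the default load factor so the threshold is never crossed (PresizeDemo is a made-up name; this is roughly the arithmetic Guava's Maps.newHashMapWithExpectedSize uses):

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expected = 1000;
        // capacity >= expected / 0.75 keeps size below the resize threshold
        Map<String, String> map = new HashMap<>((int) (expected / 0.75f) + 1);
        for (int i = 0; i < expected; i++) {
            map.put("k" + i, "v" + i); // no rehash happens during these puts
        }
    }
}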
Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table.
To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.
Note: many keys sharing the same hashCode() will slow down any hash table;
to soften the impact, when keys are Comparable, HashMap uses their comparison order to break ties among equal hashes.
Note that this implementation is not synchronized.
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
(A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)
This is typically accomplished by synchronizing on some object that naturally encapsulates the map.
HashMap is not thread-safe;
when multiple threads access a HashMap concurrently and at least one of them modifies it structurally, synchronization must be provided externally.
If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method.
This is best done at creation time, to prevent accidental unsynchronized access to the map: Map m = Collections.synchronizedMap(new HashMap(...));
If no naturally encapsulating object exists, wrap the map with Collections.synchronizedMap, preferably at creation time.
The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException.
Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
HashMap's iterators are fail-fast: structurally modifying the map during iteration, other than through the iterator's own remove method, throws ConcurrentModificationException;
on concurrent modification the iterator fails quickly and cleanly rather than risking arbitrary, non-deterministic behavior later.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification.
Fail-fast iterators throw ConcurrentModificationException on a best-effort basis.
Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
Note: in the presence of unsynchronized concurrent modification, fail-fast behavior cannot be guaranteed; the exception is thrown on a best-effort basis;
so it is wrong for a program's correctness to depend on ConcurrentModificationException; fail-fast behavior should be used only to detect bugs.
public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }
    }

    final class EntrySet extends AbstractSet<Map.Entry<K,V>> { }

    abstract class HashIterator { }

    final class EntryIterator extends HashIterator
        implements Iterator<Map.Entry<K,V>> {
        public final Map.Entry<K,V> next() { return nextNode(); }
    }

    /**
     * The table, initialized on first use, and resized as necessary.
     * When allocated, length is always a power of two.
     */
    transient Node<K,V>[] table;

    /** The default initial capacity - MUST be a power of two. */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** The load factor used when none specified in constructor. */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    static final int TREEIFY_THRESHOLD = 8;
    static final int UNTREEIFY_THRESHOLD = 6;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // The next size value at which to resize (capacity * load factor).
    int threshold;

    // The number of key-value mappings contained in this map.
    transient int size;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields
     * are used for keySet() and values().
     */
    transient Set<Map.Entry<K,V>> entrySet;
}
Call chains
map.entrySet() + map.remove
Map<Integer, Integer> map = new HashMap<Integer, Integer>();
map.put(null, null);
map.put(1, 2);
map.put(2, 3);

/*
 * 40 invokeinterface #8 <java/util/Map.entrySet> count 1
 * 45 invokeinterface #9 <java/util/Set.iterator> count 1
 * 50 astore_2
 * 51 aload_2
 * 52 invokeinterface #10 <java/util/Iterator.hasNext> count 1
 * 57 ifeq 86 (+29)
 * 60 aload_2
 * 61 invokeinterface #11 <java/util/Iterator.next> count 1
 * 66 checkcast #12 <java/util/Map$Entry>
 * 69 astore_3
 * 70 aload_1
 * 71 aload_3
 * 72 invokeinterface #13 <java/util/Map$Entry.getKey> count 1
 * 77 invokeinterface #14 <java/util/Map.remove> count 2
 */
for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
    map.remove(entry.getKey());
}

// java.util.HashMap.entrySet
public Set<Map.Entry<K,V>> entrySet() {
    Set<Map.Entry<K,V>> es;
    return (es = entrySet) == null ? (entrySet = new EntrySet()) : es;
}

// java.util.HashMap.EntrySet.iterator
public final Iterator<Map.Entry<K,V>> iterator() {
    return new EntryIterator();
}

// java.util.HashMap.HashIterator.hasNext
public final boolean hasNext() {
    return next != null;
}

// java.util.HashMap.EntryIterator.next
public final Map.Entry<K,V> next() {
    return nextNode();
}

// java.util.HashMap.HashIterator.nextNode
final Node<K,V> nextNode() {
    Node<K,V>[] t;
    Node<K,V> e = next;
    if (modCount != expectedModCount)
        // map.remove() incremented modCount but not expectedModCount,
        // so the next call to nextNode() throws
        throw new ConcurrentModificationException();
    if (e == null)
        throw new NoSuchElementException();
    if ((next = (current = e).next) == null && (t = table) != null) {
        do {} while (index < t.length && (next = t[index++]) == null);
    }
    return e;
}
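The fix is to remove through the iterator itself: HashIterator.remove resets expectedModCount = modCount (its source appears under map.keySet().iterator() below), so the next nextNode() check passes. A minimal sketch (SafeRemoveDemo is a made-up name); on Java 8+ removeIf is an equivalent shortcut:

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class SafeRemoveDemo {
    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(1, 2);
        map.put(2, 3);

        // Safe: Iterator.remove keeps expectedModCount in sync with modCount
        for (Iterator<Map.Entry<Integer, Integer>> it = map.entrySet().iterator(); it.hasNext(); ) {
            Map.Entry<Integer, Integer> entry = it.next();
            if (entry.getKey() == 1) {
                it.remove();
            }
        }

        // Also safe on Java 8+: bulk removal without explicit iteration
        map.entrySet().removeIf(e -> e.getValue() == 3);

        System.out.println(map); // {}
    }
}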
get
// java.util.HashMap.get
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

// java.util.HashMap.hash
// XOR the key's hashCode with its own high 16 bits; the result feeds the index computation
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// java.util.HashMap.getNode
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) { // table non-empty && bucket at the index non-null
        if (first.hash == hash && // first node's hash and key both match -> return its value
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) { // first node doesn't match; search the rest of the bucket
            if (first instanceof TreeNode) // red-black tree bucket -> search the tree
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do { // otherwise walk the linked list
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
map.keySet().iterator()
java.util.HashMap.KeyIterator + java.util.HashMap.HashIterator
// java.util.HashMap.keySet
public Set<K> keySet() {
    Set<K> ks = keySet;
    if (ks == null) {
        ks = new KeySet();
        keySet = ks;
    }
    return ks;
}

// java.util.HashMap.KeySet
final class KeySet extends AbstractSet<K> {
    // java.util.HashMap.KeySet.iterator
    public final Iterator<K> iterator() { return new KeyIterator(); }
}

// java.util.HashMap.KeyIterator
final class KeyIterator extends HashIterator
    implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

// java.util.HashMap.HashIterator
abstract class HashIterator {
    public final boolean hasNext() {
        return next != null;
    }

    final HashMap.Node<K,V> nextNode() {
        HashMap.Node<K,V>[] t;
        HashMap.Node<K,V> e = next;
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        if (e == null)
            throw new NoSuchElementException();
        if ((next = (current = e).next) == null && (t = table) != null) {
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }

    public final void remove() {
        HashMap.Node<K,V> p = current;
        if (p == null)
            throw new IllegalStateException();
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        current = null;
        K key = p.key;
        removeNode(hash(key), key, null, false, false);
        expectedModCount = modCount; // keeps the iterator's own remove fail-safe
    }
}
map.values().iterator();
java.util.HashMap.ValueIterator + java.util.HashMap.HashIterator
// java.util.HashMap.values
public Collection<V> values() {
    Collection<V> vs = values;
    if (vs == null) {
        vs = new Values();
        values = vs;
    }
    return vs;
}

// java.util.HashMap.Values
final class Values extends AbstractCollection<V> {
    // java.util.HashMap.Values.iterator
    public final Iterator<V> iterator() { return new ValueIterator(); }
}

// java.util.HashMap.ValueIterator
final class ValueIterator extends HashIterator
    implements Iterator<V> {
    public final V next() { return nextNode().value; }
}

// java.util.HashMap.HashIterator: same hasNext/nextNode/remove as shown under map.keySet().iterator() above
map.entrySet().iterator();
java.util.HashMap.EntryIterator + java.util.HashMap.HashIterator
// java.util.HashMap.entrySet
public Set<Map.Entry<K,V>> entrySet() {
    Set<Map.Entry<K,V>> es;
    return (es = entrySet) == null ? (entrySet = new EntrySet()) : es;
}

// java.util.HashMap.EntrySet
final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
    // java.util.HashMap.EntrySet.iterator
    public final Iterator<Map.Entry<K,V>> iterator() { return new EntryIterator(); }
}

// java.util.HashMap.EntryIterator
final class EntryIterator extends HashIterator
    implements Iterator<Map.Entry<K,V>> {
    public final Map.Entry<K,V> next() { return nextNode(); }
}

// java.util.HashMap.HashIterator: same hasNext/nextNode/remove as shown under map.keySet().iterator() above