Java 11 HashMap Source Code Analysis (Part 1: Documentation Walkthrough)
Class documentation:
/**
* Hash table based implementation of the {@code Map} interface. This
* implementation provides all of the optional map operations, and permits
* {@code null} values and the {@code null} key. (The {@code HashMap}
* class is roughly equivalent to {@code Hashtable}, except that it is
* unsynchronized and permits nulls.) This class makes no guarantees as to
* the order of the map; in particular, it does not guarantee that the order
* will remain constant over time.
*/
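The first paragraph states that HashMap is a hash-table-based implementation of the Map interface. It provides all of the optional Map operations and permits both a null key and null values. HashMap is roughly equivalent to Hashtable, except that it is unsynchronized (not thread-safe) and permits nulls.
The class also makes no guarantee about the order of the map; in particular, the order is not guaranteed to stay constant over time (a rehash, for example, can change it).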
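To make the null-handling difference concrete, here is a minimal sketch. The class name NullKeyDemo and the helper acceptsNullKey are made up for illustration. It stores a null key and a null value in a HashMap, then tries the same kind of put on a Hashtable, which rejects nulls with a NullPointerException.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // Hypothetical helper: returns true if the given map accepts a null key.
    public static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "value for the null key"); // null key is allowed
        hashMap.put("key", null);                     // null value is allowed
        System.out.println(hashMap.get(null));          // value for the null key
        System.out.println(hashMap.containsKey("key")); // true
        System.out.println(acceptsNullKey(new Hashtable<>())); // false
    }
}
```

Because get returns null both for a missing key and for a key mapped to null, containsKey is the reliable way to tell the two cases apart.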
/**This implementation provides constant-time performance for the basic
* operations ({@code get} and {@code put}), assuming the hash function
* disperses the elements properly among the buckets. Iteration over
* collection views requires time proportional to the "capacity" of the
* {@code HashMap} instance (the number of buckets) plus its size (the number
* of key-value mappings). Thus, it's very important not to set the initial
* capacity too high (or the load factor too low) if iteration performance is
* important.
*/
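The second paragraph says this implementation offers constant-time performance for the basic operations get and put, meaning the cost of an operation does not grow with the number of entries, provided the hash function spreads the elements properly across the buckets (the slots of the table).
Iterating over a collection view takes time proportional to the HashMap's capacity (the number of buckets) plus its size (the number of key-value mappings). So if iteration performance matters, do not set the initial capacity too high or the load factor too low.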
/**<p>An instance of {@code HashMap} has two parameters that affect its
* performance: <i>initial capacity</i> and <i>load factor</i>. The
* <i>capacity</i> is the number of buckets in the hash table, and the initial
* capacity is simply the capacity at the time the hash table is created. The
* <i>load factor</i> is a measure of how full the hash table is allowed to
* get before its capacity is automatically increased. When the number of
* entries in the hash table exceeds the product of the load factor and the
* current capacity, the hash table is <i>rehashed</i> (that is, internal data
* structures are rebuilt) so that the hash table has approximately twice the
* number of buckets.
*/
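The third paragraph names two parameters that affect a HashMap's performance: the initial capacity and the load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at creation time. The load factor measures how full the table may get before its capacity is increased automatically.
Whenever the number of entries exceeds the product of the load factor and the current capacity, the table is rehashed (its internal data structures are rebuilt), and the new table has roughly twice as many buckets.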
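The growth rule can be modeled with a few lines of arithmetic. This is only a sketch of the documented rule, not the JDK's actual code: ResizeModel and its methods are hypothetical names, the model doubles exactly where the Javadoc says "approximately twice", and 16 is used as the starting capacity because that is HashMap's default.

```java
class ResizeModel {
    // Number of entries a table of this capacity can hold before it must grow.
    public static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Simulates the documented rule: double the capacity whenever
    // the entry count exceeds capacity * loadFactor.
    public static int capacityAfterInserts(int initialCapacity, float loadFactor, int inserts) {
        int capacity = Math.max(1, initialCapacity);
        for (int size = 1; size <= inserts; size++) {
            while (size > threshold(capacity, loadFactor)) {
                capacity *= 2;
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));                // 12
        System.out.println(capacityAfterInserts(16, 0.75f, 12)); // 16
        System.out.println(capacityAfterInserts(16, 0.75f, 13)); // 32
    }
}
```

With capacity 16 and load factor 0.75 the threshold is 12, so in this model the 13th entry is the one that triggers the first doubling.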
/**<p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs. Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the {@code HashMap} class, including
* {@code get} and {@code put}). The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations. If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.
*/
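The fourth paragraph: as a general rule, the default load factor (0.75) offers a good tradeoff between time and space costs. A higher load factor reduces the space overhead but increases the lookup cost, which shows up in most operations, including get and put; a lower one does the opposite.
When choosing the initial capacity, take the expected number of entries and the load factor into account so that rehash operations are kept to a minimum. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur.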
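One way to apply the sizing rule is sketched below. PresizedMap, initialCapacityFor, and newMapFor are hypothetical helpers, not part of the JDK; the only library call used is the HashMap(int initialCapacity, float loadFactor) constructor. The helper picks the smallest integer strictly greater than the expected entry count divided by the load factor.

```java
import java.util.HashMap;
import java.util.Map;

class PresizedMap {
    // Smallest int strictly greater than expectedEntries / loadFactor.
    public static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    // Creates a HashMap sized so that expectedEntries mappings should fit without a rehash.
    public static <K, V> Map<K, V> newMapFor(int expectedEntries) {
        float loadFactor = 0.75f; // the documented default
        return new HashMap<>(initialCapacityFor(expectedEntries, loadFactor), loadFactor);
    }

    public static void main(String[] args) {
        Map<Integer, String> map = newMapFor(1000);
        for (int i = 0; i < 1000; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(initialCapacityFor(1000, 0.75f)); // 1334
        System.out.println(map.size());                      // 1000
    }
}
```

For 1000 entries at a load factor of 0.75, 1000 / 0.75 is about 1333.3, so a requested capacity of 1334 satisfies the condition from the Javadoc.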
/**<p>If many mappings are to be stored in a {@code HashMap}
* instance, creating it with a sufficiently large capacity will allow
* the mappings to be stored more efficiently than letting it perform
* automatic rehashing as needed to grow the table. Note that using
* many keys with the same {@code hashCode()} is a sure way to slow
* down performance of any hash table. To ameliorate impact, when keys
* are {@link Comparable}, this class may use comparison order among
* keys to help break ties.
*/
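The fifth paragraph: if many mappings are going to be stored in a HashMap, creating it with a sufficiently large capacity is more efficient than letting it rehash automatically as the table grows.
Using many keys with the same hashCode() is a sure way to slow down any hash table. To lessen the impact, when the keys are Comparable (that is, they implement the Comparable interface),
HashMap may use the comparison order among the keys to help break ties. A "tie" here means keys whose hash codes are equal, so the hash alone cannot tell them apart.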
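The following sketch shows the worst case the paragraph describes. BadKey is a made-up key class whose hashCode is deliberately constant, so every key lands in the same bucket; it also implements Comparable, which gives HashMap a comparison order it may use among the colliding keys. The example only demonstrates that lookups stay correct; it does not measure speed.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeys {
    // Hypothetical key type: every instance reports the same hash code.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // deliberately constant, so all keys collide
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Fills a map with n keys that all share one hash code.
    public static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(1000);
        System.out.println(map.size());               // 1000
        System.out.println(map.get(new BadKey(500))); // 500
    }
}
```

In real code the fix is a hashCode that distributes keys well; Comparable only softens the damage when collisions cannot be avoided.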
/**<p><strong>Note that this implementation is not synchronized.</strong>
* If multiple threads access a hash map concurrently, and at least one of
* the threads modifies the map structurally, it <i>must</i> be
* synchronized externally. (A structural modification is any operation
* that adds or deletes one or more mappings; merely changing the value
* associated with a key that an instance already contains is not a
* structural modification.) This is typically accomplished by
* synchronizing on some object that naturally encapsulates the map.
* If no such object exists, the map should be "wrapped" using the
* {@link Collections#synchronizedMap Collections.synchronizedMap}
* method. This is best done at creation time, to prevent accidental
* unsynchronized access to the map:<pre>
*   Map m = Collections.synchronizedMap(new HashMap(...));</pre>
*/
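The sixth paragraph: note that this implementation is not synchronized. If multiple threads access a hash map concurrently and at least one of them modifies it structurally (adds or deletes one or more mappings; merely changing the value of an existing key does not count),
the map must be synchronized externally. This is typically done by synchronizing on some object that naturally encapsulates the map, for example by locking on the object that holds the map as a field whenever the map is accessed.
If no such object exists, the map should be wrapped with Collections.synchronizedMap, preferably at creation time, to prevent accidental unsynchronized access. The Javadoc gives an example of this.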
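A small sketch of the wrapping approach follows. SynchronizedMapDemo and fillFromTwoThreads are illustrative names; the only API taken from the documentation is Collections.synchronizedMap. Two threads insert disjoint key ranges into the wrapped map, and because every put goes through the wrapper's lock, no mapping is lost.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapDemo {
    // Two threads each insert perThread distinct keys; returns the final size.
    public static int fillFromTwoThreads(int perThread) {
        // Wrap at creation time so no unsynchronized reference to the HashMap escapes.
        Map<Integer, Integer> map = Collections.synchronizedMap(new HashMap<>());

        Thread first = new Thread(() -> {
            for (int i = 0; i < perThread; i++) {
                map.put(i, i);
            }
        });
        Thread second = new Thread(() -> {
            for (int i = 0; i < perThread; i++) {
                map.put(perThread + i, i);
            }
        });

        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fillFromTwoThreads(10_000)); // 20000
    }
}
```

The wrapper only makes individual calls atomic. Iterating over one of its collection views still requires manually synchronizing on the wrapped map for the duration of the loop.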
/**<p>The iterators returned by all of this class's "collection view methods"
* are <i>fail-fast</i>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* {@code remove} method, the iterator will throw a
* {@link ConcurrentModificationException}. Thus, in the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the
* future.
*/
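The seventh paragraph: the iterators returned by all of this class's collection view methods are fail-fast. If the map is structurally modified at any time after the iterator is created, in any way other than through the iterator's own remove method, the iterator throws ConcurrentModificationException. So when faced with concurrent modification, the iterator fails quickly and cleanly rather than risking arbitrary, non-deterministic behavior at some undetermined time in the future.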
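The fail-fast check does not need a second thread to show up. The sketch below uses hypothetical names (FailFastDemo, iteratorFailsAfter) and stays on one thread: it starts iterating, changes the map directly, and reports whether the next call to next() throws. It relies on the behavior of OpenJDK's HashMap, where adding a new key counts as a structural modification and overwriting the value of an existing key does not.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Consumer;

class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Starts an iteration, applies the change directly to the map, and
    // returns true if advancing the iterator then throws
    // ConcurrentModificationException.
    public static boolean iteratorFailsAfter(Consumer<Map<String, Integer>> change) {
        Map<String, Integer> map = sample();
        Iterator<String> it = map.keySet().iterator();
        it.next();
        change.accept(map);
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Adding a new mapping is a structural modification.
        System.out.println(iteratorFailsAfter(m -> m.put("d", 4)));  // true
        // Replacing the value of an existing key is not.
        System.out.println(iteratorFailsAfter(m -> m.put("a", 10))); // false
    }
}
```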
/**<p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification. Fail-fast iterators
* throw {@code ConcurrentModificationException} on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>
*/
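The eighth paragraph continues from the previous one: the fail-fast behavior of an iterator cannot be guaranteed, because, generally speaking, no hard guarantees can be made in the presence of unsynchronized concurrent modification.
Fail-fast iterators throw ConcurrentModificationException on a best-effort basis only, so a program must not depend on this exception for its correctness. The fail-fast behavior of iterators should be used only to detect bugs.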
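In practice this means writing code that never triggers the exception, rather than catching it. The sketch below shows one such pattern, removing entries through the iterator's own remove method, which the Javadoc names as the one permitted way to modify the map during iteration. SafeRemoval and removeOddValues are illustrative names.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class SafeRemoval {
    // Removes every entry whose value is odd, using the iterator's own
    // remove method so the iteration stays valid.
    public static void removeOddValues(Map<String, Integer> map) {
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 != 0) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        removeOddValues(map);
        System.out.println(map); // {b=2}
    }
}
```

The same effect can be had with map.values().removeIf(v -> v % 2 != 0), which removes the matching entries through the collection view.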