HashMap 源码分析

简单数据结构

HashMap

类注释

  • 线程不安全,出现并发问题会 fail-fast,并发需要用 Collections.synchronizedMap 包裹起来
  • 顺序不按照插入顺序
  • hashcode 分散均匀很重要
  • hashcode 分散不均匀,Compareble 来补充
  • capacity、load factor 参数很重要,一个是容量,根据实际容量设置最好,一个是实际容量的比例,.75 已经是最优了尽量不要修改了
/**
 * Hash table based implementation of the {@code Map} interface.  This
 * implementation provides all of the optional map operations, and permits
 * {@code null} values and the {@code null} key.  (The {@code HashMap}
 * class is roughly equivalent to {@code Hashtable}, except that it is
 * unsynchronized and permits nulls.)  This class makes no guarantees as to
 * the order of the map; in particular, it does not guarantee that the order
 * will remain constant over time.
 *
 * Hash 表实现了 Map 接口。它提供了所有 map 的操作,并允许 key 和 value 为空。
 * 类 HashMap 和 类 Hashtable 大致相同,除了 HashMap 是非同步的、允许空值。
 * 该类不保证 map 的顺序;特别指出,在变化的时候,它更不保证顺序一致。
 *
 * <p>This implementation provides constant-time performance for the basic
 * operations ({@code get} and {@code put}), assuming the hash function
 * disperses the elements properly among the buckets.  Iteration over
 * collection views requires time proportional to the "capacity" of the
 * {@code HashMap} instance (the number of buckets) plus its size (the number
 * of key-value mappings).  Thus, it's very important not to set the initial
 * capacity too high (or the load factor too low) if iteration performance is
 * important.
 *
 * 在基本操作 get 和 put 上,在 hash 函数在 bucket(桶)中均匀分散元素的前提下,该实现的时间复杂度是常数级的。
 * 遍历该集合的 Iteration(迭代器)的时间与 Hashmap 的 capacity(容量)和 size(key-value 映射的大小)有关。 
 * 因此,在 Iteration 的性能是很重要的情况,不要设置 capacity 过大(或者 load factor(装在因子)过低)是非常重要的。
 *
 * <p>An instance of {@code HashMap} has two parameters that affect its
 * performance: <i>initial capacity</i> and <i>load factor</i>.  The
 * <i>capacity</i> is the number of buckets in the hash table, and the initial
 * capacity is simply the capacity at the time the hash table is created.  The
 * <i>load factor</i> is a measure of how full the hash table is allowed to
 * get before its capacity is automatically increased.  When the number of
 * entries in the hash table exceeds the product of the load factor and the
 * current capacity, the hash table is <i>rehashed</i> (that is, internal data
 * structures are rebuilt) so that the hash table has approximately twice the
 * number of buckets.
 * 
 * 有两个很重要的参数会影响一个 HashMap 的实例的性能:initial capacity(初始大小)和 load factor(装填因子)。
 * capacity 含义是 hash 表创建时候的容量。load factor 则是一个测度,该测度表明了 hash 表能实际存储最大的容量,当实际存储容量到最大时会进行自动扩容。
 * 在 hash 表容量达到 load factor * capacity 时,hash 表会 rehashed(意味着内部数据结构的重建),最终 hash 表 buckets 大约会变成原来的两倍。
 *
 * <p>As a general rule, the default load factor (.75) offers a good
 * tradeoff between time and space costs.  Higher values decrease the
 * space overhead but increase the lookup cost (reflected in most of
 * the operations of the {@code HashMap} class, including
 * {@code get} and {@code put}).  The expected number of entries in
 * the map and its load factor should be taken into account when
 * setting its initial capacity, so as to minimize the number of
 * rehash operations.  If the initial capacity is greater than the
 * maximum number of entries divided by the load factor, no rehash
 * operations will ever occur.
 *
 * 经验而谈,在时间和空间消耗的平衡上,默认的 load factor(.75)是一个较好的折中方案。
 * 更高的值降低了空间占用,但是增加了查询消耗(影响了大多数操作,包括 get 和 put)。
 * 在设置 initial capacity 的时候,应该认真考虑 map 中元素大小的预期值和 load factor,目的是最小化 rehash 操作的次数。
 * 如果 initial capacity 大于最大条目数量 / load factor,rehash 操作不会出现。
 *
 * <p>If many mappings are to be stored in a {@code HashMap}
 * instance, creating it with a sufficiently large capacity will allow
 * the mappings to be stored more efficiently than letting it perform
 * automatic rehashing as needed to grow the table.  Note that using
 * many keys with the same {@code hashCode()} is a sure way to slow
 * down performance of any hash table. To ameliorate impact, when keys
 * are {@link Comparable}, this class may use comparison order among
 * keys to help break ties.
 *
 * 在很多 mapping 被存储在 HashMap 实例的情况下,如果表的数量会增长,与自动 rehash 相比较,用足够大的 capacity 初始化 hash 表,mapping 存储效率更高。
 * 注意,如果有很多对象 hashcode() 相同会使得性能急剧降低。为了降低这个影响,如果 key 实现了 Comparable,这个类可能会在 keys 之间使用 comparison order 来打破影响。
 *
 * <p><strong>Note that this implementation is not synchronized.</strong>
 * If multiple threads access a hash map concurrently, and at least one of
 * the threads modifies the map structurally, it <i>must</i> be
 * synchronized externally.  (A structural modification is any operation
 * that adds or deletes one or more mappings; merely changing the value
 * associated with a key that an instance already contains is not a
 * structural modification.)  This is typically accomplished by
 * synchronizing on some object that naturally encapsulates the map.
 * 
 * 注意,该实现是不同步的。如果很多线程并发访问 hash map,并且至少有一个线程修改了 map 的结构,必须要进行额外的同步。
 *  (一个结构性的改变是增加或删除一个或更多的 mapping;仅仅改变和某个已经存在 key 关联的 value,不是一个结构性的改变)
 * 上面的同步问题通常可以用封装同步对象来解决。
 * 
 * If no such object exists, the map should be "wrapped" using the
 * {@link Collections#synchronizedMap Collections.synchronizedMap}
 * method.  This is best done at creation time, to prevent accidental
 * unsynchronized access to the map:<pre>
 *   Map m = Collections.synchronizedMap(new HashMap(...));</pre>
 *
 * 如果没有这样对象存在,map 应该使用 Collections#synchronizedMap Collections.synchronizedMap 方法进行“包裹”。
 * 在创建对象的时候最好就去做,防止一切意外的、不同步对该 map 的访问。Map m = Collections.synchronizedMap(new HashMap(...));
 *  
 * <p>The iterators returned by all of this class's "collection view methods"
 * are <i>fail-fast</i>: if the map is structurally modified at any time after
 * the iterator is created, in any way except through the iterator's own
 * {@code remove} method, the iterator will throw a
 * {@link ConcurrentModificationException}.  Thus, in the face of concurrent
 * modification, the iterator fails quickly and cleanly, rather than risking
 * arbitrary, non-deterministic behavior at an undetermined time in the
 * future.
 * 
 * 该类返回有关集合视图方法的 Iteration 都会 fail-fast(快速失败)的:
 * 如果 map 的结构在 iterator 创建后,除了 iterator 的 remove 方法修改以外任何修改,iterator 将抛出一个 ConcurrentModificationException。
 * 因此,面对并发修改,iterator 快速且彻底的失败,而不是冒着武断、不确定性的风险在一个不确定的未来失败。
 *
 * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
 * as it is, generally speaking, impossible to make any hard guarantees in the
 * presence of unsynchronized concurrent modification.  Fail-fast iterators
 * throw {@code ConcurrentModificationException} on a best-effort basis.
 * Therefore, it would be wrong to write a program that depended on this
 * exception for its correctness: <i>the fail-fast behavior of iterators
 * should be used only to detect bugs.</i>
 *
 * 注意,iterator fail-fast 行为不能保证某些情况,一般来说,它不可能肯定保证非同步并发修改一定会 fail-fast。
 * fail-fast iterators 尽力抛出 ConcurrentModificationException。
 * 因此,依赖于该异常写程序是错误的:在 iterators 中 fail-fast 行为只应该检测 bugs。
 * 
 * <p>This class is a member of the
 * <a href="{@docRoot}/java.base/java/util/package-summary.html#CollectionsFramework">
 * Java Collections Framework</a>.
 *
 * 该类是集合框架中的一员。
 *
 * @param <K> the type of keys maintained by this map 
 * key 的类型
 * @param <V> the type of mapped values 
 * values的类型
 *
 * @author  Doug Lea
 * @author  Josh Bloch
 * @author  Arthur van Hoff
 * @author  Neal Gafter
 * @see     Object#hashCode()
 * @see     Collection
 * @see     Map
 * @see     TreeMap
 * @see     Hashtable
 * @since   1.2
 */

接口、继承分析

public class HashMap<K,V> extends AbstractMap<K,V>
implements Map<K,V>, Cloneable, Serializable

实现注释

  • 比较细节的注释
  • bin 的结构会在 Tree、LinkedList 切换,在 hashcode 散列不均匀、大量出现冲突的时候,会变成 Tree;否则就是 LinkedList
  • Tree 比较占用空间,速度快,LinkedList 占用少,速度慢
  • 根据泊松分布,一个 bin 出现 8 个元素的概率很低很低
  /*
     * Implementation notes.
     *
     * 实现注释
     *
     * This map usually acts as a binned (bucketed) hash table, but
     * when bins get too large, they are transformed into bins of
     * TreeNodes, each structured similarly to those in
     * java.util.TreeMap. Most methods try to use normal bins, but
     * relay to TreeNode methods when applicable (simply by checking
     * instanceof a node).  Bins of TreeNodes may be traversed and
     * used like any others, but additionally support faster lookup
     * when overpopulated. However, since the vast majority of bins in
     * normal use are not overpopulated, checking for existence of
     * tree bins may be delayed in the course of table methods.
     *
     * 这个 map 通常是一个 以 binned(桶)(bucketed)为基础的 hash 表,然而当 bin 容量过大时,它会转变成以 TreeNode 为结构的桶,该结构类似于 java.util.TreeMap 中相关的结构。
     * 大多数方法尝试使用简单的 bins,但当需要使用的时候,会转变成 TreeNode(仅仅检查是不是 node 的实例)。
     * TreeNodes 组成的 bins 也许和其他方式一样被遍历和使用,但当元素密集时支持额外快读查找。
     * 然而,大多数 bins 在通常情况下不会过于密集,在 table 方法的过程中,检查以 tree 为结构的 bins 会有些延迟。
     *
     * Tree bins (i.e., bins whose elements are all TreeNodes) are
     * ordered primarily by hashCode, but in the case of ties, if two
     * elements are of the same "class C implements Comparable<C>",
     * type then their compareTo method is used for ordering. (We
     * conservatively check generic types via reflection to validate
     * this -- see method comparableClassFor).  The added complexity
     * of tree bins is worthwhile in providing worst-case O(log n)
     * operations when keys either have distinct hashes or are
     * orderable, Thus, performance degrades gracefully under
     * accidental or malicious usages in which hashCode() methods
     * return values that are poorly distributed, as well as those in
     * which many keys share a hashCode, so long as they are also
     * Comparable. (If neither of these apply, we may waste about a
     * factor of two in time and space compared to taking no
     * precautions. But the only known cases stem from poor user
     * programming practices that are already so slow that this makes
     * little difference.)
     * 
     * 以 Tree 为结构的 bins(如,bins 的元素全部是 TreeNodes)主要用 hashCode 排序,不过有重复元素的情况下,如果两个元素是都实现了 "class C implements Comparable<C>",它们的 compareTo 
     * 方法会用作排序。(我们会谨慎通过反射去验证检查泛型——详见方法 comparableClassFor)。
     * 这额外的 tree bins 的复杂度是值得的,当 keys 拥有独特的 hashes 或者是有序的,它提供了最坏时间复杂度为 O(log n) 的操作。
     * 因此,一些意外或者恶意用法,如 hashCode() 方法返回值分散的不均匀的情况,以及很多 keys 共享一个 hashCode,性能降低是很平缓的,只要它们实现了 Comparable。
     * (如果上面的条件没有一个成立,相对于不采取任何预防措施,我们也许会浪费 1/2 的时间和空间。然而,由糟糕的用户编程技巧导致的情况已经很慢了以致于基本没有区别了)。
     * 
     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million
     * 
     * 由于 TreeNodes 相对于普通 nodes 大小是两倍,我们使用它仅仅在 bins 包含了足够多的 nodes 情况下(参见 TREEIFY_THRESHOLD)。
     * 并且,当它变的足够小的时候(由于 removal 或 resizing )它们会转换成简单的、直接的 bins。
     * 在用户设置分布均匀的 hashCodes 的用法下,nodes 在 bins 的分布遵从 Poisson 分布,参数约为 0.5,默认阈值为 0.75,尽管由于 resizing 尺度不同,会有一个很大的方差。
     * (具体参见别人的,数学我讲不清楚,https://blog.csdn.net/reliveIT/article/details/82960063)
     * 忽略方差,出现的 k 个重复元素的预期概率 =  (exp(-0.5) * pow(0.5, k) / factorial(k)。值为:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * 更多 : 小于千万分之一 
     *
     * The root of a tree bin is normally its first node.  However,
     * sometimes (currently only upon Iterator.remove), the root might
     * be elsewhere, but can be recovered following parent links
     * (method TreeNode.root()).
     *
     * tree bin 的 root 一般都是第一个 node。
     * 然而,有些情况(目前仅有 Iterator.remove),root 也许是其他的,但是根据 parent 链接可以被恢复(方法 TreeNode.root())。
     *
     * All applicable internal methods accept a hash code as an
     * argument (as normally supplied from a public method), allowing
     * them to call each other without recomputing user hashCodes.
     * Most internal methods also accept a "tab" argument, that is
     * normally the current table, but may be a new or old one when
     * resizing or converting.
     *
     * 所有应用 internal 方法接受一个 hash code 做为一个参数(这通常是由一个 public 方法提供),允许他们相互调用不通过重新计算 hashCodes。
     * 大多数内部方法也接受一个 “tab” 参数,通常表示的是当前的表,但也许是一个新或老的表在重新计算大小或者转换的时候。
     *
     * When bin lists are treeified, split, or untreeified, we keep
     * them in the same relative access/traversal order (i.e., field
     * Node.next) to better preserve locality, and to slightly
     * simplify handling of splits and traversals that invoke
     * iterator.remove. When using comparators on insertion, to keep a
     * total ordering (or as close as is required here) across
     * rebalancings, we compare classes and identityHashCodes as
     * tie-breakers.
     * 
     * 当 bin 列表被转换成树,分割,或者从树中拆开,我们保存他们和访问、遍历的次序一致(如,变量 Node.next),目的是更好的保留位置,此外还有轻微的简化拆分、遍历的处理,以 iterator.remove 调  
     * 用。
     * 当使用 comparators 去插入时,在重平衡之间为了保证总的次序(或者和要求的一样),我们比较类和独特的 HashCodes 打破重复元素。
     *
     * The use and transitions among plain vs tree modes is
     * complicated by the existence of subclass LinkedHashMap. See
     * below for hook methods defined to be invoked upon insertion,
     * removal and access that allow LinkedHashMap internals to
     * otherwise remain independent of these mechanics. (This also
     * requires that a map instance be passed to some utility methods
     * that may create new nodes.)
     *
     * 在直接和 tree nodes 的使用和过渡是复杂的,通过存在的子类 LinkedHashMap。
     * 见下文的钩子方法,定义了如何被调用的在插入、移除、访问的时候,允许 LinkedHashMap 做一些内部操作,否则保留这些机制的独立。(这也要求一个 map 实例需要传递在一些工具方法中,这些方法 
     * 也许会创建新的 nodes)
     *
     * The concurrent-programming-like SSA-based coding style helps
     * avoid aliasing errors amid all of the twisty pointer operations.
     * 
     * concurrent-programming-like(并发编程的) SSA-based(单静态赋值)编码风格帮助在所有的曲折的指针操作中避免别名错误。
     */

字段注释

  • 都是重要参数,含义见下
    private static final long serialVersionUID = 362498820763181265L;
     
    /**
     * The default initial capacity - MUST be a power of two.
     * 默认初始 capacity,必须是 2 ^ ?
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     * 最大 capacity,如果一个更高的值被在构造函数当作参数传入则会被使用,必须是 2 ^ ?
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     * load factor 如果没有在构造函数被指定,默认值
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     * 用 tree 而不是 list 的 bin 数量阈值。Bins 被转换成树,当在一个 bin 内增加元素超过这个值的时候。
     * 这个值必须大于 2 ,并且应该至少大于 8 ,在收缩时与树的移除转变为简单的 bins 的紧密配合。
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     * 在 resize 操作时,移除树的 bin 数量阈值。应该比 TREEIFY_THRESHOLD 小,并且至多是 6,在移除时与收缩检测配合。
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     * 在 bins 可能被树化的最小值 capacity。(反之,如果有很多的 nodes 在同一个 bin 中表会被 resized )
     * 应该至少是 4 * TREEIFY_THRESHOLD 避免在 resizing 和树化阈值之间冲突。
     * 译者注:冲突的意思是,如果 capacity 本来就不大,这个时候应该尽量去扩容,说不定 hashcode 效果就好了,因为空间大了计算 hashCode 范围就广了,这时候这个还有一种策略就是树化,显然两个操 
     * 作在容量达到最大时候最好只执行一个,所以叫冲突
     * (可以看看 https://blog.csdn.net/fan2012huan/article/details/51088211,https://stackoverflow.com/questions/43911369/hashmap-java-8-implementation)
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     * 表,第一次用时会初始化,如果需要的话会 resized。当分配,length 一定是 2 ^ ?。
     * (我们也容忍长度为 0 ,在某些操作下,允许一些目前不需要的启动机制。)
     */
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     * 保存了缓存的 entrySet(),注意,AbstractMap 的成员变量被 keySet() 和 values() 使用。
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     * 在 map 中 key-value 的大小。
     */
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     * HashMap 结构被改变的次数。结构改变是指在 HashMap 中改变了 mappings 的数量活着修改了内部结构(如 rehash)。
     * 这个成员变量用做 iterators 在集合试图的 fail-fast。(参见  ConcurrentModificationException)。
     */
    transient int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     * 下一个 resize 的值(capacity * load factor)。
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

    /**
     * The load factor for the hash table.
     * 用在 hash 表中的 load factor。
     *
     * @serial
     */
    final float loadFactor;

常用方法分析

public V put(K key, V value)

  /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     * 
     * 在 map 中关联 key 和 value。如果 map 已经包含了一个关于某个 key 的 mapping,老值被替代。
     *
     * @param key key with which the specified value is to be associated 被 value 关联的 key
     * @param value value to be associated with the specified key 和 key 关联的 value
     * @return the previous value associated with {@code key}, or 
     *         {@code null} if there was no mapping for {@code key}.
     *         (A {@code null} return can also indicate that the map
     *         previously associated {@code null} with {@code key}.)
     *         之前和 key 关联的值,null 如果没有 mapping,也可能是原来的 value 是 null
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

看一下 hash。

    /**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     * 
     * 计算 key.hashCode() 并且传播 (XORs)高位 hash 到低位。因为表使用了 2 ^ ? 掩码,在当前掩码下,只分布在某几个 bits 的 hashes 的集合总是会冲突。
     *(典型的情况是Float keys 的集合,在一个很小的 tables 中,这个集合保存了连续的数字)。所以我们应用了一个转变,该转变传播了高位影响到低位。
     * bit 传播有一个在速度,实用性,质量的平衡。
     * 因为很多 hashes 的集合已经合理的均匀分布了(所以我们不能从传播获得好处),并且我们使用 trees 去应对大集合在 bins 上的冲突,所以我们仅仅是 XOR(异或)一些偏移 bits,这是一个开销最小的 
     * 方式去减少语义上的?(译者注:翻译软件都没有这个词),并且包括了高位的影响,要不然的话由于 table 的边界在 index 计算中永远不会用到。
     */
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

什么意思呢?Float 的 hashCode 是将 Float 转换为 int bits。
1 = 00111111 10000000 00000000 00000000
2 = 01000000 00000000 00000000 00000000
3 = 01000000 01000000 00000000 00000000

而下面 put 的时候是这样寻找 bucket 的位置的:(n - 1) & hash,假设 n = 256,n - 1 用二进制表示: 00000000 00000000 00000000 11111111
所以如果用 Float 的 key.hashCode() 直接当作 hash 的话,会发现 1,2,3 全部都映射计算出来的 index = 0,在同一个 bucket 中, 散列不均匀,性能很低。
假设 XOR (h >>> 16),单看二进制结果,
1 的结果是(忽略前面的 24 位,因为最后 & 11111111,前面的无效的) 10000000 ^ 00000000 = 10000000,10000000 & 11111111 = 10000000,位置是 128
2 的结果是 00000000 ^ 00000000 = 00000000,00000000 & 11111111 = 00000000,位置是 0
3 的结果是 01000000 ^ 00000000 = 01000000,01000000 & 11111111 = 01000000,位置是64
显然不冲突了。

    /**
     * Implements Map.put and related methods.
     * 应用了 Map.put 方法和相关的方法。
     *
     * @param hash hash for key key 的 hash
     * @param key the key 当前 key
     * @param value the value to put 需要被存储的 value
     * @param onlyIfAbsent if true, don't change existing value 如果为 true,不会去改变之前的值
     * @param evict if false, the table is in creation mode. 如果为 false,table 在创建模式
     * @return previous value, or null if none 之前的 value,不存在为 null
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // tab 未创建,或者长度为 0(删光了),resize
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // 计算 hash 的位置,检查是否存在值,不存在直接塞入,此时 p 被赋值成已经存在的结点的值(可能是 null)
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        // 存在,hash 冲突了
        else {
            Node<K,V> e; K k;
            // hash 相同,key 地址相同或者是 equals,简而言之就是 key 相同,设置 e 的值,e 是操作结点
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // hash 不同,p 是一个 TreeNode 的实例,直接插入
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
            // hash 不同,p 是一个普通的 Node(LinkedList),遍历寻找
                for (int binCount = 0; ; ++binCount) { 
                    // 遍历到结尾了都没有找到,新建 Node
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // 大于 TREEIFY_THRESHOLD 需要树化,因为冲突太多,为了提升效率
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // 遍历,找到了 hash 相同的元素
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // e 存在,表明存在结点
            if (e != null) { // existing mapping for key
                // 准备返回旧值
                V oldValue = e.value;  
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value; 
                // 访问成功后处理,在 LinkedHashMap 有用
                afterNodeAccess(e);
                return oldValue;
            }
        }
        // 修改次数 ++
        ++modCount;
        // size 增加后超过了 threshold,resize
        if (++size > threshold)
            resize();
        // 插入成功后处理,在 LinkedHashMap 有用
        afterNodeInsertion(evict);
        return null;
    }

注释基本上讲明白了,总的来说,目的就是寻找 hash 相同的结点,不存在直接插入,存在的话修改值。存在分为两种情况,因为是两种数据结构,树有自己的处理方法,链表也有自己的处理方法。
现在具体来看一看细节,比如如何 resize 的。

    /**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     * 
     * 初始化或者是翻倍表大小。如果 null,根据成员变量阈值初始化 capacity。
     * 否则,因为我们适用的是 2 ^ ? 扩展,所以在每个 bin 中的元素必须要么在还在原来的 index 下,在新表中要么翻一倍。
     * (译者注:为啥是翻一倍? (n - 1) & hash,n 翻倍了,多了一个“1”,原来 00001111,变成 00011111,所以 & 多了一个位置,hash 那一位是 0,则保持不变;hash 那一位是 1,则翻倍)
     *
     * @return the table
     */
    final Node<K,V>[] resize() { 
        // 赋值操作
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        // 旧的 capacity > 0 
        if (oldCap > 0) {
            // 到了最大值,capacity 已经不能再扩大了,修改 threshold,直接返回了 
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // 字面意思,这里还有一种情况没处理,就是 oldCap <= DEFAULT_INITIAL_CAPACITY,其实这是不会出现的,初始化就不允许这样的情况出现
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        // 之前设置过 threshold
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        // 默认值,初始化进这里
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // newThr 为 0,这里应该是为了保险起见?
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        // 赋值回去
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        // 重散列
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                // 如果有值的话
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // 如果只有一个 Node
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    // 如果是 TreeNode
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // 如果有很多 Node
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // lo 表示还在原来位置,hi 表示坐标已经翻倍,这里就是组建原来位置和坐标翻倍位置的元素的链表
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // 放入相应位置,把尾巴 next 赋值 null
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

public V get(Object key)

    /**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * 返回特定 key 的 value,或者 null 当 map 中不存在 key。
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * 更加严格的说,如果 map 包含一个 mapping key,但 key == null 或者  key.equals(k),返回 v;否则返回null。
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * 返回 null 不一定表明了 map 不包含 key;也有可能是 map 中 key 对应的 value 为 null;containsKey 可以区分这种情况。
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

     /**
     * Implements Map.get and related methods.
     *
     * 实现了 Map.get 和相关的方法。
     *
     * @param hash hash for key key 的 hash
     * @param key the key 当前 key
     * @return the node, or null if none 结点,或者 null 如果不存在
     */
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // tab 非空判断,tab[index] 是否存在元素
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // 检查第一个元素
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            // 检查之后的元素
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    // 检查树的元素
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    // 检查链表元素
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }
posted @ 2019-02-25 23:47  Piers  阅读(431)  评论(0编辑  收藏  举报