A First Look at the HashMap Source Code - suyor


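With some free time on my hands and a long-standing curiosity about how HashMap works inside, I set aside time to read its source code. This post is a record of that reading, so that I don't forget it again.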

HashMap is a hash table, and many of the caching mechanisms in Java programs are built on top of it.

Let's start with its constructors. HashMap has four of them, and what they mainly do is set the initial capacity and the load factor.

The one we use most often is the no-argument constructor, which uses the default capacity and the default load factor.

Choosing the capacity deserves some thought: set according to the right rules, it can noticeably improve HashMap's performance. Let's look at the source.

static final int DEFAULT_INITIAL_CAPACITY = 16; // the default capacity

static final float DEFAULT_LOAD_FACTOR = 0.75f; // the default load factor, a trade-off between time and space cost

transient Entry[] table;

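At its core, HashMap is an array plus a layer of logic that turns it into a hash table. The array is declared transient, which means default serialization skips it. This is not because default serialization cannot follow the references stored in the array (it can). The problem is that each entry's slot depends on its key's hash code, and the Object.hashCode contract does not require hash codes to stay the same from one execution of an application to another, so a table copied as-is could leave entries in the wrong buckets. HashMap therefore overrides the serialization methods (writeObject and readObject) to write out the key-value pairs themselves and rebuild the table when reading.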

Here is the source of put, which inserts one mapping into a HashMap:

public Object put(Object key, Object value) {
        Object k = maskNull(key); // if key is null, substitute the shared sentinel object NULL_KEY, so a null key can be hashed like any other key
        int hash = hash(k); // compute the hash value
        int i = indexFor(hash, table.length); // turn the hash into an index into the array; note that the array length is passed in, see (1) below

        for (Entry e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && eq(k, e.key)) { // if this entry's stored hash equals the new key's hash and the two keys are equal according to equals, they are treated as the same key
                Object oldValue = e.value;
                e.value = value;
                e.recordAccess(this); // empty in HashMap; it is a hook that LinkedHashMap overrides to reorder its entries (in access order)
                return oldValue;
            }
        }

        modCount++; // counts structural modifications, so an Iterator can detect that the map changed under it and throw an exception right away, see (2)
        addEntry(hash, k, value, i); // add a new entry, see (3)
        return null;
    }
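The following is a minimal sketch of the same put logic in modern Java, written so it compiles and runs on its own. The class MiniMap and everything in it are hypothetical, not JDK code, and it simplifies deliberately: it uses hashCode() without the extra bit-mixing of hash(), uses Objects.equals in place of eq, and never resizes. It follows the same steps as put: substitute a sentinel for a null key, compute the index with h & (length-1), walk the chain to overwrite an existing key, and otherwise prepend a new entry.

```java
import java.util.Objects;

// Hypothetical, stripped-down chained hash table mirroring the steps of put().
// Simplifications: hashCode() is used without extra bit-mixing, and the table never resizes.
class MiniMap {
    // Shared sentinel that stands in for a null key (the role maskNull plays above).
    private static final Object NULL_KEY = new Object();

    private static final class Entry {
        final int hash;
        final Object key;
        Object value;
        Entry next;

        Entry(int hash, Object key, Object value, Entry next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry[] table = new Entry[16]; // length must stay a power of two
    private int size;

    private static Object maskNull(Object key) {
        return key == null ? NULL_KEY : key;
    }

    /** Returns the previous value mapped to key, or null if the key was new. */
    Object put(Object key, Object value) {
        Object k = maskNull(key);
        int hash = k.hashCode();
        int i = hash & (table.length - 1);                // same formula as indexFor
        for (Entry e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(k, e.key)) {
                Object old = e.value;                     // existing key: overwrite in place
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry(hash, k, value, table[i]);  // new key: becomes the head of the chain
        size++;
        return null;
    }

    int size() {
        return size;
    }
}
```

For example, calling put("a", 1) and then put("a", 2) returns 1 the second time and leaves size() at 1, because the second call only overwrites the value.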

(1) So the ultimate purpose of computing the hash value is to find an array index, and through it the slot that holds the entries for this key.

The source of indexFor(hash, table.length) is as follows:

    static int indexFor(int h, int length) {
        return h & (length-1);
    }

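Simple, isn't it? It just ANDs the hash value with the array length minus 1. This brings us to the array length, that is, to the question of how HashMap's capacity should be set. I'm borrowing a diagram from the author of http://www.iteye.com/topic/539465, whose post is well worth studying.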

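As the diagram shows, only when the capacity is a power of two (2^n) is length-1 a run of n one-bits in binary, so h & (length-1) simply keeps the low n bits of the hash value as the array index, and every slot in the array can be reached. If the length were not a power of two, some bits of length-1 would be 0, the slots whose index has a 1 in those positions could never be used, and collisions would increase. Because only the low bits take part, different keys need to differ in those low bits, which raw hashCode() values do not guarantee; that is why put first passes the key through hash(), which mixes the higher bits into the lower ones, before calling indexFor.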
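To check the bit arithmetic, here is a small hypothetical class (IndexForDemo; the helper distinctSlots is mine, not the JDK's). It shows that for a power-of-two length the mask gives the same result as a non-negative modulo, and it counts how many slots the hash values 0 to 255 can reach with length 16 versus length 15.

```java
// Hypothetical demo of indexFor(h, length) = h & (length - 1).
class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /** Number of distinct bucket indices reached by hash values 0..maxHash-1. */
    static int distinctSlots(int length, int maxHash) {
        boolean[] seen = new boolean[length];
        int count = 0;
        for (int h = 0; h < maxHash; h++) {
            int i = indexFor(h, length);
            if (!seen[i]) {
                seen[i] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // For a power-of-two length the mask equals a non-negative modulo, even for negative h.
        System.out.println(indexFor(-7, 16) == Math.floorMod(-7, 16)); // true
        System.out.println(distinctSlots(16, 256)); // 16: every slot is usable
        System.out.println(distinctSlots(15, 256)); // 8: only even slots are usable
    }
}
```

With length 15, length-1 is binary 1110, so the lowest bit of every index is forced to 0 and only the 8 even slots are ever used.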

HashMap's automatic resizing code shows:

resize(2 * table.length);

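That is, each resize doubles the capacity, and doubling a power of two gives another power of two, so the masking rule keeps working after every resize. The constructor guarantees the starting point as well: HashMap(int, float) rounds the requested initial capacity up to the next power of two (asking for 10 gives a table of 16), so the table length is always a power of two whatever we pass in. What the initial capacity really controls is how often resizing happens: every resize allocates a new array and redistributes all existing entries, so if you know roughly how many entries you will store, pick an initial capacity large enough that the size stays below capacity * load factor.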
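A sketch of the sizing arithmetic, assuming a constructor that rounds the requested capacity up to a power of two with a doubling loop, and a resize threshold of capacity * loadFactor. The class CapacityDemo and its methods are hypothetical helpers for illustration, not HashMap API, and they assume inputs small enough not to overflow an int.

```java
// Hypothetical helpers illustrating HashMap-style sizing (not part of the JDK API).
class CapacityDemo {
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /** Smallest power of two >= requested, found with a doubling loop (assumes requested <= 2^30). */
    static int roundUpToPowerOfTwo(int requested) {
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1;
        }
        return capacity;
    }

    /** Number of entries the table may hold before the next doubling. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    /** Capacity to request so that expectedSize entries fit without any resize. */
    static int capacityFor(int expectedSize, float loadFactor) {
        return roundUpToPowerOfTwo((int) Math.ceil(expectedSize / (double) loadFactor));
    }
}
```

With the default load factor, 100 expected entries call for ceil(100 / 0.75) = 134 slots, which rounds up to 256 (threshold 192); starting from the default capacity of 16, the table would instead be doubled several times while those entries are being added.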
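
(2) modCount is a clever idea: it is incremented every time the map is structurally modified, that is, whenever an entry is added or removed (overwriting the value of an existing key, as put does above, does not count). Without iterators its importance would not be obvious, so let's look at HashMap's private iterator code: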

private abstract class HashIterator implements Iterator {
        Entry next;                  // next entry to return
        int expectedModCount;        // For fast-fail 
        int index;                   // current slot 
        Entry current;               // current entry

        HashIterator() {
            expectedModCount = modCount; // the constructor copies modCount into the check variable
            Entry[] t = table;
            int i = t.length;
            Entry n = null;
            if (size != 0) { // advance to first entry
                while (i > 0 && (n = t[--i]) == null) // note that the iterator actually walks the array from the end toward the start
                    ;
            }
            next = n;
            index = i;
        }

        public boolean hasNext() {
            return next != null;
        }

        Entry nextEntry() { 
            if (modCount != expectedModCount) // each time the next entry is fetched, check that modCount still equals expectedModCount; if not, the map was structurally modified, so throw
                throw new ConcurrentModificationException();
            Entry e = next;
            if (e == null) 
                throw new NoSuchElementException();
                
            Entry n = e.next;
            Entry[] t = table;
            int i = index;
            while (n == null && i > 0)
                n = t[--i];
            index = i;
            next = n;
            return current = e;
        }

        public void remove() {
            if (current == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount) // same check as above
                throw new ConcurrentModificationException();
            Object k = current.key;
            current = null;
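            HashMap.this.removeEntryForKey(k); // removing through the iterator's own remove(), which calls HashMap's removeEntryForKey, is the only way to modify the map during iteration without a ConcurrentModificationException; the reason is the next line
            expectedModCount = modCount; // copy the new modCount back into the check variable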
        }

    }
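The fail-fast behaviour can be observed through the public API. This is a small runnable sketch against java.util.HashMap in a modern JDK; the class FailFastDemo and its method names are hypothetical. Adding a new key through the map while an iterator is open changes modCount, whereas removing through Iterator.remove() resynchronizes expectedModCount. Note that the Javadoc describes fail-fast iterators as best-effort, so the exception is a debugging aid rather than a guarantee to program against; in a single-threaded case like this one it does fire.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical demo of fail-fast iteration on java.util.HashMap.
class FailFastDemo {
    /** Returns true if adding a key behind the iterator's back makes next() throw. */
    static boolean structuralChangeDuringIterationThrows() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "a");
        map.put(2, "b");
        map.put(3, "c");
        Iterator<Integer> it = map.keySet().iterator();
        it.next();
        map.put(4, "d");   // new key added directly on the map: modCount changes
        try {
            it.next();     // modCount != expectedModCount
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Removing through Iterator.remove() keeps expectedModCount in sync, so no exception. */
    static int removeEvenKeysSafely() {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 10; i++) {
            map.put(i, "v" + i);
        }
        for (Iterator<Integer> it = map.keySet().iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return map.size(); // the 5 odd keys remain
    }
}
```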

(3) Before looking at the source of addEntry, let's look at Entry's constructor:

Entry(int h, Object k, Object v, Entry n) { 
            value = v; 
            next = n;
            key = k;
            hash = h;
        }

As you can see, this is a typical linked-list node: next points to the next entry in the same bucket.

Now the source of addEntry:

   void addEntry(int hash, Object key, Object value, int bucketIndex) {
        table[bucketIndex] = new Entry(hash, key, value, table[bucketIndex]); // there is some implicit logic here, see the analysis below
        if (size++ >= threshold) 
            resize(2 * table.length);
    }

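Note that there is no explicit check whether table[bucketIndex] is empty. The new Entry always becomes the head of the bucket, and whatever was there before becomes its next: if the slot was empty, next is simply null; if it already held a chain, the new entry is placed in front of it. It follows that when collisions keep happening, the earliest entry stored in a chain ends up at its tail and is the slowest one to retrieve.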
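A small hypothetical sketch of that head insertion. ChainDemo and its Entry are illustrative only and keep just the key and next fields of the real Entry. Inserting A, B, C into one bucket yields the chain C -> B -> A, so A takes the most steps to find. Later JDKs changed this: starting with Java 8, HashMap appends new entries to the tail of a bucket instead.

```java
// Hypothetical sketch of head insertion into a single bucket, as addEntry does.
class ChainDemo {
    static final class Entry {
        final String key;
        final Entry next;

        Entry(String key, Entry next) {
            this.key = key;
            this.next = next;
        }
    }

    /** Each new entry becomes the head: table[i] = new Entry(..., table[i]). */
    static Entry buildBucket(String... keys) {
        Entry head = null; // an empty bucket is simply null
        for (String k : keys) {
            head = new Entry(k, head);
        }
        return head;
    }

    /** Nodes visited before key is found (the cost of a lookup), or -1 if absent. */
    static int stepsToFind(Entry head, String key) {
        int steps = 0;
        for (Entry e = head; e != null; e = e.next) {
            steps++;
            if (e.key.equals(key)) {
                return steps;
            }
        }
        return -1;
    }

    /** Renders the chain from head to tail, e.g. "C -> B -> A". */
    static String order(Entry head) {
        StringBuilder sb = new StringBuilder();
        for (Entry e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(e.key);
        }
        return sb.toString();
    }
}
```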

Next, the source of HashMap's get method:

public Object get(Object key) {
        Object k = maskNull(key);
        int hash = hash(k);
        int i = indexFor(hash, table.length);
        Entry e = table[i]; 
        while (true) { // this loop shows how a chain is searched, and why overriding hashCode and equals correctly matters
            if (e == null)
                return e;
            if (e.hash == hash && eq(k, e.key)) 
                return e.value;
            e = e.next;
        }
    }

This method returns an Object, which is just the value field of the matching Entry.
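Because get only matches when both the stored hash and equals agree, a key class has to override hashCode and equals consistently. A minimal sketch against java.util.HashMap in modern Java; the class KeyEqualityDemo and its two key classes are hypothetical. IdentityKey keeps Object's identity-based methods, so a second instance with the same id finds nothing, while ValueKey overrides both methods and works.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: why get() needs consistent hashCode and equals.
class KeyEqualityDemo {
    /** Inherits Object.equals/hashCode: two instances with the same id are different keys. */
    static final class IdentityKey {
        final int id;
        IdentityKey(int id) { this.id = id; }
    }

    /** Overrides both methods consistently, so equal ids find the same entry. */
    static final class ValueKey {
        final int id;
        ValueKey(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueKey && ((ValueKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    static String lookupWithIdentityKey() {
        Map<Object, String> map = new HashMap<>();
        map.put(new IdentityKey(1), "one");
        return map.get(new IdentityKey(1)); // null: equals falls back to identity
    }

    static String lookupWithValueKey() {
        Map<Object, String> map = new HashMap<>();
        map.put(new ValueKey(1), "one");
        return map.get(new ValueKey(1));    // "one"
    }
}
```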

To get the whole Entry object, HashMap uses the package-private getEntry method:

Entry getEntry(Object key) {
        Object k = maskNull(key);
        int hash = hash(k);
        int i = indexFor(hash, table.length);
        Entry e = table[i]; 
        while (e != null && !(e.hash == hash && eq(k, e.key)))
            e = e.next;
        return e;
    }
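One practical consequence of get returning only the value: when it returns null, the caller cannot tell a missing key from a key mapped to null, which an Entry-returning lookup such as getEntry can distinguish. Since getEntry is not public, outside code uses containsKey for this. A sketch in modern Java; NullValueDemo and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: telling apart the two cases in which get() returns null.
class NullValueDemo {
    static String lookup(Map<String, String> map, String key) {
        String v = map.get(key);
        if (v != null) {
            return "value: " + v;
        }
        // get() returned null: either there is no mapping, or the key is mapped to null
        return map.containsKey(key) ? "mapped to null" : "absent";
    }

    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("a", "x");
        map.put("b", null);
        return map;
    }
}
```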

===============================================================

HashMap provides three collection views: keySet, values and entrySet.

private transient Set entrySet = null;

    transient volatile Set        keySet = null;
    transient volatile Collection values = null;

Each view needs an Iterator, so there are three of them, KeyIterator, ValueIterator and EntryIterator:

private class ValueIterator extends HashIterator {
        public Object next() {
            return nextEntry().value;
        }
    }

    private class KeyIterator extends HashIterator {
        public Object next() {
            return nextEntry().getKey();
        }
    }

    private class EntryIterator extends HashIterator {
        public Object next() {
            return nextEntry();
        }
    }

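The source shows that all three iterators extend HashMap's HashIterator and differ only in their next() method, which returns the value, the key, or the whole entry respectively.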
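Because the three iterators read entries straight out of the table, the views are not copies; they are backed by the map. A short sketch against java.util.HashMap in modern Java (ViewsDemo is a hypothetical name) showing that changes made through a view write through to the map:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: keySet, values and entrySet are live views of the map.
class ViewsDemo {
    static Map<String, Integer> demo() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        map.keySet().remove("a");            // removing a key from the view removes the mapping
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            e.setValue(e.getValue() * 10);   // setValue writes through to the map
        }
        map.values().remove(30);             // removes the mapping whose value is 30 ("c")
        return map;                          // {b=20}
    }
}
```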

===================================================

Next, the serialization methods, starting with writing:

private void writeObject(java.io.ObjectOutputStream s)
        throws IOException
    {
        // Write out the threshold, loadfactor, and any hidden stuff
        s.defaultWriteObject(); // first let the default mechanism serialize the non-transient fields

        // Write out number of buckets
        s.writeInt(table.length);

        // Write out size (number of Mappings)
        s.writeInt(size);

        // Write out keys and values (alternating)
        for (Iterator i = entrySet().iterator(); i.hasNext(); ) { // this loop writes the key and value objects themselves, not the bucket array
            Map.Entry e = (Map.Entry) i.next();
            s.writeObject(e.getKey());
            s.writeObject(e.getValue());
        }
    }

Since writing is customized, reading has to be customized too, otherwise the map could not be restored:

private void readObject(java.io.ObjectInputStream s)
         throws IOException, ClassNotFoundException
    {
        // Read in the threshold, loadfactor, and any hidden stuff
        s.defaultReadObject();

        // Read in number of buckets and allocate the bucket array;
        int numBuckets = s.readInt();
        table = new Entry[numBuckets];

        init();  // Give subclass a chance to do its thing.

        // Read in size (number of Mappings)
        int size = s.readInt();

        // Read the keys and values, and put the mappings in the HashMap
        for (int i=0; i<size; i++) { // this loop reads the objects back and re-inserts each mapping
            Object key = s.readObject();
            Object value = s.readObject();
            putForCreate(key, value);
        }
    }
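The effect of these two methods can be checked with a round trip through an in-memory stream. A sketch using the standard java.io classes in modern Java (SerializationDemo is a hypothetical name): the copy compares equal to the original, even though its table was rebuilt entry by entry by readObject rather than copied.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Hypothetical demo: serialize a HashMap to bytes and read it back.
class SerializationDemo {
    @SuppressWarnings("unchecked")
    static HashMap<String, Integer> roundTrip(HashMap<String, Integer> original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original); // HashMap's private writeObject runs here
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (HashMap<String, Integer>) in.readObject(); // table rebuilt by readObject
        }
    }
}
```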

This is just my own shallow understanding; HashMap has many more cleverly designed details that will take time to understand.

posted on 2011-11-12 20:14 suyor
