HashMap内部实现机制及优化----第一篇

HashMap是链表（请参考LinkedList算法详解）和数组(ArrayList解读)组合的一个综合性的算法，理解本文前最好阅读本文的两篇

他的结构类似于：

数组保存时链表的头部，下面来一步一的详细解释HashMap主要源码：

对象：

   // 默认容量（数组容量）
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    //最大容量（数组容量）
    static final int MAXIMUM_CAPACITY = 1 << 30;
    //加载因子（当size>=容量*加载因子大于或等于table.length，容量扩大一倍）
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    //链表数组
    transient Entry[] table;
    //元素个数
    transient int size;
    //此值=容量*加载因子
    int threshold;
    //加载因子
    final float loadFactor;

上面介绍的HashMpa的属性暂时可能有点难理解。下面介绍他们的用处：

对象初始化:

    public HashMap(int initialCapacity, float loadFactor) {
        
        //判断传入的初始化容量是否为小于0，如果是抛出异常      
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        //判断传入的初始化容量是否为大于MAX，如果是就等于MAX
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
       　//判断传入的初始化加载因子是否符合要求。
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // 初始化容量
        int capacity = 1;
 
　　　　　//如果传入的容量小于初始化容量向左移动一位(也就是乘二)
        while (capacity < initialCapacity)
            capacity <<= 1;
        //加载因子
        this.loadFactor = loadFactor;
        //加载因子（当size>=threshold，容量扩大一倍）
        threshold = (int)(capacity * loadFactor);
        //初始化table　
        table = new Entry[capacity];
        init();
    }

上面介绍的对象，大家现在可以先别管（记住有这东西），对于现在有点难理解。大家可以集合源码整体的看看效果更好。

增加：

    public V put(K key, V value) {
        //判断是否为空，如果为空就空处理。
        if (key == null)
            return putForNullKey(value);
        //取处理后的hash（下面有介绍）
        int hash = hash(key.hashCode());
　　　　　//取索引（下面有介绍）
        int i = indexFor(hash, table.length);
        //判断是否为空，如果为空就空处理。
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
　　　　　　　 //判断是否相等，如果相等直接覆盖。            
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
　　　　　//此方法下面有介绍
        addEntry(hash, key, value, i);
        return null;

想要理解上面的代码就必须理解hash的存储机制，下面来介绍下：

　　1.根据key的hash值（hashCode()）,到HashMap提供的hash方法里面加工一下(这里暂时这么理解)

    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

　　　　　　这是一段神奇的代码~表示看不懂，你们可以深入研究。参数h为对象hashcode()方法返回值。

　　2.取到加工后的hash码后然后根据indexFor()方法取到index(这里可以说是hash算法的核心，所有操作的基于它的。)

    static int indexFor(int h, int length) {
        return h & (length-1);
    }

　　　　　　h是加工后的hash码，length是当前数组的长度,这个方法取到的长度不会超过length-1,所以index不会超过数组的长度.

前面介绍的两个方法是hash算法的核心，每个不同对象都有不同的hash码(在hashmap里面相同的对象hash码一定是相等的，而且==或则equals也返回true)，而当对象通过hash算法取到的index也是一个随机的，所以存储的位置是根据hash值来决定的，在这里我们来个假设：如果没有hashmap通过hash值来取索引,你会怎样去根据k来取到value？可想而知，hash码取索引节约了很多时间。

3.取到索引后判断指定索引里面的Entry(单向链表)是否存在添加的k,也就是(e.hash == hash && ((k = e.key) == key || key.equals(k))),如果返回true就直接覆盖，返回false添加新元素，覆盖不多讲，这里讲添加。首先介绍一下hash的Entry数据结构：

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        final int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
.........省略
}

看了前面的LinedList详解,就能理解这种数据结构的基本操作(这里是个单向链表，也就是没有previous)相对而言操作简单些，前面介绍的hashmap对象里面就有一个table，他是Entry数组，也就是table里面保存了多条链表。整个数据结构类似于2纬表（所以也经常叫hash表）。

添加:

    void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        if (size++ >= threshold)
            resize(2 * table.length);
    }

index和hash，k，v已经从put方法传过来了，已经知道要放在哪一条链表中了(知道table的index),每次添加就是把table[index]后移动,然后把添加的元素添加到头部，这里不具体介绍，添加后然后判断当前size是否等于或大于threshold（table.length*loadfactor）,加载因子的作用在这里体现。间接性的控制链表的长度，或者是控制数组的长度。如果条件满足，数组扩大一倍,也就是调用resize(length)方法,接下来介绍resize():

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable);
        table = newTable;
        threshold = (int)(newCapacity * loadFactor);
    }

这里的方法很简单，这里调用了transfer(newTable)：

    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    Entry<K,V> next = e.next;
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }

看resize是不是还没有添加元素到新的table中来？所以这个方法是加载table。还记得indexFor(h,length)这个方法吗？这个方法是根据数组的来确定index的。现在新数组的长度扩大了两倍所以元素的index改变了。这个方法就是重新加载新table。
基本的put操作已经介绍完。

get操作：

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

这里有个null处理因为没有key.hashcode()会报NullPointException异常，所以要来个空处理，这里根据hash码就能取到index(因为他是根据hash值来存储的)。然后遍历指定的index处的Entry.next.next......的k是存在，如果存在返回value,不存在返回null.

基本操作也介绍完(remove....等一些方法自己可以打开源码看)。

下面给出我实现的MyHashMap:

package servlet;

import java.util.AbstractMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

@SuppressWarnings("unchecked")
public class MyHashMap<K, V> extends AbstractMap<K, V> implements Map<K, V>,Iterable<K> {

    static final int MAX_CAPCITY = 1 << 30; // max size

    static final int DEFAULT_INITAL_CAPCITY = 16; // default size;

    static final float DEFAULT_LOAD_FACTOR = 0.75f; // default load factor

    transient int capcity;//容量

    transient float factor;//加载因子

    transient int threshold;//桶值

    transient int size;//大小

    transient Entry[] table;//hash表

    public MyHashMap(int capcity, float factor) {
        if (capcity < 0 || capcity > MAX_CAPCITY || Float.isNaN(factor) || factor <= 0)
            throw new IllegalArgumentException();
        this.capcity = 1;
        while(this.capcity < capcity){
            this.capcity<<=1;
        }
        this.factor = factor;
        table = new Entry[this.capcity];
        threshold = (int) (capcity * factor);
    }

    public MyHashMap() {
        this(DEFAULT_INITAL_CAPCITY, DEFAULT_LOAD_FACTOR);
    }

    /*
     * return size
     * 
     * @see java.util.AbstractMap#size()
     */
    public int size() {
        return size;
    }

    public V get(Object k) {
        int hash = hash(k.hashCode());
        for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.h == hash && e.k == k || e.k.equals(k)) {
                return e.v;
            }
        }
        return null;
    }

    public V put(K key, V value) {
        int hash = hash(key.hashCode());// 计算键hash值
        int index = indexFor(hash, table.length);// 计算出index
        // 遍历当前列是否有存在所添加的k
        for (Entry<K, V> column = table[index]; column != null; column = column.next) {
            // 如果存在覆盖
            if (column.h == hash && key == column.k || key.equals(column.k)) {
                V oldv = column.v;
                column.v = value;
                return oldv;
            }
        }
        addEntry(hash, key, value, index);// 往列中添加新数据
        return value;
    }

    /**
     * 
     * 添加新对象
     */
    public void addEntry(int hash, K k, V v, int index) {
        table[index] = new Entry<K, V>(hash, k, v, table[index]);
        if (size++ >= threshold)
            resize(table.length * 2);// 新数组容量
    }
    
    /**
     * 调整数组大小
     * @Author : JimmyYong
     */
    void resize(int newLength) {
        if (table.length == MAX_CAPCITY) {// 当达到最大容量
            threshold = Integer.MAX_VALUE;// 直接int最大值
            return;// 数组不变
        }
        Entry[] entry = new Entry[newLength];
        transfer(entry);
        table=entry;
        threshold = (int) (newLength * factor);
    }
    
    /**
     * 从组hash表
     * @Author : JimmyYong
     */
    void transfer(Entry[] entry) {
        // 从组hash表
        for (int i = 0; i < table.length; i++) {
            for (Entry<K, V> e = table[i]; e != null; e = e.next) {
                // 取到索引
                int index = indexFor(e.h, entry.length);
                e.next = entry[index];
                entry[index] = e;
                table[i]=null;
            }
        }
    }

    @SuppressWarnings("hiding")
    private class Entry<K, V> {
        K k;
        V v;
        Entry<K, V> next;
        int h;

        public Entry(int h, K k, V v, Entry<K, V> next) {
            this.h = h;
            this.k = k;
            this.v = v;
            this.next = next;
        }
        public K getKey(){
            return k;
        }
        public V getValue(){
            return v;
        }
    }

    public Set<java.util.Map.Entry<K, V>> entrySet() {

        return null;
    }

    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        // 此段代码我无法解释 数学问题
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /**
     * Returns index for hash code h.散列
     */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public void putAll(Map map) {
        int targetCapcity = map.size();
        if (targetCapcity >= MAX_CAPCITY) {
            targetCapcity = MAX_CAPCITY;
        }
        int newLength = table.length;
        while (newLength < targetCapcity) {
            newLength <<= 1;
        }
        resize(newLength);
        for (int index = 0; index < table.length; index++) {
            for (Entry<K, V> e = table[index]; e != null; e = e.next) {
                
            }
        }
    }
    
    public V remove(Object k){
        int hash=hash(k.hashCode());
        int index=indexFor(hash,table.length);
        Entry<K,V> previous = table[index];//后一个对象
        for(Entry<K,V> e = previous; e != null; e = e.next){
            if(e.h == hash && e.k == k || e.k.equals(k)){
                size--;
                //判断是否为链表的头
                if(e == table[index]){
                    table[index] = null;
                    return e.v;
                }
                previous.next=e.next;//移除
                return e.v;
            }
            previous=e;
        }
        return null;
    }
    
    
    private abstract class MyHashIterator<E> implements Iterator<E> {

        int index;// 保存table数组的index
        Entry<K, V> current;// 当前对象
        Entry<K, V> next = table[index];// 下一个对象
        int length = table.length;
        public MyHashIterator(){
            for(;next==null && index<table.length-1;next=table[++index]);
        }
        public boolean hasNext() {
            return next!=null;
        }

        public Entry<K, V> getEntry() {
            current = next;
            if (current != null) {
                next = current.next;
            }
            for (; next == null && index < length-1; next = table[++index]);
            return current;
        }

        public void remove() {
            
        }
    }
    
    private class MyKeyIterator extends MyHashIterator<K>{
        public K next() {
            return  getEntry().getKey();
        }
    }
    
    private class MyValueIterator extends MyHashIterator<V>{
        public V next() {
            return  getEntry().getValue();
        }
    }
    
    private class MyEntryIterator extends MyHashIterator<Entry<K,V>>{
        public Entry<K,V> next() {
            return  getEntry();
        }
    }
    
    public Iterator<K> getMyKeyIterator(){
        return new MyKeyIterator();
    }
    
    public Iterator<V> getMyValueIterator(){
        return new MyValueIterator();
    }
    
    public Iterator<Entry<K,V>> getMyEntryIterator(){
        return new MyEntryIterator();
    }

    public Iterator<K> iterator() {
        return getMyKeyIterator();
    }
}

此类我优化了很久，还是不能超过hashmap的速度。极度郁闷！看来还要努力学习！
源码下载

posted @ 2013-02-01 17:44 JimmyYong 阅读(1429) 评论(0) 编辑收藏举报

刷新页面返回顶部

JimmyYong

HashMap内部实现机制及优化----第一篇

公告