Hashmap相关

近复习了一下hashmap的相关东西，真的要感叹，有些东西长时间不用，还是会忘记的，在此记录一下。

1、关于Hash

一般翻译做“散列”，也有译为"哈希"的，就是把任意长度的输入（又叫做预映射， pre-image），通过散列算法，变换成固定长度的输出。简单的说，就是hashcode = hash(key), 这种转换是一种压缩映射，散列值的空间通常远小于输入的空间，在查找的时候也能提高效率。

2、Hashmap的原理

java的hashmap是用一个key和value的模式来存取数据，但是本质上，它还是存储在一个数组里面，过程是这样的：首先得到key的hashcode，然后通过index=hashcode%size，得到数组的索引，然后把相应的entry放进数组中。但是这时候会有问题产生，不同的hashcode可能产生相同的index，这时候就会产生冲突，那么怎么解决这个冲突呢？这时候引入一个概念‘桶’，每一个数组元素代表（通过index确定位置）的存储叫一个‘桶’，而每个有相同的index的entry都存储在这个桶中，每个桶中的entry看做一个单向链表的结构来存储数据。

（1）、hashmap.put()的过程，直接上代码：

//hashmap的put方法

public V put(K key, V value) {
        K k = maskNull(key);
        int hash = hash(k);      //通过hash算法得到散列值
        int i = indexFor(hash, table.length); //得到数组的index

        //遍历table[i]位置的链表，如果有发现符合条件的key，说明该key已经存在，替换掉对应的
         //value，否则继续下面的操作
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && eq(k, e.key)) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        //把新的Entry加入到链表中去
        addEntry(hash, k, value, i);
        return null;
}

//把新的Entry加入到链表中去
void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        //threshold=(int)(capacity*loadFactor);
        //capacity为hashmap的容量，默认值为16
        //loadFactor的负载因子，默认值为0.75
        if (size++ >= threshold)
            resize(2 * table.length);  //容量翻一翻，内部的数据全部重新排一遍，很消耗资源
}

说到这里，有必要了解一下Entry的结构，它是hashmap的一个内部类，本身就是一个单向列表结构。主要代码如下：

//内部类Entry
static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        final int hash;
        Entry<K,V> next;  //链表结构

        /**
         * Create new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        public K getKey() {
            return HashMap.<K>unmaskNull(key);
        }

        public V getValue() {
            return value;
        }
        ..
  }

（2）、hashmap的get的过程

//hashmap的get方法
public V get(Object key) {
        Object k = maskNull(key);
        int hash = hash(k);  //得到key的hashcode
        int i = indexFor(hash, table.length); //得到数组的index
        Entry<K,V> e = table[i];
        while (true) {    //在对应的桶中遍历
            if (e == null)
                return null;
            //满足此条件，则说明找到了对应的value，返回
             //该处说明key的equals（）和hashcode（）方法的作用，如果key的hashcode每一次都
             //相等，那么每一次都要去比较equals方法，equals是很消耗资源的
            if (e.hash == hash && eq(k, e.key))
               return e.value;
            e = e.next;
        }
    }

3、关于Object的equals（）和hashcode（）方法

这2个方法在java相关的场合出现的频率确实很高，这里就来讨论一下hashmap和这2个方法的关系。简单的讲，就是当你的对象可能要来做hashmap的key的时候，你就要特别注意了。在缺省情况下，equals方法返回this==obj， hashCode方法的缺省返回对象的内存地址对映于一个整数值来生成。下面举例说明一下：

public class User {
    private String name;

    public User(String name){
        this.name = name;
    }

    public boolean equals(Object o) {
        if (!(o instanceof User)) {
            return false;
        }
        if (o == null) {
            return false;
        }
        User anotherUser = (User) o;
        if(anotherUser.name==null){
            if(this.name==null){
                return true;
            }else{
                return false;
            }
        }else{
            return anotherUser.name.equals(this.name);
        }
    }
    public static void main(String[] args) {
               Map<User,Object> map = new HashMap<User,Object>();
               map.put(new User("test"),"test1");
                System.out.println(map.get(new User("test")));
    }

}

以上代码的运行结果是：null

分析原因：没有重载hashcode（）方法，在2次new User("test")的时候，虽然他们的equals方法相等，但是hashcode返回值并不相等。所有在get的时候找不到对应的桶，也自然取不到正确的值。

下面我们重载hashcode方法，如果按照下面的写法会有什么结果呢？

public int hashCode(){
return 0;
}

每次都返回相同的hashcode，这样写运行的时候不会报错，但是效率相当很低下。这样写也就违背了hashmap的初衷，它本身就为想利用散列来块的实现查找，而不用每次都去比较equals方法。分析代码，当很多这个这样的<key,value>组合被put进去的时候，它们被放在了同一个桶中，而当get的时候，每次都要去遍历链表，并且当执行if(e.hash == hash && eq(k, e.key)) 的时候，因为hashcode相等，所有总是要去执行equals方法，这样导致效率低下。

到这里，就会有个疑问，怎样写hashcode方法才是最好的呢？

首先把握住3个原则：1、如果两个对象equals(Object o)方法是相等的，则调用这两个对象中任一对象的hashCode方法必须产生相同的整数结果；

2、根据第一条可以推出，如果两个对象的hashcode方法产生的值不相等，那么equals方法也必定不相等；

3、虽然可以出现，equals方法不相等，hashcode相等的情况。但是如果提高equals不相等===》

hashcode也不相等的概率，可以提高hashmap的效率。

一种简单常见的hashcode方法：

public int hashCode(){
         int hash = 1;
         hash = hash * 31 + (this.name == null ? 0 : this.name.hashCode());
         return hash;
}

4、关于冲突和避免冲突的方法：

冲突在2种情况下发生，第一，key得到的hashcode是相等的，第二，不同的hashcode，通过hashcode%mapsize后，得到相同的index。首先强调，要完全避免冲突是不可能的，只有降低发生的概率。重点讨论第二种情况的解决方案：

1、最好能够在定义map的时候能够估计到map的大小，Map map = new HashMap（size），因为rehash的过程是相当消耗的资源的；

2、调整负载因子loadFactor的大小，默认值为0.75，JDK描述这个值是权衡了时间和空间上的综合考虑后取的平均值，loadFactor值越大，空间消耗就越小，时间

就越大；反之，loadFactor越小，时间消耗就越小，空间消耗及越大。loadFactor的值一般在0.5-1之间。

3、map的大小，自定义的size值的时候，最好是素数。因为素数可以降低冲突的概率。（但jdk的默认值mapsize是16）。

posted on 2009-11-25 10:44 ethanchan 阅读(756) 评论(0) 收藏举报

刷新页面返回顶部

Hashmap相关

导航

公告