C#中GetHashCode的各类实现

GetHashCode的用处
为什么不能使用默认的GetHashCode
GetHashCode的高效定义方法
参考

`GetHashCode`的用处

首先声明一下，这里的GetHashCode是Object.GetHashCode，是需要在对象中定义的函数。这个函数在对象被插入到字典Dictionary<TKey, TValue>或者HashSet<T>之类的哈希表中的时候会被调用，用于生成hash键值。关于哈希表：

散列表（Hash table，也叫哈希表），是根据关键码值(Key value)而直接进行访问的数据结构。也就是说，它通过把关键码值映射到表中一个位置来访问记录，以加快查找的速度。这个映射函数叫做散列函数，存放记录的数组叫做散列表。
给定表M，存在函数f(key)，对任意给定的关键字值key，代入函数后若能得到包含该关键字的记录在表中的地址，则称表M为哈希(Hash）表，函数f(key)为哈希(Hash) 函数。

HashCode的规范：

如果a和b相等，那么a.equals(b)一定为true，则a.hashCode()必须等于b.hashCode()；
如果a和b不相等，那么a.equals(b)一定为false，则a.hashCode()和b.hashCode()尽量不要相等。

第一条是必须实现的，否则Dictionary这类数据结构无法正常使用；第二条则是尽量实现，如果实现得不好的话会影响实际使用时的存取性能。

因此，GetHashCode可以用于辅助快速判断两个Object是否相等，之所以是辅助是因为即使是不同的两个Object，也是有可能拥有同样的HashCode的。

为什么不能使用默认的`GetHashCode`

直接使用默认的ValueType的GetHashCode实现容易造成哈希冲突，这样的Object在配合哈希表这类数据结构使用的时候会出现性能问题。

`GetHashCode`的高效定义方法

`HashCode.Combine`

class Person:IEquatable<Person>
    {
        private int Age { get; }
        public string Name { get;}

        public bool Equals(Person other)
        {
            if (ReferenceEquals(null, other)) return false;
            if (ReferenceEquals(this, other)) return true;
            return Age == other.Age && Name == other.Name;
        }

        public override bool Equals(object obj)
        {
            if (ReferenceEquals(null, obj)) return false;
            if (ReferenceEquals(this, obj)) return true;
            if (obj.GetType() != this.GetType()) return false;
            return Equals((Person) obj);
        }

        public override int GetHashCode()
        {
            return HashCode.Combine<int, string>(Age, Name);
        }
    }

最多传入8个参数。在参数是简单的数据类型时（如Int）那么很高效。

`HashCode.ToHashCode`

class Person:IEquatable<Person>
    {
        public int Age { get; set; }
        public string Name { get;set; }

        public Person(int age, string name)
        {
            Age = age;
            Name = name;
        }

        public bool Equals(Person other)
        {
            if (ReferenceEquals(null, other)) return false;
            if (ReferenceEquals(this, other)) return true;
            return Age == other.Age && Name == other.Name;
        }

        public override bool Equals(object obj)
        {
            if (ReferenceEquals(null, obj)) return false;
            if (ReferenceEquals(this, obj)) return true;
            if (obj.GetType() != this.GetType()) return false;
            return Equals((Person) obj);
        }

        public override int GetHashCode()
        {
            var hash = new HashCode();
            hash.Add(Age);
            hash.Add(Name);

            return hash.ToHashCode();;
        }
    }

利用元组

要求C#是7.0以上的版本。

class Person:IEquatable<Person>
    {
        public int Age { get; set; }
        public string Name { get;set; }

        public Person(int age, string name)
        {
            Age = age;
            Name = name;
        }

        public bool Equals(Person other)
        {
            if (ReferenceEquals(null, other)) return false;
            if (ReferenceEquals(this, other)) return true;
            return Age == other.Age && Name == other.Name;
        }

        public override bool Equals(object obj)
        {
            if (ReferenceEquals(null, obj)) return false;
            if (ReferenceEquals(this, obj)) return true;
            if (obj.GetType() != this.GetType()) return false;
            return Equals((Person) obj);
        }

        public override int GetHashCode()
        {
            return (Age, Name).GetHashCode();
        }

    }

这个方法比较好的地方就在于是在栈上面完成的，不会触发GC。除了上面的还可以：

new { Age, Name }.GetHashCode();

但是这样会触发GC，因为在堆上分配内存了。这是用到了C#的匿名类型来实现的。

基于两个素数、乘、加法

public override int GetHashCode()
{
    unchecked // Hash计算的结果是可以溢出的
    {
        int hash = 17;
        // 注意在计算之前要对field*进行null检测，计算个Hash都要触发异常就得不偿失了
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}

用两个不相通的素数，通过类似上面的乘法和加法计算就足够编写一个拥有较低Hash碰撞概率的算法。17和23只是例子，可以是随意的素数，越大越好。

基于异或

public override int GetHashCode()
{
    unchecked
    {
        int hash = (int) 2166136261; // 不是素数
        // 16777619是素数
        hash = (hash * 16777619) ^ field1.GetHashCode();
        hash = (hash * 16777619) ^ field2.GetHashCode();
        hash = (hash * 16777619) ^ field3.GetHashCode();
        return hash;
    }
}

2166136261和16777619是FNV算法重的取值，上面这个算法模仿了FNV，所以沿用了FNV的取法。具体的取法以及这两个数字是怎么计算出来的可以看wiki介绍页。

参考

What is the best algorithm for overriding GetHashCode?
Object.GetHashCode Method
不要使用 struct 默认的 GetHashCode 实现
 12-1怎么写HashCode
HashCode.Combine Method
HashCode.ToHashCode Method
Eternally Confuzzled - The Art of Hashing
Fowler–Noll–Vo hash function

posted @ 2021-04-08 17:21 夜溅樱阅读(2152) 评论(0) 收藏举报

刷新页面返回顶部

夜溅樱

优雅地绽放吧，墨染的樱花！

C#中GetHashCode的各类实现

`GetHashCode`的用处

为什么不能使用默认的`GetHashCode`

`GetHashCode`的高效定义方法

`HashCode.Combine`

`HashCode.ToHashCode`

利用元组

基于两个素数、乘、加法

基于异或

参考

公告

夜溅樱

优雅地绽放吧，墨染的樱花！

C#中GetHashCode的各类实现

GetHashCode的用处

为什么不能使用默认的GetHashCode

GetHashCode的高效定义方法

HashCode.Combine

HashCode.ToHashCode

利用元组

基于两个素数、乘、加法

基于异或

参考

公告

`GetHashCode`的用处

为什么不能使用默认的`GetHashCode`

`GetHashCode`的高效定义方法

`HashCode.Combine`

`HashCode.ToHashCode`