[Java] String的hashCode方法

简述

  • java/lang/String#hashCode是用途极广的方法,其源码实现也存在一定变迁。

其位于 JRE 的 rt.jar 包内

OpenJDK

OpenJDK 8-b120版 ~ 9-b00版 := Oracle JDK 1.8.0-261

  • jdk/jdk/src/share/classes/java/lang/String.java

https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/lang/String.java
https://github.com/openjdk/jdk/blob/jdk9-b00/jdk/src/share/classes/java/lang/String.java

    /**
     * Returns a hash code for this string. The hash code for a
     * {@code String} object is computed as
     * <blockquote><pre>
     * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
     * </pre></blockquote>
     * using {@code int} arithmetic, where {@code s[i]} is the
     * <i>i</i>th character of the string, {@code n} is the length of
     * the string, and {@code ^} indicates exponentiation.
     * (The hash value of the empty string is zero.)
     *
     * @return  a hash code value for this object.
     */
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

OpenJDK 9-b38版 ~ 9-b65版

  • jdk/src/java.base/share/classes/java/lang/String.java

https://github.com/openjdk/jdk/blob/jdk9-b38/jdk/src/java.base/share/classes/java/lang/String.java

    /**
     * Returns a hash code for this string. The hash code for a
     * {@code String} object is computed as
     * <blockquote><pre>
     * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
     * </pre></blockquote>
     * using {@code int} arithmetic, where {@code s[i]} is the
     * <i>i</i>th character of the string, {@code n} is the length of
     * the string, and {@code ^} indicates exponentiation.
     * (The hash value of the empty string is zero.)
     *
     * @return  a hash code value for this object.
     */
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            for (char v : value) {
                h = 31 * h + v;
            }
            hash = h;
        }
        return h;
    }

OpenJDK 9-b66版 ~ 9-b92版

https://github.com/openjdk/jdk/blob/jdk9-b66/jdk/src/java.base/share/classes/java/lang/String.java
https://github.com/openjdk/jdk/blob/jdk9-b92/jdk/src/java.base/share/classes/java/lang/String.java

    /**
     * Returns a hash code for this string. The hash code for a
     * {@code String} object is computed as
     * <blockquote><pre>
     * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
     * </pre></blockquote>
     * using {@code int} arithmetic, where {@code s[i]} is the
     * <i>i</i>th character of the string, {@code n} is the length of
     * the string, and {@code ^} indicates exponentiation.
     * (The hash value of the empty string is zero.)
     *
     * @return  a hash code value for this object.
     */
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            for (char v : value) {
                h = 31 * h + v;
            }
            if (h != 0) {
                hash = h;
            }
        }
        return h;
    }

OpenJDK 9-b93版 ~ 10+1版 ~ unknowVersion

  • jdk/src/java.base/share/classes/java/lang/String.java

https://github.com/openjdk/jdk/blob/jdk9-b93/jdk/src/java.base/share/classes/java/lang/String.java
https://github.com/openjdk/jdk/blob/jdk-10%2B1/jdk/src/java.base/share/classes/java/lang/String.java

    /**
     * Returns a hash code for this string. The hash code for a
     * {@code String} object is computed as
     * <blockquote><pre>
     * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
     * </pre></blockquote>
     * using {@code int} arithmetic, where {@code s[i]} is the
     * <i>i</i>th character of the string, {@code n} is the length of
     * the string, and {@code ^} indicates exponentiation.
     * (The hash value of the empty string is zero.)
     *
     * @return  a hash code value for this object.
     */
    public int hashCode() {
        if (hash == 0 && value.length > 0) {
            hash = isLatin1() ? StringLatin1.hashCode(value) : StringUTF16.hashCode(value);
        }
        return hash;
    }

OpenJDK unknowVersion ~ 10+1版 ~ 24+24版

  • jdk/src/java.base/share/classes/java/lang/String.java

https://github.com/openjdk/jdk/blob/jdk-11%2B28/src/java.base/share/classes/java/lang/String.java
https://github.com/openjdk/jdk/blob/jdk-24%2B24/src/java.base/share/classes/java/lang/String.java

    /**
     * Returns a hash code for this string. The hash code for a
     * {@code String} object is computed as
     * <blockquote><pre>
     * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
     * </pre></blockquote>
     * using {@code int} arithmetic, where {@code s[i]} is the
     * <i>i</i>th character of the string, {@code n} is the length of
     * the string, and {@code ^} indicates exponentiation.
     * (The hash value of the empty string is zero.)
     *
     * @return  a hash code value for this object.
     */
    public int hashCode() {
        // The hash or hashIsZero fields are subject to a benign data race,
        // making it crucial to ensure that any observable result of the
        // calculation in this method stays correct under any possible read of
        // these fields. Necessary restrictions to allow this to be correct
        // without explicit memory fences or similar concurrency primitives is
        // that we can ever only write to one of these two fields for a given
        // String instance, and that the computation is idempotent and derived
        // from immutable state
        int h = hash;
        if (h == 0 && !hashIsZero) {
            h = isLatin1() ? StringLatin1.hashCode(value) : StringUTF16.hashCode(value);
            if (h == 0) {
                hashIsZero = true;
            } else {
                hash = h;
            }
        }
        return h;
    }

Huawei JDK

Huawei JDK 1.8/25.412-b08

hashCode

OpenJDK 64-Bit Server VM - Huawei Technologies Co., Ltd - 1.8/25.412-b08 JAVA_HOME: /usr/local/huaweijre-8 /usr/local/huaweijre-8/lib/rt.jar注: 华为云 DLI Flink 1.12 即使用的此版 JDK,如下代码为反编译后的结果: (与 Oracle JDK 1.8.0-261相比,其区别在于:int=>byte`)

  public int hashCode() {
    int i = this.hash;
    if (i == 0 && this.value.length > 0) {
      char[] arrayOfChar = this.value;
      for (byte b = 0; b < this.value.length; b++)
        i = 31 * i + arrayOfChar[b]; 
      this.hash = i;
    } 
    return i;
  }

Oracle JDK 1.8.0-261 vs. Huawei JDK 1.8/25.412-b08

试验代码

public static int hashCode(char [] value){  
    //char [] value = stringThis.value;  
    int var1 = 0; //this.hash;//this.hash 的起始值=0 | 本方法的计算结果(hash) 即 var1    if (var1 == 0 && value.length > 0) {  
        char[] var2 = value;  
  
        for(int var3 = 0; var3 < value.length; ++var3) {//int vs byte
            var1 = 31 * var1 + var2[var3];  
        }  
        //this.hash = var1;  
    }  
    return var1;  
}  
  
public static void main(String[] args) {  
    char [] value = "Hello".toCharArray(); //new char[] { 'H', 'e', 'l', 'l', 'o' };
    Long totalTimeConsuming = 0L;  
    int count = 1000;  
    for(int i=0;i<=count;i++){  
        Long startTime = System.nanoTime();  
        int hashcode = hashCode(value);  
        Long timeConsuming = System.nanoTime() - startTime;//nano time 纳秒 | 1纳秒 = 0.001 微秒 ; 1纳秒=0.000001 毫秒  
        totalTimeConsuming += timeConsuming;  
        log.info("hashCode: {}, timeConsuming : {}ns", hashcode, timeConsuming);  
    }  
    double avgTime = (double) totalTimeConsuming/count;//2个整型相除,需用浮点型接收,以避免精度丢失  
    log.info("avg time consuming : {} ns", avgTime );//  
}

功能对比结果:

hashCode 运算结果未发生改变

性能对比结果:

循环1000次(试验5轮),求平均值:

hashcode : byte 版 (Huawei JDK 1.8/25.412-b08)	
	552.1 ns
	626.3 ns
	477.2 ns
	643.8 ns
	648.2 ns
	=====> 总体 avg : 589.52 ns

hashcode : int 版 (Oracle JDK 1.8.0-261)
	633.8 ns
	577.4 ns
	553.4 ns
	517.7 ns
	788.0 ns
	====> 总体 avg : 614.06 ns

FAQ

Q:为什么String中hashCode方法里使用神奇因子 31呢?

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
  • hashCode 方法核心的计算逻辑只有3行,也就是代码中的 for 循环:对value这个char数组每个元素都算个出个和31相关的数。

  • 假设这里value数组的长度为3(value.length = 3),那么逻辑过程就是这样的:

i=0 -> h = 31 * 0 + val[0]
i=1 -> h = 31 * (31 * 0 + val[0]) + val[1]
i=2 -> h = 31 * (31 * (31 * 0 + val[0]) + val[1]) + val[2]

//把括号里的数据乘出来
  h = 31*31*31*0 + 31*31*val[0] + 31*val[1] + val[2]
//于是我们可以推导出这个式子
  h = 31^(3-1)*val[0] + 31^(3-2)*val[1] + val[2]

上面的 for 循环推导出一个计算公式:

val[0]*31^(n-1) + val[1]*31^(n-2) + ... + val[n-1]

上面公式的推导并不是本文的重点,大家了解了解即可。但需要知道这里时时刻刻都有着31的身影。

  • 选择数字31的原因

原因 1:31 可以被编译器优化为31∗i=(i<<5)−i,位运算减法运算效率比乘法运算高。
原因 2: 31 是一个质数:质数是只能被 1 和自身整除的数,使用质数作为乘法因子获得的散列值,在将来进行取模时,得到相同 index 的概率会降低,即降低了哈希冲突的概率
原因 3: 31 是一个不大不小的质数:质数太小容易造成散列值聚集在一个小区间,提供散列冲突概率;质数过大容易造成散列值超出 int 的取值范围(上溢),丢失部分数值信息,散列冲突概率不稳定。

Q: 在Java中如何优化JavaBean的hashCode方法?

在Java中,hashCode方法的优化对于提高散列表(如HashMap、HashSet等)的性能至关重要。以下是一些优化hashCode方法的建议:

  • 一致性:对于同一个对象,无论在任何上下文中调用hashCode()方法,都应返回相同的值。这是hashCode方法的基本要求。
  • 高效性:计算hashCode()的时间复杂度应尽可能低,以便在大量数据中快速查找。
  • 均匀分布:生成的hashCode值应尽量均匀分布在散列表的各个位置,以减少哈希冲突的概率。
  • 避免使用输入字段中的特殊字符或空格:这些字符可能导致hashCode的计算结果不均匀分布。
  • 考虑使用不可变字段:如果对象的某些字段在创建后不会改变,那么可以将这些字段纳入hashCode的计算中。这样,只要对象不变,其hashCode就不会改变,这有助于提高性能。
  • 不要使用输入字段的负值:负值可能导致hashCode的分布不均匀。
  • 考虑使用位操作:位操作通常比乘法和除法更快,可以考虑将多个字段的值通过位操作组合成一个hashCode。
  • 避免使用重量级的计算:如果必须使用复杂的计算,尽量将其放在一个单独的方法中,并在hashCode方法中调用该方法。
  • 注意null值:对于null值,需要决定如何处理。一种常见的做法是返回一个特定的常量值(如0或-1)。
  • 文档和测试:明确文档中说明hashCode方法的实现方式和使用限制,并进行充分的测试以确保其正确性和性能。

X 推荐文献

  • openjdk

https://openjdk.org/
https://openjdk.org/projects/jdk/
https://github.com/openjdk/jdk/

posted @ 2024-11-19 17:59  千千寰宇  阅读(5)  评论(0编辑  收藏  举报