[Java] The hashCode Method of String
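Overview
java/lang/String#hashCode is a very widely used method, and its source implementation has changed across JDK versions.
In JDK 8 and earlier it ships in the JRE's rt.jar.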
OpenJDK
OpenJDK 8-b120 ~ 9-b00 (same as Oracle JDK 1.8.0-261)
jdk/jdk/src/share/classes/java/lang/String.java
https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/lang/String.java
https://github.com/openjdk/jdk/blob/jdk9-b00/jdk/src/share/classes/java/lang/String.java
/**
* Returns a hash code for this string. The hash code for a
* {@code String} object is computed as
* <blockquote><pre>
* s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
* </pre></blockquote>
* using {@code int} arithmetic, where {@code s[i]} is the
* <i>i</i>th character of the string, {@code n} is the length of
* the string, and {@code ^} indicates exponentiation.
* (The hash value of the empty string is zero.)
*
* @return a hash code value for this object.
*/
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
OpenJDK 9-b38 ~ 9-b65
jdk/src/java.base/share/classes/java/lang/String.java
https://github.com/openjdk/jdk/blob/jdk9-b38/jdk/src/java.base/share/classes/java/lang/String.java
/**
* Returns a hash code for this string. The hash code for a
* {@code String} object is computed as
* <blockquote><pre>
* s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
* </pre></blockquote>
* using {@code int} arithmetic, where {@code s[i]} is the
* <i>i</i>th character of the string, {@code n} is the length of
* the string, and {@code ^} indicates exponentiation.
* (The hash value of the empty string is zero.)
*
* @return a hash code value for this object.
*/
public int hashCode() {
    int h = hash;
    if (h == 0) {
        for (char v : value) {
            h = 31 * h + v;
        }
        hash = h;
    }
    return h;
}
OpenJDK 9-b66 ~ 9-b92
https://github.com/openjdk/jdk/blob/jdk9-b66/jdk/src/java.base/share/classes/java/lang/String.java
https://github.com/openjdk/jdk/blob/jdk9-b92/jdk/src/java.base/share/classes/java/lang/String.java
/**
* Returns a hash code for this string. The hash code for a
* {@code String} object is computed as
* <blockquote><pre>
* s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
* </pre></blockquote>
* using {@code int} arithmetic, where {@code s[i]} is the
* <i>i</i>th character of the string, {@code n} is the length of
* the string, and {@code ^} indicates exponentiation.
* (The hash value of the empty string is zero.)
*
* @return a hash code value for this object.
*/
public int hashCode() {
    int h = hash;
    if (h == 0) {
        for (char v : value) {
            h = 31 * h + v;
        }
        if (h != 0) {
            hash = h;
        }
    }
    return h;
}
OpenJDK 9-b93 ~ 10+1 ~ unknown version
jdk/src/java.base/share/classes/java/lang/String.java
https://github.com/openjdk/jdk/blob/jdk9-b93/jdk/src/java.base/share/classes/java/lang/String.java
https://github.com/openjdk/jdk/blob/jdk-10%2B1/jdk/src/java.base/share/classes/java/lang/String.java
/**
* Returns a hash code for this string. The hash code for a
* {@code String} object is computed as
* <blockquote><pre>
* s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
* </pre></blockquote>
* using {@code int} arithmetic, where {@code s[i]} is the
* <i>i</i>th character of the string, {@code n} is the length of
* the string, and {@code ^} indicates exponentiation.
* (The hash value of the empty string is zero.)
*
* @return a hash code value for this object.
*/
public int hashCode() {
    if (hash == 0 && value.length > 0) {
        hash = isLatin1() ? StringLatin1.hashCode(value) : StringUTF16.hashCode(value);
    }
    return hash;
}
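From this version on, value is a byte array and the actual loop lives in the JDK-internal helpers StringLatin1 and StringUTF16. The following is only a rough sketch of what such helpers have to compute, not the JDK source: the class name CompactStringHashSketch and both method names are hypothetical, and big-endian byte order is assumed for the UTF-16 case.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins for the JDK-internal helpers; not the actual JDK source.
final class CompactStringHashSketch {

    // Latin-1 storage: one byte per character, widened as an unsigned value.
    static int latin1Hash(byte[] value) {
        int h = 0;
        for (byte b : value) {
            h = 31 * h + (b & 0xff);
        }
        return h;
    }

    // UTF-16 storage: two bytes per char; big-endian order is an assumption of this sketch.
    static int utf16Hash(byte[] value) {
        int h = 0;
        for (int i = 0; i + 1 < value.length; i += 2) {
            char c = (char) (((value[i] & 0xff) << 8) | (value[i + 1] & 0xff));
            h = 31 * h + c;
        }
        return h;
    }

    public static void main(String[] args) {
        String latin = "Hello";
        String wide = "\u4f60\u597d, hashCode"; // contains non-Latin-1 characters
        // Both comparisons check the sketch against the real String.hashCode().
        System.out.println(latin1Hash(latin.getBytes(StandardCharsets.ISO_8859_1)) == latin.hashCode());
        System.out.println(utf16Hash(wide.getBytes(StandardCharsets.UTF_16BE)) == wide.hashCode());
    }
}
```

Whichever storage is used, the result has to match the documented formula, which is why the hash of a given string does not depend on how it is stored internally.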
OpenJDK unknown version ~ 10+1 ~ 24+24
jdk/src/java.base/share/classes/java/lang/String.java
https://github.com/openjdk/jdk/blob/jdk-11%2B28/src/java.base/share/classes/java/lang/String.java
https://github.com/openjdk/jdk/blob/jdk-24%2B24/src/java.base/share/classes/java/lang/String.java
/**
* Returns a hash code for this string. The hash code for a
* {@code String} object is computed as
* <blockquote><pre>
* s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
* </pre></blockquote>
* using {@code int} arithmetic, where {@code s[i]} is the
* <i>i</i>th character of the string, {@code n} is the length of
* the string, and {@code ^} indicates exponentiation.
* (The hash value of the empty string is zero.)
*
* @return a hash code value for this object.
*/
public int hashCode() {
    // The hash or hashIsZero fields are subject to a benign data race,
    // making it crucial to ensure that any observable result of the
    // calculation in this method stays correct under any possible read of
    // these fields. Necessary restrictions to allow this to be correct
    // without explicit memory fences or similar concurrency primitives is
    // that we can ever only write to one of these two fields for a given
    // String instance, and that the computation is idempotent and derived
    // from immutable state
    int h = hash;
    if (h == 0 && !hashIsZero) {
        h = isLatin1() ? StringLatin1.hashCode(value) : StringUTF16.hashCode(value);
        if (h == 0) {
            hashIsZero = true;
        } else {
            hash = h;
        }
    }
    return h;
}
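The hashIsZero flag matters for non-empty strings whose hash happens to be 0: without it, such a string would be rehashed on every call. A minimal sketch of the idea follows, using a hypothetical class CachedHashSketch with a computations counter added purely for demonstration. It works on a char array instead of the JDK's byte array and is not thread-safety-tested. The string "f5a5a608" is used because its hashCode is 0.

```java
// Hypothetical class for illustration; it mirrors the caching idea, not the JDK source.
final class CachedHashSketch {
    private final char[] value;
    private int hash;            // cached hash; 0 means "not cached yet, or really zero"
    private boolean hashIsZero;  // set once the computed hash turned out to be 0
    int computations;            // counts full computations, for demonstration only

    CachedHashSketch(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && !hashIsZero) {
            computations++;
            for (char c : value) {
                h = 31 * h + c;
            }
            // Only one of the two fields is ever written for a given instance.
            if (h == 0) {
                hashIsZero = true;
            } else {
                hash = h;
            }
        }
        return h;
    }

    public static void main(String[] args) {
        CachedHashSketch zero = new CachedHashSketch("f5a5a608");
        zero.hashCode();
        zero.hashCode();
        // The second call is served from the flag, so only one computation happens.
        System.out.println("hash=" + zero.hashCode() + ", computations=" + zero.computations);
    }
}
```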
Huawei JDK
Huawei JDK 1.8/25.412-b08
hashCode
OpenJDK 64-Bit Server VM - Huawei Technologies Co., Ltd - 1.8/25.412-b08 JAVA_HOME: /usr/local/huaweijre-8
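/usr/local/huaweijre-8/lib/rt.jar
Note: Huawei Cloud DLI Flink 1.12 uses this JDK build. The code below is the decompiled result. (Compared with Oracle JDK 1.8.0-261, the visible difference is the loop counter type: int => byte. Since bytecode stores byte and int locals the same way, this may be a decompiler artifact rather than a change in the source.)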
public int hashCode() {
    int i = this.hash;
    if (i == 0 && this.value.length > 0) {
        char[] arrayOfChar = this.value;
        for (byte b = 0; b < this.value.length; b++)
            i = 31 * i + arrayOfChar[b];
        this.hash = i;
    }
    return i;
}
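One way to sanity-check the decompiled loop is to take the byte counter literally and see what happens. The sketch below (hypothetical class ByteCounterSketch) copies the loop as shown: it agrees with String.hashCode for short inputs, but for 128 or more characters the counter wraps from 127 to -128 and the array access fails. A real JDK obviously hashes long strings fine, which suggests the byte type was guessed by the decompiler rather than written in the source.

```java
final class ByteCounterSketch {

    // The decompiled loop taken literally, with a byte loop counter.
    static int hashWithByteCounter(char[] value) {
        int h = 0;
        for (byte b = 0; b < value.length; b++) {
            h = 31 * h + value[b];
        }
        return h;
    }

    public static void main(String[] args) {
        // Short input: the counter never exceeds 127, so the result matches.
        System.out.println(hashWithByteCounter("Hello".toCharArray()) == "Hello".hashCode());

        // 128 elements: after index 127 the counter wraps around to -128.
        try {
            hashWithByteCounter(new char[128]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("byte counter wrapped around: " + e.getMessage());
        }
    }
}
```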
Oracle JDK 1.8.0-261 vs. Huawei JDK 1.8/25.412-b08
Test code
// log is assumed to be an SLF4J Logger field of the enclosing class.
public static int hashCode(char[] value) {
    // char[] value = stringThis.value;
    int var1 = 0; // this.hash starts at 0; var1 holds the result (hash) of this method
    if (var1 == 0 && value.length > 0) {
        char[] var2 = value;
        for (int var3 = 0; var3 < value.length; ++var3) { // int vs. byte
            var1 = 31 * var1 + var2[var3];
        }
        // this.hash = var1;
    }
    return var1;
}

public static void main(String[] args) {
    char[] value = "Hello".toCharArray(); // new char[] { 'H', 'e', 'l', 'l', 'o' };
    long totalTimeConsuming = 0L;
    int count = 1000;
    for (int i = 0; i < count; i++) { // exactly count iterations, matching the divisor below
        long startTime = System.nanoTime();
        int hashcode = hashCode(value);
        long timeConsuming = System.nanoTime() - startTime; // nanoseconds: 1 ns = 0.001 µs = 0.000001 ms
        totalTimeConsuming += timeConsuming;
        log.info("hashCode: {}, timeConsuming : {}ns", hashcode, timeConsuming);
    }
    double avgTime = (double) totalTimeConsuming / count; // cast to double before dividing to avoid integer truncation
    log.info("avg time consuming : {} ns", avgTime);
}
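Functional comparison:
The hashCode result is unchanged.
Performance comparison:
1000 iterations per round, 5 rounds, averaged:
hashCode: byte version (Huawei JDK 1.8/25.412-b08)
552.1 ns
626.3 ns
477.2 ns
643.8 ns
648.2 ns
=====> overall avg: 589.52 ns
hashCode: int version (Oracle JDK 1.8.0-261)
633.8 ns
577.4 ns
553.4 ns
517.7 ns
788.0 ns
====> overall avg: 614.06 ns
The spread between rounds (477.2 ns to 648.2 ns, and 517.7 ns to 788.0 ns) is larger than the gap between the two averages, so these numbers do not show a meaningful performance difference.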
FAQ
Q: Why does String's hashCode method use the magic factor 31?
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
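- The core computation in hashCode is only 3 lines, namely the for loop: it folds each element of the char array value into the result, multiplying the running value by 31 at every step.
- Suppose the value array has length 3 (value.length = 3). The computation then proceeds as follows: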
i=0 -> h = 31 * 0 + val[0]
i=1 -> h = 31 * (31 * 0 + val[0]) + val[1]
i=2 -> h = 31 * (31 * (31 * 0 + val[0]) + val[1]) + val[2]
// multiply out the parentheses
h = 31*31*31*0 + 31*31*val[0] + 31*val[1] + val[2]
// which gives the following expression
h = 31^(3-1)*val[0] + 31^(3-2)*val[1] + val[2]
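The for loop above therefore yields the general formula:
val[0]*31^(n-1) + val[1]*31^(n-2) + ... + val[n-1]
The derivation itself is not the focus of this article. The point to take away is that 31 appears in every term.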
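To check the formula against the loop, it can be evaluated term by term and compared with String.hashCode. This is a small sketch with a hypothetical class PolynomialHashSketch; the powers of 31 are computed in plain int arithmetic, so they overflow in the same way the loop does.

```java
final class PolynomialHashSketch {

    // Evaluates val[0]*31^(n-1) + val[1]*31^(n-2) + ... + val[n-1] term by term.
    static int polynomialHash(String s) {
        int n = s.length();
        int sum = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int k = 0; k < n - 1 - i; k++) {
                power *= 31; // int overflow wraps around, as in the original loop
            }
            sum += s.charAt(i) * power;
        }
        return sum;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98, 'c' = 99: 97*31*31 + 98*31 + 99 = 96354
        System.out.println(polynomialHash("abc"));   // 96354
        System.out.println("abc".hashCode());        // 96354
    }
}
```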
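- Reasons for choosing 31
Reason 1: the multiplication by 31 can be optimized by the compiler into 31 * i = (i << 5) - i, and a shift plus a subtraction can be cheaper than a multiplication.
Reason 2: 31 is a prime. A prime is divisible only by 1 and itself. With a prime multiplier, hash values are less likely to map to the same index when they are later reduced by a modulo, which lowers the probability of hash collisions.
Reason 3: 31 is a prime of moderate size. A prime that is too small tends to cluster hash values in a narrow range, which raises the collision probability. A prime that is too large makes the hash exceed the int range (overflow) sooner, discarding part of the numeric information and making the collision probability less stable.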
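The identity behind reason 1 is easy to confirm: i << 5 is 32 * i, so subtracting i leaves 31 * i, and because int arithmetic wraps around modulo 2^32 the two forms agree even when they overflow. A short sketch with a hypothetical class ShiftIdentitySketch:

```java
final class ShiftIdentitySketch {

    static int viaMultiply(int i) {
        return 31 * i;
    }

    static int viaShift(int i) {
        return (i << 5) - i; // 32*i - i
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 97, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int i : samples) {
            // Both columns are expected to agree, including for overflowing inputs.
            System.out.println(i + ": " + viaMultiply(i) + " " + viaShift(i));
        }
    }
}
```

Whether the rewrite actually pays off depends on the JVM and the hardware, so it is better treated as a historical motivation than as a measured speedup.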
Q: How can the hashCode method of a JavaBean be optimized in Java?
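In Java, a well-behaved hashCode method is important for the performance of hash-based collections such as HashMap and HashSet. Some suggestions:
- Consistency: within one run of an application, repeated calls to hashCode() on the same object must return the same value as long as the fields used by equals() do not change. This is the basic requirement of hashCode.
- Efficiency: keep the cost of computing hashCode() as low as possible so that lookups stay fast on large data sets.
- Even distribution: the generated values should spread as evenly as possible over the buckets of the hash table to reduce the probability of collisions.
- Avoid special characters or spaces in input fields: these may lead to an uneven distribution of hashCode values.
- Prefer immutable fields: if some fields never change after construction, include those in the hashCode computation. As long as the object does not change, its hashCode does not change either, which also makes the value safe to cache.
- Do not use negative values of input fields: negative values may lead to an uneven distribution of hashCode values.
- Consider bit operations: they are usually faster than multiplication and division, so several field values can be combined into one hashCode with bit operations.
- Avoid heavyweight computation: if a complex computation is unavoidable, put it in a separate method and call that method from hashCode.
- Watch for null values: decide how null fields are handled. A common approach is to contribute a fixed constant (such as 0 or -1).
- Documentation and testing: document how hashCode is implemented and what its limitations are, and test it thoroughly for correctness and performance.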
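Several of the points above (consistency with equals, immutable fields, null handling, cheap computation) can be seen together in a small bean. This is only a sketch under assumptions: UserBean and its fields name and age are hypothetical, and the 31-based combination simply follows the pattern used by String.hashCode.

```java
import java.util.Objects;

// Hypothetical JavaBean used only for illustration.
final class UserBean {
    private final String name; // immutable, may be null
    private final int age;     // immutable

    UserBean(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof UserBean)) {
            return false;
        }
        UserBean other = (UserBean) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields compared in equals(); Objects.hashCode(null) is 0.
        int h = Objects.hashCode(name);
        h = 31 * h + Integer.hashCode(age);
        return h;
    }

    public static void main(String[] args) {
        UserBean a = new UserBean("johnny", 30);
        UserBean b = new UserBean("johnny", 30);
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
        System.out.println(new UserBean(null, 30).hashCode()); // null name is handled
    }
}
```

For beans with many fields, Objects.hash(name, age) gives the same kind of result with less code, at the cost of boxing and an array allocation per call.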
X Recommended references
- openjdk
https://openjdk.org/
https://openjdk.org/projects/jdk/
https://github.com/openjdk/jdk/
Article link: https://www.cnblogs.com/johnnyzen
Copyright: unless stated otherwise, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting.