今天有同事提议用String的hashcode得到int类型作为主键。其实hashcode重复的可能性超大,下面是java的缺省算法:
但是什么情况下会重复?下面是测试代码
import java.util.HashMap;
public class Test {
static HashMap map = new HashMap();
private static char startChar = 'A';
private static char endChar = 'z';
private static int offset = endChar - startChar + 1;
private static int dup = 0;
public static void main(String[] args) {
int len = 3;
char[] chars = new char[len];
tryBit(chars, len);
System.out.println((int)Math.pow(offset, len) + ":" + dup);
}
private static void tryBit(char[] chars, int i) {
for (char j = startChar; j <= endChar; j++) {
chars[i - 1] = j;
if (i > 1)
tryBit(chars, i - 1);
else
test(chars);
}
}
private static void test(char[] chars) {
String str = new String(chars).replaceAll("[^a-zA-Z_]", "").toUpperCase();// 195112:0
//String str = new String(chars).toLowerCase();//195112:6612
//String str = new String(chars).replaceAll("[^a-zA-Z_]","");//195112:122500
//String str = new String(chars);//195112:138510
int hash = str.hashCode();
if (map.containsKey(hash)) {
String s = (String) map.get(hash);
if (!s.equals(str)) {
dup++;
System.out.println(s + ":" + str);
}
} else {
map.put(hash, str);
// System.out.println(str);
}
}
}
在A-z范围内有特殊字符,从结果看,仅仅3位长度的字符串:
不处理: 138510次重复
去掉字母意外字符: 122500次重复
所有字符转小写:6612次重复(少了很多)
去掉字母意外字符,并且转小写:没有重复!4位字符串也没见重复
不难看出:
1. 缺省实现为英文字母优化
2. 字母大小写可能导致重复
可能:
长字符串可能hashcode重复
中文字符串和特殊字符可能hashcode重复
public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}
import java.util.HashMap;
public class Test {
static HashMap map = new HashMap();
private static char startChar = 'A';
private static char endChar = 'z';
private static int offset = endChar - startChar + 1;
private static int dup = 0;
public static void main(String[] args) {
int len = 3;
char[] chars = new char[len];
tryBit(chars, len);
System.out.println((int)Math.pow(offset, len) + ":" + dup);
}
private static void tryBit(char[] chars, int i) {
for (char j = startChar; j <= endChar; j++) {
chars[i - 1] = j;
if (i > 1)
tryBit(chars, i - 1);
else
test(chars);
}
}
private static void test(char[] chars) {
String str = new String(chars).replaceAll("[^a-zA-Z_]", "").toUpperCase();// 195112:0
//String str = new String(chars).toLowerCase();//195112:6612
//String str = new String(chars).replaceAll("[^a-zA-Z_]","");//195112:122500
//String str = new String(chars);//195112:138510
int hash = str.hashCode();
if (map.containsKey(hash)) {
String s = (String) map.get(hash);
if (!s.equals(str)) {
dup++;
System.out.println(s + ":" + str);
}
} else {
map.put(hash, str);
// System.out.println(str);
}
}
}
在A-z范围内有特殊字符,从结果看,仅仅3位长度的字符串:
不处理: 138510次重复
去掉字母意外字符: 122500次重复
所有字符转小写:6612次重复(少了很多)
去掉字母意外字符,并且转小写:没有重复!4位字符串也没见重复
不难看出:
1. 缺省实现为英文字母优化
2. 字母大小写可能导致重复
可能:
长字符串可能hashcode重复
中文字符串和特殊字符可能hashcode重复