Java: JEP Café #17 Comparators
Link: https://inside.java/2023/02/21/jepcafe17/
Comparators are elements used daily in all Java applications. They are fairly easy to write, but must also follow several subtle rules.
This JEP Café explains all of them: how to leverage the factory methods from the Comparator interface, and from the wrapper classes of primitive types. We also cover even more subtle and unexpected bugs you may come across with HashSet and HashMap if your comparators are not consistent.
Comparator
@FunctionalInterface
public interface Comparator<T> {
int compare(T o1, T o2);
}
- For elements that implement the Comparable interface, sort by natural ordering: List.sort(null).
- Sort with an explicitly supplied Comparator implementation.
- comparing, thenComparing, reversed
- thenComparingInt, thenComparingLong, thenComparingDouble
- naturalOrder, nullsFirst, nullsLast
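The factory methods listed above can be chained. The following is a minimal sketch, assuming a hypothetical Person record that does not appear in the episode; it only illustrates how comparing, thenComparing, thenComparingInt, reversed, nullsFirst and naturalOrder fit together.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ComparatorFactories {
    // Hypothetical record, used only for this illustration.
    record Person(String lastName, String firstName, int age) {}

    // Last name, then first name, then age (primitive-specialized, so no boxing).
    static final Comparator<Person> BY_NAME_THEN_AGE =
            Comparator.comparing(Person::lastName)
                      .thenComparing(Person::firstName)
                      .thenComparingInt(Person::age);

    // Null strings sort first, everything else in natural order.
    static final Comparator<String> NULLS_FIRST =
            Comparator.nullsFirst(Comparator.naturalOrder());

    // Returns a sorted copy and leaves the input list untouched.
    static List<Person> sortedCopy(List<Person> people, Comparator<Person> order) {
        var copy = new ArrayList<>(people);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        var people = List.of(
                new Person("Smith", "Bob", 40),
                new Person("Jones", "Ann", 25),
                new Person("Smith", "Al", 30));
        System.out.println(sortedCopy(people, BY_NAME_THEN_AGE));
        System.out.println(sortedCopy(people, BY_NAME_THEN_AGE.reversed()));
    }
}
```

Using method references rather than lambdas in the chain keeps type inference working without explicit type arguments.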
Example: Comparison method violates its general contract!
funky computation of a difference
var rand = new Random(209);
var ints = IntStream.range(0, 32)
.mapToObj(index -> rand.nextInt())
.toList();
var sorted = ints.stream()
.sorted((i1, i2) -> i1 - i2)
.toList();
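The subtraction here can overflow the int range (maximum 2,147,483,647), and the same difference trick on floating-point values risks IEEE 754 precision loss; either way the compare contract is violated.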
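To make the overflow concrete, here is a small sketch comparing the subtraction trick with Integer.compare. The class and method names are made up for this illustration.

```java
final class SafeIntCompare {
    // Broken: the subtraction wraps around when the operands are far apart.
    static int bySubtraction(int a, int b) {
        return a - b;
    }

    // Safe: Integer.compare only uses < and ==, so it cannot overflow.
    static int byFactory(int a, int b) {
        return Integer.compare(a, b);
    }

    public static void main(String[] args) {
        int small = Integer.MIN_VALUE;
        int big = 1;
        // MIN_VALUE - 1 wraps to MAX_VALUE, so the smaller value is reported as greater.
        System.out.println(bySubtraction(small, big) > 0); // true
        System.out.println(byFactory(small, big) < 0);     // true
    }
}
```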
sneak boxing or unboxing
var rand = new Random(2664);
var ints = IntStream.range(0, 32)
.mapToObj(index -> rand.nextInt(1000, 1100))
.toList();
var sorted = ints.stream()
.sorted((i1, i2) -> i1 < i2 ? -1 : i1 == i2 ? 0 : 1)
.toList();
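The carefully chosen seed exposes the problem. The lambda parameters are really (Integer i1, Integer i2): i1 < i2 unboxes both values, but i1 == i2 compares references, so the difference between == and equals comes into play (the values 1000 to 1099 lie outside the default Integer cache of -128 to 127). Once again the compare contract is violated.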
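One way to avoid the hidden reference comparison is to let a wrapper-class factory method do the work. The sketch below assumes the same kind of boxed Integer list as above; the class name and helper are hypothetical.

```java
import java.util.Comparator;
import java.util.List;

final class BoxedCompare {
    // Integer.compare takes two ints, so both arguments are unboxed and compared by value.
    static final Comparator<Integer> BY_VALUE = Integer::compare;

    static List<Integer> sortedCopy(List<Integer> values) {
        return values.stream().sorted(BY_VALUE).toList();
    }

    public static void main(String[] args) {
        Integer a = 1050;
        Integer b = 1050;
        // Reference comparison: the result depends on whether both values come from the Integer cache.
        System.out.println(a == b);
        System.out.println(a.equals(b));            // true
        System.out.println(BY_VALUE.compare(a, b)); // 0
    }
}
```

Comparator.naturalOrder() would work equally well here, since Integer is Comparable.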
Example: Always override hashCode when you override equals
A strange hashCode implementation
public class Point implements Comparable<Point> {
private int x;
public Point(int x) {
this.x = x;
}
@Override
public boolean equals(Object o) {
return o instanceof Point p && x == p.x;
}
@Override
public int hashCode() {
return 0;
}
@Override
public int compareTo(Point other) {
return Integer.compare(this.x, other.x);
}
}
The following example looks fine.
var points = IntStream.range(0, 10).mapToObj(Point::new).toList();
Set<Point> set = new HashSet<>();
set.addAll(points);
var p5 = points.get(5);
System.out.println(set.contains(p5)); // true
p5.x *= 10;
System.out.println(set.contains(p5)); // true
The following example, however, behaves unexpectedly.
var points = IntStream.range(0, 20).mapToObj(Point::new).toList();
Set<Point> set = new HashSet<>();
set.addAll(points);
var p5 = points.get(5);
System.out.println(set.contains(p5)); // true
p5.x *= 10;
System.out.println(set.contains(p5)); // false
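First, recall that a HashSet is backed by a HashMap. The contains method looks at hashCode first. If the bucket is a linked list, its entries are checked one by one; if the bucket is a tree, the node lookup also looks at hashCode first and then falls back to compareTo.
In this example hashCode is always the same. With few elements the bucket stays a linked list, and walking it entry by entry (with == and equals) still finds the element. Once the bucket has been converted to a tree, the lookup follows compareTo and visits only part of the nodes, so unexpected things happen: both wrong hits and misses are possible.
Note: in this example the numbers 10 and 5 are both carefully chosen.
Lesson learned: Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.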
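A common way to stay within this rule is to make the element type immutable. The following sketch reworks Point as a record, which is an assumption of this note rather than code from the episode; the record derives equals and hashCode from its final component.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.IntStream;

final class ImmutablePointDemo {
    // A record has a final field plus equals and hashCode derived from it.
    record Point(int x) implements Comparable<Point> {
        @Override
        public int compareTo(Point other) {
            return Integer.compare(x, other.x);
        }
    }

    static Set<Point> pointsUpTo(int count) {
        var set = new HashSet<Point>();
        IntStream.range(0, count).mapToObj(Point::new).forEach(set::add);
        return set;
    }

    public static void main(String[] args) {
        var set = pointsUpTo(20);
        var p5 = new Point(5);
        System.out.println(set.contains(p5));  // true
        // "Changing" x means creating a new value; the element stored in the set is untouched.
        var p50 = new Point(p5.x() * 10);
        System.out.println(set.contains(p50)); // false
        System.out.println(set.contains(p5));  // true
    }
}
```

Because the stored element can no longer change, the answers from contains stay stable whether the bucket is a list or a tree.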
Example: Comparator is consistent with equals
The starting point is a puzzle by Tagir Valeev.
// Replace '...' in this code with proper Java expressions so that the first println prints 'true'
// and the second one prints 'false' when running on Java 16.
// This is a fair puzzle:
// - No reflection;
// - No hacking the output stream;
// - No unchecked code (e. g., List<StringBuilder> contains StringBuilder objects only);
// - No hidden replacement of library classes (List is standard java.util.List, Set is java.util.Set, etc.).
public class StringBuilderInHashMap {
public static void main(String... args) {
List<StringBuilder> list = ...;
StringBuilder sb = ...;
Set<StringBuilder> set = new HashSet<>( list );
set.add( sb );
out.println( set.contains( sb ) );
sb.append( "oops" );
out.println( set.contains( sb ) );
}
}
As in the previous example, the natural ordering of StringBuilder is inconsistent with equals.
StringBuilder implements Comparable but does not override equals. Thus, the natural ordering of StringBuilder is inconsistent with equals. Care should be exercised if StringBuilder objects are used as keys in a SortedMap or elements in a SortedSet. See Comparable, SortedMap, or SortedSet for more information.
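The inconsistency described in that Javadoc note can be observed directly. This is a small sketch with made-up helper names; it relies only on StringBuilder.compareTo (available since Java 11) and the equals inherited from Object.

```java
final class StringBuilderOrdering {
    // compareTo looks at the character content.
    static boolean sameByCompareTo(StringBuilder a, StringBuilder b) {
        return a.compareTo(b) == 0;
    }

    // StringBuilder inherits equals from Object, so this is reference identity.
    static boolean sameByEquals(StringBuilder a, StringBuilder b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        var first = new StringBuilder("oops");
        var second = new StringBuilder("oops");
        System.out.println(sameByCompareTo(first, second)); // true
        System.out.println(sameByEquals(first, second));    // false
    }
}
```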
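Solution: create many StringBuilder instances so that one bucket of the HashSet is treeified, then search by brute force for another StringBuilder whose hashCode collides. (For a set of 11 elements, one particular algorithm needs roughly 252,735,000 attempts.)
Conclusion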
- Use factory methods from the Comparator
- Use factory methods from the wrapper classes of primitive types
- Use immutable objects.
References
- 张哈希 - BV1gL411C7nJ
- Comparison Method Violates its General Contract! - <https://inside.java/2017/11/08/comparison/>
- The Importance of Writing Stuff Down - <https://stuartmarks.wordpress.com/2023/02/22/the-importance-of-writing-stuff-down/>
- Anyway, sorry about that José, that's why we won't be adding a no-arg List.sort() overload.
- https://www.cs.usfca.edu/~galles/visualization/RedBlack.html
- Effective Java, 3rd edition, Chapter 3: Methods Common to All Objects
Appendix
The general contract of equals
: equivalence relation
- Reflexive: for any non-null reference value x, x.equals(x) should return true.
- Symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
- Transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
- Consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
- Non-null: for any non-null reference value x, x.equals(null) should return false.
So what's the solution? It turns out that this is a fundamental problem of equivalence relations in object-oriented languages. There is no way to extend an instantiable class and add a value component while preserving the equals contract, unless you're willing to forgo the benefits of object-oriented abstraction.
-- Effective Java 3rd, Item 10
The general contract of hashCode
- Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
- If two objects are equal according to the equals method, then calling the hashCode method on each of the two objects must produce the same integer result.
- It is not required that if two objects are unequal according to the equals method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
The general contract of compare (the same rules apply to compareTo)
- The implementor must ensure that signum(compare(x, y)) == -signum(compare(y, x)) for all x and y. (This implies that compare(x, y) must throw an exception if and only if compare(y, x) throws an exception.)
- The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
- Finally, the implementor must ensure that compare(x, y)==0 implies that signum(compare(x, z))==signum(compare(y, z)) for all z.
- It is generally the case, but not strictly required that (compare(x, y)==0) == (x.equals(y)). Generally speaking, any comparator that violates this condition should clearly indicate this fact. The recommended language is "Note: this comparator imposes orderings that are inconsistent with equals."
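The first two rules can be spot-checked on a handful of sample values. The helper below is a rough sketch, not a proof of correctness: the class name, the sample list and both comparators are assumptions chosen for this illustration.

```java
import java.util.Comparator;
import java.util.List;

final class ComparatorContractCheck {
    // Sample values chosen to provoke int overflow in a subtraction-based comparator.
    static final List<Integer> SAMPLES =
            List.of(Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE);

    static final Comparator<Integer> SUBTRACTION = (a, b) -> a - b;
    static final Comparator<Integer> SAFE = Integer::compare;

    // Checks signum(compare(x, y)) == -signum(compare(y, x)) on every pair of samples.
    static <T> boolean isAntisymmetric(Comparator<T> cmp, List<T> samples) {
        for (T x : samples) {
            for (T y : samples) {
                if (Integer.signum(cmp.compare(x, y)) != -Integer.signum(cmp.compare(y, x))) {
                    return false;
                }
            }
        }
        return true;
    }

    // Checks that compare(x, y) > 0 and compare(y, z) > 0 imply compare(x, z) > 0.
    static <T> boolean isTransitive(Comparator<T> cmp, List<T> samples) {
        for (T x : samples) {
            for (T y : samples) {
                for (T z : samples) {
                    if (cmp.compare(x, y) > 0 && cmp.compare(y, z) > 0 && cmp.compare(x, z) <= 0) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAntisymmetric(SUBTRACTION, SAMPLES)); // false
        System.out.println(isTransitive(SUBTRACTION, SAMPLES));    // false
        System.out.println(isAntisymmetric(SAFE, SAMPLES));        // true
        System.out.println(isTransitive(SAFE, SAMPLES));           // true
    }
}
```

A passing check on a few samples does not establish that a comparator is correct; a failing check does show that it is broken.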
Comparator is consistent with equals
For the mathematically inclined, the relation that defines the natural ordering on a given class C is:
{(x, y) such that x.compareTo(y) <= 0}.
The quotient for this total order is:
{(x, y) such that x.compareTo(y) == 0}.
It follows immediately from the contract for compareTo that the quotient is an equivalence relation on C, and that the natural ordering is a total order on C. When we say that a class's natural ordering is consistent with equals, we mean that the quotient for the natural ordering is the equivalence relation defined by the class's equals(Object) method:
{(x, y) such that x.equals(y)}.
In other words, when a class's natural ordering is consistent with equals, the equivalence classes defined by the equivalence relation of the equals method and the equivalence classes defined by the quotient of the compareTo method are the same.
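-- Javadoc, Comparable & Comparator
Note: the confusion among natural ordering, imposed ordering, and total order in this documentation is already visible in JDK-6258108.
Note 2: for everyday CS purposes there is no need to dig too deep; see <java-comparator-documentation-confused-about-the-terminology-total-order>.
Note 3: related terms are equivalence relation, preorder, partial order, linear order.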
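A well-known case of a natural ordering that is inconsistent with equals is BigDecimal, where equals takes the scale into account and compareTo does not. The sketch below uses hypothetical helper names and shows how a HashSet (equals and hashCode) and a TreeSet (compareTo) disagree on the same values.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

final class ConsistentWithEquals {
    // HashSet decides duplicates with hashCode and equals.
    static int hashSetSize(List<BigDecimal> values) {
        return new HashSet<>(values).size();
    }

    // TreeSet decides duplicates with compareTo.
    static int treeSetSize(List<BigDecimal> values) {
        return new TreeSet<>(values).size();
    }

    public static void main(String[] args) {
        var values = List.of(new BigDecimal("1.0"), new BigDecimal("1.00"));
        System.out.println(values.get(0).equals(values.get(1)));    // false
        System.out.println(values.get(0).compareTo(values.get(1))); // 0
        System.out.println(hashSetSize(values)); // 2
        System.out.println(treeSetSize(values)); // 1
    }
}
```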