How big is a Java object, the Java object memory model, how long can an array be (5): does identityHashCode change?

1 https://stackoverflow.com/questions/7207302/if-javas-garbage-collector-moves-objects-what-is-object-hashcode-and-system-id

 I've often heard that these methods (Object.hashCode and System.identityHashCode) return the address of the object, or something computed quickly from the address; but I'm also pretty sure the garbage collector moves and compacts objects. Since the hash code cannot change, this presents a problem. 

I'm pretty sure most JVM implementations implement garbage compaction, which could move an object in memory

Also see java.sun.com/docs/hotspot/gc1.4.2/faq.html, which explicitly states the young generations are managed by a copying (i.e. also moving) GC

I don't think the "address" used by hashCode() is a physical memory address. In general, the GC can move objects around between young and tenured spaces without the application knowing

In .NET it is "not guaranteed to produce a different value for each object", and "may change between framework versions".

Java's is more well-understood (though presumably could differ across JVMs)

the value of an object's hashcode is not relevant until it is retrieved for the first time. After that, it must remain constant. Thus the GC moving the object doesn't matter until the object's hashcode() method is called for the first time. After that, a cached value is used.

What does JVM do to all the references when it moves an object? Are they all just symbolic references? or does it need to update each reference to the new location?

The identityHashCode does not change for an object. So any moving is done beneath that level.
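The claim above can be checked with a small experiment. This is only a sketch: the class and method names are made up for illustration, and System.gc() is merely a hint, so there is no guarantee the object is actually moved on any given run.

```java
// Sketch: compare the identity hash of one object before and after a GC request.
final class IdentityHashStability {

    /** Returns true if the identity hash of a fresh object is unchanged after requesting a GC. */
    static boolean hashSurvivesGc() {
        Object o = new Object();
        int before = System.identityHashCode(o); // first call: the value is generated here

        // Create garbage so that a collection has a reason to copy or compact live objects.
        for (int i = 0; i < 100_000; i++) {
            byte[] junk = new byte[1024];
        }
        System.gc(); // only a request; the JVM may ignore it

        int after = System.identityHashCode(o);
        return before == after;
    }

    public static void main(String[] args) {
        System.out.println("stable across GC: " + hashSurvivesGc());
    }
}
```

Whether or not the collector relocates the object, the two values should match, because the value is stored with the object rather than recomputed from its current address.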

A rudimentary implementation would have a logical address --> physical address mapping for every object.

 

 

 

2 https://stackoverflow.com/questions/3796699/will-hashcode-return-a-different-int-due-to-compaction-of-tenure-space

If I call the Object.hashcode() method on some object it returns the internal address of the object (default implementation). Is this address a logical or physical address?

During garbage collection, memory compaction shifts objects around in memory. If I call hashcode before and after the GC, will it return the same hashcode (it does), and if so, why (since compaction may change the address)?

@erickson is more or less correct. The hashcode returned by java.lang.Object.hashCode() does not change for the lifetime of the object.

The way this is (typically) implemented is rather clever. When an object is relocated by the garbage collector, its original hashcode has to be stored somewhere in case it is used again. The obvious way to implement this would be to add a 32 bit field to the object header to hold the hashcode. But that would add a 1 word overhead to every object, and would waste space in the most common case ... where an Object's hashCode method is not called.

In fact, an identityHashCode implementation has to behave this way to satisfy the following part of the general hashCode contract:

"Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application."

 Meanwhile if a GC+memory compaction takes place, and a new object (whose hashCode() has not been invoked yet) is allocated the same space as the old one, then wouldn't the hashCode() value be same as that of the active object that initially occupied the memory location? How does this affect object equality and Hash based collections?

yes it will. But that doesn't matter. The identity hashcode is a hashcode ... not a unique identifier

 

No, the default hash code of an object will not change.

The documentation doesn't say that the hash code is the address, it says that it is based on the address. Consider that hash codes are 32 bits, but there are 64-bit JVMs. Clearly, directly using the address wouldn't always work.

The implementation depends on the JVM, but in the Sun (Oracle) JVM, I believe the hash code is cached the first time it's accessed.

actually, the hashcode is cached when the GC relocates an object ... if hashcode() has previously been called

By the contract of hashCode it cannot change for such a reason.

if the hashcode changes, the object will disappear in a hash set which it was inserted into, and Sun will be flooded with complaints.
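A minimal sketch of what "disappearing" means. MutableKey is a hypothetical class whose hashCode depends on a mutable field, which imitates a hash code that changes while the object sits in a set; the default Object.hashCode never behaves like this.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key type: its hash code follows a mutable field.
final class MutableKey {
    int id;

    MutableKey(int id) { this.id = id; }

    @Override
    public boolean equals(Object other) {
        return other instanceof MutableKey && ((MutableKey) other).id == id;
    }

    @Override
    public int hashCode() { return Integer.hashCode(id); }
}

final class LostInHashSet {

    /** Inserts a key, changes its hash code, then asks the set whether it still contains the key. */
    static boolean foundAfterMutation() {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey(1);
        set.add(key);   // stored in the bucket chosen from hash code 1
        key.id = 2;     // the hash code is now 2, but the entry stays in the old bucket
        return set.contains(key); // looks in the bucket for hash code 2
    }

    public static void main(String[] args) {
        System.out.println("still found: " + foundAfterMutation());
    }
}
```

The set still holds the entry (its size is 1), but the lookup probes a different bucket and misses it. That is the failure mode a changing identity hash code would cause for every object that relies on the default hashCode.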

 

 

3 https://stackoverflow.com/questions/1063068/how-does-the-jvm-ensure-that-system-identityhashcode-will-never-change

 why does the value returned by System.identityHashCode() never change during the object's lifetime?

Related question: Is that memory address a real memory address or something virtual that can stay fixed even as the object gets shuffled about?

Actually, where does it say that the identityHashCode must never change? The JavaDoc for System.identityHashCode is not clear on that.

it follows from the specification of hashCode and equals in Object.

Okay, got it: "Whenever (hashCode) is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified." And equals in this case is object identity comparison.

Modern JVMs save the value in the object header. 

Multiple objects can have the same identity hash code. That is the nature of hash codes.

Right - I've just looked thru ObjectSynchronizer::FastHashCode in synchronizer.cpp (vm runtime source code) and after generating the hashcode, it looks like it merges it into the object header.

In answer to the second question, irrespective of the implementation, it is possible for multiple objects to have the same identityHashCode.

See bug 6321873 for a brief discussion on the wording in the javadoc, and a program to demonstrate non-uniqueness.
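The following is not the program from that bug report, just a rough sketch of the same idea with made-up names: keep many objects alive and count how many share an identity hash code with an earlier object. How many collisions show up depends on the JVM, the hash width, and the number of objects, so no particular count is promised.

```java
import java.util.HashMap;
import java.util.Map;

final class IdentityHashCollisions {

    /** Allocates n live objects and counts those whose identity hash was already seen. */
    static int countCollisions(int n) {
        Object[] live = new Object[n];            // keep every object reachable
        Map<Integer, Object> seen = new HashMap<>();
        int collisions = 0;
        for (int i = 0; i < n; i++) {
            live[i] = new Object();
            int hash = System.identityHashCode(live[i]);
            // putIfAbsent returns the earlier object if this hash is already taken.
            Object previous = seen.putIfAbsent(hash, live[i]);
            if (previous != null) {
                collisions++;                     // two distinct live objects, same hash
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        int n = 200_000; // arbitrary sample size for the experiment
        System.out.println("collisions among " + n + " objects: " + countCollisions(n));
    }
}
```

Any non-zero result shows two distinct, simultaneously live objects with the same identity hash code, which is allowed by the contract.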

The header of an object in HotSpot consists of a class pointer and a "mark" word.

The source code of the data structure for the mark word can be found in the markOop.hpp file. In this file there is a comment describing the memory layout of the mark word:

hash:25 ------------>| age:4 biased_lock:1 lock:2 (normal object)

Here we can see that the identity hash code for normal Java objects on a 32 bit system is saved in the mark word and is 25 bits long.
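To make the layout concrete, here is a sketch that packs and unpacks a 32-bit word using the field widths from the comment above (hash:25, age:4, biased_lock:1, lock:2, with the hash in the high bits). MarkWord32 and its methods are hypothetical helpers; this is not HotSpot code and cannot read a real object header.

```java
final class MarkWord32 {
    // Field widths taken from the quoted layout comment.
    static final int LOCK_BITS = 2;
    static final int BIASED_BITS = 1;
    static final int AGE_BITS = 4;
    static final int HASH_BITS = 25;

    static final int BIASED_SHIFT = LOCK_BITS;              // bit 2
    static final int AGE_SHIFT = LOCK_BITS + BIASED_BITS;   // bits 3..6
    static final int HASH_SHIFT = AGE_SHIFT + AGE_BITS;     // bits 7..31
    static final int HASH_MASK = (1 << HASH_BITS) - 1;
    static final int AGE_MASK = (1 << AGE_BITS) - 1;
    static final int LOCK_MASK = (1 << LOCK_BITS) - 1;

    /** Packs the four fields into one 32-bit word; out-of-range inputs are truncated. */
    static int pack(int hash, int age, int biased, int lock) {
        return ((hash & HASH_MASK) << HASH_SHIFT)
             | ((age & AGE_MASK) << AGE_SHIFT)
             | ((biased & 1) << BIASED_SHIFT)
             | (lock & LOCK_MASK);
    }

    static int hash(int mark) { return (mark >>> HASH_SHIFT) & HASH_MASK; }
    static int age(int mark)  { return (mark >>> AGE_SHIFT) & AGE_MASK; }
    static int lock(int mark) { return mark & LOCK_MASK; }

    public static void main(String[] args) {
        int mark = pack(0x1ABCDEF, 5, 0, 1); // illustrative values only
        System.out.printf("hash=%x age=%d lock=%d%n", hash(mark), age(mark), lock(mark));
    }
}
```

With only 25 bits available, at most 2^25 distinct hash values exist under this layout, which is one more reason the identity hash code cannot be a unique identifier.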

The general guideline for implementing a hashing function is :

  • the same object should return a consistent hashCode; it should not change with time or depend on any variable information (e.g. an algorithm seeded by a random number, or the values of mutable member fields)
  • the hash function should have a good random distribution, and by that I mean if you consider the hashcode as buckets, 2 objects should map to different buckets (hashcodes) as far as possible. The possibility that 2 objects would have the same hashcode should be rare - although it can happen.

 

 

 

4 https://stackoverflow.com/questions/4930781/how-do-hashcode-and-identityhashcode-work-at-the-back-end

The integer returned by identityHashCode may be related to the (a) machine address for the object, or it may not be. The value returned by identityHashCode() is guaranteed not to change for the lifetime of the object. This means that if the GC relocates an object (after an identityHashCode() call) then it cannot use the new object address as the identity hashcode.

  • The identityHashCode(Object) method gives you an identifier for an object which can (in theory) be used for other things than hashing and hash tables. (Unfortunately, it is not a unique identifier, but it is guaranteed to never change for the lifetime of the object.)

For current generation JVMs, it is not related to the memory address at all. See @bestsss's answer.

 

identityHashCode() works like this (and as of now it has nothing to do with the address, especially since addresses are 64 bits long, or 61 bits once alignment is accounted for):

Checks if there is already generated one, if so returns it. You can assume there is a place in the object header for that int;

Otherwise: Generates a random number (Marsaglia shift-xor algorithm). Every native thread has its own seed, so no shared info. CAS the identityHashCode field in the object header to update w/ the newly generated number. If CAS succeeds, returns the value. If not, the field already contains a generated identityHashCode.
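The two steps above can be sketched in plain Java. Everything here is an illustrative assumption rather than HotSpot's implementation: Header stands in for the hash slot of an object header, 0 is used as the "not generated yet" marker, the seed constants are arbitrary, and the xor-shift generator is a generic Marsaglia-style variant kept per thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LazyIdentityHash {

    // Hypothetical stand-in for the hash slot in an object header; 0 means "unset".
    static final class Header {
        final AtomicInteger hash = new AtomicInteger(0);
    }

    // Per-thread generator state, so threads share nothing while generating.
    private static final ThreadLocal<int[]> STATE = ThreadLocal.withInitial(() -> {
        int seed = ((int) System.nanoTime()) | 1;          // arbitrary non-zero seed
        return new int[] { seed, 842502087, 0x8767, 273326509 };
    });

    // Marsaglia-style xor-shift step over four 32-bit words of state.
    private static int nextRandom() {
        int[] s = STATE.get();
        int t = s[0];
        t ^= t << 11;
        s[0] = s[1];
        s[1] = s[2];
        s[2] = s[3];
        int v = s[3];
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        s[3] = v;
        return v;
    }

    static int identityHash(Header h) {
        int existing = h.hash.get();
        if (existing != 0) {
            return existing;                    // step 1: already generated, reuse it
        }
        int candidate = nextRandom() & 0x7FFFFFFF;
        if (candidate == 0) {
            candidate = 1;                      // 0 is reserved for "unset" in this sketch
        }
        if (h.hash.compareAndSet(0, candidate)) {
            return candidate;                   // step 2: our CAS installed the value
        }
        return h.hash.get();                    // CAS lost: another thread installed one first
    }

    public static void main(String[] args) {
        Header h = new Header();
        System.out.println(identityHash(h) == identityHash(h)); // same value on every call
    }
}
```

Because the value is written once and only read afterwards, it does not matter where the object lives in memory or how often the GC moves it.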

You can see the rest of the replies about overriding hashcode.

Bottom line: If the JavaDoc still states anything about addresses and identityHashCode, someone needs to update it.

There are 5 alternatives of hash code calculation in the code. Looking at this one can see that option 5, Marsaglia shift-xor, is really currently used. Hashcode generation has been elaborated more in this blog post 

– eis
The idea that hashCode uses the memory address is a historical artefact: stackoverflow.com/questions/36236615/… (comment, Mar 26, 2016 at 17:28)
identityHashCode does not return a memory address.
 
 
 
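In summary:
1 A hashcode does not have to be unique.
2 A hashcode must stay the same for the object's whole lifetime. This is written into the Java hashCode contract, and in practice a changing hashcode would mean that hash-based containers break.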
 
 

posted on 2024-06-17 17:43 by silyvin