JVM source analysis: the Java object header
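Anyone who has studied Java seriously knows that a Java object consists of three parts: the object header, the instance data, and alignment padding. The instance data is simply the fields of the object. Alignment padding exists so that memory is allocated on regular boundaries: by default the size of a Java object is always a multiple of 8 bytes, and since that cannot be left to the programmer, the JVM pads any object whose size is not a multiple of 8 bytes up to the next one. The object header records the identity hash, the GC age, the lock state (a biased lock also records the owning thread), the GC mark state and so on, and it also holds the object's class pointer, which makes it the core of the core. If you are interested, I have written an introduction to objects here: https://www.cnblogs.com/gmt-hao/p/13817564.html. Next we will dissect how the object header is implemented at the JVM level. As usual, code first.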
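The padding rule can be written down as a few lines of arithmetic. This is only a sketch: the class and method names are made up for illustration, and the 8-byte alignment is the HotSpot default (it can be changed with -XX:ObjectAlignmentInBytes). The example assumes a 12-byte header, that is, an 8-byte mark word plus a 4-byte compressed class pointer, and no fields.

```java
public class ObjectAlignment {
    // Assumed default object alignment in bytes (HotSpot default; hypothetical constant name).
    static final int ALIGNMENT = 8;

    /** Rounds a raw object size up to the next multiple of ALIGNMENT. */
    static long alignUp(long rawSize) {
        return (rawSize + ALIGNMENT - 1) & ~(long) (ALIGNMENT - 1);
    }

    /** Number of padding bytes appended after the header and the fields. */
    static long padding(long rawSize) {
        return alignUp(rawSize) - rawSize;
    }

    public static void main(String[] args) {
        long raw = 8 + 4; // mark word + compressed class pointer, no instance fields
        // prints: raw=12 aligned=16 padding=4
        System.out.println("raw=" + raw + " aligned=" + alignUp(raw) + " padding=" + padding(raw));
    }
}
```

An object whose raw size is already a multiple of 8 gets no padding at all.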
Java is an object-oriented language, and the name HotSpot gives to its root object type is fittingly representative: oop. Let's open oop.hpp:
// oopDesc is the top baseclass for objects classes. The {name}Desc classes describe
// the format of Java objects so the fields can be accessed from C++.
// oopDesc is abstract.
// (see oopHierarchy for complete oop class hierarchy)
//
// no virtual functions allowed
... (omitted)
class oopDesc {
  friend class VMStructs;
 private:
  volatile markOop _mark;
  union _metadata {
    Klass*      _klass;
    narrowKlass _compressed_klass;
  } _metadata;
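Start with the comment: oopDesc is the top-level base class of all object classes. As I read the next sentence, the {name}Desc classes describe the layout of Java objects in terms of C++ fields, so that those fields can be accessed from C++. Now the fields themselves: _mark is the mark word, and _metadata has two members, _klass and _compressed_klass. The former is an ordinary pointer and the latter a compressed one. Compressed class pointers are on by default in 64-bit JDK 8 (-XX:+UseCompressedClassPointers); in JDK 8, turning compressed oops off with -XX:-UseCompressedOops disables them as well. I won't go into detail here; just remember that either way it is the class pointer and it points at the concrete Klass. Let's look at the comment on Klass first: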
// A Klass provides:
// 1: language level class object (method dictionary etc.)
// 2: provide vm dispatch behavior for the object
// Both functions are combined into one C++ class.
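In other words, Klass provides both the language-level class object (method dictionary and so on) and the VM dispatch behavior for the object, and the two roles are combined in a single C++ class.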
// One reason for the oop/klass dichotomy in the implementation is
// that we don't want a C++ vtbl pointer in every object. Thus,
// normal oops don't have any virtual functions. Instead, they
// forward all "virtual" functions to their klass, which does have
// a vtbl and does the C++ dispatch depending on the object's
// actual type. (See oop.inline.hpp for some of the forwarding code.)
// ALL FUNCTIONS IMPLEMENTING THIS DISPATCH ARE PREFIXED WITH "oop_"!
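This passage explains why klass and the object itself are implemented as two separate parts: the authors do not want a C++ vtable pointer stored in every object. Ordinary oops therefore have no virtual functions; they forward every "virtual" call to their klass, which does have a vtable and performs the C++ dispatch according to the object's actual type.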
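Now it clicks for me: this is polymorphism, and this is how it is played inside the VM. The object itself does not know which concrete function will run; at run time the call goes through the klass, which finds the implementation for the actual type.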
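To make the oop/klass split concrete, here is a toy model in Java. It is a sketch of the idea only, not HotSpot code: Oop, Klass, oopSize and the size numbers are all hypothetical stand-ins. The object is plain data plus a klass reference with nothing overridable, and the type-specific behavior lives in one klass object per type.

```java
public class OopKlassSketch {
    /** Hypothetical stand-in for Klass: one instance per type, owns the type-specific behavior. */
    interface Klass {
        String name();
        int oopSize(Oop obj); // plays the role of an "oop_"-prefixed dispatch function
    }

    /** Hypothetical stand-in for oopDesc: data plus a klass reference, no overridable methods. */
    static final class Oop {
        long mark;         // the mark word
        final Klass klass; // the class pointer
        final int length;  // only meaningful for array-like objects

        Oop(Klass klass, int length) {
            this.klass = klass;
            this.length = length;
        }

        // Not virtual: it just forwards to the klass, which dispatches on the actual type.
        int size() {
            return klass.oopSize(this);
        }
    }

    static final Klass INSTANCE_KLASS = new Klass() {
        public String name() { return "instance"; }
        public int oopSize(Oop obj) { return 16; } // toy value: fixed size per class
    };

    static final Klass INT_ARRAY_KLASS = new Klass() {
        public String name() { return "int[]"; }
        public int oopSize(Oop obj) { return 16 + 4 * obj.length; } // toy value: depends on length
    };

    public static void main(String[] args) {
        Oop plain = new Oop(INSTANCE_KLASS, 0);
        Oop array = new Oop(INT_ARRAY_KLASS, 4);
        System.out.println(plain.klass.name() + " -> " + plain.size());
        System.out.println(array.klass.name() + " -> " + array.size());
    }
}
```

Every Oop has the same shape whatever its type, which is the point of the design: the per-object cost is one class pointer rather than a vtable pointer on top of it.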
Now let's look at InstanceKlass, the Klass subclass that represents an ordinary loaded class:
class InstanceKlass: public Klass {
  friend class VMStructs;
  friend class ClassFileParser;
  friend class CompileReplay;
 protected:
  // Constructor
  InstanceKlass(int vtable_len,             // length of the virtual method table
                int itable_len,             // length of the interface method table
                int static_field_size,      // words used by static fields
                int nonstatic_oop_map_size, // size in words of the nonstatic oop map blocks
                ReferenceType rt,           // reference type
                AccessFlags access_flags,   // access flags of this class (public, private, ...)
                bool is_anonymous);         // whether the class is anonymous
  ... (omitted)
  // See "The Java Virtual Machine Specification" section 2.16.2-5 for a detailed description
  // of the class loading & initialization procedure, and the use of the states.
  enum ClassState {
    allocated,            // allocated (but not yet linked)
    loaded,               // loaded and inserted in class hierarchy (but not linked yet)
    linked,               // successfully linked/verified (but not initialized yet)
    being_initialized,    // currently running class initializer
    fully_initialized,    // initialized (successfull final state)
    initialization_error  // error happened during initialization
  };

 protected:
  // Annotations for this class
  Annotations* _annotations;
  // Array classes holding elements of this class.
  Klass* _array_klasses;
  // Constant pool for this class.
  ConstantPool* _constants;
  // The InnerClasses attribute and EnclosingMethod attribute. The
  // _inner_classes is an array of shorts. If the class has InnerClasses
  // attribute, then the _inner_classes array begins with 4-tuples of shorts
  // [inner_class_info_index, outer_class_info_index,
  // inner_name_index, inner_class_access_flags] for the InnerClasses
  // attribute. If the EnclosingMethod attribute exists, it occupies the
  // last two shorts [class_index, method_index] of the array. If only
  // the InnerClasses attribute exists, the _inner_classes array length is
  // number_of_inner_classes * 4. If the class has both InnerClasses
  // and EnclosingMethod attributes the _inner_classes array length is
  // number_of_inner_classes * 4 + enclosing_method_attribute_size.
  Array<jushort>* _inner_classes;

  // the source debug extension for this klass, NULL if not specified.
  // Specified as UTF-8 string without terminating zero byte in the classfile,
  // it is stored in the instanceklass as a NULL-terminated UTF-8 string
  char* _source_debug_extension;
  // Array name derived from this class which needs unreferencing
  // if this class is unloaded.
  Symbol* _array_name;

  // Number of heapOopSize words used by non-static fields in this klass
  // (including inherited fields but after header_size()).
  int _nonstatic_field_size;
  int _static_field_size;       // number words used by static fields (oop and non-oop) in this klass
  // Constant pool index to the utf8 entry of the Generic signature,
  // or 0 if none.
  u2 _generic_signature_index;
  // Constant pool index to the utf8 entry for the name of source file
  // containing this klass, 0 if not specified.
  u2 _source_file_name_index;
  u2 _static_oop_field_count;   // number of static oop fields in this klass
  u2 _java_fields_count;        // The number of declared Java fields
  int _nonstatic_oop_map_size;  // size in words of nonstatic oop map blocks

  // _is_marked_dependent can be set concurrently, thus cannot be part of the
  // _misc_flags.
  bool _is_marked_dependent;    // used for marking during flushing and deoptimization
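The constructor takes basic information such as the vtable length and the reference type. Further down, the fields add the class's annotations, the constant pool of this class, its inner classes, and so on.
With klass covered, let's turn to today's main event, the mark word. As before, we start with the comments the HotSpot authors left in the source: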
// The markOop describes the header of an object.
That is, in HotSpot's own terms the markOop is what describes an object's header.
//
// Note that the mark is not a real oop but just a word.
// It is placed in the oop hierarchy for historical reasons.
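Note that the mark is just a word (32 bits, i.e. 4 bytes, on a 32-bit VM; 64 bits, i.e. 8 bytes, on a 64-bit VM), not a real object. For historical reasons it was left in the oop hierarchy.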
//
// Bit-format of an object header (most significant first, big endian layout below):
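// (my note: the diagrams below are written with the most significant bits first, so the lock bits are the lowest bits of the word)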
// 32 bits:
// --------
// hash:25 ------------>| age:4 biased_lock:1 lock:2 (normal object)
// JavaThread*:23 epoch:2 age:4 biased_lock:1 lock:2 (biased object)
// size:32 ------------------------------------------>| (CMS free block)
// PromotedObject*:29 ---------->| promo_bits:3 ----->| (CMS promoted object)
//
// 64 bits:
// --------
// unused:25 hash:31 -->| unused:1 age:4 biased_lock:1 lock:2 (normal object)
// JavaThread*:54 epoch:2 unused:1 age:4 biased_lock:1 lock:2 (biased object)
// PromotedObject*:61 --------------------->| promo_bits:3 ----->| (CMS promoted object)
// size:64 ----------------------------------------------------->| (CMS free block)
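The very first sentence establishes the star of this chapter: the markOop describes the header of an object. So this is the real object header. Most articles online describe the header as the mark word plus the klass pointer and so on, but that is fine; it is only a difference in definition.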
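Now the bit format. We focus on the 64-bit layout, which gives four cases (fields listed most significant first, as in the comment above):
1. Unlocked, with the hash already computed (normal object): unused:25 | hash:31 | unused:1 | age:4 | biased_lock:0 | lock:01
2. Biased lock, biased toward a specific thread (biased object): JavaThread*:54 | epoch:2 | unused:1 | age:4 | biased_lock:1 | lock:01
3. CMS promoted object: PromotedObject*:61 | promo_bits:3
4. CMS free block: size:64. Nothing to discuss here; the space has been reclaimed, so there is no object in it.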
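There is a problem here: in the second case, the biased layout has no room left for the hash. Does that mean I can no longer get the hash of an object once it is biased? Of course not. To analyze this, let's first look at some code: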
public class Response { }
import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;
import org.openjdk.jol.info.ClassLayout;

import static java.lang.Thread.sleep;

@Slf4j
public class TestHeader {
    static Response response = new Response();

    // Dump the header before locking, then again while holding the lock.
    public static void aaa(Response response) throws InterruptedException {
        log.info(Thread.currentThread().getName() + "out" + ClassLayout.parseInstance(response).toPrintable());
        synchronized (response) {
            log.info(Thread.currentThread().getName() + ClassLayout.parseInstance(response).toPrintable());
            sleep(5000); // hold the lock so the other thread has to compete for it
            log.info(Thread.currentThread().getName());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread("t1") {
            @SneakyThrows
            @Override
            public void run() {
                sleep(2000); // let t2 take the lock first
                aaa(response);
            }
        };
        Thread t2 = new Thread("t2") {
            @SneakyThrows
            @Override
            public void run() {
                aaa(response);
            }
        };
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
Response here is an empty object whose hash has not been computed. The output:
16:03:40.326 [t2] INFO com.example.demo.TestHeader - t2outcom.example.demo.Response object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           05 00 00 00 (00000101 00000000 00000000 00000000) (5)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           05 c2 00 f8 (00000101 11000010 00000000 11111000) (-134168059)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

16:03:40.330 [t2] INFO com.example.demo.TestHeader - t2com.example.demo.Response object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           05 b0 59 1f (00000101 10110000 01011001 00011111) (525971461)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           05 c2 00 f8 (00000101 11000010 00000000 11111000) (-134168059)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

16:03:42.368 [t1] INFO com.example.demo.TestHeader - t1outcom.example.demo.Response object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           05 b0 59 1f (00000101 10110000 01011001 00011111) (525971461)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           05 c2 00 f8 (00000101 11000010 00000000 11111000) (-134168059)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

16:03:45.331 [t2] INFO com.example.demo.TestHeader - t2
16:03:45.331 [t1] INFO com.example.demo.TestHeader - t1com.example.demo.Response object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           ba 16 ee 1c (10111010 00010110 11101110 00011100) (485365434)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           05 c2 00 f8 (00000101 11000010 00000000 11111000) (-134168059) // klass pointer
     12     4        (loss due to the next object alignment) // alignment padding
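From the header layout above we know that the lock flag is the last two bits, and the third bit from the end is the biased-lock flag.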
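Now the meaning of the other fields. age records the GC age; it has only 4 bits, so it can count no higher than 15, which is why the maximum GC age is 15. biased_lock is the biased-lock flag, 0 for off and 1 for on. lock is the lock state: 01 means unlocked when biased_lock is 0 and biased (or biasable) when biased_lock is 1, 00 means lightweight locked, 10 means heavyweight locked, and 11 means the object has been marked by GC.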
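The field descriptions above translate directly into shifts and masks. The following is a minimal sketch for the 64-bit layout quoted from markOop.hpp; the class and method names are hypothetical, and it only decodes a value you already hold as a long.

```java
public class MarkWordDecoder {
    static int lockBits(long mark)    { return (int) (mark & 0b11); }                // lowest 2 bits
    static int biasedBit(long mark)   { return (int) ((mark >>> 2) & 0b1); }         // next 1 bit
    static int age(long mark)         { return (int) ((mark >>> 3) & 0xF); }         // 4 bits, so 0..15
    static int hash(long mark)        { return (int) ((mark >>> 8) & 0x7FFFFFFF); }  // normal object only
    static int epoch(long mark)       { return (int) ((mark >>> 8) & 0b11); }        // biased object only
    static long threadBits(long mark) { return mark >>> 10; }                        // biased object only

    /** Names the state encoded in the low three bits. */
    static String state(long mark) {
        switch (lockBits(mark)) {
            case 0b00: return "lightweight locked";
            case 0b10: return "heavyweight locked";
            case 0b11: return "marked by GC";
            default:   return biasedBit(mark) == 1 ? "biased or biasable" : "unlocked";
        }
    }

    public static void main(String[] args) {
        long biasable = 0x5L; // low bits 101, everything else zero
        long hashed = (0x12345678L << 8) | (15L << 3) | 0b001L; // made-up hash, maximum age, unlocked
        System.out.println(state(biasable));
        System.out.println(state(hashed) + ", age=" + age(hashed)
                + ", hash=0x" + Integer.toHexString(hash(hashed)));
    }
}
```

Note that hash and epoch/threadBits read the same bit positions; which interpretation applies depends on the low three bits, which is exactly why a biased header has no room for a hash.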
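Byte order is why the dump does not look the way the layout diagram suggests. Apart from the klass pointer and the alignment padding, what remains is the mark word: 64 bits, 8 bytes. The machine is little-endian (as on x86) and the dump prints the bytes in memory order, so they appear reversed relative to the diagram: the first byte printed is the lowest-order byte of the mark word. To read the lock flag, just look at the last three bits of the first byte.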
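To see the byte order at work, the eight printed bytes can be reassembled into a single 64-bit value. This sketch assumes the dump shows bytes in memory order on a little-endian machine; the class and method names are hypothetical, and the byte values are copied from the dumps above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianMark {
    /** Reassembles the 8 header bytes, given in printed (memory) order, into the mark word. */
    static long toMarkWord(int... bytes) {
        if (bytes.length != 8) {
            throw new IllegalArgumentException("expected 8 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        for (int b : bytes) {
            buf.put((byte) b);
        }
        buf.flip();
        return buf.getLong();
    }

    public static void main(String[] args) {
        // t2 inside synchronized: 05 b0 59 1f 00 00 00 00
        long biased = toMarkWord(0x05, 0xb0, 0x59, 0x1f, 0x00, 0x00, 0x00, 0x00);
        // t1 inside synchronized after waiting for t2: ba 16 ee 1c 00 00 00 00
        long inflated = toMarkWord(0xba, 0x16, 0xee, 0x1c, 0x00, 0x00, 0x00, 0x00);

        // prints: 1f59b005 low3=101
        System.out.println(Long.toHexString(biased) + " low3=" + Long.toBinaryString(biased & 0b111));
        // prints: 1cee16ba low2=10
        System.out.println(Long.toHexString(inflated) + " low2=" + Long.toBinaryString(inflated & 0b11));
    }
}
```

The first printed byte ends up as the last two hex digits of the word, so its low bits are the lock bits: 101 for the biased header and 10 for the heavyweight one.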
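Now let's walk through the code. There are two threads, t1 and t2. t1 waits 2 seconds after starting, so t2 runs first, takes the lock and then rests for 5 seconds. t1 arrives after its 2 seconds and has to compete for the lock. We can see that when t2 first takes the lock, the header records the owning thread, and once t1 comes to grab the lock, the biased lock is upgraded directly to a heavyweight lock.
Now remove the 5-second sleep and look at the result: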
16:45:17.873 [t2] INFO com.example.demo.TestHeader - t2outcom.example.demo.Response object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           05 00 00 00 (00000101 00000000 00000000 00000000) (5)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           05 c2 00 f8 (00000101 11000010 00000000 11111000) (-134168059)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

16:45:17.876 [t2] INFO com.example.demo.TestHeader - t2com.example.demo.Response object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           05 48 27 1f (00000101 01001000 00100111 00011111) (522668037)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           05 c2 00 f8 (00000101 11000010 00000000 11111000) (-134168059)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

16:45:17.876 [t2] INFO com.example.demo.TestHeader - t2
16:45:19.843 [t1] INFO com.example.demo.TestHeader - t1outcom.example.demo.Response object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           05 48 27 1f (00000101 01001000 00100111 00011111) (522668037)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           05 c2 00 f8 (00000101 11000010 00000000 11111000) (-134168059)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

16:45:19.844 [t1] INFO com.example.demo.TestHeader - t1com.example.demo.Response object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           f0 f3 ac 1f (11110000 11110011 10101100 00011111) (531428336)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           05 c2 00 f8 (00000101 11000010 00000000 11111000) (-134168059)
     12     4        (loss due to the next object alignment)
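The first three dumps follow the same pattern as before. Because t2 no longer sleeps, it releases the lock right after taking it. When t1 arrives after its 2-second sleep and takes the lock, the bias has been revoked and the header shows a lightweight lock (00).
Now for the hashCode case mentioned earlier: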
import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;
import org.openjdk.jol.info.ClassLayout;

@Slf4j
public class TestHeader {
    static Response response = new Response();

    // Dump the header before hashCode(), after hashCode(), and while holding the lock.
    public static void aaa(Response response) throws InterruptedException {
        log.info(Thread.currentThread().getName() + "out" + ClassLayout.parseInstance(response).toPrintable());
        response.hashCode();
        log.info(Thread.currentThread().getName() + "hash" + ClassLayout.parseInstance(response).toPrintable());
        synchronized (response) {
            log.info(Thread.currentThread().getName() + ClassLayout.parseInstance(response).toPrintable());
            // sleep(5000);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t2 = new Thread("t2") {
            @SneakyThrows
            @Override
            public void run() {
                aaa(response);
            }
        };
        t2.start();
        t2.join();
    }
}
Only one thread is started here, and the header is printed before the hash is computed, after it is computed, and after locking:
16:50:19.440 [t2] INFO com.example.demo.TestHeader - t2outcom.example.demo.Response object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           05 00 00 00 (00000101 00000000 00000000 00000000) (5)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           05 c2 00 f8 (00000101 11000010 00000000 11111000) (-134168059)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

16:50:19.443 [t2] INFO com.example.demo.TestHeader - t2hashcom.example.demo.Response object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           01 63 bb 3f (00000001 01100011 10111011 00111111) (1069245185)
      4     4        (object header)                           50 00 00 00 (01010000 00000000 00000000 00000000) (80)
      8     4        (object header)                           05 c2 00 f8 (00000101 11000010 00000000 11111000) (-134168059)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

16:50:19.444 [t2] INFO com.example.demo.TestHeader - t2com.example.demo.Response object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           10 ee 1e 1f (00010000 11101110 00011110 00011111) (522120720)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           05 c2 00 f8 (00000101 11000010 00000000 11111000) (-134168059)
     12     4        (loss due to the next object alignment)
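The first dump is the usual anonymously biasable header. After the hash is computed, the object becomes non-biasable and the hash is stored in the header. Locking it afterwards does not produce a biased lock; it goes straight to a lightweight lock, in which the header holds a pointer to the lock record on the owning thread's stack. Next, let's see what happens when hashCode is called after the object is already biased toward a thread: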
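The stored hash can be read back out of the second dump by hand. The sketch below takes the eight header bytes printed right after hashCode() (01 63 bb 3f 50 00 00 00), reverses them into a 64-bit value on the assumption that the dump is in little-endian memory order, and then applies the hash:31 position from the layout comment. The class and method names are hypothetical. If the layout holds, the result should be the value hashCode() returned in that run, although the test program does not print it, so that part is not verified here.

```java
public class HashFromDump {
    /** Builds the mark word from bytes given in printed (lowest address first) order. */
    static long markOf(int[] dump) {
        long mark = 0;
        for (int i = dump.length - 1; i >= 0; i--) {
            mark = (mark << 8) | (dump[i] & 0xFF);
        }
        return mark;
    }

    /** Extracts the 31 hash bits that sit above the low 8 bits of a normal-object mark word. */
    static int hashOf(long mark) {
        return (int) ((mark >>> 8) & 0x7FFFFFFF);
    }

    public static void main(String[] args) {
        int[] afterHash = {0x01, 0x63, 0xbb, 0x3f, 0x50, 0x00, 0x00, 0x00};
        long mark = markOf(afterHash);
        // prints: mark=503fbb6301 lowBits=1 hash=503fbb63
        System.out.println("mark=" + Long.toHexString(mark)
                + " lowBits=" + Long.toBinaryString(mark & 0b111)
                + " hash=" + Integer.toHexString(hashOf(mark)));
    }
}
```

The low three bits come out as 001, the unlocked, non-biasable pattern, which matches the observation that computing the hash turns biasing off for this object.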
import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;
import org.openjdk.jol.info.ClassLayout;

@Slf4j
public class TestHeader {
    static Response response = new Response();

    // Dump the header before locking, while holding the lock, and after hashCode() inside the lock.
    public static void aaa(Response response) throws InterruptedException {
        log.info(Thread.currentThread().getName() + "out" + ClassLayout.parseInstance(response).toPrintable());
        synchronized (response) {
            log.info(Thread.currentThread().getName() + ClassLayout.parseInstance(response).toPrintable());
            response.hashCode();
            log.info(Thread.currentThread().getName() + "hash" + ClassLayout.parseInstance(response).toPrintable());
            // sleep(5000);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t2 = new Thread("t2") {
            @SneakyThrows
            @Override
            public void run() {
                aaa(response);
            }
        };
        t2.start();
        t2.join();
    }
}
Output:
16:59:12.601 [t2] INFO com.example.demo.TestHeader - t2outcom.example.demo.Response object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           05 00 00 00 (00000101 00000000 00000000 00000000) (5)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           9f c1 00 f8 (10011111 11000001 00000000 11111000) (-134168161)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

16:59:12.604 [t2] INFO com.example.demo.TestHeader - t2com.example.demo.Response object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           05 68 40 1f (00000101 01101000 01000000 00011111) (524314629)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           9f c1 00 f8 (10011111 11000001 00000000 11111000) (-134168161)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

16:59:12.604 [t2] INFO com.example.demo.TestHeader - t2hashcom.example.demo.Response object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           2a 12 d4 1c (00101010 00010010 11010100 00011100) (483660330)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           9f c1 00 f8 (10011111 11000001 00000000 11111000) (-134168161)
     12     4        (loss due to the next object alignment)
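Here the biased lock is upgraded directly to a heavyweight lock (10).
Summary:
As I see it, the object header on its own is a static concept; what matters is how GC, locking, and possibly other operations later on make use of it. Both the JDK source and the JVM are full of places where an int or some other multi-byte field is split into parts. The read-write lock in the JDK, for example, uses the high and low bits for different things, and here a single word is made to express all these variations. I had not planned to write this article, but when I started on an analysis of the synchronized source I got stuck after a short passage and found there was no way around the object header, which I suppose shows how important it is.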
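As for lock upgrading, the examples above show that an object starts out anonymously biasable (the examples here run with the biased-locking startup delay removed, which you can do by adding -XX:BiasedLockingStartupDelay=0). When one thread locks it, it becomes biased toward that thread. When several threads take turns (one finishes before the next starts, so no thread ever tries to enter while another is inside the critical section), it is upgraded to a lightweight lock. When threads really contend (one tries to enter while another is still inside the critical section), it is upgraded to a heavyweight lock. As for the hash: if the hash of an anonymously biasable object is computed before locking, locking it then gives a lightweight lock, and if the hash is computed while the lock is held, the lock is upgraded straight to a heavyweight lock.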
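The transitions summarized above can be collected into a small lookup function. This is only a restatement of what the experiments in this article showed, not a model of the JVM's full locking protocol; the state and event names are hypothetical, and any combination the article did not exercise is rejected rather than guessed.

```java
public class LockStateSketch {
    enum State { ANONYMOUS_BIASABLE, BIASED, UNLOCKED_HASHED, LIGHTWEIGHT, HEAVYWEIGHT }

    enum Event {
        LOCK,                             // a thread enters synchronized on the object
        OTHER_THREAD_LOCKS_AFTER_RELEASE, // a second thread locks once the first has left
        OTHER_THREAD_LOCKS_WHILE_HELD,    // a second thread tries to lock while the first is inside
        HASH_CODE                         // identity hashCode() is computed
    }

    /** Returns the state observed in the article for the given state and event. */
    static State next(State current, Event event) {
        switch (current) {
            case ANONYMOUS_BIASABLE:
                if (event == Event.LOCK) return State.BIASED;
                if (event == Event.HASH_CODE) return State.UNLOCKED_HASHED;
                break;
            case BIASED:
                if (event == Event.OTHER_THREAD_LOCKS_AFTER_RELEASE) return State.LIGHTWEIGHT;
                if (event == Event.OTHER_THREAD_LOCKS_WHILE_HELD) return State.HEAVYWEIGHT;
                if (event == Event.HASH_CODE) return State.HEAVYWEIGHT; // hashCode() inside synchronized
                break;
            case UNLOCKED_HASHED:
                if (event == Event.LOCK) return State.LIGHTWEIGHT;
                break;
            default:
                break;
        }
        throw new IllegalArgumentException("not covered by the experiments: " + current + " + " + event);
    }

    public static void main(String[] args) {
        State s = State.ANONYMOUS_BIASABLE;
        s = next(s, Event.HASH_CODE);
        s = next(s, Event.LOCK);
        System.out.println(s); // prints: LIGHTWEIGHT
    }
}
```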