JVM 内部原理(七)— Java 字节码基础之二

JVM 内部原理(七)— Java 字节码基础之二

介绍

版本:Java SE 7

为什么需要了解 Java 字节码?

无论你是一名 Java 开发者、架构师、CxO 还是智能手机的普通用户,Java 字节码都在你面前,它是 Java 虚拟机的基础。

总监、管理者和非技术人员可以放轻松点:他们所要知道的就是开发团队在正在进行下一版的开发,Java 字节码默默的在 JVM 平台上运行。

简单地说,Java 字节码是 Java 代码(如,class 文件)的中间表现形式,它在 JVM 内部执行,那么为什么你需要关心它?因为如果没有 Java 字节码,Java 程序就无法运行,因为它定义了 Java 开发者编写代码的方式。

从技术角度看,JVM 在运行时将 Java 字节码以 JIT 的编译方式将它们转换成原生代码。如果没有 Java 字节码在背后运行,JVM 就无法进行编译并映射到原生代码上。

很多 IT 的专业技术人员可能没有时间去学习汇编程序或者机器码,可以将 Java 字节码看成是某种与底层代码相似的代码。但当出问题的时候,理解 JVM 的基本运行原理对解决问题非常有帮助。

在本篇文章中,你会知道如何阅读与编写 JVM 字节码,更好的理解运行时的工作原理,以及结构某些关键库的能力。

本篇文章会包括一下话题:

  • 如何获得字节码列表
  • 如何阅读字节码
  • 语言结构是如何被编译器映射的:局部变量,方法调用,条件逻辑
  • ASM 简介
  • 字节码在其他 JVM 语言(如,Groovy 和 Kotlin)中是如何工作的

目录

  • 为什么需要了解 Java 字节码?
  • 第一部分:Java 字节码简介
    • 基础
    • 基本特性
    • JVM 栈模型
      • 方法体里面是什么?
      • 局部栈详解
      • 局部变量详解
      • 流程控制
      • 算术运算及转换
      • new & &
      • 方法调用及参数传递
  • 第二部分:ASM
    • ASM 与工具
  • 第三部分:Javassist
  • 总结

ASM

ObjectWeb ASM 事实上是 Java 字节码分析和操作的标准。ASM 通过它面向访问者的 API 暴露 Java 类的内部聚合组件。API 本身不是很广泛 - 只需要使用一部分类,就可以实现几乎所有想要的。ASM 可以用来修改二进制字节码,以及生成新的字节码。例如,ASM 可以应用到新的编程语言(Groovy、Kotlin、Scala),将高级程序语言的语法编译成可供 JVM 执行的字节码。

“We didn’t even consider using anything else instead of ASM, because other projects at JetBrains use ASM successfully for a long time.”

– ANDREY BRESLAV, KOTLIN


My first touch with bytecode first hand was when I started helping in the Groovy project and by then we settled to ASM. ASM can do what is needed, is small and doesn’t try to be too smart to get into your way. ASM tries to be memory and performance effective. For example you don’t have to create huge piles of objects to create your bytecode. It was one of the first with support for invokedynamic btw. Of course it has its pro and con sides, but all in all I am happy with it, simply because I can get the job done using it.

– JOCHEN THEODOROU, GROOVY


I mostly know about ASM, just because it’s the one used by Groovy 😃 However, knowing that it’s backed by people like Rémi Forax, who is a major contributor in the JVM world is very important and guarantees that it follows the latest improvements.

– CÉDRIC CHAMPEAU, GROOVY

为了提供一个合适的介绍,我们会用 ASM 库生成一个 “Hello World” 的示例,并循环打印任意数量的的短语。

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println(“Hello, World!”);
    }
}

要生成与示例对应的字节码,通常会创建 ClassWriter ,访问结构 - 字段、方法等,在任务完成后,输出最终的字节。

首先,创建一个 ClassWriter 实例:

ClassWriter cw = new ClassWriter(
        ClassWriter.COMPUTE_MAXS |
        ClassWriter.COMPUTE_FRAMES);

ClassWriter 实例可以通过一些常量来实例化,这些常量用来表示实例应该具有的行为。COMPUTE_MAXS 告诉 ASM 自动计算栈的最大值以及最大数量的方法的本地变量。COMPUTE_FRAMES 标识让 ASM 自动计算方法的栈桢。

要定义类就必须调用 ClassWriter 上的 visit() 方法:

cw.visit(
    Opcodes.V1_6,
    Opcodes.ACC_PUBLIC,
    "HelloWorld",
    null,
    "java/lang/Object",
    null);

下一步,我们要生成缺省的构造器和 main 方法。如果跳过生成缺省构造器,也不会发生什么坏事,但最好还是生成一个。

 MethodVisitor constructor =
    cw.visitMethod(
          Opcodes.ACC_PUBLIC,
          "",
          "()V",
          null,
          null);

 constructor.visitCode();

 //super()
 constructor.visitVarInsn(Opcodes.ALOAD, 0);
 constructor.visitMethodInsn(Opcodes.INVOKESPECIAL, 
    "java/lang/Object", "", "()V");
 constructor.visitInsn(Opcodes.RETURN);

 constructor.visitMaxs(0, 0);
 constructor.visitEnd();

首先用 visitMethod() 方法创建构造器。接着,我们通过调用 visitCode() 方法生成构造器体。然后调用 visitMaxs() 让 ASM 重新计算堆栈的大小。如我们指出的那样 ASM 可以自动为我们使用 ClassWriter 构造器内的 COMPUTE_MAXS 标识,我们可以随机传递参数到 visitMaxs() 方法里。最后,通过 visitEnd() 方法完成生成方法字节码的过程。

main 方法的 ASM 代码如下:

MethodVisitor mv = cw.visitMethod(
    Opcodes.ACC_PUBLIC + Opcodes.ACC_STATIC,
    "main", "([Ljava/lang/String;)V", null, null);

mv.visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System", 
    "out", "Ljava/io/PrintStream;");
mv.visitLdcInsn("Hello, World!");
mv.visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream",
    "println", "(Ljava/lang/String;)V");
mv.visitInsn(Opcodes.RETURN);

mv.visitMaxs(0, 0);
mv.visitEnd();

通过再次调用 visitMethod() ,我们用 name、modifiers 和 signature 生成新的方法定义。然后和生成构造器的方式一样使用 visitCode()、visitMaxs() 和 visitEnd() 方法。

可以发现代码里都是常量、“标识(flags)” 和 “指示器(indicators)”,最终的代码不太容易通过肉眼来理解。同时,为了写这些代码,需要关注字节码执行计划才能生成正确版本的字节码。这也让写这种代码非常复杂。这也是为什么每个人都有他们自己的方式使用 ASM 。


Our approach is using Kotlin’s ability to enhance existing Java APIs: we created some helper functions (many of them extension functions) that make ASM APIs look very much like a bytecode manipulation DSL.

– ANDREY BRESLAV, KOTLIN


I built some meta api into the compiler. For example it let’s you do a swap, regardless of the involved types. It was not in the links above, but I assume you know, that double and long consume two slots, while anything else does only one. The swap instruction handles only the 1-slot version. So if you have to swap an int and a long, a long and an int or a long and a long, you get a different set of instructions. I also added a helper API for local variables, to avoid to have to manage the index. If you want more nice looking code… Cedric wrote a Groovy DSL to generate bytecode. It is still the bytecode more or less, but less method around to make it less clear.

– JOCHEN THEODOROU, GROOVY


ASM is a nice low-level API, but I think we miss an up-to-date higher level API, for example for generating proxies and so on. In Groovy we want to limit the number of dependencies we add to the project, so it would be cool if ASM provided this out- of-the-box, but the general idea behind ASM is more to stick with a low level API.

– CÉDRIC CHAMPEAU, GROOVY

ASM 与 Tooling

工具对于学习和使用字节码有很大的帮助。学习使用 ASM 最好的方式就是写一个与想要生成的 Java 源文件等价的文件,然后使用 Eclipse 字节码概览(Bytecode Outline)插件的 ASMifier 模式(或 ASMifier 工具)查看等价的 ASM 编码。如果想要实现一个类转换工具,写两个 Java 源文件(在转换之前与之后)然后用插件的比较视图以 ASMifier 模式比较等价的 ASM 编码。

Eclipse 字节码概览插件

image

对于 IntelliJ IDEA 用户,ASM 字节码插件也可以从插件库中获取,并且非常容易使用。右键点击源文件然后选择 Show Bytecode 概览 - 这样可以打开一个 ASMifier 工具生成的视图。

ASM outline plugin in IntelliJ IDEA

image

你也可以直接应用 ASMifier ,不需要 IDE 插件,它是 ASM 库的一部分:

$java -classpath "asm.jar;asm-util.jar" \
       org.objectweb.asm.util.ASMifier \
       HelloWorld.class

We use ASM bytecode outline for IntelliJ IDEA and our own similar plugin that displays bytecodes generated by our compiler.

– ANDREY BRESLAV, KOTLIN


Actually, I wrote the “bytecode viewer” plugin for IntelliJ IDEA, and I’m using it quite often 😃 On the Groovy side, I also use the AST browser view, which provides a bytecode view too, although it seriously needs improvements.

– CÉDRIC CHAMPEAU, GROOVY


My tools are mostly org.objectweb.asm.util.Textifier and org.objectweb.asm.util.CheckClassAdapter. Some time ago I also wrote a tool helping me to visualize the bytecode and the stack information. It allows me to go through the bytecode and see what happens on the stack. And while bytecode used to be a pita to read for me in the beginning, I have seen so much of it, that I don’t even use that tool anymore, because I am usually faster just looking at the text produced by Textifier.

That is not supposed to tell you I am good at generating bytecode… no no.. I wouldn’t be able to read it so good if I had not the questionable pleasure of looking at it countless times, because there again was a pop of an empty stack or something like that. It is more that the problems I have to look for tend to repeat themselves and I have a whit of what to look for even before I fire up Textifier.

– JOCHEN THEODOROU, GROOVY

来自字节码专家的有趣故事

我们问 Andrey, Jochen 和 Cédric 分享一些他们 Java 字节码 的经验。尽管词 “bytecode” 和 “fun” 可能在一起不太合适,但这些热心的朋友仍然分享了一些案例:


Hmm… bytecode and fun? What a strange combination of words in the same sentence 😉

Well.. one time maybe a little… I told you about the API I use to do a swap. In the beginning it was not working properly of course. That was partially due to me misunderstanding one for those DUP instructions, but mainly it was because I had a simple bug in my code in which I execute the 1-2 swap instead of the 2-1 swap (meaning swapping 1 and 2 slot operands). So I was looking at the code, totally confused, thinking this should work, looking at my code… then thinking I made it wrong with those dups and replacing the code with my new understanding…

All the while the code was not really all that wrong, only the swap cases where swapped. Anyway… after about a full day of getting a headache from too much looking at the bytecode I finally found my mistake and looked at the code to find it looks almost the same as before… and then it dawned on me, that it was only that simple mistake, that could have been corrected in a minute and which took me a full day. Not really funny, but there I laughed a bit at myself actually.

– JOCHEN THEODOROU, GROOVY


Actually, the funniest thing was when I wrote the “bytecode DSL” for Groovy, which allows you to write bytecode directly in the body of a method, using a DSL which is very close to what the ASM outline provides, and a nicer “groovy flavoured” DSL too. Although I started this project as a proof-of-concept and a personal experiment, I received a lot of feedback and interest about it.

Today I think it’s a very simple way to have people test bytecode directly, for example for students. It makes writing bytecode a lot easier than using ASM directly. However, I also received a lot of complains, people saying I opened the Pandora box and that it would produce unreadable code in production 😄 (and I would definitely not recommend using it in production). Yet, it’s been more than one year the project is out, and I haven’t heard of anyone using it, so probably bytecode is really not that fun!

– CÉDRIC CHAMPEAU, GROOVY


Many fun things come in connection with Android: Dalvik is very picky about your bytecode conformance to the JVM spec. And HotSpot doesn’t care a bit about many of these things. We were running smoothly on HotSpot for a long time, without knowing that we had so many things done wrong. Now we use Dalvik’s verifier to check every class file we generate, to make sure nobody forgot to put ACC_SUPER on a class, proper offsets to a local variable table, and things like that.

We also came across a few interesting things in HotSpot, for example, if you call an absent method on an array object (like array.set()), you don’t get a NoSuchMethodError, or anything like that. What you get (what we got on a HotSpot we had a year ago, anyway) is… a native crash. Segmentation fault, if I am not mistaken. Our theory is that the vtable for arrays if so optimized that it is not even there, and lookup crashes because of that.

– ANDREY BRESLAV, KOTLIN

结语

JVM 是工程的杰作,和其他任何美妙的机器一样,能够理解和欣赏底层的技术非常重要。Java 字节码是一种机器码,它让 JVM 解释和编译如 Java、Scala、Groovy、Kotlin 以及更多的程序语言编码,从而为用户生产出更多的应用。

Java 字节码在大多数时候悄悄的在 JVM 中在后台运行,所以一般程序员很少考虑到它。但它是 JVM 上执行的指令,所以它对于某些领域的工具和程序分析非常重要,应用程序可以修改字节码从而根据应用的领域调整它们的行为。任何试图开发性能分析工具,模取(mocking)框架,AOP 和其他工具的开发者都需要彻底的理解 Java bytecode

参考

参考来源:

The Java® Language Specification - Java SE 7 Edition

The Java® Language Specification - Chapter 6. The Java Virtual Machine Instruction Set

2015.01 A Java Programmer’s Guide to Byte Code

2012.11 Mastering Java Bytecode at the Core of the JVM

2011.01 Java Bytecode Fundamentals

2001.07 Java bytecode: Understanding bytecode makes you a better programmer

Wiki: Java bytecode

Wiki: Java bytecode instruction listings

结束

posted @ 2016-12-23 15:26  Richaaaard  阅读(2218)  评论(0编辑  收藏  举报