Compiling to Java
Original article: http://matt.might.net/articles/compiling-to-java/
In my advanced compilers class, we cover a range of intermediate and target languages. C is a popular target language for performance reasons, but Java has often-overlooked advantages. Some of Java's less-utilized features conspire to make it an easy target for high-level language constructs; for example, lexically scoped closures become anonymous classes. It takes roughly a third to half the time (and code) to write a compiler that targets Java instead of C. In fact, we can compile the essential core of Scheme to Java with purely local transforms. (Scheme's macro system can easily desugar full Scheme into core Scheme.) As a result, the reference compiler at the bottom of this page is barely over 400 lines of highly commented code. (The companion compiler into C is just over 1000 lines of code.)
Helpful resources:
- The official Java Language Specification covers the nooks and crannies of the language which, while rarely or never used in ordinary programming, become extremely convenient during compilation.
Jump to:
- Pros and cons
- Core Scheme with Sugar
- Encoding Scheme values in Java
- Integers
- Primitives
- Variable references
- Lambda terms
- Conditionals
- Mutable variables
- Variable binding
- Recursion
- Sequencing
- Code
- External resources
1. Pros and cons
On top of these implementation benefits, targeting Java gives the language access to the Java library system, the portability of the JVM, just-in-time compilation/optimization and garbage collection. The lack of tail-call optimization in Java is a downside of the direct compilation strategy, but it's possible to do a less direct compilation into Java that performs tail-call optimization with trampolining.
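To make the trampolining remark concrete, here is a minimal sketch in Java. The names Bounce, done, more, run and countDown are invented for this illustration and are not taken from the reference compiler; the idea is only that a tail call returns a thunk to a driver loop instead of growing the Java stack.

```java
import java.util.function.Supplier;

// Hypothetical names throughout; this is not the reference compiler's runtime.
// A Bounce is either a finished result or a thunk that yields the next Bounce.
final class Bounce<T> {
    final T result;                  // meaningful only when next == null
    final Supplier<Bounce<T>> next;  // non-null while there is more work to do

    private Bounce(T result, Supplier<Bounce<T>> next) {
        this.result = result;
        this.next = next;
    }

    static <T> Bounce<T> done(T result) { return new Bounce<>(result, null); }

    static <T> Bounce<T> more(Supplier<Bounce<T>> next) { return new Bounce<>(null, next); }

    // The trampoline: a plain loop, so the Java stack stays flat.
    static <T> T run(Bounce<T> b) {
        while (b.next != null) {
            b = b.next.get();
        }
        return b.result;
    }
}

class TrampolineDemo {
    // A tail-recursive countdown: instead of calling itself, it returns a thunk.
    static Bounce<Integer> countDown(int n, int steps) {
        if (n == 0) return Bounce.done(steps);
        return Bounce.more(() -> countDown(n - 1, steps + 1));
    }

    public static void main(String[] args) {
        // A million nested direct calls would risk a StackOverflowError.
        System.out.println(Bounce.run(countDown(1_000_000, 0)));
    }
}
```

The cost is one small allocation per bounce, which is part of why the text calls this a less direct compilation.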
More broadly, in my class, I teach that language implementation is about trade-offs between implementation complexity and performance:
- I argue that first, you should try an interpreter for your language.
- If that's not fast enough, try an SICP-style optimizing interpreter.
- If that's not good enough, try compiling to Java.
- If that's still not fast enough, try compiling to C.
- If that's still too slow, try continuation-passing-style compilation to assembly.
- If you need more speed, start doing basic compiler optimizations.
- If you're still out of luck, start doing static-analysis-driven compiler optimizations.
Each time you go down from step n to step n + 1 on this ladder, performance will go up, but implementation code size and complexity will go up by about a factor of two. It turns out that Java occupies a sweet spot: relatively low implementation complexity, but the biggest percentage-wise gain in performance.
2. Core Scheme with Sugar
The compiler I created is for a core Scheme, but it would not be hard to apply these same techniques to compiling a language like Python or Ruby:
<exp> ::= <const>
       |  <prim>
       |  <var>
       |  (lambda (<var> ...) <exp>)
       |  (if <exp> <exp> <exp>)
       |  (set! <var> <exp>)
       |  (let ((<var> <exp>) ...) <exp>)
       |  (letrec ((<var> (lambda (<var>...) <exp>))) <exp>)
       |  (begin <exp> ...)
       |  (<exp> <exp> ...)

<const> ::= <int>
A Scheme compiler could easily use macros to desugar full Scheme into this language, or in fact, an even simpler one. Many real Scheme compilers do exactly that.
3. Encoding Scheme values in Java
The first task in compiling to Java is to come up with a Java-level encoding of the corresponding Scheme values. To do that, I created the Java interface Value, and had all Scheme values inherit from that. The sub-classes and sub-interfaces of Value include VoidValue, BooleanValue, IntValue, ProcValue and Primitive. The Java stub code also provides a RuntimeEnvironment class, which binds all of the top-level primitives like addition, subtraction and I/O to Java names. Compiled programs are supposed to inherit from RuntimeEnvironment.
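As a rough picture of this encoding, the sketch below declares a few of the value types. The type names come from the text (ProcValue1 mirrors the per-arity naming that shows up in the generated code later on), but every field, method and printed form is an assumption; the real stub code may look different.

```java
// Sketch only: type names come from the article; every member is an assumption.
interface Value {}

final class VoidValue implements Value {
    static final VoidValue VOID = new VoidValue();
    private VoidValue() {}
    @Override public String toString() { return "#<void>"; }
}

final class BooleanValue implements Value {
    final boolean value;
    BooleanValue(boolean value) { this.value = value; }
    @Override public String toString() { return value ? "#t" : "#f"; }
}

final class IntValue implements Value {
    final int value;
    IntValue(int value) { this.value = value; }
    @Override public String toString() { return Integer.toString(value); }
}

// Procedures are values too; this is the one-argument case.
interface ProcValue1 extends Value {
    Value apply(Value arg);
}

class ValueDemo {
    // The Scheme literal 3 becomes an object, not a bare Java int.
    static Value three() { return new IntValue(3); }

    public static void main(String[] args) {
        System.out.println(three());                  // prints 3
        System.out.println(new BooleanValue(false));  // prints #f
    }
}
```

Giving every Scheme value a common Java supertype is what lets compiled procedures take and return Value uniformly, at the price of a cast wherever a specific kind of value is needed.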
The top-level compile function has a typically Schemish dispatching feel:
; java-compile-exp : exp -> string
(define (java-compile-exp exp)
  (cond
    ; core forms:
    ((const? exp)       (java-compile-const exp))
    ((prim? exp)        (java-compile-prim exp))
    ((ref? exp)         (java-compile-ref exp))
    ((lambda? exp)      (java-compile-lambda exp))
    ((if? exp)          (java-compile-if exp))
    ((set!? exp)        (java-compile-set! exp))

    ; syntactic sugar:
    ((let? exp)         (java-compile-exp (let=>lambda exp)))
    ((letrec1? exp)     (java-compile-exp (letrec1=>Y exp)))
    ((begin? exp)       (java-compile-exp (begin=>let exp)))

    ; applications:
    ((app? exp)         (java-compile-app exp))))
So, compilation breaks down into handling the individual constructs.
4. Integers
Integers compile into IntValue objects, rather than to themselves. For example, Scheme-level 3 compiles to new IntValue(3) in Java.
5. Primitives
Primitives are looked up in a table and translated into their RuntimeEnvironment name:
(define (java-compile-prim p)
  (cond
    ((eq? '+ p)       "sum")
    ((eq? '- p)       "difference")
    ((eq? '* p)       "product")
    ((eq? '= p)       "numEqual")
    ((eq? 'display p) "display")
    (else             (error "unhandled primitive " p))))
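On the Java side, those names have to resolve to something. The sketch below shows one way a RuntimeEnvironment could bind them, and why a compiled program that inherits from it can mention sum or product unqualified. The names sum, difference, product and numEqual come from the table above; ProcValue2, CompiledProgram and all method bodies are assumptions made for this example, and display is left out to keep it short.

```java
// Sketch only: primitive names follow the table above; the bodies are assumptions.
interface Value {}

final class IntValue implements Value {
    final int value;
    IntValue(int value) { this.value = value; }
}

final class BooleanValue implements Value {
    final boolean value;
    BooleanValue(boolean value) { this.value = value; }
}

// Assumed interface for two-argument procedures.
interface ProcValue2 extends Value {
    Value apply(Value a, Value b);
}

// Binds each top-level primitive to a Java name.
class RuntimeEnvironment {
    static final Value sum = (ProcValue2) (a, b) ->
        new IntValue(((IntValue) a).value + ((IntValue) b).value);
    static final Value difference = (ProcValue2) (a, b) ->
        new IntValue(((IntValue) a).value - ((IntValue) b).value);
    static final Value product = (ProcValue2) (a, b) ->
        new IntValue(((IntValue) a).value * ((IntValue) b).value);
    static final Value numEqual = (ProcValue2) (a, b) ->
        new BooleanValue(((IntValue) a).value == ((IntValue) b).value);
}

// A compiled program inherits the primitive names, so it can use them unqualified.
class CompiledProgram extends RuntimeEnvironment {
    // Roughly what (+ 3 (* 2 2)) could compile to.
    static Value run() {
        return ((ProcValue2)(sum)).apply(
            new IntValue(3),
            ((ProcValue2)(product)).apply(new IntValue(2), new IntValue(2)));
    }
}
```

Because the primitives are ordinary Value objects here, they can also be passed around as first-class procedures, which is why the call site casts before applying.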
6. Variable references
Variable references have to be name-mangled, since Scheme identifiers are richer than Java identifiers. Mutable variables (those which are set!'d) are also handled differently. These are wrapped in ValueCell objects and prefixed with m_, since variables captured by anonymous functions in Java have to be marked final. A pre-compilation code walk finds all of the mutable variables.
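The text does not spell out the mangling scheme, so the following is one hypothetical scheme written in Java purely for illustration (the reference compiler does this on the Scheme side, and may encode names differently). It keeps ASCII letters and digits and replaces every other character with its character code between underscores.

```java
// Hypothetical mangling scheme; the reference compiler's actual encoding may differ.
class Mangler {
    // Keep ASCII letters and digits; replace anything else with _<char code>_.
    // Underscores in the input are escaped too, so distinct Scheme names
    // never collide after mangling.
    static String mangle(String schemeId) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < schemeId.length(); i++) {
            char c = schemeId.charAt(i);
            boolean plain = (c >= 'a' && c <= 'z')
                         || (c >= 'A' && c <= 'Z')
                         || (c >= '0' && c <= '9');
            if (plain) {
                out.append(c);
            } else {
                out.append('_').append((int) c).append('_');
            }
        }
        return out.toString();
    }

    // Mutable variables additionally get the m_ prefix described above.
    static String mangleMutable(String schemeId) {
        return "m_" + mangle(schemeId);
    }
}
```

For example, mangle("set!") gives set_33_. This sketch does not guard against Java keywords or names that start with a digit; a fuller version would need to.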
7. Lambda terms
Lambda terms are compiled into anonymous classes. For example, (lambda (v1 ... vN) exp) becomes:
new NullProcValueN () {
  public Value apply (final Value [mangle v1],...,final Value [mangle vN]) {
    // for each mutable formal vi:
    final ValueCell m_[mangle vi] = new ValueCell([mangle vi]) ;
    return [compile exp] ;
  }
}
There is a NullProcValueN class for each procedure arity N. The NullProcValue classes provide default definitions for some of the methods defined in Value.
Clearly, [mangle v] stands for the mangled name of the variable v, and [compile exp] stands for the compiled text of exp.
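Here is a hand-written instance of that template for (lambda (n) (lambda (m) (+ n m))), with the addition inlined to keep the example self-contained. The names Value, IntValue, ProcValue1 and NullProcValue1 appear in the article; their members, and the LambdaDemo class, are assumptions.

```java
// Sketch only: member definitions are assumptions, not the article's stub code.
interface Value {}

final class IntValue implements Value {
    final int value;
    IntValue(int value) { this.value = value; }
}

interface ProcValue1 extends Value {
    Value apply(Value arg);
}

// Assumed shape: a base class where default definitions would live.
abstract class NullProcValue1 implements ProcValue1 {}

class LambdaDemo {
    // Hand translation of (lambda (n) (lambda (m) (+ n m))), with + inlined.
    static Value makeAdder() {
        return new NullProcValue1 () {
            public Value apply(final Value n) {
                // The inner class captures n; Java requires it to be (effectively) final.
                return new NullProcValue1 () {
                    public Value apply(final Value m) {
                        return new IntValue(((IntValue) n).value + ((IntValue) m).value);
                    }
                };
            }
        };
    }

    // ((makeAdder 3) 4), with the casts a compiler would have to insert.
    static int addThreeAndFour() {
        Value add3 = ((ProcValue1)(makeAdder())).apply(new IntValue(3));
        Value result = ((ProcValue1)(add3)).apply(new IntValue(4));
        return ((IntValue) result).value;
    }
}
```

The lexical scoping comes for free: the inner anonymous class closes over n exactly as the inner lambda does in Scheme.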
8. Conditionals
The form (if exp1 exp2 exp3) actually compiles into the ternary operator ?: instead of if () {} else {}, because a Scheme if is an expression that produces a value:
([compile exp1]) ? ([compile exp2]) : ([compile exp3])
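A small Java illustration of why the ternary operator is the natural fit: a conditional can sit in argument position only if it is an expression. The example uses plain int values and invented names (ConditionalDemo, pick, twice) so that it stays neutral about how the compiler turns a Scheme value into a Java boolean.

```java
// Illustrative only; names are invented and plain ints stand in for Scheme values.
class ConditionalDemo {
    static int twice(int v) {
        return 2 * v;
    }

    // Analogous to (twice (if (= x 0) 10 20)): the conditional is an argument,
    // a position that a Java if statement cannot occupy.
    static int pick(int x) {
        return twice((x == 0) ? (10) : (20));
    }
}
```

With an if statement, the compiler would instead have to introduce a temporary variable and hoist the conditional out of the enclosing expression, which would no longer be a purely local transform.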
9. Mutable variables
The construct (set! var exp) relies on the compilation of lambda terms and variable references to wrap var in a ValueCell, so that it compiles to:
VoidValue.Void(m_[mangle var].value = [compile exp])
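The sketch below shows the pieces working together for a variable x that is assigned inside a closure. ValueCell, VoidValue.Void and the m_ prefix come from the text; the class bodies, ProcValue0 and SetDemo are assumptions made so the example compiles on its own.

```java
// Sketch only: class bodies are assumptions about what the stub code could look like.
interface Value {}

final class IntValue implements Value {
    final int value;
    IntValue(int value) { this.value = value; }
}

// Assumed shape: a one-field mutable box.
final class ValueCell {
    Value value;
    ValueCell(Value initial) { this.value = initial; }
}

final class VoidValue implements Value {
    static final VoidValue VOID = new VoidValue();
    private VoidValue() {}
    // Assumed: the argument has already been evaluated for its effect; return void.
    static VoidValue Void(Value ignored) { return VOID; }
}

// Assumed interface for zero-argument procedures.
interface ProcValue0 extends Value {
    Value apply();
}

class SetDemo {
    // x is a mutable formal, so it is wrapped in a cell on entry.
    static Value assignInsideClosure(final Value x) {
        final ValueCell m_x = new ValueCell(x);
        ProcValue0 setter = new ProcValue0() {
            public Value apply() {
                // (set! x 2): the cell reference stays final; only its contents change.
                return VoidValue.Void(m_x.value = new IntValue(2));
            }
        };
        setter.apply();
        return m_x.value;   // a reference to x reads through the cell
    }
}
```

The assignment is itself a Java expression, which is what allows set! to compile to an expression rather than a statement.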
10. Variable binding
The let construct desugars into the application of a lambda term. That is, (let ((v e) ...) body) becomes:
((lambda (v ...) body) e ...)
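The reference compiler performs this rewrite in Scheme (let=>lambda). To show how local the transform is, here is a sketch of the same rewrite in Java over a toy S-expression representation: nested lists whose atoms are strings. The representation and the names LetDesugar, list, letToLambda and show are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy S-expressions: a String is an atom, a List<Object> is a parenthesized form.
class LetDesugar {
    static List<Object> list(Object... items) {
        return new ArrayList<>(Arrays.asList(items));
    }

    // (let ((v e) ...) body)  =>  ((lambda (v ...) body) e ...)
    @SuppressWarnings("unchecked")
    static List<Object> letToLambda(List<Object> let) {
        List<Object> bindings = (List<Object>) let.get(1);
        Object body = let.get(2);
        List<Object> vars = new ArrayList<>();
        List<Object> args = new ArrayList<>();
        for (Object b : bindings) {
            List<Object> binding = (List<Object>) b;
            vars.add(binding.get(0));   // v
            args.add(binding.get(1));   // e
        }
        List<Object> application = list(list("lambda", vars, body));
        application.addAll(args);
        return application;
    }

    // Prints a toy S-expression back in Scheme syntax.
    static String show(Object sexp) {
        if (!(sexp instanceof List)) return sexp.toString();
        StringBuilder sb = new StringBuilder("(");
        String sep = "";
        for (Object item : (List<?>) sexp) {
            sb.append(sep).append(show(item));
            sep = " ";
        }
        return sb.append(")").toString();
    }
}
```

Applied to (let ((x 1) (y 2)) (+ x y)), this produces ((lambda (x y) (+ x y)) 1 2), which the lambda and application cases then handle.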
11. Recursion
Letrec can be desugared into "lets and sets" or the Y combinator. I opted for the Y combinator just to show that it can be done without using side effects. Actually, the compiler generates a new Y combinator on the fly so that it matches the arity of the recursive procedure:
; xargs : nat -> list[symbol]
(define (xargs n)
  (if (<= n 0)
      '()
      (cons (string->symbol (string-append "x" (number->string n)))
            (xargs (- n 1)))))

; Yn generates the Y combinator for n-arity procedures.
(define (Yn n)
  `((lambda (h) (lambda (F) (F (lambda (,@(xargs n)) (((h h) F) ,@(xargs n))))))
    (lambda (h) (lambda (F) (F (lambda (,@(xargs n)) (((h h) F) ,@(xargs n))))))))
In Java, the Y combinator for one-argument procedures ends up as:
((ProcValue1)(new NullProcValue1 () {
  public Value apply(final Value h) {
    return new NullProcValue1 () {
      public Value apply(final Value F) {
        return ((ProcValue1)(F)).apply(new NullProcValue1 () {
          public Value apply(final Value x) {
            return ((ProcValue1)(((ProcValue1)(((ProcValue1)(h)).apply(h) )).apply(F) )).apply(x) ;
          }} ) ;
      }} ;
  }} )).apply(new NullProcValue1 () {
  public Value apply(final Value h) {
    return new NullProcValue1 () {
      public Value apply(final Value F) {
        return ((ProcValue1)(F)).apply(new NullProcValue1 () {
          public Value apply(final Value x) {
            return ((ProcValue1)(((ProcValue1)(((ProcValue1)(h)).apply(h) )).apply(F) )).apply(x) ;
          }} ) ;
      }} ;
  }} )
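The generated code is hard to read, so here is a typed, hand-written sketch of the same idea. It is a standard call-by-value fixed-point combinator rather than a transliteration of the output above, and the names YDemo, Rec, fix and factorial are invented for this example. The inner v -> ... wrapper plays the role of the (lambda (x) (((h h) F) x)) wrapper: it delays the self-application so that a strict language does not loop forever.

```java
import java.util.function.Function;

// Illustrative only: a typed fixed-point combinator, not the compiler's output.
class YDemo {
    // A function that accepts itself as its argument; this gives (x x) a type.
    interface Rec<A, B> {
        Function<A, B> apply(Rec<A, B> self);
    }

    // fix(f) returns a function g that behaves like f.apply(g),
    // without any variable ever referring to itself.
    static <A, B> Function<A, B> fix(Function<Function<A, B>, Function<A, B>> f) {
        Rec<A, B> r = self -> f.apply(v -> self.apply(self).apply(v));
        return r.apply(r);
    }

    // Factorial defined through fix: no named recursion and no assignment.
    static int factorial(int n) {
        Function<Integer, Integer> fact =
            YDemo.<Integer, Integer>fix(self -> k -> k == 0 ? 1 : k * self.apply(k - 1));
        return fact.apply(n);
    }
}
```

In the untyped encoding used by the compiler, the Rec interface is unnecessary because every procedure is just a Value and the casts to ProcValue1 take its place.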
12. Sequencing
Sequencing statements are desugared into let-bindings of unused variables. That is, (begin e1 ... eN) becomes:
(let ((_ e1)) (begin e2 ... eN))
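As with let, the reference compiler does this rewrite in Scheme (begin=>let). The sketch below expresses it in Java over a toy S-expression representation of nested lists and string atoms. It expands the whole sequence eagerly instead of leaving an inner begin for the next compile step, and it assumes that a single-expression sequence stands for that expression, a base case the rule above leaves implicit. All names here (BeginDesugar, list, beginToLet, show) are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy S-expressions: a String is an atom, a List<Object> is a parenthesized form.
class BeginDesugar {
    static List<Object> list(Object... items) {
        return new ArrayList<>(Arrays.asList(items));
    }

    // exps holds e1 ... eN, that is, the begin form without its head symbol.
    static Object beginToLet(List<Object> exps) {
        if (exps.isEmpty()) {
            throw new IllegalArgumentException("empty begin");
        }
        if (exps.size() == 1) {
            return exps.get(0);   // assumed base case: (begin e) => e
        }
        Object rest = beginToLet(exps.subList(1, exps.size()));
        // (let ((_ e1)) rest): e1 is evaluated, bound to an unused name, then rest runs.
        return list("let", list(list("_", exps.get(0))), rest);
    }

    // Prints a toy S-expression back in Scheme syntax.
    static String show(Object sexp) {
        if (!(sexp instanceof List)) return sexp.toString();
        StringBuilder sb = new StringBuilder("(");
        String sep = "";
        for (Object item : (List<?>) sexp) {
            sb.append(sep).append(show(item));
            sep = " ";
        }
        return sb.append(")").toString();
    }
}
```

For the sequence a b c this yields (let ((_ a)) (let ((_ b)) c)), which in turn desugars into nested lambda applications, so ordering is enforced by argument evaluation alone.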