Introduction to LLVM IR with Assorted Examples
Reference links
https://www.cnblogs.com/Tu9oh0st/p/16358531.html
https://github.com/llir/llvm
https://github.com/Evian-Zhang/llvm-ir-tutorial/tree/master/code
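What is LLVM IR
- LLVM IR is a low-level language whose syntax resembles assembly.
- A program written in a high-level language that has an LLVM front end (for example C++ via Clang) can be represented in LLVM IR.
- LLVM IR makes code optimization convenient: optimizations operate on LLVM IR itself.
The two representations of LLVM IR
- The first is a human-readable text form, with the file extension .ll.
- The second is a binary format that is easy for machines to process, with the file extension .bc.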
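LLVM IR structure
- After source code is compiled to LLVM IR, it has the following structure: a module contains functions, a function contains basic blocks, and a basic block contains instructions.
- Understanding this structure is the foundation for studying code obfuscation. For example:
- Obfuscation at function granularity: control flow flattening
- Obfuscation at basic-block granularity: bogus control flow
- Obfuscation at instruction granularity: instruction substitution
LLVM IR structure: Module
- One source file corresponds to one module in LLVM IR.
- The header information contains the program's target platform (such as X86 or ARM) and some other information.
- The global symbols comprise the definitions and declarations of global variables and functions.
LLVM IR structure: Function
- A function in LLVM IR represents a function in the source code.
- The parameters are, as the name suggests, the function's parameters.
- A function consists of one or more basic blocks; the basic block that executes first is the entry block.
LLVM IR structure: BasicBlock
- A basic block consists of a label and a sequence of instructions.
- Normally, the last instruction of a basic block is a branch instruction (br or switch) or a return instruction (ret). Such an instruction is called a terminator instruction.
- The phi instruction is a special case: it must appear at the start of a basic block, and its value depends on which predecessor block control arrived from.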
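The module, function, and basic-block hierarchy described in this section can be sketched with plain Go structs. This is an illustrative model only: every type, field, and function name below (Module, Function, BasicBlock, Terminator, sampleModule, and so on) is an assumption made for the sketch, not the API of LLVM or of any library.

```go
package main

import "fmt"

// Hypothetical, simplified model of the LLVM IR hierarchy; not a real API.

// BasicBlock holds a label, its ordinary instructions, and exactly one
// terminator instruction (for example "ret" or "br") that ends the block.
type BasicBlock struct {
	Label      string
	Insts      []string
	Terminator string
}

// Function has parameters and basic blocks; Blocks[0] is the entry block.
type Function struct {
	Name   string
	Params []string
	Blocks []*BasicBlock
}

// Module corresponds to one source file: header information plus global
// symbols (global variables and functions).
type Module struct {
	TargetTriple string
	Globals      []string
	Funcs        []*Function
}

// Entry returns the entry block of f, or nil if f has no body.
func (f *Function) Entry() *BasicBlock {
	if len(f.Blocks) == 0 {
		return nil
	}
	return f.Blocks[0]
}

// CountInsts returns the number of instructions in m, terminators included.
func (m *Module) CountInsts() int {
	n := 0
	for _, f := range m.Funcs {
		for _, b := range f.Blocks {
			n += len(b.Insts) + 1 // +1 for the terminator
		}
	}
	return n
}

// sampleModule models a module with one function, main, whose single
// basic block adds 1 and 2 and returns the sum.
func sampleModule() *Module {
	entry := &BasicBlock{
		Label:      "entry",
		Insts:      []string{"%sum = add i32 1, 2"},
		Terminator: "ret i32 %sum",
	}
	mainFn := &Function{Name: "main", Blocks: []*BasicBlock{entry}}
	return &Module{
		TargetTriple: "x86_64-apple-macosx10.15.0",
		Funcs:        []*Function{mainFn},
	}
}

func main() {
	m := sampleModule()
	fmt.Println(m.Funcs[0].Entry().Label, m.CountInsts())
}
```

Storing the terminator in its own field, rather than as the last element of Insts, makes the "every block ends with exactly one terminator" rule part of the data structure instead of a convention.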
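Common LLVM IR instructions
Terminator Instructions
- ret: the function return instruction; corresponds to return in C/C++.
- br: short for "branch". It has an unconditional form and a conditional form, and corresponds to the if statement in C/C++. The unconditional form resembles the x86 jmp instruction; the conditional form resembles x86 conditional jumps such as jnz and je.
- switch: a multi-way branch instruction. It can be seen as a generalization of br that supports more targets but is more complex to use; corresponds to switch in C/C++.
Comparison Instructions
- icmp: compares integers or pointers. The condition cond can be eq (equal), ne (not equal), ugt (unsigned greater than), and so on.
- fcmp: compares floating-point numbers. The condition cond can be oeq (ordered and equal), ueq (unordered or equal), and so on.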
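Binary Operations
- add: integer addition.
- sub: integer subtraction.
- mul: integer multiplication.
- udiv: unsigned integer division.
- sdiv: signed integer division.
- urem: unsigned integer remainder.
- srem: signed integer remainder.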
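Why there are separate unsigned and signed division and remainder instructions becomes clear when the same 8-bit operands are fed to both. The Go sketch below models the four instructions on i8 values; the helper names (udiv8, sdiv8, urem8, srem8) are hypothetical. Note one difference the model does not capture: division by zero is undefined behavior in LLVM IR, whereas Go panics.

```go
package main

import "fmt"

// udiv8 models "udiv i8": operand bits are interpreted as unsigned.
func udiv8(a, b int8) uint8 { return uint8(a) / uint8(b) }

// sdiv8 models "sdiv i8": operands are signed, the quotient rounds toward zero.
func sdiv8(a, b int8) int8 { return a / b }

// urem8 models "urem i8": remainder of the unsigned division.
func urem8(a, b int8) uint8 { return uint8(a) % uint8(b) }

// srem8 models "srem i8": remainder of the signed division; the result
// takes the sign of the dividend.
func srem8(a, b int8) int8 { return a % b }

func main() {
	// The i8 bit pattern of -6 is 0xFA, which is 250 when read as unsigned.
	fmt.Println(udiv8(-6, 2), sdiv8(-6, 2)) // 125 -3
	// The i8 bit pattern of -7 is 0xF9, which is 249 when read as unsigned.
	fmt.Println(urem8(-7, 2), srem8(-7, 2)) // 1 -1
}
```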
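Bitwise Binary Operations
- shl: integer left shift.
- lshr: integer logical right shift; the vacated high bits are filled with 0.
- ashr: integer arithmetic right shift; the vacated high bits are filled with copies of the sign bit.
- and: integer bitwise AND.
- or: integer bitwise OR.
- xor: integer bitwise XOR.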
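The two right shifts differ only in how they fill the vacated high bits, which matters for negative values. A minimal Go model on i8 follows; the helper names (shl8, lshr8, ashr8) are made up for this sketch, and it is only meaningful for shift amounts smaller than 8, since a larger shift amount yields a poison value in LLVM IR.

```go
package main

import "fmt"

// shl8 models "shl i8": bits shifted out on the left are discarded.
func shl8(x int8, n uint) int8 { return x << n }

// lshr8 models "lshr i8": a logical shift that fills the high bits with 0.
func lshr8(x int8, n uint) int8 { return int8(uint8(x) >> n) }

// ashr8 models "ashr i8": an arithmetic shift that copies the sign bit.
func ashr8(x int8, n uint) int8 { return x >> n }

func main() {
	var x int8 = -8 // bit pattern 1111_1000
	fmt.Println(shl8(x, 1), lshr8(x, 1), ashr8(x, 1)) // -16 124 -4

	// and, or, xor operate bit by bit.
	var a, b uint8 = 0b1100, 0b1010
	fmt.Println(a&b, a|b, a^b) // 8 14 6
}
```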
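Memory Access and Addressing Operations
- alloca: the memory allocation instruction. It allocates space in the current function's stack frame and returns a pointer to that space. It is closer to declaring a local variable in C/C++ than to calling malloc, because the memory is on the stack and is released automatically when the function returns.
- store: the memory store instruction. It writes a value to the memory a pointer points to, similar to assigning through a dereferenced pointer in C/C++ (*p = value).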
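Conversion Operations
- trunc .. to: the truncation instruction. It converts a value to a narrower integer type by discarding the high bits.
- zext .. to: the zero-extension instruction. It converts a value to a wider integer type, filling the high bits with 0.
- sext .. to: the sign-extension instruction. It converts a value to a wider integer type by copying the sign bit (the highest bit) into the new high bits.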
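The three conversions can be modeled with Go integer conversions, which follow the same rules: narrowing discards high bits, widening from an unsigned type fills with zeros, and widening from a signed type copies the sign bit. The function names (truncI32ToI8, zextI8ToI32, sextI8ToI32) are hypothetical and exist only for this sketch.

```go
package main

import "fmt"

// truncI32ToI8 models "trunc i32 %x to i8": only the low 8 bits are kept.
func truncI32ToI8(x int32) int8 { return int8(x) }

// zextI8ToI32 models "zext i8 %x to i32": the high 24 bits are filled with 0.
func zextI8ToI32(x int8) int32 { return int32(uint8(x)) }

// sextI8ToI32 models "sext i8 %x to i32": the sign bit is copied into the
// high 24 bits, so the signed value is preserved.
func sextI8ToI32(x int8) int32 { return int32(x) }

func main() {
	fmt.Println(truncI32ToI8(300)) // 300 is 0x12C; the low byte 0x2C is 44
	fmt.Println(zextI8ToI32(-1))   // 0xFF becomes 0x000000FF, which is 255
	fmt.Println(sextI8ToI32(-1))   // 0xFF becomes 0xFFFFFFFF, which is -1
}
```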
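Other Operations
- phi: addresses a problem created by static single assignment (SSA) form. Because each register is assigned exactly once, a value that depends on which path was taken cannot simply be reassigned; phi selects it according to the predecessor basic block that control arrived from.
- select: the counterpart of the ? : ternary operator in C/C++.
- call: calls a function. It corresponds to a function call in C/C++ and resembles the x86 call instruction.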
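The relationship between select and phi can be sketched in Go by computing a maximum in two ways: once as a branch-free select, and once with a branch followed by a merge point that picks its value by predecessor. All names below (selectI32, maxSelect, maxPhi, and the block labels "then" and "else") are assumptions made for this illustration; the IR in the comments shows the shape being modeled, not the output of any compiler.

```go
package main

import "fmt"

// selectI32 models "select i1 %cond, i32 %a, i32 %b".
func selectI32(cond bool, a, b int32) int32 {
	if cond {
		return a
	}
	return b
}

// maxSelect models the branch-free form:
//   %c = icmp sgt i32 %a, %b
//   %r = select i1 %c, i32 %a, i32 %b
func maxSelect(a, b int32) int32 { return selectI32(a > b, a, b) }

// maxPhi models the branching form. Control reaches the merge block from
// either "then" or "else", and phi picks the value tied to that predecessor:
//   %r = phi i32 [ %a, %then ], [ %b, %else ]
func maxPhi(a, b int32) int32 {
	// entry: br i1 %c, label %then, label %else
	pred := "else"
	if a > b {
		pred = "then"
	}
	// merge: look up the incoming value for the predecessor actually taken.
	incoming := map[string]int32{"then": a, "else": b}
	return incoming[pred]
}

func main() {
	fmt.Println(maxSelect(1, 2), maxPhi(1, 2))
}
```

Both functions return the same result; the difference is that select needs no control flow, while phi only makes sense at a point where several basic blocks merge.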
Introduction
llir/llvm is a library for interacting with LLVM IR in pure Go.
Installation
go get -u github.com/llir/llvm/...
Versions
Map between llir/llvm tagged releases and LLVM release versions.
- llir/llvm v0.3.7: LLVM 15.0 (yet to be released)
- llir/llvm v0.3.6: LLVM 14.0
- llir/llvm v0.3.5: LLVM 13.0
- llir/llvm v0.3.4: LLVM 12.0
- llir/llvm v0.3.3: LLVM 11.0
- llir/llvm v0.3.2: LLVM 10.0
- llir/llvm v0.3.0: LLVM 9.0
Users
- decomp: LLVM IR to Go decompiler by @decomp.
- geode: Geode to LLVM IR compiler by @nickwanninger.
- leaven: LLVM IR to Go decompiler by @andybalholm.
- slate: Slate to LLVM IR compiler by @nektro.
- tre: Go to LLVM IR compiler by @zegl.
- uc: µC to LLVM IR compiler by @sangisos and @mewmew.
- B++: B++ to LLVM IR compiler by @Nv7-Github.
Usage
Input example - Parse LLVM IR assembly
// This example parses an LLVM IR assembly file and pretty-prints the data types
// of the parsed module to standard output.
package main

import (
	"log"

	"github.com/kr/pretty"
	"github.com/llir/llvm/asm"
)

func main() {
	// Parse the LLVM IR assembly file `foo.ll`.
	m, err := asm.ParseFile("foo.ll")
	if err != nil {
		log.Fatalf("%+v", err)
	}
	// Pretty-print the data types of the parsed LLVM IR module.
	pretty.Println(m)
}
Output example - Produce LLVM IR assembly
// This example produces LLVM IR code equivalent to the following C code, which
// implements a pseudo-random number generator.
//
//    int abs(int x);
//
//    int seed = 0;
//
//    // ref: https://en.wikipedia.org/wiki/Linear_congruential_generator
//    //    a = 0x15A4E35
//    //    c = 1
//    int rand(void) {
//       seed = seed*0x15A4E35 + 1;
//       return abs(seed);
//    }
package main

import (
	"fmt"

	"github.com/llir/llvm/ir"
	"github.com/llir/llvm/ir/constant"
	"github.com/llir/llvm/ir/types"
)

func main() {
	// Create convenience types and constants.
	i32 := types.I32
	zero := constant.NewInt(i32, 0)
	a := constant.NewInt(i32, 0x15A4E35) // multiplier of the PRNG.
	c := constant.NewInt(i32, 1)         // increment of the PRNG.

	// Create a new LLVM IR module.
	m := ir.NewModule()

	// Create an external function declaration and append it to the module.
	//
	//    int abs(int x);
	abs := m.NewFunc("abs", i32, ir.NewParam("x", i32))

	// Create a global variable definition and append it to the module.
	//
	//    int seed = 0;
	seed := m.NewGlobalDef("seed", zero)

	// Create a function definition and append it to the module.
	//
	//    int rand(void) { ... }
	rand := m.NewFunc("rand", i32)

	// Create an unnamed entry basic block and append it to the `rand` function.
	entry := rand.NewBlock("")

	// Create instructions and append them to the entry basic block.
	tmp1 := entry.NewLoad(i32, seed)
	tmp2 := entry.NewMul(tmp1, a)
	tmp3 := entry.NewAdd(tmp2, c)
	entry.NewStore(tmp3, seed)
	tmp4 := entry.NewCall(abs, tmp3)
	entry.NewRet(tmp4)

	// Print the LLVM IR assembly of the module.
	fmt.Println(m)
}
Analysis example - Process LLVM IR
// This example program analyses an LLVM IR module to produce a callgraph in
// Graphviz DOT format.
package main

import (
	"fmt"
	"strings"

	"github.com/llir/llvm/asm"
	"github.com/llir/llvm/ir"
)

func main() {
	// Parse LLVM IR assembly file.
	m, err := asm.ParseFile("foo.ll")
	if err != nil {
		panic(err)
	}
	// Produce callgraph of module.
	callgraph := genCallgraph(m)
	// Output callgraph in Graphviz DOT format.
	fmt.Println(callgraph)
}

// genCallgraph returns the callgraph in Graphviz DOT format of the given LLVM
// IR module.
func genCallgraph(m *ir.Module) string {
	buf := &strings.Builder{}
	buf.WriteString("digraph {\n")
	// For each function of the module.
	for _, f := range m.Funcs {
		// Add caller node.
		caller := f.Ident()
		fmt.Fprintf(buf, "\t%q\n", caller)
		// For each basic block of the function.
		for _, block := range f.Blocks {
			// For each non-branching instruction of the basic block.
			for _, inst := range block.Insts {
				// Type switch on instruction to find call instructions.
				switch inst := inst.(type) {
				case *ir.InstCall:
					callee := inst.Callee.Ident()
					// Add edges from caller to callee.
					fmt.Fprintf(buf, "\t%q -> %q\n", caller, callee)
				}
			}
			// Terminator of basic block.
			switch term := block.Term.(type) {
			case *ir.TermRet:
				// do something.
				_ = term
			}
		}
	}
	buf.WriteString("}")
	return buf.String()
}
int main() {
    return 0;
}
; main.ll
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"

define i32 @main() {
    ret i32 0
}
; extract_insert_value.ll
%MyStruct = type {
    i32,
    i32
}
@my_struct = global %MyStruct { i32 1, i32 2 }

define i32 @main() {
    %1 = load %MyStruct, %MyStruct* @my_struct
    %2 = extractvalue %MyStruct %1, 1
    %3 = insertvalue %MyStruct %1, i32 233, 1

    ret i32 0
}
; global_variable_test.ll
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"

@global_variable = global i32 0

define i32 @main() {
    ret i32 0
}
; many_registers_test.ll
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"

@global_variable = global i32 0

define i32 @main() {
    %1 = add i32 1, 2
    %2 = add i32 1, 2
    %3 = add i32 1, 2
    %4 = add i32 1, 2
    %5 = add i32 1, 2
    %6 = add i32 1, 2
    %7 = add i32 1, 2
    %8 = add i32 1, 2
    %9 = add i32 1, 2
    %10 = add i32 1, 2
    %11 = add i32 1, 2
    %12 = add i32 1, 2
    %13 = add i32 1, 2
    %14 = add i32 1, 2
    %15 = add i32 1, 2

    store i32 %1, i32* @global_variable
    store i32 %2, i32* @global_variable
    store i32 %3, i32* @global_variable
    store i32 %4, i32* @global_variable
    store i32 %5, i32* @global_variable
    store i32 %6, i32* @global_variable
    store i32 %7, i32* @global_variable
    store i32 %8, i32* @global_variable
    store i32 %9, i32* @global_variable
    store i32 %10, i32* @global_variable
    store i32 %11, i32* @global_variable
    store i32 %12, i32* @global_variable
    store i32 %13, i32* @global_variable
    store i32 %14, i32* @global_variable
    store i32 %15, i32* @global_variable

    ret i32 0
}
// max.c
int max(int a, int b) {
    if (a > b) {
        return a;
    } else {
        return b;
    }
}

int main() {
    int a = max(1, 2);
    return 0;
}
; register_test.ll
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"

define i32 @main() {
    %local_variable = add i32 1, 2
    ret i32 %local_variable
}
; div_test.ll
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"

define i8 @main() {
    %1 = udiv i8 -6, 2
    %2 = sdiv i8 -6, 2

    ret i8 %1
}
; for.ll
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"

define i32 @main() {
    %i = alloca i32 ; int i = ...
    store i32 0, i32* %i ; ... = 0
    br label %start
start:
    %i_value = load i32, i32* %i
    %comparison_result = icmp slt i32 %i_value, 4 ; test if i < 4
    br i1 %comparison_result, label %A, label %B
A:
    ; do something A
    %1 = add i32 %i_value, 1 ; ... = i + 1
    store i32 %1, i32* %i ; i = ...
    br label %start
B:
    ; do something B

    ret i32 0
}
; calling_convention_test.ll
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"

%ReturnType = type { i32, i32 }
define %ReturnType @foo(i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {
    ret %ReturnType { i32 1, i32 2 }
}

define i32 @main() {
    %1 = call %ReturnType @foo(i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8)
    ret i32 0
}
; tail_call_test.ll
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"

define fastcc i32 @foo(i32 %a) {
    %res = icmp eq i32 %a, 1
    br i1 %res, label %btrue, label %bfalse
btrue:
    ret i32 1
bfalse:
    %sub = sub i32 %a, 1
    %tail_call = tail call fastcc i32 @foo(i32 %sub)
    ret i32 %tail_call
}
// try_catch_test.cpp
struct SomeOtherStruct { };
struct AnotherError { };

struct MyError { /* ... */ };
void foo() {
    SomeOtherStruct other_struct;
    throw MyError();
    return;
}

void bar() {
    try {
        foo();
    } catch (MyError err) {
        // do something with err
    } catch (AnotherError err) {
        // do something with err
    } catch (...) {
        // do something
    }
}

int main() {
    return 0;
}