VEX IR Language Syntax
/*---------------------------------------------------------------*/
/*--- High-level IR description ---*/
/*---------------------------------------------------------------*/
/* Vex IR is an architecture-neutral intermediate representation.
Unlike some IRs in systems similar to Vex, it is not like assembly
language (ie. a list of instructions). Rather, it is more like the
IR that might be used in a compiler.
Note: compared with assembly language, VEX IR is closer to the intermediate language of a compiler.
Code blocks
~~~~~~~~~~~
The code is broken into small code blocks ("superblocks", type:
'IRSB'). Each code block typically represents from 1 to perhaps 50
instructions. IRSBs are single-entry, multiple-exit code blocks.
Each IRSB contains three things:
Note: an IRSB is a single-entry, multiple-exit code block, comparable to the trace granularity in Intel Pin.
- a type environment, which indicates the type of each temporary
value present in the IRSB
[Example:]
(*ir_block).tyenv
-types
-[0] Ity_I32
-[1] Ity_I32
-types_size 0x00000008
-types_used 0x00000002
types_used gives the number of temporaries in use; the types array stores the type of each temporary.
- a list of statements, which represent code
[Example:]
stmts_size 0x00000003 int
stmts_used 0x00000003 int
- (*ir_block).stmts[0]
    tag Ist_IMark
- (*ir_block).stmts[1]
    tag Ist_WrTmp
- (*ir_block).stmts[2]
    tag Ist_Put
Statements are likewise stored in the stmts array; stmts_used is the number of statements actually in use.
- a jump that exits from the end of the IRSB
[Example:]
jumpkind Ijk_Boring
The block is finally printed as follows:
0x77D699A0: movl %esi,%esp
IRSB {
   t0:I32 t1:I32   [2 temporaries]

   ------ IMark(0x77D699A0, 2, 0) ------   [3 statements in total, counting the IMark but not the last line, which updates the IP register and is generated automatically]
   t0 = GET:I32(32)   [the whole line is a statement; GET:I32(32) is an expression]
   PUT(24) = t0
   PUT(68) = 0x77D699A2:I32; exit-Boring
}
The second statement can be broken down further:
- (*ir_block).stmts[1]
    tag Ist_WrTmp
    .tmp 0
    .tag Iex_Get
    .offset 32
    .ty Ity_I32
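The three parts of an IRSB can be modelled with a few small structs. The sketch below is only an illustration: every Mini* type and mini_* function is a hypothetical stand-in that mirrors the field names seen in the dumps (types_size, types_used, stmts_size, stmts_used, jumpkind), not the real libvex definitions, and the fixed array sizes are an assumption made to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins mirroring the debugger dumps; NOT the libvex types. */
typedef enum { MTy_I32 } MiniType;
typedef enum { MSt_IMark, MSt_WrTmp, MSt_Put } MiniStmtTag;
typedef enum { MJk_Boring } MiniJumpKind;

typedef struct {
    MiniType types[8];   /* type of each temporary, indexed by temp number */
    int      types_size; /* capacity of the array */
    int      types_used; /* number of temporaries actually in use */
} MiniTypeEnv;

typedef struct {
    MiniStmtTag tag;     /* kind of statement */
} MiniStmt;

typedef struct {
    MiniTypeEnv  tyenv;      /* 1. type environment */
    MiniStmt     stmts[3];   /* 2. list of statements */
    int          stmts_size; /*    capacity of the array */
    int          stmts_used; /*    statements actually in use */
    MiniJumpKind jumpkind;   /* 3. kind of the final jump */
} MiniIRSB;

/* Build a model of the block printed for "movl %esi,%esp". */
MiniIRSB mini_example_block(void) {
    MiniIRSB sb = {0};
    sb.tyenv.types_size = 8;
    sb.tyenv.types[0] = MTy_I32;   /* t0:I32 */
    sb.tyenv.types[1] = MTy_I32;   /* t1:I32 */
    sb.tyenv.types_used = 2;
    sb.stmts_size = 3;
    sb.stmts[0].tag = MSt_IMark;   /* IMark(0x77D699A0, 2, 0) */
    sb.stmts[1].tag = MSt_WrTmp;   /* t0 = GET:I32(32) */
    sb.stmts[2].tag = MSt_Put;     /* PUT(24) = t0 */
    sb.stmts_used = 3;
    sb.jumpkind = MJk_Boring;
    return sb;
}

/* Count the in-use statements that carry the given tag. */
int mini_count_stmts(const MiniIRSB *sb, MiniStmtTag tag) {
    assert(sb != NULL);
    assert(sb->stmts_used <= sb->stmts_size);
    int count = 0;
    for (int i = 0; i < sb->stmts_used; i++) {
        if (sb->stmts[i].tag == tag) {
            count++;
        }
    }
    return count;
}
```

Only the first stmts_used entries are visited, which is the same distinction between capacity and use that the dumps show for both arrays.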
Because the blocks are multiple-exit, there can be additional
conditional exit statements that cause control to leave the IRSB
before the final exit. Also because of this, IRSBs can cover
multiple non-consecutive sequences of code (up to 3). These are
recorded in the type VexGuestExtents (see libvex.h).
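To make the single-entry, multiple-exit idea concrete, the sketch below counts the ways control can leave a block: one for the final jump, plus one for each conditional exit statement. MiniStmtTag and mini_count_exits are hypothetical names invented for this illustration; a block is reduced to an array of statement tags, which is a simplification and not how libvex represents it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement tags for illustration; NOT the libvex enum. */
typedef enum { MSt_IMark, MSt_WrTmp, MSt_Put, MSt_Exit } MiniStmtTag;

/* Number of points at which control can leave the block:
   every conditional exit statement, plus the final jump. */
int mini_count_exits(const MiniStmtTag *stmts, int n_used) {
    assert(stmts != NULL || n_used == 0);
    int exits = 1;                 /* the jump at the end of the block */
    for (int i = 0; i < n_used; i++) {
        if (stmts[i] == MSt_Exit) {
            exits++;               /* a side exit before the end */
        }
    }
    return exits;
}
```

A block with no exit statements, such as the movl example, yields 1; a block containing one conditional exit yields 2.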
Statements and expressions
~~~~~~~~~~~~~~~~~~~~~~~~~~
Statements (type 'IRStmt') represent operations with side-effects,
eg. guest register writes, stores, and assignments to temporaries.
Expressions (type 'IRExpr') represent operations without
side-effects, eg. arithmetic operations, loads, constants.
Expressions can contain sub-expressions, forming expression trees,
eg. (3 + (4 * load(addr1))).
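Note: statements may have side effects, whereas expressions are pure and have none.
ST denotes a transfer from a register to memory (a store); LD denotes a transfer from memory to a register (a load).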
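The expression tree 3 + (4 * load(addr1)) can be sketched as a small recursive data structure. This is a toy model under stated assumptions: MiniExpr, MiniOp and mini_eval are hypothetical names, memory is a word-indexed array rather than byte-addressed guest memory, and only constants, loads, addition and multiplication are covered. The evaluator only reads its inputs, which reflects the point that expressions have no side effects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical expression model for illustration; NOT the libvex IRExpr. */
typedef enum { MEx_Const, MEx_Load, MEx_Binop } MiniExprTag;
typedef enum { MOp_Add, MOp_Mul } MiniOp;

typedef struct MiniExpr {
    MiniExprTag tag;
    uint32_t value;               /* MEx_Const: the constant value */
    MiniOp op;                    /* MEx_Binop: which operation */
    const struct MiniExpr *arg1;  /* MEx_Binop: left operand; MEx_Load: address */
    const struct MiniExpr *arg2;  /* MEx_Binop: right operand */
} MiniExpr;

/* Evaluate an expression tree. Nothing is written: both the tree and the
   memory are const, so evaluation cannot have side effects. */
uint32_t mini_eval(const MiniExpr *e, const uint32_t *mem, size_t mem_words) {
    assert(e != NULL);
    switch (e->tag) {
    case MEx_Const:
        return e->value;
    case MEx_Load: {
        uint32_t addr = mini_eval(e->arg1, mem, mem_words);
        assert(addr < mem_words);  /* word index must be in range */
        return mem[addr];
    }
    case MEx_Binop: {
        uint32_t a = mini_eval(e->arg1, mem, mem_words);
        uint32_t b = mini_eval(e->arg2, mem, mem_words);
        return e->op == MOp_Add ? a + b : a * b;
    }
    }
    assert(0 && "unknown expression tag");
    return 0;
}
```

With addr1 modelled as a constant word index whose memory cell holds 5, the tree evaluates to 3 + 4 * 5 = 23. Writing that result to a temporary or a guest register would be the job of a statement, not of the expression.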
Expression types
typedef enum {
   Iex_Binder=0x15000,
   Iex_Get,
   Iex_GetI,
   Iex_RdTmp,
   Iex_Qop,
   Iex_Triop,
   Iex_Binop,
   Iex_Unop,
   Iex_Load,
   Iex_Const,
   Iex_Mux0X,
   Iex_CCall
} IRExprTag;
Statement types
typedef enum {
   Ist_NoOp=0x19000,
   Ist_IMark,     /* META */
   Ist_AbiHint,   /* META */
   Ist_Put,
   Ist_PutI,
   Ist_WrTmp,
   Ist_Store,
   Ist_CAS,
   Ist_LLSC,
   Ist_Dirty,
   Ist_MBE,       /* META (maybe) */
   Ist_Exit
} IRStmtTag;