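A Brief Look at Reversing PyCode Bytecode in CTFs
1. The original challenge
The challenge provides a Python bytecode disassembly file and an output file, and asks us to recover the flag value used in the program.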
  3           0 LOAD_CONST               1 ('XXXXXX')    //This is flag,try to figure it out ! Don't forget to fill it in flag{} !
              2 STORE_FAST               0 (flag)

  4           4 LOAD_CONST               2 (0)
              6 BUILD_LIST               1
              8 LOAD_CONST               3 (18)
             10 BINARY_MULTIPLY
             12 STORE_FAST               1 (num)

  5          14 LOAD_CONST               2 (0)
             16 STORE_FAST               2 (k)

  6          18 LOAD_GLOBAL              0 (range)
             20 LOAD_GLOBAL              1 (len)
             22 LOAD_FAST                0 (flag)
             24 CALL_FUNCTION            1
             26 CALL_FUNCTION            1
             28 GET_ITER
        >>   30 FOR_ITER               112 (to 144)
             32 STORE_FAST               3 (i)

  7          34 LOAD_GLOBAL              2 (ord)
             36 LOAD_FAST                0 (flag)
             38 LOAD_FAST                3 (i)
             40 BINARY_SUBSCR
             42 CALL_FUNCTION            1
             44 LOAD_FAST                3 (i)
             46 BINARY_ADD
             48 LOAD_FAST                2 (k)
             50 LOAD_CONST               4 (3)
             52 BINARY_MODULO
             54 LOAD_CONST               5 (1)
             56 BINARY_ADD
             58 BINARY_XOR
             60 LOAD_FAST                1 (num)
             62 LOAD_FAST                3 (i)
             64 STORE_SUBSCR

  8          66 LOAD_GLOBAL              2 (ord)
             68 LOAD_FAST                0 (flag)
             70 LOAD_GLOBAL              1 (len)
             72 LOAD_FAST                0 (flag)
             74 CALL_FUNCTION            1
             76 LOAD_FAST                3 (i)
             78 BINARY_SUBTRACT
             80 LOAD_CONST               5 (1)
             82 BINARY_SUBTRACT
             84 BINARY_SUBSCR
             86 CALL_FUNCTION            1
             88 LOAD_GLOBAL              1 (len)
             90 LOAD_FAST                0 (flag)
             92 CALL_FUNCTION            1
             94 BINARY_ADD
             96 LOAD_FAST                3 (i)
             98 BINARY_SUBTRACT
            100 LOAD_CONST               5 (1)
            102 BINARY_SUBTRACT
            104 LOAD_FAST                2 (k)
            106 LOAD_CONST               4 (3)
            108 BINARY_MODULO
            110 LOAD_CONST               5 (1)
            112 BINARY_ADD
            114 BINARY_XOR
            116 LOAD_FAST                1 (num)
            118 LOAD_GLOBAL              1 (len)
            120 LOAD_FAST                0 (flag)
            122 CALL_FUNCTION            1
            124 LOAD_FAST                3 (i)
            126 BINARY_SUBTRACT
            128 LOAD_CONST               5 (1)
            130 BINARY_SUBTRACT
            132 STORE_SUBSCR

  9         134 LOAD_FAST                2 (k)
            136 LOAD_CONST               5 (1)
            138 INPLACE_ADD
            140 STORE_FAST               2 (k)
            142 JUMP_ABSOLUTE           30

 10     >>  144 LOAD_GLOBAL              3 (print)
            146 LOAD_FAST                1 (num)
            148 CALL_FUNCTION            1
            150 POP_TOP
            152 LOAD_CONST               0 (None)
            154 RETURN_VALUE
The output file:
[115, 120, 96, 84, 116, 103, 105, 56, 102, 59, 127, 105, 115, 128, 95, 124, 139, 49]
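2. Solution approach
Reading the disassembly, the program first creates a list num of length 18, and the print call at the end outputs num, so overall we only need to track how the contents of num change.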
  4           4 LOAD_CONST               2 (0)
              6 BUILD_LIST               1
              8 LOAD_CONST               3 (18)
             10 BINARY_MULTIPLY
             12 STORE_FAST               1 (num)
Next the program creates the variable k and loops len(flag) times, that is, 18 times. The body of the for loop can be split into two steps.
  7          34 LOAD_GLOBAL              2 (ord)
             36 LOAD_FAST                0 (flag)
             38 LOAD_FAST                3 (i)
             40 BINARY_SUBSCR
             42 CALL_FUNCTION            1
             44 LOAD_FAST                3 (i)
             46 BINARY_ADD
             48 LOAD_FAST                2 (k)
             50 LOAD_CONST               4 (3)
             52 BINARY_MODULO
             54 LOAD_CONST               5 (1)
             56 BINARY_ADD
             58 BINARY_XOR
             60 LOAD_FAST                1 (num)
             62 LOAD_FAST                3 (i)
             64 STORE_SUBSCR
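Following the evaluation stack, the code first takes the decimal value of flag[i] with ord and adds i to it.
Then BINARY_MODULO computes k % 3, but don't forget the BINARY_ADD that follows, which adds 1.
Finally the two values are XORed, and STORE_SUBSCR writes the result to num[i].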
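To make the stack discipline concrete, here is a minimal sketch of a toy stack evaluator for the source-line-7 instruction sequence. The name eval_line7 and the tuple encoding of the instructions are illustrative assumptions, not part of the challenge, and only the handful of opcodes that appear in that sequence are handled.

```python
# Toy stack machine covering only the opcodes used on source line 7.
LINE7 = [
    ("LOAD_GLOBAL", "ord"), ("LOAD_FAST", "flag"), ("LOAD_FAST", "i"),
    ("BINARY_SUBSCR", None), ("CALL_FUNCTION", 1),
    ("LOAD_FAST", "i"), ("BINARY_ADD", None),
    ("LOAD_FAST", "k"), ("LOAD_CONST", 3), ("BINARY_MODULO", None),
    ("LOAD_CONST", 1), ("BINARY_ADD", None), ("BINARY_XOR", None),
    ("LOAD_FAST", "num"), ("LOAD_FAST", "i"), ("STORE_SUBSCR", None),
]

# Binary opcodes pop two operands (right one on top) and push one result.
BINARY = {
    "BINARY_SUBSCR": lambda a, b: a[b],
    "BINARY_ADD": lambda a, b: a + b,
    "BINARY_MODULO": lambda a, b: a % b,
    "BINARY_XOR": lambda a, b: a ^ b,
}


def eval_line7(local_vars):
    """Run LINE7 against local_vars and return whatever is left on the stack."""
    stack = []
    for op, arg in LINE7:
        if op == "LOAD_GLOBAL":
            stack.append({"ord": ord}[arg])
        elif op == "LOAD_FAST":
            stack.append(local_vars[arg])
        elif op == "LOAD_CONST":
            stack.append(arg)
        elif op in BINARY:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[op](left, right))
        elif op == "CALL_FUNCTION":
            args = [stack.pop() for _ in range(arg)][::-1]
            func = stack.pop()
            stack.append(func(*args))
        elif op == "STORE_SUBSCR":
            # TOS1[TOS] = TOS2
            index = stack.pop()
            container = stack.pop()
            value = stack.pop()
            container[index] = value
    return stack


state = {"flag": "UNCTF{qwertyuiopa}", "i": 2, "k": 2, "num": [0] * 18}
leftover = eval_line7(state)
print(state["num"][2], leftover)
```

An empty leftover stack is the "balance" check: a complete statement consumes everything it pushes.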
Now for the second part:
  8          66 LOAD_GLOBAL              2 (ord)
             68 LOAD_FAST                0 (flag)
             70 LOAD_GLOBAL              1 (len)
             72 LOAD_FAST                0 (flag)
             74 CALL_FUNCTION            1
             76 LOAD_FAST                3 (i)
             78 BINARY_SUBTRACT
             80 LOAD_CONST               5 (1)
             82 BINARY_SUBTRACT
             84 BINARY_SUBSCR
             86 CALL_FUNCTION            1
             88 LOAD_GLOBAL              1 (len)
             90 LOAD_FAST                0 (flag)
             92 CALL_FUNCTION            1
             94 BINARY_ADD
             96 LOAD_FAST                3 (i)
             98 BINARY_SUBTRACT
            100 LOAD_CONST               5 (1)
            102 BINARY_SUBTRACT
            104 LOAD_FAST                2 (k)
            106 LOAD_CONST               4 (3)
            108 BINARY_MODULO
            110 LOAD_CONST               5 (1)
            112 BINARY_ADD
            114 BINARY_XOR
            116 LOAD_FAST                1 (num)
            118 LOAD_GLOBAL              1 (len)
            120 LOAD_FAST                0 (flag)
            122 CALL_FUNCTION            1
            124 LOAD_FAST                3 (i)
            126 BINARY_SUBTRACT
            128 LOAD_CONST               5 (1)
            130 BINARY_SUBTRACT
            132 STORE_SUBSCR

  9         134 LOAD_FAST                2 (k)
            136 LOAD_CONST               5 (1)
            138 INPLACE_ADD
            140 STORE_FAST               2 (k)
            142 JUMP_ABSOLUTE           30
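The first part was manageable to reverse by hand, but this part is genuinely convoluted. I am so used to tools that suddenly reading disassembly by hand is hard going.
In fact, keeping track of the stack balance is enough to recover the corresponding source code.
Here is the source code I reconstructed by hand: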
flag = 'UNCTF{qwertyuiopa}'  # 18-character placeholder; the real flag is unknown
num = [0] * 18               # BUILD_LIST 1 followed by BINARY_MULTIPLY 18
k = 0
for i in range(len(flag)):
    num[i] = (ord(flag[i]) + i) ^ (k % 3 + 1)
    num[len(flag)-i-1] = ((ord(flag[len(flag)-i-1]) + len(flag)) - i - 1) ^ (k % 3 + 1)
    k += 1
print(num)
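One way to check a hand decompilation like this is to wrap it in a function, disassemble it with the standard dis module, and compare the listing against the challenge file. The sketch below does that. The function name candidate is an illustrative assumption, and opcode names and offsets vary between CPython versions (newer releases merge the BINARY_* opcodes into BINARY_OP, for example), so an exact match should only be expected on an interpreter close to the one that produced the challenge.

```python
import dis
import io


def candidate():
    # Hand-reconstructed source; dis only compiles it, it is never executed.
    flag = 'UNCTF{qwertyuiopa}'
    num = [0] * 18
    k = 0
    for i in range(len(flag)):
        num[i] = (ord(flag[i]) + i) ^ (k % 3 + 1)
        num[len(flag) - i - 1] = (ord(flag[len(flag) - i - 1]) + len(flag) - i - 1) ^ (k % 3 + 1)
        k += 1
    print(num)


# Capture the listing in a string so it can be diffed against the challenge file.
buffer = io.StringIO()
dis.dis(candidate, file=buffer)
listing = buffer.getvalue()
print(listing)
```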
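Now we need to write a PoC that inverts this to recover the flag. Looking at the program, though, it is a two-pointer encryption scheme: one pointer walks forward, the other walks backward, and each overwrites entries of num that the other has already filled in.
So I split the decryption into two parts: the first 9 characters are decrypted by inverting the second statement, and the last 9 by inverting the first, because in each half that is the write that happens last.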
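The overwrite pattern can be confirmed with a small simulation. The sketch below only records which statement wrote each index last, and at which value of k; the names last_writer, "first" and "second" are illustrative labels, not taken from the challenge.

```python
n = 18
last_writer = [None] * n  # (statement, k) of the final write to each index

for i in range(n):
    k = i  # k starts at 0 and is incremented once per iteration, so k == i
    last_writer[i] = ("first", k)           # num[i] = ...
    last_writer[n - i - 1] = ("second", k)  # num[len(flag)-i-1] = ...

for index, (statement, k) in enumerate(last_writer):
    print(index, statement, k)
```

Indices 0 to 8 end up written by the second statement with k = 17 - index, and indices 9 to 17 by the first statement with k = index, which is why the two halves need different keys when decrypting.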
The final PoC is as follows:
enc = [115, 120, 96, 84, 116, 103, 105, 56, 102, 59, 127, 105, 115, 128, 95, 124, 139, 49]
str1 = ''
for i in range(9):
    str1 += chr(((enc[i] ^ ((17-i) % 3 + 1)) + (17-i) + 1) - 18)
str2 = ''
for i in range(9,18):
    str2 += chr((enc[i] ^ (i % 3 + 1)) - i)
print(str1+str2)
#py_Trad3_1s_fuNny!
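As a sanity check, the recovered string can be pushed back through the reconstructed encryption and compared with the output file. This is a minimal sketch; the function name encrypt is an assumption introduced here, and its body follows the hand-reconstructed source above.

```python
def encrypt(flag):
    """Re-implementation of the reconstructed encryption loop."""
    num = [0] * len(flag)
    k = 0
    for i in range(len(flag)):
        num[i] = (ord(flag[i]) + i) ^ (k % 3 + 1)
        j = len(flag) - i - 1
        num[j] = (ord(flag[j]) + len(flag) - i - 1) ^ (k % 3 + 1)
        k += 1
    return num


enc = [115, 120, 96, 84, 116, 103, 105, 56, 102, 59, 127, 105, 115, 128, 95, 124, 139, 49]
recovered = 'py_Trad3_1s_fuNny!'
print(encrypt(recovered) == enc)
```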
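I recommend trying this yourself as you read; the solving process turns out to be quite fun.
3. A reversing challenge from SCUCTF
The challenge file is a .pyc, so my first idea was to decompile it back to a .py file.
That ran into a problem: uncompyle6 and similar decompilers all failed, because at the time they did not support bytecode produced by Python 3.9.
So I had to look for another route. I then built pycdc and ran it on the .pyc, but its output was incomplete: the key code was not fully decompiled, and I was stuck here for a long time.
Later a hint was released: use the pydisasm tool to convert the file directly into a bytecode disassembly.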
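pydisasm (shipped with the xdis package) can read bytecode from other Python versions. When the interpreter at hand is the same version that built the .pyc, the standard library is enough. The sketch below is that same-version alternative, not a replacement for pydisasm: it assumes the 16-byte header used by CPython 3.7 and later, and disassemble_pyc and the demo file are illustrative names rather than anything from the challenge.

```python
import dis
import importlib.util
import io
import marshal
import os
import py_compile
import tempfile


def disassemble_pyc(path):
    """Return the dis listing of a .pyc built by the running Python version."""
    with open(path, "rb") as handle:
        magic = handle.read(4)
        if magic != importlib.util.MAGIC_NUMBER:
            raise ValueError("pyc was built by a different Python version")
        handle.read(12)  # flags, mtime or hash, source size (3.7+ layout)
        code = marshal.load(handle)
    buffer = io.StringIO()
    dis.dis(code, file=buffer)  # also recurses into nested code objects
    return buffer.getvalue()


# Demo on a throwaway module compiled on the spot.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "demo.py")
    with open(source, "w") as handle:
        handle.write("flag_arr = [ord(c) for c in 'abc']\n")
    pyc_path = py_compile.compile(source, cfile=os.path.join(workdir, "demo.pyc"))
    listing = disassemble_pyc(pyc_path)

print(listing)
```

The magic-number check is what makes this fail loudly on a mismatched file, which is exactly the situation where a cross-version tool is needed.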
Here are excerpts of the key parts:
  1:    BUILD_LIST           0
        LOAD_CONST           0 ((0, 250, 444, 678, 880, 1260, 1788, 952, 2352, 1944, 1960, 1144, 2784, 2522, 2576, 3450, 3712, 4182, 5040, 5282, 4680, 3906, 5676, 5520, 3504, 7550))
        LIST_EXTEND          1
        STORE_NAME           0 (enc)

  3:    LOAD_NAME            1 (input)
        LOAD_CONST           1 ('Input:')
        CALL_FUNCTION        1
        STORE_NAME           2 (flag)

  7:    LOAD_NAME            3 (len)
        LOAD_NAME            2 (flag)
        CALL_FUNCTION        1
        LOAD_CONST           2 (26)
        COMPARE_OP           3 (!=)
        POP_JUMP_IF_FALSE    L44 (to 44)

  8:    LOAD_NAME            4 (print)
        LOAD_CONST           3 ('Wrong!')
        CALL_FUNCTION        1
        POP_TOP

  9:    LOAD_NAME            5 (exit)
        LOAD_CONST           4 (1)
        CALL_FUNCTION        1
        POP_TOP
Reading on:
L44:
 13:    LOAD_CONST           5 (<Code38 code object listcomp_0x2e83dc0 at 0x2ea2080, file Re7_PyCode.py>, line 13)
        LOAD_CONST           6 ('<listcomp>')
        MAKE_FUNCTION        0 (Neither defaults, keyword-only args, annotations, nor closures)
        LOAD_NAME            2 (flag)
        GET_ITER
        CALL_FUNCTION        1
        STORE_NAME           6 (flag_arr)
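This loads the list-comprehension code object defined on source line 13 (the 13 is a source line number, not an address), turns it into a function, calls it with an iterator over flag, and stores the result in the variable flag_arr. Now look at the bytecode of that comprehension: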
 13:    BUILD_LIST           0
        LOAD_FAST            0 (.0)
L4:     FOR_ITER             L18 (to 18)
        STORE_FAST           1 (i)
        LOAD_GLOBAL          0 (ord)
        LOAD_FAST            1 (i)
        CALL_FUNCTION        1
        LIST_APPEND          2
        JUMP_ABSOLUTE        L4 (to 4)
L18:    RETURN_VALUE
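Roughly speaking, it converts each character of the flag passed in to its decimal code point and returns the values as a list.
Continuing on: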
L66:    FOR_ITER             L88 (to 88)
        STORE_NAME           8 (j)

 17:    LOAD_NAME            6 (flag_arr)
        LOAD_NAME            8 (j)
        DUP_TOP_TWO
        BINARY_SUBSCR
        LOAD_NAME            8 (j)
        INPLACE_ADD
        ROT_THREE
        STORE_SUBSCR
        JUMP_ABSOLUTE        L66 (to 66)

L88:
 20:    LOAD_NAME            7 (range)
        LOAD_CONST           2 (26)
        CALL_FUNCTION        1
        GET_ITER
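At this point it is clear that this is a for loop: with j as the index, it adds j to flag_arr[j] and stores the result back into flag_arr[j], in other words flag_arr[j] += j.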
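The DUP_TOP_TWO / ROT_THREE pair is the usual shape of an augmented assignment to a subscript. As a minimal sketch, the stack effects of the source-line-17 sequence can be traced by hand in Python; the function name augmented_add_trace is an illustrative assumption.

```python
def augmented_add_trace(flag_arr, j):
    """Replay the source-line-17 opcodes on an explicit stack (top is the list end)."""
    stack = [flag_arr, j]            # LOAD_NAME flag_arr; LOAD_NAME j
    stack += stack[-2:]              # DUP_TOP_TWO    -> [arr, j, arr, j]
    index = stack.pop()
    container = stack.pop()
    stack.append(container[index])   # BINARY_SUBSCR  -> [arr, j, arr[j]]
    stack.append(j)                  # LOAD_NAME j    -> [arr, j, arr[j], j]
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)       # INPLACE_ADD    -> [arr, j, arr[j] + j]
    stack.insert(-2, stack.pop())    # ROT_THREE      -> [arr[j] + j, arr, j]
    index = stack.pop()
    container = stack.pop()
    value = stack.pop()
    container[index] = value         # STORE_SUBSCR: TOS1[TOS] = TOS2
    return stack


arr = [10, 20, 30]
leftover = augmented_add_trace(arr, 2)
print(arr, leftover)
```

Swapping INPLACE_ADD for INPLACE_XOR and the second LOAD_NAME j for the 26 - j computation gives the trace for the next loop in the listing.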
Continuing further:
L96:    FOR_ITER             L122 (to 122)
        STORE_NAME           8 (j)

 21:    LOAD_NAME            6 (flag_arr)
        LOAD_NAME            8 (j)
        DUP_TOP_TWO
        BINARY_SUBSCR
        LOAD_CONST           2 (26)
        LOAD_NAME            8 (j)
        BINARY_SUBTRACT
        INPLACE_XOR
        ROT_THREE
        STORE_SUBSCR
        JUMP_ABSOLUTE        L96 (to 96)
Same pattern, another loop: it XORs flag_arr[j] with (26 - j) and stores the result back into flag_arr[j].
 23:    BUILD_LIST           0
        LOAD_FAST            0 (.0)
L4:     FOR_ITER             L26 (to 26)
        STORE_FAST           1 (i)
        LOAD_GLOBAL          0 (flag_arr)
        LOAD_FAST            1 (i)
        BINARY_SUBSCR
        LOAD_CONST           0 (2)
        BINARY_MULTIPLY
        LOAD_FAST            1 (i)
        BINARY_MULTIPLY
        LIST_APPEND          2
        JUMP_ABSOLUTE        L4 (to 4)
L26:    RETURN_VALUE

L122:
 23:    LOAD_CONST           7 (<Code38 code object listcomp_0x2ea7330 at 0x2e83da0, file Re7_PyCode.py>, line 23)
        LOAD_CONST           6 ('<listcomp>')
        MAKE_FUNCTION        0 (Neither defaults, keyword-only args, annotations, nor closures)
        LOAD_NAME            7 (range)
        LOAD_CONST           2 (26)
        CALL_FUNCTION        1
        GET_ITER
        CALL_FUNCTION        1
        STORE_NAME           6 (flag_arr)
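This is the last of the key transformations: a comprehension loops over the index i in range(26) and multiplies flag_arr[i] by the constant 2 and then by the index i.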
L148:   FOR_ITER             L186 (to 186)
        STORE_NAME           9 (i)

 26:    LOAD_NAME            6 (flag_arr)
        LOAD_NAME            9 (i)
        BINARY_SUBSCR
        LOAD_NAME            0 (enc)
        LOAD_NAME            9 (i)
        BINARY_SUBSCR
        COMPARE_OP           3 (!=)
        POP_JUMP_IF_FALSE    L148 (to 148)

 27:    LOAD_NAME            4 (print)
        LOAD_CONST           8 ('Wrong')
        CALL_FUNCTION        1
        POP_TOP

 28:    LOAD_NAME            5 (exit)
        LOAD_CONST           4 (1)
        CALL_FUNCTION        1
        POP_TOP
        JUMP_ABSOLUTE        L148 (to 148)

L186:
 30:    LOAD_NAME            4 (print)
        LOAD_CONST           9 ('Success!')
        CALL_FUNCTION        1
        POP_TOP
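Finally, the program checks element by element whether flag_arr equals enc.
From the analysis above, flag_arr goes through three key transformations, which can be written in Python as follows: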
- j + flag_arr[j]
- (26 - j) ^ flag_arr[j]
- flag_arr[i] * 2 * i
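Put together, the three steps give the following forward transformation. This is a sketch reconstructed from the disassembly excerpts, not the original source: the function name transform is an assumption, and so is range(26) for the first loop, whose header is not shown in the excerpt. Feeding the flag through it should reproduce enc exactly.

```python
ENC = [0, 250, 444, 678, 880, 1260, 1788, 952, 2352, 1944, 1960, 1144, 2784,
       2522, 2576, 3450, 3712, 4182, 5040, 5282, 4680, 3906, 5676, 5520, 3504, 7550]


def transform(flag):
    """Forward transformation applied to the 26-character input before the check."""
    flag_arr = [ord(ch) for ch in flag]              # comprehension on source line 13
    for j in range(26):                              # source line 17 (range assumed)
        flag_arr[j] += j
    for j in range(26):                              # source line 21
        flag_arr[j] ^= 26 - j
    return [flag_arr[i] * 2 * i for i in range(26)]  # comprehension on source line 23


print(transform('scuctf{Pyth0n_Binary_Cod3}') == ENC)
```

Because the last step multiplies by i, element 0 is always 0 regardless of the first character, so enc alone cannot determine flag[0].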
So we can write a script that inverts these steps to recover the flag:
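enc = [0, 250, 444, 678, 880, 1260, 1788, 952, 2352, 1944, 1960, 1144, 2784, 2522, 2576, 3450, 3712, 4182, 5040, 5282, 4680, 3906, 5676, 5520, 3504, 7550]
# enc[0] is 0 because of the multiplication by i == 0, so it carries no information;
# the first character 's' is inferred from the known scuctf{ prefix.
flag = 's'
for i in range(1, 26):
    # Use integer division: '/' returns a float in Python 3, which cannot be XORed.
    flag += chr(((enc[i] // 2 // i) ^ (26 - i)) - i)
print(flag)
#scuctf{Pyth0n_Binary_Cod3}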