Notes on Python Bytecode and the Interpreter
References:
http://blog.jobbole.com/55327/
http://blog.jobbole.com/56300/
http://blog.jobbole.com/56761/
1. What happens internally when you run a command in the interactive shell
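When you press the Return key, Python performs four steps: lexical analysis, parsing, compilation, and interpretation. Lexical analysis breaks the line you just typed into tokens (translator's note: identifiers, keywords, numbers, operators, and so on). The parser then takes these tokens and builds a structure that represents the relationships between them (here, an abstract syntax tree). The compiler takes this abstract syntax tree and turns it into one or more code objects. Finally, the interpreter takes these code objects one at a time and executes the code they represent.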
Every line we type goes through these four steps before it is executed.
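The four stages can be reproduced by hand with standard-library modules. This is a minimal sketch in Python 3 (the sessions in these notes are Python 2.7): tokenize exposes the token stream, ast.parse builds the tree, compile() produces a code object, and exec() hands it to the interpreter. The source string is an arbitrary example, and the interactive shell performs these steps internally in C rather than through these modules.

```python
import ast
import io
import tokenize

source = "x = 1 + 2\n"

# Step 1, lexical analysis: split the source text into tokens.
tokens = [tok.string
          for tok in tokenize.generate_tokens(io.StringIO(source).readline)]

# Step 2, parsing: build an abstract syntax tree from the source.
tree = ast.parse(source)

# Step 3, compilation: turn the AST into a code object.
code = compile(tree, "<demo>", "exec")

# Step 4, interpretation: let the interpreter execute the code object.
namespace = {}
exec(code, namespace)

print(tokens[:5])                     # ['x', '=', '1', '+', '2']
print(type(tree.body[0]).__name__)    # Assign
print(namespace["x"])                 # 3
```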
2. Function objects
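Objects are the basic building blocks of object-oriented programming. In many dynamic or interpreted languages, functions can also be treated as objects: in JavaScript, for example, functions are a special kind of object, and in functional languages such as Haskell and OCaml functions are first-class values.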
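Because functions are objects, they support the same operations as any other object: they can be created and destroyed, assigned to names, passed around, and so on.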
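"Functions are first-class objects" means that a function is an object, just like a list or, say, an instance of MyObject. Since foo is an object, we can use it without calling it (that is, foo and foo() mean very different things). We can pass foo as an argument to another function or bind it to a new name (other_function = foo). With functions this flexible, a great deal becomes possible!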
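A quick illustration of these points, assuming Python 3. apply_twice is a hypothetical helper introduced only for this example; foo has the same body as in the session below.

```python
def foo(a):
    x = 3
    return x + a


def apply_twice(func, value):
    """Hypothetical helper: call func on value, then again on the result."""
    return func(func(value))


other_function = foo            # bind the same function object to a new name
print(other_function is foo)    # True: no copy, just another reference
print(foo(1))                   # 4: calling the object
print(callable(foo))            # True
print(apply_twice(foo, 1))      # 7: the function itself passed as an argument
```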
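Note also that a function used as an object is different from a function call. A call is a dynamic process, whereas the function as an object is a static entity: you can apply operations to it that depend on its type, or, in object-oriented terms, you can invoke the interfaces (methods) the object provides.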
Since functions are objects, what members does a function object have?
>>> dir
<built-in function dir>
>>> dir(dir)
['__call__', '__class__', '__cmp__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__self__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
>>> dir(dir.func_code)
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    dir(dir.func_code)
AttributeError: 'builtin_function_or_method' object has no attribute 'func_code'
>>> def foo(a):
    x = 3
    return x + a

>>> foo
<function foo at 0x0000000002E8F128>
>>> dir(foo)
['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']
>>>
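The same distinction, sketched in Python 3. There the code object is reached through __code__ (already listed in dir(foo) above; func_code is its Python 2 alias and was removed in Python 3), and built-in functions implemented in C simply have no such attribute.

```python
def foo(a):
    x = 3
    return x + a


code = foo.__code__              # the code object compiled from foo's body
print(code.co_argcount)          # 1: the parameter a
print(code.co_varnames)          # ('a', 'x'): parameters first, then locals
print(code.co_nlocals)           # 2

print(hasattr(foo, "__code__"))  # True: defined in Python
print(hasattr(len, "__code__"))  # False: built-in implemented in C
```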
The built-in dir() is documented as follows:
dir([object])
Without arguments, return the list of names in the current local scope. With an argument, attempt to return a list of valid attributes for that object.
If the object has a method named __dir__(), this method will be called and must return the list of attributes. This allows objects that implement a custom __getattr__() or __getattribute__() function to customize the way dir() reports their attributes.
If the object does not provide __dir__(), the function tries its best to gather information from the object’s __dict__ attribute, if defined, and from its type object. The resulting list is not necessarily complete, and may be inaccurate when the object has a custom __getattr__().
The default dir() mechanism behaves differently with different types of objects, as it attempts to produce the most relevant, rather than complete, information:
- If the object is a module object, the list contains the names of the module’s attributes.
- If the object is a type or class object, the list contains the names of its attributes, and recursively of the attributes of its bases.
- Otherwise, the list contains the object’s attributes’ names, the names of its class’s attributes, and recursively of the attributes of its class’s base classes.
The resulting list is sorted alphabetically.
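To see the __dir__ hook described above in action, here is a small Python 3 sketch; the class Record and its fields are hypothetical. Note that dir() sorts whatever __dir__ returns.

```python
class Record(object):
    """Hypothetical class whose attributes live in a private dict."""

    def __init__(self, **fields):
        self._fields = fields

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name)

    def __dir__(self):
        # Report the dynamic fields; dir() sorts this list afterwards.
        return list(self._fields) + ["_fields"]


r = Record(zeta=1, alpha=2)
print(r.alpha)   # 2
print(dir(r))    # ['_fields', 'alpha', 'zeta']
```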
Besides dir(), the built-in help() is also very useful: it shows the documentation of a built-in function (or any other object).
First, let's see which modules the current Python process has loaded:
>>> import sys
>>> for name in sys.modules.keys():
...     print "%20s: %s" % (name, sys.modules[name])
...
            copy_reg: <module 'copy_reg' from '/usr/lib/python2.7/copy_reg.pyc'>
         sre_compile: <module 'sre_compile' from '/usr/lib/python2.7/sre_compile.pyc'>
                _sre: <module '_sre' (built-in)>
           encodings: <module 'encodings' from '/usr/lib/python2.7/encodings/__init__.pyc'>
                site: <module 'site' from '/usr/lib/python2.7/site.pyc'>
         __builtin__: <module '__builtin__' (built-in)>
           sysconfig: <module 'sysconfig' from '/usr/lib/python2.7/sysconfig.pyc'>
            __main__: <module '__main__' (built-in)>
 encodings.encodings: None
                 abc: <module 'abc' from '/usr/lib/python2.7/abc.pyc'>
           posixpath: <module 'posixpath' from '/usr/lib/python2.7/posixpath.pyc'>
         _weakrefset: <module '_weakrefset' from '/usr/lib/python2.7/_weakrefset.pyc'>
               errno: <module 'errno' (built-in)>
    encodings.codecs: None
       sre_constants: <module 'sre_constants' from '/usr/lib/python2.7/sre_constants.pyc'>
                  re: <module 're' from '/usr/lib/python2.7/re.pyc'>
             _abcoll: <module '_abcoll' from '/usr/lib/python2.7/_abcoll.pyc'>
               types: <module 'types' from '/usr/lib/python2.7/types.pyc'>
             _codecs: <module '_codecs' (built-in)>
encodings.__builtin__: None
           _warnings: <module '_warnings' (built-in)>
         genericpath: <module 'genericpath' from '/usr/lib/python2.7/genericpath.pyc'>
                stat: <module 'stat' from '/usr/lib/python2.7/stat.pyc'>
           zipimport: <module 'zipimport' (built-in)>
      _sysconfigdata: <module '_sysconfigdata' from '/usr/lib/python2.7/_sysconfigdata.pyc'>
            warnings: <module 'warnings' from '/usr/lib/python2.7/warnings.pyc'>
            UserDict: <module 'UserDict' from '/usr/lib/python2.7/UserDict.pyc'>
     encodings.utf_8: <module 'encodings.utf_8' from '/usr/lib/python2.7/encodings/utf_8.pyc'>
                 sys: <module 'sys' (built-in)>
              codecs: <module 'codecs' from '/usr/lib/python2.7/codecs.pyc'>
            readline: <module 'readline' from '/usr/lib/python2.7/lib-dynload/readline.i386-linux-gnu.so'>
   _sysconfigdata_nd: <module '_sysconfigdata_nd' from '/usr/lib/python2.7/plat-i386-linux-gnu/_sysconfigdata_nd.pyc'>
             os.path: <module 'posixpath' from '/usr/lib/python2.7/posixpath.pyc'>
       sitecustomize: <module 'sitecustomize' from '/usr/lib/python2.7/sitecustomize.pyc'>
              signal: <module 'signal' (built-in)>
           traceback: <module 'traceback' from '/usr/lib/python2.7/traceback.pyc'>
           linecache: <module 'linecache' from '/usr/lib/python2.7/linecache.pyc'>
               posix: <module 'posix' (built-in)>
   encodings.aliases: <module 'encodings.aliases' from '/usr/lib/python2.7/encodings/aliases.pyc'>
          exceptions: <module 'exceptions' (built-in)>
           sre_parse: <module 'sre_parse' from '/usr/lib/python2.7/sre_parse.pyc'>
                  os: <module 'os' from '/usr/lib/python2.7/os.pyc'>
            _weakref: <module '_weakref' (built-in)>

Entries whose value is None (such as encodings.codecs) are Python 2 placeholders: they record that an implicit relative import inside the encodings package found nothing, so the top-level module was used instead.
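A Python 3 sketch that separates modules compiled into the interpreter (the ones shown as (built-in) above) from everything else, using sys.builtin_module_names. The exact lists depend on the interpreter build and on what has been imported so far; loaded_modules is a hypothetical helper name.

```python
import sys


def loaded_modules():
    """Split the already-imported modules into built-in and other modules."""
    builtin, other = [], []
    # sorted() copies the items first, so the dict is not iterated while
    # anything else might modify it.
    for name, module in sorted(sys.modules.items()):
        if module is None:
            continue  # None entries carry no module object; skip them
        if name in sys.builtin_module_names:
            builtin.append(name)
        else:
            other.append(name)
    return builtin, other


builtin, other = loaded_modules()
print("built-in:", ", ".join(builtin))
print("other:", len(other), "modules")
```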
The following code lists the members of the __builtin__ module, five per line:
>>> num = 0
>>> for i in dir(sys.modules["__builtin__"]):
...     print "%20s\t" % i,
...     num += 1
...     if num == 5:
...         print ""
...         num = 0
...
ArithmeticError AssertionError AttributeError BaseException BufferError
BytesWarning DeprecationWarning EOFError Ellipsis EnvironmentError
Exception False FloatingPointError FutureWarning GeneratorExit
IOError ImportError ImportWarning IndentationError IndexError
KeyError KeyboardInterrupt LookupError MemoryError NameError
None NotImplemented NotImplementedError OSError OverflowError
PendingDeprecationWarning ReferenceError RuntimeError RuntimeWarning StandardError
StopIteration SyntaxError SyntaxWarning SystemError SystemExit
TabError True TypeError UnboundLocalError UnicodeDecodeError
UnicodeEncodeError UnicodeError UnicodeTranslateError UnicodeWarning UserWarning
ValueError Warning ZeroDivisionError _ __debug__
__doc__ __import__ __name__ __package__ abs
all any apply basestring bin
bool buffer bytearray bytes callable
chr classmethod cmp coerce compile
complex copyright credits delattr dict
dir divmod enumerate eval execfile
exit file filter float format
frozenset getattr globals hasattr hash
help hex id input int
intern isinstance issubclass iter len
license list locals long map
max memoryview min next object
oct open ord pow print
property quit range raw_input reduce
reload repr reversed round set
setattr slice sorted staticmethod str
sum super tuple type unichr
unicode vars xrange zip
>>>
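In Python 3 the module is called builtins rather than __builtin__, and print is a function rather than a statement. The sketch below, an illustration rather than anything from the original notes, splits the names into exception classes and C-implemented built-in functions with the inspect module.

```python
import builtins
import inspect

names = dir(builtins)

# Exception types are classes deriving from BaseException.
exceptions = [n for n in names
              if inspect.isclass(getattr(builtins, n))
              and issubclass(getattr(builtins, n), BaseException)]

# Functions implemented in C (len, print, ...) count as "built-in functions";
# classes such as int or str do not.
functions = [n for n in names if inspect.isbuiltin(getattr(builtins, n))]

# Print the functions five per row, like the Python 2 session above.
for start in range(0, len(functions), 5):
    print("  ".join("%-12s" % n for n in functions[start:start + 5]))
```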
3. How the built-in dir() is implemented
In /Python-2.7.8/Objects/object.c:
/* Implementation of dir() -- if obj is NULL, returns the names in the current
   (local) scope.  Otherwise, performs introspection of the object: returns a
   sorted list of attribute names (supposedly) accessible from the object
*/
PyObject *
PyObject_Dir(PyObject *obj)
{
    PyObject * result;

    if (obj == NULL)
        /* no object -- introspect the locals */
        result = _dir_locals();
    else
        /* object -- introspect the object */
        result = _dir_object(obj);

    assert(result == NULL || PyList_Check(result));

    if (result != NULL && PyList_Sort(result) != 0) {
        /* sorting the list failed */
        Py_DECREF(result);
        result = NULL;
    }

    return result;
}
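This matches what help(dir) describes: with no argument it introspects the local scope, otherwise it introspects the object, and in both cases the resulting list is sorted.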
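The C logic can be modelled in a few lines of Python. This is a rough sketch assuming CPython 3: py_dir is a hypothetical name, sys._getframe is CPython-specific, and type(obj).__dir__(obj) stands in for _dir_object (in Python 3 every object inherits a __dir__ from object, which is not the case in 2.7).

```python
import sys

# Sentinel for "no argument": dir(None) is legal, so None cannot play the
# role that NULL plays in the C code.
_NO_ARG = object()


def py_dir(obj=_NO_ARG):
    """Rough Python model of PyObject_Dir (CPython 3 semantics)."""
    if obj is _NO_ARG:
        # No object: introspect the caller's local scope (like _dir_locals).
        names = list(sys._getframe(1).f_locals)
    else:
        # Object: ask its type's __dir__ (like _dir_object).
        names = list(type(obj).__dir__(obj))
    names.sort()  # PyObject_Dir always sorts the result
    return names


def demo():
    beta, alpha = 1, 2
    return py_dir()


print(demo())                  # ['alpha', 'beta']
print(py_dir([]) == dir([]))   # True
```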
Next, define a function foo that refers to a global name x (deliberately left undefined; foo is never called here) and inspect its code object:

>>> def foo(a):
...     if a > x:
...         return a/1024
...     else:
...         return a
...
>>> type(foo)
<type 'function'>
>>> dir(foo)
['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']
>>> foo.__call__
<method-wrapper '__call__' of function object at 0xb7420df4>
>>> foo.__str__
<method-wrapper '__str__' of function object at 0xb7420df4>
>>> foo
<function foo at 0xb7420df4>
>>> foo.func_closure
>>> type(foo.func_closure)
<type 'NoneType'>
>>> type(foo.func_code)
<type 'code'>
>>> foo.func_code
<code object foo at 0xb7409d10, file "<stdin>", line 1>
>>> dir(foo.func_code)
['__class__', '__cmp__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'co_argcount', 'co_cellvars', 'co_code', 'co_consts', 'co_filename', 'co_firstlineno', 'co_flags', 'co_freevars', 'co_lnotab', 'co_name', 'co_names', 'co_nlocals', 'co_stacksize', 'co_varnames']
>>> foo.func_code.co_argcount
1
>>> foo.func_code.co_cellvars
()
>>> foo.func_code.co_code
'|\x00\x00t\x00\x00k\x04\x00r\x14\x00|\x00\x00d\x01\x00\x15S|\x00\x00Sd\x00\x00S'
>>> foo.func_code.co_consts
(None, 1024)
>>> foo.func_code.co_filename
'<stdin>'
>>> foo.func_code.co_firstlineno
1
>>> foo.func_code.co_flags
67
>>> foo.func_code.co_freevars
()
>>> foo.func_code.co_lnotab
'\x00\x01\x0c\x01\x08\x02'
>>> foo.func_code.co_name
'foo'
>>> foo.func_code.co_names
('x',)
>>> foo.func_code.co_nlocals
1
>>> foo.func_code.co_stacksize
2
>>> foo.func_code.co_varnames
('a',)
>>>
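Here, foo.func_code.co_code is the function's Python bytecode, stored as a byte string. (co_flags is 67 = 0x43, i.e. CO_OPTIMIZED | CO_NEWLOCALS | CO_NOFREE.)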
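co_lnotab maps bytecode offsets to source lines as pairs of (offset increment, line increment) bytes. The decoder below follows the logic of dis.findlinestarts in Python 2.7 and is applied to the co_lnotab and co_firstlineno values from the session above; it handles only the 2.7 format (Python 3.6 made the line increments signed, and Python 3.10 deprecated co_lnotab in favour of co_lines()). It is written in Python 3 so that indexing bytes yields integers, and line_starts is a hypothetical name. The result says that line 2 (if a > x:) starts at offset 0, line 3 (return a/1024) at offset 12, and line 5 (return a) at offset 20; the else: on line 4 produces no code of its own.

```python
def line_starts(lnotab, firstlineno):
    """Decode a Python 2.7-style co_lnotab into (bytecode offset, line) pairs.

    lnotab is a byte sequence of (offset increment, line increment) pairs.
    """
    starts = []
    last_line = None
    line = firstlineno
    offset = 0
    for offset_incr, line_incr in zip(lnotab[0::2], lnotab[1::2]):
        if offset_incr:
            # Code is about to advance: record where the current line begins.
            if line != last_line:
                starts.append((offset, line))
                last_line = line
            offset += offset_incr
        line += line_incr
    if line != last_line:
        starts.append((offset, line))
    return starts


# co_lnotab and co_firstlineno copied from the Python 2.7 session for foo
print(line_starts(b"\x00\x01\x0c\x01\x08\x02", 1))  # [(0, 2), (12, 3), (20, 5)]
```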
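The built-in ord() converts each character of foo.func_code.co_code into its integer value; its help text reads:

Help on built-in function ord in module __builtin__:

ord(...)
    ord(c) -> integer

    Return the integer ordinal of a one-character string.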
>>> [ord(i) for i in foo.func_code.co_code]
[124, 0, 0, 116, 0, 0, 107, 4, 0, 114, 20, 0, 124, 0, 0, 100, 1, 0, 21, 83, 124, 0, 0, 83, 100, 0, 0, 83]
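These are the bytes that make up foo's bytecode. The interpreter loops over them, looks up the instruction each opcode byte stands for, and executes it. Note that the bytecode itself contains no Python objects and no references to objects: an argument such as the 1, 0 following the byte 100 is just an index into a table like co_consts (here selecting 1024), co_names, or co_varnames.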
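In Python 2.7 each instruction is one opcode byte, followed by a two-byte little-endian argument when the opcode is at least HAVE_ARGUMENT (90). A minimal Python 3 sketch of that decoding, applied to the bytes above, is below. The opcode table is only the subset foo needs, with names taken from the dis.dis output for foo, and EXTENDED_ARG is ignored. Python 3.6 and later use a different two-byte "wordcode" layout, so this decoder does not apply to their co_code.

```python
HAVE_ARGUMENT = 90  # Python 2.7: opcodes >= 90 take a 2-byte argument

# Subset of the Python 2.7 opcode table, matching the disassembly of foo
OPNAMES = {
    21: "BINARY_DIVIDE",
    83: "RETURN_VALUE",
    100: "LOAD_CONST",
    107: "COMPARE_OP",
    114: "POP_JUMP_IF_FALSE",
    116: "LOAD_GLOBAL",
    124: "LOAD_FAST",
}


def decode(code):
    """Split Python 2.7 bytecode into (offset, opname, arg) triples."""
    instructions = []
    i = 0
    while i < len(code):
        op = code[i]
        if op >= HAVE_ARGUMENT:
            arg = code[i + 1] | (code[i + 2] << 8)  # little-endian argument
            size = 3
        else:
            arg = None
            size = 1
        instructions.append((i, OPNAMES.get(op, "<%d>" % op), arg))
        i += size
    return instructions


# The bytes of foo.func_code.co_code from the session above
FOO_CODE = [124, 0, 0, 116, 0, 0, 107, 4, 0, 114, 20, 0, 124, 0, 0,
            100, 1, 0, 21, 83, 124, 0, 0, 83, 100, 0, 0, 83]

for offset, name, arg in decode(FOO_CODE):
    print(offset, name, "" if arg is None else arg)
```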
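To find out what these bytes mean, read the CPython interpreter loop in Python/ceval.c (the opcode numbers themselves are defined in Include/opcode.h): 100, for example, is LOAD_CONST, and the two bytes after it (1, 0) form its argument. From Python, the dis module's opname and opmap tables give the same number-to-name mapping.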
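The heart of ceval.c is a loop that fetches an opcode, decodes its argument, and dispatches on the opcode to manipulate a value stack. The toy loop below is a heavily simplified sketch of that idea in Python 3, not a port of ceval.c: run and its parameters are hypothetical, it supports only the seven opcodes foo uses, and it takes the co_consts and co_names values from the session above. It also supplies a value for the global x, which the original session left undefined.

```python
import operator

# Python 2.7 opcode numbers, as they appear in foo's disassembly
LOAD_FAST, LOAD_GLOBAL, COMPARE_OP = 124, 116, 107
POP_JUMP_IF_FALSE, LOAD_CONST = 114, 100
BINARY_DIVIDE, RETURN_VALUE = 21, 83
HAVE_ARGUMENT = 90

# First entries of Python 2.7's dis.cmp_op: ('<', '<=', '==', '!=', '>', '>=')
COMPARE = [operator.lt, operator.le, operator.eq,
           operator.ne, operator.gt, operator.ge]


def run(code, consts, names, fastlocals, globals_):
    """Toy evaluation loop for the few opcodes foo uses (not ceval.c itself)."""
    stack = []
    pc = 0
    while True:
        # Fetch the opcode and, if it has one, its 2-byte argument.
        op = code[pc]
        if op >= HAVE_ARGUMENT:
            arg = code[pc + 1] | (code[pc + 2] << 8)
            pc += 3
        else:
            arg = None
            pc += 1

        # Dispatch on the opcode.
        if op == LOAD_FAST:
            stack.append(fastlocals[arg])
        elif op == LOAD_GLOBAL:
            name = names[arg]
            if name not in globals_:
                raise NameError("global name %r is not defined" % name)
            stack.append(globals_[name])
        elif op == LOAD_CONST:
            stack.append(consts[arg])
        elif op == COMPARE_OP:
            right = stack.pop()
            left = stack.pop()
            stack.append(COMPARE[arg](left, right))
        elif op == POP_JUMP_IF_FALSE:
            if not stack.pop():
                pc = arg  # absolute jump target
        elif op == BINARY_DIVIDE:
            right = stack.pop()
            left = stack.pop()
            # Python 2 "classic" division: floor for two ints, true otherwise
            if isinstance(left, int) and isinstance(right, int):
                stack.append(left // right)
            else:
                stack.append(left / right)
        elif op == RETURN_VALUE:
            return stack.pop()
        else:
            raise NotImplementedError("opcode %d" % op)


FOO_CODE = [124, 0, 0, 116, 0, 0, 107, 4, 0, 114, 20, 0, 124, 0, 0,
            100, 1, 0, 21, 83, 124, 0, 0, 83, 100, 0, 0, 83]
FOO_CONSTS = (None, 1024)   # foo.func_code.co_consts
FOO_NAMES = ("x",)          # foo.func_code.co_names

print(run(FOO_CODE, FOO_CONSTS, FOO_NAMES, [4096], {"x": 100}))  # 4
print(run(FOO_CODE, FOO_CONSTS, FOO_NAMES, [50], {"x": 100}))    # 50
```

With x = 100, the argument 4096 takes the first branch, while 50 makes POP_JUMP_IF_FALSE jump to offset 20 and return a unchanged, matching the disassembly.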
>>> import dis
>>> dir(dis)
['EXTENDED_ARG', 'HAVE_ARGUMENT', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_have_code', '_test', 'cmp_op', 'dis', 'disassemble', 'disassemble_string', 'disco', 'distb', 'findlabels', 'findlinestarts', 'hascompare', 'hasconst', 'hasfree', 'hasjabs', 'hasjrel', 'haslocal', 'hasname', 'opmap', 'opname', 'sys', 'types']
>>> type(dis.dis)
<type 'function'>
>>> dir(dis.dis)
['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']
>>> [ord(i) for i in dis.dis.func_code.co_code]
[124, 0, 0, 100, 1, 0, 107, 8, 0, 114, 23, 0, 116, 1, 0, 131, 0, 0, 1, 100, 1, 0, 83, 116, 2, 0, 124, 0, 0, 116, 3, 0, 106, 4, 0, 131, 2, 0, 114, 53, 0, 124, 0, 0, 106, 5, 0, 125, 0, 0, 110, 0, 0, 116, 6, 0, 124, 0, 0, 100, 2, 0, 131, 2, 0, 114, 80, 0, 124, 0, 0, 106, 7, 0, 125, 0, 0, 110, 0, 0, 116, 6, 0, 124, 0, 0, 100, 3, 0, 131, 2, 0, 114, 107, 0, 124, 0, 0, 106, 8, 0, 125, 0, 0, 110, 0, 0, 116, 6, 0, 124, 0, 0, 100, 4, 0, 131, 2, 0, 114, 246, 0, 124, 0, 0, 106, 9, 0, 106, 10, 0, 131, 0, 0, 125, 1, 0, 124, 1, 0, 106, 11, 0, 131, 0, 0, 1, 120, 174, 0, 124, 1, 0, 68, 93, 85, 0, 92, 2, 0, 125, 2, 0, 125, 3, 0, 116, 2, 0, 124, 3, 0, 116, 12, 0, 131, 2, 0, 114, 154, 0, 100, 5, 0, 124, 2, 0, 22, 71, 72, 121, 14, 0, 116, 13, 0, 124, 3, 0, 131, 1, 0, 1, 87, 110, 28, 0, 4, 116, 14, 0, 107, 10, 0, 114, 234, 0, 1, 125, 4, 0, 1, 100, 6, 0, 71, 124, 4, 0, 71, 72, 110, 1, 0, 88, 72, 113, 154, 0, 113, 154, 0, 87, 110, 78, 0, 116, 6, 0, 124, 0, 0, 100, 7, 0, 131, 2, 0, 114, 18, 1, 116, 15, 0, 124, 0, 0, 131, 1, 0, 1, 110, 50, 0, 116, 2, 0, 124, 0, 0, 116, 16, 0, 131, 2, 0, 114, 46, 1, 116, 17, 0, 124, 0, 0, 131, 1, 0, 1, 110, 22, 0, 116, 14, 0, 100, 8, 0, 116, 18, 0, 124, 0, 0, 131, 1, 0, 106, 19, 0, 22, 130, 2, 0, 100, 1, 0, 83]
>>> dir(dis.dis.func_code)
['__class__', '__cmp__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'co_argcount', 'co_cellvars', 'co_code', 'co_consts', 'co_filename', 'co_firstlineno', 'co_flags', 'co_freevars', 'co_lnotab', 'co_name', 'co_names', 'co_nlocals', 'co_stacksize', 'co_varnames']
>>> dis.dis.func_code.co_filename
'/usr/lib/python2.7/dis.py'
>>> dis.dis.func_code.co_consts
('Disassemble classes, methods, functions, or code.\n\n With no argument, disassemble the last traceback.\n\n ', None, 'im_func', 'func_code', '__dict__', 'Disassembly of %s:', 'Sorry:', 'co_code', "don't know how to disassemble %s objects")
>>> dis.dis.func_code.co_names
('None', 'distb', 'isinstance', 'types', 'InstanceType', '__class__', 'hasattr', 'im_func', 'func_code', '__dict__', 'items', 'sort', '_have_code', 'dis', 'TypeError', 'disassemble', 'str', 'disassemble_string', 'type', '__name__')
>>> dis.dis.func_code.co_varnames
('x', 'items', 'name', 'x1', 'msg')
>>> dis.dis.func_code.co_stacksize
6
>>> dis.dis.func_code.co_nlocals
5
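In fact, dis.dis is itself nothing more than a sequence of bytecode (plus the constants and names it refers to), which the Python interpreter executes to do its job.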
Now let's use dis.dis to disassemble foo's bytecode:
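>>> dis.dis(foo.func_code.co_code)
          0 LOAD_FAST          0 (0)
          3 LOAD_GLOBAL        0 (0)
          6 COMPARE_OP         4 (>)
          9 POP_JUMP_IF_FALSE 20
         12 LOAD_FAST          0 (0)
         15 LOAD_CONST         1 (1)
         18 BINARY_DIVIDE
         19 RETURN_VALUE
    >>   20 LOAD_FAST          0 (0)
         23 RETURN_VALUE
         24 LOAD_CONST         0 (0)
         27 RETURN_VALUE

Because only the raw byte string is passed in, dis cannot resolve the arguments and prints bare indices such as (0) and (1); dis.dis(foo) or dis.dis(foo.func_code) would show (a), (x), and (1024) instead. The final LOAD_CONST 0 / RETURN_VALUE pair is the implicit return None that the compiler appends to the function body; here it is unreachable.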
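On Python 3 the offsets and opcodes differ (there is no BINARY_DIVIDE, and 3.11+ uses a generic BINARY_OP), so the listing above will not match line for line. The sketch below, assuming Python 3.4+ for dis.get_instructions, disassembles the function object itself so that arguments are resolved to names and constants; the global x = 100 is an assumption added so that foo can actually be called.

```python
import dis

x = 100  # assumed global; the original session left x undefined


def foo(a):
    if a > x:
        return a / 1024
    else:
        return a


# Passing the function (not the raw co_code bytes) lets dis resolve
# argument indices into local names, global names, and constants.
dis.dis(foo)

instructions = list(dis.get_instructions(foo))
for ins in instructions:
    if ins.arg is not None:
        print(ins.offset, ins.opname, repr(ins.argval))
```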