Python字节码与解释器学习

参考:http://blog.jobbole.com/55327/

http://blog.jobbole.com/56300/

http://blog.jobbole.com/56761/

 

1. 在交互式命令行中执行命令的内部过程

当你敲下return键的时候,python完成了以下四步:词法分析、句法分析、编译、解释。词法分析的工作就是将你刚才输入的那行代码分解为一些符号token(译者注:包括标示符,关键字,数字, 操作符等)。句法分析程序再接收这些符号,并用一种结构来展现它们之间的关系(在这种情况下使用的抽象语法树)。然后编译器接收这棵抽象语法树,并将它转化为一个(或多个)代码对象。最后,解释器逐个接收这些代码对象,并执行它们所代表的代码。

 

每一行我们输入的命令,都要经过上面的四个步骤,才能够被执行。

 

2. 函数对象

对象是面向对象理论中的基本元素,在一些动态或者解释性语言中,函数也可以看作是一种对象,比如在JavaScript,以及功能性编程语言Haskell/Ocaml中,函数都是一种特殊的对象。

 

函数是对象,就意味着函数可以像对象一样被执行各种操作,比如分配,释放,复制,赋值......

“函数是最好的对象”说明函数是一种对象。它就如同一个列表或者举个例子来说 :MyObject 就是一个对象。既然 foo 是一个对象,那么我们就能在不调用它的情况下使用它(也就是说,foo 和 foo() 是大相径庭的)。我们能够将 foo 当作一个参数传递给另一个函数或者赋值给一个新函数名( other_function = foo )。有了如此棒的函数,一切皆为可能!

 

另外,函数作为对象出现的时候,就是和函数调用有区别的,函数调用是一个动态的过程;而函数作为一个对象,是一个静态的实体概念,意思是你可以对这个对象施予一些操作,这与这个对象的类型有关,或者以面向对象的思想来说,你可以执行这个对象提供的各种接口操作(函数)。

 

既然是对象,那么函数对象有哪些成员呢?

>>> dir
<built-in function dir>
>>> dir(dir)
['__call__', '__class__', '__cmp__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__self__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
>>> dir(dir.func_code)

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    dir(dir.func_code)
AttributeError: 'builtin_function_or_method' object has no attribute 'func_code'
>>> def foo(a):
	x = 3
	return x + a

>>> foo
<function foo at 0x0000000002E8F128>
>>> dir(foo)
['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']
>>> 

其中,内置函数dir的功能描述如下:

dir([object])

Without arguments, return the list of names in the current local scope. With an argument, attempt to return a list of valid attributes for that object.

If the object has a method named __dir__(), this method will be called and must return the list of attributes. This allows objects that implement a custom __getattr__() or __getattribute__() function to customize the way dir() reports their attributes.

If the object does not provide __dir__(), the function tries its best to gather information from the object’s __dict__ attribute, if defined, and from its type object. The resulting list is not necessarily complete, and may be inaccurate when the object has a custom __getattr__().

The default dir() mechanism behaves differently with different types of objects, as it attempts to produce the most relevant, rather than complete, information:

  • If the object is a module object, the list contains the names of the module’s attributes.
  • If the object is a type or class object, the list contains the names of its attributes, and recursively of the attributes of its bases.
  • Otherwise, the list contains the object’s attributes’ names, the names of its class’s attributes, and recursively of the attributes of its class’s base classes.

The resulting list is sorted alphabetically.

  

除此之外,help内置函数也很重要,可以查看内置函数的帮助内容。

 

首先,查看当前Python程序加载了哪些模块

>>> for i in sys.modules.keys():
...     print "%20s:\t%s\n" % (i, sys.modules[i])
...     print "*"*100

 

            copy_reg:	<module 'copy_reg' from '/usr/lib/python2.7/copy_reg.pyc'>

****************************************************************************************************
         sre_compile:	<module 'sre_compile' from '/usr/lib/python2.7/sre_compile.pyc'>

****************************************************************************************************
                _sre:	<module '_sre' (built-in)>

****************************************************************************************************
           encodings:	<module 'encodings' from '/usr/lib/python2.7/encodings/__init__.pyc'>

****************************************************************************************************
                site:	<module 'site' from '/usr/lib/python2.7/site.pyc'>

****************************************************************************************************
         __builtin__:	<module '__builtin__' (built-in)>

****************************************************************************************************
           sysconfig:	<module 'sysconfig' from '/usr/lib/python2.7/sysconfig.pyc'>

****************************************************************************************************
            __main__:	<module '__main__' (built-in)>

****************************************************************************************************
 encodings.encodings:	None

****************************************************************************************************
                 abc:	<module 'abc' from '/usr/lib/python2.7/abc.pyc'>

****************************************************************************************************
           posixpath:	<module 'posixpath' from '/usr/lib/python2.7/posixpath.pyc'>

****************************************************************************************************
         _weakrefset:	<module '_weakrefset' from '/usr/lib/python2.7/_weakrefset.pyc'>

****************************************************************************************************
               errno:	<module 'errno' (built-in)>

****************************************************************************************************
    encodings.codecs:	None

****************************************************************************************************
       sre_constants:	<module 'sre_constants' from '/usr/lib/python2.7/sre_constants.pyc'>

****************************************************************************************************
                  re:	<module 're' from '/usr/lib/python2.7/re.pyc'>

****************************************************************************************************
             _abcoll:	<module '_abcoll' from '/usr/lib/python2.7/_abcoll.pyc'>

****************************************************************************************************
               types:	<module 'types' from '/usr/lib/python2.7/types.pyc'>

****************************************************************************************************
             _codecs:	<module '_codecs' (built-in)>

****************************************************************************************************
encodings.__builtin__:	None

****************************************************************************************************
           _warnings:	<module '_warnings' (built-in)>

****************************************************************************************************
         genericpath:	<module 'genericpath' from '/usr/lib/python2.7/genericpath.pyc'>

****************************************************************************************************
                stat:	<module 'stat' from '/usr/lib/python2.7/stat.pyc'>

****************************************************************************************************
           zipimport:	<module 'zipimport' (built-in)>

****************************************************************************************************
      _sysconfigdata:	<module '_sysconfigdata' from '/usr/lib/python2.7/_sysconfigdata.pyc'>

****************************************************************************************************
            warnings:	<module 'warnings' from '/usr/lib/python2.7/warnings.pyc'>

****************************************************************************************************
            UserDict:	<module 'UserDict' from '/usr/lib/python2.7/UserDict.pyc'>

****************************************************************************************************
     encodings.utf_8:	<module 'encodings.utf_8' from '/usr/lib/python2.7/encodings/utf_8.pyc'>

****************************************************************************************************
                 sys:	<module 'sys' (built-in)>

****************************************************************************************************
              codecs:	<module 'codecs' from '/usr/lib/python2.7/codecs.pyc'>

****************************************************************************************************
            readline:	<module 'readline' from '/usr/lib/python2.7/lib-dynload/readline.i386-linux-gnu.so'>

****************************************************************************************************
   _sysconfigdata_nd:	<module '_sysconfigdata_nd' from '/usr/lib/python2.7/plat-i386-linux-gnu/_sysconfigdata_nd.pyc'>

****************************************************************************************************
             os.path:	<module 'posixpath' from '/usr/lib/python2.7/posixpath.pyc'>

****************************************************************************************************
       sitecustomize:	<module 'sitecustomize' from '/usr/lib/python2.7/sitecustomize.pyc'>

****************************************************************************************************
              signal:	<module 'signal' (built-in)>

****************************************************************************************************
           traceback:	<module 'traceback' from '/usr/lib/python2.7/traceback.pyc'>

****************************************************************************************************
           linecache:	<module 'linecache' from '/usr/lib/python2.7/linecache.pyc'>

****************************************************************************************************
               posix:	<module 'posix' (built-in)>

****************************************************************************************************
   encodings.aliases:	<module 'encodings.aliases' from '/usr/lib/python2.7/encodings/aliases.pyc'>

****************************************************************************************************
          exceptions:	<module 'exceptions' (built-in)>

****************************************************************************************************
           sre_parse:	<module 'sre_parse' from '/usr/lib/python2.7/sre_parse.pyc'>

****************************************************************************************************
                  os:	<module 'os' from '/usr/lib/python2.7/os.pyc'>

****************************************************************************************************
            _weakref:	<module '_weakref' (built-in)>

****************************************************************************************************

  

可以通过下面代码查看__builtin__模块中的成员

>>> num = 0
>>> for i in dir(sys.modules["__builtin__"]):
...     print "%20s\t" % i,
...     num += 1
...     if num == 5:
...             print ""
...             num = 0
... 
     ArithmeticError	      AssertionError	      AttributeError	       BaseException	         BufferError	
        BytesWarning	  DeprecationWarning	            EOFError	            Ellipsis	    EnvironmentError	
           Exception	               False	  FloatingPointError	       FutureWarning	       GeneratorExit	
             IOError	         ImportError	       ImportWarning	    IndentationError	          IndexError	
            KeyError	   KeyboardInterrupt	         LookupError	         MemoryError	           NameError	
                None	      NotImplemented	 NotImplementedError	             OSError	       OverflowError	
PendingDeprecationWarning	      ReferenceError	        RuntimeError	      RuntimeWarning	       StandardError	
       StopIteration	         SyntaxError	       SyntaxWarning	         SystemError	          SystemExit	
            TabError	                True	           TypeError	   UnboundLocalError	  UnicodeDecodeError	
  UnicodeEncodeError	        UnicodeError	UnicodeTranslateError	      UnicodeWarning	         UserWarning	
          ValueError	             Warning	   ZeroDivisionError	                   _	           __debug__	
             __doc__	          __import__	            __name__	         __package__	                 abs	
                 all	                 any	               apply	          basestring	                 bin	
                bool	              buffer	           bytearray	               bytes	            callable	
                 chr	         classmethod	                 cmp	              coerce	             compile	
             complex	           copyright	             credits	             delattr	                dict	
                 dir	              divmod	           enumerate	                eval	            execfile	
                exit	                file	              filter	               float	              format	
           frozenset	             getattr	             globals	             hasattr	                hash	
                help	                 hex	                  id	               input	                 int	
              intern	          isinstance	          issubclass	                iter	                 len	
             license	                list	              locals	                long	                 map	
                 max	          memoryview	                 min	                next	              object	
                 oct	                open	                 ord	                 pow	               print	
            property	                quit	               range	           raw_input	              reduce	
              reload	                repr	            reversed	               round	                 set	
             setattr	               slice	              sorted	        staticmethod	                 str	
                 sum	               super	               tuple	                type	              unichr	
             unicode	                vars	              xrange	                 zip	>>> 

  

  

 3. dir内置命令是怎么实现的

在/Python-2.7.8/Objects/object.c中

1963 /* Implementation of dir() -- if obj is NULL, returns the names in the current
1964    (local) scope.  Otherwise, performs introspection of the object: returns a
1965    sorted list of attribute names (supposedly) accessible from the object
1966 */
1967 PyObject *
1968 PyObject_Dir(PyObject *obj)
1969 {
1970     PyObject * result;
1971 
1972     if (obj == NULL)
1973         /* no object -- introspect the locals */
1974         result = _dir_locals();
1975     else
1976         /* object -- introspect the object */
1977         result = _dir_object(obj);
1978 
1979     assert(result == NULL || PyList_Check(result));
1980 
1981     if (result != NULL && PyList_Sort(result) != 0) {
1982         /* sorting the list failed */
1983         Py_DECREF(result);
1984         result = NULL;
1985     }
1986 
1987     return result;
1988 }

  

可见,与help(dir)描述的基本一致。

 

>>> def foo(a):
...     if a > x:
...             return a/1024
...     else:
...             return a
... 
>>> type(foo)
<type 'function'>
>>> dir(foo)
['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']
>>> foo.__call__
<method-wrapper '__call__' of function object at 0xb7420df4>
>>> foo.__str__
<method-wrapper '__str__' of function object at 0xb7420df4>
>>> foo
<function foo at 0xb7420df4>
>>> foo.func_closure
>>> type(foo.func_closure)
<type 'NoneType'>
>>> type(foo.func_code)
<type 'code'>
>>> foo.func_code
<code object foo at 0xb7409d10, file "<stdin>", line 1>
>>> dir(foo.func_code)
['__class__', '__cmp__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'co_argcount', 'co_cellvars', 'co_code', 'co_consts', 'co_filename', 'co_firstlineno', 'co_flags', 'co_freevars', 'co_lnotab', 'co_name', 'co_names', 'co_nlocals', 'co_stacksize', 'co_varnames']
>>> foo.func_code.co_argcount
1
>>> foo.func_code.co_cellvars
()
>>> foo.func_code.co_code
'|\x00\x00t\x00\x00k\x04\x00r\x14\x00|\x00\x00d\x01\x00\x15S|\x00\x00Sd\x00\x00S'
>>> foo.func_code.co_consts
(None, 1024)
>>> foo.func_code.co_filename
'<stdin>'
>>> foo.func_code.co_firstlineno
1
>>> foo.func_code.co_flags
67
>>> foo.func_code.co_freevars
()
>>> foo.func_code.co_lnotab
'\x00\x01\x0c\x01\x08\x02'
>>> foo.func_code.co_name
'foo'
>>> foo.func_code.co_names
('x',)
>>> foo.func_code.co_nlocals
1
>>> foo.func_code.co_stacksize
2
>>> foo.func_code.co_varnames
('a',)
>>> 

  

其中,foo.func_code.co_code打印出来的就是Python的字节码。

 

Help on built-in function ord in module __builtin__:

ord(...)
    ord(c) -> integer
    
    Return the integer ordinal of a one-character string.

  

>>> [ord(i) for i in foo.func_code.co_code]
[124, 0, 0, 116, 0, 0, 107, 4, 0, 114, 20, 0, 124, 0, 0, 100, 1, 0, 21, 83, 124, 0, 0, 83, 100, 0, 0, 83]

 

这就是那些组成python字节码的字节。解释器会循环接收各个字节,查找每个字节的指令然后执行这个指令。需要注意的是,字节码本身并不包括任何python对象,或引用任何对象。

如果你想知道python字节码的意思,可以去找到CPython解释器文件(ceval.c),然后查阅100的意思、1的意思、0的意思,等等。  

 

>>> import dis
>>> dir(dis)
['EXTENDED_ARG', 'HAVE_ARGUMENT', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_have_code', '_test', 'cmp_op', 'dis', 'disassemble', 'disassemble_string', 'disco', 'distb', 'findlabels', 'findlinestarts', 'hascompare', 'hasconst', 'hasfree', 'hasjabs', 'hasjrel', 'haslocal', 'hasname', 'opmap', 'opname', 'sys', 'types']
>>> type(dis.dis)
<type 'function'>
>>> dir(dis.dis)
['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']
>>> [ord(i) for i in dis.dis.func_code.co_code]
[124, 0, 0, 100, 1, 0, 107, 8, 0, 114, 23, 0, 116, 1, 0, 131, 0, 0, 1, 100, 1, 0, 83, 116, 2, 0, 124, 0, 0, 116, 3, 0, 106, 4, 0, 131, 2, 0, 114, 53, 0, 124, 0, 0, 106, 5, 0, 125, 0, 0, 110, 0, 0, 116, 6, 0, 124, 0, 0, 100, 2, 0, 131, 2, 0, 114, 80, 0, 124, 0, 0, 106, 7, 0, 125, 0, 0, 110, 0, 0, 116, 6, 0, 124, 0, 0, 100, 3, 0, 131, 2, 0, 114, 107, 0, 124, 0, 0, 106, 8, 0, 125, 0, 0, 110, 0, 0, 116, 6, 0, 124, 0, 0, 100, 4, 0, 131, 2, 0, 114, 246, 0, 124, 0, 0, 106, 9, 0, 106, 10, 0, 131, 0, 0, 125, 1, 0, 124, 1, 0, 106, 11, 0, 131, 0, 0, 1, 120, 174, 0, 124, 1, 0, 68, 93, 85, 0, 92, 2, 0, 125, 2, 0, 125, 3, 0, 116, 2, 0, 124, 3, 0, 116, 12, 0, 131, 2, 0, 114, 154, 0, 100, 5, 0, 124, 2, 0, 22, 71, 72, 121, 14, 0, 116, 13, 0, 124, 3, 0, 131, 1, 0, 1, 87, 110, 28, 0, 4, 116, 14, 0, 107, 10, 0, 114, 234, 0, 1, 125, 4, 0, 1, 100, 6, 0, 71, 124, 4, 0, 71, 72, 110, 1, 0, 88, 72, 113, 154, 0, 113, 154, 0, 87, 110, 78, 0, 116, 6, 0, 124, 0, 0, 100, 7, 0, 131, 2, 0, 114, 18, 1, 116, 15, 0, 124, 0, 0, 131, 1, 0, 1, 110, 50, 0, 116, 2, 0, 124, 0, 0, 116, 16, 0, 131, 2, 0, 114, 46, 1, 116, 17, 0, 124, 0, 0, 131, 1, 0, 1, 110, 22, 0, 116, 14, 0, 100, 8, 0, 116, 18, 0, 124, 0, 0, 131, 1, 0, 106, 19, 0, 22, 130, 2, 0, 100, 1, 0, 83]

  

>>> dir(dis.dis.func_code)
['__class__', '__cmp__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'co_argcount', 'co_cellvars', 'co_code', 'co_consts', 'co_filename', 'co_firstlineno', 'co_flags', 'co_freevars', 'co_lnotab', 'co_name', 'co_names', 'co_nlocals', 'co_stacksize', 'co_varnames']
>>> dis.dis.func_code.co_filename
'/usr/lib/python2.7/dis.py'
>>> dis.dis.func_code.co_consts
('Disassemble classes, methods, functions, or code.\n\n    With no argument, disassemble the last traceback.\n\n    ', None, 'im_func', 'func_code', '__dict__', 'Disassembly of %s:', 'Sorry:', 'co_code', "don't know how to disassemble %s objects")
>>> dis.dis.func_code.co_names
('None', 'distb', 'isinstance', 'types', 'InstanceType', '__class__', 'hasattr', 'im_func', 'func_code', '__dict__', 'items', 'sort', '_have_code', 'dis', 'TypeError', 'disassemble', 'str', 'disassemble_string', 'type', '__name__')
>>> dis.dis.func_code.co_varnames
('x', 'items', 'name', 'x1', 'msg')
>>> dis.dis.func_code.co_stacksize
6
>>> dis.dis.func_code.co_nlocals
5

  

其实dis.dis也不过就是是一连串的字节码而已,它被Python解释器执行,从而完成指定的功能。

下面我们就使用dis.dis来反汇编一下字节码

>>> dis.dis(foo.func_code.co_code)
          0 LOAD_FAST           0 (0)
          3 LOAD_GLOBAL         0 (0)
          6 COMPARE_OP          4 (>)
          9 POP_JUMP_IF_FALSE    20
         12 LOAD_FAST           0 (0)
         15 LOAD_CONST          1 (1)
         18 BINARY_DIVIDE  
         19 RETURN_VALUE   
    >>   20 LOAD_FAST           0 (0)
         23 RETURN_VALUE   
         24 LOAD_CONST          0 (0)
         27 RETURN_VALUE  

  

 

 

 

 

 

 

  

 

 

posted @ 2014-07-11 14:19  Daniel King  阅读(1389)  评论(0编辑  收藏  举报