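About metaclasses: I thought I understood them
When it came to the black magic of metaclasses in Python 2.x, I thought I understood it. Only after reality slapped me in the face did I realize I was too young, too simple, sometimes naive.
Why did I think I understood? Because I had read the Stack Overflow masterpiece "what-is-a-metaclass-in-python" (in this article that always means e-satis's answer), which also has a good Chinese translation on Jobbole (伯乐在线), 《深刻理解Python中的元类(metaclass)》. I had also used metaclasses in real projects: creating a singleton with a metaclass, as described in "creating-a-singleton-in-python", and using a metaclass to get a mixin effect. It was that second use case that made me rethink what I knew about metaclasses.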
Article URL: http://www.cnblogs.com/xybaby/p/7927407.html
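Key points recap
I have to admit that "what-is-a-metaclass-in-python" really is excellent. Read it carefully and you will have metaclasses mostly figured out. So here I only stress a few key points, and I strongly recommend that any Pythonista who has not read the original do so.
First: everything is an object
In Python everything is an object: a number, a string, a function. An object is an instance of a class; a class is itself an object, an instance of type. The type object is in turn an instance of the type class (chicken or egg?), which is why we call type a metaclass. The book 《Python源码剖析》 illustrates this relationship clearly.
In Python you can inspect an object's class through its __class__ attribute, and use isinstance to check whether an object is an instance of a given class. For example: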
>>> class OBJ(object):
...     a = 1
...
>>> o = OBJ()
>>> o.__class__
<class '__main__.OBJ'>
>>> isinstance(o, OBJ)
True
>>> OBJ.__class__
<type 'type'>
>>> isinstance(OBJ, type)
True
>>> type.__class__
<type 'type'>
>>>
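Second: a metaclass can customize class creation
We normally create a class with class OBJ(object): pass. As noted above, a class is an instance of type, so by analogy with how we create ordinary instances, a class should be created by something like cls = type(*args). That is indeed the case, and the Python documentation states it explicitly: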
class type(name, bases, dict)
With three arguments, return a new type object. This is essentially a dynamic form of the class statement. The name string is the class name and becomes the __name__ attribute; the bases tuple itemizes the base classes and becomes the __bases__ attribute; and the dict dictionary is the namespace containing definitions for class body and becomes the __dict__ attribute. For example, the following two statements create identical type objects:
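This function returns a class. Its three arguments are the class name, the tuple of base classes, and the class attributes. The OBJ class above, for example, is essentially equivalent to: OBJ = type('OBJ', (object,), {'a': 1})
Creating a class this way looks silly, of course, but the benefit is that classes can be created dynamically.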
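To make the dynamic-creation point concrete, here is a minimal sketch. The helper make_record_class and the Point class are hypothetical names invented for illustration; only the three-argument type(name, bases, dict) call comes from the documentation quoted above. It is written so that it runs unchanged on Python 2 and Python 3.

```python
def make_record_class(name, field_names):
    """Build a class at runtime whose instances carry the given fields.

    Hypothetical helper, for illustration only.
    """
    def __init__(self, **kwargs):
        for field in field_names:
            setattr(self, field, kwargs.get(field))

    namespace = {'__init__': __init__, 'fields': tuple(field_names)}
    # Same three pieces of information a class statement provides:
    # the name, the bases, and the namespace of the class body.
    return type(name, (object,), namespace)


Point = make_record_class('Point', ['x', 'y'])
p = Point(x=1, y=2)
print('%s has fields %s' % (Point.__name__, ', '.join(Point.fields)))
print('p.x=%s p.y=%s' % (p.x, p.y))
```

The class name and field list could just as well come from a configuration file or a database schema, which is where this form becomes more practical than a class statement.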
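Python opens class customization up to developers. type is itself a type, so it can be subclassed, and a subclass of type replaces Python's default class-creation behavior. When would you need that?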
Some ideas that have been explored including logging, interface checking, automatic delegation, automatic property creation, proxies, frameworks, and automatic resource locking/synchronization.
So when we declare a class in the form class OBJ(object): pass, how do we specify how OBJ is created? By setting __metaclass__ in the class body. The simplest example:
class Metaclass(type):
    def __new__(cls, name, bases, dct):
        print 'HAHAHA'
        dct['a'] = 1    # add a class attribute before the class is built
        return type.__new__(cls, name, bases, dct)

print 'before Create OBJ'
class OBJ(object):
    __metaclass__ = Metaclass
print 'after Create OBJ'

if __name__ == '__main__':
    print OBJ.a
Output:
before Create OBJ
HAHAHA
after Create OBJ
1
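As the output shows, __metaclass__ took effect while the OBJ class was being created and added a class attribute 'a' to OBJ.
Third: two details about __metaclass__
First, __metaclass__ only needs to be a callable; it does not have to be a class. what-is-a-metaclass-in-python includes an example where __metaclass__ is a function, and also explains why a class is the better choice.
Second, how __metaclass__ is looked up and applied. what-is-a-metaclass-in-python does not cover this in detail, but the Python documentation does: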
The appropriate metaclass is determined by the following precedence rules:
● If dict['__metaclass__'] exists, it is used.
● Otherwise, if there is at least one base class, its metaclass is used (this looks for a __class__ attribute first and if not found, uses its type).
● Otherwise, if a global variable named __metaclass__ exists, it is used.
● Otherwise, the old-style, classic metaclass (types.ClassType) is used.
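That is:
First look in the dict of the class itself; otherwise use the metaclass of the base class (there are some details to watch out for here, covered later); otherwise look in the global scope; otherwise fall back to the default way of creating a class.
The corresponding Python source is ceval.c::build_class. The core code is below and is quite clear.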
static PyObject *
build_class(PyObject *methods, PyObject *bases, PyObject *name)
{
    PyObject *metaclass = NULL, *result, *base;

    if (PyDict_Check(methods))
        metaclass = PyDict_GetItemString(methods, "__metaclass__");
    if (metaclass != NULL)
        Py_INCREF(metaclass);
    else if (PyTuple_Check(bases) && PyTuple_GET_SIZE(bases) > 0) {
        base = PyTuple_GET_ITEM(bases, 0);
        metaclass = PyObject_GetAttrString(base, "__class__");
        if (metaclass == NULL) {
            PyErr_Clear();
            metaclass = (PyObject *)base->ob_type;
            Py_INCREF(metaclass);
        }
    }
    else {
        PyObject *g = PyEval_GetGlobals();
        if (g != NULL && PyDict_Check(g))
            metaclass = PyDict_GetItemString(g, "__metaclass__");
        if (metaclass == NULL)
            metaclass = (PyObject *) &PyClass_Type;
        Py_INCREF(metaclass);
    }
    result = PyObject_CallFunctionObjArgs(metaclass, name, bases, methods,
                                          NULL);
    Py_DECREF(metaclass);
    if (result == NULL && PyErr_ExceptionMatches(PyExc_TypeError)) {
        /* A type error here likely means that the user passed
           in a base that was not a class (such the random module
           instead of the random.random type).  Help them out with
           by augmenting the error message with more information.*/

        PyObject *ptype, *pvalue, *ptraceback;

        PyErr_Fetch(&ptype, &pvalue, &ptraceback);
        if (PyString_Check(pvalue)) {
            PyObject *newmsg;
            newmsg = PyString_FromFormat(
                "Error when calling the metaclass bases\n"
                "    %s",
                PyString_AS_STRING(pvalue));
            if (newmsg != NULL) {
                Py_DECREF(pvalue);
                pvalue = newmsg;
            }
        }
        PyErr_Restore(ptype, pvalue, ptraceback);
    }
    return result;
}
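For readers who prefer Python to C, the selection logic can be approximated as follows. This is only a rough sketch: pick_metaclass and its parameters are hypothetical names, classic_default stands in for types.ClassType, and the real interpreter goes on to call the chosen metaclass with (name, bases, methods).

```python
def pick_metaclass(namespace, bases, module_globals, classic_default=None):
    """Approximate the precedence used by build_class (illustration only)."""
    # 1. An explicit __metaclass__ in the class body wins.
    if '__metaclass__' in namespace:
        return namespace['__metaclass__']
    # 2. Otherwise use the metaclass of the FIRST base class:
    #    its __class__ attribute, falling back to type(base).
    if bases:
        first = bases[0]
        return getattr(first, '__class__', type(first))
    # 3. Otherwise a module-level __metaclass__, if one exists.
    # 4. Otherwise the classic default (types.ClassType on Python 2).
    return module_globals.get('__metaclass__', classic_default)


class Meta(type):
    pass

# Build a base class whose metaclass is Meta by calling Meta directly,
# which avoids the Python 2 only __metaclass__ syntax.
Base = Meta('Base', (object,), {})

print(pick_metaclass({}, (Base,), {}) is Meta)                        # True
print(pick_metaclass({'__metaclass__': type}, (Base,), {}) is type)   # True
print(pick_metaclass({}, (), {'__metaclass__': Meta}) is Meta)        # True
```

Note that only the first base is consulted at this stage, and that the class body's own dict is checked before any base class.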
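The problem I ran into
In our project we used a metaclass to implement mixin behavior: a class gains behavior defined in several other classes. Put simply, we inject the functions of those other classes into this class. We did not want to use inheritance, though. First, the relationship is not semantically "is a", so inheritance does not fit; second, Python's MRO is not exactly pleasant to deal with either. Here is what we did, in rough code: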
import inspect

class RunImp(object):
    def run(self):
        print 'just run'

class FlyImp(object):
    def fly(self):
        print 'just fly'

class MetaMixin(type):
    def __init__(cls, name, bases, dic):
        super(MetaMixin, cls).__init__(name, bases, dic)
        member_list = (RunImp, FlyImp)

        for imp_member in member_list:
            if not imp_member:
                continue

            # Python 2: methods looked up on a class are unbound methods,
            # and im_func is the underlying plain function
            for method_name, fun in inspect.getmembers(imp_member, inspect.ismethod):
                setattr(cls, method_name, fun.im_func)

class Bird(object):
    __metaclass__ = MetaMixin

print Bird.__dict__
print Bird.__base__
Output:
{'fly': <function fly at 0x025220F0>, '__module__': '__main__', '__metaclass__': <class '__main__.MetaMixin'>, '__dict__': <attribute '__dict__' of 'Bird' objects>, 'run': <function run at 0x025220B0>, '__weakref__': <attribute '__weakref__' of 'Bird' objects>, '__doc__': None}
<type 'object'>
As you can see, the metaclass gave Bird the two methods run and fly, while the class's inheritance hierarchy was left untouched.
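The same injection can be sketched in a form that runs on both Python 2 and Python 3. The assumptions here are mine, not the project's: the metaclass is called directly instead of being named through __metaclass__, the methods return strings instead of printing, and plain functions are collected with vars() because inspect.ismethod and im_func behave differently on Python 3.

```python
import types


class RunImp(object):
    def run(self):
        return 'just run'


class FlyImp(object):
    def fly(self):
        return 'just fly'


class MetaMixin(type):
    def __init__(cls, name, bases, dic):
        super(MetaMixin, cls).__init__(name, bases, dic)
        for imp in (RunImp, FlyImp):
            # vars() exposes the plain functions stored on the class,
            # on Python 2 and Python 3 alike.
            for attr_name, attr in vars(imp).items():
                if isinstance(attr, types.FunctionType):
                    setattr(cls, attr_name, attr)


# Calling the metaclass directly is what the class statement does for us.
Bird = MetaMixin('Bird', (object,), {})

print(Bird().run())
print(Bird().fly())
print(Bird.__bases__)   # still just object: no inheritance from the Imp classes
```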
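Overriding a method injected by MetaMixin
One day the requirements changed: we needed to subclass Bird to define a special kind of bird and override the run method. The new code: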
class Bird(object):
    __metaclass__ = MetaMixin

class SpecialBird(Bird):
    def run(self):
        print 'SpecialBird run'

if __name__ == '__main__':
    b = SpecialBird()
    b.run()
Output:
just run
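What?! The override has no effect at all. This seemed to overturn everything I knew: Bird has a run attribute, the subclass SpecialBird overrides it, so the subclass's method should be the one called.
The reason lies in the __metaclass__ lookup order described above. SpecialBird does not define __metaclass__ itself, so the metaclass of its base class Bird is used. SpecialBird's body does define run, but MetaMixin then overwrites it. dis can verify this: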
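import dis

class SpecialBird(Bird):
    def run(self):
        print 'SpecialBird run'
    dis.dis(run)           # inside the class body: the run written here
dis.dis(SpecialBird.run)   # after class creation: what SpecialBird.run is now
Comparing the two disassemblies shows that SpecialBird.run started out as the method defined explicitly in the class body and was later overwritten by MetaMixin.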
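The overwrite can also be checked without dis, by comparing function objects. This is a reduced sketch under a few simplifying assumptions: only RunImp is mixed in, strings are returned instead of printed, and Bird is built by calling the metaclass directly so the snippet runs on Python 2 and Python 3 alike.

```python
class RunImp(object):
    def run(self):
        return 'just run'


class MetaMixin(type):
    def __init__(cls, name, bases, dic):
        super(MetaMixin, cls).__init__(name, bases, dic)
        # Unconditionally overwrite whatever the class body defined.
        cls.run = vars(RunImp)['run']


Bird = MetaMixin('Bird', (object,), {})


class SpecialBird(Bird):      # no metaclass is named here
    def run(self):
        return 'SpecialBird run'


print(type(SpecialBird) is MetaMixin)                    # True
print(vars(SpecialBird)['run'] is vars(RunImp)['run'])   # True
print(SpecialBird().run())                               # just run
```

The first line confirms that the metaclass was picked up from the base class; the second shows that the run stored on SpecialBird is RunImp's function rather than the one written in the class body.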
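Preventing attributes from being overwritten by accident
This exposes a problem: the current MetaMixin can silently overwrite attributes. For example, if RunImp and FlyImp both define a function named foo, the foo in the resulting Bird class comes from FlyImp, not RunImp. Likewise, even if Bird itself defines foo, it is overwritten by FlyImp.foo.
That is clearly not what we want, and it is a classic Python pitfall: no error is raised, but the program runs the wrong way. What we need is to surface the mistake as early as possible. The fix is simple: a small change to MetaMixin, namely the added assert.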
class MetaMixin(type):
    def __init__(cls, name, bases, dic):
        super(MetaMixin, cls).__init__(name, bases, dic)
        member_list = (RunImp, FlyImp)

        for imp_member in member_list:
            if not imp_member:
                continue

            for method_name, fun in inspect.getmembers(imp_member, inspect.ismethod):
                # new: fail fast if the name is already taken
                assert not hasattr(cls, method_name), method_name
                setattr(cls, method_name, fun.im_func)
With MetaMixin modified, running the following code now fails:
import inspect

class RunImp(object):
    def run(self):
        print 'just run'

class FlyImp(object):
    def fly(self):
        print 'just fly'

class MetaMixin(type):
    def __init__(cls, name, bases, dic):
        super(MetaMixin, cls).__init__(name, bases, dic)
        member_list = (RunImp, FlyImp)

        for imp_member in member_list:
            if not imp_member:
                continue

            for method_name, fun in inspect.getmembers(imp_member, inspect.ismethod):
                assert not hasattr(cls, method_name), method_name
                setattr(cls, method_name, fun.im_func)

class Bird(object):
    __metaclass__ = MetaMixin

class SpecialBird(Bird):
    pass
It raises an exception:
Traceback (most recent call last):
assert not hasattr(cls, method_name), method_name
AssertionError: run
Hmm. The code is only a few lines long and there is just one run method, so how can it complain about a duplicate? Let's add some logging to MetaMixin:
import inspect

class RunImp(object):
    def run(self):
        print 'just run'

class FlyImp(object):
    def fly(self):
        print 'just fly'

class MetaMixin(type):
    def __init__(cls, name, bases, dic):
        super(MetaMixin, cls).__init__(name, bases, dic)
        member_list = (RunImp, FlyImp)

        for imp_member in member_list:
            if not imp_member:
                continue

            for method_name, fun in inspect.getmembers(imp_member, inspect.ismethod):
                print('class %s get method %s from %s' % (name, method_name, imp_member))
                # assert not hasattr(cls, method_name), method_name
                setattr(cls, method_name, fun.im_func)

class Bird(object):
    __metaclass__ = MetaMixin

class SpecialBird(Bird):
    pass
Output:
class Bird get method run from <class '__main__.RunImp'>
class Bird get method fly from <class '__main__.FlyImp'>
class SpecialBird get method run from <class '__main__.RunImp'>
class SpecialBird get method fly from <class '__main__.FlyImp'>
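Now it is obvious. When Bird was created, run and fly were injected into Bird.__dict__. SpecialBird inherits from Bird, so before the metaclass gets to customize SpecialBird, SpecialBird already has the run and fly attributes (inherited from Bird). When the metaclass is applied again, the check fails.
In short, this is a well-hidden trap: if a base class defines __metaclass__, the metaclass is invoked again every time a subclass is created. In principle that may be unnecessary, and it can even have side effects.
Avoiding the repeated metaclass run
Since we know that __metaclass__ is looked up in the subclass's own dict first, and the base class is only considered when nothing is found there, declaring a new __metaclass__ in the subclass (SpecialBird) should do the trick: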
class DummyMetaIMixin(type):
    pass

class SpecialBird(Bird):
    __metaclass__ = DummyMetaIMixin
Unfortunately, this raised an exception I had never seen before:
TypeError: Error when calling the metaclass bases
metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
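The meaning is clear: the subclass's __metaclass__ must inherit from the base class's __metaclass__. So let's adjust it:
class DummyMetaIMixin(MetaMixin):
    def __init__(cls, name, bases, dic):
        # skip MetaMixin.__init__ so nothing is injected a second time
        type.__init__(cls, name, bases, dic)

class SpecialBird(Bird):
    __metaclass__ = DummyMetaIMixin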
This works! The complete code:
import inspect

class RunImp(object):
    def run(self):
        print 'just run'

class FlyImp(object):
    def fly(self):
        print 'just fly'

class MetaMixin(type):
    def __init__(cls, name, bases, dic):
        super(MetaMixin, cls).__init__(name, bases, dic)
        member_list = (RunImp, FlyImp)

        for imp_member in member_list:
            if not imp_member:
                continue

            for method_name, fun in inspect.getmembers(imp_member, inspect.ismethod):
                print('class %s get method %s from %s' % (name, method_name, imp_member))
                assert not hasattr(cls, method_name), method_name
                setattr(cls, method_name, fun.im_func)

class Bird(object):
    __metaclass__ = MetaMixin


class DummyMetaIMixin(MetaMixin):
    def __init__(cls, name, bases, dic):
        type.__init__(cls, name, bases, dic)

class SpecialBird(Bird):
    __metaclass__ = DummyMetaIMixin
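Both outcomes, the metaclass conflict and the working subclass-based fix, can be reproduced in a compact sketch that runs on Python 2 and Python 3. The names Unrelated and DummyMetaMixin are illustrative, the injection is reduced to a single stand-in lambda, and the classes are built by calling the metaclasses directly; these are assumptions made for the sake of a short example.

```python
class MetaMixin(type):
    def __init__(cls, name, bases, dic):
        super(MetaMixin, cls).__init__(name, bases, dic)
        cls.run = lambda self: 'just run'     # stand-in for the real injection


class Unrelated(type):                         # NOT a subclass of MetaMixin
    pass


class DummyMetaMixin(MetaMixin):               # a subclass that skips the injection
    def __init__(cls, name, bases, dic):
        type.__init__(cls, name, bases, dic)


Bird = MetaMixin('Bird', (object,), {})
body = {'run': lambda self: 'SpecialBird run'}

# An unrelated metaclass is rejected: metaclass conflict.
try:
    Unrelated('SpecialBird', (Bird,), dict(body))
    conflict = False
except TypeError:
    conflict = True

# A metaclass derived from MetaMixin is accepted, and the override survives.
SpecialBird = DummyMetaMixin('SpecialBird', (Bird,), dict(body))

print(conflict)               # True
print(SpecialBird().run())    # SpecialBird run
```

One consequence worth keeping in mind: every further subclass of SpecialBird now gets DummyMetaMixin as its metaclass, so the injection stays switched off for that whole branch of the hierarchy.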
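metaclass: __new__ and __init__
At this point, Pythonistas who have used metaclasses may be wondering: many examples online override type's __new__ in the metaclass rather than __init__. In fact, our MetaMixin can also be implemented by overriding __new__, and that comes with a pleasant surprise!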
import inspect

class RunImp(object):
    def run(self):
        print 'just run'

class FlyImp(object):
    def fly(self):
        print 'just fly'

class MetaMixinEx(type):
    def __new__(cls, name, bases, dic):
        member_list = (RunImp, FlyImp)

        for imp_member in member_list:
            if not imp_member:
                continue

            for method_name, fun in inspect.getmembers(imp_member, inspect.ismethod):
                print('class %s get method %s from %s' % (name, method_name, imp_member))
                # dic holds only the names from the class body being created
                assert method_name not in dic, (imp_member, method_name)
                dic[method_name] = fun.im_func
        return type.__new__(cls, name, bases, dic)

class Bird(object):
    __metaclass__ = MetaMixinEx

class SpecialBird(Bird):
    pass
Output:
class Bird get method run from <class '__main__.RunImp'>
class Bird get method fly from <class '__main__.FlyImp'>
class SpecialBird get method run from <class '__main__.RunImp'>
class SpecialBird get method fly from <class '__main__.FlyImp'>
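The output shows that the metaclass ran again for the subclass, yet nothing failed, even though the code does contain an assert (right before the dic[method_name] assignment). Why? It comes down to the difference between the two magic methods __new__ and __init__.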
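The two asserts look at different things: the dic passed to __new__ lists only the names from the class body, whereas hasattr on the finished class also sees inherited attributes. A small probe records both. The Probe, Base and Child names are made up for this sketch, and Base is built by calling the metaclass directly so the code runs on Python 2 and Python 3.

```python
seen = []


class Probe(type):
    def __new__(mcs, name, bases, dic):
        # dic holds only the names written in this class body.
        seen.append((name, 'new', 'run' in dic))
        return type.__new__(mcs, name, bases, dic)

    def __init__(cls, name, bases, dic):
        super(Probe, cls).__init__(name, bases, dic)
        # hasattr also searches the base classes through the MRO.
        seen.append((name, 'init', hasattr(cls, 'run')))


Base = Probe('Base', (object,), {'run': lambda self: 'base run'})


class Child(Base):        # empty body: defines no run of its own
    pass


for record in seen:
    print(record)
```

For Child, the check against dic reports False while hasattr reports True, which is exactly why the hasattr-based assert in __init__ fired for SpecialBird and the dic-based assert in __new__ did not.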
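Nearly every Python programmer has written an __init__ method, but few have written __new__, because most of the time there is no need to override it. The Python documentation says when overriding __new__ is called for: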
__new__() is intended mainly to allow subclasses of immutable types (like int, str, or tuple) to customize instance creation. It is also commonly overridden in custom metaclasses in order to customize class creation.
That is, when subclassing immutable types, or in a metaclass!
So what is the difference between __new__ and __init__?
__new__:
Called to create a new instance of class cls
__init__:
Called when the instance is created.
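In other words, __new__ controls how the instance is created, while __init__ is called after the instance already exists.
Note that __init__ is called only when __new__ returns an instance of cls, and __init__ receives the same arguments as __new__. Consider this example: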
class OBJ(object):
    def __new__(cls, a):
        # object.__new__ needs only the class; extra arguments are deprecated
        ins = object.__new__(cls)
        print "call OBJ new with parameter %s, created inst %s" % (a, ins)
        return ins  # remove this line and __init__ is no longer called

    def __init__(self, a):
        # printed from __init__ (message text kept as in the recorded output)
        print "call OBJ new with parameter %s, inst %s" % (a, self)

if __name__ == '__main__':
    OBJ(123)
call OBJ new with parameter 123, created inst <__main__.OBJ object at 0x024C2470>
call OBJ new with parameter 123, inst <__main__.OBJ object at 0x024C2470>
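As you can see, the self in __init__ is exactly the ins created and returned by __new__. And as the comment on the return ins line (line 6 of the example) says, if that line is removed so that ins is not returned, __init__ is not called.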
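The rule that __init__ runs only when __new__ returns an instance of cls can be observed directly. In this sketch (the class names Normal and ReturnsOther are invented for illustration) the calls are recorded in a list rather than printed, so the behavior is identical on Python 2 and Python 3.

```python
calls = []


class Normal(object):
    def __new__(cls, value):
        calls.append('Normal.__new__')
        return object.__new__(cls)            # an instance of cls

    def __init__(self, value):
        calls.append('Normal.__init__')
        self.value = value


class ReturnsOther(object):
    def __new__(cls, value):
        calls.append('ReturnsOther.__new__')
        return value                           # NOT an instance of cls

    def __init__(self, value):
        calls.append('ReturnsOther.__init__')  # never reached in this script


n = Normal(123)
r = ReturnsOther(123)

print(calls)   # ['Normal.__new__', 'Normal.__init__', 'ReturnsOther.__new__']
print(r)       # 123
```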
A metaclass inherits from type, so its __new__ and __init__ work just like those of an ordinary class, except that the metaclass's __new__ returns a class. Let's look at a metaclass example:
class Meta(type):
    def __new__(cls, name, bases, dic):
        print 'here class is %s' % cls
        print 'class %s will be create with bases class %s and attrs %s' % (name, bases, dic.keys())
        dic['what'] = name
        return type.__new__(cls, name, bases, dic)

    def __init__(cls, name, bases, dic):
        print 'here class is %s' % cls
        print 'class %s will be inited with bases class %s and attrs %s' % (name, bases, dic.keys())
        print cls.what
        super(Meta, cls).__init__(name, bases, dic)

class OBJ(object):
    __metaclass__ = Meta
    attr = 1

print('-----------------------------------------------')
class SubObj(OBJ):
    pass
Output:
here class is <class '__main__.Meta'>
class OBJ will be create with bases class (<type 'object'>,) and attrs ['__module__', '__metaclass__', 'attr']
here class is <class '__main__.OBJ'>
class OBJ will be inited with bases class (<type 'object'>,) and attrs ['__module__', '__metaclass__', 'attr', 'what']
OBJ
-----------------------------------------------
here class is <class '__main__.Meta'>
class SubObj will be create with bases class (<class '__main__.OBJ'>,) and attrs ['__module__']
here class is <class '__main__.SubObj'>
class SubObj will be inited with bases class (<class '__main__.OBJ'>,) and attrs ['__module__', 'what']
SubObj
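Note the separator line.
First, although the first parameter of both __new__ and __init__ is named cls, they are completely different things: in __new__ it is the metaclass (Meta), while in __init__ it is the class that was just created (OBJ or SubObj).
Second, once __new__ has been called, the resulting class object (the cls in __init__, such as OBJ) already has the dynamically added what attribute.
Finally, when __new__ is called, dic contains only the attributes defined within the class body's scope. That is why, when SubObj is created, dic does not contain attr; attr lives in the dict of the base class OBJ. The output also shows that the dic modified in __new__ is the one passed on to __init__.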
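Putting these observations together, one possible variation (not the code from the project) is to inject in __new__ and skip any name that the class body or a base class already provides, so that an explicit override in a subclass wins. The name MetaMixinKeepOverrides is hypothetical, only RunImp is mixed in, and Bird is built by calling the metaclass directly so the sketch stays runnable on Python 2 and Python 3. Note that this trades the fail-fast assert for silent precedence, which may or may not be what a given project wants.

```python
import types


class RunImp(object):
    def run(self):
        return 'just run'


class MetaMixinKeepOverrides(type):      # hypothetical name
    def __new__(mcs, name, bases, dic):
        for attr_name, attr in vars(RunImp).items():
            if not isinstance(attr, types.FunctionType):
                continue
            inherited = any(hasattr(base, attr_name) for base in bases)
            # Inject only when neither the class body nor a base class
            # already provides the name.
            if attr_name not in dic and not inherited:
                dic[attr_name] = attr
        return type.__new__(mcs, name, bases, dic)


Bird = MetaMixinKeepOverrides('Bird', (object,), {})


class SpecialBird(Bird):
    def run(self):
        return 'SpecialBird run'


print(Bird().run())           # just run
print(SpecialBird().run())    # SpecialBird run
```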