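Background: the backend of our company's mobile-app SaaS project is basically stable, but it never feels concise enough, and next to mature open-source Python frameworks it lacks elegance. I keep wanting to refactor the backend code, but every attempt turns into a tangled mess and I don't know where to start.
Since I lack experience implementing frameworks, I plan to start with the Python frameworks I already use and study how other people design theirs.
Noted here for the record, March 31, 2017.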
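- pony, a model implementation for an ORM (the M in ORM)
pony's models are a little unusual: they must inherit from a class that is a member of a Database instance. Straight to the key code: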
class Database(object):
    @cut_traceback
    def __init__(self, *args, **kwargs):
        # argument 'self' cannot be named 'database', because 'database' can be in kwargs
        self.priority = 0
        self._insert_cache = {}

        # ER-diagram related stuff:
        self._translator_cache = {}
        self._constructed_sql_cache = {}
        self.entities = {}
        self.schema = None
        self.Entity = type.__new__(EntityMeta, 'Entity', (Entity,), {})
        self.Entity._database_ = self

        # Statistics-related stuff:
        self._global_stats = {}
        self._global_stats_lock = RLock()

        self._dblocal = DbLocal()

        self.provider = None
        if args or kwargs: self._bind(*args, **kwargs)
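A user-defined model derives from the Entity attribute of a Database instance. That attribute is itself a class, and it is responsible for collecting and converting the attributes the user declares. This design couples the model to Database: a model cannot exist on its own and must be attached to a Database instance.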
self.Entity = type.__new__(EntityMeta, 'Entity', (Entity,), {})
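To make the coupling concrete, here is a minimal sketch, not pony's actual code. The simplified `EntityMeta`, `Entity` and `Database` below are hypothetical stand-ins, and the registration logic in `__init__` is an assumption made for illustration. It shows how a base class created per Database instance ties every model to exactly one database.

```python
class EntityMeta(type):
    """Hypothetical stand-in for pony's EntityMeta."""

    def __init__(cls, name, bases, cls_dict):
        super().__init__(name, bases, cls_dict)
        # Register the new model with the database its base class belongs to.
        db = getattr(cls, '_database_', None)
        if db is not None:
            db.entities[name] = cls


class Entity(metaclass=EntityMeta):
    """Common root class; not bound to any database."""


class Database:
    def __init__(self):
        self.entities = {}
        # type.__new__ builds the class without running EntityMeta.__init__,
        # so the per-database base class itself is not registered as a model.
        self.Entity = type.__new__(EntityMeta, 'Entity', (Entity,), {})
        self.Entity._database_ = self


db1 = Database()
db2 = Database()


class Customer(db1.Entity):
    pass


class Order(db2.Entity):
    pass
```

Each model ends up in the `entities` dict of the database whose `Entity` it subclasses, which is why a model cannot be defined before a Database instance exists.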
This is how a user-defined model inherits from it:
class Customer(db.Entity):
    id = PrimaryKey(int, auto=True)
    name = Required(str)
    email = Required(str, unique=True)
    orders = Set("Order")
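So db.Entity is simply a member of a db instance; the only unusual thing is that this member is a class.
Next, let's see what EntityMeta and Entity actually are.
The code is long, so only the key parts are excerpted: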
class EntityMeta(type):
    def __new__(meta, name, bases, cls_dict):
        if 'Entity' in globals():
            if '__slots__' in cls_dict: throw(TypeError, 'Entity classes cannot contain __slots__ variable')
            cls_dict['__slots__'] = ()
        return super(EntityMeta, meta).__new__(meta, name, bases, cls_dict)
    @cut_traceback
    def __init__(entity, name, bases, cls_dict):
        super(EntityMeta, entity).__init__(name, bases, cls_dict)
        .......
        # Find the user-defined attributes of the model and convert them by type
        # so they can be mapped to the database; see the Attribute class for details.
        direct_bases = [ c for c in entity.__bases__ if issubclass(c, Entity) and c.__name__ != 'Entity' ]
        entity._direct_bases_ = direct_bases
        base_attrs = []
        base_attrs_dict = {}  # missing from the excerpt; needed by the lookups below
        for base in direct_bases:
            for a in base._attrs_:
                prev = base_attrs_dict.get(a.name)
                if prev is None:
                    base_attrs_dict[a.name] = a
                    base_attrs.append(a)
        entity._base_attrs_ = base_attrs
        new_attrs = []
        for name, attr in items_list(entity.__dict__):
            if name in base_attrs_dict: throw(ERDiagramError, "Name '%s' hides base attribute %s" % (name,base_attrs_dict[name]))
            if not isinstance(attr, Attribute): continue
            if name.startswith('_') and name.endswith('_'): throw(ERDiagramError,
                'Attribute name cannot both start and end with underscore. Got: %s' % name)
            if attr.entity is not None: throw(ERDiagramError,
                'Duplicate use of attribute %s in entity %s' % (attr, entity.__name__))
            attr._init_(entity, name)
            new_attrs.append(attr)
        # Sort by definition order
        new_attrs.sort(key=attrgetter('id'))
        # Attribute collection is complete
        entity._new_attrs_ = new_attrs
        entity._attrs_ = base_attrs + new_attrs
        entity._adict_ = {attr.name: attr for attr in entity._attrs_}
The interface users call (a method of EntityMeta, so it works on the model class itself):
    @cut_traceback
    def __getitem__(entity, key):
        if type(key) is not tuple: key = (key,)
        if len(key) != len(entity._pk_attrs_):
            throw(TypeError, 'Invalid count of attrs in %s primary key (%s instead of %s)'
                             % (entity.__name__, len(key), len(entity._pk_attrs_)))
        kwargs = {attr.name: value for attr, value in izip(entity._pk_attrs_, key)}
        return entity._find_one_(kwargs)
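Defining `__getitem__` on the metaclass is what lets a model class be subscripted directly by primary key. The sketch below is a simplified illustration under stated assumptions: `_pk_attrs_` holds plain strings rather than pony's Attribute objects, and `_rows_` and the `_find_one_` body are hypothetical in-memory stand-ins for a real database query.

```python
class EntityMeta(type):
    def __getitem__(entity, key):
        # Normalize a single key to a tuple so composite keys share one code path.
        if type(key) is not tuple:
            key = (key,)
        if len(key) != len(entity._pk_attrs_):
            raise TypeError('Invalid count of attrs in %s primary key (%s instead of %s)'
                            % (entity.__name__, len(key), len(entity._pk_attrs_)))
        kwargs = dict(zip(entity._pk_attrs_, key))
        return entity._find_one_(kwargs)


class Entity(metaclass=EntityMeta):
    _pk_attrs_ = ()   # hypothetical: primary key names as strings
    _rows_ = []       # hypothetical: in-memory rows instead of a database

    @classmethod
    def _find_one_(cls, kwargs):
        for row in cls._rows_:
            if all(row[k] == v for k, v in kwargs.items()):
                return row
        raise KeyError(kwargs)


class Customer(Entity):
    _pk_attrs_ = ('id',)
    _rows_ = [{'id': 1, 'name': 'Ann'}, {'id': 2, 'name': 'Bob'}]


class OrderItem(Entity):
    _pk_attrs_ = ('order', 'product')
    _rows_ = [{'order': 10, 'product': 7, 'qty': 3}]


ann = Customer[1]          # single-column primary key
item = OrderItem[10, 7]    # composite primary key
```

Because the method lives on the metaclass, the subscript applies to the class (`Customer[1]`), not to instances of it.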
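Entity is a class whose metaclass is EntityMeta; it mainly deals with the complex relationships in the database:
class Entity(with_metaclass(EntityMeta)):
    .......
The definition above is equivalent to the following Python 2 form:
class Entity(object):
    __metaclass__ = EntityMeta
It is written with with_metaclass to stay compatible with both py2 and py3, whose metaclass syntax differs.
The py3 syntax is:
class MyClass(metaclass=Meta):
    pass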
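For intuition, a helper of this kind can be sketched in a few lines. This is a deliberately simplified version and an assumption for illustration only; real helpers such as the one in six are more elaborate, and this naive form leaves an extra `NewBase` class in the MRO.

```python
def with_metaclass(meta, *bases):
    """Simplified sketch: build a throwaway base class whose metaclass is `meta`."""
    return meta('NewBase', bases or (object,), {})


class EntityMeta(type):
    pass


# Syntax that is valid on both Python 2 and Python 3.
class Entity(with_metaclass(EntityMeta)):
    pass


# Python 3 only.
class EntityPy3(metaclass=EntityMeta):
    pass
```

Both classes end up with `EntityMeta` as their type, because a subclass inherits the metaclass of its base.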
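Because metaclasses are involved, implementation difficulty: 4 stars.
Key point: capture the attributes the user declares on the class and encapsulate the underlying storage and conversion. This is commonly used to implement the M layer of an ORM.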
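Summary: collecting the attributes of subclasses takes three steps:
1. Implement your own metaclass.
2. Check the type of each attribute on the subclass and merge attributes of the same kind, including those inherited from base classes.
3. Expose a public interface, e.g. __getitem__ or __setattr__.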
For more on using metaclasses, see: http://blog.jobbole.com/21351/
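The three steps in the summary can be sketched as follows. This is a minimal illustration, not pony's implementation: `Field`, `ModelMeta`, `Model`, `_fields_` and `_by_name_` are hypothetical names, and dictionary-style access is just one possible choice of public interface.

```python
import itertools


class Field:
    """Hypothetical attribute type; `order` remembers definition order."""
    _counter = itertools.count()

    def __init__(self, py_type):
        self.py_type = py_type
        self.order = next(Field._counter)
        self.name = None


class ModelMeta(type):  # step 1: a custom metaclass
    def __init__(cls, name, bases, cls_dict):
        super().__init__(name, bases, cls_dict)
        # step 2: keep only Field attributes and merge them with inherited ones
        inherited = [f for base in bases for f in getattr(base, '_fields_', [])]
        own = []
        for attr_name, value in cls_dict.items():
            if isinstance(value, Field):
                value.name = attr_name
                own.append(value)
        own.sort(key=lambda f: f.order)
        cls._fields_ = inherited + own
        cls._by_name_ = {f.name: f for f in cls._fields_}


class Model(metaclass=ModelMeta):  # step 3: the public interface
    def __init__(self, **kwargs):
        self._values = {}
        for name, value in kwargs.items():
            self[name] = value

    def __setitem__(self, name, value):
        field = self._by_name_[name]  # KeyError for undeclared fields
        self._values[name] = field.py_type(value)

    def __getitem__(self, name):
        return self._values[name]


class Person(Model):
    name = Field(str)
    age = Field(int)


class Employee(Person):
    salary = Field(float)


emp = Employee(name='Ann', age='30', salary=10)
```

`Employee._fields_` contains the inherited `name` and `age` fields followed by its own `salary`, and values are converted on assignment.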
For comparison, here is how infi.clickhouse_orm implements the M layer.
Step 1: create your own metaclass
class ModelBase(type):
    '''
    A metaclass for ORM models. It adds the _fields list to model classes.
    '''

    ad_hoc_model_cache = {}

    def __new__(cls, name, bases, attrs):
        new_cls = super(ModelBase, cls).__new__(cls, name, bases, attrs)
        # Collect fields from parent classes
        base_fields = []
        for base in bases:
            if isinstance(base, ModelBase):
                base_fields += base._fields
        # Build a list of fields, in the order they were listed in the class
        fields = base_fields + [item for item in attrs.items() if isinstance(item[1], Field)]
        fields.sort(key=lambda item: item[1].creation_counter)
        setattr(new_cls, '_fields', fields)
        return new_cls
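Here, _fields holds the user-defined (class-level) attributes as (name, Field) pairs.
Step 2: implement the base class for models, which exposes the public interface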
class Model(with_metaclass(ModelBase)):
    '''
    A base class for ORM models.
    '''

    engine = None
    readonly = False

    def __init__(self, **kwargs):
        '''
        Creates a model instance, using keyword arguments as field values.
        Since values are immediately converted to their Pythonic type,
        invalid values will cause a ValueError to be raised.
        Unrecognized field names will cause an AttributeError.
        '''
        super(Model, self).__init__()

        self._database = None

        # Assign field values from keyword arguments
        for name, value in kwargs.items():
            field = self.get_field(name)
            if field:
                setattr(self, name, value)
            else:
                raise AttributeError('%s does not have a field called %s' % (self.__class__.__name__, name))
        # Assign default values for fields not included in the keyword arguments
        for name, field in self._fields:
            if name not in kwargs:
                setattr(self, name, field.default)

    def __setattr__(self, name, value):
        '''
        When setting a field value, converts the value to its Pythonic type and validates it.
        This may raise a ValueError.
        '''
        field = self.get_field(name)
        # The name refers to a Field declared on the class: convert and validate
        if field:
            value = field.to_python(value, pytz.utc)
            field.validate(value)
        # Store the value on the instance. get_field() looks the name up on the class,
        # so the instance value only shadows the Field; as far as this excerpt shows,
        # later assignments are still converted and validated.
        super(Model, self).__setattr__(name, value)

    def get_field(self, name):
        '''
        Get a Field instance given its name, or None if not found.
        '''
        field = getattr(self.__class__, name, None)
        return field if isinstance(field, Field) else None
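Notes:
__init__ supports construction in the style of SomeModel(**kwargs). _fields serves two purposes: 1. supplying default values at initialization; 2. keeping the Field objects at class level, because the values assigned by SomeModel(**kwargs) and __setattr__ shadow the Field attributes on the instance.
__setattr__ intercepts attribute assignment (model.name = value) and converts and validates the value before storing it.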
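How class-level fields and instance-level values coexist can be checked with a small experiment. The sketch below is a stripped-down imitation and not the library's code: `Field.to_python` here takes a single argument and simply calls a Python type, which is an assumption made for brevity.

```python
class Field:
    """Hypothetical field that converts values with a plain Python type."""

    def __init__(self, py_type, default=None):
        self.py_type = py_type
        self.default = default

    def to_python(self, value):
        return self.py_type(value)


class Model:
    def __setattr__(self, name, value):
        # Look the name up on the class, not on the instance.
        field = getattr(self.__class__, name, None)
        if isinstance(field, Field):
            value = field.to_python(value)
        super().__setattr__(name, value)


class Event(Model):
    count = Field(int)


e = Event()
e.count = '3'
first = e.count    # 3, converted on the first assignment
e.count = '4'
second = e.count   # 4, converted again on the second assignment
```

In this sketch the Field stays on the class (`Event.count`) while the converted value lives in the instance's `__dict__`, so every assignment goes through the conversion, not only the first one.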
Scrapy, well known in the web-scraping world, applies the same ORM-model idea to user-defined Items.
Have a look at how Scrapy implements its metaclass:
class ItemMeta(ABCMeta):

    def __new__(mcs, class_name, bases, attrs):
        classcell = attrs.pop('__classcell__', None)
        new_bases = tuple(base._class for base in bases if hasattr(base, '_class'))
        _class = super(ItemMeta, mcs).__new__(mcs, 'x_' + class_name, new_bases, attrs)

        fields = getattr(_class, 'fields', {})
        new_attrs = {}
        for n in dir(_class):
            v = getattr(_class, n)
            if isinstance(v, Field):
                fields[n] = v
            elif n in attrs:
                new_attrs[n] = attrs[n]

        new_attrs['fields'] = fields
        new_attrs['_class'] = _class
        if classcell is not None:
            new_attrs['__classcell__'] = classcell
        return super(ItemMeta, mcs).__new__(mcs, class_name, bases, new_attrs)
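The metaclass inherits from ABCMeta. Base classes that were created by this metaclass are recognized by whether they carry a _class attribute.
The parent class of Item in scrapy: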
live_refs = defaultdict(weakref.WeakKeyDictionary)

class object_ref(object):
    """Inherit from this class (instead of object) to a keep a record of live instances"""

    __slots__ = ()

    def __new__(cls, *args, **kwargs):
        obj = object.__new__(cls)
        live_refs[cls][obj] = time()
        return obj

class BaseItem(object_ref):
    """Base class for all scraped items."""
    pass
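The instance-tracking trick in object_ref can be tried in isolation. The sketch below reuses the pattern from the excerpt with a hypothetical `Page` class; the explicit `gc.collect()` call is an assumption added so the result does not depend on when the interpreter reclaims the deleted object.

```python
import gc
import weakref
from collections import defaultdict
from time import time

# Maps each class to a weak dictionary of its live instances and creation times.
live_refs = defaultdict(weakref.WeakKeyDictionary)


class object_ref(object):
    __slots__ = ()

    def __new__(cls, *args, **kwargs):
        obj = object.__new__(cls)
        live_refs[cls][obj] = time()
        return obj


class Page(object_ref):  # hypothetical tracked class
    pass


a = Page()
b = Page()
count_with_two = len(live_refs[Page])

del a
gc.collect()
count_after_del = len(live_refs[Page])
```

Because the dictionary holds its keys weakly, an instance disappears from `live_refs` once nothing else references it, so the registry does not keep objects alive.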
The implementation of Item:
class DictItem(MutableMapping, BaseItem):

    fields = {}

    def __init__(self, *args, **kwargs):
        self._values = {}
        if args or kwargs:  # avoid creating dict for most common case
            for k, v in six.iteritems(dict(*args, **kwargs)):
                self[k] = v
    ...........

@six.add_metaclass(ItemMeta)
class Item(DictItem):
    pass
Item reuses the MutableMapping class, so it behaves much like a native Python dict.
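What MutableMapping contributes can be seen in a small sketch. This is not Scrapy's code: `SimpleItem` and `Product` are hypothetical, and rejecting undeclared keys with a KeyError is a design choice made here for illustration. Only five methods are written by hand; the rest of the dict-like behaviour comes from the abstract base class.

```python
from collections.abc import MutableMapping


class SimpleItem(MutableMapping):
    fields = {}  # declared field names; subclasses override this

    def __init__(self, *args, **kwargs):
        self._values = {}
        for k, v in dict(*args, **kwargs).items():
            self[k] = v

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        # Only declared fields may be set.
        if key not in self.fields:
            raise KeyError('%s does not support field: %s' % (type(self).__name__, key))
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


class Product(SimpleItem):
    fields = {'name': {}, 'price': {}}


p = Product(name='pen', price=2)
p.update(price=3)  # update() is supplied by MutableMapping
```

Methods such as `update`, `get`, `items` and the `in` operator work without being defined, because MutableMapping builds them on top of the five abstract methods.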
- How pony loads different provider implementations (creating instances dynamically)
def _bind(self, *args, **kwargs):
    if self.provider is not None:
        throw(TypeError, 'Database object was already bound to %s provider' % self.provider.dialect)
    if args: provider, args = args[0], args[1:]
    elif 'provider' not in kwargs: throw(TypeError, 'Database provider is not specified')
    else: provider = kwargs.pop('provider')
    if isinstance(provider, type) and issubclass(provider, DBAPIProvider):
        provider_cls = provider
    else:
        if not isinstance(provider, basestring): throw(TypeError)
        if provider == 'pygresql': throw(TypeError,
            'Pony no longer supports PyGreSQL module. Please use psycopg2 instead.')
        provider_module = import_module('pony.orm.dbproviders.' + provider)
        provider_cls = provider_module.provider_cls
    self.provider = provider = provider_cls(*args, **kwargs)
The key code:
provider_module = import_module('pony.orm.dbproviders.' + provider)
provider_cls = provider_module.provider_cls
self.provider = provider = provider_cls(*args, **kwargs)
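provider is an identifier string supplied by the user, and every provider module exposes the same name to the outside: provider_cls.
Taking the SQLite module as an example: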
provider_cls = SQLiteProvider
Usage example:
db = Database("sqlite", "demo.sqlite", create_db=True)
Implementation difficulty: 1 star.
Use case: creating the required object from an identifier, i.e. the factory pattern.
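The same idea can be tried without pony. The sketch below is an analogy, not pony's provider mechanism: it uses the standard-library modules `json` and `pickle`, which both happen to export `dumps` and `loads`, as stand-ins for provider modules that share one exported name. `load_codec` and `ALLOWED` are hypothetical names.

```python
from importlib import import_module

# Identifiers the factory accepts; anything else is rejected up front.
ALLOWED = {'json', 'pickle'}


def load_codec(name):
    """Return the (dumps, loads) pair of the module selected by `name`."""
    if name not in ALLOWED:
        raise TypeError('Unknown codec: %r' % name)
    module = import_module(name)
    # Every accepted module exposes the same attribute names.
    return module.dumps, module.loads


data = {'a': 1, 'b': [1, 2, 3]}

json_dumps, json_loads = load_codec('json')
json_roundtrip = json_loads(json_dumps(data))

pickle_dumps, pickle_loads = load_codec('pickle')
pickle_roundtrip = pickle_loads(pickle_dumps(data))
```

The caller only passes a string. Supporting another backend means adding a module that follows the naming convention, and the factory function itself does not change.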