marshmallow Basics
- What is marshmallow
marshmallow ("Object serialization and deserialization, lightweight and fluffy.") serializes and deserializes objects, and it validates the data during the same pass.
- What is schema
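Serializing and deserializing an object needs an intermediary that describes how the object maps to and from primitive data. In marshmallow, that intermediary is the schema.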
- Declaring Schemas
Declare a typical model class:
import datetime as dt

class User(object):
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.created_at = dt.datetime.now()

    def __repr__(self):
        return '<User(name={self.name!r})>'.format(self=self)
Declare a schema for the User model:
from marshmallow import Schema, fields

class UserSchema(Schema):
    name = fields.Str()
    email = fields.Email()
    created_at = fields.DateTime()
Alternatively, generate a schema class from a dict with Schema.from_dict:
from marshmallow import Schema, fields

UserSchema = Schema.from_dict(
    {
        "name": fields.Str(),
        "email": fields.Email(),
        "created_at": fields.DateTime(),
    }
)
- Serializing Objects ("Dumping")
Serialize an object with the dump method:
from pprint import pprint

user = User(name="Monty", email="monty@python.org")
schema = UserSchema()
result = schema.dump(user)
pprint(result)
# {'created_at': '2019-11-13T15:36:16.952377',
#  'email': 'monty@python.org',
#  'name': 'Monty'}
The dumps method serializes to a JSON-encoded string instead:
user = User(name="Monty", email="monty@python.org")
schema = UserSchema()
result = schema.dumps(user)
pprint(result)
# ('{"created_at": "2019-11-13T15:38:44.327872", "name": "Monty", "email": '
#  '"monty@python.org"}')
- Filtering Output
To output only specific fields, pass the only parameter (the fields to keep) or the exclude parameter (the fields to drop):
user = User(name="Monty", email="monty@python.org")
schema = UserSchema(only=("name",))
result = schema.dump(user)
pprint(result)
# {'name': 'Monty'}
- Deserializing Objects ("Loading")
By default, load returns a dict of the deserialized field values, provided no ValidationError is raised:
user_data = {
    'created_at': '2014-08-11T05:26:03.869245',
    'name': 'Ken',
    'email': 'ken@yahoo.com'
}

schema = UserSchema()
result = schema.load(user_data)
pprint(result)
# {'created_at': datetime.datetime(2014, 8, 11, 5, 26, 3, 869245),
#  'email': 'ken@yahoo.com',
#  'name': 'Ken'}
To deserialize into an object instead, decorate a schema method with post_load. Its return value becomes the result of load:
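from marshmallow import Schema, fields, post_load

class UserSchema(Schema):
    name = fields.Str()
    email = fields.Email()
    created_at = fields.DateTime()

    @post_load
    def make_user(self, data, **kwargs):
        # data is the dict of validated, deserialized values.
        # User.__init__ accepts only name and email, so input that
        # includes created_at would make this call raise TypeError.
        return User(**data)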
Now load returns a User object:
user_data = {
    'name': 'Ken',
    'email': 'ken@yahoo.com'
}

schema = UserSchema()
result = schema.load(user_data)
pprint(result)
# <User(name='Ken')>
- Handling Collections of Objects
Pass many=True to process a collection of objects in one call:
user1 = User(name='Mick', email='mick@stones.com')
user2 = User(name='Keith', email='keith@stones.com')
users = [user1, user2]
schema = UserSchema(many=True)
result = schema.dump(users)  # OR UserSchema().dump(users, many=True)
pprint(result)
# [{'created_at': '2019-11-13T16:07:03.841332',
#   'email': 'mick@stones.com',
#   'name': 'Mick'},
#  {'created_at': '2019-11-13T16:07:03.841332',
#   'email': 'keith@stones.com',
#   'name': 'Keith'}]
- Validation
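load validates every field and raises ValidationError if any check fails. ValidationError.messages holds the error messages, and ValidationError.valid_data holds the fields that passed. marshmallow ships field types with built-in format checks for common values, such as fields.Email and fields.Url: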
from marshmallow import ValidationError

try:
    result = UserSchema().load({'name': 'Lily', 'email': 'foo@yahoo'})
except ValidationError as err:
    print(err.messages)    # => {'email': ['Not a valid email address.']}
    print(err.valid_data)  # => {'name': 'Lily'}
When a collection is validated, the errors are returned in a dict keyed by the index of each invalid item:
from pprint import pprint

from marshmallow import Schema, fields, ValidationError

class BandMembersSchema(Schema):
    name = fields.String(required=True)
    email = fields.Email()

user_data = [
    {'email': 'mick@stones.com', 'name': 'Mick'},
    {'email': 'invalid', 'name': 'Invalid'},  # invalid email
    {'email': 'keith@stones.com', 'name': 'Keith'},
    {'email': 'charlie@stones.com'},  # missing "name"
]

try:
    BandMembersSchema(many=True).load(user_data)
except ValidationError as err:
    pprint(err.messages)
# {1: {'email': ['Not a valid email address.']},
#  3: {'name': ['Missing data for required field.']}}
Use the validate parameter to attach validators to a field. The marshmallow.validate module provides many more built-in validators:
from pprint import pprint

from marshmallow import Schema, fields, validate, ValidationError

class UserSchema(Schema):
    name = fields.Str(validate=validate.Length(min=1))
    permission = fields.Str(validate=validate.OneOf(["read", "write", "admin"]))
    age = fields.Int(validate=validate.Range(min=18, max=40))

in_data = {"name": "", "permission": "invalid", "age": 71}
try:
    UserSchema().load(in_data)
except ValidationError as err:
    pprint(err.messages)
# {'age': ['Must be greater than or equal to 18 and less than or equal to 40.'],
#  'name': ['Shorter than minimum length 1.'],
#  'permission': ['Must be one of: read, write, admin.']}
You can also write a custom validator function. It should raise ValidationError when the value is invalid:
from pprint import pprint

from marshmallow import Schema, fields, ValidationError

def validate_quantity(n):
    if n < 0:
        raise ValidationError('Quantity must be greater than 0.')
    if n > 30:
        raise ValidationError('Quantity must not be greater than 30.')

class ItemSchema(Schema):
    quantity = fields.Integer(validate=validate_quantity)

in_data = {'quantity': 31}
try:
    ItemSchema().load(in_data)
except ValidationError as err:
    pprint(err.messages)
# {'quantity': ['Quantity must not be greater than 30.']}
Sometimes it is more convenient to write a field validator as a schema method. Register it with the validates decorator:
from marshmallow import fields, Schema, validates, ValidationError

class ItemSchema(Schema):
    quantity = fields.Integer()

    @validates("quantity")
    def validate_quantity(self, value, **kwargs):
        # **kwargs tolerates extra keyword arguments that newer marshmallow versions may pass
        if value < 0:
            raise ValidationError("Quantity must be greater than 0.")
        if value > 30:
            raise ValidationError("Quantity must not be greater than 30.")
To validate several fields together, register a schema-level validator with the validates_schema decorator. Error messages it raises are collected under the _schema key. The strict Meta option is not needed here: it belongs to marshmallow 2.x and was removed in 3.0, where load always raises ValidationError.
from pprint import pprint

from marshmallow import Schema, fields, ValidationError, validates_schema

class ItemSchema(Schema):
    quantity = fields.Integer()
    age = fields.Integer()

    @validates_schema
    def validate_quantity_and_age(self, data, **kwargs):
        if data['quantity'] < 0:
            raise ValidationError('Quantity must be greater than 0.')
        if data['quantity'] > 30:
            raise ValidationError('Quantity must not be greater than 30.')

        if data['age'] < 0:
            raise ValidationError('Age must be greater than 0.')
        if data['age'] > 30:
            raise ValidationError('Age must not be greater than 30.')

try:
    ItemSchema().load({'quantity': 21, 'age': 31})
except ValidationError as err:
    pprint(err.messages)
# {'_schema': ['Age must not be greater than 30.']}
- Required Fields
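With required=True, load reports an error when the field is missing from the input. To customize the message, use the error_messages parameter, whose "required" entry may be a string or a dict: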
from pprint import pprint

from marshmallow import Schema, fields, ValidationError

class UserSchema(Schema):
    name = fields.String(required=True)
    age = fields.Integer(required=True, error_messages={"required": "Age is required."})
    city = fields.String(
        required=True,
        error_messages={"required": {"message": "City required", "code": 400}},
    )
    email = fields.Email()

try:
    result = UserSchema().load({"email": "foo@bar.com"})
except ValidationError as err:
    pprint(err.messages)
# {'age': ['Age is required.'],
#  'city': {'code': 400, 'message': 'City required'},
#  'name': ['Missing data for required field.']}
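Customize the default error messages globally. This changes the field classes themselves, and fields read their messages when they are created, so run it before your schemas are defined: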
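from marshmallow import fields

# update() keeps the other built-in keys (e.g. "too_large" on Number,
# "invalid_utf8" on String), which reassigning the whole dict would drop
fields.Field.default_error_messages.update({
    "required": "缺少必要数据",  # "Missing required data"
    "null": "数据不能为空",  # "Data may not be null"
    "validator_failed": "非法数据",  # "Invalid data"
})

fields.Str.default_error_messages.update({
    "invalid": "不是合法文本",  # "Not a valid string"
})

fields.Int.default_error_messages.update({
    "invalid": "不是合法整数",  # "Not a valid integer"
})

fields.Number.default_error_messages.update({
    "invalid": "不是合法数字",  # "Not a valid number"
})

fields.Boolean.default_error_messages.update({
    "invalid": "不是合法布尔值",  # "Not a valid boolean"
})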
- Partial Loading
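If a field has required=True and the input lacks it, validation fails. Use the partial parameter to skip that required check. A tuple of field names exempts only those fields, and partial=True exempts all of them: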
from marshmallow import Schema, fields

class UserSchema(Schema):
    name = fields.String(required=True)
    age = fields.Integer(required=True)

result = UserSchema().load({"age": 42}, partial=("name",))
# OR result = UserSchema().load({"age": 42}, partial=True)
print(result)
# {'age': 42}
- Specifying Defaults
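To specify defaults, use load_default for the value filled in during deserialization (load) and dump_default for the value used during serialization (dump). Before marshmallow 3.13 these parameters were called missing and default, and the old names are deprecated: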
import datetime as dt
import uuid

from marshmallow import Schema, fields

class UserSchema(Schema):
    # On marshmallow < 3.13, use missing= and default= instead
    id = fields.UUID(load_default=uuid.uuid1)
    birthdate = fields.DateTime(dump_default=dt.datetime(2017, 9, 29))

UserSchema().load({})
# {'id': UUID('337d946c-32cd-11e8-b475-0022192ed31b')}
UserSchema().dump({})
# {'birthdate': '2017-09-29T00:00:00'}
- Handling Unknown Fields
By default, load raises ValidationError when the input contains fields the schema does not declare, because the default unknown setting is RAISE:
from pprint import pprint

from marshmallow import Schema, fields, ValidationError

class UserSchema(Schema):
    name = fields.Str()

try:
    pprint(UserSchema().load({"name": "Mike", "age": 10}))
except ValidationError as err:
    print(err.messages)
# {'age': ['Unknown field.']}
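Change this with EXCLUDE, which drops unknown fields, or INCLUDE, which passes them through unvalidated. You can set it in class Meta, on the schema instance, or per load call: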
from pprint import pprint

from marshmallow import Schema, fields, ValidationError, INCLUDE, EXCLUDE

class UserSchema(Schema):
    name = fields.Str()

    class Meta:
        unknown = INCLUDE
        # unknown = EXCLUDE

try:
    pprint(UserSchema().load({"name": "Mike", "age": 10}))
    # OR schema = UserSchema(unknown=INCLUDE)
    # OR UserSchema().load({"name": "Mike", "age": 10}, unknown=INCLUDE)
except ValidationError as err:
    print(err.messages)
# {'age': 10, 'name': 'Mike'}
- Validation Without Deserialization
To validate input without deserializing it, call Schema.validate(). It returns a dict of error messages, which is empty when the data is valid:
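from marshmallow import Schema, fields
class UserSchema(Schema):
    name = fields.Str()
    email = fields.Email()

errors = UserSchema().validate({"name": "Ronnie", "email": "invalid-email"})
print(errors)
# {'email': ['Not a valid email address.']}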
- "Read-only" and "Write-only" Fields
In a typical web API, dump_only and load_only fields behave like "read-only" and "write-only" fields:
class UserSchema(Schema):
    name = fields.Str()
    # password is "write-only"
    password = fields.Str(load_only=True)
    # created_at is "read-only"
    created_at = fields.DateTime(dump_only=True)
Note: during load, dump_only fields are treated as unknown fields:
from marshmallow import Schema, fields, ValidationError

class UserSchema(Schema):
    name = fields.Str()
    password = fields.Str(load_only=True)
    created_at = fields.DateTime(required=True, dump_only=True)

try:
    UserSchema().load({"name": "Mike", "password": "111111", "created_at": "2014-08-11T05:26:03.869245"})
except ValidationError as err:
    print(err.messages)
# {'created_at': ['Unknown field.']}
If unknown is set to INCLUDE, such a field is passed through to the result without any validation:
from marshmallow import Schema, fields, ValidationError, INCLUDE

class UserSchema(Schema):
    name = fields.Str()
    password = fields.Str(load_only=True)
    created_at = fields.DateTime(required=True, allow_none=False, dump_only=True)

try:
    UserSchema().load({"name": "Mike", "password": "111111", "created_at": None}, unknown=INCLUDE)
    print("success")
except ValidationError as err:
    print(err.messages)
# success
- Specifying Serialization/Deserialization Keys
What if the keys in the external data differ from the attribute names? Set the data_key parameter:
from marshmallow import Schema, fields

class UserSchema(Schema):
    name = fields.String()
    email = fields.Email(data_key="emailAddress")

s = UserSchema()

data = {"name": "Mike", "email": "foo@bar.com"}
result = s.dump(data)
# {'name': 'Mike',
#  'emailAddress': 'foo@bar.com'}

data = {"name": "Mike", "emailAddress": "foo@bar.com"}
result = s.load(data)
# {'name': 'Mike',
#  'email': 'foo@bar.com'}
- Implicit Field Creation
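Implicit field creation: when a model has many attributes, declaring a field for each one is tedious, especially when many of them are plain built-in Python types.
A schema can instead create fields implicitly. List the attribute names in the fields option of class Meta, and marshmallow chooses a suitable field type based on each value's type. This describes marshmallow 3.x; if you use a newer major version, check its changelog.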
import datetime as dt
from pprint import pprint

from marshmallow import Schema, fields

class UserSchema(Schema):
    uppername = fields.Function(lambda obj: obj.name.upper())

    class Meta:
        fields = ("name", "email", "created_at", "uppername")

class User(object):
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.created_at = dt.datetime.now()

user = User(name="Mike", email="mike@example.com")
pprint(UserSchema().dump(user))
# {'created_at': '2019-11-14T13:45:38.350031',
#  'email': 'mike@example.com',
#  'name': 'Mike',
#  'uppername': 'MIKE'}
Note: to add field names on top of the explicitly declared fields, use the additional option instead.
The following schema is equivalent to the one above:
class UserSchema(Schema):
    uppername = fields.Function(lambda obj: obj.name.upper())

    class Meta:
        # No need to include 'uppername'
        additional = ("name", "email", "created_at")
- Ordering Output
To preserve field order, set ordered = True in class Meta. marshmallow then serializes into a collections.OrderedDict:
import datetime as dt
from collections import OrderedDict
from pprint import pprint

from marshmallow import Schema, fields

class UserSchema(Schema):
    uppername = fields.Function(lambda obj: obj.name.upper())

    class Meta:
        additional = ("name", "email", "created_at")
        ordered = True

class User(object):
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.created_at = dt.datetime.now()

user = User(name="Mike", email="mike@example.com")
result = UserSchema().dump(user)
assert isinstance(result, OrderedDict)
pprint(result)
# OrderedDict([('uppername', 'MIKE'),
#              ('name', 'Mike'),
#              ('email', 'mike@example.com'),
#              ('created_at', '2019-11-14T14:02:29.680152')])
Reference: https://marshmallow.readthedocs.io/en/stable/quickstart.html