Python ------ Serialization: pickle vs. json

Serializing a Python object: the process of turning a variable in memory into something that can be stored or transmitted is called serialization.
Deserialization: reading serialized content back into memory from its serialized form is called deserialization.

-------------------------------------------------------------------------------
Serialization:
Open two Python Shell windows and define a marker in each so we can tell them apart:
>>> shell=1
>>> shell=2

1. ======================================== Saving the object to a pickle file =======================================
>>> shell
1

>>> entry = {}
>>> entry['title'] = 'Dive into history, 2009 edition'
>>> entry['article_link'] = 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'
>>> entry['comments_link'] = None
>>> entry['internal_id'] = b'\xDE\xD5\xB4\xF8'
>>> entry['tags'] = ('diveintopython', 'docbook', 'html')
>>> entry['published'] = True
>>> import time
>>> entry['published_date'] = time.strptime('Fri Mar 27 22:20:42 2009')
>>> entry['published_date']
time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1)
Here we build a Python dictionary; entry contains values of several different types, which we will use to exercise the pickle module.

------- Saving entry to a file ----------------
>>> shell
1    # we are in Python Shell 1

>>> import pickle
>>> with open('entry.pickle', 'wb') as f:
...     pickle.dump(entry, f)
...

2. ==================================== Loading data from the pickle file ====================================
Use the second Python Shell window to load it:
>>> shell
2

>>> entry
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'entry' is not defined
which shows that no entry variable exists in this shell yet.

>>> import pickle
>>> with open('entry.pickle', 'rb') as f:    # open the entry.pickle created in Shell 1
...     entry = pickle.load(f)
...
>>> entry
{'comments_link': None,
'internal_id': b'\xDE\xD5\xB4\xF8',
'title': 'Dive into history, 2009 edition',
'tags': ('diveintopython', 'docbook', 'html'),
'article_link':
'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
'published_date': time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86,
tm_isdst=-1),
'published': True}
That completes the deserialization: we have read back the entry data we serialized in Shell 1.

pickle.dump() and pickle.load() are the functions that serialize and deserialize Python objects.
3. ================================================= Serializing without a file =============================
You can also serialize a Python object to an in-memory bytes object with pickle.dumps(), and restore it with pickle.loads():

>>> shell
1
>>> b = pickle.dumps(entry) ①
>>> type(b) ②
<class 'bytes'>
>>> entry3 = pickle.loads(b) ③
>>> entry3 == entry ④
True
pickle.dumps() (note the trailing s) serializes to an in-memory bytes object ①, type() confirms it is bytes ②, pickle.loads() rebuilds an object from those bytes ③, and the result compares equal to the original entry ④.


Note: the pickle protocol keeps evolving. Newer versions of Python can read data pickled with older protocols, but older versions of Python cannot necessarily read data pickled with a newer protocol.
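If you need the pickled file to stay readable by an older Python, you can pin the protocol explicitly when you call pickle.dump(). A minimal sketch (the choice of protocol 2 and the file name entry-proto2.pickle are just for illustration):

import pickle

# pickle.DEFAULT_PROTOCOL is what dump()/dumps() use when no protocol is given;
# pickle.HIGHEST_PROTOCOL is the newest protocol this interpreter supports.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

data = {'title': 'Dive into history, 2009 edition'}

# Pinning an older protocol keeps the file loadable by older Python versions,
# at the cost of the newer, more compact opcodes.
with open('entry-proto2.pickle', 'wb') as f:
    pickle.dump(data, f, protocol=2)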


How do you find out which pickle protocol a file was saved with?
>>> shell
1
>>> import pickletools
>>> with open('entry.pickle', 'rb') as f:
...     pickletools.dis(f)
0: \x80 PROTO 3
2: } EMPTY_DICT
3: q BINPUT 0
5: ( MARK
6: X BINUNICODE 'published_date'
25: q BINPUT 1
27: c GLOBAL 'time struct_time'
45: q BINPUT 2
47: ( MARK
48: M BININT2 2009
51: K BININT1 3
53: K BININT1 27
55: K BININT1 22
57: K BININT1 20
59: K BININT1 42
61: K BININT1 4
63: K BININT1 86
65: J BININT -1
70: t TUPLE (MARK at 47)
71: q BINPUT 3
73: } EMPTY_DICT
74: q BINPUT 4
76: \x86 TUPLE2
77: q BINPUT 5
79: R REDUCE
80: q BINPUT 6
82: X BINUNICODE 'comments_link'
100: q BINPUT 7
102: N NONE
103: X BINUNICODE 'internal_id'
119: q BINPUT 8
121: C SHORT_BINBYTES 'ÞÕ´ø'
127: q BINPUT 9
129: X BINUNICODE 'tags'
138: q BINPUT 10
140: X BINUNICODE 'diveintopython'
159: q BINPUT 11
161: X BINUNICODE 'docbook'
173: q BINPUT 12
175: X BINUNICODE 'html'
184: q BINPUT 13
186: \x87 TUPLE3
187: q BINPUT 14
189: X BINUNICODE 'title'
199: q BINPUT 15
201: X BINUNICODE 'Dive into history, 2009 edition'
237: q BINPUT 16
239: X BINUNICODE 'article_link'
256: q BINPUT 17
258: X BINUNICODE 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'
337: q BINPUT 18
339: X BINUNICODE 'published'
353: q BINPUT 19
355: \x88 NEWTRUE
356: u SETITEMS (MARK at 5)
357: . STOP
highest protocol among opcodes = 3
The last line shows the pickle protocol version this file was saved with.
The pickle format does not record the version number in an explicit field; it has to be inferred from the opcodes found in the serialized data. That is exactly what pickletools.dis() does, printing the full disassembly along the way.

The following function returns just the pickle protocol version, with none of the other disassembly output.
pickleversion.py:
import pickletools

def protocol_version(file_object):
    '''Return the highest pickle protocol version used in file_object.'''
    maxproto = -1
    # pickletools.genops() yields (opcode, arg, pos) for every opcode in the
    # stream; each opcode records the protocol version that introduced it.
    for opcode, arg, pos in pickletools.genops(file_object):
        maxproto = max(maxproto, opcode.proto)
    return maxproto

 

>>> import pickleversion
>>> with open('entry.pickle', 'rb') as f:
...     v = pickleversion.protocol_version(f)
...
>>> v
3


====================================================== Saving data to a JSON file ========================
>>> shell
1
>>> basic_entry = {}
>>> basic_entry['id'] = 256
>>> basic_entry['title'] = 'Dive into history, 2009 edition'
>>> basic_entry['tags'] = ('diveintopython', 'docbook', 'html')
>>> basic_entry['published'] = True
>>> basic_entry['comments_link'] = None
>>> import json
>>> with open('basic.json', mode='w', encoding='utf-8') as f:
...     json.dump(basic_entry, f)

Open basic.json and you will see:
{"published": true, "tags": ["diveintopython", "docbook", "html"], "comments_link": null,
"id": 256, "title": "Dive into history, 2009 edition"}

JSON is a text-based format, which means the file has to be opened in text mode (mode='w') with an explicit character encoding; UTF-8 is always a safe choice.
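One related detail worth knowing: by default the json module escapes every non-ASCII character as \uXXXX, so even a UTF-8 file will contain only ASCII. A minimal sketch of writing the characters as real UTF-8 instead, using the standard ensure_ascii parameter (the sample data is made up):

import json

data = {'title': '深入 Python 3'}

# Default behaviour: non-ASCII characters are escaped, output is pure ASCII.
print(json.dumps(data))                      # {"title": "\u6df1\u5165 Python 3"}

# With ensure_ascii=False the characters are written as-is, which is why
# opening the file with encoding='utf-8' matters.
print(json.dumps(data, ensure_ascii=False))  # {"title": "深入 Python 3"}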
JSON allows any amount of whitespace between values to keep files readable, so you can pass an extra argument during serialization to get a much more readable file.

>>> shell
1
>>> with open('basic-pretty.json', mode='w', encoding='utf-8') as f:
...     json.dump(basic_entry, f, indent=2)
Notice the indent parameter added to the json.dump() call: indent=0 puts each value on its own line, and a value greater than 0 also indents nested structures by that many spaces, giving an even more readable file.

Looking at basic-pretty.json you get:
{
  "published": true,
  "tags": [
    "diveintopython",
    "docbook",
    "html"
  ],
  "comments_link": null,
  "id": 256,
  "title": "Dive into history, 2009 edition"
}
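The key order in the output simply follows the dict, which in the Python 3.1 era shown here is effectively arbitrary. If deterministic, diff-friendly output matters, you can also pass sort_keys=True (a standard json parameter; a small sketch):

import json

basic_entry = {'id': 256, 'published': True, 'comments_link': None,
               'title': 'Dive into history, 2009 edition'}

# sort_keys=True writes the keys in alphabetical order, so two dumps of the
# same data always produce identical output.
print(json.dumps(basic_entry, indent=2, sort_keys=True))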

 

======================================== Mapping Python data types to JSON ======================================


JSON has no types corresponding to Python's tuple and bytes.
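A quick sketch of what that means in practice: a tuple is silently written out as a JSON array, while bytes are rejected outright unless you supply a converter (see below).

import json

# A tuple only survives serialization as a JSON array, so it comes back as a list.
print(json.dumps(('diveintopython', 'docbook', 'html')))
# ["diveintopython", "docbook", "html"]

# bytes have no JSON equivalent at all; this raises a TypeError
# (the exact message depends on the Python version).
try:
    json.dumps(b'\xDE\xD5\xB4\xF8')
except TypeError as e:
    print(e)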

---------------- Serializing data types JSON doesn't support --------------
JSON has no bytes type, but that doesn't mean the json module cannot serialize bytes: it exposes extensibility hooks for encoding and decoding.
To encode bytes or any other type JSON doesn't support, you provide custom encoder and decoder functions for those types.

>>> shell
1
>>> entry ①
{'comments_link': None,
'internal_id': b'\xDE\xD5\xB4\xF8',
'title': 'Dive into history, 2009 edition',
'tags': ('diveintopython', 'docbook', 'html'),
'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
'published_date': time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86,
tm_isdst=-1),
'published': True}
>>> import json
>>> with open('entry.json', 'w', encoding='utf-8') as f: ②
...     json.dump(entry, f)  ③
...
Traceback (most recent call last):
File "<stdin>", line 5, in <module>
File "C:\Python31\lib\json\__init__.py", line 178, in dump
for chunk in iterable:
File "C:\Python31\lib\json\encoder.py", line 408, in _iterencode
for chunk in _iterencode_dict(o, _current_indent_level):
File "C:\Python31\lib\json\encoder.py", line 382, in _iterencode_dict
for chunk in chunks:
File "C:\Python31\lib\json\encoder.py", line 416, in _iterencode
o = _default(o)
File "C:\Python31\lib\json\encoder.py", line 170, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'\xDE\xD5\xB4\xF8' is not JSON serializable

Serializing an entry that contains bytes through the normal json path clearly fails; note the last line of the traceback: b'\xDE\xD5\xB4\xF8' is not JSON serializable.

If those bytes matter, we need to define our own serialization format for them.
customserializer.py:

def to_json(python_object):
    if isinstance(python_object, bytes):
        return {'__class__': 'bytes',
                '__value__': list(python_object)}
    raise TypeError(repr(python_object) + ' is not JSON serializable')

Running the serialization with this converter still raises a TypeError, this time because of the time.struct_time value.
We update customserializer.py to:
import time

def to_json(python_object):
    if isinstance(python_object, time.struct_time):
        return {'__class__': 'time.asctime',
                '__value__': time.asctime(python_object)}
    if isinstance(python_object, bytes):
        return {'__class__': 'bytes',
                '__value__': list(python_object)}
    raise TypeError(repr(python_object) + ' is not JSON serializable')

Here time.asctime() converts the struct_time into a plain string.
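time.asctime() is a convenient choice because time.strptime() can parse its output back into a struct_time, so the pair gives us a reversible text representation. A minimal sketch:

import time

original = time.strptime('Fri Mar 27 22:20:42 2009')
as_text = time.asctime(original)     # 'Fri Mar 27 22:20:42 2009'
restored = time.strptime(as_text)    # back to a time.struct_time

print(restored == original)          # True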

>>> shell
1
>>> import customserializer
>>> with open('entry.json', 'w', encoding='utf-8') as f:
...     json.dump(entry, f, default=customserializer.to_json)
...
Passing the default argument to json.dump() names the conversion function the encoder should fall back on, which is how we serialize types JSON doesn't natively support.
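If you prefer classes over a standalone function, the same thing can be expressed by subclassing json.JSONEncoder and overriding its default() method; json.dump() then takes the class via cls=. A sketch, not from the original article (EntryEncoder is just an illustrative name):

import json
import time

class EntryEncoder(json.JSONEncoder):
    # default() is only called for objects json cannot serialize on its own;
    # it must return a serializable value or defer to the parent class.
    def default(self, o):
        if isinstance(o, time.struct_time):
            return {'__class__': 'time.asctime', '__value__': time.asctime(o)}
        if isinstance(o, bytes):
            return {'__class__': 'bytes', '__value__': list(o)}
        return super().default(o)

# Usage would then be:
# with open('entry.json', 'w', encoding='utf-8') as f:
#     json.dump(entry, f, cls=EntryEncoder)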

========================================== Loading data from a JSON file ======================
Like the pickle module, the json module provides a load() function.
>>> shell
2
>>> del entry
>>> entry
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'entry' is not defined
>>> import json
>>> with open('entry.json', 'r', encoding='utf-8') as f:
...     entry = json.load(f)
...
>>> entry
{'comments_link': None,
'internal_id': {'__class__': 'bytes', '__value__': [222, 213, 180, 248]},
'title': 'Dive into history, 2009 edition',
'tags': ['diveintopython', 'docbook', 'html'],
'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
'published_date': {'__class__': 'time.asctime', '__value__': 'Fri Mar 27 22:20:42 2009'},
'published': True}

Look closely: the internal_id and published_date fields came back as dictionaries, which does not match the data we serialized (internal_id was a bytes object and published_date was a time.struct_time).

That is because json.load() has no idea which conversion function was used during serialization with json.dump(). We need a counterpart to to_json(): a function that turns the serialized form back into the original Python data.
Here we update customserializer.py again and add a from_json() function:
def from_json(json_object):
    if '__class__' in json_object:
        if json_object['__class__'] == 'time.asctime':
            return time.strptime(json_object['__value__'])
        if json_object['__class__'] == 'bytes':
            return bytes(json_object['__value__'])
    return json_object


Now deserialize again:
>>> shell
2
>>> import customserializer
>>> with open('entry.json', 'r', encoding='utf-8') as f:
...     entry = json.load(f, object_hook=customserializer.from_json)
...
>>> entry
{'comments_link': None,
'internal_id': b'\xDE\xD5\xB4\xF8',
'title': 'Dive into history, 2009 edition',
'tags': ['diveintopython', 'docbook', 'html'],
'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
'published_date': time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86,
tm_isdst=-1),
'published': True}

Note that json.load() is called with object_hook=customserializer.from_json (as opposed to default=customserializer.to_json on the json.dump() side).
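For clarity: object_hook is called once for every JSON object (every dict) the decoder produces, innermost objects first, and its return value replaces that dict. That is why from_json() can simply return unrecognized dicts unchanged. A tiny sketch:

import json

def show(obj):
    # Called for each decoded JSON object, innermost first;
    # whatever it returns is used in place of the original dict.
    print('decoded:', obj)
    return obj

json.loads('{"outer": {"inner": 1}}', object_hook=show)
# decoded: {'inner': 1}
# decoded: {'outer': {'inner': 1}}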

 

========================== Caveat ==========================
>>> shell
1
>>> import customserializer
>>> with open('entry.json', 'r', encoding='utf-8') as f:
...     entry2 = json.load(f, object_hook=customserializer.from_json)
...
>>> entry2 == entry ①
False
>>> entry['tags'] ②
('diveintopython', 'docbook', 'html')
>>> entry2['tags'] ③
['diveintopython', 'docbook', 'html']

 

Even though we used to_json() for serialization and from_json() for deserialization, the round-tripped data still differs slightly from the original.
As ② and ③ show, JSON does not distinguish tuples from lists; it only has a single list-like type, the array, and the json module silently converts tuples into arrays during serialization.
For most applications the difference between a tuple and a list can be ignored, but keep it in mind.
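If preserving tuples does matter for your data, one possible workaround (my own extension, not part of the article's customserializer.py) is to tag them before dumping. Note that the default= hook cannot do this, because the encoder converts tuples to arrays itself and never passes them to default=; the structure has to be rewritten beforehand:

import json

def tag_tuples(obj):
    # Hypothetical helper: walk the structure before json.dump() sees it and
    # wrap every tuple in a tagged dict that the decoding hook can recognize.
    if isinstance(obj, tuple):
        return {'__class__': 'tuple', '__value__': [tag_tuples(v) for v in obj]}
    if isinstance(obj, list):
        return [tag_tuples(v) for v in obj]
    if isinstance(obj, dict):
        return {k: tag_tuples(v) for k, v in obj.items()}
    return obj

def untag_tuples(json_object):
    # Companion object_hook: rebuild tagged tuples on the way back in.
    if json_object.get('__class__') == 'tuple':
        return tuple(json_object['__value__'])
    return json_object

text = json.dumps(tag_tuples({'tags': ('diveintopython', 'docbook', 'html')}))
print(json.loads(text, object_hook=untag_tuples))
# {'tags': ('diveintopython', 'docbook', 'html')}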

 
