Python ------ Serialization: pickle vs. json

Serializing a Python object: the process of turning a variable in memory into something that can be stored or transmitted is called serialization.
Deserialization: reading serialized content back into memory from its serialized form is called deserialization.

-------------------------------------------------------------------------------
Serialization:
Open two Python Shell windows and define a marker in each so we can tell them apart:
>>> shell=1
>>> shell=2

1. ======================================== Saving the object to a pickle file =======================================
>>> shell
1

>>> entry = {}
>>> entry['title'] = 'Dive into history, 2009 edition'
>>> entry['article_link'] = 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'
>>> entry['comments_link'] = None
>>> entry['internal_id'] = b'\xDE\xD5\xB4\xF8'
>>> entry['tags'] = ('diveintopython', 'docbook', 'html')
>>> entry['published'] = True
>>> import time
>>> entry['published_date'] = time.strptime('Fri Mar 27 22:20:42 2009')
>>> entry['published_date']
time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1)
Here we build a Python dictionary; entry contains values of several different types, which we will use to exercise the pickle module.

------- Saving entry to a file ----------------
>>> shell
1    # we are in Python Shell 1

>>> import pickle
>>> with open('entry.pickle', 'wb') as f:
...     pickle.dump(entry, f)
...

2. ==================================== Loading data from the pickle file ====================================
Use the second Python Shell window to load it:
>>> shell
2

>>> entry
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'entry' is not defined
which shows that no entry variable exists in this shell yet.

>>> import pickle
>>> with open('entry.pickle', 'rb') as f:    # open the entry.pickle created in Shell 1
...     entry = pickle.load(f)
...
>>> entry
{'comments_link': None,
'internal_id': b'\xDE\xD5\xB4\xF8',
'title': 'Dive into history, 2009 edition',
'tags': ('diveintopython', 'docbook', 'html'),
'article_link':
'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
'published_date': time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86,
tm_isdst=-1),
'published': True}
That completes the deserialization: we have read back the entry data we serialized in Shell 1.

pickle.dump() and pickle.load() are the functions that serialize and deserialize Python objects.
3. ================================================= Serializing without a file =============================
You can also serialize a Python object to an in-memory bytes object with pickle.dumps(), and restore it with pickle.loads():

>>> shell
1
>>> b = pickle.dumps(entry) ①
>>> type(b) ②
<class 'bytes'>
>>> entry3 = pickle.loads(b) ③
>>> entry3 == entry ④
True
pickle.dumps() (note the trailing s) serializes to an in-memory bytes object ①, type() confirms it is bytes ②, pickle.loads() rebuilds an object from those bytes ③, and the result compares equal to the original entry ④.


Note: the pickle protocol keeps evolving. Newer versions of Python can read data pickled with older protocols, but older versions of Python cannot necessarily read data pickled with a newer protocol.
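If you need the pickled file to stay readable by an older Python, you can pin the protocol explicitly when you call pickle.dump(). A minimal sketch (the choice of protocol 2 and the file name entry-proto2.pickle are just for illustration):

import pickle

# pickle.DEFAULT_PROTOCOL is what dump()/dumps() use when no protocol is given;
# pickle.HIGHEST_PROTOCOL is the newest protocol this interpreter supports.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

data = {'title': 'Dive into history, 2009 edition'}

# Pinning an older protocol keeps the file loadable by older Python versions,
# at the cost of the newer, more compact opcodes.
with open('entry-proto2.pickle', 'wb') as f:
    pickle.dump(data, f, protocol=2)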


How do you find out which pickle protocol a file was saved with?
>>> shell
1
>>> import pickletools
>>> with open('entry.pickle', 'rb') as f:
...     pickletools.dis(f)
0: \x80 PROTO 3
2: } EMPTY_DICT
3: q BINPUT 0
5: ( MARK
6: X BINUNICODE 'published_date'
25: q BINPUT 1
27: c GLOBAL 'time struct_time'
45: q BINPUT 2
47: ( MARK
48: M BININT2 2009
51: K BININT1 3
53: K BININT1 27
55: K BININT1 22
57: K BININT1 20
59: K BININT1 42
61: K BININT1 4
63: K BININT1 86
65: J BININT -1
70: t TUPLE (MARK at 47)
71: q BINPUT 3
73: } EMPTY_DICT
74: q BINPUT 4
76: \x86 TUPLE2
77: q BINPUT 5
79: R REDUCE
80: q BINPUT 6
82: X BINUNICODE 'comments_link'
100: q BINPUT 7
102: N NONE
103: X BINUNICODE 'internal_id'
119: q BINPUT 8
121: C SHORT_BINBYTES 'ÞÕ´ø'
127: q BINPUT 9
129: X BINUNICODE 'tags'
138: q BINPUT 10
140: X BINUNICODE 'diveintopython'
159: q BINPUT 11
161: X BINUNICODE 'docbook'
173: q BINPUT 12
175: X BINUNICODE 'html'
184: q BINPUT 13
186: \x87 TUPLE3
187: q BINPUT 14
189: X BINUNICODE 'title'
199: q BINPUT 15
201: X BINUNICODE 'Dive into history, 2009 edition'
237: q BINPUT 16
239: X BINUNICODE 'article_link'
256: q BINPUT 17
258: X BINUNICODE 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'
337: q BINPUT 18
339: X BINUNICODE 'published'
353: q BINPUT 19
355: \x88 NEWTRUE
356: u SETITEMS (MARK at 5)
357: . STOP
highest protocol among opcodes = 3
The last line shows the pickle protocol version this file was saved with.
The pickle format does not record the version number in an explicit field; it has to be inferred from the opcodes found in the serialized data. That is exactly what pickletools.dis() does, printing the full disassembly along the way.

The following function returns just the pickle protocol version, with none of the other disassembly output.
pickleversion.py:
import pickletools

def protocol_version(file_object):
    '''Return the highest pickle protocol version used in file_object.'''
    maxproto = -1
    # pickletools.genops() yields (opcode, arg, pos) for every opcode in the
    # stream; each opcode records the protocol version that introduced it.
    for opcode, arg, pos in pickletools.genops(file_object):
        maxproto = max(maxproto, opcode.proto)
    return maxproto

 

>>> import pickleversion
>>> with open('entry.pickle', 'rb') as f:
...     v = pickleversion.protocol_version(f)
...
>>> v
3


====================================================== Saving data to a JSON file ========================
>>> shell
1
>>> basic_entry = {}
>>> basic_entry['id'] = 256
>>> basic_entry['title'] = 'Dive into history, 2009 edition'
>>> basic_entry['tags'] = ('diveintopython', 'docbook', 'html')
>>> basic_entry['published'] = True
>>> basic_entry['comments_link'] = None
>>> import json
>>> with open('basic.json', mode='w', encoding='utf-8') as f:
...     json.dump(basic_entry, f)

Open basic.json and you will see:
{"published": true, "tags": ["diveintopython", "docbook", "html"], "comments_link": null,
"id": 256, "title": "Dive into history, 2009 edition"}

JSON is a text-based format, which means the file has to be opened in text mode (mode='w') with an explicit character encoding; UTF-8 is always a safe choice.
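One related detail worth knowing: by default the json module escapes every non-ASCII character as \uXXXX, so even a UTF-8 file will contain only ASCII. A minimal sketch of writing the characters as real UTF-8 instead, using the standard ensure_ascii parameter (the sample data is made up):

import json

data = {'title': '深入 Python 3'}

# Default behaviour: non-ASCII characters are escaped, output is pure ASCII.
print(json.dumps(data))                      # {"title": "\u6df1\u5165 Python 3"}

# With ensure_ascii=False the characters are written as-is, which is why
# opening the file with encoding='utf-8' matters.
print(json.dumps(data, ensure_ascii=False))  # {"title": "深入 Python 3"}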
JSON allows any amount of whitespace between values to keep files readable, so you can pass an extra argument during serialization to get a much more readable file.

>>> shell
1
>>> with open('basic-pretty.json', mode='w', encoding='utf-8') as f:
...     json.dump(basic_entry, f, indent=2)
Notice the indent parameter added to the json.dump() call: indent=0 puts each value on its own line, and a value greater than 0 also indents nested structures by that many spaces, giving an even more readable file.

Looking at basic-pretty.json you get:
{
  "published": true,
  "tags": [
    "diveintopython",
    "docbook",
    "html"
  ],
  "comments_link": null,
  "id": 256,
  "title": "Dive into history, 2009 edition"
}
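The key order in the output simply follows the dict, which in the Python 3.1 era shown here is effectively arbitrary. If deterministic, diff-friendly output matters, you can also pass sort_keys=True (a standard json parameter; a small sketch):

import json

basic_entry = {'id': 256, 'published': True, 'comments_link': None,
               'title': 'Dive into history, 2009 edition'}

# sort_keys=True writes the keys in alphabetical order, so two dumps of the
# same data always produce identical output.
print(json.dumps(basic_entry, indent=2, sort_keys=True))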

 

======================================== Mapping Python data types to JSON ======================================


JSON has no types corresponding to Python's tuple and bytes.
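A quick sketch of what that means in practice: a tuple is silently written out as a JSON array, while bytes are rejected outright unless you supply a converter (see below).

import json

# A tuple only survives serialization as a JSON array, so it comes back as a list.
print(json.dumps(('diveintopython', 'docbook', 'html')))
# ["diveintopython", "docbook", "html"]

# bytes have no JSON equivalent at all; this raises a TypeError
# (the exact message depends on the Python version).
try:
    json.dumps(b'\xDE\xD5\xB4\xF8')
except TypeError as e:
    print(e)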

---------------- Serializing data types JSON doesn't support --------------
JSON has no bytes type, but that doesn't mean the json module cannot serialize bytes: it exposes extensibility hooks for encoding and decoding.
To encode bytes or any other type JSON doesn't support, you provide custom encoder and decoder functions for those types.

>>> shell
1
>>> entry ①
{'comments_link': None,
'internal_id': b'\xDE\xD5\xB4\xF8',
'title': 'Dive into history, 2009 edition',
'tags': ('diveintopython', 'docbook', 'html'),
'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
'published_date': time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86,
tm_isdst=-1),
'published': True}
>>> import json
>>> with open('entry.json', 'w', encoding='utf-8') as f: ②
...     json.dump(entry, f)  ③
...
Traceback (most recent call last):
File "<stdin>", line 5, in <module>
File "C:\Python31\lib\json\__init__.py", line 178, in dump
for chunk in iterable:
File "C:\Python31\lib\json\encoder.py", line 408, in _iterencode
for chunk in _iterencode_dict(o, _current_indent_level):
File "C:\Python31\lib\json\encoder.py", line 382, in _iterencode_dict
for chunk in chunks:
File "C:\Python31\lib\json\encoder.py", line 416, in _iterencode
o = _default(o)
File "C:\Python31\lib\json\encoder.py", line 170, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'\xDE\xD5\xB4\xF8' is not JSON serializable

Serializing an entry that contains bytes through the normal json path clearly fails; note the last line of the traceback: b'\xDE\xD5\xB4\xF8' is not JSON serializable.

If those bytes matter, we need to define our own serialization format for them.
customserializer.py:

def to_json(python_object):
    if isinstance(python_object, bytes):
        return {'__class__': 'bytes',
                '__value__': list(python_object)}
    raise TypeError(repr(python_object) + ' is not JSON serializable')

Running the serialization with this converter still raises a TypeError, this time because of the time.struct_time value.
We update customserializer.py to:
import time

def to_json(python_object):
    if isinstance(python_object, time.struct_time):
        return {'__class__': 'time.asctime',
                '__value__': time.asctime(python_object)}
    if isinstance(python_object, bytes):
        return {'__class__': 'bytes',
                '__value__': list(python_object)}
    raise TypeError(repr(python_object) + ' is not JSON serializable')

Here time.asctime() converts the struct_time into a plain string.
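time.asctime() is a convenient choice because time.strptime() can parse its output back into a struct_time, so the pair gives us a reversible text representation. A minimal sketch:

import time

original = time.strptime('Fri Mar 27 22:20:42 2009')
as_text = time.asctime(original)     # 'Fri Mar 27 22:20:42 2009'
restored = time.strptime(as_text)    # back to a time.struct_time

print(restored == original)          # True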

>>> shell
1
>>> import customserializer
>>> with open('entry.json', 'w', encoding='utf-8') as f:
...     json.dump(entry, f, default=customserializer.to_json)
...
Passing the default argument to json.dump() names the conversion function the encoder should fall back on, which is how we serialize types JSON doesn't natively support.
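If you prefer classes over a standalone function, the same thing can be expressed by subclassing json.JSONEncoder and overriding its default() method; json.dump() then takes the class via cls=. A sketch, not from the original article (EntryEncoder is just an illustrative name):

import json
import time

class EntryEncoder(json.JSONEncoder):
    # default() is only called for objects json cannot serialize on its own;
    # it must return a serializable value or defer to the parent class.
    def default(self, o):
        if isinstance(o, time.struct_time):
            return {'__class__': 'time.asctime', '__value__': time.asctime(o)}
        if isinstance(o, bytes):
            return {'__class__': 'bytes', '__value__': list(o)}
        return super().default(o)

# Usage would then be:
# with open('entry.json', 'w', encoding='utf-8') as f:
#     json.dump(entry, f, cls=EntryEncoder)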

========================================== Loading data from a JSON file ======================
Like the pickle module, the json module provides a load() function.
>>> shell
2
>>> del entry
>>> entry
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'entry' is not defined
>>> import json
>>> with open('entry.json', 'r', encoding='utf-8') as f:
...     entry = json.load(f)
...
>>> entry
{'comments_link': None,
'internal_id': {'__class__': 'bytes', '__value__': [222, 213, 180, 248]},
'title': 'Dive into history, 2009 edition',
'tags': ['diveintopython', 'docbook', 'html'],
'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
'published_date': {'__class__': 'time.asctime', '__value__': 'Fri Mar 27 22:20:42 2009'},
'published': True}

Look closely: the internal_id and published_date fields came back as dictionaries, which does not match the data we serialized (internal_id was a bytes object and published_date was a time.struct_time).

That is because json.load() has no idea which conversion function was used during serialization with json.dump(). We need a counterpart to to_json(): a function that turns the serialized form back into the original Python data.
Here we update customserializer.py again and add a from_json() function:
def from_json(json_object):
    if '__class__' in json_object:
        if json_object['__class__'] == 'time.asctime':
            return time.strptime(json_object['__value__'])
        if json_object['__class__'] == 'bytes':
            return bytes(json_object['__value__'])
    return json_object


Now deserialize again:
>>> shell
2
>>> import customserializer
>>> with open('entry.json', 'r', encoding='utf-8') as f:
...     entry = json.load(f, object_hook=customserializer.from_json)
...
>>> entry
{'comments_link': None,
'internal_id': b'\xDE\xD5\xB4\xF8',
'title': 'Dive into history, 2009 edition',
'tags': ['diveintopython', 'docbook', 'html'],
'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
'published_date': time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86,
tm_isdst=-1),
'published': True}

Note that json.load() is called with object_hook=customserializer.from_json (as opposed to default=customserializer.to_json on the json.dump() side).
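For clarity: object_hook is called once for every JSON object (every dict) the decoder produces, innermost objects first, and its return value replaces that dict. That is why from_json() can simply return unrecognized dicts unchanged. A tiny sketch:

import json

def show(obj):
    # Called for each decoded JSON object, innermost first;
    # whatever it returns is used in place of the original dict.
    print('decoded:', obj)
    return obj

json.loads('{"outer": {"inner": 1}}', object_hook=show)
# decoded: {'inner': 1}
# decoded: {'outer': {'inner': 1}}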

 

========================== Caveat ==========================
>>> shell
1
>>> import customserializer
>>> with open('entry.json', 'r', encoding='utf-8') as f:
...     entry2 = json.load(f, object_hook=customserializer.from_json)
...
>>> entry2 == entry ①
False
>>> entry['tags'] ②
('diveintopython', 'docbook', 'html')
>>> entry2['tags'] ③
['diveintopython', 'docbook', 'html']

 

Even though we used to_json() for serialization and from_json() for deserialization, the round-tripped data still differs slightly from the original.
As ② and ③ show, JSON does not distinguish tuples from lists; it only has a single list-like type, the array, and the json module silently converts tuples into arrays during serialization.
For most applications the difference between a tuple and a list can be ignored, but keep it in mind.
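If preserving tuples does matter for your data, one possible workaround (my own extension, not part of the article's customserializer.py) is to tag them before dumping. Note that the default= hook cannot do this, because the encoder converts tuples to arrays itself and never passes them to default=; the structure has to be rewritten beforehand:

import json

def tag_tuples(obj):
    # Hypothetical helper: walk the structure before json.dump() sees it and
    # wrap every tuple in a tagged dict that the decoding hook can recognize.
    if isinstance(obj, tuple):
        return {'__class__': 'tuple', '__value__': [tag_tuples(v) for v in obj]}
    if isinstance(obj, list):
        return [tag_tuples(v) for v in obj]
    if isinstance(obj, dict):
        return {k: tag_tuples(v) for k, v in obj.items()}
    return obj

def untag_tuples(json_object):
    # Companion object_hook: rebuild tagged tuples on the way back in.
    if json_object.get('__class__') == 'tuple':
        return tuple(json_object['__value__'])
    return json_object

text = json.dumps(tag_tuples({'tags': ('diveintopython', 'docbook', 'html')}))
print(json.loads(text, object_hook=untag_tuples))
# {'tags': ('diveintopython', 'docbook', 'html')}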

 
