python@pickle@joblib@serialization and deserialization@joblib load failures

pickle

pickle — Python object serialization — Python documentation

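pickle is a module in the Python standard library that serializes Python objects into a byte stream, which can be saved to a file, and reads such a byte stream back to rebuild the objects (deserialization). Saving objects this way is also called object persistence.

pickle can handle most Python data types, including numbers, strings, lists, tuples, dictionaries, and classes and functions defined at the top level of a module. This makes it convenient to store Python objects in a file for later use.

To use pickle, you typically serialize a Python object and write it to a file with pickle.dump(), and later read the file and deserialize it back into a Python object with pickle.load().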

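Note that pickle handles most Python objects, but not all of them. It cannot serialize system resources such as network connections, file handles, and processes, and it cannot serialize lambda functions or module objects. Functions and classes are pickled by reference (by their qualified name), so they must be importable in the environment where the data is loaded. Security also matters: unpickling can execute arbitrary code, so never load pickle data from an untrusted source.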
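As a minimal sketch of where these limits fall, the snippet below tries to pickle several kinds of objects. The helper can_pickle is a name made up for illustration. json.dumps stands in for any function defined at the top level of an importable module, which pickle stores by name; a lambda, a module object, and an open file handle are rejected.

```python
import json
import os
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if pickle rejects it."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


with open(os.devnull, "rb") as handle:
    results = {
        "dict of lists": can_pickle({"a": [1, 2.0]}),
        "module-level function": can_pickle(json.dumps),
        "lambda": can_pickle(lambda x: x + 1),
        "module object": can_pickle(json),
        "open file handle": can_pickle(handle),
    }

for kind, ok in results.items():
    print(f"{kind}: {ok}")
```

The first two entries come out True and the last three False.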

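The following example stores a Python object with pickle and reads it back:

import pickle

# define a Python object
data = {'a': [1, 2.0, 3, 4+6j],
        'b': ("string", u"Unicode string"),
        'c': {None, True, False}}

# serialize the object and write it to a file
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)

# read the file and deserialize it back into a Python object
with open('data.pkl', 'rb') as f:
    data_loaded = pickle.load(f)

# print the deserialized Python object
print(data_loaded)

In this example, we first define a Python object data that contains numbers, strings, a list, a tuple, and a set. We then use pickle to serialize data and write it to the file data.pkl. Finally, we read data.pkl back, deserialize it into the Python object data_loaded, and print it to check that the round trip worked.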
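When no file is needed, the same round trip can be done in memory. The sketch below uses pickle.dumps() and pickle.loads(), the bytes-based counterparts of dump() and load(); the variable names payload and restored are only illustrative.

```python
import pickle

data = {'a': [1, 2.0, 3, 4+6j],
        'b': ("string", "Unicode string"),
        'c': {None, True, False}}

payload = pickle.dumps(data)       # serialize to a bytes object
restored = pickle.loads(payload)   # rebuild an equal but independent object

print(type(payload).__name__)      # bytes
print(restored == data)            # True
print(restored is data)            # False
```

The restored object compares equal to the original but is a separate copy, so changing one does not affect the other.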

 [(SVC(C=10, gamma=0.001),
  {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'},
  0.9381835473133618),
 (RandomForestClassifier(max_depth=7, max_features=0.5, n_estimators=40),
  {'max_depth': 7,
   'max_features': 0.5,
   'min_samples_leaf': 1,
   'min_samples_split': 2,
   'n_estimators': 40},
  0.8854018069424631),
 (GradientBoostingClassifier(learning_rate=0.3, max_depth=7, subsample=0.7),
  {'learning_rate': 0.3,
   'max_depth': 7,
   'max_features': None,
   'min_samples_leaf': 1,
   'min_samples_split': 2,
   'n_estimators': 100,
   'subsample': 0.7},
  0.9476937708036139),
 (KNeighborsClassifier(n_neighbors=3, p=1, weights='distance'),
  {'n_neighbors': 3, 'p': 1, 'weights': 'distance'},
  0.9320019020446981),
 (MLPClassifier(alpha=0.01, batch_size=512, hidden_layer_sizes=(300,),
                learning_rate='adaptive', max_iter=400),
  {'alpha': 0.01,
   'batch_size': 512,
   'hidden_layer_sizes': (300,),
   'learning_rate': 'adaptive',
   'max_iter': 400},
  0.9358059914407989),
 (BaggingClassifier(max_features=0.5, n_estimators=50),
  {'max_features': 0.5, 'max_samples': 1.0, 'n_estimators': 50},
  0.9210651450309082)]

Basic usage

import pickle

# define a Python object
data = {'name': 'Alice', 'age': 25, 'city': 'New York'}

# serialize the object and save it to disk
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

# load the serialized data from disk and deserialize it into a Python object
with open('data.pkl', 'rb') as f:
    deserialized_data = pickle.load(f)

# print the deserialized object
print(deserialized_data)
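The protocol argument used above selects the version of the pickle format. As a rough sketch, the loop below serializes the same dictionary with every protocol the running interpreter supports and records the payload size. Files written with a newer protocol cannot be read by Python versions that predate it, so HIGHEST_PROTOCOL is a trade-off rather than a default recommendation.

```python
import pickle

data = {'name': 'Alice', 'age': 25, 'city': 'New York'}

sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    payload = pickle.dumps(data, protocol=protocol)
    # every protocol must give back an equal object
    assert pickle.loads(payload) == data
    sizes[protocol] = len(payload)

print("default protocol:", pickle.DEFAULT_PROTOCOL)
print("highest protocol:", pickle.HIGHEST_PROTOCOL)
print("payload size per protocol:", sizes)
```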

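Wrapper functions

A pair of functions that load and dump pickle files directly from a given file name

import os
import pickle as pkl
from datetime import datetime, timezone

def load_pickle_by_name(pickle_file):
    """Load and return the object stored in pickle_file."""
    with open(pickle_file, 'rb') as f:
        bclf = pkl.load(f)
    return bclf

def dump_pickle_by_name(bclf_objs, pickle_file, tag_time=True):
    """Dump bclf_objs to a .pickle file named after pickle_file, optionally tagged with the time."""
    # e.g. pickle_file = "bclf.pkl"
    name = os.path.splitext(pickle_file)[0]  # file name without its extension
    if tag_time:
        # now_utc_field_str is not defined in the original post;
        # it is assumed here to be a UTC timestamp string
        now_utc_field_str = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
        name += f"@{now_utc_field_str}"
    name += ".pickle"
    with open(name, "wb") as f:
        pkl.dump(bclf_objs, f)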
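With tag_time enabled, the wrappers above write to a time-stamped name that the caller never sees, which makes the file awkward to load afterwards. One possible variant, sketched here with pathlib, returns the path it wrote. The names dump_pickle and load_pickle and the timestamp format are assumptions for illustration, not part of the original code.

```python
import pickle
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def dump_pickle(obj, pickle_file, tag_time=True):
    """Pickle obj next to pickle_file and return the path actually written."""
    path = Path(pickle_file)
    stem = path.stem
    if tag_time:
        # assumed timestamp format; chosen so the name is valid on Windows too
        stem += "@" + datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = path.with_name(stem + ".pickle")
    with open(target, "wb") as f:
        pickle.dump(obj, f)
    return target


def load_pickle(pickle_file):
    """Load and return the object stored in pickle_file."""
    with open(pickle_file, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    written = dump_pickle({"score": 0.93}, Path(tmp) / "bclf.pkl")
    print(written.name)
    print(load_pickle(written))
```

Returning the path keeps the time tag useful for versioning while still letting the caller load the exact file that was just written.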

joblib

Joblib: running Python functions as pipeline jobs — joblib .dev0 documentation
joblib is simpler to use than pickle:
- ```
from joblib import load, dump
```
- after this import, load(filename) and dump(obj, filename) load and save objects directly

Basic usage

from joblib import dump, load

# define a Python object
data = {'name': 'Alice', 'age': 25, 'city': 'New York'}

# serialize the object and save it to disk
dump(data, 'data.joblib')

# load the serialized data from disk and deserialize it into a Python object
deserialized_data = load('data.joblib')

# print the deserialized object
print(deserialized_data)
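joblib's dump also accepts a compress argument. The sketch below, which assumes NumPy is installed and uses a made-up dictionary model_like as a stand-in for a trained model, writes the same object with and without compression and compares the file sizes.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load

# hypothetical stand-in for a model object that carries a large array
model_like = {"weights": np.zeros((1000, 100)), "label": "demo"}

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.joblib")
    zip_path = os.path.join(tmp, "compressed.joblib")

    dump(model_like, raw_path)               # no compression
    dump(model_like, zip_path, compress=3)   # compression level 3

    raw_size = os.path.getsize(raw_path)
    zip_size = os.path.getsize(zip_path)
    restored = load(zip_path)                # load() detects the compression itself

print(raw_size > zip_size)
print(np.array_equal(restored["weights"], model_like["weights"]))
```

An all-zero array compresses unusually well; real model weights will shrink far less, so treat the size difference here as an upper bound.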

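Summary

pickle and joblib can both serialize Python objects (convert them to a byte stream) and deserialize them (convert the byte stream back into objects). joblib's dump and load are built on top of pickle; the main differences are:
1. Speed: joblib is usually faster for objects that contain large NumPy arrays, because it writes the array buffers to the file directly instead of passing them through the generic pickle machinery. For small, plain Python objects the difference is usually negligible.
2. Memory use: when loading an uncompressed file, joblib can memory-map the arrays in it (the mmap_mode argument of load), so a large array does not have to be read into memory in full. pickle.load always rebuilds the whole object in memory.
3. Large data: joblib is usually the better fit for large numerical data, and dump can also compress the file (the compress argument). pickle may run into memory or performance limits with such data.
4. Compatibility: pickle is part of the standard library and needs no installation. joblib has to be installed separately and, because it relies on pickle, cannot serialize objects that pickle cannot.
In short, pickle is a good choice for small data when you want to stay within the standard library, while joblib is usually the better choice for large data, in particular objects that hold large NumPy arrays.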
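For the large-data case, joblib's load accepts an mmap_mode argument that maps an uncompressed file into memory instead of reading it in full. The following is a minimal sketch, assuming NumPy is installed; the array size and file name are arbitrary.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load

big = np.arange(1_000_000, dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.joblib")
    dump(big, path)                      # uncompressed, so it can be memory-mapped
    view = load(path, mmap_mode="r")     # read-only view backed by the file
    view_type = type(view).__name__
    first_ten_sum = float(view[:10].sum())
    matches = bool(np.array_equal(view, big))
    del view                             # release the mapping before the folder is removed

print(view_type)
print(first_ten_sum)                     # 45.0
print(matches)                           # True
```

Memory mapping only applies to files written without compression, so it cannot be combined with the compress argument of dump.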

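npy files

The npy file is NumPy's binary file format for storing a NumPy array. It is an efficient storage format: large arrays can be saved to disk quickly and read back quickly when needed.

Because npy files hold binary data, they are more compact than text files and faster to read and write. The format is also portable: the file header records the array's dtype, shape, and byte order, so reading a file depends only on the NumPy library and not on a particular operating system or hardware platform.

npy files are saved and loaded with NumPy's save and load functions. For example, the following code saves a NumPy array to an npy file and reads the file back:

import numpy as np

# create a NumPy array
arr = np.array([1, 2, 3, 4, 5])

# save it as an npy file
np.save('my_array.npy', arr)

# load the data from the npy file
loaded_arr = np.load('my_array.npy')

# print the loaded data
print(loaded_arr)  # [1 2 3 4 5]

In the code above, we first create a NumPy array arr and save it with np.save to a file named 'my_array.npy'. We then load the data from the file with np.load and store it in the variable loaded_arr. Finally, we print the loaded data to check that it matches the original array.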
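np.save stores a single array per file. When several arrays belong together, NumPy's np.savez bundles them into one .npz archive keyed by name. A short sketch follows; the array names features and labels are only illustrative.

```python
import os
import tempfile

import numpy as np

features = np.arange(6).reshape(2, 3)
labels = np.array([0, 1])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.npz")
    # keyword names become the keys inside the archive
    np.savez(path, features=features, labels=labels)
    with np.load(path) as archive:
        names = sorted(archive.files)
        loaded_features = archive["features"]
        loaded_labels = archive["labels"]

print(names)                                      # ['features', 'labels']
print(np.array_equal(loaded_features, features))  # True
print(np.array_equal(loaded_labels, labels))      # True
```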

numpy.lib.format — NumPy Manual
numpy.save — NumPy Manual

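Object loading problems🎈

Taking loading with joblib as an example

A failed load is often related to the Python environment

An operating-system fault (for example, internal errors after the machine has gone a long time without a restart); a system update can cause such faults as well
If you manage Python environments with conda, the conda environment itself may be broken by certain operations (for example, changes to some of its packages)

Typical symptoms: installing or uninstalling a package inside the environment produces (taking Windows as an example)

Permission errors
DLL file errors
The VS Code notebook fails to connect to the kernel

Errors when loading a binary object (for example, a failed joblib load):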

 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\condaPythonEnvs\tf210\lib\site-packages\joblib\numpy_pickle.py", line 587, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "d:\condaPythonEnvs\tf210\lib\site-packages\joblib\numpy_pickle.py", line 506, in _unpickle
    obj = unpickler.load()
  File "d:\condaPythonEnvs\tf210\lib\pickle.py", line 1212, in load
    dispatch[key[0]](self)
KeyError: 0

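Solutions:
1. Restart the computer and check whether the error goes away
2. Create a new conda Python environment
3. When installing Python packages, prefer pip install, or at least avoid mixing conda install and pip install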
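Before rebuilding the environment, it can be worth looking at the file itself. The last frame of the traceback above, dispatch[key[0]](self), looks up the opcode byte just read from the stream, so KeyError: 0 means the loader met a 0x00 byte where it expected a pickle opcode. One possible cause (an assumption, not something established above) is a truncated or damaged file. The helper below, whose name inspect_pickle_header is made up for illustration, reports the file size and whether the file starts with a pickle protocol marker.

```python
import os
import pickle
import tempfile


def inspect_pickle_header(path):
    """Report the file size and whether the file starts with a pickle protocol marker.

    Protocol 2 and later begin with the PROTO opcode (byte 0x80) followed by
    the protocol number. Protocols 0 and 1, and compressed joblib files, start
    differently, so a False result is a hint rather than proof of corruption.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(2)
    has_marker = (len(head) == 2 and head[0] == 0x80
                  and head[1] <= pickle.HIGHEST_PROTOCOL)
    return {"size": size, "head": head, "has_proto_marker": has_marker}


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.pkl")
    bad = os.path.join(tmp, "bad.pkl")
    with open(good, "wb") as f:
        pickle.dump({"ok": True}, f, protocol=4)
    with open(bad, "wb") as f:
        f.write(b"\x00" * 16)   # stands in for a damaged file
    good_report = inspect_pickle_header(good)
    bad_report = inspect_pickle_header(bad)

print(good_report["has_proto_marker"])   # True
print(bad_report["has_proto_marker"])    # False
```

If the header looks wrong or the size is far smaller than expected, recreating the file from its source is likely to help more than reinstalling packages.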

eg:

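First, define a function named "wordcount" that counts how many times a keyword occurs in a Chinese text. It takes two arguments, w and txtfile, both strings.
Second, in the folder that holds the material for this lab, use os.mkdir() to create a new folder named "mydir"; then automatically find all text files whose names start with "news_" and move them into the new directory "mydir" (note: the move has to be done programmatically).
Next, use the pickle module to combine the function wordcount and the names of all the text files that start with "news_" into one list, and save it permanently to the file "wc.pkl" inside the folder "mydir".
Finally, use the pickle module again to load the list saved in "wc.pkl", get the function wordcount from it, and call wordcount to count how often the four keywords "中国", "美国", "科技" and "芯片" occur in each of the text files that start with "news_". Print the results as a table with one row per file and one column per keyword.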

code

import os
import pickle
import shutil

# path_string_fix is not defined in the original post; it is assumed here to be
# the folder that holds the lab material, written with a trailing separator
path_string_fix = "./"
path_src = path_string_fix
path_des = path_string_fix + "mydir/"

# create the directory mydir under the source path
if not os.path.exists(path_des):
    os.mkdir(path_des)

def move_news():
    """Move the files whose names start with news_ from path_src to path_des."""
    for file_name in os.listdir(path_src):
        if file_name.startswith("news_"):
            shutil.move(path_src + file_name, path_des)

def wordcount(w, txt_file):
    """Count how often the word w appears in the file txt_file.

    The files are encoded in GBK, so open() is called with encoding="gbk"
    (gb18030 works too) to read them correctly.

    Args:
        w (str): the keyword to count
        txt_file (str): path of the text file
    """
    with open(txt_file, "r", encoding="gbk") as file_input_stream:
        string = file_input_stream.read()
    return string.count(w)

def pickle_deal(file_news_list):
    """Dump (wordcount, file_news_list) to wc.pkl in mydir, then load it back."""
    with open(path_des + "wc.pkl", "wb") as file_output_stream:
        pickle.dump((wordcount, file_news_list), file_output_stream)
    with open(path_des + "wc.pkl", "rb") as file_input_stream:
        return pickle.load(file_input_stream)

def print_head(word_list):
    """Print the header row: an empty cell followed by the keywords."""
    for i in [""] + word_list:
        print(i.center(20), end="")
    print()

def print_result(word_list):
    """Print one row per news file with the count of each keyword."""
    print_head(word_list)
    for file in obj_list[1]:
        print(file.center(20), end="")
        file_full_path = path_des + file
        for word in word_list:
            frequency = wordcount(word, file_full_path)
            print(str(frequency).center(20), end="")
        print()

word_list = ["中国", "美国", "科技", "芯片"]
move_news()
# collect the news_ files only after they have been moved into mydir
file_news_list = sorted(name for name in os.listdir(path_des) if name.startswith("news_"))
obj_list = pickle_deal(file_news_list)
# get the function back from the pickled file
wordcount = obj_list[0]
print_result(word_list)
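To try the workflow above without the lab files, the sketch below builds a throwaway folder with made-up sample texts, moves the news_ files, and counts the keywords. The sample contents and the name count_word are assumptions for illustration. Only the list of file names is pickled here, because pickle stores a function by reference rather than by value.

```python
import os
import pickle
import shutil
import tempfile

SAMPLES = {   # made-up stand-ins for the lab files
    "news_1.txt": "中国科技，中国芯片",
    "news_2.txt": "美国科技",
    "log.txt": "中国",
}
WORDS = ["中国", "美国", "科技", "芯片"]


def count_word(word, txt_file):
    """Count occurrences of word in a GBK-encoded text file."""
    with open(txt_file, "r", encoding="gbk") as f:
        return f.read().count(word)


with tempfile.TemporaryDirectory() as src:
    for name, text in SAMPLES.items():
        with open(os.path.join(src, name), "w", encoding="gbk") as f:
            f.write(text)

    dest = os.path.join(src, "mydir")
    os.mkdir(dest)
    for name in os.listdir(src):
        if name.startswith("news_"):
            shutil.move(os.path.join(src, name), dest)

    news_files = sorted(n for n in os.listdir(dest) if n.startswith("news_"))
    with open(os.path.join(dest, "wc.pkl"), "wb") as f:
        pickle.dump(news_files, f)
    with open(os.path.join(dest, "wc.pkl"), "rb") as f:
        loaded_files = pickle.load(f)

    counts = {name: [count_word(w, os.path.join(dest, name)) for w in WORDS]
              for name in loaded_files}

print(counts)   # {'news_1.txt': [2, 0, 1, 1], 'news_2.txt': [0, 1, 1, 0]}
```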

posted @ 2021-04-08 18:31 xuchaoxin1375
