Saving and Loading Preprocessed Data
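Machine learning workflows usually begin with a data preprocessing step. Because a model typically goes through many rounds of hyperparameter tuning, the same preprocessed data is needed again and again. Rerunning the preprocessing each time is wasted work, so we can save the result to disk once and reload it whenever it is needed.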
# Use the pickle module to store the preprocessed data in pickle format
# so it can be reloaded later, i.e. create a checkpoint.
import os
import pickle

pickle_file = 'notMNIST.pickle'
if not os.path.isfile(pickle_file):  # write only if the file does not exist yet
    print('Saving data to pickle file...')
    try:
        # Write to the same path that was checked above.
        with open(pickle_file, 'wb') as pfile:
            pickle.dump(
                {
                    'X_train': X_train,
                    'X_test': X_test,
                    'y_train': y_train,
                    'y_test': y_test,
                },
                pfile, pickle.HIGHEST_PROTOCOL)
    except Exception as e:
        print('Unable to save data to', pickle_file, ':', e)
        raise

print('Data cached in pickle file.')
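# Load the data back from the pickle file.
import pickle

pickle_file = 'notMNIST.pickle'  # use the same path the data was saved to
with open(pickle_file, 'rb') as f:
    pickle_data = pickle.load(f)  # deserialize; the inverse of pickle.dump
    X_train = pickle_data['X_train']
    X_test = pickle_data['X_test']
    y_train = pickle_data['y_train']
    y_test = pickle_data['y_test']
    del pickle_data  # drop the reference to the dict; the arrays stay alive via the names above

print('Data and modules loaded.')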
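The save and load steps can also be folded into one small helper, so that calling code does not need to care whether the cache already exists. The following is only a minimal sketch: `load_or_compute`, `fake_preprocess`, and the file name `demo.pickle` are hypothetical names introduced for illustration, and the tiny dictionary stands in for real preprocessed arrays.

```python
import os
import pickle
import tempfile


def load_or_compute(cache_path, compute_fn):
    """Return the cached object if cache_path exists; otherwise compute it, cache it, and return it."""
    if os.path.isfile(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    data = compute_fn()
    with open(cache_path, 'wb') as f:
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
    return data


def fake_preprocess():
    # Hypothetical stand-in for an expensive preprocessing step.
    return {'X_train': [[0.0, 1.0], [1.0, 0.0]], 'y_train': [0, 1]}


# Use a temporary directory so the demo does not touch any real files.
cache_path = os.path.join(tempfile.mkdtemp(), 'demo.pickle')
first = load_or_compute(cache_path, fake_preprocess)   # computes and writes the cache
second = load_or_compute(cache_path, fake_preprocess)  # reads the cache instead of recomputing
```

Note that a cache like this is not refreshed automatically: if the preprocessing code changes, the old file has to be deleted by hand. Also, `pickle.load` can execute arbitrary code while deserializing, so only load pickle files that you created yourself or that come from a trusted source.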