Working with HDF5 files in Python
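In deep learning tasks, it is often more efficient to put the whole dataset into a single file and process it from there. Several data models and libraries support this, HDF5 being one of them.
HDF5 is a mechanism for storing large arrays of numerical values of the same type. It suits data models that can be organized hierarchically and whose datasets need to be tagged with metadata.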
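HDF5 file: a container that can store two kinds of data objects, dataset and group. It is used much like a standard Python file object. A File instance is itself a group, named /, and serves as the entry point for traversing the file.
dataset (array-like): comparable to a NumPy array. Every dataset has a name, a shape, and a type (dtype), and supports slicing.
group (folder-like): comparable to a dictionary. It is a folder-like container that can hold datasets or other groups; the keys are the names of the group members, and the values are the member objects themselves (groups or datasets).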
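The three concepts above can be checked in a few lines. The following is a minimal sketch, assuming h5py and NumPy are installed; the file name `concepts_demo.hdf5` and the group and dataset names `images` and `train` are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, created in a temporary directory for illustration.
path = os.path.join(tempfile.mkdtemp(), "concepts_demo.hdf5")

with h5py.File(path, "w") as f:
    # The File object is itself the root group, named "/".
    root_name = f.name

    # A group behaves like a dict: keys are member names, values are members.
    grp = f.create_group("images")
    data = np.arange(12, dtype=np.int64).reshape(3, 4)
    dset = grp.create_dataset("train", data=data)
    member_names = list(grp.keys())
    is_dataset = isinstance(grp["train"], h5py.Dataset)

    # A dataset carries a name, a shape, and a dtype, like a NumPy array.
    info = (dset.name, dset.shape, dset.dtype)

    # Slicing reads only the requested region and returns a NumPy array.
    first_row = dset[0, :]
    block = dset[1:, :2]

print(root_name)      # /
print(member_names)   # ['train']
print(info[0], info[1], info[2])
print(first_row)
print(block)
```

Because slicing reads only the requested region from disk, a training loop can pull one batch at a time instead of loading the whole array into memory.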

    import h5py
    import numpy as np


    def main():
        # ======================================================================
        # Create an HDF5 file.
        f = h5py.File("h5py_example.hdf5", "w")  # mode = {'w', 'r', 'a'}

        # Create two groups under root '/'.
        g1 = f.create_group("bar1")
        g2 = f.create_group("bar2")

        # Create a dataset under root '/'.
        d = f.create_dataset("dset", data=np.arange(16).reshape([4, 4]))

        # Add two attributes to dataset 'dset'.
        d.attrs["myAttr1"] = [100, 200]
        d.attrs["myAttr2"] = "Hello, world!"

        # Create a group and a dataset under group "bar1".
        c1 = g1.create_group("car1")
        d1 = g1.create_dataset("dset1", data=np.arange(10))

        # Create a group and a dataset under group "bar2".
        c2 = g2.create_group("car2")
        d2 = g2.create_dataset("dset2", data=np.arange(10))

        # Save and exit the file.
        f.close()

        '''
        h5py_example.hdf5 file structure
        +-- '/'
        |   +-- group "bar1"
        |   |   +-- group "car1"
        |   |   |   +-- None
        |   |   |
        |   |   +-- dataset "dset1"
        |   |
        |   +-- group "bar2"
        |   |   +-- group "car2"
        |   |   |   +-- None
        |   |   |
        |   |   +-- dataset "dset2"
        |   |
        |   +-- dataset "dset"
        |   |   +-- attribute "myAttr1"
        |   |   +-- attribute "myAttr2"
        |   |
        |
        '''

        # ======================================================================
        # Read the HDF5 file.
        f = h5py.File("h5py_example.hdf5", "r")  # mode = {'w', 'r', 'a'}

        # Print the keys of groups and datasets under '/'.
        print(f.filename, ":")
        print([key for key in f.keys()], "\n")

        # ===================================================
        # Read dataset 'dset' under '/'.
        d = f["dset"]

        # Print the data of 'dset'.
        print(d.name, ":")
        print(d[:])

        # Print the attributes of dataset 'dset'.
        for key in d.attrs.keys():
            print(key, ":", d.attrs[key])
        print()

        # ===================================================
        # Read group 'bar1'.
        g = f["bar1"]

        # Print the keys of groups and datasets under group 'bar1'.
        print([key for key in g.keys()])

        # Three ways to print the data of 'dset1'.
        print(f["/bar1/dset1"][:])     # 1. absolute path
        print(f["bar1"]["dset1"][:])   # 2. relative path: file[][]
        print(g['dset1'][:])           # 3. relative path: group[]

        # Delete a dataset.
        # Notice: the file must be opened in mode 'a' (not 'r') for this to work.
        '''
        del g["dset1"]
        '''

        # Close the file.
        f.close()


    if __name__ == "__main__":
        main()
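The deletion step is only hinted at in the listing, since it needs a writable file. Below is a minimal sketch of how it might look, assuming h5py and NumPy are installed; the file name `delete_demo.hdf5` and the extra dataset `dset2` are hypothetical. It also uses `Group.visit` to list what remains in the file.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file, built in a temporary directory so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "delete_demo.hdf5")

# Build a small file with one group holding two datasets.
with h5py.File(path, "w") as f:
    g = f.create_group("bar1")
    g.create_dataset("dset1", data=np.arange(10))
    g.create_dataset("dset2", data=np.arange(5))

# Reopen in append mode: the file must be writable to remove a member.
with h5py.File(path, "a") as f:
    g = f["bar1"]
    del g["dset1"]                 # unlink the member named "dset1"
    remaining = sorted(g.keys())

    # Collect the path of every object that is still reachable from '/'.
    names = []
    f.visit(names.append)

print(remaining)
print(sorted(names))
```

Using `with` closes the file even if an exception is raised, so no explicit `f.close()` is needed. Attempting the same `del` on a file opened with mode 'r' fails, because the file has no write intent.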
end