Handling HDF5 files with Python

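In deep learning tasks, it is often more efficient to store an entire dataset in a single file and process it from there. Several data models and libraries support this approach; HDF5 is one of them.

HDF5 is a mechanism for storing large numerical arrays of homogeneous type. It suits data models that can be organized hierarchically and whose datasets need to be tagged with metadata.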

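HDF5 file: a container that can store two kinds of data objects, datasets and groups. Working with it resembles standard Python file handling. The File object is itself a group, named /, and serves as the entry point for traversing the file.
dataset (array-like): comparable to a NumPy array. Each dataset has a name, a shape, and a dtype, and it supports slicing.
group (folder-like): comparable to a dictionary; it is a folder-like container. A group can hold datasets or other groups. The keys are the members' names, and the values are the member objects themselves (groups or datasets).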
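
The array-like and dictionary-like analogies above can be sketched with a small example. This is a minimal illustration, not a recommended layout: the file name toy_train.hdf5, the group name train, the dataset names images and labels, and the helper functions build_toy_file and read_batch are all made up for this sketch. It writes a toy training set into one file, then reads a mini-batch back by slicing.

```python
import os
import tempfile

import h5py
import numpy as np


def build_toy_file(path):
    """Write a small, made-up training set into one HDF5 file."""
    with h5py.File(path, "w") as f:
        train = f.create_group("train")  # folder-like container
        images = train.create_dataset(
            "images", data=np.zeros((100, 8, 8), dtype=np.uint8)
        )
        train.create_dataset("labels", data=np.arange(100) % 10)
        # Metadata is attached to the dataset through its attrs mapping.
        images.attrs["description"] = "toy 8x8 grayscale images"


def read_batch(path, start, stop):
    """Read one mini-batch by slicing the datasets."""
    with h5py.File(path, "r") as f:
        train = f["train"]              # dict-like lookup by member name
        members = sorted(train.keys())  # names of the group's members
        images = train["images"]
        info = (images.name, images.shape, images.dtype)
        batch = images[start:stop]      # NumPy-style slice, returns an ndarray
        labels = train["labels"][start:stop]
    return members, info, batch, labels


# Hypothetical file location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "toy_train.hdf5")
build_toy_file(path)
members, info, batch, labels = read_batch(path, 0, 16)
print(members)
print(info)
print(batch.shape, labels[:5])
```

Slicing a dataset returns an in-memory NumPy array holding only the requested rows, which is why a training loop can walk through a large file batch by batch.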

import h5py
import numpy as np

def main():
    #===========================================================================
    # Create a HDF5 file.
    f = h5py.File("h5py_example.hdf5", "w")    # mode = {'w', 'r', 'a'}

    # Create two groups under root '/'.
    g1 = f.create_group("bar1")
    g2 = f.create_group("bar2")

    # Create a dataset under root '/'.
    d = f.create_dataset("dset", data=np.arange(16).reshape([4, 4]))

    # Add two attributes to dataset 'dset'
    d.attrs["myAttr1"] = [100, 200]
    d.attrs["myAttr2"] = "Hello, world!"

    # Create a group and a dataset under group "bar1".
    c1 = g1.create_group("car1")
    d1 = g1.create_dataset("dset1", data=np.arange(10))

    # Create a group and a dataset under group "bar2".
    c2 = g2.create_group("car2")
    d2 = g2.create_dataset("dset2", data=np.arange(10))

    # Save and exit the file.
    f.close()

    ''' h5py_example.hdf5 file structure
    +-- '/'
    |   +--    group "bar1"
    |   |   +-- group "car1"
    |   |   |   +-- None
    |   |   |   
    |   |   +-- dataset "dset1"
    |   |
    |   +-- group "bar2"
    |   |   +-- group "car2"
    |   |   |   +-- None
    |   |   |
    |   |   +-- dataset "dset2"
    |   |   
    |   +-- dataset "dset"
    |   |   +-- attribute "myAttr1"
    |   |   +-- attribute "myAttr2"
    |   |   
    |   
    '''

    #===========================================================================
    # Read HDF5 file.
    f = h5py.File("h5py_example.hdf5", "r")    # mode = {'w', 'r', 'a'}

    # Print the keys of groups and datasets under '/'.
    print(f.filename, ":")
    print([key for key in f.keys()], "\n")  

    #===================================================
    # Read dataset 'dset' under '/'.
    d = f["dset"]

    # Print the data of 'dset'.
    print(d.name, ":")
    print(d[:])

    # Print the attributes of dataset 'dset'.
    for key in d.attrs.keys():
        print(key, ":", d.attrs[key])

    print()

    #===================================================
    # Read group 'bar1'.
    g = f["bar1"]

    # Print the keys of groups and datasets under group 'bar1'.
    print([key for key in g.keys()])

    # Three methods to print the data of 'dset1'.
    print(f["/bar1/dset1"][:])        # 1. absolute path

    print(f["bar1"]["dset1"][:])    # 2. relative path: file[][]

    print(g['dset1'][:])        # 3. relative path: group[]



    # Delete a dataset.
    # Notice: the file must be opened in mode 'a' (not 'r') before it can be modified.
    '''
    del g["dset1"]
    '''

    # Save and exit the file
    f.close()

if __name__ == "__main__":
    main()
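
The deletion step that the listing leaves commented out can be tried separately. The sketch below assumes a small throwaway file (demo_delete.hdf5, a name chosen only for this example) and two hypothetical helpers, list_members and delete_dataset. It opens the file in mode 'a', unlinks one dataset, and lists the members before and after.

```python
import os
import tempfile

import h5py
import numpy as np


def list_members(path):
    """Return the names of all groups and datasets in the file, sorted."""
    names = []
    with h5py.File(path, "r") as f:
        # visit() calls the function once per member, passing its
        # path relative to the root group.
        f.visit(names.append)
    return sorted(names)


def delete_dataset(path, name):
    """Unlink one member; mode 'a' opens the file for reading and writing."""
    with h5py.File(path, "a") as f:
        if name in f:
            del f[name]


# Build a throwaway file that mirrors part of the structure used above.
path = os.path.join(tempfile.mkdtemp(), "demo_delete.hdf5")
with h5py.File(path, "w") as f:
    g1 = f.create_group("bar1")
    g1.create_dataset("dset1", data=np.arange(10))
    f.create_dataset("dset", data=np.arange(16).reshape(4, 4))

before = list_members(path)
delete_dataset(path, "bar1/dset1")
after = list_members(path)
print(before)
print(after)
```

Using with blocks closes the file even if an error occurs, so no explicit f.close() is needed. Deleting a member removes its link from the group; the file on disk does not necessarily shrink as a result.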


end

posted @ 2022-04-04 20:15 一笑任逍遥
