Reading HDF5 Files with Fortran on Windows
I needed to read GPM IMERG satellite precipitation files in HDF5 format with Fortran. With the HDF5 library already installed (see VS2019+ Intel Fortran (oneAPI)+HDF5库的安装+测试 - chinagod - 博客园), this is how it was done.
Here are the brief steps.
Environment:
Windows 10 64-bit
Visual Studio 2019 Community Edition
Intel oneAPI 2021
HDF5-1.12.2
1. Before reading an HDF5 file, you first need to understand its structure. The h5ls command-line tool is used here.
C:\Users\jiangleads>h5ls -r Z:\data\GPM\30m\2021\01\3B-HHR.MS.MRG.3IMERG.20210101-S000000-E002959.0000.V06B.HDF5
/                                       Group
/Grid                                   Group
/Grid/HQobservationTime                 Dataset {1/Inf, 3600, 1800}
/Grid/HQprecipSource                    Dataset {1/Inf, 3600, 1800}
/Grid/HQprecipitation                   Dataset {1/Inf, 3600, 1800}
/Grid/IRkalmanFilterWeight              Dataset {1/Inf, 3600, 1800}
/Grid/IRprecipitation                   Dataset {1/Inf, 3600, 1800}
/Grid/lat                               Dataset {1800}
/Grid/lat_bnds                          Dataset {1800, 2}
/Grid/latv                              Dataset {2}
/Grid/lon                               Dataset {3600}
/Grid/lon_bnds                          Dataset {3600, 2}
/Grid/lonv                              Dataset {2}
/Grid/nv                                Dataset {2}
/Grid/precipitationCal                  Dataset {1/Inf, 3600, 1800}
/Grid/precipitationQualityIndex         Dataset {1/Inf, 3600, 1800}
/Grid/precipitationUncal                Dataset {1/Inf, 3600, 1800}
/Grid/probabilityLiquidPrecipitation    Dataset {1/Inf, 3600, 1800}
/Grid/randomError                       Dataset {1/Inf, 3600, 1800}
/Grid/time                              Dataset {1/Inf}
/Grid/time_bnds                         Dataset {1/Inf, 2}
Here -r tells h5ls to recursively traverse all groups (see the relevant part of https://www.cnblogs.com/jiangleads/p/15606798.html).
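If you would rather inspect the structure from Fortran itself instead of h5ls, the group-iteration calls h5gn_members_f and h5gget_obj_info_idx_f can list the members of /Grid. The fragment below is only a rough sketch of that idea (it is not part of the original program); it assumes file_id and error are available as in the full program of step 3, and the declarations would go in that program's declaration section:

! Rough sketch: list the objects contained in /Grid programmatically.
integer :: nmembers, i, obj_type
character(len=64) :: obj_name

call h5gn_members_f(file_id, "/Grid", nmembers, error)          ! how many objects /Grid contains
do i = 0, nmembers - 1                                          ! member index is zero-based
   call h5gget_obj_info_idx_f(file_id, "/Grid", i, obj_name, obj_type, error)
   print *, trim(obj_name), obj_type                            ! obj_type distinguishes groups from datasets
end do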
2. Since the variable we mainly need is /Grid/precipitationCal, we need to look at the key information about this variable.
Use the h5dump tool to inspect it:
C:\Users\jiangleads>h5dump -H Z:\data\GPM\30m\2021\01\3B-HHR.MS.MRG.3IMERG.20210101-S040000-E042959.0240.V06B.HDF5
HDF5 "Z:\data\GPM\30m\2021\01\3B-HHR.MS.MRG.3IMERG.20210101-S040000-E042959.0240.V06B.HDF5" {
GROUP "/" {
   ...
   DATASET "precipitationCal" {
      DATATYPE  H5T_IEEE_F32LE
      DATASPACE  SIMPLE { ( 1, 3600, 1800 ) / ( H5S_UNLIMITED, 3600, 1800 ) }
      ATTRIBUTE "CodeMissingValue" {
         DATATYPE  H5T_STRING {
            STRSIZE 8;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
      }
      ...
   }
}
}
We can see that the datatype of precipitationCal is H5T_IEEE_F32LE, which will be needed later when reading the file.
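The datatype can also be confirmed from Fortran instead of h5dump. This is a minimal sketch (not part of the original program), assuming dset_id and error are available as in the program of step 3, with the declarations placed in that program's declaration section:

! Rough sketch: query the on-disk datatype of the opened dataset.
integer(HID_T) :: dtype_id
integer :: type_class
integer(SIZE_T) :: type_size

call h5dget_type_f(dset_id, dtype_id, error)       ! datatype handle of the dataset
call h5tget_class_f(dtype_id, type_class, error)   ! H5T_FLOAT_F for precipitationCal
call h5tget_size_f(dtype_id, type_size, error)     ! 4 bytes for H5T_IEEE_F32LE
call h5tclose_f(dtype_id, error)                   ! release the datatype handle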
3. Write the code and read the variable.
Sample code (adapted from 使用Fortran+HDF扩展进行HDF文件读写 | Herrera Space):
program test
    USE HDF5                                 ! This module contains all necessary modules
    implicit none

    CHARACTER filename*100

    ! CHARACTER(LEN=8), PARAMETER :: filename = "dsetf.h5"  ! File name
    CHARACTER(LEN=3) dsetname                ! Dataset name (unused here)

    INTEGER(HID_T) :: file_id                ! File identifier (file handle)
    INTEGER(HID_T) :: dset_id                ! Dataset identifier (variable handle)
    INTEGER(HID_T) :: grp_id                 ! Group identifier (group handle)
    INTEGER(HID_T) :: dspace_id              ! Dataspace identifier (describes the rank and extent of the dataset)
    INTEGER :: error                         ! Error flag - success: 0

    real, DIMENSION(:,:,:), allocatable :: prcp_out    ! Data buffer; an allocatable array is used because the size is not known in advance

    INTEGER(HSIZE_T), DIMENSION(3) :: prcp_dims, maxdims

    filename = "Z:/data/GPM/30m/2021/01/3B-HHR.MS.MRG.3IMERG.20210101-S000000-E002959.0000.V06B.HDF5"

    CALL h5open_f(error)                                        ! Initialize the FORTRAN interface.

    CALL h5fopen_f(filename, H5F_ACC_RDONLY_F, file_id, error)  ! Open an existing file.

    CALL h5gopen_f(file_id, "Grid", grp_id, error)              ! Open an existing group.

    CALL h5dopen_f(grp_id, "precipitationCal", dset_id, error)  ! Open the dataset, i.e. the variable.

    CALL h5dget_space_f(dset_id, dspace_id, error)              ! Get the dataspace ID.

    CALL h5sget_simple_extent_dims_f(dspace_id, prcp_dims, maxdims, error)

    ! Allocate memory for the array.
    ALLOCATE(prcp_out(prcp_dims(1), prcp_dims(2), prcp_dims(3)))

    ! Read the dataset.
    CALL h5dread_f(dset_id, H5T_IEEE_F32LE, prcp_out, prcp_dims, error)

    !! do something
    !PRINT*, prcp_out(1,1,1)

    CALL h5dclose_f(dset_id, error)   ! Close the dataset.
    DEALLOCATE(prcp_out)

    CALL h5fclose_f(file_id, error)   ! Close the file.
    CALL h5close_f(error)             ! Close the FORTRAN interface.

end program
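Two small remarks on this listing: in real use, error should be checked after each call (a nonzero value indicates failure), and the group and dataspace handles can also be released with h5gclose_f(grp_id, error) and h5sclose_f(dspace_id, error) before the file is closed; the example above omits them to stay short.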
Code walkthrough; the steps for reading a variable from an HDF5 file are:
1. Call h5open_f to initialize the Fortran interface.
2. Call h5fopen_f to open the file; it returns file_id.
3. Call h5gopen_f to open the group with the given name under file_id; it returns grp_id.
4. Call h5dopen_f to open the dataset with the given name under grp_id; it returns dset_id.
5. Call h5dget_space_f to get the dataspace ID from dset_id.
6. Call h5sget_simple_extent_dims_f to return the dimension sizes of the variable into the prcp_dims array (note that the Fortran interface reports the dimensions in reverse order with respect to h5ls, so for this file prcp_dims comes back as (1800, 3600, 1)).
7. Allocate the array according to the returned dimensions.
8. Call h5dread_f to read the variable. The arguments to pass include dset_id, the datatype (already reported as DATATYPE by the h5dump -H command above), the dimensions, and the error flag.
... carry out whatever other processing is needed (for example, reading the coordinate datasets /Grid/lat and /Grid/lon; see the sketch after this list).
9. When the other work is done, call h5dclose_f, h5fclose_f, and h5close_f in turn to close the dataset, the file, and the Fortran interface.
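The same five-call pattern (open the dataset, get its dataspace, query the dimensions, allocate, read) applies to the 1-D coordinate datasets listed by h5ls in step 1. The fragment below is only a sketch, not part of the original program; it assumes grp_id and error are still available, and the declarations belong in the declaration section:

! Rough sketch: read the 1-D coordinate dataset /Grid/lon (3600 points);
! /Grid/lat (1800 points) is read in exactly the same way.
real, dimension(:), allocatable :: lon
integer(HID_T) :: lon_id, lon_space_id
integer(HSIZE_T), dimension(1) :: dims1, maxdims1

call h5dopen_f(grp_id, "lon", lon_id, error)                     ! open the dataset
call h5dget_space_f(lon_id, lon_space_id, error)                 ! get its dataspace
call h5sget_simple_extent_dims_f(lon_space_id, dims1, maxdims1, error)
allocate(lon(dims1(1)))
call h5dread_f(lon_id, H5T_NATIVE_REAL, lon, dims1, error)       ! H5T_NATIVE_REAL as the in-memory type
call h5sclose_f(lon_space_id, error)
call h5dclose_f(lon_id, error)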
Note: HDF5 objects can carry data in two ways, as datasets (Dataset) or as attributes (Attribute). If the quantity you need is stored in an attribute rather than a dataset, use the functions with the h5a prefix instead.
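For example, the h5dump output in step 2 shows a string attribute CodeMissingValue attached to precipitationCal. The following is a rough sketch of reading it through the h5a interface, assuming dset_id and error are available as in the program of step 3 (the buffer length of 8 matches the STRSIZE reported by h5dump, and the returned string may be null-terminated):

! Rough sketch: read the string attribute "CodeMissingValue" of precipitationCal.
integer(HID_T) :: attr_id, atype_id
character(len=8) :: missing_str                       ! STRSIZE 8 according to h5dump
integer(HSIZE_T), dimension(1) :: adims = (/1/)       ! the attribute is a scalar

call h5aopen_f(dset_id, "CodeMissingValue", attr_id, error)
call h5aget_type_f(attr_id, atype_id, error)          ! use the attribute's own string type for reading
call h5aread_f(attr_id, atype_id, missing_str, adims, error)
call h5tclose_f(atype_id, error)
call h5aclose_f(attr_id, error)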
Postscript: when processing files with the HDF5 Fortran interface, if the number of files the process has handled (including files that have already been closed) exceeds the system default limit (usually 1024 on Linux), an error like the following is raised (a similar error can occur on Windows):
HDF5-DIAG: Error detected in HDF5 (1.12.0) thread 0:
  #000: H5F.c line 793 in H5Fopen(): unable to open file
    major: File accessibility
    minor: Unable to open file
  #001: H5VLcallback.c line 3500 in H5VL_file_open(): open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 3465 in H5VL__file_open(): open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_file.c line 100 in H5VL__native_file_open(): unable to open file
    major: File accessibility
    minor: Unable to open file
  #004: H5Fint.c line 1564 in H5F_open(): unable to open file: name = '/data/GPM/30m/2019/04/3B-HHR.MS.MRG.3IMERG.20190422-S083000-E085959.0510.V06B.HDF5', tent_flags = 0
    major: File accessibility
    minor: Unable to open file
  #005: H5FD.c line 741 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #006: H5FDsec2.c line 346 in H5FD_sec2_open(): unable to open file: name = '/data/GPM/30m/2019/04/3B-HHR.MS.MRG.3IMERG.20190422-S083000-E085959.0510.V06B.HDF5', errno = 24, error message = 'Too many open files', flags = 0, o_flags = 0
    major: File accessibility
    minor: Unable to open file
This is a known bug; see Virtual datasets and open file limit - HDF5 - HDF Forum. netCDF seems to have a similar problem (Re: [netcdf-hdf] 'Too many open files' What causes this?). (Update 2022-09-12: opening GRIB files with ecCodes can hit a similar error; the workaround is ulimit -n <number of files>.)
In that case, you need to raise the ulimit value. See How to fix 'Too Many Open Files' in Linux and Linux - ulimit命令详解与修改不生效_Yonself的博客-CSDN博客_修改ulimit 未生效.
As you can see, reading a variable from an HDF5 file is very similar to reading one from a GRIB or netCDF file. The process basically follows "open the file and get a file handle" => "call functions to select the specific variable and get a variable handle/index" => "call a function to read the variable" => "close the variable handle and the file handle in turn".