Reading HDF5 Files with Fortran on Windows
I needed to read GPM IMERG satellite precipitation files in HDF5 format with Fortran. With the HDF5 library already installed (see VS2019+ Intel Fortran (oneAPI)+HDF5库的安装+测试 - chinagod - 博客园), this is how it was done.
Here are the brief steps.
Environment:
Windows 10 64-bit
Visual Studio 2019 Community Edition
Intel oneAPI 2021
HDF5-1.12.2
1. Before reading an HDF5 file, you first need to understand its structure. The h5ls command-line tool is used here.
C:\Users\jiangleads>h5ls -r Z:\data\GPM\30m\2021\01\3B-HHR.MS.MRG.3IMERG.20210101-S000000-E002959.0000.V06B.HDF5
/                                       Group
/Grid                                   Group
/Grid/HQobservationTime                 Dataset {1/Inf, 3600, 1800}
/Grid/HQprecipSource                    Dataset {1/Inf, 3600, 1800}
/Grid/HQprecipitation                   Dataset {1/Inf, 3600, 1800}
/Grid/IRkalmanFilterWeight              Dataset {1/Inf, 3600, 1800}
/Grid/IRprecipitation                   Dataset {1/Inf, 3600, 1800}
/Grid/lat                               Dataset {1800}
/Grid/lat_bnds                          Dataset {1800, 2}
/Grid/latv                              Dataset {2}
/Grid/lon                               Dataset {3600}
/Grid/lon_bnds                          Dataset {3600, 2}
/Grid/lonv                              Dataset {2}
/Grid/nv                                Dataset {2}
/Grid/precipitationCal                  Dataset {1/Inf, 3600, 1800}
/Grid/precipitationQualityIndex         Dataset {1/Inf, 3600, 1800}
/Grid/precipitationUncal                Dataset {1/Inf, 3600, 1800}
/Grid/probabilityLiquidPrecipitation    Dataset {1/Inf, 3600, 1800}
/Grid/randomError                       Dataset {1/Inf, 3600, 1800}
/Grid/time                              Dataset {1/Inf}
/Grid/time_bnds                         Dataset {1/Inf, 2}
Here -r tells h5ls to recursively traverse all groups (see the relevant part of https://www.cnblogs.com/jiangleads/p/15606798.html).
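If you would rather inspect the structure from Fortran itself instead of h5ls, the group-iteration calls h5gn_members_f and h5gget_obj_info_idx_f can list the members of /Grid. The fragment below is only a rough sketch of that idea (it is not part of the original program); it assumes file_id and error are available as in the full program of step 3, and the declarations would go in that program's declaration section:

! Rough sketch: list the objects contained in /Grid programmatically.
integer :: nmembers, i, obj_type
character(len=64) :: obj_name

call h5gn_members_f(file_id, "/Grid", nmembers, error)          ! how many objects /Grid contains
do i = 0, nmembers - 1                                          ! member index is zero-based
   call h5gget_obj_info_idx_f(file_id, "/Grid", i, obj_name, obj_type, error)
   print *, trim(obj_name), obj_type                            ! obj_type distinguishes groups from datasets
end do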
2. Since the variable we mainly need is /Grid/precipitationCal, we need to look at the key information about this variable.
Use the h5dump tool to inspect it:
C:\Users\jiangleads>h5dump -H Z:\data\GPM\30m\2021\01\3B-HHR.MS.MRG.3IMERG.20210101-S040000-E042959.0240.V06B.HDF5
HDF5 "Z:\data\GPM\30m\2021\01\3B-HHR.MS.MRG.3IMERG.20210101-S040000-E042959.0240.V06B.HDF5" {
GROUP "/" {
   ...
   DATASET "precipitationCal" {
      DATATYPE  H5T_IEEE_F32LE
      DATASPACE  SIMPLE { ( 1, 3600, 1800 ) / ( H5S_UNLIMITED, 3600, 1800 ) }
      ATTRIBUTE "CodeMissingValue" {
         DATATYPE  H5T_STRING {
            STRSIZE 8;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
      }
      ...
   }
}
}
We can see that the datatype of precipitationCal is H5T_IEEE_F32LE, which will be needed later when reading the file.
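The datatype can also be confirmed from Fortran instead of h5dump. This is a minimal sketch (not part of the original program), assuming dset_id and error are available as in the program of step 3, with the declarations placed in that program's declaration section:

! Rough sketch: query the on-disk datatype of the opened dataset.
integer(HID_T) :: dtype_id
integer :: type_class
integer(SIZE_T) :: type_size

call h5dget_type_f(dset_id, dtype_id, error)       ! datatype handle of the dataset
call h5tget_class_f(dtype_id, type_class, error)   ! H5T_FLOAT_F for precipitationCal
call h5tget_size_f(dtype_id, type_size, error)     ! 4 bytes for H5T_IEEE_F32LE
call h5tclose_f(dtype_id, error)                   ! release the datatype handle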
3. Write the code and read the variable.
Sample code (adapted from 使用Fortran+HDF扩展进行HDF文件读写 | Herrera Space):
program test
    USE HDF5                                 ! This module contains all necessary modules
    implicit none

    CHARACTER filename*100

    ! CHARACTER(LEN=8), PARAMETER :: filename = "dsetf.h5"  ! File name
    CHARACTER(LEN=3) dsetname                ! Dataset name (unused here)

    INTEGER(HID_T) :: file_id                ! File identifier (file handle)
    INTEGER(HID_T) :: dset_id                ! Dataset identifier (variable handle)
    INTEGER(HID_T) :: grp_id                 ! Group identifier (group handle)
    INTEGER(HID_T) :: dspace_id              ! Dataspace identifier (describes the rank and extent of the dataset)
    INTEGER :: error                         ! Error flag - success: 0

    real, DIMENSION(:,:,:), allocatable :: prcp_out    ! Data buffer; an allocatable array is used because the size is not known in advance

    INTEGER(HSIZE_T), DIMENSION(3) :: prcp_dims, maxdims

    filename = "Z:/data/GPM/30m/2021/01/3B-HHR.MS.MRG.3IMERG.20210101-S000000-E002959.0000.V06B.HDF5"

    CALL h5open_f(error)                                        ! Initialize the FORTRAN interface.

    CALL h5fopen_f(filename, H5F_ACC_RDONLY_F, file_id, error)  ! Open an existing file.

    CALL h5gopen_f(file_id, "Grid", grp_id, error)              ! Open an existing group.

    CALL h5dopen_f(grp_id, "precipitationCal", dset_id, error)  ! Open the dataset, i.e. the variable.

    CALL h5dget_space_f(dset_id, dspace_id, error)              ! Get the dataspace ID.

    CALL h5sget_simple_extent_dims_f(dspace_id, prcp_dims, maxdims, error)

    ! Allocate memory for the array.
    ALLOCATE(prcp_out(prcp_dims(1), prcp_dims(2), prcp_dims(3)))

    ! Read the dataset.
    CALL h5dread_f(dset_id, H5T_IEEE_F32LE, prcp_out, prcp_dims, error)

    !! do something
    !PRINT*, prcp_out(1,1,1)

    CALL h5dclose_f(dset_id, error)   ! Close the dataset.
    DEALLOCATE(prcp_out)

    CALL h5fclose_f(file_id, error)   ! Close the file.
    CALL h5close_f(error)             ! Close the FORTRAN interface.

end program
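Two small remarks on this listing: in real use, error should be checked after each call (a nonzero value indicates failure), and the group and dataspace handles can also be released with h5gclose_f(grp_id, error) and h5sclose_f(dspace_id, error) before the file is closed; the example above omits them to stay short.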
Code walkthrough; the steps for reading a variable from an HDF5 file are:
1. Call h5open_f to initialize the Fortran interface.
2. Call h5fopen_f to open the file; it returns file_id.
3. Call h5gopen_f to open the group with the given name under file_id; it returns grp_id.
4. Call h5dopen_f to open the dataset with the given name under grp_id; it returns dset_id.
5. Call h5dget_space_f to get the dataspace ID from dset_id.
6. Call h5sget_simple_extent_dims_f to return the dimension sizes of the variable into the prcp_dims array (note that the Fortran interface reports the dimensions in reverse order with respect to h5ls, so for this file prcp_dims comes back as (1800, 3600, 1)).
7. Allocate the array according to the returned dimensions.
8. Call h5dread_f to read the variable. The arguments to pass include dset_id, the datatype (already reported as DATATYPE by the h5dump -H command above), the dimensions, and the error flag.
... carry out whatever other processing is needed (for example, reading the coordinate datasets /Grid/lat and /Grid/lon; see the sketch after this list).
9. When the other work is done, call h5dclose_f, h5fclose_f, and h5close_f in turn to close the dataset, the file, and the Fortran interface.
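The same five-call pattern (open the dataset, get its dataspace, query the dimensions, allocate, read) applies to the 1-D coordinate datasets listed by h5ls in step 1. The fragment below is only a sketch, not part of the original program; it assumes grp_id and error are still available, and the declarations belong in the declaration section:

! Rough sketch: read the 1-D coordinate dataset /Grid/lon (3600 points);
! /Grid/lat (1800 points) is read in exactly the same way.
real, dimension(:), allocatable :: lon
integer(HID_T) :: lon_id, lon_space_id
integer(HSIZE_T), dimension(1) :: dims1, maxdims1

call h5dopen_f(grp_id, "lon", lon_id, error)                     ! open the dataset
call h5dget_space_f(lon_id, lon_space_id, error)                 ! get its dataspace
call h5sget_simple_extent_dims_f(lon_space_id, dims1, maxdims1, error)
allocate(lon(dims1(1)))
call h5dread_f(lon_id, H5T_NATIVE_REAL, lon, dims1, error)       ! H5T_NATIVE_REAL as the in-memory type
call h5sclose_f(lon_space_id, error)
call h5dclose_f(lon_id, error)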
Note: HDF5 objects can carry data in two ways, as datasets (Dataset) or as attributes (Attribute). If the quantity you need is stored in an attribute rather than a dataset, use the functions with the h5a prefix instead.
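For example, the h5dump output in step 2 shows a string attribute CodeMissingValue attached to precipitationCal. The following is a rough sketch of reading it through the h5a interface, assuming dset_id and error are available as in the program of step 3 (the buffer length of 8 matches the STRSIZE reported by h5dump, and the returned string may be null-terminated):

! Rough sketch: read the string attribute "CodeMissingValue" of precipitationCal.
integer(HID_T) :: attr_id, atype_id
character(len=8) :: missing_str                       ! STRSIZE 8 according to h5dump
integer(HSIZE_T), dimension(1) :: adims = (/1/)       ! the attribute is a scalar

call h5aopen_f(dset_id, "CodeMissingValue", attr_id, error)
call h5aget_type_f(attr_id, atype_id, error)          ! use the attribute's own string type for reading
call h5aread_f(attr_id, atype_id, missing_str, adims, error)
call h5tclose_f(atype_id, error)
call h5aclose_f(attr_id, error)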
Postscript: when processing files with the HDF5 Fortran interface, if the number of files the process has handled (including files that have already been closed) exceeds the system default limit (usually 1024 on Linux), an error like the following is raised (a similar error can occur on Windows):
HDF5-DIAG: Error detected in HDF5 (1.12.0) thread 0:
  #000: H5F.c line 793 in H5Fopen(): unable to open file
    major: File accessibility
    minor: Unable to open file
  #001: H5VLcallback.c line 3500 in H5VL_file_open(): open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 3465 in H5VL__file_open(): open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_file.c line 100 in H5VL__native_file_open(): unable to open file
    major: File accessibility
    minor: Unable to open file
  #004: H5Fint.c line 1564 in H5F_open(): unable to open file: name = '/data/GPM/30m/2019/04/3B-HHR.MS.MRG.3IMERG.20190422-S083000-E085959.0510.V06B.HDF5', tent_flags = 0
    major: File accessibility
    minor: Unable to open file
  #005: H5FD.c line 741 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #006: H5FDsec2.c line 346 in H5FD_sec2_open(): unable to open file: name = '/data/GPM/30m/2019/04/3B-HHR.MS.MRG.3IMERG.20190422-S083000-E085959.0510.V06B.HDF5', errno = 24, error message = 'Too many open files', flags = 0, o_flags = 0
    major: File accessibility
    minor: Unable to open file
This is a known bug; see Virtual datasets and open file limit - HDF5 - HDF Forum. netCDF seems to have a similar problem (Re: [netcdf-hdf] 'Too many open files' What causes this?). (Update 2022-09-12: opening GRIB files with ecCodes can hit a similar error; the workaround is ulimit -n <number of files>.)
In that case, you need to raise the ulimit value. See How to fix 'Too Many Open Files' in Linux and Linux - ulimit命令详解与修改不生效_Yonself的博客-CSDN博客_修改ulimit 未生效.
As you can see, reading a variable from an HDF5 file is very similar to reading one from a GRIB or netCDF file. The process basically follows "open the file and get a file handle" => "call functions to select the specific variable and get a variable handle/index" => "call a function to read the variable" => "close the variable handle and the file handle in turn".