Fortran哈希表
哈希表
参考自https://fortranwiki.org/fortran/show/Hash+tables
哈希表(Hash tables)通常被用于构建这样一个指针数组,数组中的每一个元素指向一个动态的数据类型(如链表或二叉树)。它需要一些哈希函数,用以将相同的关键字映射到指针数组中同样的位置中。
接下来,遍历动态数据类型,或搜索相应的键。如果键存在,则可以检索或覆盖该值。如果键不存在,那么,如果正在查询哈希表,get方法必须指示键值对不存在。如果正在调用put方法,则如果之后的动态数据类型中缺少键值对,则必须将其适当插入。 The Practice of Programming 的第55页有一个很好的介绍性讨论。使用哈希表和适当大的链表数组可以得到O(1)复杂度的查找和放置操作。
哈希表例子
参考https://fortranwiki.org/fortran/show/hash+table+example
有这样一个基于Fortran 2003的 面向对象的 哈希表的模块实现。这个模块是基于LGPL证书授权的。这个模块可以通过使transfer用函数,扩展成泛型化的,或者通过类扩展(type extension),变得专门化。这写方法可以通过类绑定过程被附加到对象中。额外的PRIVATE
声明可以被插入或者被取消初始掉从而用于产品代码;最初始的对象和方法被保留在这里用于测试。这个库和对应的测试程序被用Intel Fortran compiler, version 11.1.046进行编译,程序和库看起来是正确的,但未进行彻底的测试。
代码
Library
以下库是这个库是根据The Practice of Programming中对哈希表数据结构的描述而创建的。
! Module implementing an OO hash table (dictionary) in Fortran 2003. ! Compiles and runs with accompanying test program under the Intel ! Fortran Compiler, version 11.1.046 ! Copyright (c) Izaak Beekman 2010 ! This program is free software: you can redistribute it and/or modify ! it under the terms of the GNU Lesser General Public License as published by ! the Free Software Foundation, either version 3 of the License, or ! (at your option) any later version. ! This program is distributed in the hope that it will be useful, ! but WITHOUT ANY WARRANTY; without even the implied warranty of ! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ! GNU Lesser General Public License for more details. ! You should have received a copy of the GNU Lesser General Public License ! along with this program. If not, see <http://www.gnu.org/licenses/>. MODULE hashtbl IMPLICIT NONE ! Use strong typing INTEGER, PARAMETER :: tbl_size = 50 TYPE sllist TYPE(sllist), POINTER :: child => NULL() CHARACTER(len=:), ALLOCATABLE :: key, val CONTAINS PROCEDURE :: put => put_sll PROCEDURE :: get => get_sll PROCEDURE :: free => free_sll END TYPE sllist TYPE hash_tbl_sll TYPE(sllist), DIMENSION(:), ALLOCATABLE :: vec INTEGER :: vec_len = 0 LOGICAL :: is_init = .FALSE. CONTAINS PROCEDURE :: init => init_hash_tbl_sll PROCEDURE :: put => put_hash_tbl_sll PROCEDURE :: get => get_hash_tbl_sll PROCEDURE :: free => free_hash_tbl_sll END TYPE hash_tbl_sll PUBLIC :: hash_tbl_sll
在这里我们可以发现哈希表是由两种派生类型构建的,以及每一个数据对象具有的类型绑定过程(方法)(注意:销毁Finalization也应该添加到这些对象中,但我不确定截至2010年6月5日编译器是否支持此功能。)。
第一个对象sllist是一个单链表,是一个可分配的标量CHARACTER类型的键和值。(字符串长度可以在运行时分配。)这个类型有3种方法:1.put 将键值对放入(必要时创建)适当的列表元素。2.get 获取与键对应的字符串(值),如果键值对存在。3.free 完整的销毁链接列表。
第二个对象是哈希表。它包含存储一些元数据的元素,如向量长度(对哈希键很重要)和对象的状态,以及单链表数组vec。与此对象相关的方法有:1.init,通过将vec分配为一定长度来初始化对象,并跟踪对象状态和元数据。2.put 放置以哈希键,并将值存储在适当的链接列表的适当元素中,必要时创建它。3.get 如果键值对存在,则获取哈希key并检索关联值。4.free 销毁哈希表,释放内存并跟踪元数据和对象状态。
下面是一个实例化的哈希表对象的示意图,如在这个库中实现的:
具有单链接列表元素的哈希表(图丢失)
模块的其余部分,包括以下所有过程。
CONTAINS RECURSIVE SUBROUTINE put_sll(list,key,val) CLASS(sllist), INTENT(inout) :: list CHARACTER(len=*), INTENT(in) :: key, val INTEGER :: keylen, vallen keylen = LEN(key) vallen = LEN(val) IF (ALLOCATED(list%key)) THEN IF (list%key /= key) THEN IF ( .NOT. ASSOCIATED(list%child) ) ALLOCATE(list%child) CALL put_sll(list%child,key,val) END IF ELSE IF (.NOT. ALLOCATED(list%key)) & ALLOCATE(CHARACTER(len=keylen) :: list%key) list%key = key IF (ALLOCATED(list%val)) DEALLOCATE(list%val) ALLOCATE(CHARACTER(len=vallen) :: list%val) list%val = val END IF END SUBROUTINE put_sll RECURSIVE SUBROUTINE get_sll(list,key,val) CLASS(sllist), INTENT(in) :: list CHARACTER(len=*), INTENT(in) :: key CHARACTER(len=:), ALLOCATABLE, INTENT(out) :: val INTEGER :: vallen vallen = 0 IF (ALLOCATED(list%key) .AND. (list%key == key)) THEN vallen = LEN(list%val) IF (ALLOCATED(val)) DEALLOCATE(val) ALLOCATE(CHARACTER(len=vallen) :: val) val = list%val ELSE IF(ASSOCIATED(list%child)) THEN ! keep going CALL get_sll(list%child,key,val) ELSE ! At the end of the list, no key found IF (ALLOCATED(val)) DEALLOCATE(val) ! Exit indication RETURN END IF END SUBROUTINE get_sll RECURSIVE SUBROUTINE free_sll(list) CLASS(sllist), INTENT(inout) :: list IF (ASSOCIATED(list%child)) THEN CALL free_sll(list%child) DEALLOCATE(list%child) END IF list%child => NULL() IF (ALLOCATED(list%key)) DEALLOCATE(list%key) IF (ALLOCATED(list%val)) DEALLOCATE(list%val) END SUBROUTINE free_sll
上述过程被绑定到单链表对象。第一个虚参是它绑定到的对象,每当通过对象调用过程时都会自动传递。由于这是一个单链表,我们只能在一个方向上遍历列表,并且我们可以使用直接递归相对简洁地定义与列表相关联的方法。
SUBROUTINE init_hash_tbl_sll(tbl,tbl_len) CLASS(hash_tbl_sll), INTENT(inout) :: tbl INTEGER, OPTIONAL, INTENT(in) :: tbl_len IF (ALLOCATED(tbl%vec)) DEALLOCATE(tbl%vec) IF (PRESENT(tbl_len)) THEN ALLOCATE(tbl%vec(0:tbl_len-1)) tbl%vec_len = tbl_len ELSE ALLOCATE(tbl%vec(0:tbl_size-1)) tbl%vec_len = tbl_size END IF tbl%is_init = .TRUE. END SUBROUTINE init_hash_tbl_sll ! The first part of the hashing procedure using the string ! collating sequence ELEMENTAL FUNCTION sum_string(str) RESULT(sig) CHARACTER(len=*), INTENT(in) :: str INTEGER :: sig CHARACTER, DIMENSION(LEN(str)) :: tmp INTEGER :: i FORALL (i=1:LEN(str)) tmp(i) = str(i:i) END FORALL sig = SUM(ICHAR(tmp)) END FUNCTION sum_string SUBROUTINE put_hash_tbl_sll(tbl,key,val) CLASS(hash_tbl_sll), INTENT(inout) :: tbl CHARACTER(len=*), INTENT(in) :: key, val INTEGER :: hash hash = MOD(sum_string(key),tbl%vec_len) CALL tbl%vec(hash)%put(key=key,val=val) END SUBROUTINE put_hash_tbl_sll SUBROUTINE get_hash_tbl_sll(tbl,key,val) CLASS(hash_tbl_sll), INTENT(in) :: tbl CHARACTER(len=*), INTENT(in) :: key CHARACTER(len=:), ALLOCATABLE, INTENT(out) :: val INTEGER :: hash hash = MOD(sum_string(key),tbl%vec_len) CALL tbl%vec(hash)%get(key=key,val=val) END SUBROUTINE get_hash_tbl_sll SUBROUTINE free_hash_tbl_sll(tbl) CLASS(hash_tbl_sll), INTENT(inout) :: tbl INTEGER :: i, low, high low = LBOUND(tbl%vec,dim=1) high = UBOUND(tbl%vec,dim=1) IF (ALLOCATED(tbl%vec)) THEN DO i=low,high CALL tbl%vec(i)%free() END DO DEALLOCATE(tbl%vec) END IF tbl%is_init = .FALSE. END SUBROUTINE free_hash_tbl_sll END MODULE hashtbl
上面定义的过程对哈希表对象进行操作。由于高度的封装和抽象,实现这些过程相对容易;我们可以使用为单链表定义的方法。
Test Program
这个测试程序使用上面定义的库,并测试其正确性。这不是很彻底,但这是一个体面的开始。
! Test program for module hashtbl implementing an OO hash table (dictionary) ! in Fortran 2003. Compiles and runs with accompanying test program under ! the Intel Fortran Compiler, version 11.1.046 ! Copyright (c) Izaak Beekman 2010 ! This program is free software: you can redistribute it and/or modify ! it under the terms of the GNU Lesser General Public License as published by ! the Free Software Foundation, either version 3 of the License, or ! (at your option) any later version. ! This program is distributed in the hope that it will be useful, ! but WITHOUT ANY WARRANTY; without even the implied warranty of ! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ! GNU Lesser General Public License for more details. ! You should have received a copy of the GNU Lesser General Public License ! along with this program. If not, see <http://www.gnu.org/licenses/>. PROGRAM test_hashtbl USE hashtbl IMPLICIT NONE TYPE(hash_tbl_sll) :: table CHARACTER(len=:), ALLOCATABLE :: out INTEGER, parameter :: tbl_length = 100 INTEGER :: sum, i, rand_int1, rand_int2 ! 4 byte integer, hopefully REAL :: rand CHARACTER(len=4) :: rand_str1, rand_str2 ! each char should be 1 byte PRINT*, ' ' PRINT*, 'This program is free software: you can redistribute it and/or & &modify it under the terms of the GNU Lesser General Public License& & as published by the Free Software Foundation, either version 3 of& & the License, or (at your option) any later version.' PRINT*, ' ' PRINT*, 'This program is distributed in the hope that it will be useful,& & but WITHOUT ANY WARRANTY; without even the implied warranty of& & MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the & &GNU Lesser General Public License for more details.' PRINT*, ' ' PRINT*, 'You should have received a copy of the GNU Lesser General Public & &License along with this program. If not, see & &<http://www.gnu.org/licenses/>.' PRINT*, ' ' CALL table%init(tbl_length) CALL table%put(key='first_name', val='John') PRINT*, 'Hash: ', MOD(sum_string('first_name'),tbl_length) CALL table%put(key='last_name', val='Smith') PRINT*, 'Hash: ', MOD(sum_string('last_name'),tbl_length) CALL table%put(key='birthday', val='July 30, 1964') PRINT*, 'Hash: ', MOD(sum_string('birthday'),tbl_length) CALL table%put(key='hair_color', val='brown') PRINT*, 'Hash: ', MOD(sum_string('hair_color'),tbl_length) CALL table%put(key='eye_color', val='brown') PRINT*, 'Hash: ', MOD(sum_string('eye_color'),tbl_length) CALL table%put(key='weight', val='213 lbs') PRINT*, 'Hash: ', MOD(sum_string('weight'),tbl_length) CALL table%put(key='height', val='6''3"') PRINT*, 'Hash: ', MOD(sum_string('height'),tbl_length) PRINT*, ' ' CALL table%get(key='first_name',val=out) PRINT*, out CALL table%get('last_name',out) PRINT*, out CALL table%get('birthday',out) PRINT*, out CALL table%get('hair_color',out) PRINT*, out CALL table%get('eye_color',out) PRINT*, out CALL table%get('weight',out) PRINT*, out CALL table%get('height',out) PRINT*, out ! INCLUDE 'stress_test.f90' PRINT*, ' ' sum = 0 PRINT*, 'Indices of the hash table with content:' DO i = LBOUND(table%vec,dim=1), UBOUND(table%vec,dim=1) IF (ALLOCATED(table%vec(i)%key)) THEN PRINT*, i sum = sum + 1 END IF END DO PRINT*, 'Total used elements:', sum CALL table%free PRINT*, ' ' STOP 0 END PROGRAM test_hashtbl
上述INCLUDE语句中的压力测试stress_test.f90可以在下面看到。应注意,这些测试失败。我们可以检查此失败的原因是否来自RANDOM_NUMBER(.)函数,该函数重复rand_int1的值与rand_int2的两个不同值。通过当out/=rand_str2时打印table%vec(idx)的节点列表可以看出这一事实,其中idx是rand_str1的哈希索引。
! Included file for test_hashtbl.f90 PRINT*, ' ' sum = 0 PRINT*, 'Mild stress test.' DO i = 1,2000 ! 4byte integers default on most systems CALL RANDOM_NUMBER(rand) rand_int1 = NINT(rand*1000) rand_str1 = TRANSFER(rand_int1,rand_str1) CALL RANDOM_NUMBER(rand) rand_int2 = NINT(rand*1000) rand_str2 = TRANSFER(rand_int2,rand_str2) CALL table%put(key=rand_str1, val=rand_str2) CALL table%get(key=rand_str1,val=out) IF (TRANSFER(out,rand_int2) /= rand_int2) THEN PRINT*, 'Error, i=',i,' key=',rand_int1,' Val=',rand_int2,' & &Out=',TRANSFER(out,rand_int2) sum = sum + 1 END IF END DO PRINT*, 'Number of errors:',sum
可以生成带有配置信息的调用图。这可以在下面看到,但是分析信息是无用的,因为测试程序不够长(即,它没有调用各种子例程的次数达到生成有意义的分析信息所需的次数)。
测试程序和库调用图(丢失)
可以使用以下部分中的构建指令,使用gprof和gprof2dot.py制作此图像。
Build Instructions
下面是一个GNU Make makefile,用于构建库、构建测试程序、剖析库和测试程序、构建调用图以及构建哈希表数据结构示意图。
# Make file for hash table (dictionary) example. Makes a call graph with profiling information and # a schematic of the data structure. # Copyright (c) 2010 Izaak Beekman # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU Lesser General Public License for more details. # You should have received a copy of the GNU Lesser General Public License # along with this program. If not, see <http://www.gnu.org/licenses/>. # Compilers and flags FC = ifort FCFLAGS = -g -traceback -fno-omit-frame-pointer -p -fno-inline -fno-inline-functions #-ipo -O3 LDFLAGS = $(FCFLAGS) #-fast FCSYNCHK = -syntax-only -warn # Build rules COMPILE.f90 = $(FC) $(FCFLAGS) -c $< LINK.f90 = $(FC) $(LDFLAGS) -o $@ $^ # Portability macros RM = rm DOT = dot PROFILER = gprof GPROF2DOT = gprof2dot.py -n0 -e0 #Can download from: http://code.google.com/p/jrfonseca/wiki/Gprof2Dot # Compilation pattern rules, may need to override builtins # For GNU Make MAKE = $(MAKE) -R # to override %.o: %.f90 $(COMPILE.f90) %: %.o $(LINK.f90) %.mod: %.f90 # Just lowercase module names $(FC) $(FCSYNCHK) $< # Patern rule for making PNGs from graphviz dot files %.png: %.dot $(DOT) -Tpng < $< > $@ %.dot: % ./$< $(ARGS) $(PROFILER) ./$< | $(GPROF2DOT) > $@ # Default target, call graph of test_hashtbl test_hashtbl.png: .PRECIOUS: test_hashtbl.dot .PRECIOUS:test_hashtbl test_hashtbl: modhashtbl.o call_graph.dot: test_hashtbl .PHONY: test_hashtbl.mod test_hashtbl.mod: modhashtbl.mod # Build the image of our dictionary/hash-table data type data_struct.png: 88x31-CC-by-sa.png .PHONY: clean clean: -$(RM) -f *.o *.mod
评论
在压力测试案例中,有两个可能的问题:
1. key由RANDOM_NUMBER()生成,因此相同的keys可以被put到哈希表中。重复的keys在本模块中没有得到很好的处理。在子例程put_sll(list, key, val)中,没有代码来处理之前出现的具有相同key的值。
2. 另一个风险是使用transfer函数。尽管这个问题可能不会出现在这个测试用例中,但仔细选择传输类型的大小很重要。此外,我的个人经验表明,在特定的if语句中使用module中的transfer(key, 您的原始key类型) 可能更好。例如,在module中,我们有:
IF (list%key /= key) THEN IF ( .NOT. ASSOCIATED(list%child) ) ALLOCATE(list%child) CALL put_sll(list%child,key,val) END IF
如果你的原始键是整数,那使用下面的代码更好
IF (transfer(list%key, integer) /= transfer(key, integer)) THEN
IF ( .NOT. ASSOCIATED(list%child) ) ALLOCATE(list%child)
CALL put_sll(list%child,key,val)
END IF
2022-11-19 16:46