Fortran哈希表

哈希表

参考自https://fortranwiki.org/fortran/show/Hash+tables

哈希表(Hash tables)通常被用于构建这样一个指针数组,数组中的每一个元素指向一个动态的数据类型(如链表或二叉树)。它需要一些哈希函数,用以将相同的关键字映射到指针数组中同样的位置中。

接下来,遍历动态数据类型,或搜索相应的键。如果键存在,则可以检索或覆盖该值。如果键不存在,那么,如果正在查询哈希表,get方法必须指示键值对不存在。如果正在调用put方法,则如果之后的动态数据类型中缺少键值对,则必须将其适当插入。 The Practice of Programming 的第55页有一个很好的介绍性讨论。使用哈希表和适当大的链表数组可以得到O(1)复杂度的查找和放置操作。

 

哈希表例子

参考https://fortranwiki.org/fortran/show/hash+table+example

有这样一个基于Fortran 2003的 面向对象的 哈希表的模块实现。这个模块是基于LGPL证书授权的。这个模块可以通过使transfer用函数,扩展成泛型化的,或者通过类扩展(type extension),变得专门化。这写方法可以通过类绑定过程被附加到对象中。额外的PRIVATE声明可以被插入或者被取消初始掉从而用于产品代码;最初始的对象和方法被保留在这里用于测试。这个库和对应的测试程序被用Intel Fortran compiler, version 11.1.046进行编译,程序和库看起来是正确的,但未进行彻底的测试。

代码

Library

以下库是这个库是根据The Practice of Programming中对哈希表数据结构的描述而创建的。

! Module implementing an OO hash table (dictionary) in Fortran 2003.
! Compiles and runs with accompanying test program under the Intel 
! Fortran Compiler, version 11.1.046

! Copyright (c) Izaak Beekman 2010

    ! This program is free software: you can redistribute it and/or modify
    ! it under the terms of the GNU Lesser General Public License as published by
    ! the Free Software Foundation, either version 3 of the License, or
    ! (at your option) any later version.

    ! This program is distributed in the hope that it will be useful,
    ! but WITHOUT ANY WARRANTY; without even the implied warranty of
    ! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    ! GNU Lesser General Public License for more details.

    ! You should have received a copy of the GNU Lesser General Public License
    ! along with this program.  If not, see <http://www.gnu.org/licenses/>.

MODULE hashtbl
  IMPLICIT NONE ! Use strong typing
  INTEGER, PARAMETER :: tbl_size = 50

  TYPE sllist
     TYPE(sllist), POINTER :: child => NULL()
     CHARACTER(len=:), ALLOCATABLE :: key, val
   CONTAINS
     PROCEDURE :: put  => put_sll
     PROCEDURE :: get  => get_sll
     PROCEDURE :: free => free_sll
  END TYPE sllist

  TYPE hash_tbl_sll
     TYPE(sllist), DIMENSION(:), ALLOCATABLE :: vec
     INTEGER                                 :: vec_len = 0
     LOGICAL                                 :: is_init = .FALSE.
   CONTAINS
     PROCEDURE :: init => init_hash_tbl_sll
     PROCEDURE :: put  => put_hash_tbl_sll
     PROCEDURE :: get  => get_hash_tbl_sll
     PROCEDURE :: free => free_hash_tbl_sll
  END TYPE hash_tbl_sll

  PUBLIC :: hash_tbl_sll

在这里我们可以发现哈希表是由两种派生类型构建的,以及每一个数据对象具有的类型绑定过程(方法)(注意:销毁Finalization也应该添加到这些对象中,但我不确定截至2010年6月5日编译器是否支持此功能。)。

第一个对象sllist是一个单链表,是一个可分配的标量CHARACTER类型的键和值。(字符串长度可以在运行时分配。)这个类型有3种方法:1.put 将键值对放入(必要时创建)适当的列表元素。2.get 获取与键对应的字符串(值),如果键值对存在。3.free 完整的销毁链接列表。
第二个对象是哈希表。它包含存储一些元数据的元素,如向量长度(对哈希键很重要)和对象的状态,以及单链表数组vec。与此对象相关的方法有:1.init,通过将vec分配为一定长度来初始化对象,并跟踪对象状态和元数据。2.put 放置以哈希键,并将值存储在适当的链接列表的适当元素中,必要时创建它。3.get 如果键值对存在,则获取哈希key并检索关联值。4.free 销毁哈希表,释放内存并跟踪元数据和对象状态。

下面是一个实例化的哈希表对象的示意图,如在这个库中实现的:

具有单链接列表元素的哈希表(图丢失)

 

模块的其余部分,包括以下所有过程。

CONTAINS

  RECURSIVE SUBROUTINE put_sll(list,key,val)
    CLASS(sllist),    INTENT(inout) :: list
    CHARACTER(len=*), INTENT(in)    :: key, val
    INTEGER                         :: keylen, vallen

    keylen = LEN(key)
    vallen = LEN(val)
    IF (ALLOCATED(list%key)) THEN
       IF (list%key /= key) THEN
          IF ( .NOT. ASSOCIATED(list%child) ) ALLOCATE(list%child)
          CALL put_sll(list%child,key,val)
       END IF
    ELSE
       IF (.NOT. ALLOCATED(list%key)) &
            ALLOCATE(CHARACTER(len=keylen) :: list%key)
       list%key = key
       IF (ALLOCATED(list%val)) DEALLOCATE(list%val)
       ALLOCATE(CHARACTER(len=vallen) :: list%val)
       list%val = val
    END IF
  END SUBROUTINE put_sll


  RECURSIVE SUBROUTINE get_sll(list,key,val)
    CLASS(sllist),                 INTENT(in)    :: list
    CHARACTER(len=*),              INTENT(in)    :: key
    CHARACTER(len=:), ALLOCATABLE, INTENT(out)   :: val
    INTEGER                                      :: vallen

    vallen = 0
    IF (ALLOCATED(list%key) .AND. (list%key == key)) THEN
       vallen = LEN(list%val)
       IF (ALLOCATED(val)) DEALLOCATE(val)
       ALLOCATE(CHARACTER(len=vallen) :: val)
       val = list%val
    ELSE IF(ASSOCIATED(list%child)) THEN ! keep going
       CALL get_sll(list%child,key,val)
    ELSE ! At the end of the list, no key found
       IF (ALLOCATED(val)) DEALLOCATE(val) ! Exit indication
       RETURN
    END IF
  END SUBROUTINE get_sll


  RECURSIVE SUBROUTINE free_sll(list)
    CLASS(sllist), INTENT(inout) :: list
    IF (ASSOCIATED(list%child)) THEN
       CALL free_sll(list%child)
       DEALLOCATE(list%child)
    END IF
    list%child => NULL()
    IF (ALLOCATED(list%key)) DEALLOCATE(list%key)
    IF (ALLOCATED(list%val)) DEALLOCATE(list%val)
  END SUBROUTINE free_sll

上述过程被绑定到单链表对象。第一个虚参是它绑定到的对象,每当通过对象调用过程时都会自动传递。由于这是一个单链表,我们只能在一个方向上遍历列表,并且我们可以使用直接递归相对简洁地定义与列表相关联的方法。 

 SUBROUTINE init_hash_tbl_sll(tbl,tbl_len)
    CLASS(hash_tbl_sll),   INTENT(inout) :: tbl
    INTEGER,     OPTIONAL, INTENT(in)    :: tbl_len

    IF (ALLOCATED(tbl%vec)) DEALLOCATE(tbl%vec)
    IF (PRESENT(tbl_len)) THEN
       ALLOCATE(tbl%vec(0:tbl_len-1))
       tbl%vec_len = tbl_len
    ELSE
       ALLOCATE(tbl%vec(0:tbl_size-1))
       tbl%vec_len = tbl_size
    END IF
    tbl%is_init = .TRUE.
  END SUBROUTINE init_hash_tbl_sll

  ! The first part of the hashing procedure using the string
  ! collating sequence
  ELEMENTAL FUNCTION sum_string(str) RESULT(sig)
    CHARACTER(len=*), INTENT(in)   :: str
    INTEGER                        :: sig
    CHARACTER, DIMENSION(LEN(str)) :: tmp
    INTEGER :: i

    FORALL (i=1:LEN(str))
       tmp(i) = str(i:i)
    END FORALL
    sig = SUM(ICHAR(tmp))
  END FUNCTION sum_string


  SUBROUTINE put_hash_tbl_sll(tbl,key,val)
    CLASS(hash_tbl_sll), INTENT(inout) :: tbl
    CHARACTER(len=*),    INTENT(in)    :: key, val
    INTEGER                            :: hash

    hash = MOD(sum_string(key),tbl%vec_len)
    CALL tbl%vec(hash)%put(key=key,val=val)
  END SUBROUTINE put_hash_tbl_sll


  SUBROUTINE get_hash_tbl_sll(tbl,key,val)
    CLASS(hash_tbl_sll),           INTENT(in)    :: tbl
    CHARACTER(len=*),              INTENT(in)    :: key
    CHARACTER(len=:), ALLOCATABLE, INTENT(out)   :: val
    INTEGER                                      :: hash

    hash = MOD(sum_string(key),tbl%vec_len)
    CALL tbl%vec(hash)%get(key=key,val=val)
  END SUBROUTINE get_hash_tbl_sll


  SUBROUTINE free_hash_tbl_sll(tbl)
    CLASS(hash_tbl_sll), INTENT(inout) :: tbl    
    INTEGER     :: i, low, high

    low  = LBOUND(tbl%vec,dim=1)
    high = UBOUND(tbl%vec,dim=1) 
    IF (ALLOCATED(tbl%vec)) THEN
       DO i=low,high
          CALL tbl%vec(i)%free()
       END DO
       DEALLOCATE(tbl%vec)
    END IF
    tbl%is_init = .FALSE.
  END SUBROUTINE free_hash_tbl_sll

END MODULE hashtbl

上面定义的过程对哈希表对象进行操作。由于高度的封装和抽象,实现这些过程相对容易;我们可以使用为单链表定义的方法。

 

Test Program

这个测试程序使用上面定义的库,并测试其正确性。这不是很彻底,但这是一个体面的开始。 

! Test program for module hashtbl implementing an OO hash table (dictionary) 
! in Fortran 2003. Compiles and runs with accompanying test program under 
! the Intel Fortran Compiler, version 11.1.046

! Copyright (c) Izaak Beekman 2010

    ! This program is free software: you can redistribute it and/or modify
    ! it under the terms of the GNU Lesser General Public License as published by
    ! the Free Software Foundation, either version 3 of the License, or
    ! (at your option) any later version.

    ! This program is distributed in the hope that it will be useful,
    ! but WITHOUT ANY WARRANTY; without even the implied warranty of
    ! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    ! GNU Lesser General Public License for more details.

    ! You should have received a copy of the GNU Lesser General Public License
    ! along with this program.  If not, see <http://www.gnu.org/licenses/>.

PROGRAM test_hashtbl
  USE hashtbl
  IMPLICIT NONE
  TYPE(hash_tbl_sll)            :: table
  CHARACTER(len=:), ALLOCATABLE :: out
  INTEGER, parameter :: tbl_length = 100
  INTEGER            :: sum, i, rand_int1, rand_int2 ! 4 byte integer, hopefully
  REAL               :: rand
  CHARACTER(len=4)   :: rand_str1, rand_str2 ! each char should be 1 byte

  PRINT*, ' '
  PRINT*, 'This program is free software: you can redistribute it and/or &
       &modify it under the terms of the GNU Lesser General Public License&
       & as published by the Free Software Foundation, either version 3 of&
       & the License, or (at your option) any later version.'
  PRINT*, ' '
  PRINT*, 'This program is distributed in the hope that it will be useful,&
       & but WITHOUT ANY WARRANTY; without even the implied warranty of&
       & MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the &
       &GNU Lesser General Public License for more details.'
  PRINT*, ' '
  PRINT*, 'You should have received a copy of the GNU Lesser General Public &
       &License along with this program.  If not, see &
       &<http://www.gnu.org/licenses/>.'
  PRINT*, ' '

  CALL table%init(tbl_length)
  CALL table%put(key='first_name', val='John')
  PRINT*, 'Hash: ', MOD(sum_string('first_name'),tbl_length)
  CALL table%put(key='last_name', val='Smith')
  PRINT*, 'Hash: ', MOD(sum_string('last_name'),tbl_length)
  CALL table%put(key='birthday', val='July 30, 1964')
  PRINT*, 'Hash: ', MOD(sum_string('birthday'),tbl_length)
  CALL table%put(key='hair_color', val='brown')
  PRINT*, 'Hash: ', MOD(sum_string('hair_color'),tbl_length)
  CALL table%put(key='eye_color', val='brown')
  PRINT*, 'Hash: ', MOD(sum_string('eye_color'),tbl_length)
  CALL table%put(key='weight', val='213 lbs')
  PRINT*, 'Hash: ', MOD(sum_string('weight'),tbl_length)
  CALL table%put(key='height', val='6''3"')
  PRINT*, 'Hash: ', MOD(sum_string('height'),tbl_length)

  PRINT*, ' ' 

  CALL table%get(key='first_name',val=out)
  PRINT*, out
  CALL table%get('last_name',out)
  PRINT*, out
  CALL table%get('birthday',out)
  PRINT*, out
  CALL table%get('hair_color',out)
  PRINT*, out
  CALL table%get('eye_color',out)
  PRINT*, out
  CALL table%get('weight',out)
  PRINT*, out
  CALL table%get('height',out)
  PRINT*, out

  ! INCLUDE 'stress_test.f90'

  PRINT*, ' '

  sum = 0
  PRINT*, 'Indices of the hash table with content:'
  DO i = LBOUND(table%vec,dim=1), UBOUND(table%vec,dim=1)
     IF (ALLOCATED(table%vec(i)%key)) THEN 
        PRINT*, i
        sum = sum + 1
     END IF
  END DO
  PRINT*, 'Total used elements:', sum
  CALL table%free
  PRINT*, ' '
  STOP 0
END PROGRAM test_hashtbl

上述INCLUDE语句中的压力测试stress_test.f90可以在下面看到。应注意,这些测试失败。我们可以检查此失败的原因是否来自RANDOM_NUMBER(.)函数,该函数重复rand_int1的值与rand_int2的两个不同值。通过当out/=rand_str2时打印table%vec(idx)的节点列表可以看出这一事实,其中idxrand_str1的哈希索引。

 ! Included file for test_hashtbl.f90
  PRINT*, ' '

  sum = 0
  PRINT*, 'Mild stress test.'
  DO i = 1,2000 ! 4byte integers default on most systems
     CALL RANDOM_NUMBER(rand)
     rand_int1 = NINT(rand*1000)
     rand_str1 = TRANSFER(rand_int1,rand_str1)
     CALL RANDOM_NUMBER(rand)
     rand_int2 = NINT(rand*1000)
     rand_str2 = TRANSFER(rand_int2,rand_str2)
     CALL table%put(key=rand_str1, val=rand_str2)
     CALL table%get(key=rand_str1,val=out)
     IF (TRANSFER(out,rand_int2) /= rand_int2) THEN
        PRINT*, 'Error, i=',i,' key=',rand_int1,' Val=',rand_int2,' &
             &Out=',TRANSFER(out,rand_int2)
        sum = sum + 1
     END IF     
  END DO
  PRINT*, 'Number of errors:',sum

可以生成带有配置信息的调用图。这可以在下面看到,但是分析信息是无用的,因为测试程序不够长(即,它没有调用各种子例程的次数达到生成有意义的分析信息所需的次数)。

测试程序和库调用图(丢失)

可以使用以下部分中的构建指令,使用gprof和gprof2dot.py制作此图像。

 

Build Instructions

下面是一个GNU Make makefile,用于构建库、构建测试程序、剖析库和测试程序、构建调用图以及构建哈希表数据结构示意图。

# Make file for hash table (dictionary) example.  Makes a call graph with profiling information and
# a schematic of the data structure.

# Copyright (c) 2010 Izaak Beekman

    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU Lesser General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.

    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU Lesser General Public License for more details.

    # You should have received a copy of the GNU Lesser General Public License
    # along with this program.  If not, see <http://www.gnu.org/licenses/>.


# Compilers and flags
FC = ifort
FCFLAGS = -g -traceback -fno-omit-frame-pointer -p -fno-inline -fno-inline-functions #-ipo -O3
LDFLAGS = $(FCFLAGS) #-fast

FCSYNCHK = -syntax-only -warn

# Build rules
COMPILE.f90 = $(FC) $(FCFLAGS) -c $<

LINK.f90    = $(FC) $(LDFLAGS) -o $@ $^


# Portability macros
RM = rm
DOT = dot
PROFILER = gprof
GPROF2DOT = gprof2dot.py -n0 -e0 #Can download from: http://code.google.com/p/jrfonseca/wiki/Gprof2Dot

# Compilation pattern rules, may need to override builtins
# For GNU Make
MAKE = $(MAKE) -R
# to override

%.o: %.f90
    $(COMPILE.f90)

%: %.o
    $(LINK.f90)

%.mod: %.f90 # Just lowercase module names
    $(FC) $(FCSYNCHK) $<


# Patern rule for making PNGs from graphviz dot files
%.png: %.dot
    $(DOT) -Tpng < $< > $@

%.dot: %
    ./$<  $(ARGS)
    $(PROFILER) ./$< | $(GPROF2DOT) > $@


# Default target, call graph of test_hashtbl
test_hashtbl.png:
.PRECIOUS: test_hashtbl.dot

.PRECIOUS:test_hashtbl
test_hashtbl: modhashtbl.o

call_graph.dot: test_hashtbl


.PHONY: test_hashtbl.mod
test_hashtbl.mod: modhashtbl.mod

# Build the image of our dictionary/hash-table data type
data_struct.png: 88x31-CC-by-sa.png


.PHONY: clean
clean:
    -$(RM) -f *.o *.mod

 

评论

在压力测试案例中,有两个可能的问题:
1. key由RANDOM_NUMBER()生成,因此相同的keys可以被put到哈希表中。重复的keys在本模块中没有得到很好的处理。在子例程put_sll(list, key, val)中,没有代码来处理之前出现的具有相同key的值。
2. 另一个风险是使用transfer函数。尽管这个问题可能不会出现在这个测试用例中,但仔细选择传输类型的大小很重要。此外,我的个人经验表明,在特定的if语句中使用module中的transfer(key, 您的原始key类型) 可能更好。例如,在module中,我们有:

IF (list%key /= key) THEN
 IF ( .NOT. ASSOCIATED(list%child) ) ALLOCATE(list%child)
 CALL put_sll(list%child,key,val)
END IF

如果你的原始键是整数,那使用下面的代码更好

IF (transfer(list%key, integer) /= transfer(key, integer)) THEN
 IF ( .NOT. ASSOCIATED(list%child) ) ALLOCATE(list%child)
 CALL put_sll(list%child,key,val)
END IF

 

2022-11-19 16:46

posted @ 2024-07-13 01:06  chinagod  阅读(22)  评论(0编辑  收藏  举报