一种hashtable 实现源码分析

上篇分析了hashtable实现的相关接口与用法，本篇将深入到代码中，分析其代码原理。

其源码有

hash_function.h         ——实现hash函数
hash_table.h             ——hashtable的实现
list.h                         ——hashtable的开链法用到了链表
test_hashtable.c        ——测试

结构定义

用于链接节点的hash_entry

struct hash_entry {
    struct list_head list;
    unsigned char *key;
    unsigned int keylen;
};

用于全局结构的hash_table

struct hash_table {
    struct hash_entry *table;

    unsigned int buckets;
    pthread_mutex_t *bucket_locks;

    pthread_mutex_t lock;
    keycmp_ptr keycmp;
    /* private variables */
    unsigned int __ht_i;
    struct list_head *pos;
};

keycmp_ptr是一个函数指针，其定义为

typedef int (*keycmp_ptr) (const void *, const void *, size_t);

pthread_mutex_t lock 和pthread_mutex_t *bucket_locks，用于互斥访问（该实现宣称是线程安全的），下面的接口函数用于内部函数的互斥用（准确说是在xxx_safe()函数中引用）。互斥对linux的互斥操作进行了简单的封装，直接调用了linux下的函数pthread_mutex_lock 和pthread_mutex_unlock，操作的对象为hash_table中的成员变量 pthread_mutex_t *bucket_locks数组，每个链表都有一个lock来标示。

实现代码如下

View Code

static inline int hash_table_bucket_lock(struct hash_table *t, unsigned int n)
{
    return (pthread_mutex_lock(&(t->bucket_locks[n])));
}

static inline int hash_table_bucket_unlock(struct hash_table *t, unsigned int n)
{
    return (pthread_mutex_unlock(&(t->bucket_locks[n])));
}

static inline int hash_table_lock(struct hash_table *t)
{
    return (pthread_mutex_lock(&(t->lock)));
}

static inline int hash_table_unlock(struct hash_table *t)
{
    return (pthread_mutex_unlock(&(t->lock)));
}

判断是否锁的状态（但是程序里面没有用到）

View Code

static inline int hash_table_bucket_locked(struct hash_table *t, unsigned int n)
{
return (pthread_mutex_trylock((t->bucket_locks[n])) == EBUSY);
}

static inline int hash_table_locked(struct hash_table *t)
{
return (pthread_mutex_trylock(&(t->lock)) == EBUSY);
}

2.初始化与释放

涉及到两个结构的初始化和释放

static inline int hash_entry_init(struct hash_entry *e,
                  const unsigned char *str, unsigned int len);
static inline void hash_entry_finit(struct hash_entry *e);

static inline int hash_table_init(struct hash_table *h, unsigned int b,
                  keycmp_ptr keycmp);
static inline void hash_table_finit(struct hash_table *h);

个人觉得作为接口函数，其前面的static inline修饰应该去掉。明显，entry和table都必须显式的初始化与释放。并且，作为库函数的话，应该实现在.c文件中，而非.h文件中。

源码分析，由于entry和table结构中都是数据指针，所以初始化的过程，就是按照需求分配空间。

在entry中，用key保存申请到的空间头指针，

e->key = (unsigned char *)malloc(len);

在table中，申请有两处，分别是

/**b为用户指定的大小*/
h->table =(struct hash_entry *)malloc(sizeof(struct hash_entry) * b);
h->bucket_locks =(pthread_mutex_t *) malloc(sizeof(pthread_mutex_t) * b));

注：在stl实现中table是用vetor来保存的，不必显式申请空间，也没有bucket_locks进行加锁操作。

回收与此相反，是否空间，再次不表。值得注意的是，其table的释放中，只释放了h->table的空间大小，并没有理会bucket_locks，此处貌似有问题。所以修改为

static inline void hash_table_finit(struct hash_table *h)
{
    if (h->table)
        free(h->table);
    h->buckets = 0;
    if(h->buckets_locks)
        free(h->buckets_locks);
}

3.插入与查找

接口

void hash_table_insert(struct hash_table *h,
　　　　 　　　　　　　　 struct hash_entry *e,
　　　　 　　　　　　　　　const unsigned char *key,
　　　　　　　　　　　　　 unsigned int len);
struct hash_entry *hash_table_lookup_hash_entry(const struct hash_table *h,
　　　　　　　　const struct hash_entry *e)

struct hash_entry *hash_table_lookup_key(const struct hash_table *h,
　　　　　　　　　　　 　　　　　　　　　　　　 const unsigned char *str,
　　　　　　　　　　　　　　　　　　　　　　　　unsigned int len);

struct hash_entry *hash_table_del_key(struct hash_table *h,
　　　　　　　　　　　　　　　　　　　　　　const char *str,
　　　　　　　　　　　　　　　　　　　　　　unsigned int len);

1）insert

A)通过hash_table_hash_code函数定位其在table中的位置。

unsigned int key = hash_table_hash_code(h, str, len);//position in table

B)插入到table[key]的头部

list_add(&(e->list), &(h->table[n].list));//add to the front of List

2）search(look_up)

通过hash_table_hash_code函数定位其在table中的位置

unsigned int key = hash_table_hash_code(h, str, len);//position in table

在通过变量table[k]下的链表寻找到节点后返回

list_for_each(pos, &(h->table[key].list)) {
        tmp = list_entry(pos, struct hash_entry, list);
        if ((tmp->keylen == len)&& (h->keycmp(tmp->key, str, tmp->keylen) == 0))
            return tmp;
    }

非常的常规

3）删除

直接上代码

struct hash_entry *hash_table_del_key(struct hash_table *h, const char *str,unsigned int len)
{
    struct hash_entry *e;
    if ((e = hash_table_lookup_key(h, str, len)) == NULL)
        return NULL;

    list_del_init(&(e->list));
    return e;
}

注意返回的e节点，空间并未释放，如何进行操作应该是用户的责任。

4.遍历

遍历提供了类似于list一样的接口，在前文中已经有所叙述

hash_entry(ptr, type, member);
hash_table_for_each(hentry, htable)
hash_table_for_each_safe(hentry, htable, pos, hti)

hash_entry 与list中的很类似，关于其解释在blog中已经详述

#define hash_entry(ptr, type, member) \
    ((type *)((char *)(ptr)-(unsigned long)(&((type *)0)->member)))

hash_table_for_each用来遍历，由于hashtable的结构特点，其是两层循环的。具体实现如下

#define hash_table_for_each(hentry, htable)    \
    for    ((htable)->__ht_i=0; ((htable)->__ht_i < (htable)->buckets); ++((htable)->__ht_i))    \
        for(((htable)->pos= (htable)->table[(htable)->__ht_i].list.next);        \
                ((htable)->pos != &((htable)->table[(htable)->__ht_i].list)) &&    \
                ((hentry) = ((struct hash_entry *)((char *)((htable)->pos)-(unsigned long)\
                             (&((struct hash_entry *)0)->list))) );    \
                (htable)->pos= (htable)->pos->next)

在此处可以看到hast_table中定义两个private变量（红色）的作用，是为了方便遍历。

红色的可以实现为（亲测可行）

 ((hentry) = hash_entry((htable)->pos,struct hash_entry,list));   \

其实现基于hash_entry。

——————————————————————————————————————————————————————————

5.hash函数

定位的hash

static inline int hash_table_hash_code(const struct hash_table *t,
                       const char *key, unsigned int len)
{

    return (__hash(key, len) % t->buckets);
}

其实现的__hash函数借用的是

http://www.azillionmonkeys.com/qed/hash.html

实现太复杂，看不太明白，有兴趣的请移步上述网址看看。

——————————————————————————————————————————————————————————

6总论

1)该份代码比较简洁，适合初学者，实现也非常简单

2)作为理解hashtable原理的代码，该代码是成功的。

3)今天在水木上问到为什么.h中，那么多static inline，有网友回答引用《C++ primer》里的原话

"To expand the code of an inline function at the point of call, the compiler
must have access to the function definition.
就是说inline函数编译的时候就要展开的，如果放在单独的.c或者.cc文件里，就没办法
做到了"

为曾考证，但是应该有些道理（用在C中）

其实现中static是限制作用于本文件的，但是如果是static包含的话，是否可以当做普通的函数来引用？至少从其测试用例上看是可以的。

因为其实现的初始化添加的static inline限制，但是仍然可以调用。

但是对比STL中的实现，其至少有如下几点注意的。

1）STL中可以指定n，但是其在初始化的时候，并不是用用户指定的n值，而是与n最为接近的质数来初始化桶的大小，该策略据说能够有效的减少冲突的概率。

2）STL当节点数达到一定的个数后，其会重新建立hashtable——rehash过程。

posted @ 2011-04-07 23:02 westfly 阅读(789) 评论(0) 编辑收藏举报

刷新页面返回顶部

westfly

一种hashtable 实现源码分析

公告