Strings

String hashing

Definition

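A hash algorithm maps a string to an integer.

Why do this

Mapping strings to integers can lower the complexity of some operations. For example, checking whether two strings \(s_1\) and \(s_2\) are equal takes \(O(n)\) by brute force, where \(n\) is the length of the strings being compared. If \(s_1,s_2\) have already been mapped to integers \(a_1,a_2\), the comparison itself takes \(O(1)\) (computing each hash still costs \(O(n)\), but only once per string).

Properties of a hash

1. If the hash values differ, the two strings are definitely different;

2. If the hash values are equal, the two strings are not necessarily equal (but they very likely are, and of course we would like them to always be equal).

(From OI Wiki)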
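To make the second property concrete, here is a minimal sketch of a polynomial hash with a deliberately tiny modulus. The function name `poly_hash` and the toy modulus 101 are assumptions chosen only for illustration; real code would use a large modulus such as 1e9+7.

```cpp
#include <cassert>
#include <string>

typedef long long ll;

// Polynomial hash of s: h = h*seed + s[i], reduced modulo mod at every step.
// poly_hash is a hypothetical helper name, not part of any library.
ll poly_hash(const std::string& s, ll seed, ll mod) {
    assert(mod > 0);
    ll h = 0;
    for (unsigned char c : s) h = (h * seed + c) % mod;
    return h;
}

// With seed = 31 and the toy modulus 101:
//   poly_hash("ab", 31, 101) == 75   ((97*31 + 98) % 101)
//   poly_hash("K",  31, 101) == 75   ('K' has ASCII code 75)
// Two different strings, one hash value: a collision.
```

With the modulus 1e9+7 the same two strings get different hash values, which is why a large modulus is used in practice.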

Implementation

Let \(s\) be a string, \(h(s)\) its hash value, and \(n\) the length of \(s\). The hash can then be computed as

\[h(s)=\sum_{i=0}^{n-1}{s[i] \times seed^i} \]

or as

\[h(s)=\sum_{i=0}^{n-1}s[i]\times seed^{n-1-i} \]
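As a small worked example of the second form, assume the characters are taken as their ASCII codes (a = 97, b = 98, c = 99), take \(seed=31\), and ignore the modulus for the moment:

\[h(\text{abc})=97\times 31^2+98\times 31+99=93217+3038+99=96354 \]

The same value comes out when the sum is evaluated from left to right, multiplying the running value by \(seed\) and adding the next character, which is how a loop would compute it:

\[(97\times 31+98)\times 31+99=3105\times 31+99=96354 \]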

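It is easy to see that a hash value computed this way has the properties above. In practice the value is reduced modulo a large prime \(mod\) so that it fits in an integer. The implementation is as follows

Code: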

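#include <string>
using namespace std;
typedef long long ll;
const ll mod=1e9+7,seed=31;
struct hash_s{
    ll hash;
    hash_s():hash(0){}
    bool operator <(const hash_s& x)const{
        return hash<x.hash;
    }
    bool operator ==(const hash_s& x)const{
        return hash==x.hash;
    }
};
// computes the second formula: multiply the running value by seed, add s[i], take mod
hash_s f(const string& s){
    hash_s ret;
    for(size_t i=0;i<s.size();++i){
        ret.hash=(ret.hash*seed+ll(s[i]))%mod;
    }
    return ret;
}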

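Double-modulus hashing

However, a single hash has its limits! So we have no choice but to use two hash values!

Why use double-modulus hashing?

As noted in the properties above, equal hash values only mean that the strings are very likely equal. Two different strings can end up with the same hash value (this is called a hash collision), and depending on the data and the problem, that can make the program give wrong answers. Double-modulus hashing makes the probability of a collision extremely small.

What is double-modulus hashing

As the name suggests, we use two moduli \(mod_1,mod_2\) at the same time and end up with two hash values \(hash_1,hash_2\). Two strings are considered equal only if both hash values match. For an analysis of the collision probability, please head over to OI Wiki ~~it's totally not because I can't do it, hmph~~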
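The effect of the second modulus can be seen on a toy case. The sketch below assumes two deliberately small moduli, 101 and 103, and a hypothetical helper `toy_double_hash`; the pair "ab" and "K" collides under 101 alone but is told apart once the second hash value is compared as well.

```cpp
#include <cassert>
#include <string>
#include <utility>

typedef long long ll;

// Returns (hash mod 101, hash mod 103) for s.
// The tiny moduli are an assumption for illustration only.
std::pair<ll, ll> toy_double_hash(const std::string& s) {
    const ll seed = 31, mod1 = 101, mod2 = 103;
    ll h1 = 0, h2 = 0;
    for (unsigned char c : s) {
        h1 = (h1 * seed + c) % mod1;
        h2 = (h2 * seed + c) % mod2;
    }
    return std::make_pair(h1, h2);
}

// toy_double_hash("ab") == (75, 15)
// toy_double_hash("K")  == (75, 75)
// The first components collide, the second ones do not.
```

Heuristically, if hash values behaved like uniformly random numbers, a given pair of different strings would collide with probability about 1/mod for one modulus and about 1/(mod1*mod2) for two; this is only a rough intuition, not a proof.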

Code:

const ll seed=31,mod1=1e9+7,mod2=1e9+9;
struct hash_s{//the name hash already exists in namespace std......
    ll hash1,hash2;//the two hash values
    hash_s():hash1(0),hash2(0){}
    bool operator <(const hash_s& x)const{
        if(hash1<x.hash1)   return true;
        else if(hash1==x.hash1 && hash2<x.hash2)    return true;
        return false;//don't forget this return false, otherwise it's RE+WA right away
    }
    bool operator ==(const hash_s& x)const{
        return hash1==x.hash1 && hash2==x.hash2;
    }
};
hash_s f(string s){
    hash_s ret;
    ll seed_now1=1,seed_now2=1;
    for(int i=0;i<s.size();++i){
        ret.hash1+=ll(s[i])*seed_now1;
        ret.hash2+=ll(s[i])*seed_now2;
        ret.hash1%=mod1;
        ret.hash2%=mod2;
        seed_now1*=seed;
        seed_now2*=seed;
        seed_now1%=mod1;
        seed_now2%=mod2;
    }
    return ret;
}

Or

hash_s f(string s){//the two versions are different ways of computing the hash values
    hash_s ret;
    for(int i=0;i<s.size();++i){
        ret.hash1=(ret.hash1*seed+ll(s[i]))%mod1;
        ret.hash2=(ret.hash2*seed+ll(s[i]))%mod2;
    }
    return ret;
}
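A typical use of the double hash is counting how many distinct strings appear in a list. The following is a minimal sketch under the assumption that no two different input strings collide in both hash values; `double_hash` and `count_distinct` are hypothetical names, and the hash is stored as a `std::pair` so that sorting and comparison come for free.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

typedef long long ll;

// Double hash of s as a (hash1, hash2) pair, using the same constants as above.
std::pair<ll, ll> double_hash(const std::string& s) {
    const ll seed = 31, mod1 = 1000000007LL, mod2 = 1000000009LL;
    ll h1 = 0, h2 = 0;
    for (unsigned char c : s) {
        h1 = (h1 * seed + c) % mod1;
        h2 = (h2 * seed + c) % mod2;
    }
    return std::make_pair(h1, h2);
}

// Counts distinct strings by sorting the hash pairs and dropping duplicates.
// Assumption: different strings never share both hash values.
int count_distinct(const std::vector<std::string>& words) {
    std::vector<std::pair<ll, ll> > keys;
    for (const std::string& w : words) keys.push_back(double_hash(w));
    std::sort(keys.begin(), keys.end());
    keys.erase(std::unique(keys.begin(), keys.end()), keys.end());
    return (int)keys.size();
}
```

For the list abc, aaaa, abc, abcc, 12345 this gives 4, since abc appears twice.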

Problems

P3370 【模板】字符串哈希 (Template: String Hashing) - Luogu (luogu.com.cn)

Trie

Definition

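A trie is a tree whose edges represent characters. The path between the root and any node \(c\) spells out a string, and the nodes where an inserted string ends are marked.

(Personally, I think putting the characters on the nodes works just as well.)

See the figure below

(a figure only clever people can see)

(okay, go look at the figure on OI Wiki)

This data structure is simple and easy to understand, so I won't go into more detail.

Code: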

struct Trie{
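    // maxn (maximum number of nodes) is assumed to be defined elsewhere; only lowercase a-z is handled
    static const int N=maxn,charset=26;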
    int tot,root,nex[N][charset],flag[N];
    Trie(){
        memset(nex,-1,sizeof(nex));
        memset(flag,0,sizeof(flag));
        root = tot = 0;
    }
    void clear(){
        memset(nex,-1,sizeof(nex));
        memset(flag,0,sizeof(flag));
        root = tot = 0;
    }
    void insert(string s){
        int now=root;
        for(int i=0;i<s.size();++i){
            int x=s[i]-'a';
            if(nex[now][x]==-1){
                nex[now][x]=++tot;
            }
            now=nex[now][x];
        }
        flag[now]=1;
    }
    bool query(string s){
        int now=root;
        for(int i=0;i<s.size();++i){
            int x=s[i]-'a';
            if(nex[now][x]==-1 ){
                return false;
            }
            now=nex[now][x];
        }
        if(flag[now]==1 )   return true;
        return false;
    }
};
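To show how insertion and lookup interact, here is a small self-contained variant of the same idea. It is a sketch, not the template above: the name `VecTrie` is hypothetical, nodes are stored in a `std::vector` that grows on demand instead of a fixed `maxn`-sized array, and only lowercase a-z is assumed.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Vector-backed trie over lowercase letters (hypothetical illustration).
struct VecTrie {
    struct Node {
        std::array<int, 26> next;  // child index per letter, -1 if absent
        bool is_end;               // true if some inserted string ends here
        Node() : is_end(false) { next.fill(-1); }
    };
    std::vector<Node> nodes;

    VecTrie() : nodes(1) {}  // node 0 is the root

    void insert(const std::string& s) {
        int now = 0;
        for (char ch : s) {
            int x = ch - 'a';
            if (nodes[now].next[x] == -1) {
                nodes[now].next[x] = (int)nodes.size();  // index of the new node
                nodes.push_back(Node());
            }
            now = nodes[now].next[x];
        }
        nodes[now].is_end = true;
    }

    // True only if s itself was inserted, not merely a prefix of an inserted string.
    bool query(const std::string& s) const {
        int now = 0;
        for (char ch : s) {
            int x = ch - 'a';
            if (nodes[now].next[x] == -1) return false;
            now = nodes[now].next[x];
        }
        return nodes[now].is_end;
    }
};
```

After inserting "apple" and "app", a query for "app" succeeds, while "ap" (only a prefix) and "apply" (leaves the tree) both fail. This is the role of the end-of-string flag.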

Problems

P2580 于是他错误的点名开始了 (And So His Erroneous Roll Call Began) - Luogu (luogu.com.cn)

KMP

Idea of the algorithm

Yeah, I really can't explain this one, so just go read other blogs. I'll list the template and run

Code:

void kmp_pre(char s0[],int m,int next[]){
    int i,j;
    j=next[0]=-1;
    i=0;
    while(i<m){
        while(-1!=j && s0[i]!=s0[j]) j=next[j];
        next[++i]=++j;
    }
}

int kmp_count(char s0[],int m,char s[],int n){
    int i,j;
    int ans=0;
    int next[10007]; // indices 0..m are used, so the pattern length m must be below 10007
    kmp_pre(s0,m,next);
    i=j=0;
    while(i<n){
        while(-1!=j && s[i]!=s0[j]) j=next[j];
        i++;j++;
        if(j>=m){
            ans++;
            j=next[j];
        }
    }
    return ans;
}
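For readers who want to see what the template computes, here is a sketch of the same two routines on `std::string` and `std::vector`, so that no fixed array size is needed. The names `build_next` and `count_occurrences` are hypothetical, and returning 0 for an empty pattern is a choice made here, not something the template specifies.

```cpp
#include <cassert>
#include <string>
#include <vector>

// next[i] (for i >= 1) is the length of the longest proper prefix of
// pattern[0..i) that is also a suffix of it; next[0] = -1 is a sentinel.
std::vector<int> build_next(const std::string& pattern) {
    int m = (int)pattern.size();
    std::vector<int> next(m + 1);
    int i = 0, j = -1;
    next[0] = -1;
    while (i < m) {
        while (j != -1 && pattern[i] != pattern[j]) j = next[j];
        next[++i] = ++j;
    }
    return next;
}

// Counts occurrences of pattern in text, overlapping ones included.
int count_occurrences(const std::string& pattern, const std::string& text) {
    int m = (int)pattern.size(), n = (int)text.size();
    if (m == 0) return 0;  // assumption: an empty pattern counts as no match
    std::vector<int> next = build_next(pattern);
    int i = 0, j = 0, ans = 0;
    while (i < n) {
        while (j != -1 && text[i] != pattern[j]) j = next[j];
        ++i; ++j;
        if (j >= m) {      // full match ending at text[i-1]
            ++ans;
            j = next[j];   // fall back so overlapping matches are found
        }
    }
    return ans;
}
```

For the pattern "abab" the table is -1, 0, 0, 1, 2, and "aba" occurs twice in "ababa" because the two matches overlap.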

References

字符串部分简介 (Introduction to the Strings Section) - OI Wiki (oi-wiki.org)

String Hashing - Competitive Programming Algorithms (cp-algorithms.com)

posted @ 2021-07-27 15:34 sora_013
