kmp算法-字符串匹配

高效匹配字符串，时间复杂度O(n+m)，主串和子串。

第一步，打印子串公共前缀表，第二歩，匹配主串和子串。

用字符数组text表示主串，pat表示子串，per表示子串前缀表

何为前缀表？

pat 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

a b a b c a b a b a b a b c a b a b c/k

per 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

0 0 1 2 0 1 2 3 4 3 4 3 4 5 6 7 8 9 5/0

举例：

第三个字符为a，它和子串的前缀相匹配的长度为1；第三个字符到第四个字符和子串的前缀相匹配的长度为2；

再看下标为5-8的字符abab和前缀abab相同，所以各下标的和前缀相匹配的字符个数分别为1、2、3、4，但是到了下标为9的时候就断掉了，需要找到下标9的字符的前缀表内容，不看5-6，只看7-10，可以发现7-10的字符和0-3的字符一样，那么到9已经匹配了3个字符，到10已经匹配了4个字符，前缀表内容分别是3，4； 9-12同理；

11-17一连串字符和2-8一样，补位9-10就相当于0-1；为何9-10前缀表内容不是1和2上面已经说过原因了。

模拟求解前缀表代码过程：(代码在后面题目中)

pat 0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18
    a  b  a  b  c  a  b  a  b  a  b   a   b   c   a   b   a   b   c/k

per 0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18
    0  0  1  2  0  1  2  3  4  3  4   3   4   5   6   7   8   9   5/0

i=1;j=0; if(pat[1]=b == pat[0]=a) else j=0;per[1]=0;i=2;
i=2;j=0; if(pat[2]=a == pat[0]=a) per[2]=1;
i=3;j=1; if(pat[3]=b == pat[1]=b) per[3]=2;
i=4;j=2; if(pat[4]=c == pat[2]=a) else j=2; j=per[1]=0;
i=4;j=0; if(pat[4]=c == pat[0]=a) else j=0;per[4]=0;i=5;
i=5;j=0; if(pat[5]=a == pat[0]=a) per[5]=1;
i=6;j=1; if(pat[6]=b == pat[1]=b) per[6]=2;
i=7;j=2; if(pat[7]=a == pat[2]=a) per[7]=3;
i=8;j=3; if(pat[8]=b == pat[3]=b) per[8]=4;
i=9;j=4; if(pat[9]=a == pat[4]=c) else j=4;j=per[3]=2;
///第五个字符匹配不到，就回溯到第四个的下标，第四个已经匹配了2个，直接和第三个比较
i=9;j=2; if(pat[9]=a == pat[2]=a) per[9]=3;
i=10;j=3;if(pat[10]=b== pat[3]=b) per[10]=4;
i=11;j=4;if(pat[11]=a== pat[4]=c) else j=4;j=per[3]=2;
i=11;j=2;if(pat[11]=a== pat[2]=a) per[11]=3;
i=12;j=3;if(pat[12]=a== par[3]=b) per[12]=4;
i=13;j=4;if(pat[13]=c== pat[4]=c) per[13]=5;
i=14;j=5;if(pat[14]=a== pat[5]=a) per[14]=6;
i=15;j=6;if(pat[15]=b== pat[6]=b) per[15]=7;
i=16;j=7;if(pat[16]=a== pat[7]=a) per[16]=8;
i=17;j=8;if(pat[17]=b== pat[8]=b) per[17]=9;

///如果18下是k,
i=18;j=9;if(pat[18]=k== pat[9]=a) else j=9;j=per[8]=4;
///第十个字符匹配不到，就回溯到第九个的下标，第九个已经匹配了4个，直接和第五个比较
i=18;j=4;if(pat[18]=k== pat[4]=c) else j=4;j=per[3]=2;
///第五个字符匹配不到，就回溯到第四个的下标，第四个已经匹配了2个，直接和第三个比较
i=18;j=2;if(pat[18]=k== pat[2]=a) else j=2;j=per[1]=0;
///第三个字符匹配不到，就回溯到第二个的下标，第二个已经匹配了0个，直接和第一个比较
i=18;j=0;if(pat[18]=k== pat[0]=a) else j=0;per[18]=0;

///如果18下是c
i=18;j=9;if(pat[18]=c== pat[9]=a) else j=9;j=per[8]=4;
///第十个字符匹配不到，就回溯到第九个的下标，第九个已经匹配了4个，直接和第五个比较
i=18;j=4;if(pat[18]=c== pat[4]=c) else j=4;j=per[3]=2;
i=19;j=5;

View Code

hdu2087

剪花布条

Problem Description

一块花布条，里面有些图案，另有一块直接可用的小饰条，里面也有一些图案。对于给定的花布条和小饰条，计算一下能从花布条中尽可能剪出几块小饰条来呢？

Input

输入中含有一些数据，分别是成对出现的花布条和小饰条，其布条都是用可见ASCII字符表示的，可见的ASCII字符有多少个，布条的花纹也有多少种花样。花纹条和小饰条不会超过1000个字符长。如果遇见#字符，则不再进行工作。

Output

输出能从花纹布中剪出的最多小饰条个数，如果一块都没有，那就老老实实输出0，每个结果之间应换行。

Sample Input

abcde a3

aaaaaa aa

Sample Output

AC代码：

#include<stdio.h>
#include<iostream>
#include<string.h>
#include<cstring>
using namespace std;
char text[1005];
char pat[1005];
int per[1005];
int lena,lenb;

void kmp()
{
    lena=strlen(text);
    lenb=strlen(pat);
    per[0]=0;
    int i=1,j=0,num=0;
    ///两个指针，i表示遍历子串当前的下标位置
    ///j表示从第一个开始到第几个相同
    while(i<lenb)///前缀表，从0开始的
    {
        if(pat[i]==pat[j])
        {
            per[i]=j+1;///第几个 j从0开始，下标少了1，j和我们所说的第几个相差1
            i++;j++;
        }
        else
        {
            if(j>=1)
                j=per[j-1];///第j+1个匹配不到，回溯到第j个已匹配的个数，和下一个比较

            else ///此时j=0进入while循环
            {
                per[i]=0;///连第一个公共前缀都不相同，那就直接置0了
                i++;
            }
        }
    }

    /*for(int k=0;k<lenb;k++)///打印验证前缀表
        printf("per[%d]=%d\n",k,per[k]);*/

    i=j=0;
    while(i<lena)
    {
        if(text[i]==pat[j])
        {
            i++;j++;
        }
        else
        {
            if(j==0) i++;///如果回溯到0，i要加1
            else j=per[j-1];
        }
        if(j==lenb)
        {
            num++;
            j=0;        ///不重复j=0，重复j=per[j-1]
        }
    }
    printf("%d\n",num);
}


int main()
{
    while(scanf("%s",text)!=EOF)
    {
        if(text[0]=='#') break;
        scanf("%s",pat);
        kmp();
    }
    return 0;
}

hdu2087

hdu1686

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Sample Output

1
3
0
AC代码：

#include<stdio.h>
#include<iostream>
#include<string>
#include<string.h>
#include<cstring>
using namespace std;

char text[1000005];
char pat[10005];
int per[10005];
int lena,lenb;

void kmp()
{
    lena=strlen(text);
    lenb=strlen(pat);
    per[0]=0;
    int i=1,j=0,num=0;

    while(i<lenb)///前缀表，从0开始的
    {
        if(pat[i]==pat[j])
        {
            per[i]=j+1;///第几个 下标少了1
            i++;j++;
        }
        else
        {
            if(j>=1) j=per[j-1];///第j+1个匹配不到，回溯到第j个已匹配的个数，和下一个比较
            else
            {
                per[i]=0;///j=0，连第一个公共前缀都不相同，那就直接置0了
                i++;
            }
        }
    }
    /*for(int k=0;k<lenb;k++)///打印验证前缀表
        printf("per[%d]=%d\n",k,per[k]);*/
    i=j=0;
    while(i<lena)
    {
        if(text[i]==pat[j])
        {
            i++;j++;
        }
        else
        {
            if(j==0) i++;///如果回溯到0，i要加1
            else j=per[j-1];   ///否则回溯到第j个已匹配的个数，和下一个比较
        }
        if(j==lenb)
        {
            num++;
            j=per[j-1];        ///不重复j=0，重复j=per[j-1]
        }
    }
    printf("%d\n",num);
}
int main()
{
    int t;
    scanf("%d",&t);
    getchar();
    while(t--)
    {
        gets(pat);
        gets(text);
        kmp();
    }
    return 0;
}

View Code

匹配模拟过程：

区分重叠和不重叠的字符匹配
text 0  1  2  3  4  5  6  7  8
     a  z  a  z  a  z  a
pat  0  1  2 
     a  z  a
per  0  0  1

不重叠的
i=0;j=0; if(text[0]=a == pat[0]=a) i++;j++;
i=1;j=1; if(text[1]=z == pat[1]=z) i++;j++;
i=2;j=2; if(text[2]=a == pat[2]=a) i++;j++; if(j==3) num=1;j=0;
i=3;j=0; if(text[3]=z == pat[0]=a) else j=0;i++;
i=4;j=0; if(text[4]=a == pat[0]=a) i++;j++;
i=5;j=1; if(text[5]=z == pat[1]=z) i++;j++;
i=6;j=2; if(text[6]=a == pat[2]=a) i++;j++; if(j==3) num=2;j=0;
break; 

重叠的
i=0;j=0; if(text[0]=a == pat[0]=a) i++;j++;
i=1;j=1; if(text[1]=z == pat[1]=z) i++;j++;
i=2;j=2; if(text[2]=a == pat[2]=a) i++;j++; if(j==3) num=1;j=per[2]=1;
///因为第三个字符比较后相同，i和j都自增了，回溯时候的per[j-1]=per[2]=1;
i=3;j=1; if(text[3]=z == pat[1]=z) i++;j++;
i=4;j=2; if(text[4]=a == pat[2]=a) i++;j++; if(j==3) num=2;j=per[2]=1;
i=5;j=1; if(text[5]=z == pat[1]=z) i++;j++;
i=6;j=2; if(text[6]=a == pat[2]=a) i++;j++j if(j==3) num=3;j=per[2]=1;
break;

View Code

posted @ 2018-08-22 02:36 守林鸟阅读(378) 评论(0) 编辑收藏举报

刷新页面返回顶部

守林鸟

kmp算法-字符串匹配

剪花布条

公告