[Study notes] A simple way to remember KMP

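The pattern is matched from left to right, using an auxiliary array named next.

next[j] says what to do when the character at pattern position j fails to match: comparison resumes at pattern position next[j], so the pattern shifts right by j - next[j] positions. The pointer into the text (the main string) never moves backwards.
To build next, the pattern is first matched against itself. For each position, find the longest prefix of the pattern that matches the characters ending just before that position. The search starts at the previous position, so the current character itself is not included.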

Original	a	b	a
Shift by 1: no match		a	b	a
Shift by 2: "a" matches			a	b	a

The number of matched characters is written into the next entry of the following character.
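The same rule can be written as a formula. The notation here is chosen for this sketch and does not come from the post: P is the pattern, m is its length, and P[a..b] is the run of characters from a to b inclusive (empty when b < a, so k = 0 always qualifies).

```latex
\mathrm{next}[0] = -1, \qquad
\mathrm{next}[j] = \max\{\, k \mid 0 \le k < j,\ P[0..k-1] = P[j-k..j-1] \,\} \quad (1 \le j < m)

\text{shift after a mismatch at } P[j] = j - \mathrm{next}[j]
```

For example, when "aba" is followed by a mismatch at j = 3, next[3] = 1, so the pattern shifts by 3 - 1 = 2. This is the two-position shift in the illustration above.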

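src	a	b	a	a	b	c	a	c
Next0	-1
Cur0	*	no matching prefix; write 0 into the next position's next
Next1	-1	0
Cur1		*	no matching prefix; write 0 into the next position's next
Next2	-1	0	0
Cur2			a	matching prefix "a"; write 1 into the next position's next
Next3	-1	0	0	1
Cur3				a	matching prefix "a"; write 1 into the next position's next
Next4	-1	0	0	1	1
Cur4				a	b	matching prefix "ab"; write 2 into the next position's next
Next5	-1	0	0	1	1	2
Cur5						*	no match; write 0
Next6	-1	0	0	1	1	2	0
Cur6							a	1 character matches; write 1
Next7	-1	0	0	1	1	2	0	1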
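To double-check a table like this one, next can be computed by brute force directly from the definition. This is a minimal sketch assuming C and int lengths; the helper name naive_next is made up for illustration. It runs in O(m^3) time and is only meant for verifying small tables.

```c
#include <assert.h>
#include <string.h>

// Hypothetical helper: brute-force next[] straight from the definition.
// For each position j, try the longest candidate prefix length k (< j) first
// and shrink it until substr[0..k-1] equals substr[j-k..j-1].
void naive_next(const char *substr, int substr_length, int next[])
{
    assert(substr_length > 0);
    next[0] = -1;                          // nothing precedes position 0
    for (int j = 1; j < substr_length; ++j) {
        int k = j - 1;                     // longest proper prefix of substr[0..j-1]
        while (k > 0 && memcmp(substr, substr + j - k, (size_t)k) != 0)
            --k;                           // prefix != suffix: try a shorter one
        next[j] = k;                       // k == 0: no matching prefix (the * rows)
    }
}
```

Calling naive_next("abaabcac", 8, next) should reproduce the last row of the table.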

Code

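int start = 0;     // position in the pattern; next[start + 1] is filled next
int match = -1;    // prefix position compared with substr[start]; -1 = no prefix
next[0] = -1;
while( start < substr_length - 1 )
{
    if( match == -1 || substr[start] == substr[match] )
    {
        ++start;
        ++match;
        next[start] = match;   // a prefix of length match ends just before start
    }
    else
        match = next[match];   // fall back to a shorter prefix (shift the pattern)
}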
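Here is the loop above wrapped in a function so it can be compiled and tried on its own. The function name build_next and the int length parameter are assumptions of this sketch; the loop body is the one from the post.

```c
#include <assert.h>

// Sketch: the post's next[] loop as a function (name build_next is assumed).
void build_next(const char *substr, int substr_length, int next[])
{
    assert(substr_length > 0);
    int start = 0;   // position whose successor's next[] is filled next
    int match = -1;  // prefix position compared with substr[start]; -1 = none
    next[0] = -1;
    while (start < substr_length - 1) {
        if (match == -1 || substr[start] == substr[match]) {
            ++start;
            ++match;
            next[start] = match;   // match characters of prefix end before start
        } else {
            match = next[match];   // fall back to a shorter prefix
        }
    }
}
```

For "abaabcac" it fills next with -1 0 0 1 1 2 0 1, the same values as the table.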

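match is the position in the pattern of the prefix character currently compared with substr[start]. substr[0..match-1] already matches the characters just before start, so once the comparison succeeds, match + 1 characters are matched.

substr[start] == substr[match] means a matching prefix has been found: it grows by one character.

match == -1 means the self-test has to start over, because no previously found prefix can be extended. These are the * entries in the example above.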
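To see where the match == -1 branch fires, the loop can be instrumented. This sketch uses a hypothetical helper, next_restarts, that records each start position at which the self-test restarts. The caller must supply a restarts array with room for substr_length entries.

```c
#include <assert.h>

// Hypothetical helper: the post's next[] loop, plus a record of every start
// position where the match == -1 branch is taken (a "*" row in the table).
// Returns how many restarts were recorded.
int next_restarts(const char *substr, int substr_length, int next[], int restarts[])
{
    assert(substr_length > 0);
    int start = 0, match = -1, count = 0;
    next[0] = -1;
    while (start < substr_length - 1) {
        if (match == -1 || substr[start] == substr[match]) {
            if (match == -1)
                restarts[count++] = start;   // no prefix to extend: start over
            ++start;
            ++match;
            next[start] = match;
        } else {
            match = next[match];             // try a shorter prefix first
        }
    }
    return count;
}
```

For "abaabcac" it reports positions 0, 1 and 5. These are exactly the * rows (Cur0, Cur1, Cur5) of the table.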

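Improved next (often called nextval)

In short: when the pattern is matched against itself, also check whether the following character equals the prefix character it would fall back to. If the two are equal, store that prefix character's next instead of pointing at the prefix character. Falling back to an identical character would only fail again.

As a result, after a mismatch the pattern moves directly to the furthest useful position, instead of stepping through positions that are bound to fail.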
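The benefit shows up as fewer character comparisons during the search. This sketch builds both tables with the post's two loops and runs the same search with a comparison counter. The helper names plain_next, improved_next and kmp_count are invented here, and the search starts with match = 0.

```c
#include <assert.h>

// The post's first next[] loop.
static void plain_next(const char *p, int m, int next[])
{
    int start = 0, match = -1;
    next[0] = -1;
    while (start < m - 1) {
        if (match == -1 || p[start] == p[match]) {
            ++start;
            ++match;
            next[start] = match;
        } else {
            match = next[match];
        }
    }
}

// The post's improved loop: reuse next[match] when the characters agree.
static void improved_next(const char *p, int m, int next[])
{
    int start = 0, match = -1;
    next[0] = -1;
    while (start < m - 1) {
        if (match == -1 || p[start] == p[match]) {
            ++start;
            ++match;
            next[start] = (p[start] != p[match]) ? match : next[match];
        } else {
            match = next[match];
        }
    }
}

// KMP search that also counts text/pattern character comparisons.
// Returns the first match index or -1; *comparisons receives the count.
static int kmp_count(const char *data, int n, const char *p, int m,
                     const int next[], int *comparisons)
{
    assert(m > 0);
    int start = 0, match = 0, cmp = 0;
    while (start < n && match < m) {
        if (match != -1)
            ++cmp;                       // data[start] vs p[match] is evaluated
        if (match == -1 || data[start] == p[match]) {
            ++start;
            ++match;
        } else {
            match = next[match];         // shift the pattern; start stays put
        }
    }
    *comparisons = cmp;
    return match == m ? start - match : -1;
}
```

With pattern "aaaab" and text "aaabaaaab", the mismatch at pattern position 3 makes the plain table fall back through positions 2, 1 and 0. Each of those is compared against the same 'b'. The improved table jumps straight to -1 instead. Both searches return 4; by a hand trace the plain table needs 12 comparisons and the improved one needs 9.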

Computing the improved next

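src	a	b	a	a	b	c	a	c
Next0	-1
Cur0	*	no matching prefix; compare the next character with the next prefix character: b != a, so use the original rule
Next1	-1	0
Cur1		*	no matching prefix; compare: a == a, so copy next (next[2] = next[0] = -1)
Next2	-1	0	-1
Cur2			a	matching prefix "a"; compare: a != b, so write 1 by the original rule
Next3	-1	0	-1	1
Cur3				a	matching prefix "a"; compare: b == b, so copy next (next[4] = next[1] = 0)
Next4	-1	0	-1	1	0
Cur4				a	b	matching prefix "ab"; compare: c != a, so write 2 by the original rule
Next5	-1	0	-1	1	0	2
Cur5						*	no matching prefix; compare: a == a, so copy next (next[6] = next[0] = -1)
Next6	-1	0	-1	1	0	2	-1
Cur6							a	matching prefix "a"; compare: c != b, so write 1 by the original rule
Next7	-1	0	-1	1	0	2	-1	1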
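The same rule can also be applied as a separate pass over a plain next array that has already been computed. If the character at j equals the character at next[j], position j inherits the improved value of next[j]; otherwise it keeps next[j]. This is a sketch with a made-up helper name, next_to_improved, and not the author's code.

```c
#include <assert.h>

// Hypothetical helper: converts a plain next[] (from the first loop) into the
// improved table. Works left to right, so improved[k] is ready for every k < j.
void next_to_improved(const char *substr, int substr_length,
                      const int next[], int improved[])
{
    assert(substr_length > 0);
    improved[0] = -1;
    for (int j = 1; j < substr_length; ++j) {
        int k = next[j];   // where a mismatch at j would resume (k >= 0 here)
        // Same character at k: it would fail again, so inherit k's entry.
        improved[j] = (substr[j] == substr[k]) ? improved[k] : k;
    }
}
```

Starting from -1 0 0 1 1 2 0 1 for "abaabcac", it produces -1 0 -1 1 0 2 -1 1, the last row above.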

Code

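int start = 0;     // position in the pattern; next[start + 1] is filled next
int match = -1;    // prefix position compared with substr[start]; -1 = no prefix
next[0] = -1;
while( start < substr_length - 1 )
{
    if( match == -1 || substr[start] == substr[match] )
    {
        ++start;
        ++match;
        if( substr[start] != substr[match] )   // different: original rule
            next[start] = match;
        else
            next[start] = next[match];   // equal: reuse the prefix character's next
    }
    else
        match = next[match];   // fall back to a shorter prefix (shift the pattern)
}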
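Here is the improved loop as a compilable function. The name build_improved_next and the int length parameter are choices made for this sketch.

```c
#include <assert.h>

// Sketch: the post's improved next[] loop as a function.
void build_improved_next(const char *substr, int substr_length, int next[])
{
    assert(substr_length > 0);
    int start = 0, match = -1;
    next[0] = -1;
    while (start < substr_length - 1) {
        if (match == -1 || substr[start] == substr[match]) {
            ++start;
            ++match;
            if (substr[start] != substr[match])
                next[start] = match;         // different: keep the plain value
            else
                next[start] = next[match];   // equal: skip the doomed comparison
        } else {
            match = next[match];
        }
    }
}
```

For "abaabcac" this gives -1 0 -1 1 0 2 -1 1, as in the table. For "aaaab" it gives -1 -1 -1 -1 3.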

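The search itself is simple. Here data is the text and substr is the pattern. If the pattern index is -1 (nothing matched, start over) or data[start] == substr[match] (the current character matches), both pointers advance by one and the next character is tested. After a restart, match becomes 0, so testing begins again at the start of the pattern. On a mismatch, start stays where it is and the pattern shifts according to next (match = next[match]).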

Code

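// start, match, data_length and substr_length must be signed ints: match can be -1
start = 0;
match = 0;    // must start at 0: starting at -1 would skip comparing data[0]
while( start < data_length && match < substr_length )
{
    if( match == -1 || data[start] == substr[match] )
    {
        ++start;    // the text pointer only moves forward
        ++match;
    }
    else
        match = next[match];    // mismatch: shift the pattern according to next
}
if( match == substr_length )
    return start - match;   // index of the first occurrence
return -1;                  // not found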
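Here the pieces are put together in a self-contained search function. It uses the improved next and starts match at 0. The name kmp_search, the heap-allocated next array, returning 0 for an empty pattern, and returning -1 when malloc fails are all choices made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

// Improved next[] loop from the post.
static void build_next(const char *substr, int substr_length, int next[])
{
    int start = 0, match = -1;
    next[0] = -1;
    while (start < substr_length - 1) {
        if (match == -1 || substr[start] == substr[match]) {
            ++start;
            ++match;
            next[start] = (substr[start] != substr[match]) ? match : next[match];
        } else {
            match = next[match];
        }
    }
}

// Returns the index of the first occurrence of substr in data, or -1.
int kmp_search(const char *data, int data_length, const char *substr, int substr_length)
{
    assert(data_length >= 0 && substr_length >= 0);
    if (substr_length == 0)
        return 0;                        // empty pattern: matches at 0 (assumed)
    int *next = malloc((size_t)substr_length * sizeof *next);
    if (next == NULL)
        return -1;                       // allocation failure reported as "not found"
    build_next(substr, substr_length, next);

    int start = 0;   // text position; never moves backwards
    int match = 0;   // pattern position; starts at 0 so data[0] is compared
    while (start < data_length && match < substr_length) {
        if (match == -1 || data[start] == substr[match]) {
            ++start;
            ++match;
        } else {
            match = next[match];         // slide the pattern; start stays put
        }
    }
    free(next);
    return match == substr_length ? start - match : -1;
}
```

For example, kmp_search("acabaabaabcacaabc", 17, "abaabcac", 8) returns 5.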

posted on 2010-10-19 20:48 Swin.C
