LeetCode 28: Implement strStr() [KMP]

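I will skip brute-force matching and go straight to KMP.

The idea behind KMP:

Step 1: build the next array from the pattern string (needle).
Step 2: scan the main string. On a mismatch, do not restart from the beginning of the pattern as brute-force matching does. Instead, look up the next array to get a fallback position in the pattern and resume matching from there.
Repeat step 2 until the pattern is found or the main string is exhausted.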

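The next array:

Here I keep the next array in its raw form, with no adjustment (such as subtracting 1 from every entry, or shifting everything right by one and padding with -1).

Its length is needle.size().
next[i] is the length of the longest proper prefix of needle[0..i] that is also a suffix of needle[0..i]. That length is also the index just past the end of the matching prefix, so when the character after needle[i] fails to match, this is the index to fall back to.
In particular, next[0] = 0.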
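
To make the definition concrete, here is a minimal sketch that computes next directly from the definition by trying every candidate length. naiveNext and demoNaiveNext are hypothetical names, not part of the original solution, and the cubic worst-case running time means it is only suitable for checking small examples.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: fills next[] straight from the definition by trying
// every candidate length from longest to shortest. Only for small inputs.
std::vector<int> naiveNext(const std::string& s)
{
    std::vector<int> next(s.size(), 0);
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        // A proper prefix of s[0..i] has length at most i.
        for (std::size_t len = i; len > 0; --len)
        {
            // Compare prefix s[0..len-1] with suffix s[i-len+1..i].
            if (s.compare(0, len, s, i - len + 1, len) == 0)
            {
                next[i] = static_cast<int>(len);
                break;
            }
        }
    }
    return next;
}

// Worked example: a:0, ab:0, aba:1, abab:2, ababa:3, ababab:4, abababc:0.
void demoNaiveNext()
{
    assert((naiveNext("abababc") == std::vector<int>{0, 0, 1, 2, 3, 4, 0}));
}
```

Reading the result for "abababc": a mismatch right after "ababab" (next[5] = 4) resumes at index 4 of the needle, because "abab" is both the start and the end of what has been matched so far.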

Computing next

void creatNextArray(string str, vector<int>& next)
{
	int n = str.size();
	next[0] = 0;
	int j = 0; // index of the end of the prefix
	int i; // index of the end of the suffix
	for (i = 1; i < n; ++i)
	{
		while (j > 0 && str[i] != str[j])
		{
			j = next[j - 1];
		}
		if (str[i] == str[j])
		{
			++j; // j is now one past the end of the matched prefix, which is also its length
		}
		next[i] = j;
	}
}

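The prefix end starts at 0 and the suffix end at 1, and i walks the whole needle. When the characters at i and j differ, j falls back to next[j-1] (note that next is stored unshifted here, so the jump target is read from the previous position), and this repeats until the characters match or j reaches 0. Then, if the characters match, j moves right by one, and next[i] is set to j.

Matching

Compare character by character. On a match, advance both indices; on a mismatch, move the needle index back to next[index-1].
Repeat until the needle is fully matched or the haystack is exhausted.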

int strStr(string haystack, string needle) {
    int n = haystack.size();
    int m = needle.size();

    if (m == 0) return 0; // empty pattern

    vector<int> next(m);
    creatNextArray(needle, next); // the builder defined above

    int j = 0;
    for (int i = 0; i < n; ++i)
    {
        while (j > 0 && needle[j] != haystack[i])
        {
            j = next[j - 1];
        }
        if (needle[j] == haystack[i])
        {
            ++j;
        }
        if (j == m) return i - m + 1; // haystack[i] matched the last needle character, so i-m+1 is where needle[0] matched
    }
    return -1; // not found
}
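
One way to gain confidence in the matching loop is to compare it against std::string::find on a few inputs. The sketch below assumes the same unshifted next array as above; buildNext, kmpFind, stdFind and agreesWithStd are names made up for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Same construction as described above, returned by value for convenience.
std::vector<int> buildNext(const std::string& p)
{
    assert(!p.empty()); // the empty needle is handled by the caller
    const int n = static_cast<int>(p.size());
    std::vector<int> next(n, 0);
    for (int i = 1, j = 0; i < n; ++i)
    {
        while (j > 0 && p[i] != p[j]) j = next[j - 1];
        if (p[i] == p[j]) ++j;
        next[i] = j;
    }
    return next;
}

// Index of the first occurrence of needle in haystack, or -1.
int kmpFind(const std::string& haystack, const std::string& needle)
{
    if (needle.empty()) return 0;
    const int n = static_cast<int>(haystack.size());
    const int m = static_cast<int>(needle.size());
    const std::vector<int> next = buildNext(needle);
    for (int i = 0, j = 0; i < n; ++i)
    {
        while (j > 0 && needle[j] != haystack[i]) j = next[j - 1];
        if (needle[j] == haystack[i]) ++j;
        if (j == m) return i - m + 1;
    }
    return -1;
}

// Reference answer from the standard library; npos is mapped to -1.
int stdFind(const std::string& haystack, const std::string& needle)
{
    const std::size_t pos = haystack.find(needle);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}

// True when both searches report the same position.
bool agreesWithStd(const std::string& haystack, const std::string& needle)
{
    return kmpFind(haystack, needle) == stdFind(haystack, needle);
}
```

For example, kmpFind("hello", "ll") returns 2 and kmpFind("aaaaa", "bba") returns -1, and agreesWithStd can be called on any other pair that looks suspicious.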

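It is worth mentioning that KMP loses some efficiency in certain special cases.

KMP saves comparisons by falling back, but sometimes it falls back many times in a row: after a fallback the character still does not match, so it falls back again. The wasteful case is when the fallback lands on the same character that just failed. We already know that character cannot match, so that position should be skipped outright rather than compared once more.

The optimization is therefore applied while building next: if the character that follows the current index equals the character at the fallback target, set the current index's next value to the fallback target's own fallback value, so the jump goes further back in one step.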
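
The idea can be sketched as a second pass over the plain next array. This is one possible way to write it, assuming the unshifted convention used in this post; buildOptimizedNext and findFirst are hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// opt[i] is the fallback target used when pattern character i + 1 fails.
std::vector<int> buildOptimizedNext(const std::string& p)
{
    assert(!p.empty());
    const int n = static_cast<int>(p.size());
    std::vector<int> next(n, 0); // plain table
    for (int i = 1, j = 0; i < n; ++i)
    {
        while (j > 0 && p[i] != p[j]) j = next[j - 1];
        if (p[i] == p[j]) ++j;
        next[i] = j;
    }
    std::vector<int> opt(n, 0); // optimized table
    for (int i = 0; i < n; ++i)
    {
        const int j = next[i];
        // Falling back to j would compare p[j] next. If p[j] == p[i + 1],
        // that repeats the comparison that just failed, so reuse the target
        // already computed for j - 1 (j - 1 < i, so opt[j - 1] is final).
        if (i + 1 < n && j > 0 && p[i + 1] == p[j]) opt[i] = opt[j - 1];
        else opt[i] = j;
    }
    return opt;
}

// First occurrence of p in h using any unshifted fallback table, or -1.
int findFirst(const std::string& h, const std::string& p,
              const std::vector<int>& table)
{
    const int n = static_cast<int>(h.size());
    const int m = static_cast<int>(p.size());
    for (int i = 0, j = 0; i < n; ++i)
    {
        while (j > 0 && p[j] != h[i]) j = table[j - 1];
        if (p[j] == h[i]) ++j;
        if (j == m) return i - m + 1;
    }
    return -1;
}
```

For the needle "aaaa" the plain table is 0 1 2 3 and this sketch gives 0 0 0 3, so a mismatch on the fourth 'a' drops straight to index 0 instead of retrying every shorter run of 'a'.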

The complete code follows.

#include<iostream>
#include<string>
#include<vector>
using namespace std;

void creatNextArray(string str, vector<int>& next)
{
	int n = str.size();
	next[0] = 0;
	int j = 0; // index of the end of the prefix
	int i; // index of the end of the suffix
	for (i = 1; i < n; ++i)
	{
		while (j > 0 && str[i] != str[j])
		{
			j = next[j - 1];
		}
		if (str[i] == str[j])
		{
			++j;
		}
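		// The commented-out lines below are the optimized version.
		// Different conventions for the next array lead to different implementations, and the optimized code changes with them; understanding the idea is what matters.
		// The fallback target after position i is j, so the character to test against str[i + 1] is str[j].
		/*if (i < n - 1 && j>0 && str[i + 1] == str[j]) next[i] = next[j - 1];
		else next[i] = j;*/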
		next[i] = j;
	}
}

int main()
{
	string a("ababababababc");
	string b("abababc");
	int n = b.size();
	int m = a.size();
	vector<int> next(n);
	creatNextArray(b, next);
	int j = 0;
	int ans = -1;
	for (int i = 0; i < m; ++i)
	{
		while (j > 0 && a[i] != b[j])
		{
			j = next[j - 1];
		}
		if (a[i] == b[j])
		{
			++j;
		}
		if (j == n)
		{
			ans = (i - n + 1); // first occurrence found, stop scanning
			break;
		}
	}
	cout << ans<<endl;
	return 0;
}

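In my tests the two versions perform about the same. Inspecting the next array confirms that the optimized version does contain the shortened jump targets. One possible reason is that the data was small (although I also tried inputs of more than 5000 characters); another is that even when fallbacks repeat, each one is only a single assignment, which the compiler may optimize further, so the gap stays small.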
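
A rough way to see why the measured gap is small is to count fallback steps instead of timing them. The sketch below is an illustration under the unshifted convention, with hypothetical names (buildTable, countFallbacks, repeat). It is not the benchmark used for the observation above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds the unshifted table; with optimized == true, entries whose fallback
// would re-test the same character are shortened.
std::vector<int> buildTable(const std::string& p, bool optimized)
{
    assert(!p.empty());
    const int n = static_cast<int>(p.size());
    std::vector<int> next(n, 0);
    for (int i = 1, j = 0; i < n; ++i)
    {
        while (j > 0 && p[i] != p[j]) j = next[j - 1];
        if (p[i] == p[j]) ++j;
        next[i] = j;
    }
    if (!optimized) return next;
    std::vector<int> opt(n, 0);
    for (int i = 0; i < n; ++i)
    {
        const int j = next[i];
        opt[i] = (i + 1 < n && j > 0 && p[i + 1] == p[j]) ? opt[j - 1] : j;
    }
    return opt;
}

// Scans the whole haystack and counts how often the fallback line runs
// because of a mismatch.
long countFallbacks(const std::string& h, const std::string& p,
                    const std::vector<int>& table)
{
    long fallbacks = 0;
    int j = 0;
    const int m = static_cast<int>(p.size());
    for (char c : h)
    {
        while (j > 0 && p[j] != c)
        {
            j = table[j - 1];
            ++fallbacks;
        }
        if (p[j] == c) ++j;
        if (j == m) j = table[j - 1]; // keep scanning after a full match
    }
    return fallbacks;
}

// Repeats a unit string the given number of times.
std::string repeat(const std::string& unit, int times)
{
    std::string out;
    for (int k = 0; k < times; ++k) out += unit;
    return out;
}
```

With the needle "aaaa" and the haystack repeat("aaab", 1000), countFallbacks reports 3000 for the plain table and 1000 for the optimized one. Both are small next to the 4000 characters scanned, which is consistent with the two versions timing about the same: the optimization trims a constant factor from a scan that is linear either way.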

posted @ 2021-04-20 15:05 抚琴思伯牙
