[LeetCode] 30. Substring with Concatenation of All Words

You are given a string s and an array of strings words. All the strings of words are of the same length.

A concatenated substring in s is a substring that contains all the strings of any permutation of words concatenated.

For example, if words = ["ab","cd","ef"], then "abcdef", "abefcd", "cdabef", "cdefab", "efabcd", and "efcdab" are all concatenated strings. "acdbef" is not a concatenated substring because it is not the concatenation of any permutation of words.
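To make the definition concrete, the sketch below enumerates every concatenation of a permutation of words with `std::next_permutation` and confirms the six strings listed above. The function name `allConcatenations` is hypothetical. This is only an illustration of the definition, not a usable approach: there can be up to cnt! orderings, and words can hold thousands of entries.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: collects every string formed by concatenating some
// permutation of `words`. Exponential in words.size(), so tiny inputs only.
std::set<std::string> allConcatenations(std::vector<std::string> words) {
    std::set<std::string> result;
    // next_permutation steps through permutations in lexicographic order,
    // so begin from the smallest arrangement (the sorted one).
    std::sort(words.begin(), words.end());
    do {
        std::string joined;
        for (const std::string &w : words) joined += w;
        result.insert(joined);
    } while (std::next_permutation(words.begin(), words.end()));
    return result;
}
```

For words = ["ab","cd","ef"] this yields exactly the six strings from the example, and "acdbef" is not among them.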

Return the starting indices of all the concatenated substrings in s. You can return the answer in any order.

Example 1:

Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation: Since words.length == 2 and words[i].length == 3, the concatenated substring has to be of length 6.
The substring starting at 0 is "barfoo". It is the concatenation of ["bar","foo"] which is a permutation of words.
The substring starting at 9 is "foobar". It is the concatenation of ["foo","bar"] which is a permutation of words.
The output order does not matter. Returning [9,0] is fine too.

Example 2:

Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation: Since words.length == 4 and words[i].length == 4, the concatenated substring has to be of length 16.
There is no substring of length 16 in s that is equal to the concatenation of any permutation of words.
We return an empty array.

Example 3:

Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]
Explanation: Since words.length == 3 and words[i].length == 3, the concatenated substring has to be of length 9.
The substring starting at 6 is "foobarthe". It is the concatenation of ["foo","bar","the"] which is a permutation of words.
The substring starting at 9 is "barthefoo". It is the concatenation of ["bar","the","foo"] which is a permutation of words.
The substring starting at 12 is "thefoobar". It is the concatenation of ["the","foo","bar"] which is a permutation of words.
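A window is valid exactly when its len-sized chunks, taken as a multiset, equal words. The slow reference below (a sketch; `bruteForceIndices` is a hypothetical name) sorts both lists so multiset equality becomes plain vector equality, which makes it handy for checking the three examples above or for cross-checking a faster solution.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical reference implementation that applies the definition directly.
// Results come out in increasing order of start index.
std::vector<int> bruteForceIndices(const std::string &s,
                                   std::vector<std::string> words) {
    std::vector<int> res;
    if (s.empty() || words.empty()) return res;
    int n = s.size(), cnt = words.size(), len = words[0].size();
    std::sort(words.begin(), words.end());
    // Every start i that leaves room for cnt * len characters.
    for (int i = 0; i + cnt * len <= n; ++i) {
        std::vector<std::string> chunks;
        for (int j = 0; j < cnt; ++j) chunks.push_back(s.substr(i + j * len, len));
        std::sort(chunks.begin(), chunks.end());
        if (chunks == words) res.push_back(i);  // same multiset of words
    }
    return res;
}
```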

Constraints:

1 <= s.length <= 10^4
1 <= words.length <= 5000
1 <= words[i].length <= 30
s and words[i] consist of lowercase English letters.

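This problem asks for the substrings formed by concatenating all the words: given a long string and several words of equal length, find the starting position of every substring that is a concatenation of all the given words. It is a fairly hard problem. Suppose the words array holds cnt words, each of length len. Then the task is really to find every substring of length cnt x len that is made up of exactly the words in words. This means we constantly need to check whether a length-len substring of s is one of the words, and a HashMap makes that check fast. Because words may contain duplicates, the HashMap should map each word to the number of times it occurs, i.e., it counts the occurrences of every word.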
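The counting table described above can be sketched as follows (`countWords` is a hypothetical name). Storing counts rather than mere membership is what handles duplicates, as in Example 2 where "word" appears twice.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps each word to how many times it occurs in words.
std::unordered_map<std::string, int> countWords(const std::vector<std::string> &words) {
    std::unordered_map<std::string, int> wordCnt;
    for (const std::string &w : words) ++wordCnt[w];  // operator[] starts at 0
    return wordCnt;
}
```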

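Next, visit every substring of s of length cnt x len. Once fewer than cnt x len characters remain there is nothing left to check, so i only needs to run from 0 to n - cnt x len, where n is the length of s. For each window of length cnt x len, we must verify that it consists of exactly the words in words. The check takes one length-len chunk at a time and looks it up among the words. To make the comparison easy, build a second HashMap: if a chunk is not in words, break immediately; otherwise increment its count in the new HashMap, and if that count exceeds the count in the original HashMap, break as well, because even though the chunk is a word, it appears more often than it does in words, which violates the requirement. After the inner for loop, if j equals cnt, all cnt chunks of length len were words and none exceeded its allowed count; since there are cnt chunks and the counts in words also sum to cnt, every count is matched exactly, so the window uses up words precisely and the current position i is added to the result res. See the code below (this solution now exceeds the time limit and no longer passes the OJ):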
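The per-window check can also be isolated into its own function. The sketch below is a variant of the two-HashMap approach rather than a copy of it: instead of building a second map, it takes a copy of the count table (`need`, assumed to be built from exactly cnt words) and decrements it, failing as soon as a chunk is missing or already used up. The name `windowMatches` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: does the cnt * len window starting at i consist of
// exactly the words counted in `need`? `need` is passed by value on purpose,
// so the caller's table is not modified.
bool windowMatches(const std::string &s, int i,
                   std::unordered_map<std::string, int> need, int cnt, int len) {
    if (i < 0 || i + cnt * len > static_cast<int>(s.size())) return false;
    for (int j = 0; j < cnt; ++j) {
        auto it = need.find(s.substr(i + j * len, len));
        // Not a word at all, or used more often than it appears in words.
        if (it == need.end() || it->second == 0) return false;
        --it->second;
    }
    // cnt chunks consumed cnt words, so (if need sums to cnt) all counts are 0.
    return true;
}
```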

Solution 1:

// Time Limit Exceeded
class Solution {
public:
    vector<int> findSubstring(string s, vector<string>& words) {
        if (s.empty() || words.empty()) return {};
        vector<int> res;
        int n = s.size(), cnt = words.size(), len = words[0].size();
        // Word -> number of times it appears in words.
        unordered_map<string, int> wordCnt;
        for (auto &word : words) ++wordCnt[word];
        // Every start position that leaves room for cnt * len characters.
        for (int i = 0; i <= n - cnt * len; ++i) {
            unordered_map<string, int> strCnt;
            int j = 0;
            for (j = 0; j < cnt; ++j) {
                string t = s.substr(i + j * len, len);
                if (!wordCnt.count(t)) break;       // chunk is not a word
                ++strCnt[t];
                if (strCnt[t] > wordCnt[t]) break;  // word used too many times
            }
            // All cnt chunks matched without exceeding any count.
            if (j == cnt) res.push_back(i);
        }
        return res;
    }
};

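There is also a faster sliding-window solution, usually described as O(n): across all offsets it takes about n / len word steps per offset, i.e. roughly n steps in total, although each step also costs O(len) to extract and hash a word, so the string work is O(n x len). The design is very clever, but I find it hard to come up with; I don't think I have reached that level yet. Instead of moving one character at a time, this method moves one word at a time. Take the example from the problem: s has length n = 18, words contains two words (cnt = 2), and each word has length len = 3. The visiting order is then 0, 3, 6, 9, 12, 15; then shift by one character: 1, 4, 7, 10, 13; then shift by one more: 2, 5, 8, 11, 14 (no word can start at 16 or 17, because fewer than len characters remain). Together these passes cover every case.

As before, first build a HashMap, named wordMap, from each word to its number of occurrences. Then, for each offset, scan from that offset, using left to record the left boundary of the window and curCnt for the number of words matched so far. Move one word at a time. If the current word is in wordMap, add it to another HashMap named curMap. If the count of word in curMap is less than or equal to its count in wordMap, increment curCnt. If it is greater, some extra handling is needed. Consider s = barfoofoo, words = {bar, foo, abc}; abc is added to words so that the scan does not stop after barfoo. When the scan reaches the second foo, curMap[foo] = 2 while wordMap[foo] = 1, so the window is no longer valid and the left boundary must move. First take out the leftmost word bar and decrement curMap[bar]; if now curMap[bar] < wordMap[bar], a match has been lost, so curCnt is decremented too; then left advances by len. This repeats until curMap[foo] no longer exceeds wordMap[foo]: here the first foo is removed as well (its count drops back to 1, which does not reduce curCnt), leaving left = 6.

If at some point curCnt equals cnt, a match has been found, so the current left boundary left is added to res; then the leftmost word is removed, curCnt is decremented, and left moves right by len so matching can continue. If the scan hits a word that is not in wordMap, the window is broken: clear curMap, reset curCnt to 0, and move left to j + len. See the code below: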
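The word-by-word visiting order described above can be sketched as a small enumeration (`wordStarts` is a hypothetical name). Each offset from 0 to len - 1 gets its own word-aligned sequence, and the bound j <= n - len means no word starts at 16 or 17 when n = 18 and len = 3.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: the word start positions visited for each offset.
// Every position that can begin a full word (j <= n - len) is listed once.
std::vector<std::vector<int>> wordStarts(int n, int len) {
    std::vector<std::vector<int>> starts(len);
    for (int i = 0; i < len; ++i)
        for (int j = i; j <= n - len; j += len) starts[i].push_back(j);
    return starts;
}
```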

Solution 2:

class Solution {
public:
    vector<int> findSubstring(string s, vector<string>& words) {
        if (s.empty() || words.empty()) return {};
        vector<int> res;
        int n = s.size(), cnt = words.size(), len = words[0].size();
        // Word -> number of times it must appear in a match.
        unordered_map<string, int> wordMap;
        for (const string &word : words) ++wordMap[word];
        // One independent word-aligned scan per starting offset.
        for (int i = 0; i < len; ++i) {
            // left: start of the current window; curCnt: words matched within their quota.
            int left = i, curCnt = 0;
            unordered_map<string, int> curMap;
            for (int j = i; j <= n - len; j += len) {
                string word = s.substr(j, len);
                if (wordMap.count(word)) {
                    ++curMap[word];
                    if (curMap[word] <= wordMap[word]) {
                        ++curCnt;
                    } else {
                        // word is over-used: drop words from the left until it is not.
                        while (curMap[word] > wordMap[word]) {
                            string t = s.substr(left, len);
                            --curMap[t];
                            if (curMap[t] < wordMap[t]) --curCnt;
                            left += len;
                        }
                    }
                    // Window holds exactly all words: record it, then slide by one word.
                    if (curCnt == cnt) {
                        res.push_back(left);
                        --curMap[s.substr(left, len)];
                        --curCnt;
                        left += len;
                    }
                } else {
                    // Not a word: no match can span it, so restart right after it.
                    curMap.clear();
                    curCnt = 0;
                    left = j + len;
                }
            }
        }
        return res;
    }
};
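To see the shrinking step from the s = barfoofoo, words = {bar, foo, abc} walkthrough concretely, the sketch below reruns the same sliding-window logic for a single offset and records (left, curCnt) after each word. The name `traceOffset` and the trace format are hypothetical, added only for illustration. On this input the trace is (0, 1), (0, 2), (6, 1): at the second foo the while loop removes both bar and the first foo, so left jumps from 0 to 6.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical tracing helper mirroring Solution 2 for one offset.
// Returns (left, curCnt) after each word-sized step.
std::vector<std::pair<int, int>> traceOffset(const std::string &s,
                                             const std::vector<std::string> &words,
                                             int offset) {
    std::vector<std::pair<int, int>> trace;
    if (words.empty()) return trace;
    int n = s.size(), cnt = words.size(), len = words[0].size();
    std::unordered_map<std::string, int> wordMap, curMap;
    for (const std::string &w : words) ++wordMap[w];
    int left = offset, curCnt = 0;
    for (int j = offset; j <= n - len; j += len) {
        std::string word = s.substr(j, len);
        if (wordMap.count(word)) {
            if (++curMap[word] <= wordMap[word]) {
                ++curCnt;
            } else {
                // Drop words from the left until `word` is within its quota.
                while (curMap[word] > wordMap[word]) {
                    std::string t = s.substr(left, len);
                    if (--curMap[t] < wordMap[t]) --curCnt;
                    left += len;
                }
            }
            if (curCnt == cnt) {  // full match: slide past the leftmost word
                --curMap[s.substr(left, len)];
                --curCnt;
                left += len;
            }
        } else {
            // Not a word: reset the window to start after it.
            curMap.clear();
            curCnt = 0;
            left = j + len;
        }
        trace.push_back({left, curCnt});
    }
    return trace;
}
```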