Substring with Concatenation of All Words

https://leetcode.com/problems/substring-with-concatenation-of-all-words/

You are given a string, S, and a list of words, L, that are all of the same length. Find all starting indices of substring(s) in S that is a concatenation of each word in L exactly once and without any intervening characters.

For example, given:
S: "barfoothefoobarman"
L: ["foo", "bar"]

You should return the indices: [0,9].
(order does not matter).

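Approach:

This problem is rated Hard, with an acceptance rate of only 19.2%. At first glance it did not look that difficult: it seemed like a fairly ordinary DFS problem, so I started writing.

While writing, the first difficulty was deciding which parameters the recursion needs, that is, which variables determine a sub-state. The second, slightly confusing point was how to decide whether a state is valid.

The overall idea: starting from the first char of S, call dfs at every position to see whether it qualifies as a startIndex. The dfs works as follows:

Starting from startIndex, the states we can expand to are all the Strings in L. So we try them one by one, appending each to the current result currentResult. We prune when currentResult is not a prefix of the substring that starts at startIndex, or when appending L[i] makes it longer than what remains of S. Once every element of L has been used, startIndex is added to the result set.

Because each element of L may be used only once, we need an int or boolean array visited, of the same size as L, to mark which elements have already been used.

Here is the code.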

public class Solution {
    public List<Integer> findSubstring(String S, String[] L) {
        List<Integer> result = new LinkedList<Integer>();
        StringBuffer currentResult = new StringBuffer();
        int[] visited = new int[L.length];
        int LLength = 0;
        for(int i = 0; i < L.length; i++){
            LLength += L[i].length();
        }
        int endIndex = S.length() - LLength;
        for(int i = 0; i <= endIndex; i++){
            dfs(result, currentResult, S, L, visited, 0, i);
        }
        return result;
    }
    
    public void dfs(List<Integer> result, StringBuffer currentResult, String S, String[] L, int[] visited, int step, int startIndex){
        if(step == L.length && S.substring(startIndex).indexOf(currentResult.toString()) == 0){
            result.add(startIndex);
            return;
        }
        
        int currentLength = currentResult.length();
        if(currentLength > S.length() - startIndex){
            return;
        }
        if(S.substring(startIndex).indexOf(currentResult.toString()) != 0){
            return;
        }
        
        for(int i = 0; i < L.length; i++){
            if(visited[i] == 1){
                continue;
            }
            visited[i] = 1;
            currentResult.append(L[i]);
            dfs(result, currentResult, S, L, visited, step + 1, startIndex);
            visited[i] = 0;
            currentResult = currentResult.delete(currentLength, currentResult.length());
        }
    }
}

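But this gets TLE (Time Limit Exceeded). The code gives correct results when debugged in an IDE, but it is indeed slow on large inputs. Let's analyze the time complexity.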

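Walking through S one character at a time costs n. For each startIndex we try every element of L, which costs L.length, call it m. And each startIndex needs m steps to be confirmed. So the overall complexity is roughly O(n*m*m), not counting the cost of the string comparisons themselves. When L contains duplicate words the search can branch at several levels, which makes it worse still.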

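I reached for this method first because of a key feature of DFS: during recursion I think in terms of how the current state can be expanded, that is, the edges of the graph. All elements of L thus became the edges to expand. But done this way, the time complexity is too high.

Back to the problem statement. It contains an important condition: all Strings in L have the same length, and each element may be used only once (though it does not say that L has no duplicates). Why stress that all elements of L have equal length? This condition must be usable to improve the algorithm.

In the method above, every step of the search relies on the constructed currentResult to record the length of the String built so far, in order to decide whether the substring that follows is in L. But now that all elements of L have the same length, the length of L[0] and the number of elements searched so far (step) are enough! So the string building can be skipped entirely.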

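That is an important improvement. Going further: since the length of L[i] is fixed, we only need to check whether the fixed-length string starting at startIndex + step * L[0].length() is in L, instead of iterating over L. A set or a map will do! That removes a factor of m from the time complexity.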

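Both improvements come from the key property that L[i].length() is constant, which shows how important it is to read the problem carefully.

However, a set cannot be used to record L, because L may contain duplicate elements. With a set, the following wrong answer appears.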

Input:	"abababab", ["a","b","a"]
Output:	[0,1,2,3,4,5]
Expected:	[0,2,4]
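
To make the difference concrete, here is a minimal sketch that compares a set-based check with a count-based check on the input above. The class name SetVersusCountSketch and both method names are made up for illustration, and the set-based version is only a guess at the kind of check that produces the wrong output, not the code that was actually submitted. Both methods assume words is non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SetVersusCountSketch {
    // Set-based check (hypothetical): only asks "is this chunk one of the words?"
    static List<Integer> withSet(String s, String[] words) {
        List<Integer> result = new ArrayList<>();
        int wordLength = words[0].length();
        int total = wordLength * words.length;
        Set<String> wordSet = new HashSet<>(Arrays.asList(words));
        for (int i = 0; i + total <= s.length(); i++) {
            boolean ok = true;
            for (int k = 0; k < words.length && ok; k++) {
                int from = i + k * wordLength;
                ok = wordSet.contains(s.substring(from, from + wordLength));
            }
            if (ok) result.add(i);
        }
        return result;
    }

    // Count-based check: a word may appear in the window at most as often as in words.
    static List<Integer> withCounts(String s, String[] words) {
        List<Integer> result = new ArrayList<>();
        int wordLength = words[0].length();
        int total = wordLength * words.length;
        Map<String, Integer> need = new HashMap<>();
        for (String w : words) need.merge(w, 1, Integer::sum);
        for (int i = 0; i + total <= s.length(); i++) {
            Map<String, Integer> seen = new HashMap<>();
            boolean ok = true;
            for (int k = 0; k < words.length && ok; k++) {
                int from = i + k * wordLength;
                String chunk = s.substring(from, from + wordLength);
                int count = seen.merge(chunk, 1, Integer::sum); // new count for this chunk
                ok = count <= need.getOrDefault(chunk, 0);
            }
            if (ok) result.add(i);
        }
        return result;
    }
}
```

On "abababab" with ["a","b","a"], withSet accepts every window because each chunk is either "a" or "b", while withCounts rejects the windows that contain "b" twice.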

public class Solution {
    public List<Integer> findSubstring(String S, String[] L) {
        List<Integer> result = new LinkedList<Integer>();
        if(S.length() == 0 || L.length == 0){
            return result;
        }
        
        int LLength = 0;
        if(L.length != 0){
            LLength = L.length * L[0].length();
        }
        int endIndex = S.length() - LLength;
        
        Map<String, Integer> wordMap = new HashMap<String, Integer>();
        for(int i = 0; i < L.length; i++){
            if(!wordMap.containsKey(L[i])){
                wordMap.put(L[i], 1);
            }else{
                int count = wordMap.get(L[i]);
                wordMap.put(L[i], count + 1);
            }
        }
        
        for(int i = 0; i <= endIndex; i++){
            dfs(result, S, wordMap, L[0].length(), L.length, 0, i);
        }
        return result;
    }
    
    public void dfs(List<Integer> result, String S, Map<String, Integer> wordMap, int wordLength, int LLength, int step, int startIndex){
        if(step == LLength){    // cannot use step == wordMap.size(): with duplicates in L, wordMap.size() is smaller than L.length
            result.add(startIndex);
            return;
        }
        
        int currentLength = startIndex + step * wordLength;
        if(currentLength > S.length()){
            return;
        }

        String currentWord = S.substring(startIndex + step * wordLength, startIndex + step * wordLength + wordLength);
        if(!wordMap.containsKey(currentWord) || wordMap.get(currentWord) == 0){
            return;
        }
        
        wordMap.put(currentWord, wordMap.get(currentWord) - 1);
        dfs(result, S, wordMap, wordLength, LLength, step + 1, startIndex);
        wordMap.put(currentWord, wordMap.get(currentWord) + 1);
    }
}

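The code above avoids iterating over L at every step, so the time complexity drops to O(nm), at the cost of an extra map that records L, i.e. O(m) additional space.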
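
As a side note, the loop that fills the map recording L can be written more compactly on Java 8 or later with Map.merge. A small sketch, where the class and helper names are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

class WordCountSketch {
    // Builds word -> number of occurrences in words.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> wordMap = new HashMap<>();
        for (String word : words) {
            // Inserts 1 for a new word, otherwise adds 1 to the stored count.
            wordMap.merge(word, 1, Integer::sum);
        }
        return wordMap;
    }
}
```

The behavior is the same as the containsKey / put branches; it only saves a few lines.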

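It was accepted (AC), but took 700ms. Most submissions ahead of it cluster around 300ms, so there must be room for improvement.

First I tried turning the code above into an iterative version, and it turned out to be far more straightforward... blame all the DFS code I have been writing lately.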

public class Solution {
    public List<Integer> findSubstring(String S, String[] L) {
        List<Integer> result = new LinkedList<Integer>();
        if(S.length() == 0 || L.length == 0){
            return result;
        }
        
        int LLength = 0;
        if(L.length != 0){
            LLength = L.length * L[0].length();
        }
        int endIndex = S.length() - LLength;
        
        Map<String, Integer> wordMap = new HashMap<String, Integer>();
        for(int i = 0; i < L.length; i++){
            if(!wordMap.containsKey(L[i])){
                wordMap.put(L[i], 1);
            }else{
                int count = wordMap.get(L[i]);
                wordMap.put(L[i], count + 1);
            }
        }
        
        for(int i = 0; i <= endIndex; i++){
            Map<String, Integer> thisWordMap = new HashMap<String, Integer>();
            boolean flag = true;
            for(int j = 0; j < L.length; j++){
                String currentWord = S.substring(i + L[0].length() * j, i + L[0].length() * j + L[0].length());
                if(!wordMap.containsKey(currentWord)){
                    flag = false;
                    break;
                }
                int count = 1;
                if(!thisWordMap.containsKey(currentWord)){
                    thisWordMap.put(currentWord, 1);
                }else{
                    count = thisWordMap.get(currentWord);
                    count++;
                    thisWordMap.put(currentWord, count);
                }
                if(count > wordMap.get(currentWord)){
                    flag = false;
                    break;
                }
            }
            if(flag){
                result.add(i);
            }
        }
        return result;
    }
}

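The code above is accepted, but still takes 650ms, because the time complexity is still O(n*m).

Finally, I recommend the sliding window method mentioned in the Longest Substring Without Repeating Characters problem, borrowed from http://blog.csdn.net/linhuanmars/article/details/20342851. The idea looks complicated at first, but it is fairly simple once understood.

Take the example: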

S: "barfoothefoobarman"
L: ["foo", "bar"]

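The previous methods start from each char of S in turn and check L.length words every time. In effect, at i==0 we look at bar foo the foo bar man, and at i==3 we look again at foo the foo bar man. As you can see, the work is repeated. Is there a better way?

Sliding window works like this: because all elements of L have the same length, each of them can be treated like a char in the Longest Substring Without Repeating Characters problem, with the total length of L (here the length of foobar) as the window. Instead of checking or skipping one char at a time, we now check or skip one String of length 3 at a time.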

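In this example that gives bar foo the foo bar man. This alone is clearly incomplete: we also need to start from the second character and check arf oot hef oob arm, and from the third character, rfo oth efo oba rma. That covers everything. So the outer loop only needs to run L[0].length() times.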
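
The three passes can be reproduced with a small helper that cuts S into word-sized chunks from a given offset. This is only an illustrative sketch; the names OffsetChunksSketch and chunksAt do not appear in any of the solutions.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetChunksSketch {
    // Cuts s into consecutive chunks of wordLength characters, starting at offset.
    // A trailing piece shorter than wordLength is dropped.
    static List<String> chunksAt(String s, int offset, int wordLength) {
        List<String> chunks = new ArrayList<>();
        for (int j = offset; j + wordLength <= s.length(); j += wordLength) {
            chunks.add(s.substring(j, j + wordLength));
        }
        return chunks;
    }
}
```

Calling chunksAt("barfoothefoobarman", offset, 3) for offset 0, 1 and 2 yields exactly the three chunk sequences listed above.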

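How does the inner window slide? Recall the Longest Substring Without Repeating Characters problem: whenever we meet a repeated character whose previous occurrence is at index, the left edge of the window moves to index+1 and the right edge keeps moving forward. Can we do the same here? This problem uses a similar method, but it is more complicated.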
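
For reference, the char-level window recalled above can be sketched as follows. This is one common way to write it, not necessarily the version in the linked article, and the class name is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class LongestUniqueSubstringSketch {
    static int lengthOfLongestSubstring(String s) {
        Map<Character, Integer> lastIndex = new HashMap<>(); // char -> index of its latest occurrence
        int best = 0;
        int start = 0; // left edge of the current window
        for (int j = 0; j < s.length(); j++) {
            char c = s.charAt(j);
            Integer previous = lastIndex.get(c);
            // Only a repeat inside the current window forces the left edge to move.
            if (previous != null && previous >= start) {
                start = previous + 1;
            }
            lastIndex.put(c, j);
            best = Math.max(best, j - start + 1);
        }
        return best;
    }
}
```

The word-level version differs in two ways: a word may legitimately repeat up to its count in L, and a chunk that is not in L at all resets the whole window.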

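Let j be the index currently being visited and start the index of the left edge of the window. We maintain two HashMap dictionaries: wordMap for L, and thisWordMap for the current window of S.

If the current word is not in wordMap at all, the current result is discarded entirely: start moves to j + L[0].length(), and thisWordMap is cleared.

If the current word is in wordMap, look at how many times it has appeared in thisWordMap.

　　If, counting this occurrence, the count is no more than in wordMap, the current result is still valid, and the word's count in thisWordMap is increased by 1.

　　　　Then check whether the current window length equals the total length of L. If it does, this is a valid answer: add start to result, and the new start is start + L[0].length().

　　　　At the same time update thisWordMap: the count of the word at the old start, i.e. S[start, start + L[0].length()), is decreased by one.

　　If the count already exceeds the one in wordMap, the current result is invalid. First increase the word's count in thisWordMap by 1, then move start forward in steps of L[0].length(),

　　each time decreasing by one the count of the word at S[start, start + L[0].length()), until the word's count in thisWordMap no longer exceeds its count in wordMap.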

public class Solution {
    public List<Integer> findSubstring(String S, String[] L) {
        List<Integer> result = new LinkedList<Integer>();
        if(S.length() == 0 || L.length == 0){
            return result;
        }
        
        int LLength = 0;
        if(L.length != 0){
            LLength = L.length * L[0].length();
        }
        int endIndex = S.length() - LLength;
        
        Map<String, Integer> wordMap = new HashMap<String, Integer>();
        for(int i = 0; i < L.length; i++){
            if(!wordMap.containsKey(L[i])){
                wordMap.put(L[i], 1);
            }else{
                int count = wordMap.get(L[i]);
                wordMap.put(L[i], count + 1);
            }
        }
        
        for(int i = 0; i < L[0].length(); i++){
            Map<String, Integer> thisWordMap = new HashMap<String, Integer>();
            int start = i;
            for(int j = i; j <= S.length() - L[0].length(); j += L[0].length()){
                String currentWord = S.substring(j, j + L[0].length());
                if(!wordMap.containsKey(currentWord)){
                    start = j + L[0].length();
                    thisWordMap.clear();
                }else{
                    if(!thisWordMap.containsKey(currentWord)){
                        thisWordMap.put(currentWord, 1);
                        if(j - start + L[0].length() == LLength){
                            result.add(start);
                            String temp = S.substring(start, start + L[0].length());  
                            thisWordMap.put(temp, thisWordMap.get(temp) - 1);
                            start += L[0].length();
                        }
                    }else{
                        int count = thisWordMap.get(currentWord);
                        count++;
                        thisWordMap.put(currentWord, count);
                        if(count > wordMap.get(currentWord)){
                            // String temp = S.substring(start, start + L[0].length());  
                            // thisWordMap.put(temp, thisWordMap.get(temp) - 1);
                            // start += L[0].length();
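                            /*
                            This must be a while loop; otherwise, for "aaabbbc", ["a","a","b","b","c"],
                            start jumps to 1 and aabbbc is taken as a correct answer, because later steps do not know that b already exceeds 2
                            */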
                            while(thisWordMap.get(currentWord) > wordMap.get(currentWord)){
                                String temp = S.substring(start, start + L[0].length());  
                                thisWordMap.put(temp, thisWordMap.get(temp) - 1);
                                start += L[0].length();
                            }
                        }else{
                            if(j - start + L[0].length() == LLength){
                                result.add(start);
                                String temp = S.substring(start, start + L[0].length());  
                                thisWordMap.put(temp, thisWordMap.get(temp) - 1);
                                start += L[0].length();
                            }
                        }
                    }
                }
            }
        }
        return result;
    }
}

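With sliding window, each word-sized chunk of S at a given offset is examined only once. A single run of the inner while loop may, in the worst case, move start across every word currently in the window, but start only ever moves forward, so over a whole pass the loop does work proportional to the number of chunks. Counting one chunk lookup as one operation, the overall complexity is O(n); each lookup itself costs O(L[0].length()) for the substring and the hash.

So why can this problem use sliding window? Can every problem solvable by the DFS and iteration above be done with a sliding window? The answer is no. Once again, it comes down to a key property of this problem: every string in L has the same length, so each can be treated as a single char.

Read the problem carefully and stay alert to hints like this one. That is exactly why this problem is Hard: finding the final, best solution is not easy.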

update 2015/05/29

Second pass, using the sliding window solution above, now with a deeper understanding. The code is a bit cleaner too.

public class Solution {
    public List<Integer> findSubstring(String s, String[] words) {
        Map<String, Integer> wordMap = new HashMap<String, Integer>();
        for(String word : words) {
            if(wordMap.containsKey(word)) {
                wordMap.put(word, wordMap.get(word) + 1);
            } else {
                wordMap.put(word, 1);
            }
        }
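        List<Integer> res = new ArrayList<Integer>();
        if(s.length() == 0 || words.length == 0) {   // same guard as the earlier versions; avoids words[0] on an empty array
            return res;
        }
        int wordLength = words[0].length();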
        for(int i = 0; i < wordLength; i++) {
            int count = 0, start = i;
            Map<String, Integer> thisWordMap = new HashMap<String, Integer>();
            for(int j = i; j < s.length() - wordLength + 1; j = j + wordLength) {
                String thisWord = s.substring(j, j + wordLength);
                if(!wordMap.containsKey(thisWord)) {
                    count = 0;
                    start = j + wordLength;
                    thisWordMap.clear();
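                } else if(wordMap.get(thisWord).equals(thisWordMap.get(thisWord))) {   // equals, not ==: compare Integer values, not references
                    count++;
                    thisWordMap.put(thisWord, thisWordMap.get(thisWord) + 1);
                    // The window is invalid not because of a word outside words, but because one word occurs too many times:
                    // remove words from the front until the over-counted word has been removed.
                    // Note: the count++ and the thisWordMap update above must not be forgotten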
                    String firstWord = s.substring(start, start + wordLength);
                    while(!firstWord.equals(thisWord) && start <= j) {
                        start += wordLength;
                        thisWordMap.put(firstWord, thisWordMap.get(firstWord) - 1);
                        count--;
                        firstWord = s.substring(start, start + wordLength);
                    }
                    thisWordMap.put(firstWord, thisWordMap.get(firstWord) - 1);
                    start = start + wordLength;
                    count--;
                } else {
                    count++;
                    if(thisWordMap.containsKey(thisWord)) {
                        thisWordMap.put(thisWord, thisWordMap.get(thisWord) + 1);
                    } else {
                        thisWordMap.put(thisWord, 1);
                    }
                    // When a solution is found, start should advance by only wordLength
                    if(count == words.length) {
                        res.add(start);
                        count--;
                        String firstWord = s.substring(start, start + wordLength);
                        thisWordMap.put(firstWord, thisWordMap.get(firstWord) - 1);
                        start = start + wordLength;
                    }
                }
            }
        }
        return res;
    }
}
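
One Java detail matters whenever two counts taken from a Map<String, Integer> are compared: == on two Integer objects compares references, and only boxed values from -128 to 127 are guaranteed to be cached, so a value comparison is the safer choice once counts can grow large. A minimal sketch, with a helper name made up for illustration:

```java
class CountCompareSketch {
    // Null-safe value comparison of two boxed counts.
    // Returns false when seen is null (word not in the window yet).
    static boolean sameCount(Integer seen, Integer needed) {
        return seen != null && seen.equals(needed);
    }
}
```

For the small word counts in typical test cases both forms behave the same, which is why the reference comparison can go unnoticed.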

posted on 2015-03-18 21:17 NickyYe
