[LeetCode] 30. Substring with Concatenation of All Words ☆☆☆

You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substring(s) in s that is a concatenation of each word in words exactly once and without any intervening characters.

For example, given:
s: "barfoothefoobarman"
words: ["foo", "bar"]

You should return the indices: [0,9].
(order does not matter).

解法1：

　　题目要求在给定一个字符串s和一个字符串数组words（有m个字符串，长度均为n）的条件下，找出s中所有“由words中全部字符串按任意顺序串联而成”的子串。

　　需要用到两个哈希表，第一个存储words数组，key为每一个word，value为出现的次数（考虑到数组中可能会有相同的字符串）；第二个哈希表用于存储当前遍历到的子串，value为已出现次数。从s的第一个字符开始遍历，按顺序找到长度为n的子串part，判断part是否在第一个哈希表中，以及当前已出现次数是否超过words数组中的数量，不满足条件的跳出循环并从s的下一个字符继续寻找。

　　时间复杂度为 O(n²)。

public class Solution {
    public List<Integer> findSubstring(String s, String[] words) {
        List<Integer> res = new ArrayList<>();
        if (s == null || words == null || words.length == 0) {
            return res;
        }
        
        HashMap<String, Integer> toFind = new HashMap<>();
        for (String word : words) {
            Integer k = toFind.containsKey(word) ? toFind.get(word) + 1 : 1;
            toFind.put(word, k);
        }
        
        int m = words.length, n = words[0].length();
        HashMap<String, Integer> found = new HashMap<>();
        for (int i = 0; i <= s.length() - m * n; i++) {
            found.clear();
            int j = i;
            for (; j < i + m * n; j += n) {
                String part = s.substring(j, j + n);
                Integer a = toFind.get(part);
                Integer b = found.get(part);
                if (a == null || (b != null && a <= b)) {
                    break;
                }
                found.put(part, b == null ? 1 : ++b);
            }
            if (j == i + m * n) {
                res.add(i);
            }
        }
        
        return res;
    }
}

解法2:

　　仔细研究上面的解法，会发现其中有很多多余的步骤。例如，s="abcdefghijklmn"，n=3：第一次遍历从'a'开始，需要遍历"abc", "edf", "ghi"....；而第4次比较从'd'开始，又需要遍历"def", "ghi"....

　　可以改变上面的思路，从一个字符一个字符遍历，变成一个单词一个单词遍历。将s按照word的长度n进行划分，如s长度为18，m=2，n=3，那么遍历的顺序为：第一次（0，3，6，9，12，15），第二次（1，4，7，10，13，16），第三次（2，5，8，11，14，17）。

　　首先还是先将words中的单词存到哈希表toFind中。然后从0开始遍历，用left记录当前串联起来的子串的起始位置，curr表示当前的子串（长为n），哈希表found纪录目前串联子串中的每一个word情况。遍历时分以下几种情况：

curr在toFind中不存在，此时匹配中断，前面从left开始匹配到的都失效。因此，清空found，从下一个子串（curr+n）继续。
curr在toFind中存在，但在found中不存在，说明前面没有匹配到。因此，将curr记入found中，然后从下一个子串继续。
curr在toFind和found中都存在，并且found中的数量小于toFind中的数量。将found中的currz数量加一，然后从下一个子串继续。
curr在toFind和found中都存在，并且两者数量相同，此时再将curr记入found就超过数量限制了。因此，需要从left开始，找到最早出现的同curr相同的子串（设为dupl），并把dupl同之前的子串在found中的纪录删除，然后从left移到dupl的下一个子串，然后curr从curr的下一个子串继续。

　　每次curr操作结束后，比较当前串联子串的长度是否与与words中所有word长度之和相等，相等则说明当前串联子串匹配成功，将left记入res中，然后从left的下一个子串继续进行。。。。

　　当 i = 0 遍历完成时，清空found，然后从 i = 1 进行下一次循环。

　　这个算法实现难度比较大，但是整体的时间复杂度为 O(n)。

public class Solution {
    public List<Integer> findSubstring(String s, String[] words) {
        List<Integer> res = new ArrayList<>();
        if (s == null || words == null || words.length == 0) {
            return res;
        }
        
        HashMap<String, Integer> toFind = new HashMap<>();
        for (String word : words) {
            Integer k = toFind.containsKey(word) ? toFind.get(word) + 1 : 1;
            toFind.put(word, k);
        }
        
        int m = words.length, n = words[0].length();
        HashMap<String, Integer> found = new HashMap<>();
        for (int first = 0; first < n; first++) {
            found.clear();
            int left = first;  // 当前串联字符串的起始位置
            for (int curr = first; curr < s.length() && left <= s.length() - m * n; curr += n) {
                String part = s.substring(curr, curr + n);
                if (!toFind.containsKey(part)) { // 如果part不存在，清空found并从下一个位置开始
                    found.clear();
                    left = curr + n;
                    continue;
                }
                
                if (!found.containsKey(part)) {
                    found.put(part, 1);
                } else if (found.get(part) < toFind.get(part)) {
                    found.put(part, found.get(part) + 1);
                } else { 
                    // found和toFind中part数量相同，再添加就超出了，因此需要从left开始，
                    // 找到第一个part相同的子串，将其与之前的子串删去，再将其下一个字符串为起点。
                    while (!part.equals(s.substring(left, left + n))) {
                        Integer x = found.get(s.substring(left, left + n));
                        found.put(s.substring(left, left + n), x - 1);
                        left += n;
                    }
                    // 此处需要从found中删除left，再添加curr，但两者相同，故可抵消
                    left += n;
                }
                
                // 如果当前串联子串的长度与数组长度相同，说明成功匹配了一个，
                // 将其记录到res后，从left的下一个子串再继续
                if (curr - left == (m - 1) * n) {
                    res.add(left);
                    Integer y = found.get(s.substring(left, left + n));
                    found.put(s.substring(left, left + n), y - 1);
                    left += n;
                }
            }
        }
        
        return res;
    }
}

posted @ 2017-02-21 13:22 Strugglion 阅读(316) 评论(0) 收藏举报

刷新页面返回顶部

Strugglion

[LeetCode] 30. Substring with Concatenation of All Words ☆☆☆

公告