[LeetCode] 249. Group Shifted Strings 群组偏移字符串

Given a string, we can "shift" each of its letter to its successive letter, for example: "abc" -> "bcd". We can keep "shifting" which forms the sequence:

"abc" -> "bcd" -> ... -> "xyz"

Given a list of strings which contains only lowercase alphabets, group all strings that belong to the same shifting sequence.

For example, given: ["abc", "bcd", "acef", "xyz", "az", "ba", "a", "z"],
Return:

[
  ["abc","bcd","xyz"],
  ["az","ba"],
  ["acef"],
  ["a","z"]
]

Note: For the return value, each inner list's elements must follow the lexicographic order.

这道题让我们重组偏移字符串，所谓偏移字符串，就是一个字符串的每个字符按照字母顺序表偏移相同量得到的另一个字符串，两者互为偏移字符串，注意相同字符串是偏移字符串的一种特殊情况，因为偏移量为0。现在给了一堆长度不同的字符串，让把互为偏移字符串的归并到一起，博主最开始想的是建立字符度和该长度的所有偏移字符串的映射，但是很明显的错误是相同长度的不一定都是偏移字符串，比如 'ab' 和 'ba'，所以只能用 HashMap 来建立一个字符串和所有和此字符串是偏移字符串的集合之间的映射，由于题目要求结果是按字母顺序的，所以用 multiset 来保存结果，一来可以保存重复字符串，二来可以自动排序。然后博主还写了一个判断二个字符串是否互为偏移字符串的函数，注意在比较两个字母距离时采用了加 26，再对 26 取余的 trick。遍历给定字符串集，对于遍历到的字符串，再遍历哈希表，和每个关键字调用 isShifted 函数来比较，如果互为偏移字符串，则加入其对应的字符串集，并标记 flag，最后遍历完 HashMap，没有跟任何关键字互为偏移，那么就新建一个映射，最后要做的就是把 multiset 转换为 vector 即可，参见代码如下：

解法一：

// Correct but complicated
class Solution {
public:
    vector<vector<string>> groupStrings(vector<string>& strings) {
        vector<vector<string> > res;
        unordered_map<string, multiset<string>> m;
        for (auto a : strings) {
            bool b = false;
            for (auto it = m.begin(); it != m.end(); ++it) {
                if (isShifted(it->first, a)) {
                    it->second.insert(a);
                    b = true;
                }
            }
            if (!b) m[a] = {a};
        }
        for (auto it = m.begin(); it != m.end(); ++it) {
            res.push_back(vector<string>(it->second.begin(), it->second.end()));
        }
        return res;
    }
    bool isShifted(string s1, string s2) {
        if (s1.size() != s2.size()) return false;
        int diff = (s1[0] + 26 - s2[0]) % 26;
        for (int i = 1; i < s1.size(); ++i) {
            if ((s1[i] + 26 - s2[i]) % 26 != diff) return false;
        }
        return true;
    }
};

上面那个方法挺复杂的，其实有更好的方法，网友的智慧无穷啊，上面那个方法的不高效之处在于对于每个遍历到的字符串，都要和 HashMap 中所有的关键字都比较一次，而其实我们可以更加巧妙的利用偏移字符串的特点，那就是字符串的每个字母和首字符的相对距离都是相等的，比如 abc 和 efg 互为偏移，对于 abc 来说，b和a的距离是1，c和a的距离是2，对于 efg 来说，f和e的距离是1，g和e的距离是2。再来看一个例子，az 和 yx，z和a的距离是 25，x和y的距离也是 25 (直接相减是 -1，这就是要加 26 然后取余的原因)，那么这样的话，所有互为偏移的字符串都有个 unique 的距离差，根据这个来建立映射就可以很好的进行单词分组了，这个思路真实太赞了，参见代码如下：

解法二：

class Solution {
public:
    vector<vector<string>> groupStrings(vector<string>& strings) {
        vector<vector<string> > res;
        unordered_map<string, multiset<string>> m;
        for (auto a : strings) {
            string t = "";
            for (char c : a) {
                t += to_string((c + 26 - a[0]) % 26) + ",";
            }
            m[t].insert(a);
        }
        for (auto it = m.begin(); it != m.end(); ++it) {
            res.push_back(vector<string>(it->second.begin(), it->second.end()));
        }
        return res;
    }
};