LeetCode practice: I think I stumbled onto a fascinating topic, the Trie

Preface

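In March, LeetCode launched a "daily problem" check-in event. I took part once, thought it was really fun, and then forgot about it until the event was almost over. Today's problem is 820. Short Encoding of Words.

The problem reads as follows: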

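Given a list of words, we encode the list as an index string S and an index list A.

For example, if the list is ["time", "me", "bell"], we can represent it as S = "time#bell#" and indexes = [0, 2, 5].

For each index, we can recover a word from the original list by reading S from that index until we reach "#".

What is the length of the shortest string S that encodes the given list of words?

Example:

Input: words = ["time", "me", "bell"]
Output: 10
Explanation: S = "time#bell#", indexes = [0, 2, 5].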

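Put simply, the task is to compress a set of strings into one string, where "#" marks the end of a word. In the example, "time#" encodes both time and me: me is a suffix of time, so reading from index 2 up to "#" yields me without storing it separately.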
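To make the encoding concrete, here is a minimal sketch of the decoding step described in the problem statement: start at each index and read until the next "#". The class name DecodeDemo and the method decode are hypothetical names chosen for this illustration; they are not part of the LeetCode problem.

```java
import java.util.ArrayList;
import java.util.List;

class DecodeDemo {
    // Recovers one word per index by reading from that index
    // up to (but not including) the next '#'.
    static List<String> decode(String s, int[] indexes) {
        List<String> words = new ArrayList<>();
        for (int start : indexes) {
            int end = s.indexOf('#', start);
            words.add(s.substring(start, end));
        }
        return words;
    }

    public static void main(String[] args) {
        // The example from the problem statement.
        System.out.println(decode("time#bell#", new int[] {0, 2, 5}));
        // prints [time, me, bell]
    }
}
```

Index 2 lands in the middle of "time#", which is why me costs no extra characters.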

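LeetCode's official solution gives two approaches; this post focuses on the Trie (dictionary tree) approach. Before solving the problem, let's look at what a Trie is.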

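What is a Trie

This introduction to the Trie comes from Baidu Baike:

Also known as a word lookup tree, the Trie is a tree structure and a variant of the hash tree. It is typically used to count, sort, and store large numbers of strings (though not only strings), so search engine systems often use it for word frequency statistics. Its advantage is that it uses the common prefixes of strings to reduce query time and minimize unnecessary string comparisons, making queries more efficient than with a hash tree.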

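For example, autocomplete:
[Image: autocomplete example]
In other words, a Trie builds a tree out of the common prefixes of strings, which reduces query time.

Drawn out, the structure of a Trie looks like this:
[Image: Trie structure diagram]
Handling Chinese is of course much more complicated, so lowercase English letters are used here to demonstrate the definition of a Trie. The following code comes from LeetCode 208. Implement Trie (Prefix Tree):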

class Trie {

    private TrieNode root;

    /** Initialize your data structure here. */
    public Trie() {
        root = new TrieNode();
    }
    
    /** Inserts a word into the trie. */
    public void insert(String word) {
        TrieNode node = root;
        for(int i=0; i<word.length(); i++){
            char currentChar = word.charAt(i);
            if(!node.containsKey(currentChar)){
                node.put(currentChar, new TrieNode());
            }
            node = node.get(currentChar);
        }
        node.setEnd();
    }
    
    /** Returns if the word is in the trie. */
    public boolean search(String word) {
        TrieNode node = searchPrefix(word);
        return node != null && node.isEnd();
    }

    private TrieNode searchPrefix(String word){
        TrieNode node = root;
        for(int i=0; i<word.length(); i++){
            char ch = word.charAt(i);
            if(node.containsKey(ch)){
                node = node.get(ch);
            }else{
                return null;
            }
        }
        return node;
    }
    
    /** Returns if there is any word in the trie that starts with the given prefix. */
    public boolean startsWith(String prefix) {
        TrieNode node = searchPrefix(prefix);
        return node!=null;
    }

    private class TrieNode{
        private TrieNode[] links;

        private final int R = 26;

        private boolean isEnd;

        public TrieNode(){
            links = new TrieNode[R];
        }

        public boolean containsKey(char ch){
            return links[ch-'a'] != null;
        }

        public TrieNode get(char ch){
            return links[ch-'a'];
        }

        public void put(char ch, TrieNode node){
            links[ch-'a'] = node;
        }

        public void setEnd(){
            isEnd = true;
        }

        public boolean isEnd(){
            return isEnd;
        }

    }
}

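In my view, a Trie is for problems involving string prefixes or suffixes. Two core points are enough to handle most Trie problems: how a TrieNode is structured, and what kind of problem a Trie is meant to solve.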
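To make the prefix use case concrete, here is a minimal autocomplete sketch. It assumes lowercase a-z input, like the LeetCode 208 code above. The names AutocompleteDemo, Node, insert, complete, and collect are hypothetical and exist only for this illustration. It walks down to the node for the prefix and then collects every word stored below it.

```java
import java.util.ArrayList;
import java.util.List;

class AutocompleteDemo {
    // One node per character; children are indexed by 'a'..'z'.
    static class Node {
        Node[] next = new Node[26];
        boolean isWord;
    }

    static void insert(Node root, String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            if (node.next[c - 'a'] == null) {
                node.next[c - 'a'] = new Node();
            }
            node = node.next[c - 'a'];
        }
        node.isWord = true;
    }

    // Returns all stored words starting with prefix, in alphabetical order.
    static List<String> complete(Node root, String prefix) {
        List<String> out = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.next[c - 'a'];
            if (node == null) {
                return out; // no stored word has this prefix
            }
        }
        collect(node, new StringBuilder(prefix), out);
        return out;
    }

    // Depth-first walk that appends a character on the way down
    // and removes it again on the way back up.
    static void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) {
            out.add(path.toString());
        }
        for (int i = 0; i < 26; i++) {
            if (node.next[i] != null) {
                path.append((char) ('a' + i));
                collect(node.next[i], path, out);
                path.setLength(path.length() - 1);
            }
        }
    }

    public static void main(String[] args) {
        Node root = new Node();
        for (String w : new String[] {"tea", "ten", "team", "bell"}) {
            insert(root, w);
        }
        System.out.println(complete(root, "te")); // prints [tea, team, ten]
    }
}
```

The shared prefix "te" is stored once, and only the subtree below it is visited, so "bell" is never compared.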

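820. Short Encoding of Words

Back to the Short Encoding of Words problem mentioned above. The key is that a word which is a suffix of a longer word adds nothing to the encoding, so the longer strings are recorded first and the shorter ones are compared against them. Following this idea, the problem can be solved even without a Trie: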

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class Solution {
    public int minimumLengthEncoding(String[] words) {
        Set<String> good = new HashSet<>(Arrays.asList(words));
        for(String w : words){
            for(int i=1; i<w.length(); i++){
                good.remove(w.substring(i));
            }
        }
        int ans = 0;
        for(String g : good){
            ans += g.length() + 1;
        }
        return ans;
    }
}

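As shown, without a Trie we can keep the strings that must appear in the encoding in the set good. Selecting them means removing every proper suffix of every word from the set, so all possible suffixes are compared.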
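As a small trace of the set-based idea, the sketch below returns the surviving words instead of only the final length. SuffixSetDemo and survivors are hypothetical names for this illustration, and a TreeSet replaces the HashSet only so that the printed order is deterministic.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class SuffixSetDemo {
    // Returns the words that are not a proper suffix of any word in the list.
    static Set<String> survivors(String[] words) {
        Set<String> good = new TreeSet<>(Arrays.asList(words));
        for (String w : words) {
            for (int i = 1; i < w.length(); i++) {
                good.remove(w.substring(i)); // drop every proper suffix of w
            }
        }
        return good;
    }

    public static void main(String[] args) {
        Set<String> good = survivors(new String[] {"time", "me", "bell"});
        System.out.println(good); // prints [bell, time]

        int ans = 0;
        for (String g : good) {
            ans += g.length() + 1; // each survivor costs its length plus one '#'
        }
        System.out.println(ans); // prints 10
    }
}
```

Here me is removed because it is the suffix of time starting at position 2, leaving time and bell for a total of 5 + 5 = 10.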

A Trie, on the other hand, filters the strings that need encoding while it checks whether a match already exists:

import java.util.Arrays;

class Solution {
    public int minimumLengthEncoding(String[] words) {
        int len = 0;
        Trie trie = new Trie();
        // Sort the words by length, longest first
        Arrays.sort(words, (s1, s2)->s2.length()-s1.length());
        // Insert each word into the trie; insert returns the encoding length that word adds
        for(String word: words){
            len += trie.insert(word);
        }
        return len;
    }
}
class Trie{
    private TrieNode root;
    public Trie(){
        root = new TrieNode();
    }
    public int insert(String word){
        TrieNode node = root;
        boolean isNew = false;
        for(int i=word.length()-1; i>=0; i--){
            char c = word.charAt(i);
            // A missing node means this word is not a suffix of an earlier word, so it must be encoded
            if(node.links[c-'a'] == null){
                node.links[c-'a'] = new TrieNode();
                isNew = true;
            }
            node = node.links[c-'a'];
        }
        return isNew?word.length()+1:0;
    }
}
class TrieNode{
    TrieNode[] links = new TrieNode[26];
    public TrieNode(){
    }
}
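The sort by descending length is what makes this insert-and-count scheme correct. The sketch below is a self-contained restatement of the code above that runs the same insertion on an unsorted and a sorted copy of the example. SortOrderDemo and encode are hypothetical names for this illustration.

```java
import java.util.Arrays;

class SortOrderDemo {
    static class Node {
        Node[] links = new Node[26];
    }

    // Inserts the word back to front; returns word.length() + 1 if at least
    // one new node had to be created, otherwise 0.
    static int insert(Node root, String word) {
        Node node = root;
        boolean isNew = false;
        for (int i = word.length() - 1; i >= 0; i--) {
            int c = word.charAt(i) - 'a';
            if (node.links[c] == null) {
                node.links[c] = new Node();
                isNew = true;
            }
            node = node.links[c];
        }
        return isNew ? word.length() + 1 : 0;
    }

    // Sums the per-word contributions in the order given.
    static int encode(String[] words) {
        Node root = new Node();
        int len = 0;
        for (String w : words) {
            len += insert(root, w);
        }
        return len;
    }

    public static void main(String[] args) {
        String[] unsorted = {"me", "time", "bell"};
        // "me" is counted (3) before "time" (5) arrives to cover it.
        System.out.println(encode(unsorted)); // prints 13

        String[] sorted = unsorted.clone();
        Arrays.sort(sorted, (a, b) -> b.length() - a.length());
        // "time" and "bell" go in first, so "me" creates no new node.
        System.out.println(encode(sorted)); // prints 10
    }
}
```

If a short word is inserted before a longer word that ends with it, the short word has already been counted by the time the longer one arrives, so the total comes out too large.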

(The solution to this problem follows the LeetCode write-up "99% Trie 吐血攻略，包教包会".)

Other Trie problems include:

211. Add and Search Word - Data structure design

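Conclusion

For experts a Trie is nothing special, but I am an algorithm beginner. If anything in this post is wrong or hard to follow, please bear with me; corrections are welcome.

If this post helped your studies, please give it a like. That would be my biggest motivation~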

posted @ 2020-03-29 13:30 NYfor2018
