Trie
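A trie, also called a dictionary tree or prefix tree, is an efficient data structure for information retrieval that is designed for string matching. Looking up or inserting a string takes O(M) time, where M is the length of the string. The worst-case space complexity is O(ALPHABET_SIZE * key_length * N), where key_length is the length of a key and N is the number of keys. A trie trades space for time: it uses the prefixes that keys share to speed up lookups.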
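To make the space bound concrete, here is a minimal sketch that assumes an array-based node with one child pointer per lowercase letter. The helper names `worstCaseNodes` and `worstCaseBytes` are hypothetical and exist only for this illustration. The actual value of `sizeof(TrieNode)` depends on the platform, and allocator overhead is ignored.

```cpp
#include <cstddef>

constexpr std::size_t kAlphabetSize = 26; // assumption: keys use only 'a'..'z'

// Array-based trie node: one child pointer per letter, plus an end-of-word flag.
struct TrieNode {
    TrieNode* children[kAlphabetSize];
    bool isEndOfWord;
};

// Upper bound on the node count: the root plus one node per key character.
// Shared prefixes only ever lower this number.
constexpr std::size_t worstCaseNodes(std::size_t numKeys, std::size_t keyLength) {
    return 1 + numKeys * keyLength;
}

// Upper bound on the memory taken by the nodes themselves.
constexpr std::size_t worstCaseBytes(std::size_t numKeys, std::size_t keyLength) {
    return worstCaseNodes(numKeys, keyLength) * sizeof(TrieNode);
}

// 1000 keys of length 10 need at most 10001 nodes.
static_assert(worstCaseNodes(1000, 10) == 10001, "root + 1000 * 10 nodes");
// Every node pays for all 26 pointers, even if most of them are null.
static_assert(sizeof(TrieNode) >= kAlphabetSize * sizeof(TrieNode*),
              "a node holds at least 26 pointers");
```

Each node carries ALPHABET_SIZE pointers regardless of how many children it really has, which is where the ALPHABET_SIZE factor in the bound comes from.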
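Three properties of a trie:
- The root node holds no character; every other node holds exactly one character.
- Concatenating the characters on the path from the root to a node gives the string that the node represents.
- All children of a given node hold different characters.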
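The three properties can be illustrated with a small sketch. It assumes C++14 and uses a hypothetical map-based `Node` type, not the array-based node of the implementation further down. The names `addWord`, `allPaths` and `demoSharedPrefixes` are made up for this example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type: the character is stored as the map key on the edge
// leading to a child, so the root has no character and every other node has one.
struct Node {
    std::map<char, std::unique_ptr<Node>> children; // map keys are unique: siblings differ
};

// Walk down from the root, creating a node only for characters not yet present.
void addWord(Node& root, const std::string& word) {
    Node* cur = &root;
    for (char c : word) {
        std::unique_ptr<Node>& child = cur->children[c];
        if (!child) child = std::make_unique<Node>();
        cur = child.get();
    }
}

// For every non-root node, record the string spelled by the path from the root.
void collectPaths(const Node& node, std::string& path, std::vector<std::string>& out) {
    for (const auto& entry : node.children) {
        path.push_back(entry.first);
        out.push_back(path);
        collectPaths(*entry.second, path, out);
        path.pop_back();
    }
}

std::vector<std::string> allPaths(const Node& root) {
    std::vector<std::string> out;
    std::string path;
    collectPaths(root, path, out);
    return out;
}

// Three words with 13 characters in total need only 7 non-root nodes,
// because "the" is stored once and shared by all of them.
void demoSharedPrefixes() {
    Node root;
    for (const char* w : {"the", "there", "their"}) addWord(root, w);
    assert(allPaths(root).size() == 7);
}
```

The paths collected here are "t", "th", "the", "thei", "their", "ther" and "there": one string per node, each obtained by concatenating the characters along the way from the root.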
Applications of a trie:
- String lookup
- Word-frequency counting
- Sorting strings
- Prefix matching
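The applications listed above can be sketched with one small variation of the data structure: store a counter in each node instead of a boolean flag. This is only an illustration under stated assumptions (C++14, keys limited to 'a' through 'z'). `CountingNode`, `recordWord`, `frequency` and `wordsWithPrefix` are hypothetical names chosen for this example.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

constexpr int kAlphabetSize = 26; // assumption: keys use only 'a'..'z'

struct CountingNode {
    std::array<std::unique_ptr<CountingNode>, kAlphabetSize> children;
    int count = 0; // number of times a word ending at this node was recorded
};

bool isLowercase(const std::string& s) {
    for (char c : s)
        if (c < 'a' || c > 'z') return false;
    return true;
}

// Insert a word and increase its frequency; rejects keys outside 'a'..'z'.
bool recordWord(CountingNode& root, const std::string& word) {
    if (!isLowercase(word)) return false;
    CountingNode* cur = &root;
    for (char c : word) {
        std::unique_ptr<CountingNode>& child = cur->children[c - 'a'];
        if (!child) child = std::make_unique<CountingNode>();
        cur = child.get();
    }
    ++cur->count;
    return true;
}

// Follow s from the root; returns nullptr if no such path exists.
const CountingNode* walk(const CountingNode& root, const std::string& s) {
    if (!isLowercase(s)) return nullptr;
    const CountingNode* cur = &root;
    for (char c : s) {
        cur = cur->children[c - 'a'].get();
        if (!cur) return nullptr;
    }
    return cur;
}

// String lookup and word-frequency counting: 0 means the word is not stored.
int frequency(const CountingNode& root, const std::string& word) {
    const CountingNode* node = walk(root, word);
    return node ? node->count : 0;
}

// Depth-first traversal that visits children from 'a' to 'z'.
void collect(const CountingNode& node, std::string& path, std::vector<std::string>& out) {
    if (node.count > 0) out.push_back(path);
    for (int i = 0; i < kAlphabetSize; ++i) {
        if (node.children[i]) {
            path.push_back(static_cast<char>('a' + i));
            collect(*node.children[i], path, out);
            path.pop_back();
        }
    }
}

// Prefix matching: all distinct stored words that start with prefix, in
// lexicographic order. An empty prefix lists every word, i.e. a sorted listing.
std::vector<std::string> wordsWithPrefix(const CountingNode& root, const std::string& prefix) {
    std::vector<std::string> out;
    if (const CountingNode* start = walk(root, prefix)) {
        std::string path = prefix;
        collect(*start, path, out);
    }
    return out;
}

void demoApplications() {
    CountingNode root;
    for (const char* w : {"the", "a", "there", "the", "any", "by"}) recordWord(root, w);
    assert(frequency(root, "the") == 2);              // "the" was recorded twice
    assert(wordsWithPrefix(root, "th").size() == 2);  // "the" and "there"
    assert(wordsWithPrefix(root, "").front() == "a"); // sorted listing starts with "a"
}
```

Sorting falls out of the traversal order: because children are visited from 'a' to 'z' and a word is emitted before any longer word that extends it, the output is already in lexicographic order without a separate sort step.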
C++ implementation:
    // C++ implementation of insert and search operations on a trie
    #include <cstddef>
    #include <iostream>
    #include <string>

    const int ALPHABET_SIZE = 26;

    // Trie node
    struct TrieNode {
        TrieNode *children[ALPHABET_SIZE];
        // isEndOfWord is true if the node marks the end of a word
        bool isEndOfWord;
    };

    // Returns a new trie node with all children set to nullptr
    TrieNode *getNode() {
        TrieNode *pNode = new TrieNode;
        pNode->isEndOfWord = false;
        for (int i = 0; i < ALPHABET_SIZE; i++)
            pNode->children[i] = nullptr;
        return pNode;
    }

    // Inserts key into the trie if it is not already present.
    // If key is a prefix of an existing word, only its last node is marked
    // as the end of a word. Precondition: key contains only 'a' through 'z'.
    void insert(TrieNode *root, const std::string &key) {
        TrieNode *pCrawl = root;
        for (std::size_t i = 0; i < key.length(); i++) {
            int index = key[i] - 'a';
            if (!pCrawl->children[index])
                pCrawl->children[index] = getNode();
            pCrawl = pCrawl->children[index];
        }
        // Mark the last node as the end of a word
        pCrawl->isEndOfWord = true;
    }

    // Returns true if key is present in the trie, else false
    bool search(const TrieNode *root, const std::string &key) {
        const TrieNode *pCrawl = root;
        for (std::size_t i = 0; i < key.length(); i++) {
            int index = key[i] - 'a';
            // A character outside 'a'..'z' cannot be stored in this trie
            if (index < 0 || index >= ALPHABET_SIZE)
                return false;
            if (!pCrawl->children[index])
                return false;
            pCrawl = pCrawl->children[index];
        }
        return pCrawl->isEndOfWord;
    }

    // Releases every node of the trie rooted at node
    void freeTrie(TrieNode *node) {
        if (!node)
            return;
        for (int i = 0; i < ALPHABET_SIZE; i++)
            freeTrie(node->children[i]);
        delete node;
    }

    // Driver
    int main() {
        // Input keys (lowercase 'a' through 'z' only)
        const std::string keys[] = {"the", "a", "there", "answer",
                                    "any", "by", "bye", "their"};
        const std::size_t n = sizeof(keys) / sizeof(keys[0]);

        TrieNode *root = getNode();

        // Construct the trie
        for (std::size_t i = 0; i < n; i++)
            insert(root, keys[i]);

        // Search for different keys
        std::cout << (search(root, "the") ? "Yes\n" : "No\n");
        std::cout << (search(root, "these") ? "Yes\n" : "No\n");

        freeTrie(root);
        return 0;
    }