[Study notes] Frequent pattern mining: current status and future directions (DMKD, 2007)
These are study notes, excerpted here only for my own reference.
Jiawei Han, Hong Cheng, Dong Xin, Xifeng Yan: Frequent pattern mining: current status and future directions. Data Min. Knowl. Discov. 15(1): 55-86 (2007)
1. Types of frequent patterns
frequent itemsets: unordered
(frequent) sequential pattern: ordered
(frequent) structural pattern: structure-aware, e.g. subgraphs, subtrees
2. Three basic frequent itemset mining methods
(1) Apriori principle
Apriori: relies on the downward closure property: a k-itemset is frequent only if all of its sub-itemsets are frequent.
horizontal data format
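A minimal sketch of level-wise Apriori over horizontal data (one transaction per row), for my own reference. The function name `apriori`, the toy transactions, and the absolute `min_support` count are assumptions of this sketch, not taken from the paper.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        candidates = set()
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step (downward closure): every (k-1)-subset must be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: one scan over the transactions per level.
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                current[c] = support
        frequent.update(current)
        k += 1
    return frequent

# Toy data (hypothetical): five transactions over items a, b, c.
apriori_data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
apriori_result = apriori(apriori_data, min_support=3)
```

With this toy data every single item and every pair is frequent, while {a, b, c} appears in only two transactions and is dropped at level 3.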
(2) FP-growth
FP-tree: frequent pattern tree
horizontal data format
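A rough sketch of FP-tree construction only (two scans over the horizontal data); the recursive mining over conditional pattern bases is left out. The class `FPNode`, the function `build_fp_tree`, and the alphabetical tie-break for equal supports are assumptions of this sketch.

```python
class FPNode:
    """One node of the FP-tree: an item, its count, and its children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    # First scan: count the support of each item.
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[item] = counts.get(item, 0) + 1
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode(None, None)
    # Second scan: insert each transaction with its frequent items sorted
    # by descending support, so that common prefixes share tree paths.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent

# Toy data (hypothetical): b occurs in every transaction, so it heads every path.
fp_data = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
fp_root, fp_frequent = build_fp_tree(fp_data, min_support=2)
```

For this toy data the tree has a single branch under the root starting at b (count 4), which then splits into a (count 2, with a child c of count 1) and c (count 1).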
(3) Eclat
vertical data format
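A small sketch of the Eclat idea: convert the data to vertical format (item -> set of transaction ids) and obtain the support of a longer itemset by intersecting tid-sets. The function name `eclat` and the toy data are assumptions of this sketch.

```python
def eclat(transactions, min_support):
    """Return {frozenset(itemset): support count} using tid-set intersection."""
    # Vertical format: item -> set of ids of the transactions containing it.
    vertical = {}
    for tid, t in enumerate(transactions):
        for item in t:
            vertical.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, items):
        # items: list of (item, tidset); each item is frequent together with prefix.
        for i, (item, tids) in enumerate(items):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with the tid-sets of later items to grow the itemset.
            suffix = []
            for other, other_tids in items[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    suffix.append((other, common))
            extend(itemset, suffix)

    start = sorted(((i, t) for i, t in vertical.items() if len(t) >= min_support),
                   key=lambda pair: pair[0])
    extend(frozenset(), start)
    return frequent

# Toy data (hypothetical): five transactions over items a, b, c.
eclat_data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
eclat_result = eclat(eclat_data, min_support=3)
```

No rescan of the transactions is needed after the vertical conversion: the support of an itemset is just the size of its intersected tid-set.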
3. closed frequent pattern
a pattern a is a closed frequent pattern in a data set D if
(1) a is frequent
(2) there exists no proper super-pattern b such that b has the same support as a in D
a pattern a is a maximal frequent pattern (max-pattern) in D if
(1) a is frequent
(2) there exists no super-pattern b such that a ⊂ b and b is frequent in D
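To make the two definitions concrete, here is a small sketch that filters closed and maximal itemsets out of an already-mined table of frequent itemsets. The function name `closed_and_maximal` and the toy supports are assumptions of this sketch. Checking only frequent supersets is enough, because a superset with the same support as a frequent pattern is itself frequent.

```python
def closed_and_maximal(frequent):
    """frequent: {frozenset(itemset): support}. Return (closed, maximal) sets."""
    closed, maximal = set(), set()
    for a, sup_a in frequent.items():
        # Proper super-patterns of a that are themselves frequent.
        supersets = [b for b in frequent if a < b]
        # Closed: no proper super-pattern has the same support.
        if all(frequent[b] != sup_a for b in supersets):
            closed.add(a)
        # Maximal: no proper super-pattern is frequent at all.
        if not supersets:
            maximal.add(a)
    return closed, maximal

# Toy supports (hypothetical), consistent with the transactions
# {a,b,c}, {a,b}, {a,b}, {a} and a minimum support count of 2.
toy_frequent = {
    frozenset({"a"}): 4,
    frozenset({"b"}): 3,
    frozenset({"a", "b"}): 3,
}
closed_set, maximal_set = closed_and_maximal(toy_frequent)
```

Here {b} is not closed because {a, b} has the same support, {a} is closed but not maximal, and {a, b} is both. Every max-pattern is closed, but not the other way round.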
4. Sequential pattern mining
Common algorithms
GSP: A Sequential Pattern Mining Algorithm Based on Candidate Generate-and-Test
SPADE: An Apriori-Based Vertical Data Format Sequential Pattern Mining Algorithm
PrefixSpan: Prefix-Projected Sequential Pattern Growth
Performance comparison: PrefixSpan > SPADE > GSP
When the number of frequent subsequences is large, all three algorithms slow down.
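A simplified sketch of the prefix-projection idea behind PrefixSpan. It assumes every sequence element is a single item (no itemset elements), which is a simplification of the general algorithm; the function name `prefixspan` and the toy sequences are also assumptions of this sketch.

```python
def prefixspan(sequences, min_support):
    """Return {tuple(pattern): support} for sequences of single items."""
    results = {}

    def mine(prefix, projected):
        # projected: list of (sequence index, start offset), one entry per
        # sequence that contains the current prefix.
        counts = {}
        for sid, start in projected:
            # Count each item at most once per projected sequence.
            for item in set(sequences[sid][start:]):
                counts[item] = counts.get(item, 0) + 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project: keep the suffix after the first occurrence of item.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results

# Toy data (hypothetical): three short sequences.
toy_sequences = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
seq_patterns = prefixspan(toy_sequences, min_support=2)
```

No candidates are generated: each recursion only counts items inside the projected suffixes, which is the pattern-growth idea as opposed to GSP's candidate generate-and-test.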
5. frequent substructures mining
Apriori-based approach
"The search for frequent graphs starts with graphs of small "size", and proceeds in a bottom-up manner".
AGM and FSG are both of this kind
Pattern-growth approach
6. Mining interesting frequent patterns
Constraint-based mining: efficiently mining only the patterns that satisfy user-specified constraints.
Categories of constraints:
succinct constraints
anti-monotonic constraints
monotonic constraints
convertible constraints
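A small sketch of how an anti-monotonic constraint can be pushed into the search. The constraint used here, "total price <= budget" with non-negative prices, is my own illustrative choice: once an itemset violates it, every superset violates it too, so that branch can be cut. The function name `enumerate_with_budget` and the toy prices are assumptions, and support counting is deliberately left out.

```python
def enumerate_with_budget(prices, budget):
    """Enumerate itemsets whose total price is <= budget.

    Assumes non-negative prices, which makes the constraint anti-monotonic.
    """
    items = sorted(prices)
    kept = []

    def grow(itemset, total, start):
        for i in range(start, len(items)):
            new_total = total + prices[items[i]]
            if new_total > budget:
                # Anti-monotonic: every superset also violates, so skip the branch.
                continue
            new_set = itemset + (items[i],)
            kept.append(new_set)
            grow(new_set, new_total, i + 1)

    grow((), 0, 0)
    return kept

# Toy prices (hypothetical).
toy_prices = {"a": 1, "b": 2, "c": 4}
within_budget = enumerate_with_budget(toy_prices, budget=3)
```

A monotonic constraint such as "total price >= v" behaves the other way: once an itemset satisfies it, all supersets do, so the check can be dropped for them instead of being used for pruning.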
For constraint-based mining in the context of sequential pattern mining, refer to
Garofalakis M, Rastogi R, Shim K (1999) SPIRIT: Sequential pattern mining with regular expression constraints. VLDB99
Pei J, Han J, Wang W (2002) Constraint-based sequential pattern mining in large databases. CIKM02
Memo
1. Material on DAG mining within frequent substructures mining