AprioriTID algorithm

What is AprioriTID?

AprioriTID is an algorithm for discovering frequent itemsets (groups of items appearing frequently) in a transaction database. It was proposed by Agrawal & Srikant (1993).

AprioriTID is a variation of the Apriori algorithm. It was proposed in the same article as Apriori as an alternative implementation of Apriori. It produces the same output as Apriori. But it uses a different mechanism for counting the support of itemsets.

比较Apriori与AprioriTID如下：

数据结构方面：

在Apriori算法中，首先利用HashMap<Integer,Integer>存储每个项与其出现的次数之间的映射关系，取出频繁项构成List集合：frequent1. 将此List集合作为生成k=2时候选项的输入。　　

除了k=1外，其余k值的每个候选项存储在每个Itemset类的对象中，由List<Itemset>集合candidates统一存储。Itemset类中拥有存、取候选项，存储候选项支持度(support)的各函数。全部的频繁项集对象由List<Itemset>集合level存储。（level自然作为k>2时生成候选项函数的输入）

在AprioriTID算法中，用HashMap<Integer,Set<Integer>>存储每个项item与其出现的位置(transaction ID)之间的映射关系，从k=1时，直接将频繁项集存储在Itemset对象中(在对象中有集合存储TID)，并用List<Itemset>集合level存储各Itemset对象。Itemset类中增添了transaction ID集合，保存项集所对应的transaction ID。

在算法方面：

在AprioriTID算法中，当k>=2时，依旧通过we compare items of itemset1 and itemset2.If they have all the same k-1 items and the last item of itemset1 is smaller than the last item of itemset2, we will combine them to generate a candidate来生成候选项集。查看结合在一起的候选集的共同的tid(common tids)，当common tids中元素个数满足minsup则结合在一起的候选集为频繁项，(相比apriori效率提高了一些，apriori是将候选项不断与transaction作比较，计算各候选项支持度)保存频繁项和其对应的common tids到Itemset对象中，统一由List<Itemset>集合candidates存储，通过saveItemset()函数保存频繁项集之后，candidates作为下一次计算k+1时频繁项的输入。　　

posted on 2014-10-22 21:27 cnblogs_man 阅读(714) 评论(0) 编辑收藏举报