Apriori Algorithm Example: Weka, R, and Using Weka in My Java Code

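While learning data mining tools, I studied the same dataset with several of them: the Weka Explorer, Weka called from Java code, and Weka called from R.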

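Data description: the data below records the elective courses chosen by 15 students. The syllabus offers 10 courses to choose from, and each row gives one student's selection. The data is saved as an ARFF file named TestStudenti.arff. I use the Apriori algorithm to mine association rules in the students' course selections.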

@relation test_studenti


@attribute Arbori_binari_de_cautare {TRUE, FALSE}
@attribute Arbori_optimali {TRUE, FALSE}
@attribute Arbori_echilibrati_in_inaltime {TRUE, FALSE}
@attribute Arbori_Splay {TRUE, FALSE}
@attribute Arbori_rosu_negru {TRUE, FALSE}
@attribute Arbori_2_3 {TRUE, FALSE}
@attribute Arbori_B {TRUE, FALSE}
@attribute Arbori_TRIE {TRUE, FALSE}
@attribute Sortare_topologica {TRUE, FALSE}
@attribute Algoritmul_Dijkstra {TRUE, FALSE}


@data
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE
TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE
FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE
FALSE,TRUE,FALSE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE
TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE
TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE
FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE
TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,FALSE
FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE
TRUE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE
FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE
TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE
FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE
TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE
TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE

(1) Weka usage example

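For the Apriori algorithm, set the minimum support (minSupport, lowerBoundMinSupport in Weka) to 50% and the minimum confidence to 50% as well. In Weka the path is Explorer -> Open file (TestStudenti.arff) -> Associate, then click to configure the parameters.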

After the algorithm finishes, we get the following result:

Best rules found:

1. Sortare_topologica=FALSE 13 ==> Arbori_TRIE=TRUE 13 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
2. Arbori_rosu_negru=TRUE 11 ==> Arbori_TRIE=TRUE 11 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
3. Arbori_optimali=TRUE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
4. Arbori_optimali=TRUE 10 ==> Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
5. Arbori_echilibrati_in_inaltime=TRUE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
6. Arbori_optimali=TRUE Sortare_topologica=FALSE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
7. Arbori_optimali=TRUE Arbori_TRIE=TRUE 10 ==> Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
8. Arbori_optimali=TRUE 10 ==> Arbori_TRIE=TRUE Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
9. Arbori_binari_de_cautare=TRUE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
10. Arbori_B=FALSE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
11. Arbori_rosu_negru=TRUE Sortare_topologica=FALSE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
12. Arbori_TRIE=TRUE 15 ==> Sortare_topologica=FALSE 13 <conf:(0.87)> lift:(1) lev:(0) [0] conv:(0.67)
13. Arbori_rosu_negru=TRUE 11 ==> Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
14. Arbori_rosu_negru=TRUE Arbori_TRIE=TRUE 11 ==> Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
15. Arbori_rosu_negru=TRUE 11 ==> Arbori_TRIE=TRUE Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
16. Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
17. Arbori_TRIE=TRUE Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
18. Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE Arbori_TRIE=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
19. Arbori_TRIE=TRUE 15 ==> Arbori_rosu_negru=TRUE 11 <conf:(0.73)> lift:(1) lev:(0) [0] conv:(0.8)
20. Sortare_topologica=FALSE 13 ==> Arbori_rosu_negru=TRUE 9 <conf:(0.69)> lift:(0.94) lev:(-0.04) [0] conv:(0.69)

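Looking at the first rule, we get the following association: if a student does not take the Sortare topologica course, then that student takes the Arbori TRIE course. The confidence of this rule is 100%. Note, however, that its lift is 1: all 15 students take Arbori TRIE (rule 12 shows Arbori_TRIE=TRUE 15), so the rule holds trivially and says little about a real dependency between the two courses.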
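To make the numbers in the rule list easier to check, here is a minimal sketch that recomputes confidence, lift, and leverage straight from the 15 data rows, without Weka. The class name RuleMetrics, the compact "T"/"F" row encoding, and the "." wildcard patterns are assumptions made for this illustration only. The formulas are the standard textbook definitions, and for rules 1 and 4 they should agree with the conf, lift, and lev values printed above.

```java
import java.util.Locale;

// Recomputes rule metrics for A ==> B from the rows of TestStudenti.arff.
// Hypothetical helper for illustration; not part of Weka.
class RuleMetrics {
    // One string per student, columns in @attribute order; 'T' = TRUE, 'F' = FALSE.
    static final String[] ROWS = {
        "TTTTFFTTFF", "TTTTTTFTFF", "FTTTFFFTFT", "FTFFTFTTFT", "TTFTTFTTFT",
        "TFTFFTTTFF", "FTFTTFTTFT", "TFTTTFTTTF", "FTTTTFFTFF", "TFTFTTFTFT",
        "FFTFTFFTTT", "TFFTTTFTFT", "FTTFTTFTFT", "TTTFFTFTFF", "TTFFTTFTFF"
    };

    // A pattern has one character per column; '.' means "any value".
    static boolean matches(String row, String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '.' && p != row.charAt(i)) return false;
        }
        return true;
    }

    // Number of rows that satisfy every given pattern.
    static int count(String... patterns) {
        int n = 0;
        for (String row : ROWS) {
            boolean all = true;
            for (String p : patterns) all &= matches(row, p);
            if (all) n++;
        }
        return n;
    }

    // confidence(A ==> B) = count(A and B) / count(A)
    static double confidence(String a, String b) {
        return (double) count(a, b) / count(a);
    }

    // lift(A ==> B) = confidence(A ==> B) / P(B)
    static double lift(String a, String b) {
        return confidence(a, b) / ((double) count(b) / ROWS.length);
    }

    // leverage(A ==> B) = P(A and B) - P(A) * P(B)
    static double leverage(String a, String b) {
        double n = ROWS.length;
        return count(a, b) / n - (count(a) / n) * (count(b) / n);
    }

    static void report(String name, String a, String b) {
        System.out.println(String.format(Locale.ROOT, "%s: conf=%.2f lift=%.2f lev=%.2f",
            name, confidence(a, b), lift(a, b), leverage(a, b)));
    }

    public static void main(String[] args) {
        String sortareFalse = "........F.";  // Sortare_topologica=FALSE (column 9)
        String trieTrue     = ".......T..";  // Arbori_TRIE=TRUE (column 8)
        String optimaliTrue = ".T........";  // Arbori_optimali=TRUE (column 2)
        report("rule 1", sortareFalse, trieTrue);
        report("rule 4", optimaliTrue, sortareFalse);
    }
}
```

Rule 1 comes out with lift 1 and leverage 0 because Arbori_TRIE=TRUE holds in every row, while rule 4 has a lift above 1 because Sortare_topologica=FALSE holds in only 13 of the 15 rows.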

(2) Using Weka in my Java code

Here is the Java code. Running the program gives the same result as above.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import weka.associations.Apriori;
import weka.core.Instances;

public class Main {
  public static void main(String[] args) {
    // Load the ARFF file; stop early if it cannot be read,
    // otherwise data would be null further down.
    Instances data;
    try (BufferedReader reader = new BufferedReader(new FileReader("TestStudenti.arff"))) {
      data = new Instances(reader);
    } catch (IOException e) {
      e.printStackTrace();
      return;
    }
    // Marks the last attribute as the class attribute.
    data.setClassIndex(data.numAttributes() - 1);

    // Apriori parameters
    double deltaValue = 0.05;                // step by which minimum support is lowered
    double lowerBoundMinSupportValue = 0.1;  // lower bound for minimum support
    double minMetricValue = 0.5;             // minimum confidence
    int numRulesValue = 20;                  // number of rules to report
    double upperBoundMinSupportValue = 1.0;  // upper bound for minimum support

    Apriori apriori = new Apriori();
    apriori.setDelta(deltaValue);
    apriori.setLowerBoundMinSupport(lowerBoundMinSupportValue);
    apriori.setNumRules(numRulesValue);
    apriori.setUpperBoundMinSupport(upperBoundMinSupportValue);
    apriori.setMinMetric(minMetricValue);

    // Mine the rules and print the same report the Explorer shows.
    try {
      apriori.buildAssociations(data);
    } catch (Exception e) {
      e.printStackTrace();
      return;
    }
    System.out.println(apriori.toString());
  }
}

(3) Using Weka in R

The program is very simple: three lines of code are enough.

library(RWeka);
data <- read.arff("D:/test.studenti.arff");
Apriori(data,control=Weka_control(N=20,T =0,C =0.5,D =0.05, U= 1.0,M =0.5, S =-1.0, c =-1))

Output:

Apriori
=======

Minimum support: 0.6 (9 instances)
Minimum metric <confidence>: 0.5
Number of cycles performed: 8

Generated sets of large itemsets:

Size of set of large itemsets L(1): 7

Size of set of large itemsets L(2): 8

Size of set of large itemsets L(3): 2

Best rules found:

1. Sortare_topologica=FALSE 13 ==> Arbori_TRIE=TRUE 13 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
2. Arbori_rosu_negru=TRUE 11 ==> Arbori_TRIE=TRUE 11 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
3. Arbori_optimali=TRUE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
4. Arbori_optimali=TRUE 10 ==> Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
5. Arbori_echilibrati_in_inaltime=TRUE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
6. Arbori_optimali=TRUE Sortare_topologica=FALSE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
7. Arbori_optimali=TRUE Arbori_TRIE=TRUE 10 ==> Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
8. Arbori_optimali=TRUE 10 ==> Arbori_TRIE=TRUE Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
9. Arbori_binari_de_cautare=TRUE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
10. Arbori_B=FALSE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
11. Arbori_rosu_negru=TRUE Sortare_topologica=FALSE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
12. Arbori_TRIE=TRUE 15 ==> Sortare_topologica=FALSE 13 <conf:(0.87)> lift:(1) lev:(0) [0] conv:(0.67)
13. Arbori_rosu_negru=TRUE 11 ==> Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
14. Arbori_rosu_negru=TRUE Arbori_TRIE=TRUE 11 ==> Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
15. Arbori_rosu_negru=TRUE 11 ==> Arbori_TRIE=TRUE Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
16. Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
17. Arbori_TRIE=TRUE Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
18. Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE Arbori_TRIE=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
19. Arbori_TRIE=TRUE 15 ==> Arbori_rosu_negru=TRUE 11 <conf:(0.73)> lift:(1) lev:(0) [0] conv:(0.8)
20. Sortare_topologica=FALSE 13 ==> Arbori_rosu_negru=TRUE 9 <conf:(0.69)> lift:(0.94) lev:(-0.04) [0] conv:(0.69)
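The "Generated sets of large itemsets" figures in the output above can be cross-checked by hand. The following is a minimal sketch of level-wise frequent itemset counting, assuming that every attribute=value pair (both TRUE and FALSE) counts as one item and that the minimum support of 0.6 corresponds to 9 of the 15 instances, as the output states. The class name LargeItemSets and the bitmask encoding are choices made for this illustration; this is not how Weka implements the search internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Counts frequent itemsets per size for the rows of TestStudenti.arff.
// Hypothetical helper for illustration; not part of Weka.
class LargeItemSets {
    // One string per student, columns in @attribute order; 'T' = TRUE, 'F' = FALSE.
    static final String[] ROWS = {
        "TTTTFFTTFF", "TTTTTTFTFF", "FTTTFFFTFT", "FTFFTFTTFT", "TTFTTFTTFT",
        "TFTFFTTTFF", "FTFTTFTTFT", "TFTTTFTTTF", "FTTTTFFTFF", "TFTFTTFTFT",
        "FFTFTFFTTT", "TFFTTTFTFT", "FTTFTTFTFT", "TTTFFTFTFF", "TTFFTTFTFF"
    };

    // Encodes a row as a bitmask: bit 2*col stands for col=FALSE, bit 2*col+1 for col=TRUE.
    static int encode(String row) {
        int mask = 0;
        for (int col = 0; col < row.length(); col++) {
            mask |= 1 << (2 * col + (row.charAt(col) == 'T' ? 1 : 0));
        }
        return mask;
    }

    // Number of rows that contain every item of the itemset.
    static int support(int itemSet) {
        int n = 0;
        for (String row : ROWS) {
            if ((encode(row) & itemSet) == itemSet) n++;
        }
        return n;
    }

    // Returns the number of frequent itemsets of size 1, 2, 3, ... until a level is empty.
    static List<Integer> levelSizes(int minCount) {
        List<Integer> sizes = new ArrayList<>();
        int numItems = 2 * ROWS[0].length();

        // Level 1: single items that occur in at least minCount rows.
        Set<Integer> current = new TreeSet<>();
        for (int item = 0; item < numItems; item++) {
            if (support(1 << item) >= minCount) current.add(1 << item);
        }

        int k = 1;
        while (!current.isEmpty()) {
            sizes.add(current.size());
            // Level k+1: join two frequent k-itemsets that differ in exactly one item,
            // then keep the union only if it is frequent itself.
            Set<Integer> next = new TreeSet<>();
            for (int a : current) {
                for (int b : current) {
                    int union = a | b;
                    if (Integer.bitCount(union) == k + 1 && support(union) >= minCount) {
                        next.add(union);
                    }
                }
            }
            current = next;
            k++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 0.6 * 15 instances = 9 instances
        System.out.println(levelSizes(9));  // prints [7, 8, 2]
    }
}
```

The three numbers match L(1), L(2), and L(3) in the report. Joining only frequent k-itemsets is enough because every subset of a frequent itemset must itself be frequent, which is the pruning idea behind Apriori.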

posted @ 2014-02-16 15:31  愚人_同乐