C50 and Machine Learning

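We have a dataset of expert diagnoses recording whether a patient can be fitted with contact lenses (taken from the book "Data Mining"), and we try to extract decision rules from it in R.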

Constructing the data

> # 24 cases: every combination of age (3) x spectacle (2) x astimatism (2) x tear (2)
> age = factor(c(rep("young",8),rep("pre-presbyopic",8),rep("presbyopic",8)))
> spectacle = factor(rep(c(rep("myope",4),rep("hypermetrope",4)),3))
> astimatism = factor(rep(c("no","no","yes","yes"),6))
> tear = factor(rep(c("reduced","normal"),12))
> recommended = factor(c("none","soft","none","hard","none","soft","none","hard","none",
+                        "soft","none","hard","none","soft","none","none","none","none",
+                        "none","hard","none","soft","none","none"))
> df2 <- data.frame(age,spectacle,astimatism,tear,recommended)
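The 24 rows are a full factorial design over the four attributes, so the same table can also be built with `expand.grid` and checked before modelling. This is only a sketch: the name `lenses` is an assumption made here, and `expand.grid` keeps factor levels in the order given rather than alphabetically, which changes how levels are listed but not the data.

```r
# Every combination of the four attributes. expand.grid varies the first
# argument fastest, which reproduces the row order used above.
lenses <- expand.grid(
  tear       = c("reduced", "normal"),
  astimatism = c("no", "yes"),
  spectacle  = c("myope", "hypermetrope"),
  age        = c("young", "pre-presbyopic", "presbyopic"),
  stringsAsFactors = TRUE
)

# The expert's recommendation for each of the 24 cases, in row order.
lenses$recommended <- factor(c(
  "none", "soft", "none", "hard", "none", "soft", "none", "hard",
  "none", "soft", "none", "hard", "none", "soft", "none", "none",
  "none", "none", "none", "hard", "none", "soft", "none", "none"
))

nrow(lenses)                # 24
table(lenses$recommended)   # hard 4, none 15, soft 5
```

The class counts (4 hard, 15 none, 5 soft) are a quick way to confirm that the labels were typed in correctly.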

Generating the rules

> library(rpart)
> model <- rpart(formula = recommended ~.,data = df2)
> summary(model)
Call:
rpart(formula = recommended ~ ., data = df2)
  n= 24 

         CP nsplit rel error   xerror      xstd
1 0.2222222      0 1.0000000 1.000000 0.2635231
2 0.0100000      1 0.7777778 1.333333 0.2721655

Variable importance
tear 
 100 

Node number 1: 24 observations,    complexity param=0.2222222
  predicted class=none  expected loss=0.375  P(node) =1
    class counts:     4    15     5
   probabilities: 0.167 0.625 0.208 
  left son=2 (12 obs) right son=3 (12 obs)
  Primary splits:
      tear       splits as  RL,  improve=5.0833330, (0 missing)
      astimatism splits as  RL,  improve=1.7500000, (0 missing)
      age        splits as  RRL, improve=0.2916667, (0 missing)
      spectacle  splits as  RL,  improve=0.2500000, (0 missing)

Node number 2: 12 observations
  predicted class=none  expected loss=0  P(node) =0.5
    class counts:     0    12     0
   probabilities: 0.000 1.000 0.000 

Node number 3: 12 observations
  predicted class=soft  expected loss=0.5833333  P(node) =0.5
    class counts:     4     3     5
   probabilities: 0.333 0.250 0.417 
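The rpart tree stops after a single split on tear even though node 3 is still mixed. A likely reason is the default `minsplit = 20` in `rpart.control`: node 3 holds only 12 observations, so no further split is attempted. The sketch below relaxes that limit. The object name `deep` and the particular control values are assumptions chosen for illustration, not settings from the original post.

```r
library(rpart)

# Rebuild the data frame so this example runs on its own.
age         <- factor(rep(c("young", "pre-presbyopic", "presbyopic"), each = 8))
spectacle   <- factor(rep(rep(c("myope", "hypermetrope"), each = 4), 3))
astimatism  <- factor(rep(c("no", "no", "yes", "yes"), 6))
tear        <- factor(rep(c("reduced", "normal"), 12))
recommended <- factor(c(
  "none", "soft", "none", "hard", "none", "soft", "none", "hard",
  "none", "soft", "none", "hard", "none", "soft", "none", "none",
  "none", "none", "none", "hard", "none", "soft", "none", "none"
))
df2 <- data.frame(age, spectacle, astimatism, tear, recommended)

# Allow splits on small nodes (illustrative values, not tuned).
deep <- rpart(recommended ~ ., data = df2, method = "class",
              control = rpart.control(minsplit = 2, minbucket = 1, cp = 0.01))

print(deep)    # tree structure, one line per node
printcp(deep)  # complexity table, useful for pruning back
```

With only 24 cases a tree grown this deep can easily overfit, so the cross-validated error in the complexity table deserves a look before trusting the extra splits.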

Visualization

> par(xpd = TRUE)
> plot(model)
> text(model)

Summary of the C5.0 model
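The post does not show the commands behind this summary. Judging from the `Call:` line, it was presumably produced by something like the following sketch. The object names `c5_model` and `c5_rules` are assumptions, and the `rules = TRUE` variant is an addition here, included because it prints the model as a rule set rather than a tree.

```r
library(C50)

# Rebuild the data frame so this example runs on its own.
age         <- factor(rep(c("young", "pre-presbyopic", "presbyopic"), each = 8))
spectacle   <- factor(rep(rep(c("myope", "hypermetrope"), each = 4), 3))
astimatism  <- factor(rep(c("no", "no", "yes", "yes"), 6))
tear        <- factor(rep(c("reduced", "normal"), 12))
recommended <- factor(c(
  "none", "soft", "none", "hard", "none", "soft", "none", "hard",
  "none", "soft", "none", "hard", "none", "soft", "none", "none",
  "none", "none", "none", "hard", "none", "soft", "none", "none"
))
df2 <- data.frame(age, spectacle, astimatism, tear, recommended)

# Decision tree, matching the formula shown in the Call line.
c5_model <- C5.0(recommended ~ ., data = df2)
summary(c5_model)

# Same data, expressed as a rule set instead of a tree.
c5_rules <- C5.0(recommended ~ ., data = df2, rules = TRUE)
summary(c5_rules)
```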

Call:
C5.0.formula(formula = recommended ~ ., data = df2)


C5.0 [Release 2.07 GPL Edition]  	Mon Mar 09 14:47:09 2015
-------------------------------

Class specified by attribute `outcome'

Read 24 cases (5 attributes) from undefined.data

Decision tree:

tear = reduced: none (12)
tear = normal:
:...astimatism = no: soft (6/1)
    astimatism = yes: hard (6/2)


Evaluation on training data (24 cases):

	    Decision Tree   
	  ----------------  
	  Size      Errors  

	     3    3(12.5%)   <<


	   (a)   (b)   (c)    <-classified as
	  ----  ----  ----
	     4                (a): class hard
	     2    12     1    (b): class none
	                 5    (c): class soft


	Attribute usage:

	100.00%	tear
	 50.00%	astimatism


Time: 0.0 secs
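The three leaves of the C5.0 tree can be read as three if-then rules. As a check, the sketch below writes those rules by hand in base R and applies them to the 24 cases. The function name `predict_lens` is hypothetical and the code does not use the C50 package at all.

```r
# The attributes the tree actually uses, plus the expert's recommendation.
tear        <- rep(c("reduced", "normal"), 12)
astimatism  <- rep(c("no", "no", "yes", "yes"), 6)
recommended <- c(
  "none", "soft", "none", "hard", "none", "soft", "none", "hard",
  "none", "soft", "none", "hard", "none", "soft", "none", "none",
  "none", "none", "none", "hard", "none", "soft", "none", "none"
)

# Hand-written version of the tree printed above:
#   tear = reduced                    -> none
#   tear = normal, astimatism = no    -> soft
#   tear = normal, astimatism = yes   -> hard
predict_lens <- function(tear, astimatism) {
  ifelse(tear == "reduced", "none",
         ifelse(astimatism == "no", "soft", "hard"))
}

predicted <- predict_lens(tear, astimatism)
errors    <- sum(predicted != recommended)
errors                      # 3
errors / length(predicted)  # 0.125

# Rows are the actual class, columns the predicted class.
confusion <- table(actual = recommended, predicted = predicted)
confusion
```

The three errors are all cases where the expert recommended none despite normal tear production, which agrees with the (6/1) and (6/2) leaf counts and with the confusion matrix in the summary.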

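Both models split first on tear: when tear production is reduced, the recommendation is always none (12 of 12 cases). Tear production is therefore the main factor behind the expert's decision on whether contact lenses can be worn; C5.0 additionally uses astimatism to choose between soft and hard lenses.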

 

posted @ 2015-03-09 15:16  Dearc