Zhou Zhihua, Machine Learning (《机器学习》): Solution to Exercise 7.3

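  Classify the watermelon data set with a naive Bayes classifier (scikit-learn's GaussianNB). As in the previous exercises, the code is as follows: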


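import numpy as np
from sklearn.naive_bayes import GaussianNB

# Read the CSV (header row first, label in the last column).
# The raw string keeps the Windows backslashes literal; '\q' and '\w'
# are invalid escape sequences that newer Python versions warn about.
with open(r'c:\quant\watermelon.csv', 'r') as file1:
    data = [line.strip('\n').split(',') for line in file1]
data = np.array(data)

# Skip the header row, drop the first and last (label) columns, then use
# columns -7..-2 of what remains as six numeric features.
X = [[float(v) for v in raw[-7:-1]] for raw in data[1:, 1:-1]]
# Alternative with only two attributes (columns -3 and -2 of the full row):
#X = [[float(raw[-3]), float(raw[-2])] for raw in data[1:]]
# Label: 1 if the last column is '1' (good melon), else 0.
y = [1 if raw[-1] == '1' else 0 for raw in data[1:]]
X = np.array(X)
y = np.array(y)

# Fit Gaussian naive Bayes and predict on the same training samples.
gnb = GaussianNB()
y_pred = gnb.fit(X, y).predict(X)
print("Number of mislabeled points out of a total %d points : %d"
      % (X.shape[0], (y != y_pred).sum()))
print(y)
print(y_pred)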
 

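The output is as follows (the first array is the true labels y, the second the predictions y_pred). Because the model is evaluated on the same 17 samples it was trained on, this is a training-set error:

Number of mislabeled points out of a total 17 points : 2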

[1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0]
[1 1 1 1 1 1 0 1 0 0 0 0 0 0 1 0 0]
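For reference, GaussianNB applies the naive Bayes decision rule: attributes are assumed conditionally independent given the class, and each attribute is modelled by a normal density within each class. In standard notation, with $\mathcal{Y}$ the set of class labels (here $\{0, 1\}$), $d$ the number of attributes, $P(c)$ the fraction of training samples in class $c$, and $\mu_{c,i}$, $\sigma_{c,i}^2$ the mean and variance of attribute $i$ within class $c$:

```latex
h_{nb}(\boldsymbol{x}) = \arg\max_{c \in \mathcal{Y}} \; P(c) \prod_{i=1}^{d} p(x_i \mid c),
\qquad
p(x_i \mid c) = \frac{1}{\sqrt{2\pi}\,\sigma_{c,i}}
  \exp\!\left( -\frac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2} \right)
```

scikit-learn additionally adds a small fraction of the largest feature variance (the `var_smoothing` parameter) to every $\sigma_{c,i}^2$ for numerical stability.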

If too few attributes are selected, the classification error rate increases.
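To make explicit what GaussianNB computes, here is a minimal pure-Python sketch of Gaussian naive Bayes. It is an illustration under stated assumptions, not scikit-learn's implementation: the function names `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical, variances are maximum-likelihood estimates floored at an arbitrary 1e-9 (instead of sklearn's `var_smoothing` rule), and the toy data is made up, not the watermelon data.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Return {label: (prior, means, variances)} estimated from X, y."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    n = len(X)
    model = {}
    for label, rows in groups.items():
        k = len(rows)
        columns = list(zip(*rows))  # per-feature values within this class
        means = [sum(col) / k for col in columns]
        # Maximum-likelihood variance (divide by k), floored at 1e-9 so a
        # constant feature does not cause division by zero (arbitrary choice).
        variances = [max(sum((v - m) ** 2 for v in col) / k, 1e-9)
                     for col, m in zip(columns, means)]
        model[label] = (k / n, means, variances)
    return model


def predict_gaussian_nb(model, X):
    """Pick the label maximising log P(c) + sum_i log N(x_i; mean, var)."""
    preds = []
    for row in X:
        best_label, best_score = None, -math.inf
        for label, (prior, means, variances) in model.items():
            score = math.log(prior)
            for x, m, v in zip(row, means, variances):
                score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            if score > best_score:
                best_label, best_score = label, score
        preds.append(best_label)
    return preds


# Made-up toy data (two features, two classes), not the watermelon data set.
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2], [3.0, 0.5], [3.2, 0.7], [2.8, 0.4]]
y_toy = [1, 1, 1, 0, 0, 0]

model = fit_gaussian_nb(X_toy, y_toy)
pred = predict_gaussian_nb(model, X_toy)
mislabeled = sum(p != t for p, t in zip(pred, y_toy))
# prints: Number of mislabeled points out of a total 6 points : 0
print("Number of mislabeled points out of a total %d points : %d"
      % (len(X_toy), mislabeled))
```

Working in log space turns the product of densities into a sum, which avoids underflow when there are many attributes. Passing only some columns of each row to both functions is one way to check the remark about attribute count on your own copy of the data.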

 

posted @ 2016-07-05 14:55  机器人小z