Zhou Zhihua, Machine Learning (《机器学习》): Solution to Exercise 7.3

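Here the watermelon dataset is classified with a Bayesian method, using the Gaussian naive Bayes classifier from scikit-learn. As in the earlier solutions, the code is: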

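import numpy as np

# Read the CSV; the raw string keeps the backslashes in the Windows path literal
with open(r'c:\quant\watermelon.csv', 'r') as file1:
    data = [line.strip('\n').split(',') for line in file1]
data = np.array(data)
# Skip the header row, drop the first and last columns, then take six attribute columns as floats
X = [[float(v) for v in raw[-7:-1]] for raw in data[1:, 1:-1]]
# Alternative: use only two attribute columns
#X = [[float(raw[-3]), float(raw[-2])] for raw in data[1:]]
# The last column is the label: 1 for '1', otherwise 0
y = [1 if raw[-1] == '1' else 0 for raw in data[1:]]
X = np.array(X)
y = np.array(y)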

from sklearn.naive_bayes import GaussianNB

# Fit Gaussian naive Bayes, then predict on the same samples it was trained on
gnb = GaussianNB()
y_pred = gnb.fit(X, y).predict(X)
print("Number of mislabeled points out of a total %d points : %d"
      % (X.shape[0], (y != y_pred).sum()))
print(y)
print(y_pred)

The output is:

Number of mislabeled points out of a total 17 points : 2

[1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0]
[1 1 1 1 1 1 0 1 0 0 0 0 0 0 1 0 0]
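To see which samples account for the 2 errors, the two printed vectors can be compared directly. The following is a minimal sketch that copies the label and prediction vectors from the output above; the 1-based sample numbers assume the rows appear in the same order as in the CSV file.

```python
# Labels and predictions copied from the printed output above
y = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

# 1-based positions where the prediction differs from the label
wrong = [i + 1 for i, (t, p) in enumerate(zip(y, y_pred)) if t != p]

# Split the errors by direction
false_negatives = sum(1 for t, p in zip(y, y_pred) if t == 1 and p == 0)
false_positives = sum(1 for t, p in zip(y, y_pred) if t == 0 and p == 1)

# Error rate on the training set
error_rate = len(wrong) / len(y)

print("misclassified samples:", wrong)
print("false negatives:", false_negatives, "false positives:", false_positives)
print("training error rate: %.3f" % error_rate)
```

Since the classifier is evaluated on the same 17 samples it was fitted on, this is a training error rate, not an estimate of generalization error.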

If too few attributes are selected, the classification error rate increases.
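One way to see why the number of attributes matters: in Gaussian naive Bayes each attribute contributes one log-likelihood term to the class score, so dropping an attribute removes that evidence. The sketch below is a plain-Python illustration of this decision rule, not scikit-learn's implementation. The helper names (`fit`, `log_posterior`, `predict`, `training_errors`) and the toy data are hypothetical and are not taken from the watermelon dataset.

```python
import math

def fit(X, y):
    """Estimate the prior, per-attribute mean and variance for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # Maximum-likelihood variance, with a tiny floor to avoid division by zero
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(cols, means)]
        model[label] = (n / len(X), means, variances)
    return model

def log_posterior(model, x, label):
    """Unnormalized log posterior: log prior plus one Gaussian term per attribute."""
    prior, means, variances = model[label]
    score = math.log(prior)
    for v, m, s2 in zip(x, means, variances):
        score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
    return score

def predict(model, x):
    """Return the class with the highest log posterior."""
    return max(model, key=lambda label: log_posterior(model, x, label))

def training_errors(X, y, columns):
    """Fit on the chosen columns and count mistakes on the same samples."""
    Xs = [[row[c] for c in columns] for row in X]
    model = fit(Xs, y)
    return sum(1 for x, t in zip(Xs, y) if predict(model, x) != t)

# Hypothetical toy data: column 1 separates the classes, column 0 barely does
X_toy = [[1.0, 5.0], [1.2, 5.2], [0.8, 4.8],
         [1.1, 1.0], [0.9, 1.2], [1.0, 0.8]]
y_toy = [1, 1, 1, 0, 0, 0]

print("errors with both columns:", training_errors(X_toy, y_toy, [0, 1]))
print("errors with column 0 only:", training_errors(X_toy, y_toy, [0]))
```

On this toy data, keeping only the weakly informative column produces more training errors than using both columns. The same comparison could be run on the watermelon data by passing different column subsets, though the outcome there depends on which attributes are kept.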

posted @ 2016-07-05 14:55 by 机器人小z
