K-Nearest Neighbors Classification of the Iris Dataset

Loading the dataset

'''
date:2020/6/14
author:wuxiong
description:classifying the iris dataset
'''
import numpy
from sklearn.datasets import load_iris 
# Load the iris dataset into data
data=load_iris()

print(data.keys())

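The iris dataset contains 150 samples in total.

There are three classes, encoded as 0, 1, and 2, which correspond to ['setosa' 'versicolor' 'virginica'].

There are four features:

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Each species has 50 samples.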
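These counts can be checked directly from the loaded data. The short sketch below (the variable names iris, X and y are illustrative) uses load_iris() and numpy.bincount to confirm the number of samples, the number of features, and the number of samples per class.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the dataset as a Bunch (a dict-like object with attribute access)
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)              # (150, 4): 150 samples, 4 features
print(iris.target_names)    # ['setosa' 'versicolor' 'virginica']
print(iris.feature_names)   # the four feature names listed above
print(np.bincount(y))       # [50 50 50]: 50 samples per class
```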


'''
date:2020/6/14
author:wuxiong
description:classifying the iris dataset
'''
import numpy
from sklearn.datasets import load_iris 
# Load the iris dataset into data
data=load_iris()

print(data.keys())
# Contents of the iris dataset
# print(data['data'])

# Shuffle the samples; features and labels are permuted together
from sklearn.utils import shuffle
data_shuffle_train,data_shuffle_target = shuffle(data['data'],data['target'])

# Split into a training set and a test set (stratified by class): 80% for training, 20% for testing
from sklearn.model_selection import StratifiedShuffleSplit
train_index, test_index = next(iter(
    StratifiedShuffleSplit(n_splits=1, test_size=0.2,
                           random_state=1).split(data_shuffle_train,data_shuffle_target)))
x_data_train = data_shuffle_train[train_index]
y_data_train = data_shuffle_target[train_index]
x_data_test = data_shuffle_train[test_index]
y_data_test = data_shuffle_target[test_index]

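# Train the model
from sklearn.neighbors import KNeighborsClassifier
# k=3 neighbors; metric='minkowski' with p=2 is the Euclidean distance;
# weights='uniform' gives every neighbor an equal vote
knn = KNeighborsClassifier(n_neighbors=3, algorithm='auto', leaf_size=30,
                           metric='minkowski', metric_params=None, n_jobs=1,
                           p=2, weights='uniform')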
knn.fit(x_data_train, y_data_train)

# score() returns the mean accuracy (the fraction of correct predictions)
train_score = knn.score(x_data_train,y_data_train)
test_score = knn.score(x_data_test,y_data_test)

print('train_score = {0},test_score={1}'.format(train_score,test_score))
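In the listing above, StratifiedShuffleSplit already draws a random permutation of the indices, so the separate shuffle() call is not needed to randomize the split. As a shorter alternative (swapped in here for illustration, not the author's code), train_test_split with stratify=y produces the same kind of stratified 80/20 split in one call. The random_state=1 value is an assumption added only to make the result reproducible.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the 1:1:1 class ratio in both the training and the test part
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

print(X_train.shape, X_test.shape)   # (120, 4) (30, 4)
print(np.bincount(y_train))          # [40 40 40]
print(np.bincount(y_test))           # [10 10 10]
```

Because every class has exactly 50 samples, stratification puts exactly 10 samples of each class into the 30-sample test set.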

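The model's accuracy is roughly 0.975 on the training set and roughly 0.96 on the test set. Since shuffle() is called without a random_state, the split, and therefore the exact scores, can vary from run to run.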
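To make the k-NN decision rule concrete, here is a minimal NumPy sketch of what KNeighborsClassifier does with the settings used above (k=3, Minkowski metric with p=2, i.e. Euclidean distance, uniform weights): for each test point, find the 3 closest training points and take a majority vote. The helper knn_predict is hypothetical and written only for illustration; it breaks voting ties in favor of the smaller label and may order equally distant neighbors differently from scikit-learn, so the two can disagree on rare tied points. The split via train_test_split with random_state=1 is also an assumption, not the author's setup. The accuracy printed is the same quantity that knn.score() reports.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_query, k=3):
    """Hypothetical helper: majority vote among the k nearest training points."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k closest training points for each query
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Uniform weights: one vote per neighbor; argmax picks the smallest label on a tie
    n_classes = y_train.max() + 1
    return np.array([np.bincount(y_train[row], minlength=n_classes).argmax()
                     for row in nearest])


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

ours = knn_predict(X_train, y_train, X_test, k=3)
sk_knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Accuracy = fraction of correct predictions; score() reports the same metric
print("from-scratch accuracy:", np.mean(ours == y_test))
print("sklearn score:", sk_knn.score(X_test, y_test))
print("agreement with sklearn:", np.mean(ours == sk_knn.predict(X_test)))
```

Since the test set has 30 samples, test accuracy can only take values of the form n/30, so 29 correct predictions give about 0.967; on the 120-sample training set, 117 correct predictions give exactly 0.975.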

posted @ 2020-06-14 21:03  吴雄