KNN Classification

1. Introduction to KNN

    K-Nearest Neighbor, KNN for short, can be used for both classification and regression. In my experience, KNN is very effective on classification problems.

2. The Idea Behind KNN

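    We assume that the distance between two instances in feature space reflects how similar they are: the closer they are, the more similar. Given a new instance, we find the K training instances nearest to it and use their labels to infer its label (for classification, usually by majority vote).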
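
    The idea can be sketched in a few lines of plain Python. This is only a minimal illustration, assuming Euclidean distance and a simple majority vote; the function name knn_predict and the toy data are made up for this example and are not part of the implementation in section 3.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest training points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two clusters in a 2-D feature space
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["A", "A", "A", "B", "B", "B"]

prediction = knn_predict(train_points, train_labels, query=(1.1, 1.0), k=3)
print(prediction)  # prints "A": all three nearest neighbors carry label A
```

    For a two-class problem, an odd k avoids tied votes.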

3. Implementing KNN

# Import packages
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
import joblib

# Load the data
fpath = r"..\文件\训练数据2.csv"
df = pd.read_csv(fpath)
print(df.head())


# Split the data: 70% for training, 30% for testing
x_train, x_test = train_test_split(df, train_size=0.7)

# Training set: feature columns and label column
train_x = x_train.loc[:, "nAcid":"Zagreb"]
train_y = x_train["CYP3A4"]

# Test set: feature columns and label column
test_x = x_test.loc[:, "nAcid":"Zagreb"]
test_y = x_test["CYP3A4"]

# Train the KNN model and save it to disk
knn = KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto')
knn.fit(train_x, train_y)
joblib.dump(knn, "knn2.pkl")

# Accuracy on the training set
scores = knn.score(train_x, train_y)
print("KNN training accuracy:", scores)

# Evaluate the model on the test set
label_predic = knn.predict(test_x)
acc = accuracy_score(test_y, label_predic)  # argument order is (y_true, y_pred)
print("KNN test accuracy:", acc)

print(classification_report(test_y, label_predic))


# Tune n_neighbors with a grid search (10-fold cross-validation)
gsCv = GridSearchCV(knn,
                    param_grid={
                     'n_neighbors':list(range(1, 40, 1))
                     }, cv=10)
gsCv.fit(train_x, train_y)

print("Grid search finished")
print("Best score:", gsCv.best_score_, "Best parameters:", gsCv.best_params_)
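
    The grid search above only reports a cross-validated score on the training set. A natural follow-up is to check the selected model on held-out data. The sketch below shows one way this could look. It assumes synthetic data from make_classification as a stand-in for the original CSV, and the names X, y, search, best_knn and test_acc are introduced here for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the original CSV (assumption: numeric features, binary labels)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)

# Same kind of search as above, over a smaller grid to keep the example fast
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 20, 2))},
    cv=5,
)
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ is refit on the whole training set
best_knn = search.best_estimator_
test_acc = accuracy_score(y_test, best_knn.predict(X_test))
print("Best parameters:", search.best_params_)
print("Test accuracy of the tuned model:", test_acc)
```

    In the original script, the equivalent would be calling gsCv.best_estimator_.predict(test_x) and comparing the result with test_y.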
posted @ 2021-10-19 21:42  编码雪人