[Notes] A First Look at the kNN Algorithm (2)

The kNN Algorithm (2)

Encapsulating machine learning algorithms
How machine learning algorithms are encapsulated in scikit-learn

First write the algorithm in PyCharm and save it as KNN.py

  import numpy as np
  from math import sqrt
  from collections import Counter

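  def kNN_classify(k, X_train, y_train, x):
      """Predict the label of a single sample x by majority vote of its k nearest neighbors."""
      assert 1 <= k <= X_train.shape[0], "k must be valid"
      assert X_train.shape[0] == y_train.shape[0], \
          "the size of X_train must be equal to the size of y_train"
      assert X_train.shape[1] == x.shape[0], \
          "the feature number of x must be equal to X_train"

      # Euclidean distance from x to every training sample
      distances = [sqrt(np.sum((x_train - x)**2)) for x_train in X_train]
      # Indices of the training samples, sorted from nearest to farthest
      nearest = np.argsort(distances)

      # Labels of the k nearest samples, then count the votes per label
      topK_y = [y_train[i] for i in nearest[:k]]
      votes = Counter(topK_y)

      # The label with the most votes is the prediction
      return votes.most_common(1)[0][0]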
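
For reference, the distance computed in the list comprehension above is the Euclidean distance, and the returned value is the most common label among the k nearest samples. Written out for n features, with x the sample to classify and N_k(x) used here as shorthand (notation introduced for this note, not taken from the original) for the indices of the k nearest training samples:

```latex
d\bigl(x^{(i)}, x\bigr) = \sqrt{\sum_{j=1}^{n} \bigl(x^{(i)}_j - x_j\bigr)^2},
\qquad
\hat{y} = \arg\max_{c} \sum_{i \in N_k(x)} \mathbf{1}\bigl[y^{(i)} = c\bigr]
```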

Prepare the required data in advance (the training features, the training labels, and the sample to classify)

Use the magic command %run to load the function into the notebook

  %run KNN.py

Once the script has run, calling kNN_classify returns the prediction
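
As a rough illustration of the two steps above, the snippet below prepares a small dataset and calls the function. The ten training points, their labels, and the query point are made-up values for demonstration, not the data used in these notes. The function is repeated in condensed form, with the distances computed in one NumPy expression instead of a loop, only so that the snippet runs without KNN.py.

```python
import numpy as np
from collections import Counter

# Hypothetical toy data: five samples of class 0 and five of class 1.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [2.5, 2.5], [3.0, 1.0],
                    [7.0, 7.5], [7.5, 8.0], [8.0, 7.0], [8.5, 8.5], [9.0, 7.5]])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
x = np.array([8.0, 8.0])  # the sample to classify


def kNN_classify(k, X_train, y_train, x):
    """Condensed copy of the function in KNN.py (vectorized distances)."""
    assert 1 <= k <= X_train.shape[0], "k must be valid"
    # Euclidean distance from x to every training sample at once.
    distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
    nearest = np.argsort(distances)
    votes = Counter(y_train[nearest[:k]].tolist())
    return votes.most_common(1)[0][0]


predict_y = kNN_classify(6, X_train, y_train, x)
print(predict_y)  # 1: five of the six nearest neighbors belong to class 1
```

In a notebook, the function definition would normally come from %run KNN.py rather than being pasted in.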

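The k-nearest neighbors algorithm is unusual in that it can be regarded as an algorithm without a model. To keep it consistent with other algorithms, we can regard the training dataset itself as the model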

Using kNN in scikit-learn

We need the KNeighborsClassifier class from sklearn.neighbors

Create an instance; n_neighbors=6 corresponds to k=6

Then call fit

  kNN_classifier.fit(X_train,y_train)

fit returns the classifier itself, so the return value does not need to be assigned to a variable

Calling the predict method then performs the prediction

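Note, however, that the argument passed to predict must be a matrix (a 2-D array), not a 1-D array,
so we first reshape the sample to change its structure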
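
A minimal sketch of that reshape step, using a made-up two-feature sample (the values and the name X_predict are illustrative, not from the original notes):

```python
import numpy as np

x = np.array([8.0, 8.0])      # hypothetical sample with two features
print(x.shape)                # (2,)

# reshape(1, -1): one row, and let NumPy infer the number of columns.
X_predict = x.reshape(1, -1)
print(X_predict.shape)        # (1, 2)
```

The -1 saves hard-coding the number of features, so the same line works for samples of any width.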

Finally, we obtain the predicted class
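
Put together, the scikit-learn steps above might look like the following sketch. It assumes scikit-learn is installed; the toy data and the variable names X_predict, y_predict, and returned are illustrative choices, not the values used in the original notes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: five samples of class 0 and five of class 1.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [2.5, 2.5], [3.0, 1.0],
                    [7.0, 7.5], [7.5, 8.0], [8.0, 7.0], [8.5, 8.5], [9.0, 7.5]])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
x = np.array([8.0, 8.0])  # the sample to classify

# n_neighbors=6 plays the role of k=6.
kNN_classifier = KNeighborsClassifier(n_neighbors=6)

# fit returns the classifier itself; capturing it here is only for illustration.
returned = kNN_classifier.fit(X_train, y_train)

# predict expects a 2-D array, so the single sample becomes a 1 x 2 matrix.
X_predict = x.reshape(1, -1)
y_predict = kNN_classifier.predict(X_predict)

print(y_predict[0])  # 1 for this toy data
```

Because predict takes a matrix, it returns one label per row, which is why the single result is read out with y_predict[0].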

Reorganizing our kNN code
Create a file named kNN1.py in the same folder
and write the kNN algorithm into it as a class

  import numpy as np
  from math import sqrt
  from collections import Counter

  class KNNClassifier:

      def __init__(self, k):
          """Initialize the kNN classifier"""
          assert k >= 1, "k must be valid"
          self.k = k
          self._X_train = None
          self._y_train = None

      def fit(self, X_train, y_train):
          """Train the kNN classifier on the training dataset X_train and y_train"""
          assert X_train.shape[0] == y_train.shape[0], \
              "the size of X_train must be equal to the size of y_train"
          assert self.k <= X_train.shape[0], \
              "the size of X_train must be at least k."

          self._X_train = X_train
          self._y_train = y_train
          return self

      def predict(self, X_predict):
          """Given the dataset X_predict, return the vector of predicted labels for X_predict"""
          assert self._X_train is not None and self._y_train is not None, \
              "must fit before predict!"
          assert X_predict.shape[1] == self._X_train.shape[1], \
              "the feature number of X_predict must be equal to X_train"

          y_predict = [self._predict(x) for x in X_predict]
          return np.array(y_predict)

      def _predict(self, x):
          """Given a single sample x, return the predicted label for x"""
          assert x.shape[0] == self._X_train.shape[1], \
              "the feature number of x must be equal to X_train"

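          # Euclidean distance from x to every stored training sample
          distances = [sqrt(np.sum((x_train - x) ** 2))
                       for x_train in self._X_train]

          # Indices of the training samples, sorted from nearest to farthest
          nearest = np.argsort(distances)

          # Labels of the k nearest samples, then count the votes per label
          topK_y = [self._y_train[i] for i in nearest[:self.k]]
          votes = Counter(topK_y)

          # The label with the most votes is the prediction
          return votes.most_common(1)[0][0]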

      def __repr__(self):
          return "KNN(k=%d)" % self.k

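Following the same steps as above (load the file with %run, create an instance, fit, then predict) gives the prediction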
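
A sketch of how the class might be used, mirroring the scikit-learn workflow. So that the snippet runs without kNN1.py, it carries a condensed stand-in for the class (assertions dropped, distances vectorized); the toy data and the names knn_clf, X_predict, and y_predict are made up for illustration.

```python
import numpy as np
from collections import Counter


class KNNClassifier:
    """Condensed stand-in for the class in kNN1.py (assertions omitted)."""

    def __init__(self, k):
        self.k = k
        self._X_train = None
        self._y_train = None

    def fit(self, X_train, y_train):
        # kNN has no real training step: fit only stores the data.
        self._X_train = X_train
        self._y_train = y_train
        return self

    def predict(self, X_predict):
        # One prediction per row of the 2-D input.
        return np.array([self._predict(x) for x in X_predict])

    def _predict(self, x):
        distances = np.sqrt(np.sum((self._X_train - x) ** 2, axis=1))
        nearest = np.argsort(distances)
        votes = Counter(self._y_train[nearest[:self.k]].tolist())
        return votes.most_common(1)[0][0]


# Hypothetical toy data: five samples of class 0 and five of class 1.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [2.5, 2.5], [3.0, 1.0],
                    [7.0, 7.5], [7.5, 8.0], [8.0, 7.0], [8.5, 8.5], [9.0, 7.5]])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
X_predict = np.array([[8.0, 8.0], [1.0, 2.0]])  # two samples, already 2-D

knn_clf = KNNClassifier(k=6)
knn_clf.fit(X_train, y_train)
y_predict = knn_clf.predict(X_predict)
print(y_predict)  # [1 0]
```

Since fit only stores the arrays, all of the work happens in predict, which is the sense in which the training set itself acts as the model.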

posted @ 2021-01-13 21:33  DbWong_0918