Machine Learning: KNN Principles and Code Implementation


This is the author's original article; please credit the source when reposting: https://www.cnblogs.com/further-further-further/p/9670187.html

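1. KNN Principles

KNN (k-Nearest Neighbour) is the k-nearest-neighbour algorithm. Its main idea can be summed up by the idiom "birds of a feather flock together".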

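1.1 How It Works

Given a training dataset and a new input instance, find the k instances in the training set that are closest to the new instance (k is usually a small integer, typically no more than 20). If the majority of these k instances belong to some class, the input instance is assigned to that class.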

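The example given at https://www.cnblogs.com/ybjourney/p/4702562.html illustrates this well, so I borrow it here.

As shown in the figure below, should the green circle be assigned to the red-triangle class or to the blue-square class? If K=3, red triangles make up 2/3 of the neighbours, so the green circle is assigned to the red-triangle class; if K=5, blue squares make up 3/5, so the green circle is assigned to the blue-square class.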

          

This also shows that the result of the KNN algorithm depends heavily on the choice of K.
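
The figure's example can be reproduced in a few lines of Python. This is a minimal sketch: the coordinates below are made up (they are not read from the figure) and were chosen so that the green circle's 3 nearest neighbours are mostly red triangles while its 5 nearest neighbours are mostly blue squares. The helper name `knn_vote` is hypothetical.

```python
import math
from collections import Counter

# Hypothetical points standing in for the figure: (x, y, label).
# Their distances to the origin are 1, 1.5, 2, 3, 3.5 and 5 respectively.
points = [
    (1.0, 0.0, "red triangle"),
    (0.0, 1.5, "blue square"),
    (-2.0, 0.0, "red triangle"),
    (0.0, -3.0, "blue square"),
    (3.5, 0.0, "blue square"),
    (0.0, 5.0, "red triangle"),
]
green_circle = (0.0, 0.0)

def knn_vote(query, data, k):
    """Label the query by majority vote among its k nearest points."""
    # sort by Euclidean distance to the query and keep the k closest
    nearest = sorted(data, key=lambda p: math.hypot(p[0] - query[0], p[1] - query[1]))[:k]
    # count labels among the k closest and return the most common one
    return Counter(label for _, _, label in nearest).most_common(1)[0][0]

print(knn_vote(green_circle, points, 3))  # red triangle (2 of 3)
print(knn_vote(green_circle, points, 5))  # blue square (3 of 5)
```

The query point and the data are identical in both calls; only k changes, and so does the answer.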

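1.2 Euclidean Distance Formula

The distance between two vector points xA and xB is computed as

$$d(x_A, x_B) = \sqrt{\sum_{i=1}^{n} \left(x_{A,i} - x_{B,i}\right)^2}$$

where n is the number of features and x_{A,i} is the i-th component of xA.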
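
The Euclidean distance (square root of the summed squared coordinate differences) translates almost symbol for symbol into NumPy. The sketch below uses illustrative function names, not ones from the original code: one computes the distance for a single pair of vectors, the other uses broadcasting to get the distance from one point to every row of a dataset at once, which is what step 1 of the algorithm needs.

```python
import numpy as np

def euclidean_distance(x_a, x_b):
    """Euclidean distance between two vectors of the same length."""
    diff = np.asarray(x_a, dtype=float) - np.asarray(x_b, dtype=float)
    # square each coordinate difference, sum them, take the square root
    return float(np.sqrt(np.sum(diff ** 2)))

def distances_to_all(x, data_set):
    """Distance from point x to every row of data_set, in one vectorised step."""
    # broadcasting subtracts x from every row, so no explicit loop is needed
    diff = np.asarray(data_set, dtype=float) - np.asarray(x, dtype=float)
    return np.sqrt((diff ** 2).sum(axis=1))

print(euclidean_distance([0, 0], [3, 4]))          # 5.0
print(distances_to_all([0, 0], [[3, 4], [0, 1]]))  # [5. 1.]
```

`np.linalg.norm(diff)` computes the same value; the explicit form is kept here only to mirror the formula.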

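1.3 Classification Decision Rule (e.g. Majority Voting)

Let N_k(x) be the neighbourhood of x that covers its k nearest training points, and let c_j range over the possible classes. Within N_k(x), the class y of x is decided by majority vote:

$$y = \arg\max_{c_j} \sum_{x_i \in N_k(x)} I(y_i = c_j)$$

where I is the indicator function: I(y_i = c_j) is 1 when y_i = c_j, and 0 otherwise.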
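
A direct transcription of the majority-vote rule, as a sketch: for each candidate class c_j it counts the neighbours whose label y_i equals c_j (summing the indicator I(y_i = c_j)) and returns the class with the largest count. The function name and the tie-breaking behaviour (the class listed first in `classes` wins) are assumptions of this example; the rule itself does not say how ties are broken.

```python
def majority_vote(neighbour_labels, classes):
    """Return the class c_j maximising sum_i I(y_i == c_j).

    neighbour_labels: labels y_i of the k nearest neighbours
    classes:          the candidate classes c_j
    """
    def indicator(condition):
        # I(.) is 1 when the condition holds, 0 otherwise
        return 1 if condition else 0

    # vote count for each candidate class c_j
    scores = {c: sum(indicator(y == c) for y in neighbour_labels) for c in classes}
    # max() returns the first maximal element, so ties go to the class listed first
    return max(classes, key=lambda c: scores[c])

print(majority_vote([1, 3, 3, 2, 3], classes=[1, 2, 3]))  # 3
```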

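1.4 Algorithm Procedure

For each point in the dataset whose class is unknown, perform the following steps in turn:

1. Compute the distance between each point in the labelled dataset and the current point.

2. Sort the points by increasing distance.

3. Select the k points closest to the current point.

4. Count how often each class occurs among these k points.

5. Return the most frequent class among these k points as the predicted class of the current point.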
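
Putting the five steps together gives a brute-force classifier. This is a minimal sketch assuming NumPy and a labelled dataset held in a 2-D array; `knn_classify` and the toy points are hypothetical, not the author's implementation. `np.argsort` handles steps 2 and 3, and `collections.Counter` handles steps 4 and 5.

```python
import numpy as np
from collections import Counter

def knn_classify(in_x, data_set, labels, k):
    """Predict the class of one point with brute-force k-nearest neighbours.

    in_x:     the point to classify, shape (n_features,)
    data_set: labelled points, shape (n_samples, n_features)
    labels:   sequence of n_samples class labels
    k:        number of neighbours that vote
    """
    data_set = np.asarray(data_set, dtype=float)
    in_x = np.asarray(in_x, dtype=float)
    # Step 1: Euclidean distance from in_x to every labelled point (broadcasting)
    distances = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    # Steps 2-3: indices ordered by increasing distance, keep the first k
    nearest = np.argsort(distances)[:k]
    # Step 4: how often each class occurs among the k nearest points
    votes = Counter(labels[i] for i in nearest)
    # Step 5: the most frequent class is the prediction
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two points near (1, 1) labelled A, two near (0, 0) labelled B
group = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
labels = ["A", "A", "B", "B"]
print(knn_classify([0, 0], group, labels, 3))  # B
```

Every query is compared with every training point, so each prediction costs one distance computation per training sample; ties in step 5 are resolved by `Counter`'s ordering, not by any rule stated above.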

2. Code Implementation

Python 3.6

As before, I have annotated in detail what each method does and what each line of code does.

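I hope you will try implementing it yourself, paying particular attention to how list, array and matrix relate to one another during computation and when each is appropriate. Only by implementing it yourself will you sort out what each of the three does and how they relate.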
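
As a quick reference for the three types, the sketch below contrasts how `*` behaves on a Python list, a NumPy `ndarray` and a NumPy `matrix`, and how to convert between them. It is only an illustration; note that NumPy's own documentation recommends regular arrays over `np.matrix` for new code.

```python
import numpy as np

# Python list: '*' with an int repeats the list
lst = [1, 2, 3]
repeated = lst * 2                 # [1, 2, 3, 1, 2, 3]

# NumPy ndarray: arithmetic operators work element-wise
arr = np.array([[1, 2], [3, 4]])
elementwise = arr * arr            # [[1, 4], [9, 16]]
matrix_product = arr @ arr         # [[7, 10], [15, 22]]; '@' or np.dot gives the matrix product

# NumPy matrix: always 2-D, and '*' means matrix multiplication
mat = np.matrix([[1, 2], [3, 4]])
mat_product = mat * mat            # matrix([[7, 10], [15, 22]])

# converting between the three
back_to_array = np.asarray(mat_product)  # plain ndarray again
back_to_list = arr.tolist()              # [[1, 2], [3, 4]]
```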

2.1 Input Data

datingTestSet2.txt: dating-site data with three classes (people the user dislikes, people of average charm, and very charming people)
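
Each row in the listing holds three numeric feature values followed by an integer class label (1, 2 or 3), separated by whitespace. A minimal loader sketch under that assumption is shown below; the function names are hypothetical, and the layout (three features, label in the last column) is inferred from the sample rows rather than from a file specification.

```python
import numpy as np

def parse_dating_lines(lines):
    """Parse rows such as '40920 8.326976 0.953952 3'.

    Assumes three whitespace-separated numeric features followed by an
    integer class label. Returns an (n, 3) float array and a list of labels.
    """
    features, labels = [], []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        features.append([float(v) for v in fields[:3]])
        labels.append(int(fields[-1]))
    return np.array(features), labels

def load_dating_file(path):
    """Read datingTestSet2.txt (or any file in the same layout) from disk."""
    with open(path) as f:
        return parse_dating_lines(f)

sample = ["40920\t8.326976\t0.953952\t3", "14488\t7.153469\t1.673904\t2"]
X, y = parse_dating_lines(sample)
print(X.shape, y)  # (2, 3) [3, 2]
```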

40920    8.326976    0.953952    3
14488    7.153469    1.673904    2
26052    1.441871    0.805124    1
75136    13.147394    0.428964    1
38344    1.669788    0.134296    1
72993    10.141740    1.032955    1
35948    6.830792    1.213192    3
42666    13.276369    0.543880    3
67497    8.631577    0.749278    1
35483    12.273169    1.508053    3
50242    3.723498    0.831917    1
63275    8.385879    1.669485    1
5569    4.875435    0.728658    2
51052    4.680098    0.625224    1
77372    15.299570    0.331351    1
43673    1.889461    0.191283    1
61364    7.516754    1.269164    1
69673    14.239195    0.261333    1
15669    0.000000    1.250185    2
28488    10.528555    1.304844    3
6487    3.540265    0.822483    2
37708    2.991551    0.833920    1
22620    5.297865    0.638306    2
28782    6.593803    0.187108    3
19739    2.816760    1.686209    2
36788    12.458258    0.649617    3
5741    0.000000    1.656418    2
28567    9.968648    0.731232    3
6808    1.364838    0.640103    2
41611    0.230453    1.151996    1
36661    11.865402    0.882810    3
43605    0.120460    1.352013    1
15360    8.545204    1.340429    3
63796    5.856649    0.160006    1
10743    9.665618    0.778626    2
70808    9.778763    1.084103    1
72011    4.932976    0.632026    1
5914    2.216246    0.587095    2
14851    14.305636    0.632317    3
33553    12.591889    0.686581    3
44952    3.424649    1.004504    1
17934    0.000000    0.147573    2
27738    8.533823    0.205324    3
29290    9.829528    0.238620    3
42330    11.492186    0.263499    3
36429    3.570968    0.832254    1
39623    1.771228    0.207612    1
32404    3.513921    0.991854    1
27268    4.398172    0.975024    1
5477    4.276823    1.174874    2
14254    5.946014    1.614244    2
68613    13.798970    0.724375    1
41539    10.393591    1.663724    3
7917    3.007577    0.297302    2
21331    1.031938    0.486174    2
8338    4.751212    0.064693    2
5176    3.692269    1.655113    2
18983    10.448091    0.267652    3
68837    10.585786    0.329557    1
13438    1.604501    0.069064    2
48849    3.679497    0.961466    1
12285    3.795146    0.696694    2
7826    2.531885    1.659173    2
5565    9.733340    0.977746    2
10346    6.093067    1.413798    2
1823    7.712960    1.054927    2
9744    11.470364    0.760461    3
16857    2.886529    0.934416    2
39336    10.054373    1.138351    3
65230    9.972470    0.881876    1
2463    2.335785    1.366145    2
27353    11.375155    1.528626    3
16191    0.000000    0.605619    2
12258    4.126787    0.357501    2
42377    6.319522    1.058602    1
25607    8.680527    0.086955    3
77450    14.856391    1.129823    1
58732    2.454285    0.222380    1
46426    7.292202    0.548607    3
32688    8.745137    0.857348    3
64890    8.579001    0.683048    1
8554    2.507302    0.869177    2
28861    11.415476    1.505466    3
42050    4.838540    1.680892    1
32193    10.339507    0.583646    3
64895    6.573742    1.151433    1
2355    6.539397    0.462065    2
0    2.209159    0.723567    2
70406    11.196378    0.836326    1
57399    4.229595    0.128253    1
41732    9.505944    0.005273    3
11429    8.652725    1.348934    3
75270    17.101108    0.490712    1
5459    7.871839    0.717662    2
73520    8.262131    1.361646    1
40279    9.015635    1.658555    3
21540    9.215351    0.806762    3
17694    6.375007    0.033678    2
22329    2.262014    1.022169    1
46570    5.677110    0.709469    1
42403    11.293017    0.207976    3
33654    6.590043    1.353117    1
9171    4.711960    0.194167    2
28122    8.768099    1.108041    3
34095    11.502519    0.545097    3
1774    4.682812    0.578112    2
40131    12.446578    0.300754    3
13994    12.908384    1.657722    3
77064    12.601108    0.974527    1
11210    3.929456    0.025466    2
6122    9.751503    1.182050    3
15341    3.043767    0.888168    2
44373    4.391522    0.807100    1
28454    11.695276    0.679015    3
63771    7.879742    0.154263    1
9217    5.613163    0.933632    2
69076    9.140172    0.851300    1
24489    4.258644    0.206892    1
16871    6.799831    1.221171    2
39776    8.752758    0.484418    3
5901    1.123033    1.180352    2
40987    10.833248    1.585426    3
7479    3.051618    0.026781    2
38768    5.308409    0.030683    3
4933    1.841792    0.028099    2
32311    2.261978    1.605603    1
26501    11.573696    1.061347    3
37433    8.038764    1.083910    3
23503    10.734007    0.103715    3
68607    9.661909    0.350772    1
27742    9.005850    0.548737    3
11303    0.000000    0.539131    2
0    5.757140    1.062373    2
32729    9.164656    1.624565    3
24619    1.318340    1.436243    1
42414    14.075597    0.695934    3
20210    10.107550    1.308398    3
33225    7.960293    1.219760    3
54483    6.317292    0.018209    1
18475    12.664194    0.595653    3
33926    2.906644    0.581657    1
43865    2.388241    0.913938    1
26547    6.024471    0.486215    3
44404    7.226764    1.255329    3
16674    4.183997    1.275290    2
8123    11.850211    1.096981    3
42747    11.661797    1.167935    3
56054    3.574967    0.494666    1
10933    0.000000    0.107475    2
18121    7.937657    0.904799    3
11272    3.365027    1.014085    2
16297    0.000000    0.367491    2
28168    13.860672    1.293270    3
40963    10.306714    1.211594    3
31685    7.228002    0.670670    3
55164    4.508740    1.036192    1
17595    0.366328    0.163652    2
1862    3.299444    0.575152    2
57087    0.573287    0.607915    1
63082    9.183738    0.012280    1
51213    7.842646    1.060636    3
6487    4.750964    0.558240    2
4805    11.438702    1.556334    3
30302    8.243063    1.122768    3
68680    7.949017    0.271865    1
17591    7.875477    0.227085    2
74391    9.569087    0.364856    1
37217    7.750103    0.869094    3
42814    0.000000    1.515293    1
14738    3.396030    0.633977    2
19896    11.916091    0.025294    3
14673    0.460758    0.689586    2
32011    13.087566    0.476002    3
58736    4.589016    1.672600    1
54744    8.397217    1.534103    1
29482    5.562772    1.689388    1
27698    10.905159    0.619091    3
11443    1.311441    1.169887    2
56117    10.647170    0.980141    3
39514    0.000000    0.481918    1
26627    8.503025    0.830861    3
16525    0.436880    1.395314    2
24368    6.127867    1.102179    1
22160    12.112492    0.359680    3
6030    1.264968    1.141582    2
6468    6.067568    1.327047    2
22945    8.010964    1.681648    3
18520    3.791084    0.304072    2
34914    11.773195    1.262621    3
6121    8.339588    1.443357    2
38063    2.563092    1.464013    1
23410    5.954216    0.953782    1
35073    9.288374    0.767318    3
52914    3.976796    1.043109    1
16801    8.585227    1.455708    3
9533    1.271946    0.796506    2
16721    0.000000    0.242778    2
5832    0.000000    0.089749    2
44591    11.521298    0.300860    3
10143    1.139447    0.415373    2
21609    5.699090    1.391892    2
23817    2.449378    1.322560    1
15640    0.000000    1.228380    2
8847    3.168365    0.053993    2
50939    10.428610    1.126257    3
28521    2.943070    1.446816    1
32901    10.441348    0.975283    3
42850    12.478764    1.628726    3
13499    5.856902    0.363883    2
40345    2.476420    0.096075    1
43547    1.826637    0.811457    1
70758    4.324451    0.328235    1
19780    1.376085    1.178359    2
44484    5.342462    0.394527    1
54462    11.835521    0.693301    3
20085    12.423687    1.424264    3
42291    12.161273    0.071131    3
47550    8.148360    1.649194    3
11938    1.531067    1.549756    2
40699    3.200912    0.309679    1
70908    8.862691    0.530506    1
73989    6.370551    0.369350    1
11872    2.468841    0.145060    2
48463    11.054212    0.141508    3
15987    2.037080    0.715243    2
70036    13.364030    0.549972    1
32967    10.249135    0.192735    3
63249    10.464252    1.669767    1
42795    9.424574    0.013725    3
14459    4.458902    0.268444    2
19973    0.000000    0.575976    2
5494    9.686082    1.029808    3
67902    13.649402    1.052618    1
25621    13.181148    0.273014    3
27545    3.877472    0.401600    1
58656    1.413952    0.451380    1
7327    4.248986    1.430249    2
64555    8.779183    0.845947    1
8998    4.156252    0.097109    2
11752    5.580018    0.158401    2
76319    15.040440    1.366898    1
27665    12.793870    1.307323    3
67417    3.254877    0.669546    1
21808    10.725607    0.588588    3
15326    8.256473    0.765891    2
20057    8.033892    1.618562    3
79341    10.702532    0.204792    1
15636    5.062996    1.132555    2
35602    10.772286    0.668721    3
28544    1.892354    0.837028    1
57663    1.019966    0.372320    1
78727    15.546043    0.729742    1
68255    11.638205    0.409125    1
14964    3.427886    0.975616    2
21835    11.246174    1.475586    3
7487    0.000000    0.645045    2
8700    0.000000    1.424017    2
26226    8.242553    0.279069    3
65899    8.700060    0.101807    1
6543    0.812344    0.260334    2
46556    2.448235    1.176829    1
71038    13.230078    0.616147    1
47657    0.236133    0.340840    1
19600    11.155826    0.335131    3
37422    11.029636    0.505769    3
1363    2.901181    1.646633    2
26535    3.924594    1.143120    1
47707    2.524806    1.292848    1
38055    3.527474    1.449158    1
6286    3.384281    0.889268    2
10747    0.000000    1.107592    2
44883    11.898890    0.406441    3
56823    3.529892    1.375844    1
68086    11.442677    0.696919    1
70242    10.308145    0.422722    1
11409    8.540529    0.727373    2
67671    7.156949    1.691682    1
61238    0.720675    0.847574    1
17774    0.229405    1.038603    2
53376    3.399331    0.077501    1
30930    6.157239    0.580133    1
28987    1.239698    0.719989    1
13655    6.036854    0.016548    2
7227    5.258665    0.933722    2
40409    12.393001    1.571281    3
13605    9.627613    0.935842    2
26400    11.130453    0.597610    3
13491    8.842595    0.349768    3
30232    10.690010    1.456595    3
43253    5.714718    1.674780    3
55536    3.052505    1.335804    1
8807    0.000000    0.059025    2
25783    9.945307    1.287952    3
22812    2.719723    1.142148    1
77826    11.154055    1.608486    1
38172    2.687918    0.660836    1
31676    10.037847    0.962245    3
74038    12.404762    1.112080    1
44738    10.237305    0.633422    3
17410    4.745392    0.662520    2
5688    4.639461    1.569431    2
36642    3.149310    0.639669    1
29956    13.406875    1.639194    3
60350    6.068668    0.881241    1
23758    9.477022    0.899002    3
25780    3.897620    0.560201    2
11342    5.463615    1.203677    2
36109    3.369267    1.575043    1
14292    5.234562    0.825954    2
11160    0.000000    0.722170    2
23762    12.979069    0.504068    3
39567    5.376564    0.557476    1
25647    13.527910    1.586732    3
14814    2.196889    0.784587    2
73590    10.691748    0.007509    1
35187    1.659242    0.447066    1
49459    8.369667    0.656697    3
31657    13.157197    0.143248    3
6259    8.199667    0.908508    2
33101    4.441669    0.439381    3
27107    9.846492    0.644523    3
17824    0.019540    0.977949    2
43536    8.253774    0.748700    3
67705    6.038620    1.509646    1
35283    6.091587    1.694641    3
71308    8.986820    1.225165    1
31054    11.508473    1.624296    3
52387    8.807734    0.713922    3
40328    0.000000    0.816676    1
34844    8.889202    1.665414    3
11607    3.178117    0.542752    2
64306    7.013795    0.139909    1
32721    9.605014    0.065254    3
33170    1.230540    1.331674    1
37192    10.412811    0.890803    3
13089    0.000000    0.567161    2
66491    9.699991    0.122011    1
15941    0.000000    0.061191    2
4272    4.455293    0.272135    2
48812    3.020977    1.502803    1
28818    8.099278    0.216317    3
35394    1.157764    1.603217    1
71791    10.105396    0.121067    1
40668    11.230148    0.408603    3
39580    9.070058    0.011379    3
11786    0.566460    0.478837    2
19251    0.000000    0.487300    2
56594    8.956369    1.193484    3
54495    1.523057    0.620528    1
11844    2.749006    0.169855    2
45465    9.235393    0.188350    3
31033    10.555573    0.403927    3
16633    6.956372    1.519308    2
13887    0.636281    1.273984    2
52603    3.574737    0.075163    1
72000    9.032486    1.461809    1
68497    5.958993    0.023012    1
35135    2.435300    1.211744    1
26397    10.539731    1.638248    3
7313    7.646702    0.056513    2
91273    20.919349    0.644571    1
24743    1.424726    0.838447    1
31690    6.748663    0.890223    3
15432    2.289167    0.114881    2
58394    5.548377    0.402238    1
33962    6.057227    0.432666    1
31442    10.828595    0.559955    3
31044    11.318160    0.271094    3
29938    13.265311    0.633903    3
9875    0.000000    1.496715    2
51542    6.517133    0.402519    3
11878    4.934374    1.520028    2
69241    10.151738    0.896433    1
37776    2.425781    1.559467    1
68997    9.778962    1.195498    1
67416    12.219950    0.657677    1
59225    7.394151    0.954434    1
29138    8.518535    0.742546    3
5962    2.798700    0.662632    2
10847    0.637930    0.617373    2
70527    10.750490    0.097415    1
9610    0.625382    0.140969    2
64734    10.027968    0.282787    1
25941    9.817347    0.364197    3
2763    0.646828    1.266069    2
55601    3.347111    0.914294    1
31128    11.816892    0.193798    3
5181    0.000000    1.480198    2
69982    10.945666    0.993219    1
52440    10.244706    0.280539    3
57350    2.579801    1.149172    1
57869    2.630410    0.098869    1
56557    11.746200    1.695517    3
42342    8.104232    1.326277    3
15560    12.409743    0.790295    3
34826    12.167844    1.328086    3
8569    3.198408    0.299287    2
77623    16.055513    0.541052    1
78184    7.138659    0.158481    1
7036    4.831041    0.761419    2
69616    10.082890    1.373611    1
21546    10.066867    0.788470    3
36715    8.129538    0.329913    3
20522    3.012463    1.138108    2
42349    3.720391    0.845974    1
9037    0.773493    1.148256    2
26728    10.962941    1.037324    3
587    0.177621    0.162614    2
48915    3.085853    0.967899    1
9824    8.426781    0.202558    2
4135    1.825927    1.128347    2
9666    2.185155    1.010173    2
59333    7.184595    1.261338    1
36198    0.000000    0.116525    1
34909    8.901752    1.033527    3
47516    2.451497    1.358795    1
55807    3.213631    0.432044    1
14036    3.974739    0.723929    2
42856    9.601306    0.619232    3
64007    8.363897    0.445341    1
59428    6.381484    1.365019    1
13730    0.000000    1.403914    2
41740    9.609836    1.438105    3
63546    9.904741    0.985862    1
30417    7.185807    1.489102    3
69636    5.466703    1.216571    1
64660    0.000000    0.915898    1
14883    4.575443    0.535671    2
7965    3.277076    1.010868    2
68620    10.246623    1.239634    1
8738    2.341735    1.060235    2
7544    3.201046    0.498843    2
6377    6.066013    0.120927    2
36842    8.829379    0.895657    3
81046    15.833048    1.568245    1
67736    13.516711    1.220153    1
32492    0.664284    1.116755    1
39299    6.325139    0.605109    3
77289    8.677499    0.344373    1
33835    8.188005    0.964896    3
71890    9.414263    0.384030    1
32054    9.196547    1.138253    3
38579    10.202968    0.452363    3
55984    2.119439    1.481661    1
72694    13.635078    0.858314    1
42299    0.083443    0.701669    1
26635    9.149096    1.051446    3
8579    1.933803    1.374388    2
37302    14.115544    0.676198    3
22878    8.933736    0.943352    3
4364    2.661254    0.946117    2
4985    0.988432    1.305027    2
37068    2.063741    1.125946    1
41137    2.220590    0.690754    1
67759    6.424849    0.806641    1
11831    1.156153    1.613674    2
34502    3.032720    0.601847    1
4088    3.076828    0.952089    2
15199    0.000000    0.318105    2
17309    7.750480    0.554015    3
42816    10.958135    1.482500    3
43751    10.222018    0.488678    3
58335    2.367988    0.435741    1
75039    7.686054    1.381455    1
42878    11.464879    1.481589    3
42770    11.075735    0.089726    3
8848    3.543989    0.345853    2
31340    8.123889    1.282880    3
41413    4.331769    0.754467    3
12731    0.120865    1.211961    2
22447    6.116109    0.701523    3
33564    7.474534    0.505790    3
48907    8.819454    0.649292    3
8762    6.802144    0.615284    2
46696    12.666325    0.931960    3
36851    8.636180    0.399333    3
67639    11.730991    1.289833    1
171    8.132449    0.039062    2
26674    10.296589    1.496144    3
8739    7.583906    1.005764    2
66668    9.777806    0.496377    1
68732    8.833546    0.513876    1
69995    4.907899    1.518036    1
82008    8.362736    1.285939    1
25054    9.084726    1.606312    3
33085    14.164141    0.560970    3
41379    9.080683    0.989920    3
39417    6.522767    0.038548    3
12556    3.690342    0.462281    2
39432    3.563706    0.242019    1
38010    1.065870    1.141569    1
69306    6.683796    1.456317    1
38000    1.712874    0.243945    1
46321    13.109929    1.280111    3
66293    11.327910    0.780977    1
22730    4.545711    1.233254    1
5952    3.367889    0.468104    2
72308    8.326224    0.567347    1
60338    8.978339    1.442034    1
13301    5.655826    1.582159    2
27884    8.855312    0.570684    3
11188    6.649568    0.544233    2
56796    3.966325    0.850410    1
8571    1.924045    1.664782    2
4914    6.004812    0.280369    2
10784    0.000000    0.375849    2
39296    9.923018    0.092192    3
13113    2.389084    0.119284    2
70204    13.663189    0.133251    1
46813    11.434976    0.321216    3
11697    0.358270    1.292858    2
44183    9.598873    0.223524    3
2225    6.375275    0.608040    2
29066    11.580532    0.458401    3
4245    5.319324    1.598070    2
34379    4.324031    1.603481    1
44441    2.358370    1.273204    1
2022    0.000000    1.182708    2
26866    12.824376    0.890411    3
57070    1.587247    1.456982    1
32932    8.510324    1.520683    3
51967    10.428884    1.187734    3
44432    8.346618    0.042318    3
67066    7.541444    0.809226    1
17262    2.540946    1.583286    2
79728    9.473047    0.692513    1
14259    0.352284    0.474080    2
6122    0.000000    0.589826    2
76879    12.405171    0.567201    1
11426    4.126775    0.871452    2
2493    0.034087    0.335848    2
19910    1.177634    0.075106    2
10939    0.000000    0.479996    2
17716    0.994909    0.611135    2
31390    11.053664    1.180117    3
20375    0.000000    1.679729    2
26309    2.495011    1.459589    1
33484    11.516831    0.001156    3
45944    9.213215    0.797743    3
4249    5.332865    0.109288    2
6089    0.000000    1.689771    2
7513    0.000000    1.126053    2
27862    12.640062    1.690903    3
39038    2.693142    1.317518    1
19218    3.328969    0.268271    2
62911    7.193166    1.117456    1
77758    6.615512    1.521012    1
27940    8.000567    0.835341    3
2194    4.017541    0.512104    2
37072    13.245859    0.927465    3
15585    5.970616    0.813624    2
25577    11.668719    0.886902    3
8777    4.283237    1.272728    2
29016    10.742963    0.971401    3
21910    12.326672    1.592608    3
12916    0.000000    0.344622    2
10976    0.000000    0.922846    2
79065    10.602095    0.573686    1
36759    10.861859    1.155054    3
50011    1.229094    1.638690    1
1155    0.410392    1.313401    2
71600    14.552711    0.616162    1
30817    14.178043    0.616313    3
54559    14.136260    0.362388    1
29764    0.093534    1.207194    1
69100    10.929021    0.403110    1
47324    11.432919    0.825959    3
73199    9.134527    0.586846    1
44461    5.071432    1.421420    1
45617    11.460254    1.541749    3
28221    11.620039    1.103553    3
7091    4.022079    0.207307    2
6110    3.057842    1.631262    2
79016    7.782169    0.404385    1
18289    7.981741    0.929789    3
43679    4.601363    0.268326    1
22075    2.595564    1.115375    1
23535    10.049077    0.391045    3
25301    3.265444    1.572970    2
32256    11.780282    1.511014    3
36951    3.075975    0.286284    1
31290    1.795307    0.194343    1
38953    11.106979    0.202415    3
35257    5.994413    0.800021    1
25847    9.706062    1.012182    3
32680    10.582992    0.836025    3
62018    7.038266    1.458979    1
9074    0.023771    0.015314    2
33004    12.823982    0.676371    3
44588    3.617770    0.493483    1
32565    8.346684    0.253317    3
38563    6.104317    0.099207    1
75668    16.207776    0.584973    1
9069    6.401969    1.691873    2
53395    2.298696    0.559757    1
28631    7.661515    0.055981    3
71036    6.353608    1.645301    1
71142    10.442780    0.335870    1
37653    3.834509    1.346121    1
76839    10.998587    0.584555    1
9916    2.695935    1.512111    2
38889    3.356646    0.324230    1
39075    14.677836    0.793183    3
48071    1.551934    0.130902    1
7275    2.464739    0.223502    2
41804    1.533216    1.007481    1
35665    12.473921    0.162910    3
67956    6.491596    0.032576    1
41892    10.506276    1.510747    3
38844    4.380388    0.748506    1
74197    13.670988    1.687944    1
14201    8.317599    0.390409    2
3908    0.000000    0.556245    2
2459    0.000000    0.290218    2
32027    10.095799    1.188148    3
12870    0.860695    1.482632    2
9880    1.557564    0.711278    2
72784    10.072779    0.756030    1
17521    0.000000    0.431468    2
50283    7.140817    0.883813    3
33536    11.384548    1.438307    3
9452    3.214568    1.083536    2
37457    11.720655    0.301636    3
17724    6.374475    1.475925    3
43869    5.749684    0.198875    3
264    3.871808    0.552602    2
25736    8.336309    0.636238    3
39584    9.710442    1.503735    3
31246    1.532611    1.433898    1
49567    9.785785    0.984614    3
7052    2.633627    1.097866    2
35493    9.238935    0.494701    3
10986    1.205656    1.398803    2
49508    3.124909    1.670121    1
5734    7.935489    1.585044    2
65479    12.746636    1.560352    1
77268    10.732563    0.545321    1
28490    3.977403    0.766103    1
13546    4.194426    0.450663    2
37166    9.610286    0.142912    3
16381    4.797555    1.260455    2
10848    1.615279    0.093002    2
35405    4.614771    1.027105    1
15917    0.000000    1.369726    2
6131    0.608457    0.512220    2
67432    6.558239    0.667579    1
30354    12.315116    0.197068    3
69696    7.014973    1.494616    1
33481    8.822304    1.194177    3
43075    10.086796    0.570455    3
38343    7.241614    1.661627    3
14318    4.602395    1.511768    2
5367    7.434921    0.079792    2
37894    10.467570    1.595418    3
36172    9.948127    0.003663    3
40123    2.478529    1.568987    1
10976    5.938545    0.878540    2
12705    0.000000    0.948004    2
12495    5.559181    1.357926    2
35681    9.776654    0.535966    3
46202    3.092056    0.490906    1
11505    0.000000    1.623311    2
22834    4.459495    0.538867    1
49901    8.334306    1.646600    3
71932    11.226654    0.384686    1
13279    3.904737    1.597294    2
49112    7.038205    1.211329    3
77129    9.836120    1.054340    1
37447    1.990976    0.378081    1
62397    9.005302    0.485385    1
0    1.772510    1.039873    2
15476    0.458674    0.819560    2
40625    10.003919    0.231658    3
36706    0.520807    1.476008    1
28580    10.678214    1.431837    3
25862    4.425992    1.363842    1
63488    12.035355    0.831222    1
33944    10.606732    1.253858    3
30099    1.568653    0.684264    1
13725    2.545434    0.024271    2
36768    10.264062    0.982593    3
64656    9.866276    0.685218    1
14927    0.142704    0.057455    2
43231    9.853270    1.521432    3
66087    6.596604    1.653574    1
19806    2.602287    1.321481    2
41081    10.411776    0.664168    3
10277    7.083449    0.622589    2
7014    2.080068    1.254441    2
17275    0.522844    1.622458    2
31600    10.362000    1.544827    3
59956    3.412967    1.035410    1
42181    6.796548    1.112153    3
51743    4.092035    0.075804    1
5194    2.763811    1.564325    2
30832    12.547439    1.402443    3
7976    5.708052    1.596152    2
14602    4.558025    0.375806    2
41571    11.642307    0.438553    3
55028    3.222443    0.121399    1
5837    4.736156    0.029871    2
39808    10.839526    0.836323    3
20944    4.194791    0.235483    2
22146    14.936259    0.888582    3
42169    3.310699    1.521855    1
7010    2.971931    0.034321    2
3807    9.261667    0.537807    2
29241    7.791833    1.111416    3
52696    1.480470    1.028750    1
42545    3.677287    0.244167    1
24437    2.202967    1.370399    1
16037    5.796735    0.935893    2
8493    3.063333    0.144089    2
68080    11.233094    0.492487    1
59016    1.965570    0.005697    1
11810    8.616719    0.137419    2
68630    6.609989    1.083505    1
7629    1.712639    1.086297    2
71992    10.117445    1.299319    1
13398    0.000000    1.104178    2
26241    9.824777    1.346821    3
11160    1.653089    0.980949    2
76701    18.178822    1.473671    1
32174    6.781126    0.885340    3
45043    8.206750    1.549223    3
42173    10.081853    1.376745    3
69801    6.288742    0.112799    1
41737    3.695937    1.543589    1
46979    6.726151    1.069380    3
79267    12.969999    1.568223    1
4615    2.661390    1.531933    2
32907    7.072764    1.117386    3
37444    9.123366    1.318988    3
569    3.743946    1.039546    2
8723    2.341300    0.219361    2
6024    0.541913    0.592348    2
52252    2.310828    1.436753    1
8358    6.226597    1.427316    2
26166    7.277876    0.489252    3
18471    0.000000    0.389459    2
3386    7.218221    1.098828    2
41544    8.777129    1.111464    3
10480    2.813428    0.819419    2
5894    2.268766    1.412130    2
7273    6.283627    0.571292    2
22272    7.520081    1.626868    3
31369    11.739225    0.027138    3
10708    3.746883    0.877350    2
69364    12.089835    0.521631    1
37760    12.310404    0.259339    3
13004    0.000000    0.671355    2
37885    2.728800    0.331502    1
52555    10.814342    0.607652    3
38997    12.170268    0.844205    3
69698    6.698371    0.240084    1
11783    3.632672    1.643479    2
47636    10.059991    0.892361    3
15744    1.887674    0.756162    2
69058    8.229125    0.195886    1
33057    7.817082    0.476102    3
28681    12.277230    0.076805    3
34042    10.055337    1.115778    3
29928    3.596002    1.485952    1
9734    2.755530    1.420655    2
7344    7.780991    0.513048    2
7387    0.093705    0.391834    2
33957    8.481567    0.520078    3
9936    3.865584    0.110062    2
36094    9.683709    0.779984    3
39835    10.617255    1.359970    3
64486    7.203216    1.624762    1
0    7.601414    1.215605    2
39539    1.386107    1.417070    1
66972    9.129253    0.594089    1
15029    1.363447    0.620841    2
44909    3.181399    0.359329    1
38183    13.365414    0.217011    3
37372    4.207717    1.289767    1
0    4.088395    0.870075    2
17786    3.327371    1.142505    2
39055    1.303323    1.235650    1
37045    7.999279    1.581763    3
6435    2.217488    0.864536    2
72265    7.751808    0.192451    1
28152    14.149305    1.591532    3
25931    8.765721    0.152808    3
7538    3.408996    0.184896    2
1315    1.251021    0.112340    2
12292    6.160619    1.537165    2
49248    1.034538    1.585162    1
9025    0.000000    1.034635    2
13438    2.355051    0.542603    2
69683    6.614543    0.153771    1
25374    10.245062    1.450903    3
55264    3.467074    1.231019    1
38324    7.487678    1.572293    3
69643    4.624115    1.185192    1
44058    8.995957    1.436479    3
41316    11.564476    0.007195    3
29119    3.440948    0.078331    1
51656    1.673603    0.732746    1
3030    4.719341    0.699755    2
35695    10.304798    1.576488    3
1537    2.086915    1.199312    2
9083    6.338220    1.131305    2
47744    8.254926    0.710694    3
71372    16.067108    0.974142    1
37980    1.723201    0.310488    1
42385    3.785045    0.876904    1
22687    2.557561    0.123738    1
39512    9.852220    1.095171    3
11885    3.679147    1.557205    2
4944    9.789681    0.852971    2
73230    14.958998    0.526707    1
17585    11.182148    1.288459    3
68737    7.528533    1.657487    1
13818    5.253802    1.378603    2
31662    13.946752    1.426657    3
86686    15.557263    1.430029    1
43214    12.483550    0.688513    3
24091    2.317302    1.411137    1
52544    10.069724    0.766119    3
61861    5.792231    1.615483    1
 824 47903    4.138435    0.475994    1
 825 37190    12.929517    0.304378    3
 826 6013    9.378238    0.307392    2
 827 27223    8.361362    1.643204    3
 828 69027    7.939406    1.325042    1
 829 78642    10.735384    0.705788    1
 830 30254    11.592723    0.286188    3
 831 21704    10.098356    0.704748    3
 832 34985    9.299025    0.545337    3
 833 31316    11.158297    0.218067    3
 834 76368    16.143900    0.558388    1
 835 27953    10.971700    1.221787    3
 836 152    0.000000    0.681478    2
 837 9146    3.178961    1.292692    2
 838 75346    17.625350    0.339926    1
 839 26376    1.995833    0.267826    1
 840 35255    10.640467    0.416181    3
 841 19198    9.628339    0.985462    3
 842 12518    4.662664    0.495403    2
 843 25453    5.754047    1.382742    2
 844 12530    0.000000    0.037146    2
 845 62230    9.334332    0.198118    1
 846 9517    3.846162    0.619968    2
 847 71161    10.685084    0.678179    1
 848 1593    4.752134    0.359205    2
 849 33794    0.697630    0.966786    1
 850 39710    10.365836    0.505898    3
 851 16941    0.461478    0.352865    2
 852 69209    11.339537    1.068740    1
 853 4446    5.420280    0.127310    2
 854 9347    3.469955    1.619947    2
 855 55635    8.517067    0.994858    3
 856 65889    8.306512    0.413690    1
 857 10753    2.628690    0.444320    2
 858 7055    0.000000    0.802985    2
 859 7905    0.000000    1.170397    2
 860 53447    7.298767    1.582346    3
 861 9194    7.331319    1.277988    2
 862 61914    9.392269    0.151617    1
 863 15630    5.541201    1.180596    2
 864 79194    15.149460    0.537540    1
 865 12268    5.515189    0.250562    2
 866 33682    7.728898    0.920494    3
 867 26080    11.318785    1.510979    3
 868 19119    3.574709    1.531514    2
 869 30902    7.350965    0.026332    3
 870 63039    7.122363    1.630177    1
 871 51136    1.828412    1.013702    1
 872 35262    10.117989    1.156862    3
 873 42776    11.309897    0.086291    3
 874 64191    8.342034    1.388569    1
 875 15436    0.241714    0.715577    2
 876 14402    10.482619    1.694972    2
 877 6341    9.289510    1.428879    2
 878 14113    4.269419    0.134181    2
 879 6390    0.000000    0.189456    2
 880 8794    0.817119    0.143668    2
 881 43432    1.508394    0.652651    1
 882 38334    9.359918    0.052262    3
 883 34068    10.052333    0.550423    3
 884 30819    11.111660    0.989159    3
 885 22239    11.265971    0.724054    3
 886 28725    10.383830    0.254836    3
 887 57071    3.878569    1.377983    1
 888 72420    13.679237    0.025346    1
 889 28294    10.526846    0.781569    3
 890 9896    0.000000    0.924198    2
 891 65821    4.106727    1.085669    1
 892 7645    8.118856    1.470686    2
 893 71289    7.796874    0.052336    1
 894 5128    2.789669    1.093070    2
 895 13711    6.226962    0.287251    2
 896 22240    10.169548    1.660104    3
 897 15092    0.000000    1.370549    2
 898 5017    7.513353    0.137348    2
 899 10141    8.240793    0.099735    2
 900 35570    14.612797    1.247390    3
 901 46893    3.562976    0.445386    1
 902 8178    3.230482    1.331698    2
 903 55783    3.612548    1.551911    1
 904 1148    0.000000    0.332365    2
 905 10062    3.931299    0.487577    2
 906 74124    14.752342    1.155160    1
 907 66603    10.261887    1.628085    1
 908 11893    2.787266    1.570402    2
 909 50908    15.112319    1.324132    3
 910 39891    5.184553    0.223382    3
 911 65915    3.868359    0.128078    1
 912 65678    3.507965    0.028904    1
 913 62996    11.019254    0.427554    1
 914 36851    3.812387    0.655245    1
 915 36669    11.056784    0.378725    3
 916 38876    8.826880    1.002328    3
 917 26878    11.173861    1.478244    3
 918 46246    11.506465    0.421993    3
 919 12761    7.798138    0.147917    3
 920 35282    10.155081    1.370039    3
 921 68306    10.645275    0.693453    1
 922 31262    9.663200    1.521541    3
 923 34754    10.790404    1.312679    3
 924 13408    2.810534    0.219962    2
 925 30365    9.825999    1.388500    3
 926 10709    1.421316    0.677603    2
 927 24332    11.123219    0.809107    3
 928 45517    13.402206    0.661524    3
 929 6178    1.212255    0.836807    2
 930 10639    1.568446    1.297469    2
 931 29613    3.343473    1.312266    1
 932 22392    5.400155    0.193494    1
 933 51126    3.818754    0.590905    1
 934 53644    7.973845    0.307364    3
 935 51417    9.078824    0.734876    3
 936 24859    0.153467    0.766619    1
 937 61732    8.325167    0.028479    1
 938 71128    7.092089    1.216733    1
 939 27276    5.192485    1.094409    3
 940 30453    10.340791    1.087721    3
 941 18670    2.077169    1.019775    2
 942 70600    10.151966    0.993105    1
 943 12683    0.046826    0.809614    2
 944 81597    11.221874    1.395015    1
 945 69959    14.497963    1.019254    1
 946 8124    3.554508    0.533462    2
 947 18867    3.522673    0.086725    2
 948 80886    14.531655    0.380172    1
 949 55895    3.027528    0.885457    1
 950 31587    1.845967    0.488985    1
 951 10591    10.226164    0.804403    3
 952 70096    10.965926    1.212328    1
 953 53151    2.129921    1.477378    1
 954 11992    0.000000    1.606849    2
 955 33114    9.489005    0.827814    3
 956 7413    0.000000    1.020797    2
 957 10583    0.000000    1.270167    2
 958 58668    6.556676    0.055183    1
 959 35018    9.959588    0.060020    3
 960 70843    7.436056    1.479856    1
 961 14011    0.404888    0.459517    2
 962 35015    9.952942    1.650279    3
 963 70839    15.600252    0.021935    1
 964 3024    2.723846    0.387455    2
 965 5526    0.513866    1.323448    2
 966 5113    0.000000    0.861859    2
 967 20851    7.280602    1.438470    2
 968 40999    9.161978    1.110180    3
 969 15823    0.991725    0.730979    2
 970 35432    7.398380    0.684218    3
 971 53711    12.149747    1.389088    3
 972 64371    9.149678    0.874905    1
 973 9289    9.666576    1.370330    2
 974 60613    3.620110    0.287767    1
 975 18338    5.238800    1.253646    2
 976 22845    14.715782    1.503758    3
 977 74676    14.445740    1.211160    1
 978 34143    13.609528    0.364240    3
 979 14153    3.141585    0.424280    2
 980 9327    0.000000    0.120947    2
 981 18991    0.454750    1.033280    2
 982 9193    0.510310    0.016395    2
 983 2285    3.864171    0.616349    2
 984 9493    6.724021    0.563044    2
 985 2371    4.289375    0.012563    2
 986 13963    0.000000    1.437030    2
 987 2299    3.733617    0.698269    2
 988 5262    2.002589    1.380184    2
 989 4659    2.502627    0.184223    2
 990 17582    6.382129    0.876581    2
 991 27750    8.546741    0.128706    3
 992 9868    2.694977    0.432818    2
 993 18333    3.951256    0.333300    2
 994 3780    9.856183    0.329181    2
 995 18190    2.068962    0.429927    2
 996 11145    3.410627    0.631838    2
 997 68846    9.974715    0.669787    1
 998 26575    10.650102    0.866627    3
 999 48111    9.134528    0.728045    3
1000 43757    7.882601    1.332446    3
Input dataset
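loadDataSet() below splits each line on '\t', so the file is assumed to be tab-separated even though the listing above renders the columns with spaces. Under that assumption, NumPy can parse the same layout in a single call. This is a minimal sketch, not the author's loader: load_dataset() is a hypothetical helper, and the two sample lines are rows 723 and 724 of the listing, rewritten with tabs.

```python
import io
import numpy as np

def load_dataset(source):
    """Read a tab-separated file (path or file object) whose last column is the label.

    Returns (features, labels) as NumPy arrays. Hypothetical helper, not myKNN.loadDataSet.
    """
    # ndmin=2 keeps a 2-D result even if the file has a single line
    data = np.loadtxt(source, delimiter="\t", ndmin=2)
    return data[:, :-1], data[:, -1]

# Rows 723 and 724 of the listing above, with tabs as the assumed separator
sample = io.StringIO("76701\t18.178822\t1.473671\t1\n"
                     "32174\t6.781126\t0.885340\t3\n")
features, labels = load_dataset(sample)
print(features.shape)    # (2, 3)
print(labels.tolist())   # [1.0, 3.0]
```

Unlike loadDataSet(), this returns plain ndarrays rather than an np.matrix plus a Python list.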

2.2 KNN algorithm implementation

 myKNN.py

# -*- coding: utf-8 -*-
"""
Created on Mon Sep 17 15:58:58 2018
KNN (K-Nearest Neighbor) algorithm
@author: weixw
"""

import numpy as np
import operator

# Input: one row of test data, the training dataset, the training labels, and k (number of nearest neighbours)
# Function: using the Euclidean distance, find the k training points closest to the unlabelled test row,
#           and use the most frequent class among those k points as the predicted class of the test row.
#           Euclidean distance: subtract element-wise, square, sum, then take the square root
# Output: predicted class of the test row
def classify(testDataSet, trainingDataSet, labelList, k):
    # Number of rows in the training dataset
    trainingDataSetSize = trainingDataSet.shape[0]
    # np.tile(testDataSet, (trainingDataSetSize, 1)) repeats the test row trainingDataSetSize times along
    # the rows (axis 0) and once along the columns, matching the training set's shape (500*3 when ratio = 0.5)
    # Euclidean distance implementation
    # 1 test data - training data
    diffMat = np.mat(np.tile(testDataSet, (trainingDataSetSize, 1)) - trainingDataSet)
    # 2 square the differences (.A converts the matrix to an array: on a matrix, ** means matrix power,
    #   which raises an error for a non-square matrix)
    sqDiffMat = diffMat.A**2
    # 3 sum each row: axis = 1 sums per row, axis = 0 sums per column, no axis sums every element
    sqDistances = sqDiffMat.sum(axis = 1)
    # 4 square root
    distances = sqDistances**0.5
    # argsort(): indices that sort the distances in ascending order (nearest neighbour first)
    sortedDistIndicies = distances.argsort()
    # Vote count per predicted class
    predictClassCount = {}
    # Majority vote among the k nearest neighbours
    for i in range(k):
        # Label of the i-th nearest training row
        voteLabel = labelList[sortedDistIndicies[i]]
        # Count how many of the k neighbours carry each label
        predictClassCount[voteLabel] = predictClassCount.get(voteLabel, 0) + 1
    # Sort the (label, count) pairs by count, largest first
    # Python 3 signature: sorted(iterable, *, key=None, reverse=False)
    # operator.itemgetter(1) picks element 1 of each pair, i.e. the count
    sortedPredictClassCount = sorted(predictClassCount.items(), key = operator.itemgetter(1), reverse = True)
    return sortedPredictClassCount[0][0]


# Input: data file
# Function: load the file and separate the features from the labels (the last column holds the labels);
#           the number of feature columns is detected automatically
# Output: feature matrix, label list
def loadDataSet(fileName):
    dataSet = []; labelSet = []
    # Read the whole file once and close it
    with open(fileName) as fr:
        lines = fr.readlines()
    # Number of feature columns (every column except the last)
    numberFeat = len(lines[0].split('\t')) - 1
    for line in lines:
        # Skip blank lines (e.g. a trailing newline at the end of the file)
        if not line.strip():
            continue
        lineArr = []
        # Strip leading/trailing whitespace, then split the columns
        curLine = line.strip().split('\t')
        # Keep each feature column
        for i in range(numberFeat):
            lineArr.append(float(curLine[i]))
        dataSet.append(lineArr)
        labelSet.append(float(curLine[-1]))
    return np.mat(dataSet), labelSet

# Input: raw feature dataset
# Function: min-max normalization, so every feature column varies within the same range [0, 1]
#           formula: newValue = (oldValue - min)/(max - min)
# Output: normalized feature dataset, column ranges (the denominators), column minima
def autoNorm(dataMat):
    # min(axis): no argument -> minimum of all values; axis = 0 -> per-column minimum; axis = 1 -> per-row minimum
    # Per-column minimum
    minValsMat = dataMat.min(0)
    # Per-column maximum
    maxValsMat = dataMat.max(0)
    # Range of each column (element-wise difference)
    rangesMat = maxValsMat - minValsMat
    # Number of rows in the raw dataset
    m = dataMat.shape[0]
    # Numerator of the formula
    # np.tile(minValsMat, (m, 1)) repeats the row of minima m times along axis 0, giving shape 1000*3
    normDataMat = dataMat - np.tile(minValsMat, (m, 1))
    # Divide by the ranges (element-wise) to get the normalized result
    normDataMat = normDataMat/np.tile(rangesMat, (m, 1))
    return normDataMat, rangesMat, minValsMat

# Input: feature matrix, label list, fraction of rows used as test data, k (number of nearest neighbours)
# Function: predict the class of every test row and report the error rate
# 1. normalize the features
# 2. use ratio to split off the test rows (the first ratio*m rows); the remaining rows are training data
# 3. for each test row, apply the Euclidean distance and majority vote to predict its class
# 4. count the wrong predictions over the whole test set
# Output: prints each prediction, the total error count and the error rate
def dataClassify(dataMat, labelList, ratio, k):
    # Normalize the feature dataset
    normDataMat, rangesMat, minValsMat = autoNorm(dataMat)
    # Number of rows of the normalized dataset
    m = normDataMat.shape[0]
    # Number of test rows (the other m - testDataNum rows are training rows)
    testDataNum = int(m*ratio)
    # Count of wrong predictions
    errorCount = 0.0
    for i in range(testDataNum):
        # Predict test row i, using rows testDataNum..m-1 as the training data
        classifierResult = classify(normDataMat[i, :], normDataMat[testDataNum:m, :], labelList[testDataNum:m], k)
        print("the classifier result is: %d, the real answer is: %d" % (classifierResult, labelList[i]))
        # Count wrong predictions
        if(classifierResult != labelList[i]):
            errorCount += 1.0
    print("the total error count is %d" % errorCount)
    print("the total error rate is: %f" % (errorCount/float(testDataNum)))


# Draw a scatter plot of the data
def drawScatter(filename):
    import matplotlib.pyplot as plt
    # Load the file and separate the features from the labels
    dataMat, labelList = loadDataSet(filename)
    # Convert the matrix to an array
    dataArr = dataMat.A
    # Create a figure
    plt.figure()
    # Row indices for each label (the label column has 3 distinct values)
    label_idx1 = []; label_idx2 = []; label_idx3 = []
    # Iterate over the labels with their indices
    for index, value in enumerate(labelList):
        if(value == 1):
            label_idx1.append(index)
        elif(value == 2):
            label_idx2.append(index)
        else:
            label_idx3.append(index)
    # scatter(x, y, s, marker, color, label)
    # x, y: array-like coordinates (feature columns 1 and 2 here); s: marker size; marker: marker shape
    plt.scatter(dataArr[label_idx1, 1], dataArr[label_idx1, 2], marker = 'x', color = 'm', label = 'no like', s = 30)
    plt.scatter(dataArr[label_idx2, 1], dataArr[label_idx2, 2], marker = '+', color = 'c', label = 'like', s = 50)
    plt.scatter(dataArr[label_idx3, 1], dataArr[label_idx3, 2], marker = 'o', color = 'r', label = 'very like', s = 15)
    plt.legend(loc = 'upper right')
    # Display the figure (when run as a plain script, this blocks until the window is closed)
    plt.show()
KNN algorithm implementation
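Without normalization, the mileage column (tens of thousands) swamps the other two features (below 20 and below 2) in the Euclidean distance, which is why autoNorm() runs before classify(). The sketch below makes that concrete with rows 723-725 of the listing, and shows a more compact variant of the two functions on plain ndarrays: NumPy broadcasting replaces np.tile and the matrix/array conversions, and collections.Counter replaces the dict-plus-sorted() vote. auto_norm() and classify() here are hypothetical rewrites for illustration, not the author's code, and they assume no feature column is constant (a zero range would divide by zero).

```python
import numpy as np
from collections import Counter

def auto_norm(data):
    """Min-max scale each column to [0, 1]; returns (normalized, ranges, mins)."""
    mins = data.min(axis=0)
    ranges = data.max(axis=0) - mins
    # Broadcasting applies the (d,) mins/ranges to every row, so np.tile is not needed
    return (data - mins) / ranges, ranges, mins

def classify(x, train, labels, k):
    """Majority vote among the k training rows nearest to x (Euclidean distance)."""
    # (n, d) - (d,) broadcasts to (n, d); summing over axis 1 gives one distance per row
    distances = np.sqrt(((train - x) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    # most_common(1) returns [(label, count)] for the label with the most votes
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Rows 723-725 of datingTestSet2.txt (feature columns only)
raw = np.array([[76701, 18.178822, 1.473671],
                [32174, 6.781126, 0.885340],
                [45043, 8.206750, 1.549223]])
# The raw distance between the first two rows is almost exactly the mileage gap 76701 - 32174 = 44527
print(np.linalg.norm(raw[0] - raw[1]))
norm, ranges, mins = auto_norm(raw)
print(norm.min(axis=0), norm.max(axis=0))   # every column now spans 0 to 1

# Tiny hypothetical 2-D training set to exercise classify()
train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = [1, 1, 2, 2]
print(classify(np.array([0.2, 0.3]), train, labels, k=3))   # 1
```

Counter.most_common() orders equal counts by first appearance, and the votes are counted nearest-first, so ties are resolved the same way as in the original dict-and-sorted() version.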

2.3 Test code

# -*- coding: utf-8 -*-
"""
Created on Tue Sep 18 14:07:14 2018
Test the KNN algorithm
@author: weixw
"""
import myKNN as mk
# The first 50% of rows are test data, the last 50% are training data
ratio = 0.5
# Number of neighbours
#errCount:31 errRate:6.2%
k = 4
#errCount:30 errRate:6.0%
#k = 8
#errCount:30 errRate:6.0%
#k = 12
#errCount:33 errRate:6.6%
#k = 16
#errCount:32 errRate:6.4%
#k = 20

fileName = 'datingTestSet2.txt'
# Draw a scatter plot of the data
mk.drawScatter(fileName)
# Load the file and separate the features from the labels
dataMat, labelList = mk.loadDataSet(fileName)
# Predict the test data
mk.dataClassify(dataMat, labelList, ratio, k)
Test code
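The test script uses k = 4. With an even k, two classes can each receive 2 of the 4 votes. The sketch below copies the voting part of classify() to show what happens then: sorted() is stable even with reverse=True, and dicts keep insertion order (guaranteed from Python 3.7, a CPython implementation detail in 3.6), so the tied label whose member appears first in the nearest-first list wins. vote() is a hypothetical helper extracted for illustration; it is not part of myKNN.py.

```python
import operator

def vote(neighbor_labels):
    """Majority vote as in classify(): count in a dict, then sort the pairs by count."""
    counts = {}
    for label in neighbor_labels:
        counts[label] = counts.get(label, 0) + 1
    # Stable sort: labels with equal counts keep their dict insertion order
    ranked = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)
    return ranked[0][0]

# Neighbour labels listed nearest first, as produced by argsort() in classify()
print(vote([2.0, 3.0, 3.0, 2.0]))   # 2.0: 2-2 tie, the nearest neighbour has label 2
print(vote([3.0, 2.0, 2.0, 3.0]))   # 3.0: same counts, the nearest neighbour has label 3
print(vote([1.0, 3.0, 3.0, 3.0]))   # 3.0: clear majority
```

Choosing an odd k avoids two-way ties, though with three classes a three-way split is still possible.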

2.4 Results

Scatter plot of the input data:



Classification results with k = 4 and ratio = 0.5 (half test data, half training data):



Results for different values of k:

        

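As the error counts recorded in the test script show, a larger k does not necessarily give higher accuracy: the error rate falls from 6.2% at k = 4 to 6.0% at k = 8 and k = 12, then rises to 6.6% at k = 16 and 6.4% at k = 20. A very small k makes the prediction sensitive to individual noisy neighbours (overfitting), while a very large k smooths the decision too much (underfitting).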
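The green-circle example from Section 1.1 can be reproduced with a toy layout to see the prediction flip as k changes. The coordinates below are hypothetical, chosen so that the two red triangles lie at distances 1 and 2 from the query and the three blue squares at 1.5, 3 and 3.5; knn_predict() is an illustrative helper, not part of myKNN.py.

```python
import numpy as np
from collections import Counter

def knn_predict(query, points, labels, k):
    """Majority vote among the k points nearest to query (Euclidean distance)."""
    dists = np.linalg.norm(points - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical layout: distances from the origin are 1, 2 (red) and 1.5, 3, 3.5 (blue)
points = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, -1.5], [3.0, 0.0], [0.0, 3.5]])
labels = ["red triangle", "red triangle", "blue square", "blue square", "blue square"]
query = np.array([0.0, 0.0])  # the "green circle"

print(knn_predict(query, points, labels, k=3))  # red triangle (2 of 3 votes)
print(knn_predict(query, points, labels, k=5))  # blue square (3 of 5 votes)
```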

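3. Pros and cons

Pros:

1. Simple, easy to understand and easy to implement; no explicit training phase (the training data is simply stored);

2. High accuracy and insensitive to outliers;

Cons:

High computational and space complexity: every prediction computes the distance to every training sample, and the entire training set must be kept in memory.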
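To put the cost in numbers for the experiment above (1000 rows, ratio = 0.5, d = 3 features): a brute-force classifier like classify() compares every test row with every stored training row, so one run of dataClassify() needs roughly

```latex
m_{\text{test}} \times m_{\text{train}} \times d = 500 \times 500 \times 3 = 750\,000
```

squared differences, plus one sort of 500 distances per test row, while all 500 × 3 training values stay in memory. Both the per-query work and the memory grow linearly with the number of training rows.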

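4. References

Peter Harrington, Machine Learning in Action (《机器学习实战》)

Li Hang, Statistical Learning Methods (《统计学习方法》)

Zhihu: https://www.zhihu.com/search?type=content&q=KNN

Blog: https://www.cnblogs.com/ybjourney/p/4702562.html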

         

 

 

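Don't let laziness take over your mind, and don't let compromise drag down your life. Youth is a ticket: whether you catch the express train of the times depends on your own steps.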

posted @ 2018-09-18 23:09  w_x_w1985