Machine Learning: KNN Principles and Code Implementation
This article is the author's original work. If you repost it, please credit the source: https://www.cnblogs.com/further-further-further/p/9670187.html
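1. How KNN works
KNN (k-Nearest Neighbour) is the k-nearest-neighbours algorithm. Its central idea fits in one proverb: birds of a feather flock together.
1.1 Working principle
Given a training dataset and a new input instance, find the k instances in the training set that are closest to the new instance (k is usually no larger than 20). Whichever class the majority of these k instances belong to is the class assigned to the input instance.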
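The example at https://www.cnblogs.com/ybjourney/p/4702562.html illustrates this well, so it is borrowed here.
In that example's figure, a green circle has to be assigned to a class: red triangle or blue square? With K=3, red triangles make up 2/3 of the nearest neighbours, so the green circle is assigned to the red-triangle class.
With K=5, blue squares make up 3/5 of the nearest neighbours, so the green circle is assigned to the blue-square class.
This shows that the result of KNN depends heavily on the choice of K.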
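The effect of K in this example can be reproduced with a small sketch. The neighbour ordering below is an assumption chosen only to match the description (two triangles among the three nearest, three squares among the five nearest), and the helper name `vote` is hypothetical.

```python
from collections import Counter

# Labels of the green circle's neighbours, ordered from nearest to farthest.
# Hypothetical ordering: 2 of the nearest 3 are triangles,
# 3 of the nearest 5 are squares.
neighbours = ["triangle", "triangle", "square", "square", "square"]


def vote(sorted_labels, k):
    """Return the most common label among the k nearest neighbours."""
    return Counter(sorted_labels[:k]).most_common(1)[0][0]


print(vote(neighbours, 3))  # triangle
print(vote(neighbours, 5))  # square
```

The same point gets a different class purely because K changed, which is why K is usually tuned on held-out data rather than fixed in advance.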
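1.2 Euclidean distance formula
The distance between two vector points xA and xB with n components is
$$d(x_A, x_B) = \sqrt{\sum_{i=1}^{n} \left(x_{A,i} - x_{B,i}\right)^2}$$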
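As a minimal sketch of this distance, assuming the two points are given as equal-length sequences of numbers (the function name `euclidean_distance` is made up for illustration): subtract component by component, square, sum, and take the square root.

```python
import numpy as np


def euclidean_distance(x_a, x_b):
    """Euclidean distance between two points of equal length."""
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    # Element-wise difference, squared, summed, then square-rooted.
    return float(np.sqrt(np.sum((x_a - x_b) ** 2)))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```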
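1.3 Classification decision rule (e.g. majority vote)
The class y of the input x is decided by a vote among its k nearest neighbours N_k(x):
$$y = \arg\max_{c_j} \sum_{x_i \in N_k(x)} I(y_i = c_j)$$
Here I is the indicator function: I is 1 when y_i = c_j and 0 otherwise.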
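The majority-vote rule can be written out with an explicit indicator function. This is only an illustrative sketch: the function name `majority_vote` and the sample labels are assumptions, not part of the article's code.

```python
def majority_vote(neighbour_labels, classes):
    """Pick the class c that maximises the sum of I(y_i == c)."""
    # The indicator contributes 1 when a neighbour's label equals c, else 0.
    counts = {
        c: sum(1 if y == c else 0 for y in neighbour_labels)
        for c in classes
    }
    return max(counts, key=counts.get), counts


label, counts = majority_vote([1, 3, 3, 2, 3], classes=[1, 2, 3])
print(label, counts)  # 3 {1: 1, 2: 1, 3: 3}
```

When two classes tie, `max` returns the first one in `classes`; a real implementation may prefer to break ties by distance instead.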
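1.4 Algorithm flow
For each point in the dataset whose class is unknown, perform the following steps in order:
1. Compute the distance between every point in the labelled dataset and the current point;
2. Sort the points by increasing distance;
3. Select the k points closest to the current point;
4. Count how often each class occurs among these k points;
5. Return the most frequent class among these k points as the predicted class of the current point.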
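The five steps map almost line for line onto NumPy. The sketch below uses plain arrays and a tiny made-up training set; `knn_predict` and the sample points are illustrative assumptions, and it is separate from the article's own implementation in section 2.2.

```python
import numpy as np
from collections import Counter


def knn_predict(x, train_x, train_y, k):
    """Classify one point x with the five steps of the algorithm."""
    train_x = np.asarray(train_x, dtype=float)
    x = np.asarray(x, dtype=float)
    # Step 1: distance from x to every labelled point (broadcasting).
    distances = np.sqrt(((train_x - x) ** 2).sum(axis=1))
    # Steps 2-3: indices of the k smallest distances, in ascending order.
    nearest = np.argsort(distances)[:k]
    # Step 4: frequency of each class among those k points.
    counts = Counter(train_y[i] for i in nearest)
    # Step 5: the most frequent class is the prediction.
    return counts.most_common(1)[0][0]


train_x = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
train_y = ["A", "A", "B", "B"]
print(knn_predict([0.1, 0.2], train_x, train_y, k=3))  # B
```

Sorting every distance is the simplest choice; for large training sets, a partial sort or a spatial index would avoid the full sort.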
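2. Code implementation
Python 3.6
As before, I have annotated in detail what each function does and what each line of code does.
I recommend implementing it yourself, paying particular attention to how list, array and matrix relate to one another during the computation and where each one is used.
Only by implementing it yourself can you sort out the role of each of the three and the relationship between them.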
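A short experiment makes the difference between the three containers concrete. This is a sketch with made-up values; `np.asmatrix` is used because it is available across NumPy versions (the `np.mat` alias was removed in NumPy 2.0).

```python
import numpy as np

rows = [[1.0, 2.0], [3.0, 4.0]]   # plain Python list of lists
arr = np.array(rows)              # ndarray
mat = np.asmatrix(rows)           # matrix: always two-dimensional

print(rows * 2)                # list repetition: four rows, no arithmetic
print(arr * arr)               # element-wise product
print(mat * mat)               # matrix product
print(mat.A ** 2)              # .A returns an ndarray, so ** is element-wise
print(arr.sum(axis=1).shape)   # (2,)
print(mat.sum(axis=1).shape)   # (2, 1)
```

The last two lines show why code that mixes the types often needs `.A`: reductions on a matrix keep two dimensions, while reductions on an array drop one.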
2.1 Input data
datingTestSet2.txt: dating-site data (three classes: people I did not like, people of average charm, people of great charm)
1 40920 8.326976 0.953952 3 2 14488 7.153469 1.673904 2 3 26052 1.441871 0.805124 1 4 75136 13.147394 0.428964 1 5 38344 1.669788 0.134296 1 6 72993 10.141740 1.032955 1 7 35948 6.830792 1.213192 3 8 42666 13.276369 0.543880 3 9 67497 8.631577 0.749278 1 10 35483 12.273169 1.508053 3 11 50242 3.723498 0.831917 1 12 63275 8.385879 1.669485 1 13 5569 4.875435 0.728658 2 14 51052 4.680098 0.625224 1 15 77372 15.299570 0.331351 1 16 43673 1.889461 0.191283 1 17 61364 7.516754 1.269164 1 18 69673 14.239195 0.261333 1 19 15669 0.000000 1.250185 2 20 28488 10.528555 1.304844 3 21 6487 3.540265 0.822483 2 22 37708 2.991551 0.833920 1 23 22620 5.297865 0.638306 2 24 28782 6.593803 0.187108 3 25 19739 2.816760 1.686209 2 26 36788 12.458258 0.649617 3 27 5741 0.000000 1.656418 2 28 28567 9.968648 0.731232 3 29 6808 1.364838 0.640103 2 30 41611 0.230453 1.151996 1 31 36661 11.865402 0.882810 3 32 43605 0.120460 1.352013 1 33 15360 8.545204 1.340429 3 34 63796 5.856649 0.160006 1 35 10743 9.665618 0.778626 2 36 70808 9.778763 1.084103 1 37 72011 4.932976 0.632026 1 38 5914 2.216246 0.587095 2 39 14851 14.305636 0.632317 3 40 33553 12.591889 0.686581 3 41 44952 3.424649 1.004504 1 42 17934 0.000000 0.147573 2 43 27738 8.533823 0.205324 3 44 29290 9.829528 0.238620 3 45 42330 11.492186 0.263499 3 46 36429 3.570968 0.832254 1 47 39623 1.771228 0.207612 1 48 32404 3.513921 0.991854 1 49 27268 4.398172 0.975024 1 50 5477 4.276823 1.174874 2 51 14254 5.946014 1.614244 2 52 68613 13.798970 0.724375 1 53 41539 10.393591 1.663724 3 54 7917 3.007577 0.297302 2 55 21331 1.031938 0.486174 2 56 8338 4.751212 0.064693 2 57 5176 3.692269 1.655113 2 58 18983 10.448091 0.267652 3 59 68837 10.585786 0.329557 1 60 13438 1.604501 0.069064 2 61 48849 3.679497 0.961466 1 62 12285 3.795146 0.696694 2 63 7826 2.531885 1.659173 2 64 5565 9.733340 0.977746 2 65 10346 6.093067 1.413798 2 66 1823 7.712960 1.054927 2 67 9744 11.470364 0.760461 3 68 16857 2.886529 0.934416 2 69 39336 10.054373 1.138351 3 70 
65230 9.972470 0.881876 1 71 2463 2.335785 1.366145 2 72 27353 11.375155 1.528626 3 73 16191 0.000000 0.605619 2 74 12258 4.126787 0.357501 2 75 42377 6.319522 1.058602 1 76 25607 8.680527 0.086955 3 77 77450 14.856391 1.129823 1 78 58732 2.454285 0.222380 1 79 46426 7.292202 0.548607 3 80 32688 8.745137 0.857348 3 81 64890 8.579001 0.683048 1 82 8554 2.507302 0.869177 2 83 28861 11.415476 1.505466 3 84 42050 4.838540 1.680892 1 85 32193 10.339507 0.583646 3 86 64895 6.573742 1.151433 1 87 2355 6.539397 0.462065 2 88 0 2.209159 0.723567 2 89 70406 11.196378 0.836326 1 90 57399 4.229595 0.128253 1 91 41732 9.505944 0.005273 3 92 11429 8.652725 1.348934 3 93 75270 17.101108 0.490712 1 94 5459 7.871839 0.717662 2 95 73520 8.262131 1.361646 1 96 40279 9.015635 1.658555 3 97 21540 9.215351 0.806762 3 98 17694 6.375007 0.033678 2 99 22329 2.262014 1.022169 1 100 46570 5.677110 0.709469 1 101 42403 11.293017 0.207976 3 102 33654 6.590043 1.353117 1 103 9171 4.711960 0.194167 2 104 28122 8.768099 1.108041 3 105 34095 11.502519 0.545097 3 106 1774 4.682812 0.578112 2 107 40131 12.446578 0.300754 3 108 13994 12.908384 1.657722 3 109 77064 12.601108 0.974527 1 110 11210 3.929456 0.025466 2 111 6122 9.751503 1.182050 3 112 15341 3.043767 0.888168 2 113 44373 4.391522 0.807100 1 114 28454 11.695276 0.679015 3 115 63771 7.879742 0.154263 1 116 9217 5.613163 0.933632 2 117 69076 9.140172 0.851300 1 118 24489 4.258644 0.206892 1 119 16871 6.799831 1.221171 2 120 39776 8.752758 0.484418 3 121 5901 1.123033 1.180352 2 122 40987 10.833248 1.585426 3 123 7479 3.051618 0.026781 2 124 38768 5.308409 0.030683 3 125 4933 1.841792 0.028099 2 126 32311 2.261978 1.605603 1 127 26501 11.573696 1.061347 3 128 37433 8.038764 1.083910 3 129 23503 10.734007 0.103715 3 130 68607 9.661909 0.350772 1 131 27742 9.005850 0.548737 3 132 11303 0.000000 0.539131 2 133 0 5.757140 1.062373 2 134 32729 9.164656 1.624565 3 135 24619 1.318340 1.436243 1 136 42414 14.075597 0.695934 3 137 20210 10.107550 
1.308398 3 138 33225 7.960293 1.219760 3 139 54483 6.317292 0.018209 1 140 18475 12.664194 0.595653 3 141 33926 2.906644 0.581657 1 142 43865 2.388241 0.913938 1 143 26547 6.024471 0.486215 3 144 44404 7.226764 1.255329 3 145 16674 4.183997 1.275290 2 146 8123 11.850211 1.096981 3 147 42747 11.661797 1.167935 3 148 56054 3.574967 0.494666 1 149 10933 0.000000 0.107475 2 150 18121 7.937657 0.904799 3 151 11272 3.365027 1.014085 2 152 16297 0.000000 0.367491 2 153 28168 13.860672 1.293270 3 154 40963 10.306714 1.211594 3 155 31685 7.228002 0.670670 3 156 55164 4.508740 1.036192 1 157 17595 0.366328 0.163652 2 158 1862 3.299444 0.575152 2 159 57087 0.573287 0.607915 1 160 63082 9.183738 0.012280 1 161 51213 7.842646 1.060636 3 162 6487 4.750964 0.558240 2 163 4805 11.438702 1.556334 3 164 30302 8.243063 1.122768 3 165 68680 7.949017 0.271865 1 166 17591 7.875477 0.227085 2 167 74391 9.569087 0.364856 1 168 37217 7.750103 0.869094 3 169 42814 0.000000 1.515293 1 170 14738 3.396030 0.633977 2 171 19896 11.916091 0.025294 3 172 14673 0.460758 0.689586 2 173 32011 13.087566 0.476002 3 174 58736 4.589016 1.672600 1 175 54744 8.397217 1.534103 1 176 29482 5.562772 1.689388 1 177 27698 10.905159 0.619091 3 178 11443 1.311441 1.169887 2 179 56117 10.647170 0.980141 3 180 39514 0.000000 0.481918 1 181 26627 8.503025 0.830861 3 182 16525 0.436880 1.395314 2 183 24368 6.127867 1.102179 1 184 22160 12.112492 0.359680 3 185 6030 1.264968 1.141582 2 186 6468 6.067568 1.327047 2 187 22945 8.010964 1.681648 3 188 18520 3.791084 0.304072 2 189 34914 11.773195 1.262621 3 190 6121 8.339588 1.443357 2 191 38063 2.563092 1.464013 1 192 23410 5.954216 0.953782 1 193 35073 9.288374 0.767318 3 194 52914 3.976796 1.043109 1 195 16801 8.585227 1.455708 3 196 9533 1.271946 0.796506 2 197 16721 0.000000 0.242778 2 198 5832 0.000000 0.089749 2 199 44591 11.521298 0.300860 3 200 10143 1.139447 0.415373 2 201 21609 5.699090 1.391892 2 202 23817 2.449378 1.322560 1 203 15640 0.000000 1.228380 2 204 
8847 3.168365 0.053993 2 205 50939 10.428610 1.126257 3 206 28521 2.943070 1.446816 1 207 32901 10.441348 0.975283 3 208 42850 12.478764 1.628726 3 209 13499 5.856902 0.363883 2 210 40345 2.476420 0.096075 1 211 43547 1.826637 0.811457 1 212 70758 4.324451 0.328235 1 213 19780 1.376085 1.178359 2 214 44484 5.342462 0.394527 1 215 54462 11.835521 0.693301 3 216 20085 12.423687 1.424264 3 217 42291 12.161273 0.071131 3 218 47550 8.148360 1.649194 3 219 11938 1.531067 1.549756 2 220 40699 3.200912 0.309679 1 221 70908 8.862691 0.530506 1 222 73989 6.370551 0.369350 1 223 11872 2.468841 0.145060 2 224 48463 11.054212 0.141508 3 225 15987 2.037080 0.715243 2 226 70036 13.364030 0.549972 1 227 32967 10.249135 0.192735 3 228 63249 10.464252 1.669767 1 229 42795 9.424574 0.013725 3 230 14459 4.458902 0.268444 2 231 19973 0.000000 0.575976 2 232 5494 9.686082 1.029808 3 233 67902 13.649402 1.052618 1 234 25621 13.181148 0.273014 3 235 27545 3.877472 0.401600 1 236 58656 1.413952 0.451380 1 237 7327 4.248986 1.430249 2 238 64555 8.779183 0.845947 1 239 8998 4.156252 0.097109 2 240 11752 5.580018 0.158401 2 241 76319 15.040440 1.366898 1 242 27665 12.793870 1.307323 3 243 67417 3.254877 0.669546 1 244 21808 10.725607 0.588588 3 245 15326 8.256473 0.765891 2 246 20057 8.033892 1.618562 3 247 79341 10.702532 0.204792 1 248 15636 5.062996 1.132555 2 249 35602 10.772286 0.668721 3 250 28544 1.892354 0.837028 1 251 57663 1.019966 0.372320 1 252 78727 15.546043 0.729742 1 253 68255 11.638205 0.409125 1 254 14964 3.427886 0.975616 2 255 21835 11.246174 1.475586 3 256 7487 0.000000 0.645045 2 257 8700 0.000000 1.424017 2 258 26226 8.242553 0.279069 3 259 65899 8.700060 0.101807 1 260 6543 0.812344 0.260334 2 261 46556 2.448235 1.176829 1 262 71038 13.230078 0.616147 1 263 47657 0.236133 0.340840 1 264 19600 11.155826 0.335131 3 265 37422 11.029636 0.505769 3 266 1363 2.901181 1.646633 2 267 26535 3.924594 1.143120 1 268 47707 2.524806 1.292848 1 269 38055 3.527474 1.449158 1 270 6286 
3.384281 0.889268 2 271 10747 0.000000 1.107592 2 272 44883 11.898890 0.406441 3 273 56823 3.529892 1.375844 1 274 68086 11.442677 0.696919 1 275 70242 10.308145 0.422722 1 276 11409 8.540529 0.727373 2 277 67671 7.156949 1.691682 1 278 61238 0.720675 0.847574 1 279 17774 0.229405 1.038603 2 280 53376 3.399331 0.077501 1 281 30930 6.157239 0.580133 1 282 28987 1.239698 0.719989 1 283 13655 6.036854 0.016548 2 284 7227 5.258665 0.933722 2 285 40409 12.393001 1.571281 3 286 13605 9.627613 0.935842 2 287 26400 11.130453 0.597610 3 288 13491 8.842595 0.349768 3 289 30232 10.690010 1.456595 3 290 43253 5.714718 1.674780 3 291 55536 3.052505 1.335804 1 292 8807 0.000000 0.059025 2 293 25783 9.945307 1.287952 3 294 22812 2.719723 1.142148 1 295 77826 11.154055 1.608486 1 296 38172 2.687918 0.660836 1 297 31676 10.037847 0.962245 3 298 74038 12.404762 1.112080 1 299 44738 10.237305 0.633422 3 300 17410 4.745392 0.662520 2 301 5688 4.639461 1.569431 2 302 36642 3.149310 0.639669 1 303 29956 13.406875 1.639194 3 304 60350 6.068668 0.881241 1 305 23758 9.477022 0.899002 3 306 25780 3.897620 0.560201 2 307 11342 5.463615 1.203677 2 308 36109 3.369267 1.575043 1 309 14292 5.234562 0.825954 2 310 11160 0.000000 0.722170 2 311 23762 12.979069 0.504068 3 312 39567 5.376564 0.557476 1 313 25647 13.527910 1.586732 3 314 14814 2.196889 0.784587 2 315 73590 10.691748 0.007509 1 316 35187 1.659242 0.447066 1 317 49459 8.369667 0.656697 3 318 31657 13.157197 0.143248 3 319 6259 8.199667 0.908508 2 320 33101 4.441669 0.439381 3 321 27107 9.846492 0.644523 3 322 17824 0.019540 0.977949 2 323 43536 8.253774 0.748700 3 324 67705 6.038620 1.509646 1 325 35283 6.091587 1.694641 3 326 71308 8.986820 1.225165 1 327 31054 11.508473 1.624296 3 328 52387 8.807734 0.713922 3 329 40328 0.000000 0.816676 1 330 34844 8.889202 1.665414 3 331 11607 3.178117 0.542752 2 332 64306 7.013795 0.139909 1 333 32721 9.605014 0.065254 3 334 33170 1.230540 1.331674 1 335 37192 10.412811 0.890803 3 336 13089 
0.000000 0.567161 2 337 66491 9.699991 0.122011 1 338 15941 0.000000 0.061191 2 339 4272 4.455293 0.272135 2 340 48812 3.020977 1.502803 1 341 28818 8.099278 0.216317 3 342 35394 1.157764 1.603217 1 343 71791 10.105396 0.121067 1 344 40668 11.230148 0.408603 3 345 39580 9.070058 0.011379 3 346 11786 0.566460 0.478837 2 347 19251 0.000000 0.487300 2 348 56594 8.956369 1.193484 3 349 54495 1.523057 0.620528 1 350 11844 2.749006 0.169855 2 351 45465 9.235393 0.188350 3 352 31033 10.555573 0.403927 3 353 16633 6.956372 1.519308 2 354 13887 0.636281 1.273984 2 355 52603 3.574737 0.075163 1 356 72000 9.032486 1.461809 1 357 68497 5.958993 0.023012 1 358 35135 2.435300 1.211744 1 359 26397 10.539731 1.638248 3 360 7313 7.646702 0.056513 2 361 91273 20.919349 0.644571 1 362 24743 1.424726 0.838447 1 363 31690 6.748663 0.890223 3 364 15432 2.289167 0.114881 2 365 58394 5.548377 0.402238 1 366 33962 6.057227 0.432666 1 367 31442 10.828595 0.559955 3 368 31044 11.318160 0.271094 3 369 29938 13.265311 0.633903 3 370 9875 0.000000 1.496715 2 371 51542 6.517133 0.402519 3 372 11878 4.934374 1.520028 2 373 69241 10.151738 0.896433 1 374 37776 2.425781 1.559467 1 375 68997 9.778962 1.195498 1 376 67416 12.219950 0.657677 1 377 59225 7.394151 0.954434 1 378 29138 8.518535 0.742546 3 379 5962 2.798700 0.662632 2 380 10847 0.637930 0.617373 2 381 70527 10.750490 0.097415 1 382 9610 0.625382 0.140969 2 383 64734 10.027968 0.282787 1 384 25941 9.817347 0.364197 3 385 2763 0.646828 1.266069 2 386 55601 3.347111 0.914294 1 387 31128 11.816892 0.193798 3 388 5181 0.000000 1.480198 2 389 69982 10.945666 0.993219 1 390 52440 10.244706 0.280539 3 391 57350 2.579801 1.149172 1 392 57869 2.630410 0.098869 1 393 56557 11.746200 1.695517 3 394 42342 8.104232 1.326277 3 395 15560 12.409743 0.790295 3 396 34826 12.167844 1.328086 3 397 8569 3.198408 0.299287 2 398 77623 16.055513 0.541052 1 399 78184 7.138659 0.158481 1 400 7036 4.831041 0.761419 2 401 69616 10.082890 1.373611 1 402 21546 
10.066867 0.788470 3 403 36715 8.129538 0.329913 3 404 20522 3.012463 1.138108 2 405 42349 3.720391 0.845974 1 406 9037 0.773493 1.148256 2 407 26728 10.962941 1.037324 3 408 587 0.177621 0.162614 2 409 48915 3.085853 0.967899 1 410 9824 8.426781 0.202558 2 411 4135 1.825927 1.128347 2 412 9666 2.185155 1.010173 2 413 59333 7.184595 1.261338 1 414 36198 0.000000 0.116525 1 415 34909 8.901752 1.033527 3 416 47516 2.451497 1.358795 1 417 55807 3.213631 0.432044 1 418 14036 3.974739 0.723929 2 419 42856 9.601306 0.619232 3 420 64007 8.363897 0.445341 1 421 59428 6.381484 1.365019 1 422 13730 0.000000 1.403914 2 423 41740 9.609836 1.438105 3 424 63546 9.904741 0.985862 1 425 30417 7.185807 1.489102 3 426 69636 5.466703 1.216571 1 427 64660 0.000000 0.915898 1 428 14883 4.575443 0.535671 2 429 7965 3.277076 1.010868 2 430 68620 10.246623 1.239634 1 431 8738 2.341735 1.060235 2 432 7544 3.201046 0.498843 2 433 6377 6.066013 0.120927 2 434 36842 8.829379 0.895657 3 435 81046 15.833048 1.568245 1 436 67736 13.516711 1.220153 1 437 32492 0.664284 1.116755 1 438 39299 6.325139 0.605109 3 439 77289 8.677499 0.344373 1 440 33835 8.188005 0.964896 3 441 71890 9.414263 0.384030 1 442 32054 9.196547 1.138253 3 443 38579 10.202968 0.452363 3 444 55984 2.119439 1.481661 1 445 72694 13.635078 0.858314 1 446 42299 0.083443 0.701669 1 447 26635 9.149096 1.051446 3 448 8579 1.933803 1.374388 2 449 37302 14.115544 0.676198 3 450 22878 8.933736 0.943352 3 451 4364 2.661254 0.946117 2 452 4985 0.988432 1.305027 2 453 37068 2.063741 1.125946 1 454 41137 2.220590 0.690754 1 455 67759 6.424849 0.806641 1 456 11831 1.156153 1.613674 2 457 34502 3.032720 0.601847 1 458 4088 3.076828 0.952089 2 459 15199 0.000000 0.318105 2 460 17309 7.750480 0.554015 3 461 42816 10.958135 1.482500 3 462 43751 10.222018 0.488678 3 463 58335 2.367988 0.435741 1 464 75039 7.686054 1.381455 1 465 42878 11.464879 1.481589 3 466 42770 11.075735 0.089726 3 467 8848 3.543989 0.345853 2 468 31340 8.123889 1.282880 3 
469 41413 4.331769 0.754467 3 470 12731 0.120865 1.211961 2 471 22447 6.116109 0.701523 3 472 33564 7.474534 0.505790 3 473 48907 8.819454 0.649292 3 474 8762 6.802144 0.615284 2 475 46696 12.666325 0.931960 3 476 36851 8.636180 0.399333 3 477 67639 11.730991 1.289833 1 478 171 8.132449 0.039062 2 479 26674 10.296589 1.496144 3 480 8739 7.583906 1.005764 2 481 66668 9.777806 0.496377 1 482 68732 8.833546 0.513876 1 483 69995 4.907899 1.518036 1 484 82008 8.362736 1.285939 1 485 25054 9.084726 1.606312 3 486 33085 14.164141 0.560970 3 487 41379 9.080683 0.989920 3 488 39417 6.522767 0.038548 3 489 12556 3.690342 0.462281 2 490 39432 3.563706 0.242019 1 491 38010 1.065870 1.141569 1 492 69306 6.683796 1.456317 1 493 38000 1.712874 0.243945 1 494 46321 13.109929 1.280111 3 495 66293 11.327910 0.780977 1 496 22730 4.545711 1.233254 1 497 5952 3.367889 0.468104 2 498 72308 8.326224 0.567347 1 499 60338 8.978339 1.442034 1 500 13301 5.655826 1.582159 2 501 27884 8.855312 0.570684 3 502 11188 6.649568 0.544233 2 503 56796 3.966325 0.850410 1 504 8571 1.924045 1.664782 2 505 4914 6.004812 0.280369 2 506 10784 0.000000 0.375849 2 507 39296 9.923018 0.092192 3 508 13113 2.389084 0.119284 2 509 70204 13.663189 0.133251 1 510 46813 11.434976 0.321216 3 511 11697 0.358270 1.292858 2 512 44183 9.598873 0.223524 3 513 2225 6.375275 0.608040 2 514 29066 11.580532 0.458401 3 515 4245 5.319324 1.598070 2 516 34379 4.324031 1.603481 1 517 44441 2.358370 1.273204 1 518 2022 0.000000 1.182708 2 519 26866 12.824376 0.890411 3 520 57070 1.587247 1.456982 1 521 32932 8.510324 1.520683 3 522 51967 10.428884 1.187734 3 523 44432 8.346618 0.042318 3 524 67066 7.541444 0.809226 1 525 17262 2.540946 1.583286 2 526 79728 9.473047 0.692513 1 527 14259 0.352284 0.474080 2 528 6122 0.000000 0.589826 2 529 76879 12.405171 0.567201 1 530 11426 4.126775 0.871452 2 531 2493 0.034087 0.335848 2 532 19910 1.177634 0.075106 2 533 10939 0.000000 0.479996 2 534 17716 0.994909 0.611135 2 535 31390 11.053664 
1.180117 3 536 20375 0.000000 1.679729 2 537 26309 2.495011 1.459589 1 538 33484 11.516831 0.001156 3 539 45944 9.213215 0.797743 3 540 4249 5.332865 0.109288 2 541 6089 0.000000 1.689771 2 542 7513 0.000000 1.126053 2 543 27862 12.640062 1.690903 3 544 39038 2.693142 1.317518 1 545 19218 3.328969 0.268271 2 546 62911 7.193166 1.117456 1 547 77758 6.615512 1.521012 1 548 27940 8.000567 0.835341 3 549 2194 4.017541 0.512104 2 550 37072 13.245859 0.927465 3 551 15585 5.970616 0.813624 2 552 25577 11.668719 0.886902 3 553 8777 4.283237 1.272728 2 554 29016 10.742963 0.971401 3 555 21910 12.326672 1.592608 3 556 12916 0.000000 0.344622 2 557 10976 0.000000 0.922846 2 558 79065 10.602095 0.573686 1 559 36759 10.861859 1.155054 3 560 50011 1.229094 1.638690 1 561 1155 0.410392 1.313401 2 562 71600 14.552711 0.616162 1 563 30817 14.178043 0.616313 3 564 54559 14.136260 0.362388 1 565 29764 0.093534 1.207194 1 566 69100 10.929021 0.403110 1 567 47324 11.432919 0.825959 3 568 73199 9.134527 0.586846 1 569 44461 5.071432 1.421420 1 570 45617 11.460254 1.541749 3 571 28221 11.620039 1.103553 3 572 7091 4.022079 0.207307 2 573 6110 3.057842 1.631262 2 574 79016 7.782169 0.404385 1 575 18289 7.981741 0.929789 3 576 43679 4.601363 0.268326 1 577 22075 2.595564 1.115375 1 578 23535 10.049077 0.391045 3 579 25301 3.265444 1.572970 2 580 32256 11.780282 1.511014 3 581 36951 3.075975 0.286284 1 582 31290 1.795307 0.194343 1 583 38953 11.106979 0.202415 3 584 35257 5.994413 0.800021 1 585 25847 9.706062 1.012182 3 586 32680 10.582992 0.836025 3 587 62018 7.038266 1.458979 1 588 9074 0.023771 0.015314 2 589 33004 12.823982 0.676371 3 590 44588 3.617770 0.493483 1 591 32565 8.346684 0.253317 3 592 38563 6.104317 0.099207 1 593 75668 16.207776 0.584973 1 594 9069 6.401969 1.691873 2 595 53395 2.298696 0.559757 1 596 28631 7.661515 0.055981 3 597 71036 6.353608 1.645301 1 598 71142 10.442780 0.335870 1 599 37653 3.834509 1.346121 1 600 76839 10.998587 0.584555 1 601 9916 2.695935 
1.512111 2 602 38889 3.356646 0.324230 1 603 39075 14.677836 0.793183 3 604 48071 1.551934 0.130902 1 605 7275 2.464739 0.223502 2 606 41804 1.533216 1.007481 1 607 35665 12.473921 0.162910 3 608 67956 6.491596 0.032576 1 609 41892 10.506276 1.510747 3 610 38844 4.380388 0.748506 1 611 74197 13.670988 1.687944 1 612 14201 8.317599 0.390409 2 613 3908 0.000000 0.556245 2 614 2459 0.000000 0.290218 2 615 32027 10.095799 1.188148 3 616 12870 0.860695 1.482632 2 617 9880 1.557564 0.711278 2 618 72784 10.072779 0.756030 1 619 17521 0.000000 0.431468 2 620 50283 7.140817 0.883813 3 621 33536 11.384548 1.438307 3 622 9452 3.214568 1.083536 2 623 37457 11.720655 0.301636 3 624 17724 6.374475 1.475925 3 625 43869 5.749684 0.198875 3 626 264 3.871808 0.552602 2 627 25736 8.336309 0.636238 3 628 39584 9.710442 1.503735 3 629 31246 1.532611 1.433898 1 630 49567 9.785785 0.984614 3 631 7052 2.633627 1.097866 2 632 35493 9.238935 0.494701 3 633 10986 1.205656 1.398803 2 634 49508 3.124909 1.670121 1 635 5734 7.935489 1.585044 2 636 65479 12.746636 1.560352 1 637 77268 10.732563 0.545321 1 638 28490 3.977403 0.766103 1 639 13546 4.194426 0.450663 2 640 37166 9.610286 0.142912 3 641 16381 4.797555 1.260455 2 642 10848 1.615279 0.093002 2 643 35405 4.614771 1.027105 1 644 15917 0.000000 1.369726 2 645 6131 0.608457 0.512220 2 646 67432 6.558239 0.667579 1 647 30354 12.315116 0.197068 3 648 69696 7.014973 1.494616 1 649 33481 8.822304 1.194177 3 650 43075 10.086796 0.570455 3 651 38343 7.241614 1.661627 3 652 14318 4.602395 1.511768 2 653 5367 7.434921 0.079792 2 654 37894 10.467570 1.595418 3 655 36172 9.948127 0.003663 3 656 40123 2.478529 1.568987 1 657 10976 5.938545 0.878540 2 658 12705 0.000000 0.948004 2 659 12495 5.559181 1.357926 2 660 35681 9.776654 0.535966 3 661 46202 3.092056 0.490906 1 662 11505 0.000000 1.623311 2 663 22834 4.459495 0.538867 1 664 49901 8.334306 1.646600 3 665 71932 11.226654 0.384686 1 666 13279 3.904737 1.597294 2 667 49112 7.038205 1.211329 3 668 
77129 9.836120 1.054340 1 669 37447 1.990976 0.378081 1 670 62397 9.005302 0.485385 1 671 0 1.772510 1.039873 2 672 15476 0.458674 0.819560 2 673 40625 10.003919 0.231658 3 674 36706 0.520807 1.476008 1 675 28580 10.678214 1.431837 3 676 25862 4.425992 1.363842 1 677 63488 12.035355 0.831222 1 678 33944 10.606732 1.253858 3 679 30099 1.568653 0.684264 1 680 13725 2.545434 0.024271 2 681 36768 10.264062 0.982593 3 682 64656 9.866276 0.685218 1 683 14927 0.142704 0.057455 2 684 43231 9.853270 1.521432 3 685 66087 6.596604 1.653574 1 686 19806 2.602287 1.321481 2 687 41081 10.411776 0.664168 3 688 10277 7.083449 0.622589 2 689 7014 2.080068 1.254441 2 690 17275 0.522844 1.622458 2 691 31600 10.362000 1.544827 3 692 59956 3.412967 1.035410 1 693 42181 6.796548 1.112153 3 694 51743 4.092035 0.075804 1 695 5194 2.763811 1.564325 2 696 30832 12.547439 1.402443 3 697 7976 5.708052 1.596152 2 698 14602 4.558025 0.375806 2 699 41571 11.642307 0.438553 3 700 55028 3.222443 0.121399 1 701 5837 4.736156 0.029871 2 702 39808 10.839526 0.836323 3 703 20944 4.194791 0.235483 2 704 22146 14.936259 0.888582 3 705 42169 3.310699 1.521855 1 706 7010 2.971931 0.034321 2 707 3807 9.261667 0.537807 2 708 29241 7.791833 1.111416 3 709 52696 1.480470 1.028750 1 710 42545 3.677287 0.244167 1 711 24437 2.202967 1.370399 1 712 16037 5.796735 0.935893 2 713 8493 3.063333 0.144089 2 714 68080 11.233094 0.492487 1 715 59016 1.965570 0.005697 1 716 11810 8.616719 0.137419 2 717 68630 6.609989 1.083505 1 718 7629 1.712639 1.086297 2 719 71992 10.117445 1.299319 1 720 13398 0.000000 1.104178 2 721 26241 9.824777 1.346821 3 722 11160 1.653089 0.980949 2 723 76701 18.178822 1.473671 1 724 32174 6.781126 0.885340 3 725 45043 8.206750 1.549223 3 726 42173 10.081853 1.376745 3 727 69801 6.288742 0.112799 1 728 41737 3.695937 1.543589 1 729 46979 6.726151 1.069380 3 730 79267 12.969999 1.568223 1 731 4615 2.661390 1.531933 2 732 32907 7.072764 1.117386 3 733 37444 9.123366 1.318988 3 734 569 3.743946 
1.039546 2 735 8723 2.341300 0.219361 2 736 6024 0.541913 0.592348 2 737 52252 2.310828 1.436753 1 738 8358 6.226597 1.427316 2 739 26166 7.277876 0.489252 3 740 18471 0.000000 0.389459 2 741 3386 7.218221 1.098828 2 742 41544 8.777129 1.111464 3 743 10480 2.813428 0.819419 2 744 5894 2.268766 1.412130 2 745 7273 6.283627 0.571292 2 746 22272 7.520081 1.626868 3 747 31369 11.739225 0.027138 3 748 10708 3.746883 0.877350 2 749 69364 12.089835 0.521631 1 750 37760 12.310404 0.259339 3 751 13004 0.000000 0.671355 2 752 37885 2.728800 0.331502 1 753 52555 10.814342 0.607652 3 754 38997 12.170268 0.844205 3 755 69698 6.698371 0.240084 1 756 11783 3.632672 1.643479 2 757 47636 10.059991 0.892361 3 758 15744 1.887674 0.756162 2 759 69058 8.229125 0.195886 1 760 33057 7.817082 0.476102 3 761 28681 12.277230 0.076805 3 762 34042 10.055337 1.115778 3 763 29928 3.596002 1.485952 1 764 9734 2.755530 1.420655 2 765 7344 7.780991 0.513048 2 766 7387 0.093705 0.391834 2 767 33957 8.481567 0.520078 3 768 9936 3.865584 0.110062 2 769 36094 9.683709 0.779984 3 770 39835 10.617255 1.359970 3 771 64486 7.203216 1.624762 1 772 0 7.601414 1.215605 2 773 39539 1.386107 1.417070 1 774 66972 9.129253 0.594089 1 775 15029 1.363447 0.620841 2 776 44909 3.181399 0.359329 1 777 38183 13.365414 0.217011 3 778 37372 4.207717 1.289767 1 779 0 4.088395 0.870075 2 780 17786 3.327371 1.142505 2 781 39055 1.303323 1.235650 1 782 37045 7.999279 1.581763 3 783 6435 2.217488 0.864536 2 784 72265 7.751808 0.192451 1 785 28152 14.149305 1.591532 3 786 25931 8.765721 0.152808 3 787 7538 3.408996 0.184896 2 788 1315 1.251021 0.112340 2 789 12292 6.160619 1.537165 2 790 49248 1.034538 1.585162 1 791 9025 0.000000 1.034635 2 792 13438 2.355051 0.542603 2 793 69683 6.614543 0.153771 1 794 25374 10.245062 1.450903 3 795 55264 3.467074 1.231019 1 796 38324 7.487678 1.572293 3 797 69643 4.624115 1.185192 1 798 44058 8.995957 1.436479 3 799 41316 11.564476 0.007195 3 800 29119 3.440948 0.078331 1 801 51656 
1.673603 0.732746 1 802 3030 4.719341 0.699755 2 803 35695 10.304798 1.576488 3 804 1537 2.086915 1.199312 2 805 9083 6.338220 1.131305 2 806 47744 8.254926 0.710694 3 807 71372 16.067108 0.974142 1 808 37980 1.723201 0.310488 1 809 42385 3.785045 0.876904 1 810 22687 2.557561 0.123738 1 811 39512 9.852220 1.095171 3 812 11885 3.679147 1.557205 2 813 4944 9.789681 0.852971 2 814 73230 14.958998 0.526707 1 815 17585 11.182148 1.288459 3 816 68737 7.528533 1.657487 1 817 13818 5.253802 1.378603 2 818 31662 13.946752 1.426657 3 819 86686 15.557263 1.430029 1 820 43214 12.483550 0.688513 3 821 24091 2.317302 1.411137 1 822 52544 10.069724 0.766119 3 823 61861 5.792231 1.615483 1 824 47903 4.138435 0.475994 1 825 37190 12.929517 0.304378 3 826 6013 9.378238 0.307392 2 827 27223 8.361362 1.643204 3 828 69027 7.939406 1.325042 1 829 78642 10.735384 0.705788 1 830 30254 11.592723 0.286188 3 831 21704 10.098356 0.704748 3 832 34985 9.299025 0.545337 3 833 31316 11.158297 0.218067 3 834 76368 16.143900 0.558388 1 835 27953 10.971700 1.221787 3 836 152 0.000000 0.681478 2 837 9146 3.178961 1.292692 2 838 75346 17.625350 0.339926 1 839 26376 1.995833 0.267826 1 840 35255 10.640467 0.416181 3 841 19198 9.628339 0.985462 3 842 12518 4.662664 0.495403 2 843 25453 5.754047 1.382742 2 844 12530 0.000000 0.037146 2 845 62230 9.334332 0.198118 1 846 9517 3.846162 0.619968 2 847 71161 10.685084 0.678179 1 848 1593 4.752134 0.359205 2 849 33794 0.697630 0.966786 1 850 39710 10.365836 0.505898 3 851 16941 0.461478 0.352865 2 852 69209 11.339537 1.068740 1 853 4446 5.420280 0.127310 2 854 9347 3.469955 1.619947 2 855 55635 8.517067 0.994858 3 856 65889 8.306512 0.413690 1 857 10753 2.628690 0.444320 2 858 7055 0.000000 0.802985 2 859 7905 0.000000 1.170397 2 860 53447 7.298767 1.582346 3 861 9194 7.331319 1.277988 2 862 61914 9.392269 0.151617 1 863 15630 5.541201 1.180596 2 864 79194 15.149460 0.537540 1 865 12268 5.515189 0.250562 2 866 33682 7.728898 0.920494 3 867 26080 11.318785 
1.510979 3 868 19119 3.574709 1.531514 2 869 30902 7.350965 0.026332 3 870 63039 7.122363 1.630177 1 871 51136 1.828412 1.013702 1 872 35262 10.117989 1.156862 3 873 42776 11.309897 0.086291 3 874 64191 8.342034 1.388569 1 875 15436 0.241714 0.715577 2 876 14402 10.482619 1.694972 2 877 6341 9.289510 1.428879 2 878 14113 4.269419 0.134181 2 879 6390 0.000000 0.189456 2 880 8794 0.817119 0.143668 2 881 43432 1.508394 0.652651 1 882 38334 9.359918 0.052262 3 883 34068 10.052333 0.550423 3 884 30819 11.111660 0.989159 3 885 22239 11.265971 0.724054 3 886 28725 10.383830 0.254836 3 887 57071 3.878569 1.377983 1 888 72420 13.679237 0.025346 1 889 28294 10.526846 0.781569 3 890 9896 0.000000 0.924198 2 891 65821 4.106727 1.085669 1 892 7645 8.118856 1.470686 2 893 71289 7.796874 0.052336 1 894 5128 2.789669 1.093070 2 895 13711 6.226962 0.287251 2 896 22240 10.169548 1.660104 3 897 15092 0.000000 1.370549 2 898 5017 7.513353 0.137348 2 899 10141 8.240793 0.099735 2 900 35570 14.612797 1.247390 3 901 46893 3.562976 0.445386 1 902 8178 3.230482 1.331698 2 903 55783 3.612548 1.551911 1 904 1148 0.000000 0.332365 2 905 10062 3.931299 0.487577 2 906 74124 14.752342 1.155160 1 907 66603 10.261887 1.628085 1 908 11893 2.787266 1.570402 2 909 50908 15.112319 1.324132 3 910 39891 5.184553 0.223382 3 911 65915 3.868359 0.128078 1 912 65678 3.507965 0.028904 1 913 62996 11.019254 0.427554 1 914 36851 3.812387 0.655245 1 915 36669 11.056784 0.378725 3 916 38876 8.826880 1.002328 3 917 26878 11.173861 1.478244 3 918 46246 11.506465 0.421993 3 919 12761 7.798138 0.147917 3 920 35282 10.155081 1.370039 3 921 68306 10.645275 0.693453 1 922 31262 9.663200 1.521541 3 923 34754 10.790404 1.312679 3 924 13408 2.810534 0.219962 2 925 30365 9.825999 1.388500 3 926 10709 1.421316 0.677603 2 927 24332 11.123219 0.809107 3 928 45517 13.402206 0.661524 3 929 6178 1.212255 0.836807 2 930 10639 1.568446 1.297469 2 931 29613 3.343473 1.312266 1 932 22392 5.400155 0.193494 1 933 51126 3.818754 
0.590905 1 934 53644 7.973845 0.307364 3 935 51417 9.078824 0.734876 3 936 24859 0.153467 0.766619 1 937 61732 8.325167 0.028479 1 938 71128 7.092089 1.216733 1 939 27276 5.192485 1.094409 3 940 30453 10.340791 1.087721 3 941 18670 2.077169 1.019775 2 942 70600 10.151966 0.993105 1 943 12683 0.046826 0.809614 2 944 81597 11.221874 1.395015 1 945 69959 14.497963 1.019254 1 946 8124 3.554508 0.533462 2 947 18867 3.522673 0.086725 2 948 80886 14.531655 0.380172 1 949 55895 3.027528 0.885457 1 950 31587 1.845967 0.488985 1 951 10591 10.226164 0.804403 3 952 70096 10.965926 1.212328 1 953 53151 2.129921 1.477378 1 954 11992 0.000000 1.606849 2 955 33114 9.489005 0.827814 3 956 7413 0.000000 1.020797 2 957 10583 0.000000 1.270167 2 958 58668 6.556676 0.055183 1 959 35018 9.959588 0.060020 3 960 70843 7.436056 1.479856 1 961 14011 0.404888 0.459517 2 962 35015 9.952942 1.650279 3 963 70839 15.600252 0.021935 1 964 3024 2.723846 0.387455 2 965 5526 0.513866 1.323448 2 966 5113 0.000000 0.861859 2 967 20851 7.280602 1.438470 2 968 40999 9.161978 1.110180 3 969 15823 0.991725 0.730979 2 970 35432 7.398380 0.684218 3 971 53711 12.149747 1.389088 3 972 64371 9.149678 0.874905 1 973 9289 9.666576 1.370330 2 974 60613 3.620110 0.287767 1 975 18338 5.238800 1.253646 2 976 22845 14.715782 1.503758 3 977 74676 14.445740 1.211160 1 978 34143 13.609528 0.364240 3 979 14153 3.141585 0.424280 2 980 9327 0.000000 0.120947 2 981 18991 0.454750 1.033280 2 982 9193 0.510310 0.016395 2 983 2285 3.864171 0.616349 2 984 9493 6.724021 0.563044 2 985 2371 4.289375 0.012563 2 986 13963 0.000000 1.437030 2 987 2299 3.733617 0.698269 2 988 5262 2.002589 1.380184 2 989 4659 2.502627 0.184223 2 990 17582 6.382129 0.876581 2 991 27750 8.546741 0.128706 3 992 9868 2.694977 0.432818 2 993 18333 3.951256 0.333300 2 994 3780 9.856183 0.329181 2 995 18190 2.068962 0.429927 2 996 11145 3.410627 0.631838 2 997 68846 9.974715 0.669787 1 998 26575 10.650102 0.866627 3 999 48111 9.134528 0.728045 3 1000 43757 
7.882601 1.332446 3
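Each record holds three numeric features followed by a class label, and the loader in section 2.2 splits the columns on tabs. As an alternative sketch (not the article's loader), `np.loadtxt` can read the same layout; the snippet feeds it the first three records from an in-memory buffer so that it runs without the file.

```python
import io
import numpy as np

# First three records of datingTestSet2.txt, tab-separated.
sample = io.StringIO(
    "40920\t8.326976\t0.953952\t3\n"
    "14488\t7.153469\t1.673904\t2\n"
    "26052\t1.441871\t0.805124\t1\n"
)

table = np.loadtxt(sample, delimiter="\t")
features = table[:, :-1]            # every column except the last
labels = table[:, -1].astype(int)   # last column holds the class label

print(features.shape)  # (3, 3)
print(labels)          # [3 2 1]
```

Passing a file path instead of the buffer would read the whole dataset in the same way.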
2.2 KNN algorithm implementation
myKNN.py
# -*- coding: utf-8 -*-
"""
Created on Mon Sep 17 15:58:58 2018
KNN (K-Nearest Neighbor) algorithm
@author: weixw
"""

import numpy as np
import operator

# Input:    one row of test data, training data set, label list,
#           number of nearest neighbors k
# Function: using the Euclidean distance, find the k training points closest
#           to the unlabeled test row and use the most frequent class among
#           those k points as the predicted class of the test row.
#           Euclidean distance: subtract element-wise, square, sum, take the root.
# Output:   predicted class of the test row
def classify(testDataSet, trainingDataSet, labelList, k):
    # Number of rows in the training data set
    trainingDataSetSize = trainingDataSet.shape[0]
    # np.tile(testDataSet, (trainingDataSetSize, 1)) repeats the test row
    # trainingDataSetSize times along the rows and once along the columns,
    # so the result has the same shape as trainingDataSet
    # Euclidean distance
    # 1. test data - training data
    diffMat = np.mat(np.tile(testDataSet, (trainingDataSetSize, 1)) - trainingDataSet)
    # 2. Square the differences (convert the matrix to an array first,
    #    because ** on an np.matrix is a matrix power, not element-wise)
    sqDiffMat = diffMat.A**2
    # 3. Sum each row: axis = 0 sums down each column, axis = 1 sums across each row
    sqDistances = sqDiffMat.sum(axis = 1)
    # 4. Square root
    distances = sqDistances**0.5
    # argsort(): indices that sort the distances in ascending order
    sortedDistIndicies = distances.argsort()
    # Vote count per predicted class
    predictClassCount = {}
    # Majority vote over the k smallest distances
    for i in range(k):
        # Label of the i-th nearest training row
        voteLabel = labelList[sortedDistIndicies[i]]
        # Dictionary: label -> number of votes
        predictClassCount[voteLabel] = predictClassCount.get(voteLabel, 0) + 1
    # Sort the (label, count) pairs by count in descending order
    # sorted(iterable, key=None, reverse=False)
    # itemgetter(1) picks the item at index 1 of each pair, i.e. the count
    sortedPredictClassCount = sorted(predictClassCount.items(), key = operator.itemgetter(1), reverse = True)
    return sortedPredictClassCount[0][0]


# Input:    data file
# Function: load the file; the last column holds the labels, the other columns
#           are features. The number of feature columns is detected automatically.
# Output:   feature matrix, label list
def loadDataSet(fileName):
    dataSet = []; labelSet = []
    # Use a context manager so the file is always closed
    with open(fileName) as fr:
        lines = fr.readlines()
    # Number of feature columns
    numberFeat = len(lines[0].split('\t')) - 1
    for line in lines:
        lineArr = []
        # Strip leading and trailing whitespace, then split into columns
        curLine = line.strip().split('\t')
        # Store every feature column
        for i in range(numberFeat):
            lineArr.append(float(curLine[i]))
        dataSet.append(lineArr)
        labelSet.append(float(curLine[-1]))
    return np.mat(dataSet), labelSet

# Input:    raw feature matrix
# Function: min-max normalization, so every feature varies within [0, 1]
#           newValue = (oldValue - min)/(max - min)
# Output:   normalized feature matrix, per-column ranges (the denominator),
#           per-column minimum values
def autoNorm(dataMat):
    # min() without argument: global minimum; min(0): minimum of each column;
    # min(1): minimum of each row
    # Minimum of each column
    minValsMat = dataMat.min(0)
    # Maximum of each column
    maxValsMat = dataMat.max(0)
    # Range of each column (element-wise difference)
    rangesMat = maxValsMat - minValsMat
    # Number of rows in the raw data set
    m = dataMat.shape[0]
    # Numerator of the normalization formula
    # np.tile(minValsMat, (m, 1)) repeats the row of minima m times,
    # giving the same shape as dataMat (1000*3 for this data set)
    normDataMat = dataMat - np.tile(minValsMat, (m, 1))
    # Divide element-wise by the ranges to get the normalized result
    normDataMat = normDataMat/np.tile(rangesMat, (m, 1))
    return normDataMat, rangesMat, minValsMat

# Input:    feature matrix, label list, fraction of rows used as test data,
#           number of nearest neighbors k
# Function: predict the class of every test row
#           1. normalize the features
#           2. use ratio to select the test rows (the first m*ratio rows)
#           3. classify each test row with Euclidean distance and majority vote
#           4. count the misclassified rows
# Output:   prints every prediction, the total error count and the error rate
def dataClassify(dataMat, labelList, ratio, k):

    # Normalize the features
    normDataMat, rangesMat, minValsMat = autoNorm(dataMat)
    # Number of rows in the normalized data set
    m = normDataMat.shape[0]
    # Number of test rows (the remaining rows are training data)
    testDataNum = int(m*ratio)
    # Number of wrong predictions
    errorCount = 0.0
    for i in range(testDataNum):
        # Predict the class of each test row
        classifierResult = classify(normDataMat[i, :], normDataMat[testDataNum:m, :], labelList[testDataNum:m], k)
        print ("the classifier result is: %d, the real answer is: %d"% (classifierResult, labelList[i]))
        # Count wrong predictions
        if(classifierResult != labelList[i]):
            errorCount += 1.0
    print ("the total error count is %d"% errorCount)
    print ("the total error rate is: %f"%(errorCount/float(testDataNum)))


# Draw a scatter plot
def drawScatter(filename):
    import matplotlib.pyplot as plt
    # Load the file and separate features from labels
    dataMat, labelList = loadDataSet(filename)
    # Convert the matrix to an array
    dataArr = dataMat.A
    # Create a figure
    plt.figure()
    # Row indices per label value (the label column has 3 distinct values)
    label_idx1 = []; label_idx2 = []; label_idx3 = []
    # Iterate over the labels with index and value
    for index, value in enumerate(labelList):
        if(value == 1):
            label_idx1.append(index)
        elif(value == 2):
            label_idx2.append(index)
        else:
            label_idx3.append(index)
    # scatter(x, y, s, marker, color, label)
    # x and y must be array-like, s is the marker size, marker is the marker shape
    plt.scatter(dataArr[label_idx1, 1], dataArr[label_idx1, 2], marker = 'x', color = 'm', label = 'no like', s = 30)
    plt.scatter(dataArr[label_idx2, 1], dataArr[label_idx2, 2], marker = '+', color = 'c', label = 'like', s = 50)
    plt.scatter(dataArr[label_idx3, 1], dataArr[label_idx3, 2], marker = 'o', color = 'r', label = 'very like', s = 15)
    plt.legend(loc = 'upper right')
    # Display the figure (needed when running as a plain script)
    plt.show()
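As a side note, the np.tile call in classify is not strictly required: NumPy broadcasting can subtract one row from every training row directly. The sketch below is only an illustration under that assumption. The name classify_broadcast and the toy data are hypothetical and not part of myKNN.py, and it works on plain arrays rather than np.matrix.

```python
import numpy as np
from collections import Counter

def classify_broadcast(test_point, training_data, labels, k):
    """Hypothetical variant of classify() using arrays and broadcasting."""
    # np.asarray turns lists (and np.matrix inputs) into plain ndarrays
    test_point = np.asarray(test_point, dtype=float).ravel()
    training_data = np.asarray(training_data, dtype=float)
    # Broadcasting subtracts the test point from every training row
    diff = training_data - test_point
    # Euclidean distance to every training row
    distances = np.sqrt((diff ** 2).sum(axis=1))
    # Indices of the k nearest training rows
    nearest = np.argsort(distances)[:k]
    # Majority vote among the k nearest labels
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data, made up for illustration only
train = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
train_labels = ['A', 'A', 'B', 'B']
prediction = classify_broadcast([0.1, 0.2], train, train_labels, k=3)
print(prediction)  # prints B
```

The voting logic is the same as in classify; only the distance computation is written differently.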
2.3 Test code
# -*- coding: utf-8 -*-
"""
Created on Tue Sep 18 14:07:14 2018
Test the KNN algorithm
@author: weixw
"""
import myKNN as mk
# The first 50% of the rows are test data, the remaining 50% are training data
ratio = 0.5
# Number of neighbors
# errCount: 31  errRate: 6.2%
k = 4
# errCount: 30  errRate: 6.0%
#k = 8
# errCount: 30  errRate: 6.0%
#k = 12
# errCount: 33  errRate: 6.6%
#k = 16
# errCount: 32  errRate: 6.4%
#k = 20


fileName = 'datingTestSet2.txt'
# Draw a scatter plot of the data
mk.drawScatter(fileName)
# Load the file and separate features from labels
dataMat, labelList = mk.loadDataSet(fileName)
# Predict the test data and print the results
mk.dataClassify(dataMat, labelList, ratio, k)
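The test script only evaluates rows that are already in the file. autoNorm also returns the column ranges and minima, which dataClassify does not use; a plausible use for them is scaling a new, unseen sample with the same formula before classifying it. The sketch below illustrates that idea under this assumption. The helper names min_max_fit and min_max_apply and the sample values are hypothetical.

```python
import numpy as np

def min_max_fit(data):
    """Return the per-column minimum and range (max - min)."""
    data = np.asarray(data, dtype=float)
    min_vals = data.min(axis=0)
    ranges = data.max(axis=0) - min_vals
    return min_vals, ranges

def min_max_apply(x, min_vals, ranges):
    """Scale x with newValue = (oldValue - min) / (max - min)."""
    return (np.asarray(x, dtype=float) - min_vals) / ranges

# Made-up training rows with three features each
train = np.array([[40000.0, 8.0, 1.0],
                  [10000.0, 2.0, 0.5],
                  [70000.0, 14.0, 1.5]])
min_vals, ranges = min_max_fit(train)

# A new sample must be scaled with the training minima and ranges,
# not with statistics computed from the sample itself
new_sample = [25000.0, 5.0, 0.75]
scaled = min_max_apply(new_sample, min_vals, ranges)
print(scaled)  # each feature becomes 0.25
```

Note that a feature column with a constant value has a range of 0 and would cause a division by zero; this sketch does not handle that case.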
2.4 Results
Scatter plot of the input data:
Classification results for k = 4, ratio = 0.5 (half of the data for testing, half for training):
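Results for different values of k (error counts as recorded in the test script comments): k = 4: 31 errors (6.2%); k = 8: 30 (6.0%); k = 12: 30 (6.0%); k = 16: 33 (6.6%); k = 20: 32 (6.4%).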
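As the results show, a larger k does not necessarily give higher accuracy. A very large k pulls in distant neighbors from other classes and over-smooths the decision boundary (underfitting), while a very small k makes the prediction sensitive to noise (overfitting).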
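The comparison over k can be automated with a simple sweep. The sketch below is a minimal illustration on synthetic two-class data, not on datingTestSet2.txt; the functions knn_predict and error_rate, the random seed, and the data are all assumptions made for the example. In practice k is better chosen on a separate validation split, because picking k on the test rows gives an optimistic error estimate.

```python
import numpy as np
from collections import Counter

def knn_predict(x, train_x, train_y, k):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    distances = np.sqrt(((train_x - x) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    return Counter(train_y[nearest].tolist()).most_common(1)[0][0]

def error_rate(test_x, test_y, train_x, train_y, k):
    """Fraction of test rows that are misclassified."""
    wrong = sum(knn_predict(x, train_x, train_y, k) != y
                for x, y in zip(test_x, test_y))
    return wrong / len(test_y)

# Synthetic data: two Gaussian clusters, made up for illustration
rng = np.random.default_rng(0)
class_a = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
class_b = rng.normal(loc=2.0, scale=1.0, size=(100, 2))
features = np.vstack([class_a, class_b])
targets = np.array([1] * 100 + [2] * 100)

# Shuffle, then use the first half for testing and the second half for training
order = rng.permutation(len(targets))
features, targets = features[order], targets[order]
split = len(targets) // 2
test_x, test_y = features[:split], targets[:split]
train_x, train_y = features[split:], targets[split:]

# Error rate for each candidate k
results = {k: error_rate(test_x, test_y, train_x, train_y, k)
           for k in (1, 4, 8, 12, 16, 20)}
for k, err in results.items():
    print("k = %2d, error rate = %.3f" % (k, err))
```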
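3. Advantages and disadvantages
Advantages:
1. Simple, easy to understand and implement, with no training phase;
2. High accuracy and, with a reasonably large k, low sensitivity to outliers;
Disadvantages:
High computational cost and high memory cost: every prediction computes the distance to every training sample, and the whole training set must be kept in memory.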
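Part of the per-prediction cost is the full sort of all distances. Since only the k smallest are needed, one possible optimization is np.argpartition, which selects them without sorting the whole array. This is a sketch of that idea, not what myKNN.py does; the function name k_nearest_indices and the sample distances are hypothetical. It does not reduce the cost of computing the distances themselves.

```python
import numpy as np

def k_nearest_indices(distances, k):
    """Indices of the k smallest distances, ordered from nearest to farthest."""
    distances = np.asarray(distances, dtype=float)
    # argpartition moves the k smallest entries to the front (in arbitrary order)
    candidates = np.argpartition(distances, k - 1)[:k]
    # Sort only these k candidates by their distance
    return candidates[np.argsort(distances[candidates])]

# Made-up distances for illustration; k must not exceed their number
distances = np.array([0.9, 0.1, 0.5, 0.3, 0.7])
nearest = k_nearest_indices(distances, 3)
print(nearest)  # prints [1 3 2]
```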
4. References
Machine Learning in Action (《机器学习实战》)
Statistical Learning Methods (《统计学习方法》)
Zhihu: https://www.zhihu.com/search?type=content&q=KNN
Blog: https://www.cnblogs.com/ybjourney/p/4702562.html