Machine Learning in Action (Code Walkthrough)
Machine Learning in Action: http://www.cnblogs.com/qwertWZ/p/4582096.html
Machine Learning in Action notes: http://blog.csdn.net/Lu597203933/article/details/37969799
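from numpy import *
import operator

# First kNN classifier
# inX     - the point to classify (test data)
# dataSet - the training samples, one per row
# labels  - the label of each training sample
# k       - the number of nearest neighbours that vote
def classify0(inX, dataSet, labels, k):
    # Compute the Euclidean distance from inX to every sample
    dataSetSize = dataSet.shape[0]
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    sortedDistIndicies = distances.argsort()
    classCount = {}
    # Take the k closest points and count their labels
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # Sort labels by vote count, highest first
    # (items() works on Python 2 and 3; iteritems() exists only on Python 2)
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]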
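As a quick check of how classify0 behaves, here is a minimal, self-contained sketch. The four sample points, their labels, and the two query points are made-up illustration data, not taken from the notes above. The function is restated under the hypothetical name classify0_sketch so that the block runs on its own under Python 3.

```python
import operator
import numpy as np

def classify0_sketch(inX, dataSet, labels, k):
    # Same steps as classify0: distances, sort, vote among the k nearest.
    diffMat = np.tile(inX, (dataSet.shape[0], 1)) - dataSet
    distances = ((diffMat ** 2).sum(axis=1)) ** 0.5
    sortedDistIndicies = distances.argsort()
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

# Hypothetical toy data: two points near (1, 1) labelled 'A',
# two points near (0, 0) labelled 'B'.
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']

result_near_origin = classify0_sketch([0.0, 0.0], group, labels, 3)
result_near_ones = classify0_sketch([1.0, 1.0], group, labels, 3)
print(result_near_origin, result_near_ones)  # B A
```

With k = 3, each query has two neighbours from its own cluster and one from the other, so the majority vote picks the nearby cluster's label.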
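Code notes: (a) The tile function. tile(inX, i) repeats inX i times along its last axis, so each row becomes i times longer. tile(inX, (i, j)) repeats inX i times along the rows (number of copies stacked vertically) and j times along the columns (how much each row is lengthened). For example: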
>>> from numpy import *
>>> inX = array([[0,0],[1,2]])
>>> tile(inX,2)
array([[0, 0, 0, 0],
       [1, 2, 1, 2]])
>>> tile(inX,(4,2))
array([[0, 0, 0, 0],
       [1, 2, 1, 2],
       [0, 0, 0, 0],
       [1, 2, 1, 2],
       [0, 0, 0, 0],
       [1, 2, 1, 2],
       [0, 0, 0, 0],
       [1, 2, 1, 2]])
>>> tile(inX,3)
array([[0, 0, 0, 0, 0, 0],
       [1, 2, 1, 2, 1, 2]])
>>> tile(inX,1)
array([[0, 0],
       [1, 2]])
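Inside classify0, tile is there only to give inX the same shape as dataSet before the subtraction. A minimal sketch of that one step follows; the array values are made up for illustration. It also shows that NumPy broadcasting yields the same difference matrix without tile. That is an alternative worth knowing, not what the original code does.

```python
import numpy as np

# Hypothetical sample data: 4 training points with 2 features each.
dataSet = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
inX = np.array([0.5, 0.5])

# Stack inX once per training row so the shapes match: (4, 2).
tiled = np.tile(inX, (dataSet.shape[0], 1))
diff_tile = tiled - dataSet

# Broadcasting stretches inX across the rows implicitly.
diff_broadcast = inX - dataSet

# Euclidean distance from inX to every training point.
distances = np.sqrt((diff_tile ** 2).sum(axis=1))

print(tiled.shape)                                 # (4, 2)
print(np.array_equal(diff_tile, diff_broadcast))   # True
```

Both forms produce one row of differences per training sample; the tile version makes the shape matching explicit, which may be easier to follow when first reading the algorithm.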