Processing a dataset in Python: loading it into lists
Correct way to load:
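def loadDataSet(filename):
    # Read the first line on a separate handle to count the feature columns
    # (the last column is the label).
    with open(filename) as f:
        numFeatures = len(f.readline().split('\t')) - 1
    dataMat = []
    labelMat = []
    # Reopen the file so that reading starts again from the first line.
    with open(filename) as f:
        for line in f.readlines():
            lineArr = []
            curLine = line.strip().split('\t')
            for i in range(0, numFeatures):
                lineArr.append(float(curLine[i]))
            dataMat.append(lineArr)
            labelMat.append(float(curLine[-1]))
    return dataMat, labelMat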
Incorrect way to load:
def loadDataSet(filename):
    f = open(filename)
    # BUG: readline() consumes the first line on the same handle f.
    numFeatures = len(f.readline().split('\t')) - 1
    dataMat = []
    labelMat = []
    # readlines() continues from the second line, so the first row is lost.
    for line in f.readlines():
        lineArr = []
        curLine = line.strip().split('\t')
        for i in range(0, numFeatures):
            lineArr.append(float(curLine[i]))
        dataMat.append(lineArr)
        labelMat.append(float(curLine[-1]))
    return dataMat, labelMat
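Reason: computing numFeatures calls readline() on f, which advances the file handle to the second line. The readlines() call that follows can only read the remaining lines, so the first line of the file is never loaded.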
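
The same pitfall can also be avoided by never reading the first line separately. The following is a minimal sketch, not the original code: the function name load_data_set and the three-row sample file are assumptions made up for this illustration. It takes every column except the last as a feature, so no separate numFeatures pass is needed.

```python
import os
import tempfile


def load_data_set(filename):
    """Load a tab-separated file into (features, labels) lists in one pass."""
    data_mat = []
    label_mat = []
    with open(filename) as f:
        for line in f:
            fields = line.strip().split('\t')
            if fields == ['']:
                continue  # skip blank lines
            # Every column except the last is a feature; the last is the label.
            data_mat.append([float(x) for x in fields[:-1]])
            label_mat.append(float(fields[-1]))
    return data_mat, label_mat


# Hypothetical three-row sample file, written only for this demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('1.0\t2.0\t0.0\n3.0\t4.0\t1.0\n5.0\t6.0\t0.0\n')
    sample_path = tmp.name

data_mat, label_mat = load_data_set(sample_path)
os.remove(sample_path)
print(len(data_mat), label_mat)  # prints: 3 [0.0, 1.0, 0.0]
```

All three rows are loaded, including the first. If the two-step structure is preferred, calling f.seek(0) right after readline() rewinds the handle to the start of the file and has the same effect.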