机器学习实战ch04 关于python版本所支持的文本格式问题

Posted on 2017-12-06 00:18 senyang 阅读(850) 评论(0) 编辑收藏举报

函数定义中：

def spamTest():
docList=[]; classList = []; fullText =[]
for i in range(1,26):
# print('cycle counts is % i'%i)
wordList = textParse(open('email/spam/%d.txt' % i).read())
docList.append(wordList)
fullText.extend(wordList)
classList.append(1)
wordList = textParse(open('email/ham/%d.txt' % i).read())
docList.append(wordList)
fullText.extend(wordList)
classList.append(0)
vocabList = createVocabList(docList)#create vocabulary
trainingSet = range(50); testSet=[] #create test set
for i in range(10):
randIndex = int(random.uniform(0,len(trainingSet)))
testSet.append(trainingSet[randIndex])
del(trainingSet[randIndex])
trainMat=[]; trainClasses = []
for docIndex in trainingSet:#train the classifier (get probs) trainNB0
trainMat.append(bagOfWords2VecMN(vocabList, docList[docIndex]))
trainClasses.append(classList[docIndex])
p0V,p1V,pSpam = trainNB0(array(trainMat),array(trainClasses))
errorCount = 0
for docIndex in testSet: #classify the remaining items
wordVector = bagOfWords2VecMN(vocabList, docList[docIndex])
if classifyNB(array(wordVector),p0V,p1V,pSpam) != classList[docIndex]:
errorCount += 1
print ("classification error",docList[docIndex])
print ('the error rate is: ',float(errorCount)/len(testSet))
#return vocabList,fullText

程序调试时出现两个错误：

（1）UnicodeDecodeError: 'utf8' codec can't decode ...........

解决办法：将spam和ham文件夹.txt文件用Sublime Text打开，Save with Encoding UTF-8

(2) 'range' object doesn't support item deletion

解决办法：将trainingSet = range(50)改为 trainingSet = list(range(50)). #python3.x range返回的是range对象,不返回数组对象解决方法

以上两个错误均由python版本差异引起

刷新页面返回顶部

senyang

公告

机器学习实战ch04 关于python版本所支持的文本格式问题