Coursera: linear algebra [week0 inverse_index_lab task4]

This is an assignment from the linear algebra course on Coursera. It asks for the following task to be completed in Python:

Task 4: Write a procedure makeInverseIndex(strlist) that, given a list of strings (documents), returns
a dictionary that maps each word to the set consisting of the document numbers of documents in which that
word appears. This dictionary is called an inverse index. (Hint: use enumerate.)

e.g.

input: s=['this is the first sentence.','and this is the second sentence','at last this is the third sentence']

output: {'third': {2}, 'sentence': {0, 1, 2}, 'this': {0, 1, 2}, 'second': {1}, 'is': {0, 1, 2}, 'at': {2}, 'last': {2}, 'and': {1}, 'first': {0}, 'the': {0, 1, 2}}

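The goal is to take a document as input and split it into sentences, each with its own number. The output lists, for every word in the text, the numbers of the sentences in which that word appears.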

The program is as follows:

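# make inverse index

def makeInverseIndex(strlist):
    result = {}
    # a[i] is the list of words in document i (split on whitespace)
    a = []
    for i in range(len(strlist)):
        a.append(strlist[i].split())
    for j in range(len(strlist)):
        for k in a[j]:
            # start the set for word k with the current document number
            result[k] = {j}
            # then add every document that contains k
            for l in range(len(strlist)):
                if k in a[l]:
                    result[k].add(l)
    return result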
        
s=['this is the first sentence.','and this is the second sentence','at last this is the third sentence']
print(makeInverseIndex(s))

This version does not use the enumerate function suggested in the hint; I need to study it further.
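For comparison, here is a minimal sketch of how the hint might be followed. The name make_inverse_index_enumerate is hypothetical (it is not from the course), and using dict.setdefault is just one possible choice. enumerate yields each document together with its number, so the index can be built in a single pass.

```python
def make_inverse_index_enumerate(strlist):
    """Map each word to the set of document numbers in which it appears."""
    index = {}
    # enumerate yields (document number, document) pairs
    for doc_number, document in enumerate(strlist):
        for word in document.split():
            # setdefault creates an empty set the first time a word is seen
            index.setdefault(word, set()).add(doc_number)
    return index


docs = ['this is the first sentence.',
        'and this is the second sentence',
        'at last this is the third sentence']
index = make_inverse_index_enumerate(docs)
print(index['this'])
print(index['second'])
```

Note that split() only splits on whitespace, so the 'sentence.' in the first document (with its trailing period) is stored under a different key from 'sentence'.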
I was stuck on this problem for a long time, and the cause was the last statement: result[k].add(l)

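Previously I first defined a set, t=set()

and then wrote result[k]=t.add(l)

The result was result[k]=None

The correct form is result[k].add(l). set.add() modifies the set in place and returns None, so assigning its return value stores None under the key. Because result[k]={j} has already run, the value stored under key k in the dict result is a set, so the .add() method can be called on result[k] directly.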
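The difference can be checked with a small experiment. This is only an illustrative sketch; the variable names returned and 'word' are made up for the example.

```python
t = set()
returned = t.add(3)      # add() mutates t in place...
print(returned)          # ...and returns None
print(t)                 # the element was still added to t

result = {}
result['word'] = {0}     # the value stored under the key is a set
result['word'].add(2)    # mutate that set directly, without reassigning
print(result['word'])
```

Assigning the return value of add() to result[k] therefore overwrites the set with None, while calling add() on result[k] keeps the set and grows it.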

posted @ 2013-07-17 00:59 can't waste time
