Coursera: Linear Algebra [week0 inverse_index_lab task4]

This is a homework assignment from the linear algebra course on Coursera. It asks for the following task to be done in Python:

Task 4: Write a procedure makeInverseIndex(strlist) that, given a list of strings (documents), returns
a dictionary that maps each word to the set consisting of the document numbers of documents in which that
word appears. This dictionary is called an inverse index. (Hint: use enumerate.)

For example:

input: s=['this is the first sentence.','and this is the second sentence','at last this is the third sentence']

output:  {'third': {2}, 'sentence': {0, 1, 2}, 'this': {0, 1, 2}, 'second': {1}, 'is': {0, 1, 2}, 'at': {2}, 'last': {2}, 'and': {1}, 'first': {0}, 'the': {0, 1, 2}}
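The sample output lists 'sentence': {0, 1, 2}, but splitting the first string on whitespace with str.split() yields the token 'sentence.' (period included), so document 0 would be filed under 'sentence.' instead. Here is a minimal sketch that reproduces the sample output exactly. It assumes that punctuation at the edges of a word should be ignored, which the task statement does not say. The function name make_index_ignoring_punct is hypothetical.

```python
import string
from collections import defaultdict


def make_index_ignoring_punct(strlist):
    """Inverse index that drops punctuation at word edges (an assumption, not part of the task)."""
    index = defaultdict(set)  # a missing key starts out as an empty set
    for docnum, doc in enumerate(strlist):
        for word in doc.split():
            word = word.strip(string.punctuation)  # 'sentence.' -> 'sentence'
            if word:  # skip tokens that were pure punctuation
                index[word].add(docnum)
    return dict(index)


s = ['this is the first sentence.', 'and this is the second sentence',
     'at last this is the third sentence']
print(make_index_ignoring_punct(s) == {
    'third': {2}, 'sentence': {0, 1, 2}, 'this': {0, 1, 2}, 'second': {1},
    'is': {0, 1, 2}, 'at': {2}, 'last': {2}, 'and': {1}, 'first': {0},
    'the': {0, 1, 2}})  # True
```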

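The goal: the input is a text split into several sentences (documents), each with its own number. The output shows, for every word that occurs in the text, the numbers of the sentences in which it appears.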

The program is as follows:

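# make inverse index

def makeInverseIndex(strlist):
    result = {}
    a = []
    # split every document into its list of words
    for i in range(len(strlist)):
        a.append(strlist[i].split())
    # for each word k in document j, collect the numbers of all documents that contain k
    for j in range(len(strlist)):
        for k in a[j]:
            result[k] = {j}            # store a new set containing j under key k
            for l in range(len(strlist)):
                if k in a[l]:
                    result[k].add(l)   # set.add() updates the stored set in place

    return result

s = ['this is the first sentence.', 'and this is the second sentence', 'at last this is the third sentence']
print(makeInverseIndex(s))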

Here I did not use the enumerate function. That is something to study more later.
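Since the task hints at enumerate, here is one possible enumerate-based version. It is a sketch, not the course's reference solution. enumerate(strlist) yields (index, document) pairs, so the document number comes for free. Each word is also visited once, instead of being searched for again in every document as in the nested loop above. The name make_inverse_index_enum is hypothetical.

```python
def make_inverse_index_enum(strlist):
    """Map each whitespace-separated word to the set of document numbers containing it."""
    result = {}
    for docnum, doc in enumerate(strlist):  # docnum = 0, 1, 2, ...
        for word in doc.split():
            # setdefault stores an empty set the first time a word is seen,
            # then returns the stored set, so .add() updates it in place
            result.setdefault(word, set()).add(docnum)
    return result


s = ['this is the first sentence.', 'and this is the second sentence',
     'at last this is the third sentence']
print(make_inverse_index_enum(s))
```

Like the original program, this splits only on whitespace. The first document therefore contributes 'sentence.' rather than 'sentence', and the result maps 'sentence.' to {0} and 'sentence' to {1, 2}.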
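As for my own program, I was stuck on it for a long time. The trouble was the last statement: result[k].add(l)

What I had written before was: first define a set, t=set()

then  result[k]=t.add(l)

and the result was  result[k]=None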

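The correct form is  result[k].add(l). The earlier attempt failed because set.add() modifies the set in place and returns None, so result[k]=t.add(l) stored that None in the dictionary. In the working version, result[k]={j} has already stored a set under the key k (the literal {j} is a set containing j). result[k] therefore refers to that set, and its .add() method adds l to it in place. Note that an empty {} is an empty dict, not a set. An empty set has to be written set().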
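The difference can be checked directly. In the sketch below (the variable names are illustrative), set.add() returns None, which is what ended up in the dictionary. Calling .add() on the set already stored under a key instead updates that set in place.

```python
t = set()
returned = t.add(1)        # set.add() mutates t in place ...
print(returned)            # ... and returns None
print(t)                   # {1}

result = {}
result['word'] = t.add(2)  # the failing version: stores the None return value
print(result['word'])      # None

result['word'] = {0}       # store a set literal containing 0 under the key
result['word'].add(2)      # add to that stored set in place
print(result['word'])      # {0, 2}

print(type({}), type({0}), type(set()))  # <class 'dict'> <class 'set'> <class 'set'>
```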

 

 

posted @ 2013-07-17 00:59  can't waste time