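Summary: Adapted from the English edition of wikipedia.org; I have mainly translated it into Chinese. BM25 (Best Match 25) is an algorithm used in information retrieval systems to score documents against a given query. It is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and others. BM25 was first implemented in the Okapi system, so it is also known as Okapi BM25. BM25 is a bag-of-words model; bag-of-words Read more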
posted @ 2012-08-22 13:45 lycan785
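To make the scoring idea concrete, here is a minimal sketch of BM25 over pre-tokenized documents. It is not taken from the post: the function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the smoothed IDF form `log((N - df + 0.5) / (df + 0.5) + 1)` (one common variant that keeps scores non-negative) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration; assumes `docs` is non-empty
    and contains at least one token overall.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)  # bag-of-words: only term counts matter, not order
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF (assumed variant; stays non-negative).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    ["okapi", "bm25", "ranking"],
    ["tf", "idf", "weighting"],
    ["bm25", "bm25", "scoring", "query"],
]
print(bm25_scores(["bm25"], docs))
```

A document that does not contain any query term scores zero, and repeated occurrences of a term raise the score with diminishing returns, which is what `k1` controls.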
Summary: TF-IDF stands for term frequency–inverse document frequency; it is a numerical statistic that indicates how important a word is to a document. From wikipedia.org: The tf*idf weight (term frequency–inverse document frequency, a.k.a. TF-IDF) is a numerical statistic which reflects how important a word is to a document in a collection or corpus. It is often used as a weighting fa Read more
posted @ 2012-08-22 10:56 lycan785
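As a rough illustration of the tf*idf weight described in the summary, the sketch below computes one weight per term per document. The post does not say which variant it uses, so the choices here are assumptions: term frequency is the raw count divided by document length, and IDF is `log(N / df)` with no smoothing. The name `tf_idf` is likewise hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document in `docs`.

    Hypothetical helper for illustration; uses tf = count / doc length
    and idf = log(N / df), one of several common variants.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in docs:
        df.update(set(d))

    weights = []
    for d in docs:
        counts = Counter(d)
        weights.append({
            term: (count / len(d)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "barked"]]
for w in tf_idf(docs):
    print(w)
```

With this unsmoothed IDF, a term that occurs in every document gets weight zero, which matches the intuition that such a word says nothing about any particular document.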