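Summary: TF-IDF (term frequency–inverse document frequency) is a statistical method for evaluating how important a word is to one document within a collection of N documents (a corpus). The raw number of times a word appears in a document does not by itself indicate importance: common words such as "我们" ("we") and "的" ("of") appear often in almost every document, which is why TF-IDF weighting is needed. A word's importance increases in proportion to the number of times it appears in a document, but decreases in inverse proportion to how frequently it appears across the N documents (the document collection). For a given document, term frequency (TF) is the number of times a word appears in that document divided by the document's total word count. Inverse document frequency (IDF) is a word's…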
posted @ 2011-12-28 14:01 ‰流浪※ Views (3719) Comments (0) Recommendations (0)
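A minimal sketch of the weighting described in the TF-IDF summary above. TF follows the definition given there (occurrences divided by the document's total word count). The summary is cut off before it defines IDF, so the formula used here, log(N / number of documents containing the word), is an assumption based on the commonly used form and not necessarily the one in the full post. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def term_frequency(term, doc_tokens):
    """Occurrences of `term` divided by the document's total word count."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def inverse_document_frequency(term, corpus):
    """Assumed form: log(N / df), where df is the number of documents containing `term`."""
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0  # term never appears; treat its weight as zero
    return math.log(len(corpus) / df)


def tf_idf(term, doc_tokens, corpus):
    """Weight of `term` in one document relative to the whole corpus."""
    return term_frequency(term, doc_tokens) * inverse_document_frequency(term, corpus)


# Hypothetical toy corpus of three already-tokenized documents.
corpus = [
    ["we", "study", "term", "weights"],
    ["we", "compare", "vector", "models"],
    ["we", "rank", "documents"],
]

common_weight = tf_idf("we", corpus[0], corpus)    # "we" is in every document
rare_weight = tf_idf("term", corpus[0], corpus)    # "term" is in one document only
print(common_weight, rare_weight)
```

In this toy corpus "we" appears in every document, so its IDF is log(3/3) = 0 and its weight vanishes, while "term" appears in only one document and keeps a positive weight. This mirrors the point made in the summary about common words.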
Summary: The Term Count Model: Demystifying Term Vector Calculations. Dr. E. Garcia, Mi Islita.com. Last Update: 10/27/06. Article 2 of the series Term Vector Theory and Keyword Weights. Topics: Background; Term Counts; Demystifying Term Vectors; Computing Vector Magnitudes; Computing Dot Products; Computing Cosine Values; A Li…
posted @ 2011-12-28 09:49 ‰流浪※ Views (246) Comments (0) Recommendations (0)
Summary: The Classic Vector Space Model: Description, Advantages and Limitations of the Classic Vector Space Model. Dr. E. Garcia, Mi Islita.com. Last Update: 10/27/06. Article 3 of the series Term Vector Theory and Keyword Weights. Topics: Global Information; Self-Similarity Elements; Vector Space Example; Similarity Ana…
posted @ 2011-12-28 09:48 ‰流浪※ Views (106) Comments (0) Recommendations (0)
Summary: Term Vector Theory and Keyword Weights: An Introductory Series on Term Vector Theory for Information Retrieval Students and Search Engine Marketers. Dr. E. Garcia, Mi Islita.com. Last Update: 10/27/06. Article 1 of the series Term Vector Theory and Keyword Weights. Topics: Salton's Vector Space Model; Loc…
posted @ 2011-12-28 09:48 ‰流浪※ Views (152) Comments (0) Recommendations (0)