Post archive for February 2, 2012: Latent Semantic Analysis (LSA) Tutorial ... - app_

February 2, 2012

Latent Semantic Analysis (LSA) Tutorial, Part 3 (repost)

Summary: Part 4 - Clustering by Color. We can also turn the numbers into colors. For instance, here is a color display that corresponds to the first 3 dimensions of the Titles matrix that we showed above. It contains exactly the same information, except that blue shows negative numbers, red shows positive num... Read more

posted @ 2012-02-02 16:03 app_ Views (1739) Comments (0) Recommendations (0)

Latent Semantic Analysis (LSA) Tutorial, Part 2 (repost)

Summary: Part 2 - Modify the Counts with TFIDF. In sophisticated Latent Semantic Analysis systems, the raw matrix counts are usually modified so that rare words are weighted more heavily than common words. For example, a word that occurs in only 5% of the documents should probably be weighted more he... Read more

posted @ 2012-02-02 15:51 app_ Views (1691) Comments (1) Recommendations (0)
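
The weighting idea in the Part 2 summary above, where rare words count for more than common ones, can be sketched as follows. This is only a minimal sketch of one common TF-IDF formulation (term frequency within a document multiplied by the log of the inverse document frequency). The excerpt is cut off before it gives a formula, so the exact formula, the function name `tfidf`, and the sample count matrix are all assumptions made for illustration.

```python
import math

def tfidf(counts):
    """Weight a word-by-document count matrix (hypothetical helper).

    counts: list of rows, one row per word, one column per document.
    """
    n_docs = len(counts[0])
    # Total number of words in each document (column sums).
    doc_totals = [sum(row[j] for row in counts) for j in range(n_docs)]
    weighted = []
    for row in counts:
        # Number of documents in which this word appears at all.
        docs_with_word = sum(1 for c in row if c > 0)
        idf = math.log(n_docs / docs_with_word) if docs_with_word else 0.0
        weighted.append([
            (c / doc_totals[j]) * idf if doc_totals[j] else 0.0
            for j, c in enumerate(row)
        ])
    return weighted

# Made-up sample data: two words, four documents.
counts = [
    [1, 1, 1, 1],  # common word: appears in every document
    [0, 0, 2, 0],  # rare word: appears in a single document
]
weights = tfidf(counts)
print(weights)
```

In this sample the word that occurs in every document ends up with weight 0, while the word found in only one document keeps a positive weight there.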

Latent Semantic Analysis (LSA) Tutorial, Part 1 (repost)

Summary: Translated from: http://www.puffinwarellc.com/index.php/news-and-articles/articles/33.html WangBen, 2011-09-16, Beijing. Introduction to Latent Semantic Analysis (LSA). Latent Semantic Analysis (LSA), also known as Latent Semantic Indexing (LSI), literally means analyzing documents to find the underlying meaning or concepts of those documents. If each word... Read more

posted @ 2012-02-02 15:47 app_ Views (2988) Comments (0) Recommendations (0)

Implementation and Application of Chinese Word Segmentation in Python (repost)

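Summary: Implementation and Application of Chinese Word Segmentation in Python. Liu Xinliang (刘新亮), Yan Shanshan (严姗姗) (School of Computer Science, Beijing Technology and Business University, 100037). Abstract: The implementation and application of Chinese word segmentation belongs to natural language processing. This work implements Chinese word segmentation in the Python language environment, together with an application programming interface built on that implementation and a Chinese text-processing application. The design is divided into five parts: the segmentation module, the wrapper module, the application programming interface, and the Nonsense module. The project aims to provide Chinese word segmentation for a future open-source Chinese search engine, and also to promote open-source development by showing that code can be entertaining. Keywords: Chinese word segmentation; Python; programming interface. 1 Introduction: Natural language processing is the field that studies the theories and methods for effective natural-language communication between humans and computers, and is computer sci... Read more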

posted @ 2012-02-02 15:20 app_ Views (16290) Comments (0) Recommendations (0)

Sorting Chinese in Python (repost)

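Summary: When Python compares strings, it uses the code values returned by the ord function. The sort function built on this comparison easily orders digits and English letters, because they are already laid out in order in the encoding table. >> print ',' < '1' < 'A' < 'a' < '阿' True. Handling Chinese well is not so easy, though. Chinese is usually sorted either by pinyin or by stroke count. In GB2312, the most widely used standard Chinese character set, the 3755 level-1 characters are encoded in pinyin order, while the 3008 level-2 characters are arranged by radical and stroke count. >> print '曙' < '鲑... Read more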

posted @ 2012-02-02 15:19 app_ Views (8394) Comments (0) Recommendations (0)
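
The GB2312 property mentioned in the sorting summary above suggests one way to get pinyin order for level-1 characters: sort by the GB2312 byte encoding instead of by Unicode code point. The following is a minimal Python 3 sketch (the excerpt itself uses Python 2 syntax). The helper name `gb2312_key` and the sample characters are illustrative assumptions. The trick does not give pinyin order for level-2 characters, and characters outside GB2312 raise `UnicodeEncodeError`.

```python
def gb2312_key(text):
    # Level-1 GB2312 characters are encoded in pinyin order,
    # so comparing the encoded bytes compares them by pinyin.
    return text.encode("gb2312")

# Sample level-1 characters: zhong, bei, a.
chars = ["中", "北", "啊"]

by_code_point = sorted(chars)              # default: Unicode code point order
by_pinyin = sorted(chars, key=gb2312_key)  # GB2312 byte order

print(by_code_point)
print(by_pinyin)
```

For these three characters the default sort keeps the order 中, 北, 啊, while the GB2312 key yields 啊 (a), 北 (bei), 中 (zhong).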

appler's Blog

python, nlp, machine learning
