fxjwind

An Introduction to Asynchronous Programming and Twisted (2)

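Summary: Part 6: And Then We Took It Higher. Client 2.0 from Part 5 is already well encapsulated: a user only needs to understand and modify PoetryProtocol and PoetryClientFactory to build an application. In fact, the protocol's logic here is simply to receive data and, once all of it has arrived, notify the factory to process it. That logic could already serve as common framework code that users never need to change. What really needs...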

posted @ 2011-09-07 10:02 fxjwind Reads (390) Comments (0) Recommendations (0)

Mining of Massive Datasets – Link Analysis

Summary: 5.1 PageRank; 5.1.1 Early Search Engines and Term Spam. As people began to use search engines to find their way around the Web, unethical people saw the opportunity to fool search engines into leading people to their page. Techniques for fooling search engines into believing your page is about something...

posted @ 2011-09-06 15:49 fxjwind Reads (625) Comments (0) Recommendations (0)

Mining of Massive Datasets – Mining Data Streams

Summary: Most of the algorithms described in this book assume that we are mining a database. That is, all our data is available when and if we want it. In this chapter, we shall make another assumption: data...

posted @ 2011-08-31 14:48 fxjwind Reads (583) Comments (0) Recommendations (0)

Bloom Filter Python

Summary: http://bitworking.org/news/380/bloom-filter-resources The Bloom filter, conceived by Burton H. Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. Elements can be added...

posted @ 2011-08-30 10:20 fxjwind Reads (811) Comments (0) Recommendations (0)
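The Bloom filter entry above is titled "Bloom Filter Python", but the excerpt stops before any code. Here is a minimal sketch, not the code from the post or from bitworking.org. It keeps an m-bit array and sets k bit positions per element. The names `BloomFilter`, `num_bits` and `num_hashes` are hypothetical, and deriving the k positions by salting SHA-256 with the hash index is an assumption: any family of independent hash functions would do.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: k hashed bit positions per item in an m-bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Packed bit array, initially all zeros (the empty set).
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: k "independent" hashes simulated by salting SHA-256 with index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set all k bits for this item; bits are never cleared, so no removal.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for word in ["apple", "banana", "cherry"]:
    bf.add(word)
print("apple" in bf)  # True
```

The asymmetry in the summary follows from this design. An added element always has all k of its bits set, so there are no false negatives. An element that was never added may still find all k of its bits set by other elements, which produces a false positive.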

Mining of Massive Datasets – Data Mining

Summary: 1 What is Data Mining? The most commonly accepted definition of "data mining" is the discovery of "models" for data. 1.1 Statistical Modeling: Statisticians were the first to use the term "data m...

posted @ 2011-08-29 15:00 fxjwind Reads (565) Comments (0) Recommendations (0)

Mining of Massive Datasets – Finding similar items

Summary: In an earlier post (http://www.cnblogs.com/fxjwind/archive/2011/07/05/2098642.html), I wrote about the related problem of finding duplicates among massive numbers of documents. Here I record, more systematically, how large-scale data mining techniques go about finding similar items... 1 Applications of Near-Neighbor Search. The Jaccard similarity of sets S and T is |S ∩ T|/|S ∪ T|, that is, the ratio of the size of the intersection of S...

posted @ 2011-08-24 09:44 fxjwind Reads (713) Comments (0) Recommendations (0)
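The Jaccard similarity |S ∩ T|/|S ∪ T| from the entry above maps directly onto Python set operations. The sketch below is illustrative only. The helper names `jaccard` and `shingles` are hypothetical, and character k-shingling is just one common way to turn a document into a set. Returning 0.0 when both sets are empty is a convention assumed here, because the ratio is undefined in that case.

```python
def jaccard(s: set, t: set) -> float:
    """Jaccard similarity |S ∩ T| / |S ∪ T| (0.0 for two empty sets, by assumption)."""
    union = s | t
    if not union:
        return 0.0
    return len(s & t) / len(union)


def shingles(text: str, k: int) -> set:
    """Hypothetical helper: the set of all length-k substrings (character k-shingles)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


# {c, d} shared out of {a, b, c, d, e}: 2 / 5
print(jaccard({"a", "b", "c", "d"}, {"c", "d", "e"}))  # 0.4
# 2-shingles {ab, bc, cd} vs {ab, bc, ce}: 2 shared out of 4 distinct
print(jaccard(shingles("abcd", 2), shingles("abce", 2)))  # 0.5
```

Computing this exactly for every pair of documents is what becomes too expensive at scale.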

Filtering microblogging messages for Social TV

Summary: Paper notes on "Filtering microblogging messages for Social TV" and "A Bootstrapping Approach to Identifying Relevant Tweets for Social TV". Social TV was named one of the ten most important emerging technologies in 2010 by the MIT Technology Review. Social Television is a general term for technology that supports com...

posted @ 2011-08-02 17:30 fxjwind Reads (335) Comments (0) Recommendations (0)