Ted


1 Bloom filter

When we write a web crawler that has to crawl millions of websites, we need to check whether each website has already been crawled.

So we need an algorithm that is efficient in both space and time, and a Bloom filter is our choice.

A Bloom filter is a probabilistic data structure: it tells us that an element either definitely is not in the set or may be in the set.

So a Bloom filter can give false positives, but never false negatives.
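A minimal Python sketch of the idea, assuming an m-bit array and k bit positions derived from a single SHA-256 digest by double hashing (a common choice, not the only one). The class name `BloomFilter` and the methods `add` and `might_contain` are hypothetical, chosen only for illustration. Adding an item only ever sets bits, so an item that was added always finds all of its bits set (no false negatives); an item that was never added may still land only on bits set by other items (a false positive).

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter: an m-bit array plus k hash positions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        # Bits are packed into a bytearray; bit i lives in byte i // 8.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive k positions from one SHA-256 digest via
        # double hashing, position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so h2 is never 0
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        # Set all k bits for this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means "possibly added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter(num_bits=10_000, num_hashes=7)
for url in ["http://example.com/", "http://example.org/"]:
    seen.add(url)

print(seen.might_contain("http://example.com/"))  # True: added items are always reported
print(seen.might_contain("http://example.net/"))  # usually False, but may be a false positive
```

The filter never stores the URLs themselves, only bits, which is where the space saving comes from; the price is that items cannot be listed or removed.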

For background on false positives and false negatives, see "Type I and type II errors" and the article "什么是False Positive和False Negative" ("What are false positives and false negatives?").
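How often false positives happen can be estimated with the standard approximation below, where m is the number of bits, k the number of hash functions and n the number of inserted items. It assumes idealized, independent and uniformly distributed hash functions, so real filters deviate slightly from it.

```latex
P(\text{a given bit is still } 0) = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}

p_{\text{false positive}} \approx \left(1 - e^{-kn/m}\right)^{k}

k_{\text{opt}} = \frac{m}{n}\ln 2
\quad\Rightarrow\quad
p \approx 2^{-k_{\text{opt}}},
\qquad
m = -\frac{n \ln p}{(\ln 2)^{2}}
```

In words: more bits per item lowers the false-positive rate, and for a fixed m/n there is a best number of hash functions.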

 

2 Use cases

  a) Search engines: when crawling, check whether a website has already been crawled.

  b) Browsers: check whether a URL is malicious.
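For use case a), a crawler usually sizes the filter up front. The sketch below applies the standard sizing approximations m = -n ln p / (ln 2)^2 and k = (m/n) ln 2; the function name `bloom_parameters` and the example numbers (one million URLs, 1% false-positive rate) are assumptions for illustration only.

```python
import math


def bloom_parameters(expected_items: int, false_positive_rate: float) -> tuple[int, int]:
    """Hypothetical helper: return (num_bits, num_hashes) for a target rate.

    Uses the standard approximations m = -n ln p / (ln 2)^2 and k = (m / n) ln 2.
    """
    n, p = expected_items, false_positive_rate
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


m, k = bloom_parameters(1_000_000, 0.01)
print(f"{m} bits (~{m / 8 / 1024 / 1024:.2f} MiB), {k} hash functions")
```

For one million URLs at 1% this gives roughly 9.6 million bits (about 1.2 MB) and 7 hash functions. In a crawler, a false positive means a page is wrongly skipped as "already crawled"; since there are no false negatives, a page that was crawled is never crawled again.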

posted on 2012-05-11 00:21  wufawei