Ted


The original article is an excellent explanation of how Google Chrome uses Bloom filters, so I simply quote it below.

Nice Bloom filter application

 http://blog.alexyakunin.com/2010/03/nice-bloom-filter-application.html
 
Today I accidentally found a couple of interesting files in one of the Google Chrome folders:
  • Safe Browsing Bloom
  • Safe Browsing Bloom Filter 2
Conclusion: Chrome uses Bloom filters to make a preliminary decision whether a particular web site is malicious or safe. Cool idea!

Let me explain how it works:
  • The web site URL is converted to a canonical form (or to a set of such forms, e.g. by sequentially stripping the sub-domains; in that case the check is performed for each of the resulting URLs).
  • N "long" hashes of it are computed (likely 64-bit hashes).
  • Each hash value is multiplied by a scale factor. This yields N addresses of bits in a bit array called a Bloom filter.
  • The filter is designed so that if all these bits are set, the site is malicious with very high probability (in this case Chrome performs a precise verification, likely by issuing a request to the corresponding Google service); otherwise it is guaranteed not to be in the filter's list and is treated as safe.
  • A more detailed description of Bloom filters is here.
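
The steps above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Chrome's actual code: the class `BloomFilter`, the helper `host_suffixes`, and the use of salted SHA-256 to obtain the N 64-bit hashes are all assumptions made for the example, since the article does not say which hash functions Chrome uses.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array probed at several hash-derived positions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive N 64-bit hashes by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            h64 = int.from_bytes(digest[:8], "big")
            # Multiply by the scale factor num_bits / 2**64 to get a bit address
            # in the range [0, num_bits).
            yield (h64 * self.num_bits) >> 64

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def host_suffixes(host: str) -> list[str]:
    """Return the host plus the hosts left after stripping leading sub-domains."""
    labels = host.lower().split(".")
    # Stop at two labels so that a bare top-level domain is never checked.
    return [".".join(labels[i:]) for i in range(max(1, len(labels) - 1))]


# Hypothetical blocklist with a single entry.
blocklist = BloomFilter(num_bits=10_000, num_hashes=7)
blocklist.add("evil.example")


def needs_precise_check(host: str) -> bool:
    """Only hosts that hit the filter would be sent to the precise service."""
    return any(blocklist.might_contain(h) for h in host_suffixes(host))
```

With this sketch, `needs_precise_check("cdn.evil.example")` hits the filter because the stripped form `evil.example` was added, while a host that was never added will almost always be rejected locally without any network request.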
The benefits of this method:
  • Such a filter is considerably smaller than any other structure that gives precise "yes/no" answers to the same question. For example, a set (data structure) needs roughly as much space as the total length of all the malicious URLs it "remembers". A Bloom filter with a false positive probability of 1% (note that it cannot give a false negative) needs only about 9.6 bits per malicious URL it "remembers", i.e. slightly more than 1 byte! Given that the Chrome Bloom filter files are about 18Mb in size, and assuming they are really built for that false positive probability, they contain information about ~ 1 million malicious web sites!
  • The Bloom filter lets Chrome call the precise verification service practically only when the user actually visits a malicious web site. Isn't it wonderful? ;)
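
The figure of about 9.6 bits per URL follows from the standard sizing formula for an optimally tuned Bloom filter, m/n = -ln(p) / (ln 2)^2, where p is the false positive probability. The sketch below reproduces that arithmetic; the function names are made up for the example, and because the text does not say whether "18Mb" means megabits or megabytes, both readings are computed as assumptions.

```python
import math


def bits_per_entry(false_positive_rate: float) -> float:
    """Bits needed per stored item in an optimally configured Bloom filter."""
    return -math.log(false_positive_rate) / (math.log(2) ** 2)


def optimal_hash_count(false_positive_rate: float) -> int:
    """Number of hash functions that minimizes the false positive rate."""
    return round(-math.log2(false_positive_rate))


def capacity(filter_bits: int, false_positive_rate: float) -> int:
    """How many items a filter of the given size can hold at that rate."""
    return int(filter_bits / bits_per_entry(false_positive_rate))


p = 0.01
print(f"bits per entry at 1%: {bits_per_entry(p):.2f}")
print(f"hash functions at 1%: {optimal_hash_count(p)}")

# Assumption 1: "18Mb" read as 18 megabits.
print(f"18 megabits  -> {capacity(18_000_000, p):,} entries")
# Assumption 2: "18Mb" read as 18 megabytes (8 bits per byte).
print(f"18 megabytes -> {capacity(18_000_000 * 8, p):,} entries")
```

Read as megabits, the size corresponds to roughly 1.9 million entries, the same order of magnitude as the estimate above; read as megabytes, it corresponds to roughly 15 million.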
posted on 2012-05-11 00:30  wufawei