How Google chrome use bloom filters - wufawei

The original article is an excellent one to explain How Google chrome use bloom filtes, so I just quote the article below.

Today I accidentally found a couple of interesting files in one of Google Chrome folders:

Safe Browsing Bloom
Safe Browsing Bloom Filter 2

Conclusion: Chrome uses Bloom filters to make a preliminary decision whether a particular web site is malicious or safe. Cool idea!

Let me explain how it works:

Web site URL is converted to some canonical form (or a set of such forms - e.g. by sequentially stripping all the sub-domains; in this case check is performed for each of such URLs).
N of its "long" hashes are computed (likely, there are 64-bit hashes).
The value of each hash is multiplied by scale factor. That's how N addresses of bits in bit array called Bloom filter are computed.
The filter is designed so that if all these bits are set, the site is malicious with very high probability (in this case Chrome performs its precise verification - likely, by issuing a request to the corresponding Google service); otherwise it is guaranteed to be safe.
More detailed description of Bloom filters is here.

The benefits of this method:

The size of such a filter is considerably smaller than size of any other structure providing precise "yes/no" answers to similar questions. For example, if it is a set (data structure), its data length should be nearly equal to the total length of all the URLs of malicious sites it "remembers". But a Bloom filter providing false positive responses with a probability of 1% (note that it can't provide false negative response) would require just 9.6 bits per each URL of malicious web site it "remembers", i.e. nearly 1 byte! Taking into account that size of Chrome Bloom filters is about 18Mb and assuming they are really built for such a false positive response probability, this means they contain information about ~ 1 million of malicious web sites!
Bloom filter allows Chrome to use precise verification service practically only when the user actually goes to a malicious web site. Isn't it wonderful? ;)

Ted

公告

Nice Bloom filter application