The original article is an excellent one to explain How Google chrome use bloom filtes, so I just quote the article below.
Nice Bloom filter application
Today I accidentally found a couple of interesting files in one of Google Chrome folders:
Let me explain how it works:
- Safe Browsing Bloom
- Safe Browsing Bloom Filter 2
Let me explain how it works:
- Web site URL is converted to some canonical form (or a set of such forms - e.g. by sequentially stripping all the sub-domains; in this case check is performed for each of such URLs).
- N of its "long" hashes are computed (likely, there are 64-bit hashes).
- The value of each hash is multiplied by scale factor. That's how N addresses of bits in bit array called Bloom filter are computed.
- The filter is designed so that if all these bits are set, the site is malicious with very high probability (in this case Chrome performs its precise verification - likely, by issuing a request to the corresponding Google service); otherwise it is guaranteed to be safe.
- More detailed description of Bloom filters is here.
- The size of such a filter is considerably smaller than size of any other structure providing precise "yes/no" answers to similar questions. For example, if it is a set (data structure), its data length should be nearly equal to the total length of all the URLs of malicious sites it "remembers". But a Bloom filter providing false positive responses with a probability of 1% (note that it can't provide false negative response) would require just 9.6 bits per each URL of malicious web site it "remembers", i.e. nearly 1 byte! Taking into account that size of Chrome Bloom filters is about 18Mb and assuming they are really built for such a false positive response probability, this means they contain information about ~ 1 million of malicious web sites!
- Bloom filter allows Chrome to use precise verification service practically only when the user actually goes to a malicious web site. Isn't it wonderful? ;)