布隆过滤器实战【防止缓存击穿】
http://blog.itpub.net/31561269/viewspace-2639083/
https://juejin.im/post/5cfd060ee51d4556f76e8067
适合的场景
-
数据库防止穿库 Google Bigtable,Apache HBase和Apache Cassandra以及Postgresql 使用BloomFilter来减少不存在的行或列的磁盘查找。避免代价高昂的磁盘查找会大大提高数据库查询操作的性能。 如同一开始的业务场景。如果数据量较大,不方便放在缓存中。需要对请求做拦截防止穿库。
-
缓存宕机 缓存宕机的场景,使用布隆过滤器会造成一定程度的误判。原因是除了Bloom Filter 本身有误判率,宕机之前的缓存不一定能覆盖到所有DB中的数据,当宕机后用户请求了一个以前从未请求的数据,这个时候就会产生误判。当然,缓存宕机时使用布隆过滤器作为应急的方式,这种情况应该也是可以忍受的。
-
WEB拦截器 相同请求拦截防止被攻击。用户第一次请求,将请求参数放入BloomFilter中,当第二次请求时,先判断请求参数是否被BloomFilter命中。可以提高缓存命中率
-
恶意地址检测 chrome 浏览器检查是否是恶意地址。 首先针对本地BloomFilter检查任何URL,并且仅当BloomFilter返回肯定结果时才对所执行的URL进行全面检查(并且用户警告,如果它也返回肯定结果)。
-
比特币加速 bitcoin 使用BloomFilter来加速钱包同步。
1.2 应用场景
- 数据库防止穿库。 Google Bigtable,HBase 和 Cassandra 以及 Postgresql 使用BloomFilter来减少不存在的行或列的磁盘查找。避免代价高昂的磁盘查找会大大提高数据库查询操作的性能。
- 业务场景中判断用户是否阅读过某视频或文章,比如抖音或头条,当然会导致一定的误判,但不会让用户看到重复的内容。还有之前自己遇到的一个比赛类的社交场景中,需要判断用户是否在比赛中,如果在则需要更新比赛内容,也可以使用布隆过滤器,可以减少不在的用户查询db或缓存的次数。
- 缓存宕机、缓存击穿场景,一般判断用户是否在缓存中,如果在则直接返回结果,不在则查询db,如果来一波冷数据,会导致缓存大量击穿,造成雪崩效应,这时候可以用布隆过滤器当缓存的索引,只有在布隆过滤器中,才去查询缓存,如果没查询到,则穿透到db。如果不在布隆器中,则直接返回。
- WEB拦截器,如果相同请求则拦截,防止重复被攻击。用户第一次请求,将请求参数放入布隆过滤器中,当第二次请求时,先判断请求参数是否被布隆过滤器命中。可以提高缓存命中率
链接:https://juejin.im/post/5cfd060ee51d4556f76e8067
- The servers of Akamai Technologies, a content delivery provider, use Bloom filters to prevent "one-hit-wonders" from being stored in its disk caches. One-hit-wonders are web objects requested by users just once, something that Akamai found applied to nearly three-quarters of their caching infrastructure. Using a Bloom filter to detect the second request for a web object and caching that object only on its second request prevents one-hit wonders from entering the disk cache, significantly reducing disk workload and increasing disk cache hit rates.[10]
- Google Bigtable, Apache HBase and Apache Cassandra and PostgreSQL[11] use Bloom filters to reduce the disk lookups for non-existent rows or columns. Avoiding costly disk lookups considerably increases the performance of a database query operation.[12]
- The Google Chrome web browser used to use a Bloom filter to identify malicious URLs. Any URL was first checked against a local Bloom filter, and only if the Bloom filter returned a positive result was a full check of the URL performed (and the user warned, if that too returned a positive result).[13][14]
- Microsoft Bing (search engine) uses multi-level hierarchical Bloom filters for its search index, BitFunnel. Bloom filters provided lower cost than the previous Bing index, which was based on inverted files.[15].
- The Squid Web Proxy Cache uses Bloom filters for cache digests.[16]
- Bitcoin uses Bloom filters to speed up wallet synchronization.[17]
- The Venti archival storage system uses Bloom filters to detect previously stored data.[18]
- The SPIN model checker uses Bloom filters to track the reachable state space for large verification problems.[19]
- The Cascading analytics framework uses Bloom filters to speed up asymmetric joins, where one of the joined data sets is significantly larger than the other (often called Bloom join in the database literature).[20]
- The Exim mail transfer agent (MTA) uses Bloom filters in its rate-limit feature.[21]
- Medium uses Bloom filters to avoid recommending articles a user has previously read.[22]
- Ethereum uses Bloom filters for quickly finding logs on the Ethereum blockchain.
https://juejin.im/post/5de1e37c5188256e8e43adfc