支持 equals 相等的对象(可重复对象)作为 WeakHashMap 的 Key
问题
长链接场景下通常有一个类似 Map<String, Set<Long>> 的结构,用来查找一个逻辑组内的哪些用户,String 类型的 Entry.key 是逻辑组 key,Set<Long> 类型的 Entry.value 存放逻辑组内的用户 Id,那么这个 Map 显然要在逻辑组内用户为 0 时删除这个 Entry,以避免内存泄漏。
删除 Map 的 value 很容易联想到 remove,但并发的处理很复杂,还要单独开一个线程,如果可以自动删除就好了,而 WeakHashMap 就可以自动删除 value,前提它是 Entry.key 不存在引用时删除 Entry.value,那么只要将用户的生命周期和 Entry.key 关联上即可,以 Netty 的 Channel 为例就是将该 Entry.key 放到 Channel.attr 中。
上面稍微一看就有问题,Entry.key 是一个 String 类型的变量,字符串存在常量池(字符串其实挺好的),Channel 就算销毁了也不会丢失对 WeakHashMap Entry.value 的引用,如果每次都 new 一个对象呢?问题更大,此时只有第一个用户强引用 WeakHashMap 的 Entry.value(即 new Set 再 add),其他用户仅仅是获取到了(此时 Entry.key 是第一个用户的,而不是当前用户的),这样第一个用户下线时,这个 Set 就会被 GC。显而易见问题是 Entry.Key 引用不一致导致的,只要给用户返回永远相同的 Entry.key 即可。
如何返回永远相同的对象呢?感觉又回到了原点,因为返回一样的对象显然是 Map<String, Object>,但这个 Map 同样不能内存泄漏,不过情况略有不同,区别在于查找 Set 变成了一个嵌套的查找(String -> Object -> Set<Long>),而用户强引用的 Entry.key 变成了 Object,即 Object 对象的生命周期跟随用户走即可( WeakHashMap<Object, Set<Long>> 负责 GC Set),也就是 WeakHashMap<Object, WeakReference<Object>>。
解决
下面给出代码:
package io.github.hligaty.util;
import java.lang.ref.WeakReference;
import java.util.Objects;
import java.util.WeakHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
/**
* Recreatable key objects.
* With recreatable key objects,
* the automatic removal of WeakHashMap entries whose keys have been discarded may prove to be confusing,
* but WeakKey will not.
*
* @param <K> the type of keys maintained
* @author hligaty
* @see java.util.WeakHashMap
*/
public class WeakKey<K> {
private static final WeakHashMap<WeakKey<?>, WeakReference<WeakKey<?>>> cache = new WeakHashMap<>();
private static final ReadWriteLock cacheLock = new ReentrantReadWriteLock();
private static final WeakHashMap<Thread, WeakKey<?>> shadowCache = new WeakHashMap<>();
private static final ReadWriteLock shadowCacheLock = new ReentrantReadWriteLock();
private K key;
private WeakKey() {
}
@SuppressWarnings("unchecked")
public static <T> WeakKey<T> wrap(T key) {
WeakKey<T> shadow = (WeakKey<T>) getShadow();
shadow.key = key;
cacheLock.readLock().lock();
try {
WeakReference<WeakKey<?>> ref = cache.get(shadow);
if (ref != null) {
shadow.key = null;
return (WeakKey<T>) ref.get();
}
} finally {
cacheLock.readLock().unlock();
}
cacheLock.writeLock().lock();
try {
WeakReference<WeakKey<?>> newRef = cache.get(shadow);
shadow.key = null;
if (newRef == null) {
WeakKey<T> weakKey = new WeakKey<>();
weakKey.key = key;
newRef = new WeakReference<>(weakKey);
cache.put(weakKey, newRef);
return weakKey;
}
return (WeakKey<T>) newRef.get();
} finally {
cacheLock.writeLock().unlock();
}
}
private static WeakKey<?> getShadow() {
Thread thread = Thread.currentThread();
shadowCacheLock.readLock().lock();
WeakKey<?> shadow;
try {
shadow = shadowCache.get(thread);
if (shadow != null) {
return shadow;
}
} finally {
shadowCacheLock.readLock().unlock();
}
shadowCacheLock.writeLock().lock();
try {
shadow = shadowCache.get(thread);
if (shadow == null) {
shadow = new WeakKey<>();
shadowCache.put(thread, shadow);
return shadow;
}
return shadow;
} finally {
shadowCacheLock.writeLock().unlock();
}
}
public K unwrap() {
return key;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
WeakKey<?> weakKey = (WeakKey<?>) o;
return Objects.equals(key, weakKey.key);
}
@Override
public int hashCode() {
return Objects.hash(key);
}
@Override
public String toString() {
return "WeakKey{" +
"attr=" + key +
'}';
}
}
WeakKey 是前面说的 Object,使用时将需要释放的数据 Data 放到以 WeakKey 为 key 的 WeakHashMap(WeakHashMap<WeakKey, Data>),这样当全部用户释放 WeakKey 引用时就可以完成 WeakHashMap Entry 的 GC(包括 WeakKey 和 Data)。
WeakKey 的主要工作是将用户传入的 key 封装一下再返回,保证全局唯一和内存安全,核心结构是 WeakHashMap<WeakKey<?>, WeakReference<WeakKey<?>>> cache
,Entry.key 是对用户 key 封装的 WeakKey,Entry.value 是 Entry.key 外层嵌套的 WeakReference,作用是避免 value 对 key 强引用而无法对 Entry GC。因此 cache 只要没人强引用里面的 WeakKey,这个 map 在 GC 后就是空的,这样就完成了目标,其余的就是优化了。
如果想在 cache 里查到 WeakKey,那么首先要新建一个 WeakKey,再把 key 赋值到 WeakKey 中,再通过这个新建的 WeakKey 查找,像下面一样:
String key = "key";
WeakKey<String> weakKey = new WeakKey<>();
weakKey.key = key;
WeakReference<WeakKey<?>> ref = cache.get(weakKey);
每次查找都新建对象,有点沙雕,这里使用缓存对象赋值再查找就可以,另外要保证线程安全,threadLocal 没大问题(ThreadLocal.withInitial(WeakKey::new)
),只是不能在 finally 里 remove(remove 的话下次还得新建),在线程池里使用问题不大,不过还有另一种办法,就是 WeakHashMap<Thread, WeakKey<?>>
,它可以保证这个缓存中的“影子”对象在这个线程只创建一次,当线程被 GC 的同时删除“影子”对象,与 threadLocal 的区别只是牺牲了一些加读锁的时间。
测试
下面的 WeakHashMap put 了 Arrays.asList(705, 630, 818) 和 Collections.singletonList(705630818) 两个数据,只有后面的 key 被方法引用了,因此在 GC 后 前一个 key 在 map 中找不到 value,而后一个 key 能获取到 value。
package io.github.hligaty.util;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import java.util.*;
class WeakKeyTest {
@Test
public void testWeakKey() throws InterruptedException {
WeakHashMap<WeakKey<List<Integer>>, Object> map = new WeakHashMap<>();
map.put(WeakKey.wrap(Arrays.asList(705, 630, 818)), new Object());
WeakKey<List<Integer>> weakKey = WeakKey.wrap(Collections.singletonList(705630818));
map.put(weakKey, new Object());
System.gc();
Thread.sleep(5000L);
Assertions.assertNull(map.get(WeakKey.wrap(Arrays.asList(705, 630, 818))));
Assertions.assertNotNull(map.get(WeakKey.wrap(Collections.singletonList(705630818))));
}
}
其他
如果你想使用 null,那 WeakKey 是支持的,但需要注意一点,如果你有两个不同类型的 key 使用了 WeakKey,而两者都允许 WeakKey.wrap(null),那么当有一个类型的使用者持有 WeakKey.wrap(null),另一个类型的 WeakKey.wrap(null) 是不会被释放的,因为显然 Objects.equals(null, null) 为 true(github 上的代码解决这个问题了)。
上面的代码在 github 有比较大的改动,但原理相同,测试类介绍了用法。
扩展
前面的类还可以应用到对字符串加锁(synchronized 锁上面就支持),对于 Lock 锁我提供了 InfiniteStriped 类来使用 Lock 和 ReadWriteLock,当然也可以考虑 Guava Striped,两者一样的是只有 equals 相等的对象加同一个锁,区别是 Guava Striped 在 equals 不等时也可能加同一个锁(可以用 bulkGet 对 key 排序获取顺序的锁再加锁来防止死锁),而 InfiniteStriped 不会,因为 Guava Striped 使用固定数量的锁从而导致 hash 碰撞,而 InfiniteStriped 每次都“新建锁”(如果锁被 GC 了)。
下面附上 Guava Striped 类上的注释(注意,InfiniteStriped 和下面的 Map<K, Lock> 不一样):
A striped Lock/Semaphore/ReadWriteLock. This offers the underlying lock striping similar to that of ConcurrentHashMap in a reusable form, and extends it for semaphores and read-write locks. Conceptually, lock striping is the technique of dividing a lock into many stripes, increasing the granularity of a single lock and allowing independent operations to lock different stripes and proceed concurrently, instead of creating contention for a single lock.
The guarantee provided by this class is that equal keys lead to the same lock (or semaphore), i.e. if (key1.equals(key2)) then striped.get(key1) == striped.get(key2) (assuming Object.hashCode() is correctly implemented for the keys). Note that if key1 is not equal to key2, it is not guaranteed that striped.get(key1) != striped.get(key2); the elements might nevertheless be mapped to the same lock. The lower the number of stripes, the higher the probability of this happening.
There are three flavors of this class: Striped<Lock>, Striped<Semaphore>, and Striped<ReadWriteLock>. For each type, two implementations are offered: strong and weak Striped<Lock>, strong and weak Striped<Semaphore>, and strong and weak Striped<ReadWriteLock>. Strong means that all stripes (locks/semaphores) are initialized eagerly, and are not reclaimed unless Striped itself is reclaimable. Weak means that locks/semaphores are created lazily, and they are allowed to be reclaimed if nobody is holding on to them. This is useful, for example, if one wants to create a Striped<Lock> of many locks, but worries that in most cases only a small portion of these would be in use.
Prior to this class, one might be tempted to use Map<K, Lock>, where K represents the task. This maximizes concurrency by having each unique key mapped to a unique lock, but also maximizes memory footprint. On the other extreme, one could use a single lock for all tasks, which minimizes memory footprint but also minimizes concurrency. Instead of choosing either of these extremes, Striped allows the user to trade between required concurrency and memory footprint. For example, if a set of tasks are CPU-bound, one could easily create a very compact Striped<Lock> of availableProcessors() * 4 stripes, instead of possibly thousands of locks which could be created in a Map<K, Lock> structure.
最后
遇到很奇怪的问题?一定要仔细找找 Guava,全都有,可别写完了才发现:(,另外 github 上的代码已经变成 Guava 的样子了......
再附上 Guava 的用法(sherry/GuavaTest.java at master · hligaty/sherry (github.com)):
package io.github.hligaty.util;
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;
import com.google.common.collect.MapMaker;
import lombok.Data;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import java.util.concurrent.ConcurrentMap;
/**
* @author hligaty
*/
class GuavaTest extends BaseTest {
@Test
public void testGuavaInterner() {
Interner<ID> interner = Interners.newWeakInterner();
ID id = interner.intern(new ID("1"));
String toString = id.toString();
gc(2000);
Assertions.assertSame(id, interner.intern(new ID("1")));
id = null;
gc(2000);
Assertions.assertNotEquals(toString, interner.intern(new ID("1")).toString());
}
@Test
public void testGuavaWeakHashMap() {
Interner<ID> interner = Interners.newWeakInterner();
ConcurrentMap<ID, Object> map = new MapMaker().concurrencyLevel(1).weakKeys().makeMap();
// new/threadLocal/objectPool, 这里自己决定
map.put(interner.intern(new ID("1")), new Object());
gc(2000);
Assertions.assertNull(map.get(interner.intern(new ID("1"))));
}
@Data
static class ID {
private final String id;
@Override
public String toString() {
return String.valueOf(super.hashCode());
}
}
}