How to implement an efficient counter in Java

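While working on my graduation project recently I ran into a very common problem:

counting how many times each word occurs.

The answers below come from Stack Overflow.

There are generally a few ways to do this efficiently: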

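[1] Use a HashMap

Be careful not to call containsKey(X) and then get(X) to check whether a word is already present. containsKey does not scan the whole map (it is a hash lookup, just like get), but the pair costs two lookups per word.

Call get(X) once and test the result for null instead; that saves one lookup: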

Integer count = map.get(word);
if (count == null) {
    count = 0;
}
map.put(word, count + 1);
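On Java 8 or later, the same get-then-put pattern can be written as a single call to Map.merge. The following is a minimal sketch, not part of the original answers; the class name MergeCounter and the sample words are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MergeCounter {
    // Counts occurrences of each word with one map operation per word.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> freq = new HashMap<>();
        for (String word : words) {
            // Stores 1 if the word is absent, otherwise stores oldValue + 1.
            freq.merge(word, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = count(Arrays.asList("a", "b", "a", "c", "a"));
        System.out.println(freq.get("a")); // 3
        System.out.println(freq.get("b")); // 1
    }
}
```

This still boxes every new count into an Integer, so it is shorter to write but does not remove the allocation cost discussed later in this post.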


[2] Use AtomicLong

final ConcurrentMap<String, AtomicLong> map = 
    new ConcurrentHashMap<String, AtomicLong>();
...
map.putIfAbsent(word, new AtomicLong(0));
map.get(word).incrementAndGet();
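Note that putIfAbsent(word, new AtomicLong(0)) allocates a fresh AtomicLong on every call, even when the word is already in the map. A possible variant, assuming Java 8 or later, uses computeIfAbsent so the counter object is only created for new words. The class and method names below (ConcurrentCounter, countInParallel) are hypothetical and only serve this sketch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class ConcurrentCounter {
    private final ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<>();

    // Thread-safe increment; the AtomicLong is created only if the word is new.
    void add(String word) {
        map.computeIfAbsent(word, k -> new AtomicLong()).incrementAndGet();
    }

    // Returns the current count, or 0 for a word that was never added.
    long get(String word) {
        AtomicLong c = map.get(word);
        return c == null ? 0L : c.get();
    }

    // Has several threads increment the same word, then waits for them to finish.
    static ConcurrentCounter countInParallel(String word, int threads, int perThread) {
        ConcurrentCounter counter = new ConcurrentCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.add(word);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        ConcurrentCounter c = countInParallel("hello", 4, 10000);
        System.out.println(c.get("hello")); // 40000
    }
}
```

This approach only pays off when several threads really do update the map at once; for single-threaded counting the atomic operations are pure overhead.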


[3] Use Trove (high performance collections for Java)

TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>();
...
freq.adjustOrPutValue(word, 1, 1);


[4] Use MutableInt

Why this is faster than HashMap<String,Integer> is something I still need to work out.

class MutableInt {
  int value = 1; // note that we start at 1 since we're counting
  public void increment () { ++value;      }
  public int  get ()       { return value; }
}
...
Map<String, MutableInt> freq = new HashMap<String, MutableInt>();
...
MutableInt count = freq.get(word);
if (count == null) {
    freq.put(word, new MutableInt());
}
else {
    count.increment();
}

======Conclusion:

  • ContainsKey: 30.654 seconds (baseline)
  • TestForNull: 28.804 seconds (1.06 times as fast)
  • AtomicLong: 29.780 seconds (1.03 times as fast)
  • Trove: 26.313 seconds (1.16 times as fast)
  • MutableInt: 25.747 seconds (1.19 times as fast)
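As you can see, MutableInt actually turns out to be the fastest (PS: Apache Commons provides such a class). I am still puzzling over this... can anyone explain why it beats a Map<String,Integer>?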
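To try the comparison yourself, here is a rough timing sketch for the Integer and MutableInt variants. It is not the benchmark that produced the numbers above: the class name CounterTiming, the vocabulary size and the round count are all assumptions, and a hand-rolled System.nanoTime loop is much less reliable than a proper harness such as JMH.

```java
import java.util.HashMap;
import java.util.Map;

class CounterTiming {
    // Minimal int holder; starts at 1 because it is created on the first sighting.
    static final class MutableInt {
        int value = 1;
    }

    // Boxed variant: every update stores a (possibly newly allocated) Integer.
    static int countBoxed(String[] words, String probe) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) {
            Integer c = freq.get(w);
            freq.put(w, c == null ? 1 : c + 1);
        }
        Integer r = freq.get(probe);
        return r == null ? 0 : r;
    }

    // Holder variant: one allocation per distinct word, then in-place increments.
    static int countMutable(String[] words, String probe) {
        Map<String, MutableInt> freq = new HashMap<>();
        for (String w : words) {
            MutableInt c = freq.get(w);
            if (c == null) {
                freq.put(w, new MutableInt());
            } else {
                c.value++;
            }
        }
        MutableInt r = freq.get(probe);
        return r == null ? 0 : r.value;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 1000 distinct words repeated 2000 times each.
        String[] vocab = new String[1000];
        for (int i = 0; i < vocab.length; i++) {
            vocab[i] = "w" + i;
        }
        String[] words = new String[2_000_000];
        for (int i = 0; i < words.length; i++) {
            words[i] = vocab[i % vocab.length];
        }
        // Several rounds so the JIT has a chance to warm up.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            int a = countBoxed(words, "w0");
            long t1 = System.nanoTime();
            int b = countMutable(words, "w0");
            long t2 = System.nanoTime();
            System.out.printf("round %d: boxed=%d ms, mutable=%d ms (counts %d/%d)%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a, b);
        }
    }
}
```

Timings will vary with the JVM, heap settings and word distribution, so treat any single run as indicative at best.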

The following may be the reason (I am still digesting it):

Memory rotation may be an issue here, since every boxing of an int larger than or equal to 128 causes an object allocation (see Integer.valueOf(int)). Although the garbage collector very efficiently deals with short-lived objects, performance will suffer to some degree.
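The caching behaviour behind this can be observed directly. The small sketch below (the class name BoxingDemo is made up) compares two boxings of the same int by identity: values from -128 to 127 are guaranteed to come from the Integer cache, while larger values are normally fresh objects under default JVM settings (the cache upper bound can be raised, so the second result is not guaranteed).

```java
class BoxingDemo {
    // Returns true when two independent boxings of n yield the very same object.
    static boolean sameBoxedObject(int n) {
        Integer a = Integer.valueOf(n);
        Integer b = Integer.valueOf(n);
        return a == b; // identity comparison on purpose, not equals()
    }

    public static void main(String[] args) {
        System.out.println(sameBoxedObject(127)); // true: inside the guaranteed cache range
        System.out.println(sameBoxedObject(128)); // usually false: a new Integer per boxing
    }
}
```

So with a Map<String,Integer>, every increment of a count above 127 typically creates a new Integer, whereas a MutableInt is allocated once per word and then updated in place.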

If you know that the number of increments made will largely outnumber the number of keys (=words in this case), consider using an int holder instead. Phax already presented code for this. Here it is again, with two changes (holder class made static and initial value set to 1):

static class MutableInt {
  int value = 1;
  void inc() { ++value; }
  int get() { return value; }
}
...
Map<String,MutableInt> map = new HashMap<String,MutableInt>();
MutableInt value = map.get(key);
if (value == null) {
  value = new MutableInt();
  map.put(key, value);
} else {
  value.inc();
}

If you need extreme performance, look for a Map implementation which is directly tailored towards primitive value types. jrudolph mentioned GNU Trove.

By the way, a good search term for this subject is "histogram".


======

posted on 2013-03-25 11:19 by scugxl
