Java词频统计
public class WordCount {
public static void main(String[] args) {
String[] stopWords = { "", ",", "." };
List<String> stopWordList = Arrays.asList(stopWords);
String strWorld = "Regular expressions are used in search engines, search and replace dialogs of word processors and text editors, in text processing utilities such as sed and AWK and in lexical analysis. Many programming languages provide regex capabilities, built-in, or via libraries.";
String[] words = strWorld.split(" |,|\\.");
System.out.println(Arrays.toString(words));
List<String> wordList = Arrays.asList(words);
System.out.println(wordList);
Multiset<String> wordSet = HashMultiset.create();
wordSet.addAll(wordList);
wordSet.removeAll(stopWordList);
System.out.println("word count:" + wordSet.size());
System.out.println("unique word count:" + wordSet.elementSet().size());
for (String key : wordSet.elementSet()) {
System.out.println(key + ":" + wordSet.count(key));
}
}
}
作者: Acode
出处: http://www.cnblogs.com/acode/
本文版权归作者和博客园共有,欢迎转载,但未经作者同意必须保留此段声明,且在文章页面明显位置给出, 原文链接 如有问题, 可留言咨询.