Assignment 3: Individual Project - Word Frequency Count - 周木南

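(1). Implement a console program that, given a string of English text, counts how often each English word (4 or more characters) occurs in it. Additional requirement: read in a text file and count the frequency of the words in that file.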

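(2). Performance analysis:

Run the Visual Studio performance analysis tool on the C++ code, find the performance problems, and optimize them.
Run the performance analysis tool in NetBeans IDE 6.0 on the Java program, find the performance problems, and optimize them.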
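Letters: A-Z, a-z.
Alphanumerics: A-Z, a-z, 0-9.
Separators: any non-alphanumeric character.
Word:
Contains 4 or more letters
Words are separated by separators
A string that contains a non-alphanumeric character is not a word
Words are case-insensitive; for example, “file”, “FILE” and “File” count as the same word
A word must start with a letter: “file123” is a word, “123file” is not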
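
The rules above can be condensed into a single check. What follows is only a minimal sketch, not part of the assignment text: the class name `WordRule` and the method `isWord` are hypothetical, and it assumes that "4 or more" refers to the total length of the token (letters and digits together), which is one possible reading of the rule.

```java
import java.util.regex.Pattern;

class WordRule {
    // Assumed reading of the rules: one letter first, then at least
    // three more letters or digits, and nothing else.
    private static final Pattern WORD = Pattern.compile("[A-Za-z][A-Za-z0-9]{3,}");

    // Returns true when the whole token satisfies the pattern above.
    public static boolean isWord(String token) {
        return WORD.matcher(token).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"file", "FILE", "file123", "123file", "abc", "fi_le"};
        for (String s : samples) {
            System.out.println(s + " -> " + isWord(s));
        }
    }
}
```

Case does not matter for this check itself; folding “file”, “FILE” and “File” into one entry is a separate step that happens when the words are counted.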

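Looking at the problem first: 1. The words to be counted come from a string. 2. When reading the sentence, the quotation marks around words and the spaces between words have to be stripped. 3. Only words of 4 or more letters are printed.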

I have not been learning Java for long, so this problem was still somewhat difficult for me, and I asked a classmate for help.

package zn_1;
 
import java.util.Map;  
import java.util.StringTokenizer;  
import java.util.Map.Entry; 
import java.util.ArrayList;  
import java.util.HashMap;  
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.Collections;  
import java.util.Comparator; 
    
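public class zn_1 {
    public static void main(String[] args) {
        String sentence = "Word is case insensitive, i.e. “file”, “FILE” and “File” are considered the same word.";
        // Maps each lower-cased token to the number of times it occurs.
        Map<String, Integer> map = new HashMap<String, Integer>();
        // Words are case-insensitive, so normalise to lower case first.
        String turn_sentence = sentence.toLowerCase();
        // Every non-alphanumeric character is a separator (quotes, commas, periods, ...),
        // so replace each run of them with a single space before tokenizing.
        StringTokenizer token = new StringTokenizer(turn_sentence.replaceAll("[^a-z0-9]+", " "));
        while (token.hasMoreTokens()) {
            String word = token.nextToken();
            if (map.containsKey(word)) {
                int count = map.get(word);
                map.put(word, count + 1);
            } else {
                map.put(word, 1);
            }
        }
        // Sort by frequency and print the qualifying words.
        small(map);
    }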
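    // Despite its name, this only checks whether the FIRST character is a digit,
    // which is how tokens such as "123file" can be recognised as non-words.
    public static boolean isNumeric(String str) {
        if (str.isEmpty()) {
            return false; // avoid charAt(0) on an empty string
        }
        Pattern pattern = Pattern.compile("[0-9]");
        Matcher isNum = pattern.matcher(str.charAt(0) + "");
        return isNum.matches();
    }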
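    // Sorts the entries by descending count and prints each qualifying word as "word:count".
    public static void small(Map<String, Integer> map) {
        List<Map.Entry<String, Integer>> infoids = new ArrayList<Map.Entry<String, Integer>>(map.entrySet());
        Collections.sort(infoids, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                // Integer.compare avoids the overflow that plain subtraction can cause.
                return Integer.compare(o2.getValue(), o1.getValue());
            }
        });
        for (int i = 0; i < infoids.size(); i++) {
            Entry<String, Integer> id = infoids.get(i);
            // A word has at least 4 characters and must not start with a digit.
            if (id.getKey().length() > 3 && !isNumeric(id.getKey())) {
                System.out.println(id.getKey() + ":" + id.getValue());
            }
        }
    }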
}
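
The additional requirement in (1), counting the words of a text file, is not covered by the program above. Below is a minimal sketch of one way it could be approached, not a finished solution. Everything named here is an assumption made for illustration: the class `FileWordCount`, its methods, the default file name `input.txt`, UTF-8 as the file encoding, and a file small enough to read into memory in one go. It also assumes the word rule means "starts with a letter, alphanumeric only, at least 4 characters in total".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class FileWordCount {

    // Counts every token that qualifies as a word under the assumed rule.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        // Any run of non-alphanumeric characters acts as a separator.
        for (String token : text.split("[^A-Za-z0-9]+")) {
            // Words are case-insensitive, so count the lower-cased form.
            String word = token.toLowerCase(Locale.ROOT);
            if (word.matches("[a-z][a-z0-9]{3,}")) {
                Integer old = counts.get(word);
                counts.put(word, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    // Reads the whole file into memory; UTF-8 is an assumption about the input.
    public static Map<String, Integer> countFile(Path path) throws IOException {
        String text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        return countWords(text);
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is a hypothetical default; pass a real path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "input.txt");
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(countFile(path).entrySet());
        // Most frequent words first.
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                return Integer.compare(b.getValue(), a.getValue());
            }
        });
        for (Map.Entry<String, Integer> e : entries) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```

Keeping `countWords` separate from the file reading means the same counting logic can be run on a plain string, as in the program above, or on the contents of a file.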

Posted on 2016-03-16 14:44 by 周木南