Chinese Word Segmentation in Java

IK Analyzer is an open-source Chinese word segmentation framework built on top of Lucene.

Download: http://so.csdn.net/so/search/s.do?q=IKAnalyzer2012.jar&t=doc&o=&s=all&l=null

The following jars need to be added to the project:

  IKAnalyzer2012.jar

  lucene-core-3.6.0.jar

Two ways to implement it:

Using the IK core API (IKSegmenter) directly:

import java.io.IOException;
import java.io.StringReader;
import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

public class Fenci1 {
    public static void main(String[] args) throws IOException {
        String text = "你好,我的世界!";
        StringReader sr = new StringReader(text);
        // second argument: true enables smart (coarse-grained) segmentation
        IKSegmenter ik = new IKSegmenter(sr, true);
        Lexeme lex = null;
        // next() returns one Lexeme at a time, null when the text is exhausted
        while ((lex = ik.next()) != null) {
            System.out.print(lex.getLexemeText() + ",");
        }
    }
}
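The second constructor argument toggles IK's segmentation granularity: true is understood to enable smart (coarse-grained) segmentation, while false yields the finest-grained result. A small sketch of a reusable helper built on the same IKSegmenter calls shown above (the class and method names here are illustrative, not part of IK):

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

public class SegmentHelper {
    // Collect all terms produced by IKSegmenter into a list.
    // useSmart: true = smart (coarse) mode, false = finest granularity (per IK's documented behavior).
    public static List<String> segment(String text, boolean useSmart) throws IOException {
        List<String> terms = new ArrayList<String>();
        IKSegmenter ik = new IKSegmenter(new StringReader(text), useSmart);
        Lexeme lex;
        while ((lex = ik.next()) != null) {
            terms.add(lex.getLexemeText());
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // Compare the two granularities on the same input.
        System.out.println(segment("你好,我的世界!", true));
        System.out.println(segment("你好,我的世界!", false));
    }
}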

 

Using the Lucene Analyzer API (IKAnalyzer):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class Fenci {
    public static void main(String[] args) throws IOException {
        String text = "你好,我的世界!";
        // create the analyzer (true = smart mode)
        Analyzer anal = new IKAnalyzer(true);
        StringReader reader = new StringReader(text);
        // obtain the token stream (the field name is irrelevant here)
        TokenStream ts = anal.tokenStream("", reader);
        CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);
        // iterate over the tokens
        while (ts.incrementToken()) {
            System.out.print(term.toString() + ",");
        }
        reader.close();
        System.out.println();
    }
}

Output:

你好,我,的,世界,
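Because IKAnalyzer implements Lucene's Analyzer interface, it can also be dropped straight into Lucene's indexing pipeline. A minimal sketch against the Lucene 3.6 API, assuming lucene-core-3.6.0.jar is on the classpath (the field name "content" and the in-memory RAMDirectory are only illustrative choices):

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class IndexDemo {
    public static void main(String[] args) throws IOException {
        // In-memory index, for illustration only.
        Directory dir = new RAMDirectory();
        // Register IKAnalyzer as the analyzer for the Lucene 3.6 index writer.
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, new IKAnalyzer(true));
        IndexWriter writer = new IndexWriter(dir, config);

        Document doc = new Document();
        // The "content" field is run through IK's segmentation when the document is indexed.
        doc.add(new Field("content", "你好,我的世界!", Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();
    }
}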
