Jieba word segmentation in Java
1. Add the Maven dependency
<dependency>
    <groupId>com.huaban</groupId>
    <artifactId>jieba-analysis</artifactId>
    <version>1.0.2</version>
</dependency>
2. Segmentation utility class
package com.itcast.utils;

import com.huaban.analysis.jieba.JiebaSegmenter;
import com.huaban.analysis.jieba.SegToken;
import com.huaban.analysis.jieba.WordDictionary;

import java.io.File;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class JiebaAnalyzerUtil {

    /**
     * Segments a sentence into words.
     *
     * @param text the sentence to segment
     * @return the list of segmented words
     * @throws IOException if the user dictionary cannot be read
     */
    public List<String> segment(String text) throws IOException {
        List<String> words = new ArrayList<>();
        // dict.txt is the user-defined dictionary on the classpath
        String path = getClass().getClassLoader().getResource("dict.txt").getPath();
        Path userDictPath = Paths.get(new File(path).getAbsolutePath());
        WordDictionary.getInstance().loadUserDict(userDictPath);
        JiebaSegmenter segmenter = new JiebaSegmenter();
        List<SegToken> tokens = segmenter.process(text, JiebaSegmenter.SegMode.SEARCH);
        for (SegToken token : tokens) {
            words.add(token.word);
        }
        return words;
    }
}
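One caveat with the resource-loading line above: `URL.getPath()` returns a URL-encoded string, so if the project lives in a directory whose name contains spaces (or anything else that gets percent-encoded), the resulting path points at a non-existent file and loading the dictionary fails. Converting the `URL` to a `URI` first yields a decoded filesystem path. A minimal stdlib-only sketch of the difference; the `file:` URL here is a made-up example, not a real resource:

```java
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResourcePathDemo {
    public static void main(String[] args) throws Exception {
        // Simulates what getResource(...) might return for a project
        // directory with a space in its name.
        URL url = new URL("file:/tmp/my%20project/dict.txt");

        // URL.getPath() keeps the percent-encoding -- this string is not a
        // valid filesystem path, so loadUserDict would not find the file.
        System.out.println(url.getPath()); // /tmp/my%20project/dict.txt

        // Going through URI decodes it into a real filesystem path.
        Path p = Paths.get(url.toURI());
        System.out.println(p);             // /tmp/my project/dict.txt
    }
}
```

Swapping `getResource("dict.txt").getPath()` for `Paths.get(getResource("dict.txt").toURI())` in the utility class avoids the problem.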
3. Test
public static void main(String[] args) throws IOException {
    String str = "亲爱的请帮忙推荐一个稳健型-理财基金1期封闭式净值型产品";
    List<String> segment = new JiebaAnalyzerUtil().segment(str);
    System.out.println(segment);
}
4. Output without the custom dictionary
[亲爱, 的, 请, 帮忙, 推荐, 一个, 稳健, 型, -, 理财, 基金, 1, 期, 封闭式, 净值, 型, 产品]
5. Specifying custom words
Add the following entries to dict.txt, one per line:
亲爱的 3 n
稳健型 3 n
理财基金1期 3 n
净值型产品 3 n
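Each dict.txt line holds three space-separated fields: the word itself, a frequency weight (a higher weight makes the word more likely to survive as one token), and a part-of-speech tag (`n` = noun). A small sketch for building such entries programmatically, assuming this three-field format; `UserDictEntry` and `format` are hypothetical names, not part of jieba-analysis:

```java
import java.util.List;

public class UserDictEntry {
    /** Formats one user-dictionary line: word, frequency weight, POS tag. */
    static String format(String word, int frequency, String posTag) {
        return word + " " + frequency + " " + posTag;
    }

    public static void main(String[] args) {
        List<String> entries = List.of(
                format("亲爱的", 3, "n"),
                format("净值型产品", 3, "n"));
        // Writing these lines to a UTF-8 file on the classpath
        // produces a dictionary like the one shown above.
        entries.forEach(System.out::println);
    }
}
```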
6. Output with the custom dictionary
[亲爱的, 请, 帮忙, 推荐, 一个, 稳健型, -, 理财基金1期, 封闭式, 净值型产品]
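A quick stdlib-only check confirms that the difference between the two outputs is exactly the four words added to dict.txt (the token lists below are copied from steps 4 and 6):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TokenDiff {
    public static void main(String[] args) {
        List<String> withoutDict = List.of("亲爱", "的", "请", "帮忙", "推荐", "一个",
                "稳健", "型", "-", "理财", "基金", "1", "期", "封闭式", "净值", "型", "产品");
        List<String> withDict = List.of("亲爱的", "请", "帮忙", "推荐", "一个",
                "稳健型", "-", "理财基金1期", "封闭式", "净值型产品");

        // Tokens that appear only after the user dictionary is loaded.
        Set<String> merged = new LinkedHashSet<>(withDict);
        merged.removeAll(withoutDict);
        System.out.println(merged); // [亲爱的, 稳健型, 理财基金1期, 净值型产品]
    }
}
```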