Jieba word segmentation in Java
Add the jieba-analysis Maven dependency to pom.xml
<dependency>
    <groupId>com.huaban</groupId>
    <artifactId>jieba-analysis</artifactId>
    <version>1.0.2</version>
</dependency>
Test
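import cn.hutool.extra.tokenizer.Result;
import cn.hutool.extra.tokenizer.TokenizerEngine;
import cn.hutool.extra.tokenizer.TokenizerUtil;
import cn.hutool.extra.tokenizer.Word;
import org.junit.jupiter.api.Test; // JUnit 5 assumed; use org.junit.Test for JUnit 4
import java.util.ArrayList;
import java.util.stream.Collectors;

// The tokenizer classes come from Hutool (cn.hutool.extra.tokenizer), which must also be
// on the classpath; createEngine() picks up whichever supported engine it finds, here jieba-analysis.
public class JiebaTest {
    @Test
    public void test() {
        String goodsName = "设计小众托特包女大容量通勤包高级手提大包时尚单肩包";
        TokenizerEngine engine = TokenizerUtil.createEngine();
        // Strip all whitespace before segmenting
        Result result = engine.parse(goodsName.replaceAll("\\s+", ""));
        ArrayList<String> strList = new ArrayList<>();
        for (Word word : result) {
            strList.add(word.getText());
        }
        // Drop repeated tokens (first occurrence wins) and join the rest with "|"
        String collect = strList.stream()
                .distinct()
                .collect(Collectors.joining("|"));
        System.out.println(collect);
    }
}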
Final output
设计|小众|托特|包女|大容量|通勤|包|高级|手提|大包|时尚|单肩
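As the output shows, this handles simple segmentation well enough, but it falls short on harder input: here 托特包女 was split into 托特 and 包女 rather than 托特包 and 女.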
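One detail in the output is easy to miss: the input ends with 包, yet the printed result ends with 单肩. That is most likely because distinct() already saw 包 earlier (after 通勤) and dropped the later copy. Below is a minimal sketch of that dedupe-and-join step using only the JDK, so it can be run without the tokenizer. TokenJoiner and joinDistinct are hypothetical names made up for this illustration, and the raw token list is an assumption reconstructed from the output above, not something captured from the library.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of Hutool or jieba-analysis.
class TokenJoiner {

    /** Drops repeated tokens, keeping the first occurrence, and joins the rest with "|". */
    static String joinDistinct(List<String> tokens) {
        // LinkedHashSet removes duplicates while preserving insertion order.
        Set<String> unique = new LinkedHashSet<>(tokens);
        return String.join("|", unique);
    }

    public static void main(String[] args) {
        // Assumed raw tokens: the printed result plus the trailing "包" that was deduplicated.
        List<String> tokens = Arrays.asList(
                "设计", "小众", "托特", "包女", "大容量", "通勤", "包",
                "高级", "手提", "大包", "时尚", "单肩", "包");
        System.out.println(joinDistinct(tokens));
    }
}
```

If the goal is a keyword list for search, dropping duplicates like this is usually fine. If token positions or counts matter, skip the dedupe step and keep the raw list.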