Jieba Word Segmentation in Java

Add the jieba Maven dependency to pom.xml

<dependency>
	<groupId>com.huaban</groupId>
	<artifactId>jieba-analysis</artifactId>
	<version>1.0.2</version>
</dependency>

Test

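// Assumption: TokenizerEngine, TokenizerUtil, Result and Word are not part of
// jieba-analysis itself; they appear to be Hutool's tokenizer wrapper
// (cn.hutool.extra.tokenizer), so Hutool must also be on the classpath.
import cn.hutool.extra.tokenizer.Result;
import cn.hutool.extra.tokenizer.TokenizerEngine;
import cn.hutool.extra.tokenizer.TokenizerUtil;
import cn.hutool.extra.tokenizer.Word;
import org.junit.Test; // JUnit 4 assumed; use org.junit.jupiter.api.Test for JUnit 5
import java.util.ArrayList;
import java.util.stream.Collectors;

@Test
public void test() {
    String goodsName = "设计小众托特包女大容量通勤包高级手提大包时尚单肩包";
    // Creates a tokenizer engine backed by whichever segmenter is on the classpath
    TokenizerEngine engine = TokenizerUtil.createEngine();
    // Strip all whitespace before segmenting
    Result result = engine.parse(goodsName.replaceAll("\\s+", ""));
    ArrayList<String> strList = new ArrayList<>();
    for (Word word : result) {
        strList.add(word.getText());
    }
    // Drop repeated tokens, then join the rest with "|"
    String collect = strList.stream()
            .distinct()
            .collect(Collectors.joining("|"));
    System.out.println(collect);
}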

Final output

设计|小众|托特|包女|大容量|通勤|包|高级|手提|大包|时尚|单肩
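
The input string ends with 包, but the output ends with 单肩. This is most likely the effect of distinct() in the test: 包 already appears earlier as its own token, so the trailing copy is dropped. Here is a minimal sketch of just that step, using only the JDK. The token list is hard-coded and inferred from the output above (the trailing 包 is an assumption, not captured tokenizer output), and DistinctJoinDemo and joinDistinct are names made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctJoinDemo {

    // Drops repeated tokens (keeping the first occurrence, in order)
    // and joins the remaining ones with "|".
    public static String joinDistinct(List<String> tokens) {
        return tokens.stream()
                .distinct()
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        // Hypothetical raw token list, inferred from the output above;
        // the final "包" is assumed to be what distinct() removed.
        List<String> tokens = Arrays.asList(
                "设计", "小众", "托特", "包女", "大容量", "通勤", "包",
                "高级", "手提", "大包", "时尚", "单肩", "包");
        System.out.println(joinDistinct(tokens));
    }
}
```

If you need every token in its original position (for example, to rebuild the string), leave out distinct().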

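As you can see, this works fine for simple segmentation, but it is not suitable for more complex cases. For example, 托特包女 was split into 托特|包女 rather than 托特包|女.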

posted @ 2023-10-21 11:46  Kllin