Word segmentation with jieba

1. Add the dependency

<dependency>
    <groupId>com.huaban</groupId>
    <artifactId>jieba-analysis</artifactId>
    <version>1.0.2</version>
</dependency>

2. Segmentation utility class

package com.itcast.utils;

import com.huaban.analysis.jieba.JiebaSegmenter;
import com.huaban.analysis.jieba.SegToken;
import com.huaban.analysis.jieba.WordDictionary;
import java.io.*;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;

public class JiebaAnalyzerUtil {

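    /**
     * Segments a sentence into words.
     *
     * @param text the sentence to segment
     * @return the list of segmented words
     * @throws IOException if dict.txt cannot be found on the classpath
     */
    public List<String> segment(String text) throws IOException {
        List<String> words = new ArrayList<>();
        // dict.txt is the custom dictionary, loaded from the classpath.
        java.net.URL resource = getClass().getClassLoader().getResource("dict.txt");
        if (resource == null) {
            throw new FileNotFoundException("dict.txt not found on the classpath");
        }
        // Note: getPath() is URL-encoded, so a path containing spaces may not resolve.
        Path dictPath = Paths.get(new File(resource.getPath()).getAbsolutePath());
        WordDictionary.getInstance().loadUserDict(dictPath);
        JiebaSegmenter segmenter = new JiebaSegmenter();
        List<SegToken> tokens = segmenter.process(text, JiebaSegmenter.SegMode.SEARCH);
        for (SegToken token : tokens) {
            // Keep only the surface form of each token.
            words.add(token.word);
        }
        return words;
    }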
}
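
One caveat with the utility class above: URL.getPath() returns the URL-encoded form of the path, so a project directory whose name contains spaces or non-ASCII characters may produce a path that does not exist on disk. Below is a minimal sketch of a safer conversion that uses only the JDK. The class name ResourcePaths and the method toPath are hypothetical helpers, not part of jieba-analysis, and the sketch assumes the resource is a plain file rather than an entry inside a JAR.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of jieba-analysis.
final class ResourcePaths {

    private ResourcePaths() {}

    // Converts a file: URL to a Path, decoding escapes such as %20.
    // Assumption: the URL points at a regular file, not into a JAR.
    static Path toPath(URL url) throws IOException {
        if (url == null) {
            throw new FileNotFoundException("resource not found");
        }
        try {
            return Paths.get(url.toURI());
        } catch (URISyntaxException e) {
            throw new IOException("invalid resource URL: " + url, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a dictionary stored under a directory with spaces in its name.
        Path dir = Files.createTempDirectory("my dict dir");
        Path dict = Files.createFile(dir.resolve("dict.txt"));
        URL url = dict.toUri().toURL();

        System.out.println(url.getPath());              // encoded form of the path
        System.out.println(Files.exists(toPath(url)));  // decoded path resolves

        Files.delete(dict);
        Files.delete(dir);
    }
}
```

In segment, the result of ResourcePaths.toPath(getClass().getClassLoader().getResource("dict.txt")) could then be passed to loadUserDict in place of the File-based path.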

3. Test

public static void main(String[] args) throws IOException {
    String str = "亲爱的请帮忙推荐一个稳健型-理财基金1期封闭式净值型产品";
    List<String> segment = new JiebaAnalyzerUtil().segment(str);
    System.out.println(segment);
}

4. Result without custom words

[亲爱, 的, 请, 帮忙, 推荐, 一个, 稳健, 型, -, 理财, 基金, 1, 期, 封闭式, 净值, 型, 产品]

5. Specifying custom words:

Add the following entries to dict.txt:

亲爱的 3 n
稳健型 3 n
理财基金1期 3 n
净值型产品 3 n
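
Each entry above has three whitespace-separated columns. Assuming they follow the usual jieba dictionary convention of word, frequency, and part-of-speech tag, the layout can be sketched with a small JDK-only parser. This is not the loader that jieba-analysis uses internally; DictLines, DictEntry, and parseLine are hypothetical names, and the defaults (frequency 3, empty tag) for missing columns are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for illustration; not jieba-analysis's own loader.
final class DictLines {

    // One dictionary entry: word, frequency, part-of-speech tag.
    static final class DictEntry {
        final String word;
        final int freq;
        final String tag;

        DictEntry(String word, int freq, String tag) {
            this.word = word;
            this.freq = freq;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return word + " (freq=" + freq + ", tag=" + tag + ")";
        }
    }

    // Splits on runs of whitespace, so extra spaces between columns are tolerated.
    static DictEntry parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("empty dictionary line");
        }
        // Assumed defaults when a column is missing.
        int freq = tokens.length > 1 ? Integer.parseInt(tokens[1]) : 3;
        String tag = tokens.length > 2 ? tokens[2] : "";
        return new DictEntry(tokens[0], freq, tag);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "亲爱的 3 n", "稳健型 3 n", "理财基金1期 3 n", "净值型产品  3 n");
        for (String line : lines) {
            System.out.println(parseLine(line));
        }
    }
}
```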

6. Result with custom words:

[亲爱的, 请, 帮忙, 推荐, 一个, 稳健型, -, 理财基金1期, 封闭式, 净值型产品]
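
To build some intuition for why adding dictionary entries merges tokens, here is a toy forward-maximum-matching segmenter. It is a deliberately simplified stand-in and not the algorithm jieba uses, so its output is not expected to match jieba's in general. The class name ToySegmenter, its segment method, and the small word sets are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy forward-maximum-matching segmenter; NOT jieba's algorithm.
final class ToySegmenter {

    private ToySegmenter() {}

    // At each position, take the longest dictionary word that starts there;
    // if none matches, emit a single character.
    static List<String> segment(String text, Set<String> dict) {
        int maxLen = 1;
        for (String w : dict) {
            maxLen = Math.max(maxLen, w.length());
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            String match = text.substring(i, i + 1); // single-character fallback
            for (int j = end; j > i + 1; j--) {
                String candidate = text.substring(i, j);
                if (dict.contains(candidate)) {
                    match = candidate;
                    break;
                }
            }
            out.add(match);
            i += match.length();
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "稳健型-理财基金1期";

        // Assumed base vocabulary without the custom entries.
        Set<String> base = new HashSet<>(Arrays.asList("稳健", "理财", "基金"));
        System.out.println(segment(text, base));
        // prints [稳健, 型, -, 理财, 基金, 1, 期]

        // Adding longer custom words lets the matcher keep them whole.
        Set<String> custom = new HashSet<>(base);
        custom.addAll(Arrays.asList("稳健型", "理财基金1期"));
        System.out.println(segment(text, custom));
        // prints [稳健型, -, 理财基金1期]
    }
}
```

The same effect shows up in the jieba output above: once 稳健型 and 理财基金1期 are in dict.txt, they come back as single tokens instead of being split into shorter words.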

posted @ 2023-06-11 16:48  shadow321