Installing the IK Chinese analyzer for Elasticsearch
1. Installing the analyzer plugin
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.2.3/elasticsearch-analysis-ik-6.2.3.zip
NOTE: replace 6.2.3 with your own Elasticsearch version
GitHub repository:
https://github.com/medcl/elasticsearch-analysis-ik
Note that the plugin version you install must match your Elasticsearch version.
Usage:
1> Add the parameter index.analysis.analyzer.default.type: ik as the last line of the Elasticsearch configuration file config/elasticsearch.yml; this makes IK the default analyzer for all indices.
2> Alternatively, apply IK analysis to specific fields through the index mapping, as in the sketch after this list.
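After installation, restart Elasticsearch. If you want to verify that the plugin was picked up, you can list the installed plugins; the output typically includes an analysis-ik entry (exact formatting may differ by version):

./bin/elasticsearch-plugin list
analysis-ik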
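A minimal sketch of an index whose text field is analyzed with IK. The index name my_index, the type _doc, and the field content are only examples, and the request uses the 6.x single-type mapping syntax:

PUT /my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }
}

Pairing ik_max_word at index time with ik_smart at search time is a common setup: the index stores fine-grained tokens, while queries are analyzed coarsely.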
2. The two analysis modes of the IK analyzer
1> ik_max_word: splits the text at the finest granularity and exhausts every possible word combination; for example, "北京邮电大学" (Beijing University of Posts and Telecommunications) is split as shown in the response below.
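For example, an _analyze request like the following (it needs no index, only a cluster with the plugin installed) produces the response shown next:

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "北京邮电大学"
}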
{
  "tokens": [
    {
      "token": "北京邮电",
      "start_offset": 0,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "北京",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "邮电大学",
      "start_offset": 2,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "邮电",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "电大",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "大学",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 5
    }
  ]
}
2> ik_smart: performs the coarsest-grained split; the same text yields only two tokens, as shown below.
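The same request with the ik_smart analyzer, for comparison:

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "北京邮电大学"
}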
{
  "tokens": [
    {
      "token": "北京",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "邮电大学",
      "start_offset": 2,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}