Elasticsearch Technical Analysis and Practice (9): IK Analyzer

Preface

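Built-in analyzers

First, run the built-in standard analyzer on a Chinese phrase through the _analyze API: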

POST /_analyze
{
  "analyzer":"standard",
  "text":"且听风吟"
}
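To show why the result looks the way it does, here is a rough Python approximation of how the standard analyzer handles this input. It is a simplification that assumes only two rules: each Han character (CJK Unified Ideographs, U+4E00 to U+9FFF) becomes its own token, and runs of ASCII letters or digits become one lowercased token. The helper name standard_like is hypothetical. The real Lucene tokenizer implements the full Unicode text segmentation rules.

```python
import re

# Hypothetical stand-in for the standard analyzer (NOT the real Lucene tokenizer):
# every Han ideograph in U+4E00..U+9FFF becomes a separate token, and each run of
# ASCII letters/digits becomes a single lowercased token. Everything else is dropped.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def standard_like(text: str) -> list[str]:
    """Split text roughly the way the standard analyzer treats Chinese."""
    return [tok.lower() for tok in _TOKEN.findall(text)]


if __name__ == "__main__":
    print(standard_like("且听风吟"))            # ['且', '听', '风', '吟']
    print(standard_like("Elasticsearch 分词"))  # ['elasticsearch', '分', '词']
```

The Chinese phrase falls apart into single characters, while the English word survives intact.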

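Limitations of the built-in analyzers for Chinese

The standard analyzer has no notion of Chinese words. It emits every Chinese character as a separate token, so 且听风吟 comes back as the four tokens 且, 听, 风 and 吟. As a result, queries match on individual characters instead of words.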

IK analyzer

Installing the IK analyzer

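Download: https://github.com/medcl/elasticsearch-analysis-ik/releases

Pick the release whose version matches your Elasticsearch version exactly (for example 6.7.0 for Elasticsearch 6.7.0). Unzip it into a folder named ik under the Elasticsearch plugins directory, then restart Elasticsearch.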

Testing the IK analyzer

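As the official documentation recommends, the IK analyzer is referenced by one of two names. ik_max_word splits text at the finest granularity and emits every word combination it finds. ik_smart produces the coarsest split. Try both on the same phrase: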

POST /_analyze
{
  "analyzer": "ik_max_word",
  "text": "且听风吟"
}

POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "且听风吟"
}
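The difference between the two modes can be sketched with a toy dictionary. This is only an illustration under stated assumptions: TOY_DICT is a hypothetical four-word dictionary, max_word_like and smart_like are made-up helpers, and the real IK segmenter is more involved (for example, in how ik_smart resolves ambiguous splits). The sketch shows the idea that a finest-grained mode keeps overlapping words, while a coarsest mode picks one non-overlapping split.

```python
# Hypothetical toy dictionary; the real plugin loads main.dic and other word lists.
TOY_DICT = {"且听风吟", "且听", "听风", "风吟"}
MAX_LEN = max(map(len, TOY_DICT))


def max_word_like(text: str) -> list[str]:
    """Finest granularity (ik_max_word-like): every dictionary word at every position."""
    tokens: list[str] = []
    covered_end = 0  # first index not yet covered by an emitted word
    for i in range(len(text)):
        matched = False
        # Try longer candidates first so longer words are emitted first.
        for j in range(min(len(text), i + MAX_LEN), i, -1):
            if text[i:j] in TOY_DICT:
                tokens.append(text[i:j])
                covered_end = max(covered_end, j)
                matched = True
        if not matched and i >= covered_end:
            tokens.append(text[i])  # character not covered by any word
    return tokens


def smart_like(text: str) -> list[str]:
    """Coarsest granularity (ik_smart-like): greedy forward maximum matching."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for j in range(min(len(text), i + MAX_LEN), i, -1):
            if text[i:j] in TOY_DICT:
                break
        else:
            j = i + 1  # no dictionary word starts here: single-character token
        tokens.append(text[i:j])
        i = j
    return tokens


if __name__ == "__main__":
    print(max_word_like("我爱且听风吟"))  # ['我', '爱', '且听风吟', '且听', '听风', '风吟']
    print(smart_like("我爱且听风吟"))     # ['我', '爱', '且听风吟']
```

With IK's real dictionary the tokens will differ. The two _analyze requests above show the plugin's actual output.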

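Next, create an index whose Chinese text field is analyzed by the IK analyzer, so we can observe its effect. In the mapping below, name uses ik_max_word. hobby sets no analyzer, so it falls back to the default standard analyzer.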

PUT /ropledata
{
  "settings": {
    "index": {
      "number_of_shards": "2",
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "hobby": {
        "type": "text"
      }
    }
  }
}
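Because name is mapped with ik_max_word, that analyzer runs when documents are indexed. By default it also runs on the query string of a match query against that field, unless a separate search_analyzer is configured. The Python sketch below imitates this with a tiny in-memory inverted index. The toy dictionary, the two stand-in analyzers and the sample documents are all hypothetical. The sketch shows that a query for 风吟 on hobby (single-character tokens) also hits a document that merely shares the character 风, while on name it only hits the document that contains the word.

```python
import re
from collections import defaultdict

TOY_DICT = {"且听风吟", "且听", "听风", "风吟"}  # hypothetical stand-in for main.dic


def ik_like(text):
    """Rough ik_max_word stand-in: all dictionary words, longest first per position."""
    max_len = max(map(len, TOY_DICT))
    tokens, covered_end = [], 0
    for i in range(len(text)):
        matched = False
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in TOY_DICT:
                tokens.append(text[i:j])
                covered_end = max(covered_end, j)
                matched = True
        if not matched and i >= covered_end:
            tokens.append(text[i])
    return tokens


def standard_like(text):
    """Rough standard-analyzer stand-in: one token per Han character."""
    return [t.lower() for t in re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9]+", text)]


# Mirrors the mapping above: "name" uses the IK analyzer, "hobby" the default one.
FIELD_ANALYZERS = {"name": ik_like, "hobby": standard_like}

DOCS = [
    {"id": 1, "name": "且听风吟", "hobby": "听风"},
    {"id": 2, "name": "听风", "hobby": "且听风吟"},
]


def build_index(docs):
    """Inverted index: (field, term) -> set of document ids."""
    index = defaultdict(set)
    for doc in docs:
        for field, analyze in FIELD_ANALYZERS.items():
            for term in analyze(doc[field]):
                index[(field, term)].add(doc["id"])
    return index


def match(index, field, query):
    """The query goes through the field's analyzer; terms are OR-ed together."""
    hits = set()
    for term in FIELD_ANALYZERS[field](query):
        hits |= index.get((field, term), set())
    return hits


if __name__ == "__main__":
    idx = build_index(DOCS)
    print(match(idx, "name", "风吟"))   # {1}
    print(match(idx, "hobby", "风吟"))  # {1, 2}
```

The OR-ing mirrors the default operator of a match query. With a character-level analyzer, any shared character is enough for a hit.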

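What does the IK analyzer base its splits on? And if special terms such as personal names, shop names or online nicknames need to be segmented according to our own rules, can that be done?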

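The IK analyzer maintains a very large dictionary file containing a great many Chinese words, and it splits text by matching against these entries. The file is named main.dic and sits in ik/config/. Let's open it and take a look. In this installation the directory is: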

D:\Java\ELK6.7.0\elasticsearch-6.7.0\plugins\ik\config
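main.dic is a plain-text word list with one word per line. Because segmentation is dictionary-driven, custom terms can be handled by supplying more words. IK's documented mechanism for this is an extension dictionary registered under the ext_dict entry of IKAnalyzer.cfg.xml in the same config directory (check the README for your plugin version). The Python sketch below only mimics the effect: it loads word lists in the one-word-per-line format and re-segments a phrase with greedy forward maximum matching. The file my_ext.dic, its contents, and the helpers load_dic and smart_like are hypothetical.

```python
import os
import tempfile


def load_dic(path):
    """Read a UTF-8 word list with one word per line (the layout main.dic uses)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def smart_like(text, dictionary):
    """Greedy forward maximum matching; unknown characters become single-char tokens."""
    max_len = max(map(len, dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                break
        else:
            j = i + 1  # no dictionary word starts at position i
        tokens.append(text[i:j])
        i = j
    return tokens


def demo():
    with tempfile.TemporaryDirectory() as d:
        main_path = os.path.join(d, "main.dic")
        ext_path = os.path.join(d, "my_ext.dic")  # hypothetical extension dictionary
        with open(main_path, "w", encoding="utf-8") as f:
            f.write("且听\n风吟\n")
        with open(ext_path, "w", encoding="utf-8") as f:
            f.write("且听风吟\n")  # e.g. a shop name that should stay whole
        main = load_dic(main_path)
        before = smart_like("且听风吟", main)
        after = smart_like("且听风吟", main | load_dic(ext_path))
        return before, after


if __name__ == "__main__":
    print(demo())  # (['且听', '风吟'], ['且听风吟'])
```

Adding one line to the extension word list is enough to keep the custom term in one piece.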


Posted 2022-04-09 20:11 by ~沐风
