
Installing the IK analyzer for Elasticsearch

1. Find the installation package on the official release page
URL: https://github.com/medcl/elasticsearch-analysis-ik/releases

2. Download the analyzer package that matches your ES version
cd /home/fch/module
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.16.3/elasticsearch-analysis-ik-7.16.3.zip
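
Because the plugin version has to match the ES version, it can help to derive the download URL from the version string instead of typing it by hand. The sketch below is only an illustration: the function names are hypothetical, and the URL pattern is inferred from the single v7.16.3 link above, so other releases may not follow it.

```python
def es_version_from_info(info: dict) -> str:
    """Extract the version string from the JSON returned by GET / on an ES node."""
    return info["version"]["number"]


def ik_release_url(es_version: str) -> str:
    """Build the IK release URL for a given ES version.

    Assumption: every release follows the same pattern as the v7.16.3 link.
    """
    base = "https://github.com/medcl/elasticsearch-analysis-ik/releases/download"
    return f"{base}/v{es_version}/elasticsearch-analysis-ik-{es_version}.zip"


if __name__ == "__main__":
    # A trimmed-down example of the node info document (illustrative only).
    info = {"version": {"number": "7.16.3"}}
    print(ik_release_url(es_version_from_info(info)))
```

The printed URL can then be passed to wget as in the command above.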

3. Unzip into the plugins/ik directory under the ES installation directory
unzip elasticsearch-analysis-ik-7.16.3.zip -d /home/fch/module/es/plugins/ik
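
Before restarting, a quick check that the archive landed in the right place can save a failed startup. The sketch below assumes the archive unpacks its files directly into the target directory and that the plugin ships a plugin-descriptor.properties file with an elasticsearch.version key, as ES plugins normally do. The helper names and the demo file contents are hypothetical.

```python
import tempfile
from pathlib import Path


def read_plugin_descriptor(plugin_dir):
    """Parse <plugin_dir>/plugin-descriptor.properties into a dict."""
    descriptor = Path(plugin_dir) / "plugin-descriptor.properties"
    props = {}
    for raw in descriptor.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def plugin_matches_es(plugin_dir, es_version):
    """Return True if the descriptor exists and targets es_version."""
    try:
        props = read_plugin_descriptor(plugin_dir)
    except FileNotFoundError:
        return False
    return props.get("elasticsearch.version") == es_version


if __name__ == "__main__":
    # Self-contained demo against a throwaway directory, not a real install.
    with tempfile.TemporaryDirectory() as tmp:
        ik_dir = Path(tmp) / "plugins" / "ik"
        ik_dir.mkdir(parents=True)
        (ik_dir / "plugin-descriptor.properties").write_text(
            "name=analysis-ik\nelasticsearch.version=7.16.3\n", encoding="utf-8"
        )
        print(plugin_matches_es(ik_dir, "7.16.3"))
```

On a real machine the first argument would be the directory the archive was unzipped into.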

Do not restart yet. First test the result without the IK analyzer
POST http://119.91.127.xxx:9200/student2/_analyze
{
"text":"我的华为手机"
}

Response
{
    "tokens": [
        {
            "token": "我",
            "start_offset": 0,
            "end_offset": 1,
            "type": "<IDEOGRAPHIC>",
            "position": 0
        },
        {
            "token": "的",
            "start_offset": 1,
            "end_offset": 2,
            "type": "<IDEOGRAPHIC>",
            "position": 1
        },
        {
            "token": "华",
            "start_offset": 2,
            "end_offset": 3,
            "type": "<IDEOGRAPHIC>",
            "position": 2
        },
        {
            "token": "为",
            "start_offset": 3,
            "end_offset": 4,
            "type": "<IDEOGRAPHIC>",
            "position": 3
        },
        {
            "token": "手",
            "start_offset": 4,
            "end_offset": 5,
            "type": "<IDEOGRAPHIC>",
            "position": 4
        },
        {
            "token": "机",
            "start_offset": 5,
            "end_offset": 6,
            "type": "<IDEOGRAPHIC>",
            "position": 5
        }
    ]
}
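
The same _analyze call can be scripted rather than sent from a REST client. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and http://localhost:9200 is a placeholder host rather than the server used in this post. The optional analyzer argument is omitted to get the index default, as in the request above.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, text, analyzer=None):
    """Build (but do not send) a POST <base_url>/<index>/_analyze request."""
    body = {"text": text}
    if analyzer is not None:
        # Only include "analyzer" when one is explicitly requested.
        body["analyzer"] = analyzer
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_analyze",
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def token_strings(response_body):
    """Return just the token text from an _analyze response."""
    return [t["token"] for t in response_body["tokens"]]


if __name__ == "__main__":
    # Placeholder host; replace with the address of a reachable ES node.
    req = build_analyze_request("http://localhost:9200", "student2", "我的华为手机")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Calling send(req) against a running node and passing the result to token_strings would give the list of tokens shown in the response above.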

4. Restart ES, then use the analyzer
Method: POST
URL: http://119.91.127.xxx:9200/student2/_analyze
Body format: JSON
{
"analyzer": "ik_max_word",
"text":"我的华为手机"
}

Response
{
    "tokens": [
        {
            "token": "我",
            "start_offset": 0,
            "end_offset": 1,
            "type": "CN_CHAR",
            "position": 0
        },
        {
            "token": "的",
            "start_offset": 1,
            "end_offset": 2,
            "type": "CN_CHAR",
            "position": 1
        },
        {
            "token": "华为",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "手机",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 3
        }
    ]
}

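Notes on the two tokenization results above:

  1. If the IK analyzer is not installed, a request that contains "analyzer": "ik_max_word" fails with an error, because ES cannot find an analyzer with that name.
  2. If the IK analyzer is installed but the request does not specify it (no "analyzer": "ik_max_word"), the result is the same as without the plugin: the index's default analyzer is used (standard, unless configured otherwise) and the text is split into individual Chinese characters.
  3. ik_max_word: fine-grained tokenization that exhausts every possible word in the text. ik_smart: coarse-grained tokenization that prefers the longest match.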
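
The difference between the two responses can also be checked programmatically. The sketch below hard-codes the (token, start_offset, end_offset) triples copied from the two responses in this post and reports which tokens span more than one character. The helper names are hypothetical.

```python
TEXT = "我的华为手机"

# (token, start_offset, end_offset) copied from the two responses above.
STANDARD_TOKENS = [
    ("我", 0, 1), ("的", 1, 2), ("华", 2, 3),
    ("为", 3, 4), ("手", 4, 5), ("机", 5, 6),
]
IK_MAX_WORD_TOKENS = [
    ("我", 0, 1), ("的", 1, 2), ("华为", 2, 4), ("手机", 4, 6),
]


def multi_char_tokens(tokens):
    """Return the tokens that cover more than one character of the input."""
    return [tok for tok, start, end in tokens if end - start > 1]


def offsets_match(tokens, text):
    """Check that every token equals the slice of text its offsets point to."""
    return all(text[start:end] == tok for tok, start, end in tokens)


if __name__ == "__main__":
    print(multi_char_tokens(STANDARD_TOKENS))
    print(multi_char_tokens(IK_MAX_WORD_TOKENS))
    print(offsets_match(IK_MAX_WORD_TOKENS, TEXT))
```

Only the IK result contains multi-character tokens, which is what makes a search for a whole word such as 华为 behave as expected.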

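5. When creating an index, you can specify the IK analyzer for a field
Built-in ES analyzers
Standard Analyzer: the default analyzer; splits text on word boundaries and lowercases
Simple Analyzer: splits on non-letter characters (symbols are discarded) and lowercases
Stop Analyzer: lowercases and removes stop words (the, is, a)
Whitespace Analyzer: splits on whitespace; does not lowercase
Keyword Analyzer: no tokenization; the input is emitted as a single token
Pattern Analyzer: splits by regular expression; the default is \W+ (non-word characters)
Language Analyzers: more than 30 language-specific analyzers
Custom Analyzer: a user-defined analyzer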

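The request below defines a custom analyzer named ik (built on the ik_max_word tokenizer) under settings.analysis and applies it to the title field. ES 7.x no longer accepts a mapping type name such as doc, so properties sits directly under mappings.
PUT test_index
{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "ik": {
          "type": "custom",
          "tokenizer": "ik_max_word"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik",
        "search_analyzer": "ik_smart"
      }
    }
  }
}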
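
When the index is created from a script, the request body can be built as a plain dictionary and serialized to JSON. This is a sketch under stated assumptions: ES 7.x typeless mappings, a title field that should be indexed with the custom ik analyzer and searched with ik_smart (an illustrative choice), and a hypothetical function name.

```python
import json


def build_index_body(shards=6, replicas=1):
    """Return the body for PUT <index> with an IK-based analyzer on title."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
            "analysis": {
                "analyzer": {
                    # Custom analyzer named "ik" on the ik_max_word tokenizer.
                    "ik": {"type": "custom", "tokenizer": "ik_max_word"}
                }
            },
        },
        "mappings": {
            # No type name (such as "doc") here: assumes ES 7.x or later.
            "properties": {
                "title": {
                    "type": "text",
                    "analyzer": "ik",
                    "search_analyzer": "ik_smart",
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```

The printed JSON can be sent as the body of PUT test_index. Indexing with the fine-grained analyzer and searching with the coarse-grained one is a common pairing, but whether it suits a given data set is something to verify with _analyze.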
posted @ 2022-02-15 15:29  八股文研究生