Chapter 1, Section 16: The IK Analyzer for Elasticsearch

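1. Installing the IK analyzer under Docker

1: Download the IK analyzer package that matches your Elasticsearch version (7.4.2 in this walkthrough)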
https://github.com/medcl/elasticsearch-analysis-ik/releases/

2: Upload the package to the server and unzip it into an ik folder
unzip elasticsearch-analysis-ik-7.4.2.zip -d ik/

3: Move the ik folder into the host directory that is mounted as Elasticsearch's plugins directory
mv ik /mydata/elasticsearch/plugins/

4: Grant permissions on the ik folder
chmod -R 777 /mydata/elasticsearch/plugins/ik

5: Restart the elasticsearch Docker container
docker restart elasticsearch
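
After the restart, it can be worth confirming that Elasticsearch actually loaded the plugin. The sketch below parses the plain-text output of the _cat/plugins API (columns: node name, component, version) and checks for the analysis-ik component at the expected version. The helper name has_plugin, the node name node-1, and the sample text are illustrative assumptions, not output captured from this setup.

```python
def has_plugin(cat_plugins_text, component, version):
    """Return True if any node reports `component` at exactly `version`."""
    for line in cat_plugins_text.splitlines():
        fields = line.split()
        # Expected columns: node name, component, version
        if len(fields) >= 3 and fields[1] == component and fields[2] == version:
            return True
    return False


# Hypothetical sample of what GET _cat/plugins might return.
sample = "node-1 analysis-ik 7.4.2\n"
print(has_plugin(sample, "analysis-ik", "7.4.2"))
```

The text to check could be fetched with something like curl http://localhost:9200/_cat/plugins, assuming port 9200 is published on the host.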

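2. Using the IK analyzer

# Fine-grained tokenization: ik_max_word emits every dictionary word it finds, including overlapping ones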
GET _analyze
{
  "text": "北京市朝阳区",
  "analyzer":"ik_max_word"
}

Response:

{
  "tokens" : [
    {
      "token" : "北京市",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "北京",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "市",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "朝阳区",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "朝阳",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "区",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "CN_CHAR",
      "position" : 5
    }
  ]
}

# Coarse-grained tokenization: ik_smart returns the fewest, longest tokens with no overlap
GET _analyze
{
  "text": "北京市朝阳区",
  "analyzer":"ik_smart"
}

Response:

{
  "tokens" : [
    {
      "token" : "北京市",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "朝阳区",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}
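
The difference between the two analyzers shows up in the offsets: ik_max_word returns tokens whose character ranges overlap, while ik_smart returns ranges that do not. The sketch below builds an _analyze request body and inspects the two responses above. The helper names (analyze_body, token_texts, has_overlaps) are hypothetical, and the response dictionaries keep only the fields the helpers need.

```python
import json


def analyze_body(text, analyzer):
    """Build the JSON body for a GET _analyze request."""
    return json.dumps({"text": text, "analyzer": analyzer}, ensure_ascii=False)


def token_texts(response):
    """Return just the token strings from an _analyze response."""
    return [t["token"] for t in response["tokens"]]


def has_overlaps(response):
    """True if any token starts before an earlier token has ended."""
    spans = sorted((t["start_offset"], t["end_offset"]) for t in response["tokens"])
    furthest_end = 0
    for start, end in spans:
        if start < furthest_end:
            return True
        furthest_end = max(furthest_end, end)
    return False


# Tokens and offsets copied from the two responses above; other fields omitted.
max_word = {"tokens": [
    {"token": "北京市", "start_offset": 0, "end_offset": 3},
    {"token": "北京", "start_offset": 0, "end_offset": 2},
    {"token": "市", "start_offset": 2, "end_offset": 3},
    {"token": "朝阳区", "start_offset": 3, "end_offset": 6},
    {"token": "朝阳", "start_offset": 3, "end_offset": 5},
    {"token": "区", "start_offset": 5, "end_offset": 6},
]}
smart = {"tokens": [
    {"token": "北京市", "start_offset": 0, "end_offset": 3},
    {"token": "朝阳区", "start_offset": 3, "end_offset": 6},
]}

print(analyze_body("北京市朝阳区", "ik_smart"))
print(token_texts(smart))      # ['北京市', '朝阳区']
print(has_overlaps(max_word))  # True
print(has_overlaps(smart))     # False
```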

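3. Custom dictionary

1: In nginx's html directory, create an es folder, and inside it create a file named fenci.txt

2: Add your custom words to fenci.txt, one word per line, for example 乔碧罗 and 小阿峰

3: Edit /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml so that remote_ext_dict points at the file (replace nginx_ip with the address of your nginx server)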
<entry key="remote_ext_dict">http://nginx_ip/es/fenci.txt</entry>

4: Restart the elasticsearch Docker container
docker restart elasticsearch
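
Steps 1 and 2 can also be scripted. The sketch below creates the es folder and writes fenci.txt as UTF-8 with one word per line, which is the layout IK reads for a remote dictionary. The function name write_dictionary is hypothetical, and a temporary directory stands in for nginx's html directory, whose real path depends on how nginx is installed or mounted.

```python
import tempfile
from pathlib import Path


def write_dictionary(html_dir, words):
    """Write <html_dir>/es/fenci.txt with one word per line (UTF-8)."""
    es_dir = Path(html_dir) / "es"
    es_dir.mkdir(parents=True, exist_ok=True)
    path = es_dir / "fenci.txt"
    # Trim whitespace, drop blanks, and remove duplicates while keeping order.
    unique = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    path.write_text("\n".join(unique) + "\n", encoding="utf-8")
    return path


# A temporary directory stands in for nginx's html directory here.
with tempfile.TemporaryDirectory() as html_dir:
    path = write_dictionary(html_dir, ["乔碧罗", "小阿峰", "乔碧罗"])
    print(path.read_text(encoding="utf-8"))
```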

fenci.txt:
乔碧罗
小阿峰

Test:

GET _analyze
{
  "text": "乔碧罗殿下喜欢小阿峰",
  "analyzer":"ik_max_word"
}

Response:

{
  "tokens" : [
    {
      "token" : "乔碧罗",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "殿下",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "喜欢",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "小阿峰",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}
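
To check a longer word list than these two entries, the same comparison can be automated. A minimal sketch, assuming the response has already been parsed into a dictionary; missing_words is a hypothetical helper, and the response below keeps only the token field from the output above.

```python
def missing_words(response, words):
    """Return the custom words that did not come back as whole tokens."""
    tokens = {t["token"] for t in response["tokens"]}
    return [w for w in words if w not in tokens]


# Tokens copied from the response above; other fields omitted.
response = {"tokens": [
    {"token": "乔碧罗"},
    {"token": "殿下"},
    {"token": "喜欢"},
    {"token": "小阿峰"},
]}

print(missing_words(response, ["乔碧罗", "小阿峰"]))  # []
```

An empty list means every custom word was recognized as a single token, which suggests the remote dictionary was loaded.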

posted @ 2021-07-04 14:39  努力的校长