Chapter 1, Section 16: The Elasticsearch IK Analyzer
1. Installing the IK Analyzer in Docker
Step 1: Download the IK analyzer release whose version matches your Elasticsearch version (7.4.2 here):
https://github.com/medcl/elasticsearch-analysis-ik/releases/
Step 2: Upload the archive to the server and unzip it into an ik folder:
unzip elasticsearch-analysis-ik-7.4.2.zip -d ik/
Step 3: Move the ik folder into Elasticsearch's mounted plugins directory:
mv ik /mydata/elasticsearch/plugins/
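If you are not sure the plugins directory is really mounted into the container, you can inspect the container's mounts first (a quick check, assuming the container is named elasticsearch, as in this series):
docker inspect --format '{{json .Mounts}}' elasticsearch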
Step 4: Grant permissions on the ik folder (777 is convenient for a demo, but overly permissive for production):
chmod -R 777 /mydata/elasticsearch/plugins/ik
Step 5: Restart the Elasticsearch Docker container:
docker restart elasticsearch
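To confirm that Elasticsearch picked the plugin up, list the installed plugins inside the container (again assuming the container is named elasticsearch):
docker exec -it elasticsearch /usr/share/elasticsearch/bin/elasticsearch-plugin list
# the ik plugin should appear in the output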
2. Using the IK Analyzer
# Fine-grained tokenization (ik_max_word): emits every plausible word, including overlapping ones
GET _analyze
{
  "text": "北京市朝阳区",
  "analyzer": "ik_max_word"
}
Response:
{
  "tokens" : [
    {
      "token" : "北京市",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "北京",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "市",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "朝阳区",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "朝阳",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "区",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "CN_CHAR",
      "position" : 5
    }
  ]
}
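The same request can be issued outside Kibana Dev Tools with curl (a minimal sketch, assuming Elasticsearch listens on localhost:9200):
curl -s -H 'Content-Type: application/json' 'http://localhost:9200/_analyze?pretty' -d '{"text": "北京市朝阳区", "analyzer": "ik_max_word"}'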
# Coarse-grained tokenization (ik_smart): emits the fewest, longest non-overlapping words
GET _analyze
{
  "text": "北京市朝阳区",
  "analyzer": "ik_smart"
}
Response:
{
  "tokens" : [
    {
      "token" : "北京市",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "朝阳区",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}
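In practice the two analyzers are often combined: index with ik_max_word for better recall and search with ik_smart for more precise queries. A minimal mapping sketch (the index name my_index and field name content are chosen here for illustration):
PUT my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}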
3. Custom Dictionaries
Step 1: Create an es folder under nginx's html directory, and inside it create a file named fenci.txt
Step 2: Write your custom words into fenci.txt, one word per line, for example 乔碧罗 and 小阿峰
Step 3: Edit /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml and point remote_ext_dict at that file (the full config is sketched after this list):
<entry key="remote_ext_dict">http://nginx_ip/es/fenci.txt</entry>
Step 4: Restart the Elasticsearch Docker container (docker restart elasticsearch)
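For reference, the complete IKAnalyzer.cfg.xml that ships with the plugin looks roughly like this; only the remote_ext_dict entry needs to change, and the local ext_dict/ext_stopwords entries can stay empty unless you also use local extension files:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- local extension dictionaries, semicolon-separated -->
    <entry key="ext_dict"></entry>
    <!-- local extension stopword dictionaries -->
    <entry key="ext_stopwords"></entry>
    <!-- remote extension dictionary, one word per line -->
    <entry key="remote_ext_dict">http://nginx_ip/es/fenci.txt</entry>
    <!-- remote extension stopword dictionary -->
    <entry key="remote_ext_stopwords"></entry>
</properties>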
fenci.txt (one word per line):
乔碧罗
小阿峰
Test:
GET _analyze
{
  "text": "乔碧罗殿下喜欢小阿峰",
  "analyzer": "ik_max_word"
}
Response (乔碧罗 and 小阿峰 are now each recognized as a single CN_WORD instead of being split into individual characters):
{
  "tokens" : [
    {
      "token" : "乔碧罗",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "殿下",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "喜欢",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "小阿峰",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}
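After this initial restart, further dictionary changes do not require one: the IK plugin polls the remote_ext_dict URL periodically and hot-reloads the word list when the file's Last-Modified or ETag response header changes, so nginx must return those headers. A quick sanity check of the endpoint (nginx_ip stands for your nginx host):
curl -I http://nginx_ip/es/fenci.txt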