Installing the elasticsearch-analysis-ik analyzer plugin on an ES cluster (direct unzip method)

Environment:
OS: CentOS 7
ES: 6.8.5
Topology: 3-node cluster

1. Download

https://github.com/medcl/elasticsearch-analysis-ik
The plugin version must match the ES version. My ES is 6.8.5, so download the matching release:
elasticsearch-analysis-ik-6.8.5.zip
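For reference, the release asset can be fetched straight from GitHub; a sketch (the URL below follows the project's usual release layout and is an assumption, so verify it on the releases page):

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.5/elasticsearch-analysis-ik-6.8.5.zip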

 

2. Unzip and deploy

[root@localhost soft]# mkdir ik
[root@localhost soft]# mv elasticsearch-analysis-ik-6.8.5.zip ./ik
[root@localhost soft]# cd ik
[root@localhost ik]# unzip elasticsearch-analysis-ik-6.8.5.zip
[root@localhost ik]# rm elasticsearch-analysis-ik-6.8.5.zip ## remove the original archive
[root@localhost ik]# cd ..
[root@localhost soft]# mv ik /usr/local/services/elasticsearch/plugins/ ## move the unzipped directory into the ES plugins directory
[root@localhost soft]# cd /usr/local/services/elasticsearch
[root@localhost elasticsearch]# chown -R elasticsearch:elasticsearch ./plugins ## fix ownership
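Before moving on, a quick sanity check that the layout is what ES expects (the exact jar list varies by version, so the expected output is approximate):

[root@localhost elasticsearch]# ls ./plugins/ik
## expect roughly: config/  elasticsearch-analysis-ik-6.8.5.jar  plugin-descriptor.properties  plugin-security.policy  plus a few dependency jars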

 

3. Copy to the other nodes
su - elasticsearch
cd /usr/local/services/elasticsearch/plugins
scp -r ik elasticsearch@192.168.1.103:/usr/local/services/elasticsearch/plugins/
scp -r ik elasticsearch@192.168.1.105:/usr/local/services/elasticsearch/plugins/

If the copy was done as a user other than elasticsearch, remember to fix ownership on the destination node:
cd /usr/local/services/elasticsearch
chown -R elasticsearch:elasticsearch ./plugins
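Optionally verify the plugin landed on every node; a sketch assuming key-based SSH as the elasticsearch user (node IPs as above):

for h in 192.168.1.103 192.168.1.105; do
  ssh elasticsearch@$h "ls /usr/local/services/elasticsearch/plugins/ik | head -3"
done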

 

4. Restart the ES cluster

Kill the ES process on each node, then start it again:
[elasticsearch@localhost bin]$ ./elasticsearch -d
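A per-node restart sketch (the pgrep pattern assumes a single ES instance per node; restart nodes one at a time so the cluster stays available):

PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)   ## find the ES JVM
kill $PID                                                   ## graceful shutdown (SIGTERM)
while kill -0 $PID 2>/dev/null; do sleep 1; done            ## wait for it to exit
su - elasticsearch -c "/usr/local/services/elasticsearch/bin/elasticsearch -d"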

 

5. Verify the plugin is loaded
[root@localhost plugins]# curl -u elastic:elastic -X GET "http://192.168.1.103:19200/_cat/plugins?v&s=component&h=name,component,version,description&pretty"
name component version description
node104 analysis-ik 6.8.5 IK Analyzer for Elasticsearch
node103 analysis-ik 6.8.5 IK Analyzer for Elasticsearch
node101 analysis-ik 6.8.5 IK Analyzer for Elasticsearch
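All three nodes should report analysis-ik. The same information is available from the nodes info API:

curl -u elastic:elastic "http://192.168.1.103:19200/_nodes/plugins?pretty" | grep -B1 -A2 '"analysis-ik"'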

 

6. Use the analyzers

ik_smart produces a coarse-grained segmentation (the fewest tokens), while ik_max_word exhaustively emits every word it can recognize. Test both via the _analyze API.

ik_smart analysis:
curl -u elastic:elastic -X POST "192.168.1.103:19200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "ik_smart",
  "text": "我爱你,特靠谱"
}
'


ik_max_word analysis:
curl -u elastic:elastic -X POST "192.168.1.103:19200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "ik_max_word",
  "text": "我爱你,特靠谱"
}
'
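Once both analyzers behave as expected, they can be wired into an index mapping. A minimal sketch for ES 6.8 (the index name ik_test and field content are hypothetical; pairing ik_max_word at index time with ik_smart at search time is a common choice):

curl -u elastic:elastic -X PUT "192.168.1.103:19200/ik_test?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }
}
'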

 

7. Custom dictionary

Perform this on every node in the cluster.
cd /usr/local/services/elasticsearch/plugins/ik/config ## enter the config directory
cp suffix.dic mytest.dic ## copy a shipped dictionary as a template
Delete the copied content and add the new words, one word per line (the file must stay UTF-8 encoded):
[root@localhost config]# more mytest.dic
特靠谱
靠谱
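The dictionary must be identical on every node, so a quick consistency check from one host may help (node IPs as above):

md5sum /usr/local/services/elasticsearch/plugins/ik/config/mytest.dic   ## local checksum
for h in 192.168.1.103 192.168.1.105; do
  ssh elasticsearch@$h "md5sum /usr/local/services/elasticsearch/plugins/ik/config/mytest.dic"
done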

 

8. Register the custom dictionary

Perform this on every node in the cluster.
cd /usr/local/services/elasticsearch/plugins/ik/config
Point the ext_dict entry in IKAnalyzer.cfg.xml at the new dictionary file:

[elasticsearch@localhost config]$ more IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- configure your own extension dictionary here -->
        <entry key="ext_dict">mytest.dic</entry>
        <!-- configure your own extension stopword dictionary here -->
        <entry key="ext_stopwords"></entry>
        <!-- configure a remote extension dictionary here -->
        <!-- <entry key="remote_ext_dict">words_location</entry> -->
        <!-- configure a remote extension stopword dictionary here -->
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
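As the commented entries hint, IK can also poll a remote dictionary over HTTP and reload it when the Last-Modified or ETag response header changes, which avoids restarts for word updates. A hypothetical sketch (the URL is an assumption; it must be reachable from all nodes):

<!-- serve mytest.dic from any web server and let IK poll it -->
<entry key="remote_ext_dict">http://192.168.1.101/ik/mytest.dic</entry>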

 

9. Restart ES on every node

Same procedure as step 4; changes to local ext_dict files only take effect after a restart.

 

10. Re-test the ik_smart and ik_max_word analyzers

[root@localhost config]# curl -u elastic:elastic -X POST "192.168.1.103:19200/_analyze?pretty" -H 'Content-Type: application/json' -d'
> {
>   "analyzer": "ik_smart",
>   "text": "我爱你,特靠谱"
> }
> '
{
  "tokens" : [
    {
      "token" : "我爱你",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "特靠谱",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}
[root@localhost config]# curl -u elastic:elastic -X POST "192.168.1.103:19200/_analyze?pretty" -H 'Content-Type: application/json' -d'
> {
>   "analyzer": "ik_max_word",
>   "text": "我爱你,特靠谱"
> }
> '
{
  "tokens" : [
    {
      "token" : "我爱你",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "爱你",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "特靠谱",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "靠谱",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 5
    }
  ]
}

 

"特靠谱" and "靠谱" now come out as standalone tokens, confirming that the custom dictionary took effect.

 
