Installing the elasticsearch-analysis-ik tokenizer plugin on an ES cluster (direct-unzip method)
Environment:
OS: CentOS 7
ES: 6.8.5
Topology: 3-node cluster
1. Download
https://github.com/medcl/elasticsearch-analysis-ik
The plugin version must match the ES version exactly. My ES here is 6.8.5, so download the corresponding release of the plugin:
elasticsearch-analysis-ik-6.8.5.zip
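If the server has internet access, the archive can be fetched directly. The URL below follows the project's usual release naming pattern; verify it against the releases page before relying on it:

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.5/elasticsearch-analysis-ik-6.8.5.zip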
2. Unzip and deploy
[root@localhost soft]# mkdir ik
[root@localhost soft]# mv elasticsearch-analysis-ik-6.8.5.zip ./ik
[root@localhost soft]# cd ik
[root@localhost ik]# unzip elasticsearch-analysis-ik-6.8.5.zip
[root@localhost ik]# rm elasticsearch-analysis-ik-6.8.5.zip ## remove the original archive
[root@localhost ik]# cd ..
[root@localhost soft]# mv ik /usr/local/services/elasticsearch/plugins/ ## move the extracted directory into the ES plugins directory
[root@localhost soft]# cd /usr/local/services/elasticsearch
[root@localhost elasticsearch]# chown -R elasticsearch:elasticsearch ./plugins ## fix ownership
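Note: instead of unzipping by hand, ES 6.x also ships an elasticsearch-plugin tool that can install the plugin from a URL in one step. A sketch, assuming the same release URL as above:

/usr/local/services/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.5/elasticsearch-analysis-ik-6.8.5.zip

This unpacks the plugin into the plugins directory automatically; use one approach or the other, not both.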
3. Copy to the other nodes
su - elasticsearch
cd /usr/local/services/elasticsearch/plugins
scp -r ik elasticsearch@192.168.1.103:/usr/local/services/elasticsearch/plugins/
scp -r ik elasticsearch@192.168.1.105:/usr/local/services/elasticsearch/plugins/
If the copy was not done as the elasticsearch user, fix the ownership on the destination nodes afterwards:
cd /usr/local/services/elasticsearch
chown -R elasticsearch:elasticsearch ./plugins
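Before restarting, it is worth confirming that the plugin directory looks the same on every node, for example:

ls -l /usr/local/services/elasticsearch/plugins/ik    ## should list the plugin jars and the config directory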
4. Restart the ES cluster
On each node, stop the ES process and then start it again.
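One way to find and stop the process cleanly; <pid> below is a placeholder for the actual process ID:

ps -ef | grep [e]lasticsearch    ## locate the ES process (the [e] keeps grep itself out of the output)
kill <pid>                       ## plain kill sends SIGTERM, which ES handles as a graceful shutdown

Then start ES again from the bin directory: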
[elasticsearch@localhost bin]$ ./elasticsearch -d
5. Verify the plugin
[root@localhost plugins]# curl -u elastic:elastic -X GET "http://192.168.1.103:19200/_cat/plugins?v&s=component&h=name,component,version,description&pretty"
name component version description
node104 analysis-ik 6.8.5 IK Analyzer for Elasticsearch
node103 analysis-ik 6.8.5 IK Analyzer for Elasticsearch
node101 analysis-ik 6.8.5 IK Analyzer for Elasticsearch
6. Use the analyzers
ik_smart tokenization:
curl -u elastic:elastic -X POST "192.168.1.103:19200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
"analyzer": "ik_smart",
"text": "我爱你,特靠谱"
}
'
ik_max_word tokenization:
curl -u elastic:elastic -X POST "192.168.1.103:19200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
"analyzer": "ik_max_word",
"text": "我爱你,特靠谱"
}
'
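In practice the analyzers are usually applied through an index mapping rather than called ad hoc. A minimal sketch, following the common convention of ik_max_word at index time and ik_smart at search time (the index name test_ik and the field content are made up for illustration):

curl -u elastic:elastic -X PUT "192.168.1.103:19200/test_ik?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }
}
'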
7. Custom word dictionary
This step must be performed on every node in the cluster.
cd /usr/local/services/elasticsearch/plugins/ik/config ## enter the plugin's config directory
cp suffix.dic mytest.dic ## copy an existing dictionary file as a template
Delete the original contents and add the new entries:
[root@localhost config]# more mytest.dic
特靠谱
靠谱
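IK dictionary files are plain UTF-8 text with one term per line. A quick sanity check of the encoding:

file mytest.dic    ## should report UTF-8 Unicode (or plain ASCII) text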
8. Configure the custom dictionary
This step must also be performed on every node in the cluster.
cd /usr/local/services/elasticsearch/plugins/ik/config
Point the ext_dict entry in IKAnalyzer.cfg.xml at the custom dictionary file:
[elasticsearch@localhost config]$ more IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extended configuration</comment>
<!-- users can configure their own extension dictionary here -->
<entry key="ext_dict">mytest.dic</entry>
<!-- users can configure their own extension stopword dictionary here -->
<entry key="ext_stopwords"></entry>
<!-- users can configure a remote extension dictionary here -->
<!-- <entry key="remote_ext_dict">words_location</entry> -->
<!-- users can configure a remote extension stopword dictionary here -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
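Side note: local dictionary changes only take effect after a restart (step 9). If that is too disruptive, the commented-out remote_ext_dict entry can point at an HTTP URL instead; according to the project's README, IK polls remote dictionaries periodically, so they can be updated without restarting the nodes. A hypothetical entry (the URL is an assumption):

<entry key="remote_ext_dict">http://192.168.1.103/ik/mytest.dic</entry>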
9. Restart ES on every node
10. Re-test the ik_smart and ik_max_word analyzers
[root@localhost config]# curl -u elastic:elastic -X POST "192.168.1.103:19200/_analyze?pretty" -H 'Content-Type: application/json' -d'
> {
> "analyzer": "ik_smart",
> "text": "我爱你,特靠谱"
> }
> '
{
"tokens" : [
{
"token" : "我爱你",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "特靠谱",
"start_offset" : 4,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 1
}
]
}
[root@localhost config]# curl -u elastic:elastic -X POST "192.168.1.103:19200/_analyze?pretty" -H 'Content-Type: application/json' -d'
> {
> "analyzer": "ik_max_word",
> "text": "我爱你,特靠谱"
> }
> '
{
"tokens" : [
{
"token" : "我爱你",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "爱你",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "特靠谱",
"start_offset" : 4,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "特",
"start_offset" : 4,
"end_offset" : 5,
"type" : "CN_CHAR",
"position" : 4
},
{
"token" : "靠谱",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 5
}
]
}
发现"特靠谱"和"靠谱" 单独作为一个分词了