Chapter 1, Section 16: The Elasticsearch IK Analyzer
1. Installing the IK Analyzer in Docker
Step 1: Download the IK analyzer release whose version matches your Elasticsearch version (7.4.2 here):
https://github.com/medcl/elasticsearch-analysis-ik/releases/
Step 2: Upload the archive to the server and unzip it into an ik folder:
unzip elasticsearch-analysis-ik-7.4.2.zip -d ik/
Step 3: Move the ik folder into Elasticsearch's mounted plugins directory:
mv ik /mydata/elasticsearch/plugins/
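If you are not sure the plugins directory is really mounted into the container, you can inspect the container's mounts first (a quick check, assuming the container is named elasticsearch, as in this series):
docker inspect --format '{{json .Mounts}}' elasticsearch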
Step 4: Grant permissions on the ik folder (777 is convenient for a demo, but overly permissive for production):
chmod -R 777 /mydata/elasticsearch/plugins/ik
Step 5: Restart the Elasticsearch Docker container:
docker restart elasticsearch
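To confirm that Elasticsearch picked the plugin up, list the installed plugins inside the container (again assuming the container is named elasticsearch):
docker exec -it elasticsearch /usr/share/elasticsearch/bin/elasticsearch-plugin list
# the ik plugin should appear in the output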
2. Using the IK Analyzer
# Fine-grained tokenization (ik_max_word): emits every plausible word, including overlapping ones
GET _analyze
{
  "text": "北京市朝阳区",
  "analyzer": "ik_max_word"
}
Response:
{
  "tokens" : [
    {
      "token" : "北京市",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "北京",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "市",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "朝阳区",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "朝阳",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "区",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "CN_CHAR",
      "position" : 5
    }
  ]
}
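The same request can be issued outside Kibana Dev Tools with curl (a minimal sketch, assuming Elasticsearch listens on localhost:9200):
curl -s -H 'Content-Type: application/json' 'http://localhost:9200/_analyze?pretty' -d '{"text": "北京市朝阳区", "analyzer": "ik_max_word"}'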
# Coarse-grained tokenization (ik_smart): emits the fewest, longest non-overlapping words
GET _analyze
{
  "text": "北京市朝阳区",
  "analyzer": "ik_smart"
}
Response:
{
  "tokens" : [
    {
      "token" : "北京市",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "朝阳区",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}
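In practice the two analyzers are often combined: index with ik_max_word for better recall and search with ik_smart for more precise queries. A minimal mapping sketch (the index name my_index and field name content are chosen here for illustration):
PUT my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}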
3. Custom Dictionaries
Step 1: Create an es folder under nginx's html directory, and inside it create a file named fenci.txt
Step 2: Write your custom words into fenci.txt, one word per line, for example 乔碧罗 and 小阿峰
Step 3: Edit /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml and point remote_ext_dict at that file (the full config is sketched after this list):
<entry key="remote_ext_dict">http://nginx_ip/es/fenci.txt</entry>
Step 4: Restart the Elasticsearch Docker container (docker restart elasticsearch)
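For reference, the complete IKAnalyzer.cfg.xml that ships with the plugin looks roughly like this; only the remote_ext_dict entry needs to change, and the local ext_dict/ext_stopwords entries can stay empty unless you also use local extension files:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- local extension dictionaries, semicolon-separated -->
    <entry key="ext_dict"></entry>
    <!-- local extension stopword dictionaries -->
    <entry key="ext_stopwords"></entry>
    <!-- remote extension dictionary, one word per line -->
    <entry key="remote_ext_dict">http://nginx_ip/es/fenci.txt</entry>
    <!-- remote extension stopword dictionary -->
    <entry key="remote_ext_stopwords"></entry>
</properties>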
fenci.txt (one word per line):
乔碧罗
小阿峰
Test:
GET _analyze
{
  "text": "乔碧罗殿下喜欢小阿峰",
  "analyzer": "ik_max_word"
}
Response (乔碧罗 and 小阿峰 are now each recognized as a single CN_WORD instead of being split into individual characters):
{
  "tokens" : [
    {
      "token" : "乔碧罗",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "殿下",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "喜欢",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "小阿峰",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}
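After this initial restart, further dictionary changes do not require one: the IK plugin polls the remote_ext_dict URL periodically and hot-reloads the word list when the file's Last-Modified or ETag response header changes, so nginx must return those headers. A quick sanity check of the endpoint (nginx_ip stands for your nginx host):
curl -I http://nginx_ip/es/fenci.txt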