elasticsearch: Installing the IK Chinese analyzer (es 8.14.2)

I. Testing the analyze command:

1. List the installed plugins:

[lhdop@blog ~]$ curl -X GET "localhost:9200/_cat/plugins?v&s=component"
name component version

2. The standard analyzer:

[lhdop@blog ~]$ curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "standard",
  "text": "Text to analyze"
}
'
{
  "tokens" : [
    {
      "token" : "text",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "to",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "analyze",
      "start_offset" : 8,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}

3. Install the smartcn analysis plugin from the command line:

[lhdop@blog bin]$ ./elasticsearch-plugin install analysis-smartcn
warning: ignoring JAVA_HOME=/usr/local/soft/jdk-17.0.11; using ES_JAVA_HOME
-> Installing analysis-smartcn
-> Downloading analysis-smartcn from elastic
[=================================================] 100%  
-> Installed analysis-smartcn
-> Please restart Elasticsearch to activate any plugins installed

4. smartcn is installed under the plugins directory; list its files:

[lhdop@blog elasticsearch-8.14.2]$ ls plugins/analysis-smartcn/
analysis-smartcn-8.14.2.jar  lucene-analysis-smartcn-9.10.0.jar  plugin-descriptor.properties

After installation, the Elasticsearch service must be restarted for the plugin to take effect.

Stop:

[root@blog ~]# kill 260903

Start:

[root@blog ~]# /usr/local/soft/elasticsearch-8.14.2/bin/elasticsearch -d 
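The restart above looks the PID up by hand. A minimal sketch of the same stop/start cycle follows, assuming the node is started with the -p option so that Elasticsearch writes its PID to a file, and that HTTP answers without authentication at localhost:9200 as in the curl calls of this post. ES_HOME, PID_FILE, wait_until and restart_es are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Assumed paths; adjust to the local install.
ES_HOME=/usr/local/soft/elasticsearch-8.14.2
PID_FILE="$ES_HOME/es.pid"

# Retry a command once per second until it succeeds or the attempts run out.
wait_until() {
  attempts=$1
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Stop the node recorded in the PID file (if any), then start it again as a daemon.
# Run this as the unprivileged user that owns ES_HOME.
restart_es() {
  if [ -f "$PID_FILE" ]; then
    pkill -F "$PID_FILE"
    # Wait for the old process to exit before starting a new one.
    wait_until 60 sh -c "! kill -0 \$(cat '$PID_FILE') 2>/dev/null"
  fi
  "$ES_HOME/bin/elasticsearch" -d -p "$PID_FILE"
  # Block until the HTTP port answers, so later curl calls do not race the startup.
  wait_until 60 curl -fsS "localhost:9200"
}
```

Calling restart_es after each plugin install replaces the manual kill and start steps. The first run only works if the node was already started with -p; otherwise stop it by PID once, as shown above.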

5. Try the smartcn analyzer. The result is not ideal: it splits '海鲜味' (seafood flavor) into two tokens, '海' and '鲜味'.

[lhdop@blog elasticsearch-8.14.2]$ curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "smartcn",
  "text": "这是一碗海鲜味方便面"
}
'
{
  "tokens" : [
    {
      "token" : "这",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "一",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "碗",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "海",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "鲜味",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "方便面",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "word",
      "position" : 6
    }
  ]
}

6. List the installed plugins again; the newly installed smartcn plugin now appears:

[lhdop@blog elasticsearch-8.14.2]$ curl -X GET "localhost:9200/_cat/plugins?v&s=component"
name                    component        version
iZ2zejc9t0hf6pnw6sewrxZ analysis-smartcn 8.14.2

II. Installing the IK analysis plugin

1. GitHub releases:

https://github.com/infinilabs/analysis-ik/releases

2. Official download site:

https://release.infinilabs.com/analysis-ik/stable/

3. Check the local ES version:

[lhdop@blog ~]$ /usr/local/soft/elasticsearch-8.14.2/bin/elasticsearch --version
warning: ignoring JAVA_HOME=/usr/local/soft/jdk-17.0.11; using ES_JAVA_HOME
Version: 8.14.2, Build: tar/2afe7caceec8a26ff53817e5ed88235e90592a1b/2024-07-01T22:06:58.515911606Z, JVM: 17.0.11

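4. Install the build that matches the local Elasticsearch version:

Note: the IK version must correspond exactly to the ES version; otherwise installation or startup may fail.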

[lhdop@blog elasticsearch-8.14.2]$ bin/elasticsearch-plugin install https://get.infini.cloud/elasticsearch/analysis-ik/8.14.2
warning: ignoring JAVA_HOME=/usr/local/soft/jdk-17.0.11; using ES_JAVA_HOME
-> Installing https://get.infini.cloud/elasticsearch/analysis-ik/8.14.2
-> Downloading https://get.infini.cloud/elasticsearch/analysis-ik/8.14.2
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.net.SocketPermission * connect,resolve
See https://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]y
-> Installed analysis-ik
-> Please restart Elasticsearch to activate any plugins installed
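Because the plugin build has to match the node version exactly, one way to avoid typing the version twice is to derive the download URL from the --version output shown in step 3. The sketch below assumes that output keeps the "Version: X.Y.Z, Build: ..." shape, and that the URL pattern used above also holds for other versions, which this post does not verify. es_version and ik_plugin_url are hypothetical helper names.

```shell
#!/bin/sh
# Extract "8.14.2" from a line like "Version: 8.14.2, Build: tar/..., JVM: 17.0.11".
# Lines that do not start with "Version: " (such as the JAVA_HOME warning) are ignored.
es_version() {
  sed -n 's/^Version: \([0-9][0-9.]*\),.*/\1/p'
}

# Build the IK download URL for a given Elasticsearch version.
ik_plugin_url() {
  printf 'https://get.infini.cloud/elasticsearch/analysis-ik/%s\n' "$1"
}

# Example (not run here), from the Elasticsearch home directory:
#   v=$(bin/elasticsearch --version | es_version)
#   bin/elasticsearch-plugin install "$(ik_plugin_url "$v")"
```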

5. Restart the service:

Stop:

[root@blog ~]# kill 264687

Start:

[root@blog ~]# /usr/local/soft/elasticsearch-8.14.2/bin/elasticsearch -d 

6. After installation, check the plugin list:

[lhdop@blog elasticsearch-8.14.2]$ ./bin/elasticsearch-plugin list
warning: ignoring JAVA_HOME=/usr/local/soft/jdk-17.0.11; using ES_JAVA_HOME
analysis-ik
analysis-smartcn

 

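III. Testing the results

1. Two analysis modes

IK Chinese analysis results

The IK plugin provides two analyzers, ik_smart and ik_max_word:

ik_smart - coarse-grained segmentation
ik_max_word - fine-grained segmentation that enumerates as many candidate terms as possible, so it emits more tokens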
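A common way to combine the two analyzers, following the pattern shown in the analysis-ik README, is to index with ik_max_word and search with ik_smart. The sketch below is not part of the original walkthrough: the index name demo_ik and the field name content are made up for illustration, and it assumes the same unauthenticated localhost:9200 endpoint used elsewhere in this post.

```shell
#!/bin/sh
# JSON body for an index whose "content" field is indexed with ik_max_word
# (fine-grained) and queried with ik_smart (coarse-grained).
ik_index_body() {
  cat <<'EOF'
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
EOF
}

# Create the hypothetical index demo_ik with that mapping.
create_ik_index() {
  ik_index_body | curl -X PUT "localhost:9200/demo_ik?pretty" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

Indexing with the fine-grained analyzer stores more terms per document, while analyzing the query coarsely keeps it from matching on every small fragment. Whether that trade-off suits a given data set is worth checking with the _analyze calls below.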

2. Test the ik_smart analyzer:

[lhdop@blog elasticsearch-8.14.2]$ curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "ik_smart",
  "text": "这是一碗海鲜味方便面"
}
'
{
  "tokens" : [
    {
      "token" : "这是",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "一碗",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "海",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "鲜味",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "方便面",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
} 

3. Test the ik_max_word analyzer:

[lhdop@blog elasticsearch-8.14.2]$ curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "ik_max_word",
  "text": "这是一碗海鲜味方便面"
}'
{
  "tokens" : [
    {
      "token" : "这是",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "一碗",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "一",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "TYPE_CNUM",
      "position" : 2
    },
    {
      "token" : "碗",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "COUNT",
      "position" : 3
    },
    {
      "token" : "海鲜",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "鲜味",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "方便面",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "方便",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "面",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "CN_CHAR",
      "position" : 8
    }
  ]
}
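To compare the analyzers side by side without reading the full JSON, the token strings can be pulled out of each _analyze response. This is a rough sketch that uses grep and sed rather than a real JSON parser, so it assumes that neither the input text nor the tokens contain double quotes. tokens_of and compare_analyzers are hypothetical helper names.

```shell
#!/bin/sh
# Read an _analyze response on stdin and print its tokens on one line.
tokens_of() {
  grep -o '"token" *: *"[^"]*"' |
    sed 's/^"token" *: *"\(.*\)"$/\1/' |
    paste -sd ' ' -
}

# Run the same text through each analyzer and print "analyzer: tokens".
compare_analyzers() {
  text=$1
  for analyzer in smartcn ik_smart ik_max_word; do
    printf '%s: ' "$analyzer"
    curl -s -X GET "localhost:9200/_analyze" \
      -H 'Content-Type: application/json' \
      -d "{\"analyzer\": \"$analyzer\", \"text\": \"$text\"}" | tokens_of
  done
}
```

Usage: compare_analyzers "这是一碗海鲜味方便面". Fed the ik_smart response shown in step 2, tokens_of prints: 这是 一碗 海 鲜味 方便面.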

 

IV. Check the ES version

[lhdop@blog ~]$ /usr/local/soft/elasticsearch-8.14.2/bin/elasticsearch --version
warning: ignoring JAVA_HOME=/usr/local/soft/jdk-17.0.11; using ES_JAVA_HOME
Version: 8.14.2, Build: tar/2afe7caceec8a26ff53817e5ed88235e90592a1b/2024-07-01T22:06:58.515911606Z, JVM: 17.0.11

 

posted @ 2024-07-11 11:17  刘宏缔的架构森林