Elasticsearch Text analysis

This document introduces analyzers: how to use the built-in analyzers and how to define custom ones.

Concepts

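An analyzer is usually made up of the following parts:

Character filters: zero or more. They transform the raw character stream before tokenization, for example by converting or removing characters.

Tokenizer: exactly one. For example, a whitespace tokenizer splits "Quick brown fox!" into [Quick, brown, fox!].

Token filters: zero or more. For example, converting tokens to lowercase.

Analyzers are applied at both index time and search time. The two stages usually use the same analyzer, but they can be configured to use different ones.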
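The three stages described above can be tried out together without creating an index. The request below is a minimal sketch that passes one built-in component of each kind to the _analyze API; the particular choice of html_strip, whitespace and lowercase, and the sample text, are illustrative assumptions rather than part of the original post.

```
POST _analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer":   "whitespace",
  "filter":      [ "lowercase" ],
  "text":        "<b>Quick</b> brown fox!"
}
```

Here the character filter strips the HTML tags first, the tokenizer then splits on whitespace, and the token filter finally lowercases each token, so the response should contain the tokens quick, brown and fox!.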

Configure text analysis

Before using an analyzer, you can test whether it produces the tokens you expect:

POST _analyze
{
  "analyzer": "whitespace",
  "text":     "The quick brown fox."
}

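Using a built-in analyzer. The request below defines std_english, a standard analyzer configured with the English stop word list, and applies it to the my_text.english sub-field, while my_text itself uses the plain standard analyzer: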

PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_english": { 
          "type":      "standard",
          "stopwords": "_english_"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_text": {
        "type":     "text",
        "analyzer": "standard", 
        "fields": {
          "english": {
            "type":     "text",
            "analyzer": "std_english" 
          }
        }
      }
    }
  }
}
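To check how the two fields of my-index-000001 differ, the _analyze API can be pointed at a field instead of a named analyzer. The sketch below assumes the index has been created as shown above; the sample sentence is only an illustration.

```
POST my-index-000001/_analyze
{
  "field": "my_text",
  "text":  "The old brown cow"
}

POST my-index-000001/_analyze
{
  "field": "my_text.english",
  "text":  "The old brown cow"
}
```

The first request should return the tokens the, old, brown and cow. The second goes through std_english, which removes English stop words, so it should return only old, brown and cow.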

Create a custom analyzer
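When no built-in analyzer fits, a custom one can be assembled from character filters, a tokenizer and token filters in the index settings. The following is a minimal sketch, not a recommended configuration: the index name my-index-000002 and the analyzer name my_custom_analyzer are hypothetical, and the components chosen are just common built-in ones.

```
PUT my-index-000002
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type":        "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer":   "standard",
          "filter":      [ "lowercase", "asciifolding" ]
        }
      }
    }
  }
}
```

Once the index exists, the analyzer can be tested with POST my-index-000002/_analyze by setting "analyzer": "my_custom_analyzer" in the request body.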

Specify an analyzer
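One common way to specify an analyzer is in the field mapping, where the analyzer parameter sets the index-time analyzer and the optional search_analyzer parameter overrides it at search time. The sketch below uses a hypothetical index my-index-000003 and field title; the pairing of whitespace and simple is chosen only to show that the two can differ.

```
PUT my-index-000003
{
  "mappings": {
    "properties": {
      "title": {
        "type":            "text",
        "analyzer":        "whitespace",
        "search_analyzer": "simple"
      }
    }
  }
}
```

If search_analyzer is omitted, the analyzer set on the field is used for both indexing and searching, which matches the usual case of using the same analyzer in both stages.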

Anatomy of an analyzer

posted on 2021-11-11 10:28 icodegarden