Elasticsearch index mapping parameters

Mapping parameters provided by Elasticsearch (7.15)

Official documentation: Mapping parameters | Elasticsearch Guide [7.15] | Elastic

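Parameter	Description	Notes
analyzer	Specifies the analyzer used for text analysis when a text field is indexed or searched. It applies at both index time and query time unless search_analyzer overrides it.	Only text fields support this parameter.
boost	Automatically boosts the field's contribution to the relevance score at query time.	The boost applies only to term queries (prefix, range and fuzzy queries are not boosted).
coerce	Attempts to clean up dirty values so that they fit the field's data type.	Strings are coerced to numbers; floating-point values are truncated for integer fields. Default: true.
copy_to	Lets you build a custom _all-style field: the values of several fields are copied into a group field, which can then be queried as a single field.	_source is not modified. The same value can be copied to multiple fields.
doc_values	The inverted index makes searches very fast, but it is not an ideal structure for sorting on field values. `Doc Values` are essentially a serialized column-oriented store, a structure well suited to aggregations, sorting and scripting.	Doc values are enabled by default on every field type that supports them, which is everything except text fields (the former "analyzed strings"): numeric, geo-point, date, IP and keyword (formerly not_analyzed string) fields all have them. Doc values are very useful, but if you are sure a field will never be sorted, aggregated or scripted on, disabling them saves disk space and may speed up indexing.
dynamic	Controls dynamic mapping. By default, simply indexing a document that contains a new field adds that field to the document's mapping or to an inner object within it.	true: newly detected fields are added to the mapping (default). runtime: newly detected fields are added to the mapping as runtime fields. false: newly detected fields are ignored; they are not indexed and therefore not searchable, but they still appear in the `_source` of returned hits. They are not added to the mapping, so new fields must be added explicitly. strict: if a new field is detected, an exception is thrown and the document is rejected.
eager_global_ordinals	Whether global ordinals are built eagerly at refresh time instead of lazily on first use.	true/false (default)
enabled	Use when you only want to store a field without indexing it.	The `enabled` setting can be applied only to the top-level mapping definition and to `object` fields. It makes Elasticsearch skip parsing the field's contents entirely. The JSON can still be retrieved from the `_source` field, but it is not searchable or stored in any other way.
fielddata	Whether a text field is loaded into memory for aggregations, sorting and similar operations.	Fielddata is disabled on text fields by default: they can be searched, but not used for aggregations, sorting or scripting. Setting fielddata=true loads fielddata into memory by uninverting the inverted index. Note that this can consume a lot of memory.
fields	Often used to index the same field in different ways (multi-fields).	For example, a string field can be mapped as a text field for full-text search and as a keyword field for sorting and aggregations.
format	Field format.	Mainly used for date fields. In a JSON document a date is represented as a string; ES parses that string with a preconfigured set of formats into a long holding milliseconds since the epoch.
ignore_above	Strings longer than this setting are neither indexed nor stored.	For arrays, the limit applies to each element separately. Length is measured in characters.
ignore_malformed	Whether malformed field values are ignored.	Default false: malformed data throws an exception and the whole document is rejected. When enabled, the malformed field is not indexed, while the other fields of the document are processed normally.
index_options	Controls what information is added to the inverted index for search and highlighting.	docs: only the doc number is indexed. freqs: doc number and term frequencies. positions (default): doc number, term frequencies and term positions (order). offsets: doc number, term frequencies, positions, and start and end character offsets. The parameter applies only to text fields.
index_phrases	If enabled, two-term word combinations (shingles) are indexed into a separate field. This allows exact phrase queries (no slop) to run more efficiently, at the expense of a larger index. Note that this works best when stopwords are not removed, as phrases containing stopwords will not use the subsidiary field and will fall back to a standard phrase query.	true/false (default)
index_prefixes	Enables indexing of term prefixes to speed up prefix searches.	min_chars: the minimum prefix length to index; must be greater than 0, default 2. max_chars: the maximum prefix length to index; must be less than 20, default 5.
index	Whether the field is indexed and therefore searchable.	Default true. When false, the field is not indexed and cannot be queried; its value is still kept in `_source`.
meta	Metadata.	Metadata attached to the field, opaque to ES. At most 5 entries; keys may be at most 20 characters long and values must be strings of at most 50 characters.
normalizer	The keyword-field counterpart of analyzer.	Similar to an analyzer, except that it guarantees the analysis chain produces a single token.
norms	Norms store various normalization factors that are later used at query time in order to compute the score of a document relative to a query.	true/false
null_value	Explicitly replaces null values with the given value.	A null value cannot be indexed or searched; a field set to null, to an empty array or to an array of nulls is treated as having no values.
position_increment_gap	When indexing text fields with multiple values, a "fake" gap is added between the values to prevent most phrase queries from matching across the values. The size of this gap is configured using `position_increment_gap`.	Default 100
properties	Sub-field definitions of object and nested fields.
search_analyzer	Specifies the analyzer used at search time.	Overrides the analyzer setting.
similarity	Configures the scoring (similarity) algorithm for the field.	BM25 (default), classic (TF/IDF, deprecated since 7.0), boolean
store	Whether the field value is stored separately.	By default, field values are indexed to make them searchable but are not stored. The field can be queried, but its original value cannot be retrieved from the field itself; it is still available through `_source`.
term_vector	Term vectors contain information about the terms produced by the analysis process.	They can include: a list of terms; the position (or order) of each term; the start and end character offsets mapping the term to its origin in the original string; payloads (if they are available), that is, user-defined binary data associated with each term position.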

analyzer example

An analyzer setting for indexing all terms including stop words

A search_analyzer setting for non-phrase queries that will remove stop words

A search_quote_analyzer setting for phrase queries that will not remove stop words

PUT my-index-000001
{
   "settings":{
      "analysis":{
         "analyzer":{
            "my_analyzer":{ 
               "type":"custom",
               "tokenizer":"standard",
               "filter":[
                  "lowercase"
               ]
            },
            "my_stop_analyzer":{ 
               "type":"custom",
               "tokenizer":"standard",
               "filter":[
                  "lowercase",
                  "english_stop"
               ]
            }
         },
         "filter":{
            "english_stop":{
               "type":"stop",
               "stopwords":"_english_"
            }
         }
      }
   },
   "mappings":{
      "properties":{
         "title":{
            "type":"text",
            "analyzer":"my_analyzer",
            "search_analyzer":"my_stop_analyzer",
            "search_quote_analyzer":"my_analyzer"
         }
      }
   }
}

PUT my-index-000001/_doc/1
{
   "title":"The Quick Brown Fox"
}

PUT my-index-000001/_doc/2
{
   "title":"A Quick Brown Fox"
}

GET my-index-000001/_search
{
   "query":{
      "query_string":{
         "query":"\"the quick brown fox\"" 
      }
   }
}

boost example

Matches on the title field carry twice the weight of matches on the content field, which, like every other field, has the default boost of 1.0.

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "boost": 2 
      },
      "content": {
        "type": "text"
      }
    }
  }
}
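
The same effect can usually be obtained without touching the mapping by applying the boost in the query itself. The request below is a minimal sketch of that alternative; the query text "quick brown fox" is an illustrative assumption, not something from the mapping above.

```console
POST _search
{
  "query": {
    "match": {
      "title": {
        "query": "quick brown fox",
        "boost": 2
      }
    }
  }
}
```

A query-time boost can be changed from one request to the next, whereas a boost set in the mapping is fixed for the field.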

coerce example

The number_one field will contain the integer 10.

The second document is rejected, because coercion is disabled for number_two.

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "number_one": {
        "type": "integer"
      },
      "number_two": {
        "type": "integer",
        "coerce": false
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "number_one": "10" 
}

PUT my-index-000001/_doc/2
{
  "number_two": "10" 
}

copy_to example

The values of the first_name and last_name fields are copied to the full_name field.

PUT my_index
{
  "mappings": {
    "properties": {
      "first_name": {
        "type": "text",
        "copy_to": "full_name"
      },
      "last_name": {
        "type": "text",
        "copy_to": "full_name"
      },
      "full_name": {
        "type": "text"
      }
    }
  }
}
 
PUT my_index/_doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}
 
GET my_index/_search
{
  "query": {
    "match": {
      "full_name": {
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}
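
doc_values and dynamic example

The table lists doc_values and dynamic without a dedicated example, so here is a minimal sketch of both in one mapping. The field names (status_code, session_id, unmapped_field) are illustrative assumptions. session_id stays searchable but, with doc values disabled, can no longer be used for sorting, aggregations or scripts.

```console
PUT my-index-000001
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "status_code": {
        "type": "keyword"
      },
      "session_id": {
        "type": "keyword",
        "doc_values": false
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "status_code": "200",
  "session_id": "abc123",
  "unmapped_field": "not in the mapping"
}
```

Because dynamic is set to strict, the indexing request should be rejected with a strict_dynamic_mapping_exception: unmapped_field is not declared in the mapping.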

eager_global_ordinals example

PUT my-index-000001/_mapping
{
  "properties": {
    "tags": {
      "type": "keyword",
      "eager_global_ordinals": true
    }
  }
}

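enabled example

Suppose you use Elasticsearch as a web session store. You may want to index the session ID and the last update time, but you do not need to query or run aggregations on the session data itself.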

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "user_id": {
        "type":  "keyword"
      },
      "last_updated": {
        "type": "date"
      },
      "session_data": { 
        "type": "object",
        "enabled": false
      }
    }
  }
}

PUT my-index-000001/_doc/session_1
{
  "user_id": "kimchy",
  "session_data": { 
    "arbitrary_object": {
      "some_array": [ "foo", "bar", { "baz": 2 } ]
    }
  },
  "last_updated": "2015-12-06T18:20:22"
}

PUT my-index-000001/_doc/session_2
{
  "user_id": "jpountz",
  "session_data": "none", 
  "last_updated": "2015-12-06T18:22:13"
}

fielddata example

Use the my_field field for searches.

Use the my_field.keyword field for aggregations, sorting and scripts.

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_field": { 
        "type": "text",
        "fields": {
          "keyword": { 
            "type": "keyword"
          }
        }
      }
    }
  }
}
# Enable fielddata on the text field
PUT my-index-000001/_mapping
{
  "properties": {
    "my_field": { 
      "type":     "text",
      "fielddata": true
    }
  }
}

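fields example

The city.raw field is a keyword version of the city field.

The city field can be used directly for full-text search.

The city.raw field can be used for sorting and aggregations.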

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "raw": { 
            "type":  "keyword"
          }
        }
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "city": "New York"
}

PUT my-index-000001/_doc/2
{
  "city": "York"
}

GET my-index-000001/_search
{
  "query": {
    "match": {
      "city": "york" 
    }
  },
  "sort": {
    "city.raw": "asc" 
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw" 
      }
    }
  }
}
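
format and ignore_above example

The table describes format and ignore_above without an example. The sketch below combines them in one mapping; the field names (created, message) and the sample values are illustrative assumptions. The created field accepts a custom pattern as well as the default formats, separated by `||`.

```console
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "created": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||strict_date_optional_time||epoch_millis"
      },
      "message": {
        "type": "keyword",
        "ignore_above": 20
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "created": "2021-10-14 16:42:00",
  "message": "Syntax error"
}

PUT my-index-000001/_doc/2
{
  "created": 1634229720000,
  "message": "Syntax error with some long stacktrace"
}
```

Both documents are accepted. The message of document 2 is longer than 20 characters, so it is not indexed and a term query or terms aggregation on message will not see it, although the value remains in `_source`.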

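ignore_malformed example

The first document will have its text field indexed, but not its number_one field.

The second document is rejected outright, because number_two contains a malformed value.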

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "number_one": {
        "type": "integer",
        "ignore_malformed": true
      },
      "number_two": {
        "type": "integer"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "text":       "Some text value",
  "number_one": "foo" 
}

PUT my-index-000001/_doc/2
{
  "text":       "Some text value",
  "number_two": "foo" 
}
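
index and index_options example

As a rough sketch of the index and index_options rows in the table: the text field below stores offsets in the inverted index, which the default highlighter can use, while internal_note is not indexed at all. The field names and the sample document are illustrative assumptions.

```console
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "index_options": "offsets"
      },
      "internal_note": {
        "type": "keyword",
        "index": false
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "text": "Quick brown fox",
  "internal_note": "kept in _source only"
}

GET my-index-000001/_search
{
  "query": {
    "match": {
      "text": "brown fox"
    }
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}
```

The search matches on text and returns highlighted fragments. A query that targets internal_note should be rejected in 7.15, since the field is not indexed.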

index_prefixes example

# An empty settings object uses the defaults (min_chars 2, max_chars 5)
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "body_text": {
        "type": "text",
        "index_prefixes": { }    
      }
    }
  }
}

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "full_name": {
        "type": "text",
        "index_prefixes": {
          "min_chars" : 1,
          "max_chars" : 10
        }
      }
    }
  }
}
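
normalizer and norms example

The table's normalizer and norms rows have no example, so the following is a minimal sketch. The normalizer name my_normalizer and the field names foo and title are illustrative assumptions; lowercase and asciifolding are built-in token filters.

```console
PUT my-index-000001
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      },
      "title": {
        "type": "text",
        "norms": false
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "foo": "BÀR"
}

GET my-index-000001/_search
{
  "query": {
    "term": {
      "foo": "BAR"
    }
  }
}
```

The normalizer is applied both when the document is indexed and when the term query is parsed, so "BÀR" and "BAR" both become "bar" and the query matches document 1. Disabling norms on title saves some space for a field whose relevance scoring does not matter.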

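null_value example

Explicit null values are replaced with the string "NULL". Document 1 therefore matches the term query below; document 2 does not, because an empty array contains no explicit null.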

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "status_code": {
        "type":       "keyword",
        "null_value": "NULL" 
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "status_code": null
}

PUT my-index-000001/_doc/2
{
  "status_code": [] 
}

GET my-index-000001/_search
{
  "query": {
    "term": {
      "status_code": "NULL" 
    }
  }
}
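
position_increment_gap example

To illustrate the position_increment_gap row of the table, the sketch below sets the gap to 0 so that a phrase query can match across two values of the same field. The field name names and the sample values are illustrative assumptions.

```console
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "names": {
        "type": "text",
        "position_increment_gap": 0
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "names": [ "John Abraham", "Lincoln Smith" ]
}

GET my-index-000001/_search
{
  "query": {
    "match_phrase": {
      "names": "Abraham Lincoln"
    }
  }
}
```

With a gap of 0, "Abraham" and "Lincoln" occupy adjacent positions, so the phrase query matches even though the terms come from different array elements. With the default gap of 100 it would not match.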

properties example

PUT my-index-000001
{
  "mappings": {
    "properties": { 
      "manager": {
        "properties": { 
          "age":  { "type": "integer" },
          "name": { "type": "text"  }
        }
      },
      "employees": {
        "type": "nested",
        "properties": { 
          "age":  { "type": "integer" },
          "name": { "type": "text"  }
        }
      }
    }
  }
}

PUT my-index-000001/_doc/1 
{
  "region": "US",
  "manager": {
    "name": "Alice White",
    "age": 30
  },
  "employees": [
    {
      "name": "John Smith",
      "age": 34
    },
    {
      "name": "Peter Brown",
      "age": 26
    }
  ]
}

similarity example

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "default_field": { 
        "type": "text"
      },
      "boolean_sim_field": {
        "type": "text",
        "similarity": "boolean" 
      }
    }
  }
}

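store example

Explicitly storing a field makes sense in two situations: when _source is disabled, or when you do not want field values to be extracted by parsing _source (even though that extraction happens automatically). Keep the cost in mind: reading each stored field can require its own disk access, so retrieving several stored fields means several disk accesses, whereas retrieving several fields from _source needs only one, because _source is itself a single field. In most cases, reading from _source is therefore the faster and more efficient option.

In Elasticsearch, _source is enabled by default.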

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "store": true 
      },
      "date": {
        "type": "date",
        "store": true 
      },
      "content": {
        "type": "text"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "title":   "Some short title",
  "date":    "2015-01-01",
  "content": "A very long content field..."
}

GET my-index-000001/_search
{
  "stored_fields": [ "title", "date" ] 
}
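
term_vector example

The table's term_vector row can be illustrated with a small sketch that stores terms, positions and offsets for a field and then reads them back through the term vectors API. The field name text and the sample document are illustrative assumptions.

```console
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "text": "Quick brown fox"
}

GET my-index-000001/_termvectors/1
{
  "fields": [ "text" ]
}
```

The response lists each term of the text field with its position and its start and end offsets. Storing term vectors with positions and offsets increases the size of the field's index considerably, so it is best reserved for fields that need it.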

Posted 2021-10-14 16:42 by 衰草寒烟