Elasticsearch index mapping parameters
Mapping parameters available in Elasticsearch 7.15
Official documentation: Mapping parameters | Elasticsearch Guide 7.15 | Elastic
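Parameter | Description | Notes |
---|---|---|
analyzer | The analyzer used for text analysis when indexing or searching a text field. It applies at both index time and search time unless overridden by search_analyzer | Only text fields support this parameter |
boost | Automatically boosts the field's contribution to the relevance score at query time | The boost applies only to term queries (prefix, range, and fuzzy queries are not boosted) |
coerce | Tries to clean up dirty values so that they fit the field's data type | Strings are coerced to numbers; floating-point values are truncated to integers |
copy_to | Lets you build a custom _all-style field: the values of several fields are copied into a group field, which can then be queried as a single field | _source is not modified; the same value can be copied to multiple fields |
doc_values | The inverted index is very fast for search but is not an ideal structure for sorting on field values. Doc values are essentially a serialized columnar store, a structure well suited to aggregations, sorting, and scripting | Doc values are enabled by default on every field type that supports them; analyzed text fields do not. Numeric, geo point, date, IP, and keyword (not analyzed) fields therefore have them on by default. If a field never needs sorting, aggregations, or script access, disabling doc values saves disk space and may speed up indexing |
dynamic | Indexing a document that contains a new field dynamically adds that field to the mapping, at the top level or inside an inner object | true (default): newly detected fields are added to the mapping. runtime: newly detected fields are added to the mapping as runtime fields, which are not indexed and are loaded from _source at query time. false: newly detected fields are ignored; they are not indexed and so not searchable, but still appear in the _source of returned hits, and they must be added to the mapping explicitly. strict: if a new field is detected, an exception is thrown and the document is rejected |
eager_global_ordinals | Whether global ordinals are built eagerly at refresh time instead of lazily on first use | |
enabled | Use when you only want to keep a field in _source without indexing it | The enabled setting can only be applied to the top-level mapping definition and to object fields. Elasticsearch skips parsing of the field contents entirely. The JSON can still be retrieved from _source, but it is not searchable and not stored in any other way |
fielddata | Whether a text field is loaded into memory for aggregations, sorting, and similar operations | Fielddata is disabled on text fields by default: they can be searched but not used for aggregations, sorting, or scripting. Setting fielddata=true loads fielddata into memory by uninverting the inverted index. Note that this can consume a lot of memory |
fields | Commonly used to index the same field in different ways (multi-fields) | For example, a string field can be mapped as text for full-text search and as keyword for sorting and aggregations |
format | Field format | Mainly for date fields. In JSON documents dates are represented as strings; Elasticsearch parses them with a set of preconfigured formats into a long value of milliseconds since the epoch |
ignore_above | Strings longer than this setting are not indexed or stored | For arrays, the limit applies to each element separately. Length is counted in characters |
ignore_malformed | Whether malformed field values are ignored | Default false: malformed data throws an exception and the whole document is rejected. When true, the malformed field is not indexed, while the other fields in the document are processed normally |
index_options | Controls what information is added to the inverted index for search and highlighting | docs: only the document number is indexed. freqs: document number and term frequencies. positions (default): document number, term frequencies, and term positions (order). offsets: document number, term frequencies, positions, and start and end character offsets. The parameter applies only to text fields |
index_phrases | If enabled, two-term word combinations (shingles) are indexed into a separate field. This allows exact phrase queries (no slop) to run more efficiently, at the expense of a larger index. Note that this works best when stopwords are not removed, as phrases containing stopwords will not use the subsidiary field and will fall back to a standard phrase query | true / false (default) |
index_prefixes | Indexes term prefixes to speed up prefix searches | min_chars: minimum prefix length to index, must be greater than 0, default 2. max_chars: maximum prefix length to index, must be less than 20, default 5 |
index | Whether the field is indexed and therefore searchable | Default true. When false, the field can still be stored but is neither indexed nor queryable |
meta | Metadata | Metadata attached to the field, opaque to Elasticsearch. At most 5 entries; keys are at most 20 characters long and values at most 50 |
normalizer | The keyword-field counterpart of analyzer | It guarantees that the analysis chain produces a single token |
norms | Norms store various normalization factors that are later used at query time in order to compute the score of a document relative to a query | true / false |
null_value | Explicitly replaces null values | A null value, like an empty array, cannot otherwise be indexed or searched |
position_increment_gap | When indexing text fields with multiple values, a "fake" gap is added between the values to prevent most phrase queries from matching across the values. The size of this gap is configured using position_increment_gap | Default 100 |
properties | Sub-field definitions of the mapping, of object fields, and of nested fields | |
search_analyzer | The analyzer used at search time | Overrides the analyzer setting |
similarity | The scoring (similarity) algorithm for the field | BM25 (default), classic, boolean |
store | Whether the field value is stored separately | By default, field values are indexed to make them searchable but are not stored. The field can be queried, but its original value cannot be retrieved as a stored field (it is still part of _source) |
term_vector | Term vectors contain information about the terms produced by the analysis process | A list of terms; the position (or order) of each term; the start and end character offsets mapping the term to its origin in the original string; payloads (if available), which are user-defined binary data associated with each term position |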
analyzer example
- An `analyzer` setting for indexing all terms, including stop words
- A `search_analyzer` setting for non-phrase queries that removes stop words
- A `search_quote_analyzer` setting for phrase queries that does not remove stop words
PUT my-index-000001
{
"settings":{
"analysis":{
"analyzer":{
"my_analyzer":{
"type":"custom",
"tokenizer":"standard",
"filter":[
"lowercase"
]
},
"my_stop_analyzer":{
"type":"custom",
"tokenizer":"standard",
"filter":[
"lowercase",
"english_stop"
]
}
},
"filter":{
"english_stop":{
"type":"stop",
"stopwords":"_english_"
}
}
}
},
"mappings":{
"properties":{
"title": {
"type":"text",
"analyzer":"my_analyzer",
"search_analyzer":"my_stop_analyzer",
"search_quote_analyzer":"my_analyzer"
}
}
}
}
PUT my-index-000001/_doc/1
{
"title":"The Quick Brown Fox"
}
PUT my-index-000001/_doc/2
{
"title":"A Quick Brown Fox"
}
GET my-index-000001/_search
{
"query":{
"query_string":{
"query":"\"the quick brown fox\""
}
}
}
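boost example
Matches on the `title` field carry twice the weight of matches on the `content` field, which, like every other field, has the default boost of 1.0.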
PUT my-index-000001
{
"mappings": {
"properties": {
"title": {
"type": "text",
"boost": 2
},
"content": {
"type": "text"
}
}
}
}
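Index-time boost in the mapping has been deprecated since Elasticsearch 5.0.0, and the usual recommendation is to apply the boost at query time instead. A minimal sketch of an equivalent query, assuming the same `title` field; the search phrase is an arbitrary illustration:

```json
# Query-time boost: no mapping change or reindex is needed to adjust the value
POST my-index-000001/_search
{
  "query": {
    "match": {
      "title": {
        "query": "quick brown fox",
        "boost": 2
      }
    }
  }
}
```

Because the boost lives in the query, it can be tuned per request without touching the index.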
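coerce example
The `number_one` field will contain the integer 10.
The document containing `number_two` will be rejected because coercion is disabled for that field.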
PUT my-index-000001
{
"mappings": {
"properties": {
"number_one": {
"type": "integer"
},
"number_two": {
"type": "integer",
"coerce": false
}
}
}
}
PUT my-index-000001/_doc/1
{
"number_one": "10"
}
PUT my-index-000001/_doc/2
{
"number_two": "10"
}
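Coercion can also be switched off for a whole index and re-enabled on individual fields. The sketch below assumes a freshly created index and reuses the field names from the example above; `index.mapping.coerce` is the index-level setting:

```json
# Disable coercion index-wide, then re-enable it for number_one only
PUT my-index-000001
{
  "settings": {
    "index.mapping.coerce": false
  },
  "mappings": {
    "properties": {
      "number_one": {
        "type": "integer",
        "coerce": true
      },
      "number_two": {
        "type": "integer"
      }
    }
  }
}
```

With this definition, `number_one` still accepts the string "10", while `number_two` inherits the index-level setting and rejects it.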
copy_to example
The values of the `first_name` and `last_name` fields are copied to the `full_name` field.
PUT my_index
{
"mappings": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
}
}
PUT my_index/_doc/1
{
"first_name": "John",
"last_name": "Smith"
}
GET my_index/_search
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}
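doc_values example
The doc_values row in the table suggests disabling doc values on fields that are never sorted on, aggregated, or accessed from scripts. A minimal sketch, assuming two hypothetical keyword fields (`status_code` and `session_id` are illustrative names, not taken from the text above):

```json
# status_code keeps the default (doc values enabled);
# session_id opts out to save disk space
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "keyword"
      },
      "session_id": {
        "type": "keyword",
        "doc_values": false
      }
    }
  }
}
```

The `session_id` field can still be queried, but it can no longer be used for sorting, aggregations, or scripts.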
eager_global_ordinals example
PUT my-index-000001/_mapping
{
"properties": {
"tags": {
"type": "keyword",
"eager_global_ordinals": true
}
}
}
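enabled example
Suppose you use Elasticsearch as a web session store. You may want to index the session ID and the last update time, but you do not need to query or run aggregations on the session data itself.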
PUT my-index-000001
{
"mappings": {
"properties": {
"user_id": {
"type": "keyword"
},
"last_updated": {
"type": "date"
},
"session_data": {
"type": "object",
"enabled": false
}
}
}
}
PUT my-index-000001/_doc/session_1
{
"user_id": "kimchy",
"session_data": {
"arbitrary_object": {
"some_array": [ "foo", "bar", { "baz": 2 } ]
}
},
"last_updated": "2015-12-06T18:20:22"
}
PUT my-index-000001/_doc/session_2
{
"user_id": "jpountz",
"session_data": "none",
"last_updated": "2015-12-06T18:22:13"
}
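The same setting can be applied to the top-level mapping definition, in which case documents are kept in `_source` but none of their fields are indexed. A sketch assuming a freshly created index; the document body is a shortened, hypothetical variant of the session documents above:

```json
# Disable the entire mapping
PUT my-index-000001
{
  "mappings": {
    "enabled": false
  }
}

PUT my-index-000001/_doc/session_1
{
  "user_id": "kimchy",
  "last_updated": "2015-12-06T18:20:22"
}

# The document can still be fetched by ID
GET my-index-000001/_doc/session_1

# The mapping shows that no fields were added
GET my-index-000001/_mapping
```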
fielddata example
The `my_field` field is used for searches.
The `my_field.keyword` field is used for aggregations, sorting, and scripts.
PUT my-index-000001
{
"mappings": {
"properties": {
"my_field": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
# Enable fielddata on the text field
PUT my-index-000001/_mapping
{
"properties": {
"my_field": {
"type": "text",
"fielddata": true
}
}
}
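Since fielddata can be memory-hungry, aggregating on the `keyword` sub-field from the first mapping is usually the cheaper option. A minimal sketch of such a request; the aggregation name `top_values` is a hypothetical label:

```json
# Terms aggregation on the keyword sub-field, which is backed by doc values
GET my-index-000001/_search
{
  "size": 0,
  "aggs": {
    "top_values": {
      "terms": {
        "field": "my_field.keyword"
      }
    }
  }
}
```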
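fields example
The `city.raw` field is a keyword version of the `city` field.
The `city` field can be used directly for full-text search.
The `city.raw` field can be used for sorting and aggregations.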
PUT my-index-000001
{
"mappings": {
"properties": {
"city": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
PUT my-index-000001/_doc/1
{
"city": "New York"
}
PUT my-index-000001/_doc/2
{
"city": "York"
}
GET my-index-000001/_search
{
"query": {
"match": {
"city": "york"
}
},
"sort": {
"city.raw": "asc"
},
"aggs": {
"Cities": {
"terms": {
"field": "city.raw"
}
}
}
}
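format example
The format row in the table has no example of its own. The sketch below assumes a hypothetical `date` field and shows several accepted formats separated by `||`; each is tried in turn until one matches:

```json
# Accept a timestamp, a plain date, or milliseconds since the epoch
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
```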
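ignore_malformed example
The first document will have its `text` field indexed, but not its `number_one` field.
The second document will be rejected outright because the value of `number_two` is malformed.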
PUT my-index-000001
{
"mappings": {
"properties": {
"number_one": {
"type": "integer",
"ignore_malformed": true
},
"number_two": {
"type": "integer"
}
}
}
}
PUT my-index-000001/_doc/1
{
"text": "Some text value",
"number_one": "foo"
}
PUT my-index-000001/_doc/2
{
"text": "Some text value",
"number_two": "foo"
}
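Documents that had a malformed value dropped record the affected field names in the `_ignored` metadata field, so they can be found later. A minimal sketch of such a lookup against the index above:

```json
# Find documents in which at least one field value was ignored
GET my-index-000001/_search
{
  "query": {
    "exists": {
      "field": "_ignored"
    }
  }
}
```

This makes it possible to audit how much data is being silently skipped when `ignore_malformed` is enabled.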
index_prefixes example
# An empty settings object uses the default min_chars and max_chars
PUT my-index-000001
{
"mappings": {
"properties": {
"body_text": {
"type": "text",
"index_prefixes": { }
}
}
}
}
# Alternative definition with explicit prefix lengths
PUT my-index-000001
{
"mappings": {
"properties": {
"full_name": {
"type": "text",
"index_prefixes": {
"min_chars" : 1,
"max_chars" : 10
}
}
}
}
}
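normalizer example
The normalizer row in the table describes the keyword counterpart of an analyzer. A sketch under stated assumptions: `my_normalizer` and the field name `foo` are hypothetical, and the filters chosen here are `lowercase` and `asciifolding`:

```json
# Define a custom normalizer and attach it to a keyword field
PUT my-index-000001
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
```

The normalizer is applied both before indexing and to the input of term-level queries, so values such as "BÀR" and "bar" end up as the same token.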
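null_value example
Replace explicit null values with the string "NULL". An empty array contains no explicit null, so it is not replaced: the query below matches document 1 but not document 2.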
PUT my-index-000001
{
"mappings": {
"properties": {
"status_code": {
"type": "keyword",
"null_value": "NULL"
}
}
}
}
PUT my-index-000001/_doc/1
{
"status_code": null
}
PUT my-index-000001/_doc/2
{
"status_code": []
}
GET my-index-000001/_search
{
"query": {
"term": {
"status_code": "NULL"
}
}
}
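position_increment_gap example
The table notes that a gap of 100 positions is inserted between the values of a multi-valued text field. The sketch below, which assumes a hypothetical `names` field, sets the gap to 0 to show what the default protects against:

```json
# With a gap of 0, the two array values are treated as adjacent text
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "names": {
        "type": "text",
        "position_increment_gap": 0
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "names": [ "John Abraham", "Lincoln Smith" ]
}

# This phrase spans both values and matches only because the gap is 0
GET my-index-000001/_search
{
  "query": {
    "match_phrase": {
      "names": "Abraham Lincoln"
    }
  }
}
```

With the default gap of 100, the same phrase query would not match the document.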
properties example
PUT my-index-000001
{
"mappings": {
"properties": {
"manager": {
"properties": {
"age": { "type": "integer" },
"name": { "type": "text" }
}
},
"employees": {
"type": "nested",
"properties": {
"age": { "type": "integer" },
"name": { "type": "text" }
}
}
}
}
}
PUT my-index-000001/_doc/1
{
"region": "US",
"manager": {
"name": "Alice White",
"age": 30
},
"employees": [
{
"name": "John Smith",
"age": 34
},
{
"name": "Peter Brown",
"age": 26
}
]
}
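The dynamic parameter from the table can be set on the mapping as a whole and overridden on individual inner objects next to their `properties`. A sketch assuming a freshly created index; the `user` and `social_networks` field names are hypothetical:

```json
# Dynamic mapping is off at the top level and inside user,
# but re-enabled for the user.social_networks object
PUT my-index-000001
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "user": {
        "properties": {
          "name": {
            "type": "text"
          },
          "social_networks": {
            "dynamic": true,
            "properties": {}
          }
        }
      }
    }
  }
}
```

Inner objects inherit the setting from their parent, so only new fields under `user.social_networks` are added to the mapping automatically.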
similarity example
PUT my-index-000001
{
"mappings": {
"properties": {
"default_field": {
"type": "text"
},
"boolean_sim_field": {
"type": "text",
"similarity": "boolean"
}
}
}
}
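store example
- Explicitly storing a field's value makes sense when `_source` is disabled, or when you do not want to obtain the value by parsing `_source` (even though that extraction happens automatically). Keep in mind that fetching each stored field may need its own disk I/O, so retrieving several stored fields may take several reads, whereas retrieving several fields from `_source` takes a single read because `_source` is itself just one field. In most cases, reading from `_source` is fast and efficient.
- `_source` is enabled by default in Elasticsearch.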
PUT my-index-000001
{
"mappings": {
"properties": {
"title": {
"type": "text",
"store": true
},
"date": {
"type": "date",
"store": true
},
"content": {
"type": "text"
}
}
}
}
PUT my-index-000001/_doc/1
{
"title": "Some short title",
"date": "2015-01-01",
"content": "A very long content field..."
}
GET my-index-000001/_search
{
"stored_fields": [ "title", "date" ]
}
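term_vector example
The term_vector row in the table has no example of its own. A minimal sketch, assuming a hypothetical `text` field and the `with_positions_offsets` option, which stores terms together with their positions and character offsets:

```json
# Store term vectors with positions and offsets for the text field
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "text": "Quick brown fox"
}

# Highlighting on this field can use the stored term vectors
GET my-index-000001/_search
{
  "query": {
    "match": {
      "text": "brown fox"
    }
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}
```

Storing positions and offsets roughly doubles the size of the field's index, so it is worth enabling only on fields that need it.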