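An Overview of Elasticsearch Field Types

Overview

Every field in Elasticsearch (ES) has a field type, and each type has its own strengths: keyword fields are well suited to aggregations and sorting, while text fields support full-text search. The sections below introduce the commonly used data types by category. A field's type is closely tied to how the field can be searched.
For example, keyword and numeric fields are suited to exact matching, range queries, and the like; they cannot be used for full-text search.
Text fields, by contrast, are suited to full-text search.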

Field types

This summary is based on ES 7.8. Fields in an Elasticsearch document support many different data types.

Core data types

String
text and keyword

Numeric
long, integer, short, byte, double, float, half_float, scaled_float

Date
date

Date nanoseconds
date_nanos

Boolean
boolean

Binary
binary

Range
integer_range, float_range, long_range, double_range, date_range, ip_range

Complex data types

Object
object for single JSON objects

Nested
nested for arrays of JSON objects

Geo data types

Geo-point
geo_point for lat/lon points

Geo-shape
geo_shape for complex shapes like polygons

Specialised data types

IP
ip for IPv4 and IPv6 addresses

Completion
completion to provide auto-complete suggestions

Token count
token_count to count the number of tokens in a string

mapper-murmur3
murmur3 to compute hashes of values at index time and store them in the index

mapper-annotated-text
annotated-text to index text containing special markup (typically used for identifying named entities)

Percolator
Accepts queries from the query-dsl

Join
Defines parent/child relation for documents within the same index

Rank feature
Record numeric feature to boost hits at query time.

Rank features
Record numeric features to boost hits at query time.

Dense vector
Record dense vectors of float values.

Sparse vector
Record sparse vectors of float values.

Search-as-you-type
A text-like field optimized for queries to implement as-you-type completion

Alias
Defines an alias to an existing field.

Flattened
Allows an entire JSON object to be indexed as a single field.

Shape
shape for arbitrary cartesian geometries.

Histogram
histogram for pre-aggregated numerical values for percentiles aggregations.

Constant keyword
Specialization of keyword for the case when all documents have the same value.

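Arrays

In Elasticsearch, arrays do not require a dedicated field data type. By default, any field can contain zero or more values, but all values in an array must be of the same data type.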

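Multi-fields

It is often useful to index the same field in different ways for different purposes. For example, a string field can be mapped as a text field for full-text search and as a keyword field for sorting or aggregations. Alternatively, a text field can be indexed with the standard analyzer, the english analyzer, and the french analyzer. This is the purpose of multi-fields. Most data types support multi-fields through the fields parameter.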
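
As a rough illustration of the fields parameter, the sketch below builds a mapping body as a plain Python dict and prints it as JSON. The field name city and the sub-field names raw and english are assumptions chosen for illustration, and sending the body to a cluster is deliberately left out.

```python
import json

def build_multi_field_mapping():
    """Return a mapping body in which one string field is indexed three ways."""
    return {
        "mappings": {
            "properties": {
                "city": {
                    # Main field: analyzed text for full-text search.
                    "type": "text",
                    "fields": {
                        # Exact, un-analyzed value for sorting and aggregations.
                        "raw": {"type": "keyword"},
                        # Same text, analyzed a second time with the english analyzer.
                        "english": {"type": "text", "analyzer": "english"},
                    },
                }
            }
        }
    }

mapping = build_multi_field_mapping()
print(json.dumps(mapping, indent=2))
```

With a mapping shaped like this, queries would address the sub-fields as city.raw and city.english.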

Hands-on examples

1.1 binary

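Binary type: the value is supplied as a Base64-encoded string. By default the field is neither searchable nor stored as a separate stored field; the original value is still returned as part of _source. To retrieve it as a stored field, set the store attribute to true (the default is false).
type: binary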

PUT my-index-000003
{
  "mappings": {
    "properties": {
      
      "blob": {
        "type": "binary",
        "store": true
      }
    }
  }
}

PUT my-index-000003/_doc/2
{
  "blob": "U29tZSBiaW5hcnkgYmxvYg==" 
}
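
The Base64 string in the request above can be produced and checked on the client side. The following is a minimal sketch using only the Python standard library; the helper names encode_blob and decode_blob are made up for this example.

```python
import base64

def encode_blob(raw: bytes) -> str:
    """Encode raw bytes as a single-line Base64 string suitable for a binary field."""
    return base64.b64encode(raw).decode("ascii")

def decode_blob(encoded: str) -> bytes:
    """Decode a Base64 string back to bytes, rejecting characters outside the alphabet."""
    return base64.b64decode(encoded, validate=True)

encoded = encode_blob(b"Some binary blob")
print(encoded)
print(decode_blob(encoded))
```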

1.2 boolean

Boolean type
type: boolean
True values: true, "true"
False values: false, "false", "" (empty string)

PUT my-index-000003/_mapping
{
  "properties": {
    "boolean_field": {
      "type": "boolean"
    }
  }
}

PUT my-index-000003/_doc/2
{
  "boolean_field": "false" 
}
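
The accepted values listed above can be summed up in a small function. This is only a sketch of the rule as stated in this section, not the actual parser inside Elasticsearch; the name parse_es_boolean is hypothetical.

```python
def parse_es_boolean(value):
    """Map the values accepted by a boolean field to a Python bool."""
    if value is True or value == "true":
        return True
    if value is False or value in ("false", ""):
        return False
    # Anything else (for example "yes" or 1) is treated as invalid in this sketch.
    raise ValueError(f"not a valid boolean value: {value!r}")

for candidate in (True, "true", False, "false", ""):
    print(repr(candidate), "->", parse_es_boolean(candidate))
```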

1.3 Keywords

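The keyword family includes keyword, constant_keyword, and wildcard.

keyword: suited to aggregations and sorting, and fast for term and other term-level queries. If a field's values are all numeric but it is mostly queried with term queries, consider mapping it as keyword rather than integer or another numeric type.

constant_keyword: use this type when every document in the index has the same value for the field.

wildcard: suited to wildcard searches over content such as logs (this type was added in ES 7.9). On a text field, a wildcard query is matched against individual analyzed terms rather than the whole original string. Aggregation and sorting on wildcard fields are slower than on the other keyword types, and searches whose pattern starts with a wildcard are relatively slow.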

PUT my-index-000003/_mapping
{
  
    "properties": {
      "my_keyword":{
        "type": "keyword"
      },
      "my_constant_keyword":{
        "type": "constant_keyword",
        "value":"constant"
      },
      "my_wildcard": {
        "type": "wildcard"
      }
    }
  
}

PUT my-index-000003/_doc/1
{
  "my_keyword":"keyword",
  "my_wildcard" : "This string can be quite lengthy"
}

GET my-index-000003/_search
{
  "query": {
    "wildcard": {
      "my_wildcard": {
        "value": "*quite*"
      }
    }
  }
}
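
To make the difference between whole-value and per-term matching concrete, here is a small Python sketch. It assumes the usual wildcard syntax (* for any sequence of characters, ? for a single character) and approximates text analysis with lowercasing and whitespace splitting, which is much simpler than a real analyzer.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern (* and ?) into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def matches(value, pattern):
    """True if the whole value matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(value) is not None

value = "This string can be quite lengthy"
tokens = value.lower().split()  # crude stand-in for analyzed terms of a text field

# Whole-value matching, as on a keyword-family field.
print(matches(value, "*quite*"))
print(matches(value, "*can be*"))

# Per-term matching, as on a text field: no single term contains "can be".
print(any(matches(token, "*can be*") for token in tokens))
```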

1.4 Numeric

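Numeric types cover integers and floating-point numbers, including short, integer, long, unsigned_long, float, double, half_float, and scaled_float.
scaled_float is stored internally as a long and requires a scaling_factor: the value is multiplied by scaling_factor at index time. Prices, for example, usually have two decimal places; with scaling_factor set to 100, a field value of 98.99 is stored as 9899, which saves storage space compared with storing a double. A field that holds numbers does not have to be mapped as a numeric type: numeric types are relatively fast for range queries and numeric computations, but for term queries they are slower than keyword.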

PUT /my-index-000003/_mapping
{
  
    "properties": {
      "number_of_bytes": {
        "type": "integer"
      },
      "time_in_seconds": {
        "type": "float"
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100
      }
    }
  
}
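
The scaled_float arithmetic described above can be worked through in a few lines of Python. This is a sketch of the idea only: the function names are made up, and Python's round() handles exact .5 ties differently from Java, so the example avoids tie values.

```python
def to_scaled_long(value, scaling_factor):
    """Multiply by the scaling factor and round to the nearest integer, as done at index time."""
    return round(value * scaling_factor)

def from_scaled_long(stored, scaling_factor):
    """Recover the approximate original value from the stored integer."""
    return stored / scaling_factor

stored = to_scaled_long(98.99, 100)
print(stored)
print(from_scaled_long(stored, 100))

# Digits beyond the precision implied by the scaling factor are lost.
print(from_scaled_long(to_scaled_long(98.994, 100), 100))
```

The last line shows the trade-off: with a scaling_factor of 100, anything past the second decimal place cannot be recovered.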

1.5 Date

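Date types include date and date_nanos. For the date type, the format attribute specifies the date format of the field's values; multiple formats can be combined with ||.
If format is not set explicitly, it defaults to "strict_date_optional_time||epoch_millis". "strict_date_optional_time" is the generic ISO datetime parser, accepting values such as yyyy-MM-dd'T'HH:mm:ss.SSSZ or yyyy-MM-dd. ES ships with many built-in date formats; see the official documentation for the full list.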

PUT my-index-000003/_mapping
{
  
    "properties": {
      "create_time":{
        "type": "date",
        "format":"yyyy-MM-dd HH:mm:ss"
      }
    }
  
}

POST my-index-000003/_doc
{
  "create_time":"2021-05-15 23:14:55"
  
}
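
The way || chains several formats can be pictured as "try each format in turn until one parses". The Python sketch below imitates that behaviour; the strptime patterns are Python equivalents rather than ES format strings, the special token "epoch_millis" is handled by hand, and values without a time zone are assumed to be UTC.

```python
from datetime import datetime, timezone

def parse_date(value, formats):
    """Try each format in order, like the alternatives of a format string joined by ||."""
    text = str(value)
    for fmt in formats:
        try:
            if fmt == "epoch_millis":
                return datetime.fromtimestamp(int(text) / 1000, tz=timezone.utc)
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # this format did not fit, try the next one
    raise ValueError(f"{value!r} matches none of {formats}")

# Rough counterpart of "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis".
FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "epoch_millis"]

print(parse_date("2021-05-15 23:14:55", FORMATS))
print(parse_date("2021-05-15", FORMATS))
print(parse_date(86400000, FORMATS))
```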

1.6 alias

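Defines an alias for an existing field. Aliases can be used in the search APIs and in the field capabilities API. They cannot currently be used in write APIs, such as requests that index or update documents.

type: "alias"; path is the full path of the target field. If the target field sits inside parent objects,
the path is parentObject1.parentObject2.target_field.

For example: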

PUT /my-index-000003/_mapping
{
  "properties": {
    "user_name": {
      "type": "text"
    },
    "name": {
      "type": "alias",
      "path": "user_name"
    },
    "friends": {
      "type": "nested",
      "properties": {
        "friends_name": {
          "type": "text"
        },
        "f_name": {
          "type": "alias",
          "path": "friends.friends_name"
        }
      }
    }
  }
}


POST /my-index-000003/_doc/1
{
  "user_name":"ly",
  "friends":{
    "friends_name":"luck"
  }
}

GET /my-index-000003/_search?pretty
{
  "query": {
    "match": {
      "name": "ly"
    }
  }
}

1.7 ip

The ip field type can index and store IPv4 or IPv6 addresses.

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "ip_addr": {
        "type": "ip"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "ip_addr": "192.168.1.1"
}

GET my-index-000001/_search
{
  "query": {
    "term": {
      "ip_addr": "192.168.0.0/16"
    }
  }
}
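
The term query above uses CIDR notation, so it matches every address inside the 192.168.0.0/16 block. The containment check itself can be reproduced with Python's standard ipaddress module; the helper name ip_in_cidr is made up for this sketch.

```python
import ipaddress

def ip_in_cidr(address, cidr):
    """True if the address belongs to the network written in CIDR notation."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)

print(ip_in_cidr("192.168.1.1", "192.168.0.0/16"))
print(ip_in_cidr("10.0.0.1", "192.168.0.0/16"))
print(ip_in_cidr("2001:db8::1", "2001:db8::/32"))
```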

1.8 object

Object type: the field value is a JSON object.

POST customer/_doc/3
{
  "name":"孙七",
  "state":1,
  "create_time":"2021-03-14 18:00:00",
  "other":{
    "age":27,
    "address":"beijin",
    "phone":"10010"
  }
}

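Documents can also be indexed without explicitly declaring field types; ES creates suitable mappings by default (dynamic mapping). Other object-like types, namely flattened, nested, and join, build on object and add features of their own that make up for its shortcomings.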
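
Inner object fields are addressed by dotted paths such as other.age. A simple way to picture this is to flatten the document into path/value pairs, as in the Python sketch below. It illustrates the addressing scheme only and is not a description of the on-disk index format.

```python
def flatten(doc, prefix=""):
    """Flatten nested JSON objects into a dict keyed by dotted field paths."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat

customer = {
    "name": "孙七",
    "state": 1,
    "create_time": "2021-03-14 18:00:00",
    "other": {"age": 27, "address": "beijin", "phone": "10010"},
}

for path, value in flatten(customer).items():
    print(path, "=", value)
```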

1.9 Text search types

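text is the full-text search type. At both index time and search time, the input text is processed by an analyzer, which filters, tokenizes, and transforms it. ES has many built-in analyzers, and custom analyzers can be defined as well.
Full-text queries such as match and match_phrase are intended for text fields; on a keyword field the query text is not tokenized, so only the exact whole value matches.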

PUT /my-index-000003/_mapping
{
  "properties": {
    "user_name": {
      "type": "text"
    }
  }
}

POST /my-index-000003/_doc
{
  "user_name":"jack liu"
}

GET /my-index-000003/_search
{
  "query": {
    "match": {
      "user_name": "jack"
    }
  }
}
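
Why the query for jack finds the document "jack liu" can be sketched in Python. The analyze function below is a deliberately crude stand-in for an analyzer (lowercase, then split on non-word characters); real analyzers do considerably more, so treat this as an illustration of the principle only.

```python
import re

def analyze(text):
    """Crude analyzer: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def text_match(indexed_value, query):
    """Match-style lookup on a text field: any query token equal to any indexed token."""
    indexed_tokens = set(analyze(indexed_value))
    return any(token in indexed_tokens for token in analyze(query))

def keyword_match(indexed_value, query):
    """Lookup on a keyword field: the whole value must be identical."""
    return indexed_value == query

print(text_match("jack liu", "jack"))
print(text_match("jack liu", "Jack"))
print(keyword_match("jack liu", "jack"))
print(keyword_match("jack liu", "jack liu"))
```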

1.10 Arrays

Elasticsearch has no dedicated array data type. By default, any field can contain zero or more values, but all values in an array must be of the same data type. For example:

an array of strings: [ "one", "two" ]
an array of integers: [ 1, 2 ]
an array of arrays: [ 1, [ 2, 3 ]] which is the equivalent of [ 1, 2, 3 ]
an array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 }]
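
The equivalence of [ 1, [ 2, 3 ]] and [ 1, 2, 3 ] amounts to flattening nested arrays into one list of values. A minimal Python sketch of that flattening, with a made-up helper name:

```python
def flatten_values(value):
    """Flatten arbitrarily nested lists into a single flat list of values."""
    if not isinstance(value, list):
        return [value]  # a single value behaves like a one-element array
    flat = []
    for item in value:
        flat.extend(flatten_values(item))
    return flat

print(flatten_values([1, [2, 3]]))
print(flatten_values("elasticsearch"))
```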

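One caveat applies to arrays:

Arrays of objects
Arrays of objects do not work as you might expect: you cannot query each object independently of the other objects in the array. If you need to do that, use the nested data type instead of the object data type.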

PUT my-index-000001/_doc/1
{
  "message": "some arrays in this document...",
  "tags":  [ "elasticsearch", "wow" ], 
  "lists": [ 
    {
      "name": "prog_list",
      "description": "programming list"
    },
    {
      "name": "cool_list",
      "description": "cool stuff list"
    }
  ]
}

PUT my-index-000001/_doc/2 
{
  "message": "no arrays in this document...",
  "tags":  "elasticsearch",
  "lists": {
    "name": "prog_list",
    "description": "programming list"
  }
}

GET my-index-000001/_search
{
  "query": {
    "match": {
      "tags": "elasticsearch" 
    }
  }
}
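
The caveat about arrays of objects can be illustrated with the lists field from document 1. The Python sketch below mimics the flattening of an object array into one multi-value list per sub-field and compares it with per-object matching, which is what the nested type provides. Values are compared as exact strings here, with no text analysis, and all function names are made up for this example.

```python
def flatten_object_array(field, objects):
    """Collect the values of each sub-field into one list, losing the object boundaries."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat

def match_as_object(field, objects, conditions):
    """Each condition is checked against the flattened lists independently."""
    flat = flatten_object_array(field, objects)
    return all(value in flat.get(f"{field}.{key}", []) for key, value in conditions.items())

def match_as_nested(objects, conditions):
    """All conditions must hold inside one and the same object."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in objects
    )

lists = [
    {"name": "prog_list", "description": "programming list"},
    {"name": "cool_list", "description": "cool stuff list"},
]

# This combination does not exist in any single object of the array.
mixed = {"name": "prog_list", "description": "cool stuff list"}

print(match_as_object("lists", lists, mixed))
print(match_as_nested(lists, mixed))
```

The object-style check accepts the mixed combination because the name and the description are found somewhere in the array, while the nested-style check rejects it.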

posted @ 2023-06-07 15:34 忘崽牛仔
