Elasticsearch (11): Introduction to mapping
Core data types
string (split into text and keyword as of Elasticsearch 5.x)
byte,short,integer,long
float,double
boolean
date
dynamic mapping (mapping generated automatically)
true or false --> boolean
123 --> long
123.45 --> double
2017-01-01 --> date
"hello world" --> string/text
Viewing the mapping
GET /index/_mapping/type
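To predict what GET /index/_mapping/type reports for dynamically mapped fields, the detection rules listed above can be sketched in Python. This is a toy model, not Elasticsearch's implementation: `infer_type` is a hypothetical helper, and treating only yyyy-MM-dd strings as dates is a simplifying assumption (the real default date detection accepts more formats).

```python
import re

# Simplifying assumption: only yyyy-MM-dd strings are detected as dates here.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def infer_type(value):
    """Return the field type a toy dynamic mapping picks for a JSON value."""
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "date" if DATE_PATTERN.match(value) else "text"
    raise TypeError(f"unsupported value: {value!r}")


for sample in [True, 123, 123.45, "2017-01-01", "hello world"]:
    print(repr(sample), "-->", infer_type(sample))
```

Each printed line corresponds to one rule in the dynamic mapping list, with text standing in for string/text.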
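How a field is indexed (the index option of a string field)
analyzed: the value is analyzed (tokenized) and indexed for full-text search
not_analyzed: the value is indexed as one exact term, without analysis
no: the value is not indexed at all and cannot be searched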
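To make the three options concrete, here is a hypothetical Python sketch of the terms each option would put into the inverted index. `index_terms` is a made-up helper, and the regex split is only a rough approximation of the standard analyzer, assumed for illustration.

```python
import re


def index_terms(value, index_option):
    """Toy model of the terms a string field adds to the inverted index."""
    if index_option == "analyzed":
        # Crude stand-in for the standard analyzer: split on anything that is
        # not a letter or digit, then lowercase each token.
        return [token.lower() for token in re.split(r"[^0-9A-Za-z]+", value) if token]
    if index_option == "not_analyzed":
        # The whole value becomes one exact term.
        return [value]
    if index_option == "no":
        # Nothing is indexed, so the field cannot be searched.
        return []
    raise ValueError(f"unknown index option: {index_option}")


print(index_terms("My-Dogs", "analyzed"))      # ['my', 'dogs']
print(index_terms("My-Dogs", "not_analyzed"))  # ['My-Dogs']
print(index_terms("My-Dogs", "no"))            # []
```

Only an analyzed field matches a search for dogs; a not_analyzed field matches only the exact string My-Dogs.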
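Modifying the mapping
A mapping can only be defined manually when the index is created, or extended later by adding new field mappings; an existing field mapping cannot be updated.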
Creating a mapping
PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "author_id": { "type": "long" },
        "title": { "type": "text", "analyzer": "english" },
        "content": { "type": "text" },
        "post_date": { "type": "date" },
        "publisher_id": { "type": "keyword" }
      }
    }
  }
}
publisher_id holds an exact value, so it is mapped as keyword (the not_analyzed setting belongs to the old string type, not to text).
Mappings can only be extended, not modified: once a field has been mapped, its mapping cannot be changed. Re-running PUT /website with a new type for author_id fails, because the index already exists:
PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "author_id": { "type": "text" }
      }
    }
  }
}
Response:
{
  "error": {
    "root_cause": [
      {
        "type": "index_already_exists_exception",
        "reason": "index [website/co1dgJ-uTYGBEEOOL8GsQQ] already exists",
        "index_uuid": "co1dgJ-uTYGBEEOOL8GsQQ",
        "index": "website"
      }
    ],
    "type": "index_already_exists_exception",
    "reason": "index [website/co1dgJ-uTYGBEEOOL8GsQQ] already exists",
    "index_uuid": "co1dgJ-uTYGBEEOOL8GsQQ",
    "index": "website"
  },
  "status": 400
}
Adding a new field to the existing type, however, is allowed:
PUT /website/_mapping/article
{
  "properties": {
    "new_field": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
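The add-but-never-change rule can be modeled as a merge over field definitions. `put_field_mappings` is a hypothetical helper, not an Elasticsearch API; as a simplifying assumption it treats any difference in an existing field's definition as a forbidden change.

```python
import copy


def put_field_mappings(existing, new_fields):
    """Toy version of the rule: new fields may be added, existing ones may not change.

    Both arguments map field names to field definitions such as {"type": "long"}.
    """
    merged = copy.deepcopy(existing)
    for name, definition in new_fields.items():
        if name in merged and merged[name] != definition:
            raise ValueError(f"mapping for existing field [{name}] cannot be changed")
        merged[name] = copy.deepcopy(definition)
    return merged


article = {"author_id": {"type": "long"}, "content": {"type": "text"}}

# Adding a new field is accepted.
article = put_field_mappings(article, {"new_field": {"type": "keyword"}})
print(sorted(article))

# Changing the type of an existing field is rejected.
try:
    put_field_mappings(article, {"author_id": {"type": "text"}})
except ValueError as err:
    print(err)
```

To really change a field's type, the data has to go into a new index with the new mapping.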
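Testing the mapping
content is a text field, so _analyze runs its analyzer (standard by default) and my-dogs is split into the tokens my and dogs:
GET /website/_analyze
{
  "field": "content",
  "text": "my-dogs"
}
new_field is not_analyzed, so it is not tokenized and the request fails:
GET /website/_analyze
{
  "field": "new_field",
  "text": "my dogs"
}
Response:
{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[4onsTYV][127.0.0.1:9300][indices:admin/analyze[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Can't process field [new_field], Analysis requests are only supported on tokenized fields"
  },
  "status": 400
}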
multivalue field
{ "tags": [ "tag1", "tag2" ]}
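An array is indexed the same way as a single value of its type (an array of strings is indexed like a string field), but all values in the array must have the same data type; types cannot be mixed.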
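A minimal sketch of the no-mixing rule, assuming a coarse classification of JSON values into boolean, number, string and object; `json_kind` and `array_kind` are hypothetical helpers, not an Elasticsearch API.

```python
def json_kind(value):
    """Classify a JSON value into a coarse kind (assumed classification)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


def array_kind(values):
    """Return the kind shared by all non-null elements, or raise if kinds are mixed."""
    kinds = {json_kind(v) for v in values if v is not None}
    if len(kinds) > 1:
        raise ValueError(f"mixed types in array: {sorted(kinds)}")
    return kinds.pop() if kinds else None


print(array_kind(["tag1", "tag2"]))
try:
    array_kind([10, "some string"])
except ValueError as err:
    print(err)
```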
empty field
null, [] and [null] are all treated as an empty field, and nothing is indexed for it.
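The empty-field rule amounts to dropping null values and checking whether anything is left. A toy sketch with hypothetical helper names (`indexed_values`, `is_empty_field`):

```python
def indexed_values(value):
    """Return the values that would actually be indexed for a field (toy model)."""
    values = value if isinstance(value, list) else [value]
    # null entries contribute nothing to the index.
    return [v for v in values if v is not None]


def is_empty_field(value):
    """True when no value at all would be indexed."""
    return not indexed_values(value)


for sample in [None, [], [None], ["tag1", None]]:
    print(repr(sample), is_empty_field(sample))
```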
object field
PUT /company/employee/1
{
"address": {
"country": "china",
"province": "guangdong",
"city": "guangzhou"
},
"name": "jack",
"age": 27,
"join_date": "2017-01-01"
}
address is mapped as type object. The dynamically generated mapping:
{
  "company": {
    "mappings": {
      "employee": {
        "properties": {
          "address": {
            "properties": {
              "city": {
                "type": "text",
                "fields": {
                  "keyword": { "type": "keyword", "ignore_above": 256 }
                }
              },
              "country": {
                "type": "text",
                "fields": {
                  "keyword": { "type": "keyword", "ignore_above": 256 }
                }
              },
              "province": {
                "type": "text",
                "fields": {
                  "keyword": { "type": "keyword", "ignore_above": 256 }
                }
              }
            }
          },
          "age": { "type": "long" },
          "join_date": { "type": "date" },
          "name": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          }
        }
      }
    }
  }
}
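How objects are stored internally
Object fields are flattened into dotted field names. The document
{
  "address": {
    "country": "china",
    "province": "guangdong",
    "city": "guangzhou"
  },
  "name": "jack",
  "age": 27,
  "join_date": "2017-01-01"
}
is stored as
{
  "name": [jack],
  "age": [27],
  "join_date": [2017-01-01],
  "address.country": [china],
  "address.province": [guangdong],
  "address.city": [guangzhou]
}
An array of objects is flattened the same way, so the values of each inner field are merged into one list (text values are shown as their analyzed tokens):
{
  "authors": [
    { "age": 26, "name": "Jack White" },
    { "age": 55, "name": "Tom Jones" },
    { "age": 39, "name": "Kitty Smith" }
  ]
}
is stored as
{
  "authors.age": [26, 55, 39],
  "authors.name": [jack, white, tom, jones, kitty, smith]
}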
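The flattening can be reproduced with a short recursive Python function. `flatten` is a hypothetical helper that keeps raw values instead of analyzed tokens, so authors.name holds the original strings rather than jack, white and so on. The last line shows a consequence of flattening: the association between each author's age and name is lost.

```python
def flatten(doc):
    """Flatten nested objects and arrays of objects into dotted field paths.

    Every value found under the same path is collected into one list.
    """
    flat = {}

    def walk(path, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{path}.{key}" if path else key, child)
        elif isinstance(value, list):
            for item in value:
                walk(path, item)
        else:
            flat.setdefault(path, []).append(value)

    walk("", doc)
    return flat


employee = {
    "address": {"country": "china", "province": "guangdong", "city": "guangzhou"},
    "name": "jack",
    "age": 27,
    "join_date": "2017-01-01",
}
print(flatten(employee))

book = {"authors": [
    {"age": 26, "name": "Jack White"},
    {"age": 55, "name": "Tom Jones"},
    {"age": 39, "name": "Kitty Smith"},
]}
flat = flatten(book)
print(flat)

# Nothing ties an age to a name any more: this check is True even though
# no single author is both 55 years old and named Jack White.
print(55 in flat["authors.age"] and "Jack White" in flat["authors.name"])
```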
root object
The root object is the mapping JSON of a type. It contains properties, metadata fields (_id, _source, _type), settings (analyzer) and other settings (such as include_in_all).
PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {}
    }
  }
}
properties
Each field in properties is configured with parameters such as type, index and analyzer:
PUT /my_index/_mapping/my_type
{
  "properties": {
    "title": {
      "type": "text",
      "index": true,
      "analyzer": "standard"
    }
  }
}
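_source
(1) A search returns the complete document directly; there is no need to fetch the document IDs first and then send a second request for the documents.
(2) Partial updates are implemented on top of _source.
(3) Reindexing works directly from _source, without re-reading the data from a database (or another external store) and modifying it.
(4) The fields returned can be customized based on _source.
(5) Queries are easier to debug, because the _source is directly visible.
If none of these is needed, _source can be disabled:
PUT /my_index/_mapping/my_type2 { "_source": {"enabled": false} }
_all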
The values of all fields are combined into a single _all field, which is indexed. A search that does not specify any field searches the _all field.
PUT /my_index/_mapping/my_type3 { "_all": {"enabled": false} }
include_in_all can also be set at field level to control whether that field's value is included in the _all field:
PUT /my_index/_mapping/my_type4 { "properties": { "my_field": { "type": "text", "include_in_all": false } } }
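A toy sketch of how _all and include_in_all interact, assuming only top-level fields and plain whitespace tokenization; `build_all_field` and `search_without_field` are hypothetical names, and real _all indexing goes through an analyzer.

```python
def build_all_field(doc, properties):
    """Toy _all field: join the values of every top-level field whose
    include_in_all is not set to False."""
    parts = []
    for name, value in doc.items():
        if properties.get(name, {}).get("include_in_all", True):
            parts.append(str(value))
    return " ".join(parts)


def search_without_field(doc, properties, word):
    """A search that names no field falls back to _all (whitespace tokens only)."""
    return word.lower() in build_all_field(doc, properties).lower().split()


properties = {"my_field": {"type": "text", "include_in_all": False}}
doc = {"title": "my article", "my_field": "secret value", "author": "jack"}
print(build_all_field(doc, properties))                # my article jack
print(search_without_field(doc, properties, "jack"))    # True
print(search_without_field(doc, properties, "secret"))  # False
```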
Identifying metadata
_index,_type,_id
Customizing the dynamic strategy
true: an unknown field is mapped dynamically
false: an unknown field is ignored (it stays in _source but is not mapped or indexed)
strict: an unknown field causes the document to be rejected with an error
In the example below, the root of my_type is strict, while the address object allows dynamic mapping:
PUT /my_index
{
  "mappings": {
    "my_type": {
      "dynamic": "strict",
      "properties": {
        "title": { "type": "text" },
        "address": {
          "type": "object",
          "dynamic": "true"
        }
      }
    }
  }
}
Indexing a document that contains the unknown top-level field content is rejected:
PUT /my_index/my_type/1
{
  "title": "my article",
  "content": "this is my article",
  "address": {
    "province": "guangdong",
    "city": "guangzhou"
  }
}
Response:
{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed"
  },
  "status": 400
}
Without content the document is accepted, and the new fields inside address are mapped dynamically:
PUT /my_index/my_type/1
{
  "title": "my article",
  "address": {
    "province": "guangdong",
    "city": "guangzhou"
  }
}
GET /my_index/_mapping/my_type
Response:
{
  "my_index": {
    "mappings": {
      "my_type": {
        "dynamic": "strict",
        "properties": {
          "address": {
            "dynamic": "true",
            "properties": {
              "city": {
                "type": "text",
                "fields": {
                  "keyword": { "type": "keyword", "ignore_above": 256 }
                }
              },
              "province": {
                "type": "text",
                "fields": {
                  "keyword": { "type": "keyword", "ignore_above": 256 }
                }
              }
            }
          },
          "title": { "type": "text" }
        }
      }
    }
  }
}
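The three strategies can be modeled as a walk over the document that consults the nearest "dynamic" setting. This is a hypothetical sketch: `index_document` and `StrictDynamicMappingError` are made-up names, and every new leaf field is mapped as plain text, whereas the real response above also adds a keyword sub-field.

```python
import copy


class StrictDynamicMappingError(Exception):
    pass


def _walk(node, doc, inherited):
    # A "dynamic" set on an object overrides the value inherited from its parent.
    dynamic = str(node.get("dynamic", inherited)).lower()
    props = node.setdefault("properties", {})
    for name, value in doc.items():
        if name in props:
            # Known field: only descend into objects to check their children.
            if isinstance(value, dict) and props[name].get("type", "object") == "object":
                _walk(props[name], value, dynamic)
            continue
        if dynamic == "strict":
            raise StrictDynamicMappingError(
                f"mapping set to strict, dynamic introduction of [{name}] is not allowed")
        if dynamic == "false":
            continue  # ignored: not added to the mapping, so not indexed
        if isinstance(value, dict):
            props[name] = {"type": "object"}
            _walk(props[name], value, dynamic)
        else:
            props[name] = {"type": "text"}  # simplification: new leaves become text


def index_document(mapping, doc):
    """Return the mapping after indexing doc, or raise on a strict violation."""
    updated = copy.deepcopy(mapping)
    _walk(updated, doc, "true")
    return updated


my_type = {
    "dynamic": "strict",
    "properties": {
        "title": {"type": "text"},
        "address": {"type": "object", "dynamic": "true"},
    },
}

try:
    index_document(my_type, {"title": "my article", "content": "this is my article",
                             "address": {"province": "guangdong", "city": "guangzhou"}})
except StrictDynamicMappingError as err:
    print(err)

updated = index_document(my_type, {"title": "my article",
                                   "address": {"province": "guangdong", "city": "guangzhou"}})
print(updated["properties"]["address"]["properties"])
```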
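Customizing the dynamic mapping strategy
(1)date_detection
By default, string values in certain formats, such as yyyy-MM-dd, are detected as dates. If the first value that arrives for a field is 2017-01-01, the field is dynamically mapped as date, and a later value such as "hello world" is then rejected with an error. date_detection can be turned off for a type; where needed, map the relevant fields as date explicitly.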
PUT /my_index/_mapping/my_type { "date_detection": false }
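The date_detection pitfall comes from the first value fixing the field type. A toy sketch, assuming yyyy-MM-dd is the only date format and using the hypothetical helper `index_values`:

```python
import re

# Assumption: only yyyy-MM-dd counts as a date in this sketch.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def index_values(values, date_detection=True):
    """Index string values into one new field and return the field's type.

    The first value fixes the dynamic type; every later value must fit it.
    """
    field_type = None
    for value in values:
        if field_type is None:
            is_date = date_detection and DATE_PATTERN.match(value)
            field_type = "date" if is_date else "text"
        elif field_type == "date" and not DATE_PATTERN.match(value):
            raise ValueError(f"failed to parse [{value}] as a date")
    return field_type


print(index_values(["2017-01-01", "2017-02-01"]))
try:
    index_values(["2017-01-01", "hello world"])
except ValueError as err:
    print(err)
print(index_values(["2017-01-01", "hello world"], date_detection=False))
```

With detection off, both values land in a text field; a field that should hold dates is then mapped as date by hand.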
(2) Custom dynamic mapping templates (type level)
PUT /my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "en": {
            "match": "*_en",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "analyzer": "english"
            }
          }
        }
      ]
    }
  }
}
PUT /my_index/my_type/1
{ "title": "this is my first article" }
PUT /my_index/my_type/2
{ "title_en": "this is my first article" }
title does not match any dynamic template, so it gets the default standard analyzer, which does not remove stop words: is goes into the inverted index, and searching for is finds the document.
title_en matches the en template and uses the english analyzer, which removes stop words: is is filtered out, so searching for is does not find the document.
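Template selection can be sketched as a first-match search over name patterns and JSON types. Here `fnmatchcase` stands in for the `*` wildcard in match (it also accepts `?` and `[...]`, which this sketch does not rely on), and `template_for` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# The "en" template from the request above.
DYNAMIC_TEMPLATES = [
    {"en": {
        "match": "*_en",
        "match_mapping_type": "string",
        "mapping": {"type": "text", "analyzer": "english"},
    }},
]


def template_for(field_name, json_type, templates=DYNAMIC_TEMPLATES):
    """Return (template_name, mapping) of the first matching template, else (None, None)."""
    for entry in templates:
        name, template = next(iter(entry.items()))
        if "match" in template and not fnmatchcase(field_name, template["match"]):
            continue
        if template.get("match_mapping_type", "*") not in ("*", json_type):
            continue
        return name, dict(template["mapping"])
    # No template matched: the default dynamic mapping applies.
    return None, None


print(template_for("title", "string"))     # (None, None)
print(template_for("title_en", "string"))  # ('en', {'type': 'text', 'analyzer': 'english'})
```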
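(3) Custom default mapping template (index level)
PUT /my_index
{
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }
    },
    "blog": {
      "_all": { "enabled": true }
    }
  }
}
Every type in my_index inherits the disabled _all from _default_, while the blog type enables it again.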
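How _default_ combines with a type's own mapping can be sketched as a top-level override. This is an assumption made for illustration: `effective_mapping` is hypothetical, and Elasticsearch merges mappings in more detail than this shallow copy.

```python
import copy


def effective_mapping(mappings, type_name):
    """Toy: a type starts from the _default_ mapping, and its own top-level
    settings override those defaults."""
    merged = copy.deepcopy(mappings.get("_default_", {}))
    for key, value in mappings.get(type_name, {}).items():
        merged[key] = copy.deepcopy(value)
    return merged


mappings = {
    "_default_": {"_all": {"enabled": False}},
    "blog": {"_all": {"enabled": True}},
}
print(effective_mapping(mappings, "blog"))     # {'_all': {'enabled': True}}
print(effective_mapping(mappings, "article"))  # {'_all': {'enabled': False}}
```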