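Elasticsearch Basic Operations
1. Creating an index
1.1 Create an index with the default settings (5 shards and 1 replica in Elasticsearch 6.x and earlier; 7.0 and later default to 1 shard)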
PUT test
The index name (test here) must be lowercase.
1.2 Specify the number of shards and replicas:
PUT mytest
{ "settings": { "number_of_shards": 3, "number_of_replicas": 1 } }
2. Viewing an index
2.1 View basic information:
GET mytest
Return only the settings:
GET mytest/_settings
2.2 View multiple indices:
GET bus,home,blog,mytest/_settings
GET bus,home,blog,mytest
3. Deleting an index
DELETE mytest
4. Closing and opening an index
Close:
POST mytest/_close
Open:
POST mytest/_open
A closed index can be neither written to nor searched; any such request fails with an error like this:
{
  "error": {
    "root_cause": [
      { "type": "index_closed_exception", "reason": "closed", "index_uuid": "9LpmSP7mR3KlXXZ1oD-YFw", "index": "mytest" }
    ],
    "type": "index_closed_exception",
    "reason": "closed",
    "index_uuid": "9LpmSP7mR3KlXXZ1oD-YFw",
    "index": "mytest"
  },
  "status": 400
}
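5. Viewing cluster indices and health
5.1 View the status of specific indices:
View the status of the four indices bus, home, blog, and mytest:
GET /_cat/indices/bus,home,blog,mytest?v
View the indices whose names start with bus: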
GET /_cat/indices/bus*?v
5.2 View all indices:
GET _cat/indices?v
5.3 View cluster health:
GET /_cat/health?v
6. Basic document operations
Document path format:
index/type/id
6.1 Add a document:
PUT /bus/product/1
{ "name" : "公交车1路", "desc" : "从东站到西站", "price" : 2, "producer" : "东部公交", "tags": [ "空调", "普通", "单层" ] }
Or:
POST /bus/product/5
{ "name" : "机场大巴A2线", "desc" : "机场到B酒店来回", "price" : 25, "producer" : "机场大巴", "tags": [ "单层", "空调", "大巴" ] }
Create the document only if its id does not exist yet (put-if-absent); if the id already exists, the request fails. Either of the following forms works:
PUT twitter/_doc/1?op_type=create
{ "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" }
PUT twitter/_doc/1/_create
{ "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" }
Example of a failed create (this response was captured against bus/product/1):
{
  "error": {
    "root_cause": [
      { "type": "version_conflict_engine_exception", "reason": "[product][1]: version conflict, document already exists (current version [7])", "index_uuid": "G4DrNdPhRWK_rBuEaluwsA", "shard": "2", "index": "bus" }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[product][1]: version conflict, document already exists (current version [7])",
    "index_uuid": "G4DrNdPhRWK_rBuEaluwsA",
    "shard": "2",
    "index": "bus"
  },
  "status": 409
}
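Set a timeout for the write, that is, how long the request waits for the primary shard to become available. The default is 1 minute.
With a 5-minute timeout:
PUT twitter/_doc/1?timeout=5m
{ "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" }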
6.2 Get a document:
GET bus/product/1
Response:
{
  "_index" : "bus",
  "_type" : "product",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : { "name" : "公交车1路", "desc" : "从东站到西站", "price" : 2, "producer" : "东部公交", "tags" : [ "空调", "普通", "单层" ] }
}
Return only selected _source fields:
GET bus/product/122?_source=name,price
{
  "_index" : "bus",
  "_type" : "product",
  "_id" : "122",
  "_version" : 1,
  "found" : true,
  "_source" : { "price" : 5, "name" : "公交车1路" }
}
Omit _source from the response:
GET bus/product/122?_source=false
Return only the _source:
GET bus/product/122/_source
Check whether a document exists:
HEAD bus/product/1
Omit _source, or include and exclude specific fields:
GET twitter/_doc/0?_source=false
GET twitter/_doc/0?_source_include=*.id&_source_exclude=entities
GET twitter/_doc/0?_source=*.id,retweeted
What a stored_fields request returns depends on the mapping: only fields mapped with "store": true come back.
Create the index; counter is not stored:
PUT twitter
{
  "mappings": {
    "_doc": {
      "properties": {
        "counter": { "type": "integer", "store": false },
        "tags": { "type": "keyword", "store": true }
      }
    }
  }
}
Add a document:
PUT twitter/_doc/1
{ "counter" : 1, "tags" : ["red"] }
Request the stored values of tags and counter:
GET twitter/_doc/1?stored_fields=tags,counter
Only tags appears in the result:
{
  "_index": "twitter",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "found": true,
  "fields": { "tags": [ "red" ] }
}
6.3 Get multiple documents:
Return the documents with ids 1 and 2:
GET bus/product/_mget
{ "ids": [1, 2] }
Fetch documents from different indices:
GET /_mget
{
  "docs": [
    { "_index": "bus", "_type": "product", "_id": 1 },
    { "_index": "mytest", "_type": "product", "_id": 1 }
  ]
}
6.4 Replace a document (full update)
PUT /bus/product/1
{ "name" : "公交车1路", "desc" : "从东站到西站", "price" : 5, "producer" : "东部公交", "tags": [ "空调", "普通", "单层" ] }
GET /bus/product/1
Response:
{
  "_index" : "bus",
  "_type" : "product",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : { "name" : "公交车1路", "desc" : "从东站到西站", "price" : 5, "producer" : "东部公交", "tags" : [ "空调", "普通", "单层" ] }
}
POST can be used instead of PUT.
Update conditionally on a version: if the document's current version differs from the one supplied, the update fails.
PUT bus/product/1?version=5
{ "name":"公交车5路(version5)" }
Response:
{
  "error": {
    "root_cause": [
      { "type": "version_conflict_engine_exception", "reason": "[product][1]: version conflict, current version [7] is different than the one provided [5]", "index_uuid": "G4DrNdPhRWK_rBuEaluwsA", "shard": "2", "index": "bus" }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[product][1]: version conflict, current version [7] is different than the one provided [5]",
    "index_uuid": "G4DrNdPhRWK_rBuEaluwsA",
    "shard": "2",
    "index": "bus"
  },
  "status": 409
}
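The version check above is a form of optimistic concurrency control. As a rough illustration only, the following Python sketch models it with an in-memory dictionary. ToyIndex, put, and VersionConflict are hypothetical names invented for this example; they are not part of Elasticsearch or its client libraries.

```python
class VersionConflict(Exception):
    """Raised when the supplied version differs from the stored one."""


class ToyIndex:
    """In-memory stand-in for one index (hypothetical, not the Elasticsearch client)."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, source)

    def put(self, doc_id, source, version=None):
        """Store a whole document; if `version` is given it must equal the current version."""
        current_version = self._docs[doc_id][0] if doc_id in self._docs else 0
        if version is not None and version != current_version:
            raise VersionConflict(
                f"current version [{current_version}] is different "
                f"than the one provided [{version}]"
            )
        self._docs[doc_id] = (current_version + 1, dict(source))
        return current_version + 1


index = ToyIndex()
for _ in range(7):  # seven writes leave the document at version 7
    index.put("1", {"name": "公交车1路"})

try:
    index.put("1", {"name": "公交车5路(version5)"}, version=5)
    outcome = "updated"
except VersionConflict as exc:
    outcome = f"conflict: {exc}"

print(outcome)
print(index.put("1", {"name": "公交车5路"}, version=7))  # a matching version succeeds
```

The stale writer (version 5) is rejected, while a writer that read version 7 succeeds and moves the document to version 8.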
6.5 Update a document (partial update)
POST /bus/product/1/_update
{ "doc": { "price": 10 } }
GET /bus/product/1
Response:
{
  "_index" : "bus",
  "_type" : "product",
  "_id" : "1",
  "_version" : 4,
  "found" : true,
  "_source" : { "name" : "公交车1路", "desc" : "从东站到西站", "price" : 10, "producer" : "东部公交", "tags" : [ "空调", "普通", "单层" ] }
}
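A partial update merges the submitted "doc" into the existing _source instead of replacing it: nested objects are merged recursively, while scalar values and arrays are overwritten. The Python sketch below imitates that merge on plain dictionaries; merge_partial is a hypothetical helper written for illustration, not an Elasticsearch API.

```python
def merge_partial(source, partial):
    """Return a copy of `source` with `partial` merged in.

    Nested dicts are merged recursively; scalars and lists are replaced.
    """
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = value
    return merged


existing = {
    "name": "公交车1路",
    "desc": "从东站到西站",
    "price": 5,
    "producer": "东部公交",
    "tags": ["空调", "普通", "单层"],
}
updated = merge_partial(existing, {"price": 10})
print(updated["price"], updated["name"])
```

Only price changes; every other field is carried over, which matches the GET response after the _update call.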
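6.6 Delete a document:
DELETE /bus/product/1
Then query it again:
GET /bus/product/1
{ "_index" : "bus", "_type" : "product", "_id" : "1", "found" : false }
A version can be specified when deleting, to make sure the document being deleted has not changed in the meantime. Every write operation on a document, including a delete, increments its version.
DELETE bus/product/100?version=6
Delete by query. Use with care: it is very easy to delete more than intended.
POST twitter/_delete_by_query
{ "query": { "match": { "message": "some message" } } }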
7. Searching documents
7.1 Retrieve all documents
GET bus/product/_search
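7.2 term query
A term query is an exact match: the search term is not analyzed before searching, so it must equal one of the tokens produced when the field was indexed. If no Chinese analysis plugin is installed, the default standard analyzer splits Chinese text into single characters.
Returns nothing, because no single token equals "公交":
GET bus/product/_search
{ "query": { "term": { "producer": "公交" } } }
Returns every document whose producer contains "公":
GET bus/product/_search
{ "query": { "term": { "producer": "公" } } }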
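7.3 match query
A match query first analyzes the search text and then matches each resulting token in turn, so unlike the exact term query it is an analyzed, full-text match.
Every document whose desc contains any combination of the four characters 机, 场, 酒, 店 is returned (the tokens are combined with OR by default):
GET bus/product/_search
{ "query": { "match": { "desc": "机场酒店" } } }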
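The difference between term and match can be reproduced with a toy inverted index. The Python sketch below assumes a simplified analyzer that emits one token per character, which approximates how Chinese text is split when no analysis plugin is installed. The functions analyze, build_index, term_query, and match_query, and the three sample documents, are hypothetical and written only for this illustration.

```python
def analyze(text):
    """Toy analyzer: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def build_index(docs, field):
    """Map each token to the set of ids of documents whose field contains it."""
    index = {}
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, value):
    """term: look the value up as-is, without analyzing it."""
    return set(index.get(value, set()))


def match_query(index, text):
    """match: analyze the text, then OR together the hits of every token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


docs = {
    "1": {"producer": "东部公交"},
    "3": {"producer": "公交集团"},
    "5": {"producer": "机场大巴"},
}
idx = build_index(docs, "producer")

print(sorted(term_query(idx, "公交")))   # no token equals the two-character string
print(sorted(term_query(idx, "公")))
print(sorted(match_query(idx, "公交")))
```

The term lookup for "公交" finds nothing because the index only holds single-character tokens, whereas the match query splits the text first and therefore finds both bus-company documents.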
7.4 Pagination
GET bus/_search
{ "from": 0, "size": 3, "query": { "match": { "desc": "机场酒店" } } }
GET bus/_search
{
"from": 0,
"size": 5,
"query": {
"match_all": {}
}
}
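The from and size parameters are an offset and a page length, so a 1-based page number has to be converted before it is sent. The Python sketch below shows one way to do that; page_to_from_size and paged_search_body are hypothetical helper names, and the 10,000 limit is the default value of the index.max_result_window setting.

```python
MAX_RESULT_WINDOW = 10000  # default of the index.max_result_window setting


def page_to_from_size(page, page_size):
    """Translate a 1-based page number into the from/size pair used by _search."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    offset = (page - 1) * page_size
    if offset + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window")
    return {"from": offset, "size": page_size}


def paged_search_body(query, page, page_size):
    """Build a _search request body for the given page."""
    body = page_to_from_size(page, page_size)
    body["query"] = query
    return body


body = paged_search_body({"match": {"desc": "机场酒店"}}, page=2, page_size=3)
print(body)
```

Page 2 with a page size of 3 becomes from 3, size 3. Requests that reach past the result window are rejected up front here, since Elasticsearch would refuse them under the default setting.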
7.5 Select which fields are returned, like a and b in SELECT a, b FROM table
GET bus/_search
{ "_source": ["name","desc"], "query": { "match": { "desc": "机场" } } }
Result:
{
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 12,
"max_score" : 2.1208954,
"hits" : [
{
"_index" : "bus",
"_type" : "product",
"_id" : "9",
"_score" : 2.1208954,
"_source" : {
"name" : "机场大巴A2线",
"desc" : "机机场场"
}
},
{
"_index" : "bus",
"_type" : "product",
"_id" : "10",
"_score" : 2.1208954,
"_source" : {
"name" : "机场大巴A2线",
"desc" : "机机场场"
}
},
{
"_index" : "bus",
"_type" : "product",
"_id" : "6",
"_score" : 0.62362677,
"_source" : {
"name" : "机场大巴A2线",
"desc" : "机机场场"
}
}
]
}
}
7.6 Show document versions
GET bus/_search
{ "version": true, "from": 0, "size": 3, "query": { "match": { "desc": "机场酒店" } } }
7.7 Minimum score
Only hits whose score is at least 2.3 are returned:
GET bus/_search
{ "version": true, "min_score": 2.3, "from": 0, "size": 3, "query": { "match": { "desc": "机场酒店" } } }
7.8 Highlight keywords
GET bus/_search
{ "version": true, "from": 0, "size": 3, "query": { "match": { "desc": "机场酒店" } }, "highlight": { "fields": { "desc": {} } } }
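7.9 Phrase matching with match_phrase
Similar to the match query, but it matches an exact phrase: after analysis, every term must appear in the field, and the terms must occur in the same order as in the query.
GET bus/_search
{ "query": { "match_phrase": { "name": "公交车122" } } }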
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 3.4102418,
"hits" : [
{
"_index" : "bus",
"_type" : "product",
"_id" : "3",
"_score" : 3.4102418,
"_source" : {
"name" : "公交车122路",
"desc" : "从前兴路枢纽到东站",
"price" : 2,
"producer" : "公交集团",
"tags" : [
"单层",
"空调"
]
}
}
]
}
}
Compare with match:
GET bus/_search
{
"query": {
"match": {
"name": "公交车122"
}
}
}
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 5.3417225,
"hits" : [
{
"_index" : "bus",
"_type" : "product",
"_id" : "2",
"_score" : 5.3417225,
"_source" : {
"name" : "公交车5路",
"desc" : "从巫家坝到梁家河",
"price" : 1,
"producer" : "公交集团",
"tags" : [
"双层",
"普通",
"热门"
]
}
},
{
"_index" : "bus",
"_type" : "product",
"_id" : "3",
"_score" : 3.4102418,
"_source" : {
"name" : "公交车122路",
"desc" : "从前兴路枢纽到东站",
"price" : 2,
"producer" : "公交集团",
"tags" : [
"单层",
"空调"
]
}
},
{
"_index" : "bus",
"_type" : "product",
"_id" : "1",
"_score" : 2.1597636,
"_source" : {
"name" : "公交车5路",
"desc" : "从巫家坝到梁家河",
"price" : 1,
"producer" : "公交集团",
"tags" : [
"双层",
"普通",
"热门"
]
}
}
]
}
}
7.10 Prefix matching with match_phrase_prefix
match_phrase_prefix works like match_phrase, except that the last term of the query text is matched as a prefix.
GET bus/_search
{ "query": { "match_phrase_prefix": { "name": "公交车1" } } }
Response:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 6.8204837,
    "hits" : [
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "3",
        "_score" : 6.8204837,
        "_source" : { "name" : "公交车122路", "desc" : "从前兴路枢纽到东站", "price" : 2, "producer" : "公交集团", "tags" : [ "单层", "空调" ] }
      }
    ]
  }
}
Compare with match_phrase:
GET bus/_search
{ "query": { "match_phrase": { "name": "公交车1" } } }
Response:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 },
  "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] }
}
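The behaviour of match_phrase and match_phrase_prefix seen in 7.9 and 7.10 can be imitated on token lists. The Python sketch below assumes a simplified analyzer that emits each Chinese character as its own token and keeps runs of digits or Latin letters together, which is an approximation of the default analyzer. The functions analyze and phrase_match are hypothetical and exist only for this illustration; slop and scoring are ignored.

```python
import re


def analyze(text):
    """Toy analyzer: runs of digits/Latin letters stay whole, other characters stand alone."""
    return re.findall(r"[0-9A-Za-z]+|[^\s0-9A-Za-z]", text)


def phrase_match(field_text, query_text, prefix_last=False):
    """True if the query tokens occur consecutively and in order in the field.

    With prefix_last=True the final query token only needs to be a prefix
    of the field token at that position (match_phrase_prefix behaviour).
    """
    doc_tokens = analyze(field_text)
    query_tokens = analyze(query_text)
    n = len(query_tokens)
    if n == 0:
        return False
    for start in range(len(doc_tokens) - n + 1):
        window = doc_tokens[start:start + n]
        if window[:-1] != query_tokens[:-1]:
            continue
        if window[-1] == query_tokens[-1]:
            return True
        if prefix_last and window[-1].startswith(query_tokens[-1]):
            return True
    return False


names = {"2": "公交车5路", "3": "公交车122路"}

phrase_hits = [i for i, name in names.items() if phrase_match(name, "公交车122")]
prefix_hits = [i for i, name in names.items() if phrase_match(name, "公交车1", prefix_last=True)]
exact_hits = [i for i, name in names.items() if phrase_match(name, "公交车1")]
print(phrase_hits, prefix_hits, exact_hits)
```

Under these assumptions, "公交车122" matches only document 3 as a phrase, "公交车1" matches document 3 only when the last token is treated as a prefix of 122, and the plain phrase query for "公交车1" matches nothing, in line with the responses shown in this section.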
7.11 Searching several fields with multi_match
GET bus/_search
{ "query": { "multi_match": { "query": "空港", "fields": ["desc","name"] } } }
Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 3,
    "max_score" : 3.6836727,
    "hits" : [
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "16",
        "_score" : 3.6836727,
        "_source" : { "name" : "机场大巴A2线", "desc" : "空港", "price" : 21, "producer" : "大巴", "tags" : [ "单层", "空调", "大巴" ] }
      },
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "18",
        "_score" : 3.5525968,
        "_source" : { "name" : "空港大巴A2线", "desc" : "机场", "price" : 21, "producer" : "大巴", "tags" : [ "单层", "空调", "大巴" ] }
      },
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "19",
        "_score" : 3.1757839,
        "_source" : { "name" : "空港大巴A2线", "desc" : "空港快线", "price" : 21, "producer" : "大巴", "tags" : [ "单层", "空调", "大巴" ] }
      }
    ]
  }
}
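8. Routing
Routing is directly tied to sharding. Elasticsearch hashes a routing value and uses the result to choose a primary shard, so documents with the same hash value always land on the same primary shard. This is much like using a hash for load balancing.
By default the routing value is the document ID, which generally keeps the data evenly distributed across all shards and avoids hot spots.
Custom routing lets you keep related data together on one shard, but if it is not chosen carefully a single shard can come under too much load.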
PUT mytest/product/4?routing=weapon
{ "name" : "手枪", "desc" : "增加100点攻击", "price" : 15400, "producer" : "神秘商店", "tags": [ "机械", "穿透" ] }
GET mytest/product/4
GET mytest/product/4?routing=weapon
Using routing in a search:
GET mytest/_search
{ "query": { "match": { "_routing": "weapon" } } }
GET mytest/_search
{ "query": { "term": { "_routing": "weapon" } } }
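The shard selection described in this section boils down to hashing the routing value and taking the result modulo the number of primary shards. The Python sketch below illustrates the idea with zlib.crc32 as a stand-in hash; Elasticsearch uses its own hash function, so the shard numbers printed here will not match a real cluster. pick_shard and shard_for are hypothetical names.

```python
import zlib


def pick_shard(routing_value, num_primary_shards):
    """Hash the routing value and map it onto a primary shard number."""
    # crc32 is only a stand-in for the hash Elasticsearch actually uses.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def shard_for(doc_id, num_primary_shards, routing=None):
    """Use the custom routing value if given, otherwise fall back to the document id."""
    return pick_shard(doc_id if routing is None else routing, num_primary_shards)


NUM_SHARDS = 3
doc_ids = ["1", "2", "3", "4", "5", "6"]

default_placement = {d: shard_for(d, NUM_SHARDS) for d in doc_ids}
routed_placement = {d: shard_for(d, NUM_SHARDS, routing="weapon") for d in doc_ids}

print(default_placement)  # ids are spread according to their own hashes
print(routed_placement)   # every id lands on the shard chosen by "weapon"
```

The same reasoning suggests why GET mytest/product/4 without ?routing=weapon may not find the document: the lookup is sent to the shard derived from the id, which need not be the shard the custom routing value selected.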
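9. Mapping
A mapping is comparable to the schema of a database table. If no mapping is specified when the index is created, Elasticsearch infers the field types as documents are indexed (dynamic mapping); a mapping can also be defined by hand (static mapping).
PUT bus
{
  "mappings": {
    "product": {
      "properties": {
        "name": { "type": "text" },
        "desc": { "type": "text" },
        "price": { "type": "long" },
        "producer": { "type": "text" },
        "tags": { "type": "text" }
      }
    }
  },
  "settings": { "number_of_replicas": 1, "number_of_shards": 3 }
}
Specify a format for a date field: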
PUT bus4
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
, "mappings": {
"product":{
"properties":{
"name":{"type":"text"},
"updateDate":{
"type":"date",
"format":"yyyy-MM-dd"
}
}
}
}
}
In general, the mapping of an existing field cannot be updated, with a few exceptions:
- New properties can be added to Object datatype fields.
- New fields can be added.
- The ignore_above parameter can be updated.
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": { "properties": { "first": { "type": "text" } } },
        "user_id": { "type": "keyword" }
      }
    }
  }
}
PUT my_index/_mapping/_doc
{
  "properties": {
    "name": { "properties": { "last": { "type": "text" } } },
    "user_id": { "type": "keyword", "ignore_above": 100 }
  }
}
The first request creates an index whose name field is an Object datatype with a first property. The second request adds a last property under name and changes ignore_above on user_id from its default to 100.
After a static mapping has been defined, new fields can still be added dynamically.
Submitting an update that contains a field not present in the mapping makes Elasticsearch infer its type, as with memo here:
POST /bus/product/1/_update
{ "doc": { "memo": "a test" } }
Check the result with GET bus/_mapping:
{
  "bus" : {
    "mappings" : {
      "product" : {
        "properties" : {
          "desc" : { "type" : "text" },
          "memo" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
          "name" : { "type" : "text" },
          "price" : { "type" : "long" },
          "producer" : { "type" : "text" },
          "tags" : { "type" : "text" }
        }
      }
    }
  }
}
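10. Bulk operations
The _bulk API runs many operations in one request. If any single operation fails, the others are not affected; the response reports the error for the failed item.
The bulk API is strict about JSON layout: each JSON object must stay on a single line with no line breaks inside it, and consecutive JSON objects must be separated by a newline.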
POST /_bulk
{ "delete": { "_index": "home", "_type": "product", "_id": "1" }}
{ "create": { "_index": "home", "_type": "product", "_id": "1" }}
{ "title": "My first post2","memo":"a test2","date":"2018-12-12" }
{ "update": { "_index": "home", "_type": "product", "_id": "2"} }
{ "doc" : {"title" : "My updated post2"} }
{ "delete": { "_index": "home", "_type": "product", "_id": "3" }}
{ "create": { "_index": "home", "_type": "product", "_id": "3" }}
{ "title": "My first post3","memo":"a test23","date":"2018-12-13" }
POST /_bulk
{ "index":{ "_index": "home", "_type": "product" ,"_id":1}}
{ "title":"My post1" ,"memo":"a test1","date":"2018-12-01"}
{ "index":{ "_index": "home", "_type": "product" ,"_id":2}}
{ "title":"My post2" ,"memo":"a test2","date":"2018-12-02"}
{ "index":{ "_index": "home", "_type": "product" ,"_id":3}}
{ "title":"My post3" ,"memo":"a test3","date":"2018-12-03"}
The index and type can also be given in the URL: POST /home/product/_bulk or POST /home/_bulk
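Because the bulk format is newline-delimited JSON, request bodies are usually generated rather than typed by hand. The Python sketch below assembles such a body with the standard json module; build_bulk_body and its (action, metadata, payload) tuples are a hypothetical convention chosen for this example, and sending the body over HTTP is left out.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, payload) tuples into a _bulk request body.

    Each JSON object goes on its own line; delete actions carry no payload line.
    """
    lines = []
    for action, metadata, payload in operations:
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if action == "delete":
            continue
        if payload is None:
            raise ValueError(f"{action} needs a payload line")
        lines.append(json.dumps(payload, ensure_ascii=False))
    # The body must end with a newline after the last line.
    return "\n".join(lines) + "\n"


operations = [
    ("delete", {"_index": "home", "_type": "product", "_id": "1"}, None),
    ("create", {"_index": "home", "_type": "product", "_id": "1"},
     {"title": "My first post2", "memo": "a test2", "date": "2018-12-12"}),
    ("update", {"_index": "home", "_type": "product", "_id": "2"},
     {"doc": {"title": "My updated post2"}}),
]
body = build_bulk_body(operations)
print(body, end="")
```

json.dumps without an indent argument never emits line breaks, which is what keeps every object on a single line as the bulk API requires.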
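11. Rebuilding an index with reindex
11.1 Reindex does not attempt to set up the destination index, and it does not copy the settings of the source index. Set up the destination index before running _reindex, including its mappings, shard count, and replica count.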
PUT bus_bak
{
"settings": {
"number_of_shards": 1
, "number_of_replicas": 0
}
}
POST _reindex
{ "source": { "index": "bus" }, "dest": { "index": "bus_bak" } }
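11.2 Versioning. By default, document versions in the destination index start counting afresh after a reindex. To keep the versions from the source index, set version_type to external.
POST _reindex
{ "source": { "index": "bus" }, "dest": { "index": "bus_bak", "version_type": "external" } }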
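11.3 Create only the documents that are missing from the destination index; a document whose id already exists there causes a version conflict.
POST _reindex
{ "source": { "index": "bus" }, "dest": { "index": "bus_bak", "op_type": "create" } }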
By default a version conflict aborts the _reindex process. Setting "conflicts": "proceed" makes it count the conflicts and carry on instead.
POST _reindex
{ "conflicts": "proceed", "source": { "index": "bus" }, "dest": { "index": "bus_bak", "op_type": "create" } }
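The interaction between op_type create and the conflicts setting can be pictured with two dictionaries standing in for the source and destination indices. The Python sketch below is a deliberately simplified model: toy_reindex_create and ReindexAborted are hypothetical names, and real reindexing works in batches, so an aborted run may already have written some documents.

```python
class ReindexAborted(Exception):
    """Raised when a conflict occurs and conflicts is not set to proceed."""


def toy_reindex_create(source, dest, conflicts="abort"):
    """Copy documents missing from `dest`; existing ids count as version conflicts."""
    stats = {"created": 0, "version_conflicts": 0}
    for doc_id, doc in source.items():
        if doc_id in dest:
            stats["version_conflicts"] += 1
            if conflicts != "proceed":
                raise ReindexAborted(f"version conflict on id {doc_id}")
            continue  # leave the existing destination document untouched
        dest[doc_id] = dict(doc)
        stats["created"] += 1
    return stats


bus = {
    "1": {"name": "公交车1路"},
    "2": {"name": "公交车5路"},
    "3": {"name": "公交车122路"},
}
bus_bak = {"2": {"name": "existing copy"}}

stats = toy_reindex_create(bus, bus_bak, conflicts="proceed")
print(stats)
print(bus_bak["2"])
```

With "proceed", the two missing documents are created and the one conflict is only counted; the document that already existed in the destination keeps its content.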
11.4 Reindex only the documents that match a query
POST _reindex
{ "source": { "index": "bus", "type": "product", "query": { "match": { "name": "公交" } } }, "dest": { "index": "bus_bak" } }
Response:
{
  "took" : 26,
  "timed_out" : false,
  "total" : 5,
  "updated" : 0,
  "created" : 5,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : { "bulk" : 0, "search" : 0 },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
Restrict which _source fields are copied during the reindex:
POST _reindex
{ "source": { "index": "twitter", "_source": ["user", "_doc"] }, "dest": { "index": "new_twitter" } }
11.5 Reindex several indices into a single index
POST _reindex
{ "source": { "index": ["bus","user"], "type": ["product","info"] }, "dest": { "index": "blog", "type": "_doc" } }
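11.6 Limit the number of documents reindexed
Copy a single document:
POST _reindex
{ "size": 1, "source": { "index": "twitter" }, "dest": { "index": "new_twitter" } }
Copy the first 10000 documents, sorted by date in descending order:
POST _reindex
{ "size": 10000, "source": { "index": "twitter", "sort": { "date": "desc" } }, "dest": { "index": "new_twitter" } }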