Elasticsearch Basic Operations

1. Creating an index

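1.1. Create an index with the default settings (5 shards and 1 replica in Elasticsearch 6.x; 7.0 and later default to 1 shard)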

PUT test

The index name (test here) must be lowercase.
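
The lowercase rule can be checked on the client before sending PUT. The sketch below is only an illustration: the function name index_name_problems is hypothetical, and the rule set is an assumed partial subset of what Elasticsearch enforces (lowercase only, a few forbidden characters and leading characters, a 255-byte limit), so the server remains the final authority.

```python
# Illustrative pre-flight check for an index name before sending "PUT <name>".
# The rules below are an assumed, partial subset of Elasticsearch's own checks.
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')


def index_name_problems(name):
    """Return a list of problems; an empty list means the name looks acceptable."""
    if not name:
        return ["name is empty"]
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    if name in (".", ".."):
        problems.append("must not be '.' or '..'")
    if name[0] in "-_+":
        problems.append("must not start with '-', '_' or '+'")
    bad = sorted(c for c in set(name) if c in FORBIDDEN_CHARS)
    if bad:
        problems.append("contains forbidden characters: " + "".join(bad))
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes")
    return problems


if __name__ == "__main__":
    for candidate in ["test", "Test", "_hidden", "my index"]:
        print(candidate, "->", index_name_problems(candidate) or "ok")
```

A name such as Test is reported as a problem here, which mirrors the error Elasticsearch returns for PUT Test.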

1.2. Specify the number of shards and replicas:

PUT mytest
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  } 
}

2. Viewing an index

2.1. View basic information:

GET mytest

Return only the settings:

GET mytest/_settings

2.2. View several indices at once:

GET bus,home,blog,mytest/_settings
GET bus,home,blog,mytest

 

3. Deleting an index

DELETE mytest

4. Closing and opening an index

Close:
POST mytest/_close
Open:
POST mytest/_open

A closed index can be neither written to nor searched; such requests fail with an error like this:

{
  "error": {
    "root_cause": [
      {
        "type": "index_closed_exception",
        "reason": "closed",
        "index_uuid": "9LpmSP7mR3KlXXZ1oD-YFw",
        "index": "mytest"
      }
    ],
    "type": "index_closed_exception",
    "reason": "closed",
    "index_uuid": "9LpmSP7mR3KlXXZ1oD-YFw",
    "index": "mytest"
  },
  "status": 400
}

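5. Listing indices and checking cluster health

5.1. Check the status of specific indices:

Status of the four indices bus, home, blog and mytest:
GET /_cat/indices/bus,home,blog,mytest?v
Indices whose names start with bus:
GET /_cat/indices/bus*?v

5.2. List all indices:

GET _cat/indices?v

5.3. Check cluster health: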

GET /_cat/health?v

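6. Basic document operations

Document address format (the type segment, such as product below, is a 6.x concept; 7.x deprecates custom types in favor of _doc):
index/type/id

6.1. Adding a document: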

PUT /bus/product/1
{
"name" : "公交车1路",
"desc" : "从东站到西站",
"price" : 2,
"producer" : "东部公交",
"tags": [ "空调", "普通","单层"]
}

Or with POST, which accepts an explicit ID in the same way:

POST /bus/product/5
{
"name" : "机场大巴A2线",
"desc" : "机场到B酒店来回",
"price" : 25,
"producer" : "机场大巴",
"tags": [ "单层", "空调","大巴"]
}

 

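Create the document only if its ID does not exist yet (put-if-absent); if the ID already exists, the request fails. Both of the following forms do this: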

PUT twitter/_doc/1?op_type=create
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

PUT twitter/_doc/1/_create
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
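
If the document already exists, the request fails with a 409 (this sample response was captured against bus/product/1):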
{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[product][1]: version conflict, document already exists (current version [7])",
        "index_uuid": "G4DrNdPhRWK_rBuEaluwsA",
        "shard": "2",
        "index": "bus"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[product][1]: version conflict, document already exists (current version [7])",
    "index_uuid": "G4DrNdPhRWK_rBuEaluwsA",
    "shard": "2",
    "index": "bus"
  },
  "status": 409
}

 

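Set the write timeout, that is, how long the request waits for the primary shard to become available. The default is 1 minute.

Here the timeout is set to 5 minutes: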
PUT twitter/_doc/1?timeout=5m
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

 

6.2. Getting a document:

GET bus/product/1

Response:
{
  "_index" : "bus",
  "_type" : "product",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "公交车1路",
    "desc" : "从东站到西站",
    "price" : 2,
    "producer" : "东部公交",
    "tags" : [
      "空调",
      "普通",
      "单层"
    ]
  }
}

Return only selected _source fields:

GET bus/product/122?_source=name,price
{
  "_index" : "bus",
  "_type" : "product",
  "_id" : "122",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "price" : 5,
    "name" : "公交车1路"
  }
}

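Omit _source from the response:

GET bus/product/122?_source=false

Return only _source:

GET bus/product/122/_source

Check whether a document exists:

HEAD bus/product/1

Disable _source in the response, or include and exclude specific fields: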

GET twitter/_doc/0?_source=false

GET twitter/_doc/0?_source_include=*.id&_source_exclude=entities

GET twitter/_doc/0?_source=*.id,retweeted

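The stored_fields parameter retrieves fields that are stored separately from _source; only fields mapped with "store": true can be returned this way.

Create the index; counter is not stored, tags is: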
PUT twitter
{
   "mappings": {
      "_doc": {
         "properties": {
            "counter": {
               "type": "integer",
               "store": false
            },
            "tags": {
               "type": "keyword",
               "store": true
            }
         }
      }
   }
}
Index a document:
PUT twitter/_doc/1
{
    "counter" : 1,
    "tags" : ["red"]
}
Request both tags and counter:
GET twitter/_doc/1?stored_fields=tags,counter
Only tags comes back, because counter is not stored:
{
   "_index": "twitter",
   "_type": "_doc",
   "_id": "1",
   "_version": 1,
   "found": true,
   "fields": {
      "tags": [
         "red"
      ]
   }
}

 

6.3. Getting multiple documents:

Return the documents with IDs 1 and 2:
GET bus/product/_mget
{
  "ids":[1,2]
}

Fetch documents that live in different indices:

GET /_mget
{
  "docs":[
    {
      "_index":"bus",
      "_type":"product",
      "_id":1
    },
    {
      "_index":"mytest",
      "_type":"product",
      "_id":1
    }
    
    ]
}

 

 

6.4. Replacing a document (full update):

PUT /bus/product/1
{
"name" : "公交车1路",
"desc" : "从东站到西站",
"price" : 5,
"producer" : "东部公交",
"tags": [ "空调", "普通","单层"]
}

GET /bus/product/1
Response:
{
  "_index" : "bus",
  "_type" : "product",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "name" : "公交车1路",
    "desc" : "从东站到西站",
    "price" : 5,
    "producer" : "东部公交",
    "tags" : [
      "空调",
      "普通",
      "单层"
    ]
  }
}

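POST works as well.

Update conditionally on the version: if the supplied version differs from the document's current version, the update fails with a 409. (Elasticsearch 6.7 and later recommend if_seq_no and if_primary_term for this purpose.)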

PUT bus/product/1?version=5
{
  "name":"公交车5路(version5)"
}

{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[product][1]: version conflict, current version [7] is different than the one provided [5]",
        "index_uuid": "G4DrNdPhRWK_rBuEaluwsA",
        "shard": "2",
        "index": "bus"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[product][1]: version conflict, current version [7] is different than the one provided [5]",
    "index_uuid": "G4DrNdPhRWK_rBuEaluwsA",
    "shard": "2",
    "index": "bus"
  },
  "status": 409
}
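
The put-if-absent and version checks above follow the same optimistic-concurrency idea. The following is a toy in-memory model, not an Elasticsearch client: the ToyIndex class, its index method, and the VersionConflict exception are hypothetical names, and the version counter simply starts at 1 and increases by 1 per write, as the responses in this section suggest.

```python
class VersionConflict(Exception):
    """Stands in for version_conflict_engine_exception (HTTP 409)."""


class ToyIndex:
    """In-memory model of per-document versions (assumption: not a real client)."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, source)

    def index(self, doc_id, source, version=None, create_only=False):
        current = self._docs.get(doc_id)
        # op_type=create: refuse to overwrite an existing document
        if create_only and current is not None:
            raise VersionConflict(
                f"[{doc_id}]: document already exists (current version [{current[0]}])"
            )
        # ?version=N: only write if the stored version is exactly N
        if version is not None:
            current_version = current[0] if current else None
            if current_version != version:
                raise VersionConflict(
                    f"[{doc_id}]: current version [{current_version}] "
                    f"is different than the one provided [{version}]"
                )
        new_version = current[0] + 1 if current else 1
        self._docs[doc_id] = (new_version, source)
        return new_version


if __name__ == "__main__":
    bus = ToyIndex()
    bus.index("1", {"price": 2})
    bus.index("1", {"price": 5})
    try:
        bus.index("1", {"name": "stale write"}, version=1)
    except VersionConflict as err:
        print("409:", err)
```

A typical client reacts to the conflict by re-reading the document, reapplying its change, and retrying with the fresh version.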

 

6.5. Updating a document (partial update):

POST /bus/product/1/_update
{
  "doc": {
    "price": 10
  }
}
GET /bus/product/1
Response:
{
  "_index" : "bus",
  "_type" : "product",
  "_id" : "1",
  "_version" : 4,
  "found" : true,
  "_source" : {
    "name" : "公交车1路",
    "desc" : "从东站到西站",
    "price" : 10,
    "producer" : "东部公交",
    "tags" : [
      "空调",
      "普通",
      "单层"
    ]
  }
}

 

6.6. Deleting a document:

DELETE /bus/product/1

Then query it again:

GET /bus/product/1

{
  "_index" : "bus",
  "_type" : "product",
  "_id" : "1",
  "found" : false
}

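When deleting, you can pass a version to make sure the document being deleted has not changed in the meantime. Every write operation on a document, including a delete, increments its version.

DELETE bus/product/100?version=6

Delete by query. Use this with care, because it is very easy to delete more than intended: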

POST twitter/_delete_by_query
{
  "query": { 
    "match": {
      "message": "some message"
    }
  }
}

 

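7. Searching documents

7.1 Search all documents

GET bus/product/_search

7.2 term query

A term query is an exact match: the search value is not analyzed before the lookup, so it must be identical to one of the terms produced when the field was indexed. Without a Chinese analysis plugin, Chinese text is split into single characters.

This returns no hits, because no single indexed term equals 公交: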
GET bus/product/_search
{
  "query": {
    "term": {
        "producer": "公交"
    }
  }
}
This returns every document whose producer contains the character 公:
GET bus/product/_search
{
  "query": {
    "term": {
        "producer": "公"
    }
  }
}

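7.3 match query

A match query analyzes the search text first and then matches each resulting term, so unlike the exact term lookup it is a full-text match.

Documents whose desc contains any combination of the four characters 机, 场, 酒, 店 are returned: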
GET bus/product/_search
{
  "query": {
    "match": {
      "desc": "机场酒店"
    }
  }
}
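
The difference between term and match comes down to whether the search text is analyzed before the lookup. Here is a toy inverted index to illustrate it. The function names are hypothetical, and the one-token-per-character analyzer is only a rough stand-in for how the default analyzer treats Chinese text.

```python
def analyze(text):
    """Rough stand-in for the default analyzer on Chinese text: one token per character."""
    return [ch for ch in text if not ch.isspace()]


def build_inverted_index(docs, field):
    """Map each token to the set of document IDs whose field contains it."""
    index = {}
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, value):
    # term: the value is looked up as-is, with no analysis
    return set(index.get(value, set()))


def match_query(index, text):
    # match: the text is analyzed first; a hit on any token counts (default OR)
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


# The two producer values come from the documents added in section 6.1.
docs = {
    "1": {"producer": "东部公交"},
    "5": {"producer": "机场大巴"},
}
producer_index = build_inverted_index(docs, "producer")

print(term_query(producer_index, "公交"))   # two characters never form one indexed term
print(term_query(producer_index, "公"))     # a single character is an indexed term
print(match_query(producer_index, "公交"))  # analyzed into 公 and 交 before the lookup
```

With an analyzer that emits 公交 as one token (for example a Chinese analysis plugin), the first term query would start matching.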

7.4 Pagination

GET bus/_search
{
  "from": 0, 
  "size": 3, 
  "query": {
    "match": {
      "desc": "机场酒店"
    }
  }
}

GET bus/_search
{
  "from": 0,
  "size": 5,
  "query": {
    "match_all": {}
  }
}
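
The from and size values are usually derived from a page number. A minimal sketch, assuming 1-based page numbers and the default index.max_result_window of 10000; the helper names page_to_from_size and paged_search_body are hypothetical.

```python
MAX_RESULT_WINDOW = 10000  # default value of index.max_result_window


def page_to_from_size(page, page_size):
    """Translate a 1-based page number into the from/size pair of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    if start + page_size > MAX_RESULT_WINDOW:
        # deeper paging is rejected by default; scroll or search_after is meant for that
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": start, "size": page_size}


def paged_search_body(page, page_size, query):
    """Build a complete search body for the given page."""
    body = page_to_from_size(page, page_size)
    body["query"] = query
    return body


if __name__ == "__main__":
    print(paged_search_body(1, 3, {"match": {"desc": "机场酒店"}}))
    print(paged_search_body(3, 5, {"match_all": {}}))
```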


7.5 Selecting fields, like a, b in SELECT a, b FROM table

GET bus/_search
{
  "_source": ["name","desc"] ,
  "query": {
    "match": {
      "desc": "机场"
    }
  }
}

result:
{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 12,
    "max_score" : 2.1208954,
    "hits" : [
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "9",
        "_score" : 2.1208954,
        "_source" : {
          "name" : "机场大巴A2线",
          "desc" : "机机场场"
        }
      },
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "10",
        "_score" : 2.1208954,
        "_source" : {
          "name" : "机场大巴A2线",
          "desc" : "机机场场"
        }
      },
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "6",
        "_score" : 0.62362677,
        "_source" : {
          "name" : "机场大巴A2线",
          "desc" : "机机场场"
        }
      }
    ]
  }
}

7.6 Showing document versions

GET bus/_search
{
  "version": true, 
  "from": 0, 
  "size": 3, 
  "query": {
    "match": {
      "desc": "机场酒店"
    }
  }
}

7.7 Filtering by score: min_score drops hits whose _score is below the threshold (2.3 here)

GET bus/_search
{
  "version": true,
  "min_score": 2.3,
  "from": 0, 
  "size": 3, 
  "query": {
    "match": {
      "desc": "机场酒店"
    }
  }
}

7.8 Highlighting matched keywords

GET bus/_search
{
  "version": true, 
  "from": 0, 
  "size": 3, 
  "query": {
    "match": {
      "desc": "机场酒店"
    }
  },
  "highlight": {
    "fields": {
      "desc": {}
    }
  }
}

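7.9 Phrase matching: match_phrase

Similar to the match query, but for exact phrases: every term produced by analyzing the query must appear in the field, adjacent to each other and in the same order.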

GET bus/_search
{
  "query": {
    "match_phrase": {
      "name": "公交车122"
    }
  }
}

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 3.4102418,
    "hits" : [
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "3",
        "_score" : 3.4102418,
        "_source" : {
          "name" : "公交车122路",
          "desc" : "从前兴路枢纽到东站",
          "price" : 2,
          "producer" : "公交集团",
          "tags" : [
            "单层",
            "空调"
          ]
        }
      }
    ]
  }
}


Compare with match:
GET bus/_search
{
  "query": {
    "match": {
      "name": "公交车122"
    }
  }
}

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 5.3417225,
    "hits" : [
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "2",
        "_score" : 5.3417225,
        "_source" : {
          "name" : "公交车5路",
          "desc" : "从巫家坝到梁家河",
          "price" : 1,
          "producer" : "公交集团",
          "tags" : [
            "双层",
            "普通",
            "热门"
          ]
        }
      },
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "3",
        "_score" : 3.4102418,
        "_source" : {
          "name" : "公交车122路",
          "desc" : "从前兴路枢纽到东站",
          "price" : 2,
          "producer" : "公交集团",
          "tags" : [
            "单层",
            "空调"
          ]
        }
      },
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "1",
        "_score" : 2.1597636,
        "_source" : {
          "name" : "公交车5路",
          "desc" : "从巫家坝到梁家河",
          "price" : 1,
          "producer" : "公交集团",
          "tags" : [
            "双层",
            "普通",
            "热门"
          ]
        }
      }
    ]
  }
}

7.10 Prefix query: match_phrase_prefix

match_phrase_prefix works like match_phrase, except that the last term of the query text is matched as a prefix.

GET bus/_search
{
  "query": {
    "match_phrase_prefix": {
      "name": "公交车1"
    }
  }
}

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 6.8204837,
    "hits" : [
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "3",
        "_score" : 6.8204837,
        "_source" : {
          "name" : "公交车122路",
          "desc" : "从前兴路枢纽到东站",
          "price" : 2,
          "producer" : "公交集团",
          "tags" : [
            "单层",
            "空调"
          ]
        }
      }
    ]
  }
}

Compare with match_phrase:
GET bus/_search
{
  "query": {
    "match_phrase": {
      "name": "公交车1"
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
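
Why 公交车1 matches as a phrase prefix but not as a phrase is easier to see on token lists. The sketch below is a toy model under two assumptions: the tokenizer (one token per CJK ideograph, one per run of ASCII letters or digits) only approximates the default analyzer, and the function names are hypothetical.

```python
import re

# One token per CJK ideograph, or one per run of ASCII letters/digits (approximation).
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def analyze(text):
    return [token.lower() for token in TOKEN_RE.findall(text)]


def match_phrase(field_text, query, prefix_last=False):
    """True if the query tokens occur consecutively and in order in the field.

    With prefix_last=True the final query token only has to be a prefix of the
    corresponding field token, as in match_phrase_prefix.
    """
    doc_tokens = analyze(field_text)
    query_tokens = analyze(query)
    if not query_tokens:
        return False
    n = len(query_tokens)
    for start in range(len(doc_tokens) - n + 1):
        window = doc_tokens[start:start + n]
        if window[:-1] != query_tokens[:-1]:
            continue
        if prefix_last:
            if window[-1].startswith(query_tokens[-1]):
                return True
        elif window[-1] == query_tokens[-1]:
            return True
    return False


if __name__ == "__main__":
    name = "公交车122路"
    print(analyze(name))
    print(match_phrase(name, "公交车122"))
    print(match_phrase(name, "公交车1"))
    print(match_phrase(name, "公交车1", prefix_last=True))
```

In this model 122 is a single token, so the query token 1 is not equal to it but is a prefix of it, which is the behavior the two responses above show.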

7.11 Multi-field search: multi_match

GET bus/_search
{
  "query": {
    "multi_match": {
      "query": "空港",
      "fields": ["desc","name"]
    }
  }
}

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 3.6836727,
    "hits" : [
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "16",
        "_score" : 3.6836727,
        "_source" : {
          "name" : "机场大巴A2线",
          "desc" : "空港",
          "price" : 21,
          "producer" : "大巴",
          "tags" : [
            "单层",
            "空调",
            "大巴"
          ]
        }
      },
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "18",
        "_score" : 3.5525968,
        "_source" : {
          "name" : "空港大巴A2线",
          "desc" : "机场",
          "price" : 21,
          "producer" : "大巴",
          "tags" : [
            "单层",
            "空调",
            "大巴"
          ]
        }
      },
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "19",
        "_score" : 3.1757839,
        "_source" : {
          "name" : "空港大巴A2线",
          "desc" : "空港快线",
          "price" : 21,
          "producer" : "大巴",
          "tags" : [
            "单层",
            "空调",
            "大巴"
          ]
        }
      }
    ]
  }
}

 

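8. Routing

Routing is tied directly to sharding. Elasticsearch hashes a routing value and places documents whose routing values map to the same shard number on the same primary shard, much like hash-based load balancing.
By default the routing value is the document ID, which generally spreads the data evenly across all shards and avoids hot spots.
You can supply a custom routing value to keep related data together, but a poor choice can overload a single shard.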
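
In its simplified form the rule is shard = hash(routing) % number_of_primary_shards. The sketch below illustrates that rule only. It assumes zlib.crc32 as a stand-in hash (Elasticsearch uses its own Murmur3-based hash, so the actual shard numbers will differ), and pick_shard is a hypothetical name.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Simplified routing rule: hash the routing value, take it modulo the shard count."""
    # zlib.crc32 is only a stand-in; the real hash function is different.
    digest = zlib.crc32(routing.encode("utf-8"))
    return digest % number_of_primary_shards


def shard_for_document(doc_id, number_of_primary_shards, routing=None):
    """Without a custom routing value, the document ID is used as the routing value."""
    return pick_shard(routing if routing is not None else doc_id, number_of_primary_shards)


if __name__ == "__main__":
    shards = 3
    # Default routing: each document is placed according to its own ID.
    print([shard_for_document(str(i), shards) for i in range(1, 7)])
    # Custom routing: every document routed with "weapon" lands on the same shard.
    print({shard_for_document(str(i), shards, routing="weapon") for i in range(1, 7)})
```

The modulo also shows why the number of primary shards cannot simply be changed after index creation: existing documents would map to different shards.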

PUT mytest/product/4?routing=weapon
{
      "name" : "手枪",
    "desc" :  "增加100点攻击",
    "price" :  15400,
    "producer" : "神秘商店",
    "tags": [ "机械", "穿透" ]
}

GET mytest/product/4

GET mytest/product/4?routing=weapon

Using the routing value in searches:

GET mytest/_search
{
  "query": {
    "match": {
      "_routing": "weapon"
    }
  }
}

GET mytest/_search
{
  "query": {
    "term": {
      "_routing": "weapon"
    }
  }
}

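9. Mapping

A mapping is comparable to a table schema in a relational database. If no mapping is specified when the index is created, Elasticsearch infers the field types from the documents as they are indexed (dynamic mapping). A mapping can also be defined explicitly (static mapping).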

PUT bus
{
  "mappings": {
    "product":{
       "properties": {
      "name":{"type":"text"},
      "desc":{"type":"text"},
      "price":{"type":"long"},
      "producer":{"type":"text"},
      "tags":{"type":"text"}
    }
   }
  }
  , "settings": {
    "number_of_replicas": 1
    , "number_of_shards": 3
  }
}

Specifying the format of a date field:
PUT bus4
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
  , "mappings": {
    "product":{
      "properties":{
        "name":{"type":"text"},
        "updateDate":{
          "type":"date",
          "format":"yyyy-MM-dd"
        }
      }
    }
  }
}

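Generally, the mapping of an existing field cannot be updated, with a few exceptions:

  • New properties can be added to Object datatype fields.
  • New fields can be added to the mapping.
  • The ignore_above parameter can be updated.
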
PUT my_index 
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "properties": {
            "first": {
              "type": "text"
            }
          }
        },
        "user_id": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT my_index/_mapping/_doc
{
  "properties": {
    "name": {
      "properties": {
        "last": { 
          "type": "text"
        }
      }
    },
    "user_id": {
      "type": "keyword",
      "ignore_above": 100 
    }
  }
}
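The first request creates an index whose name field is of the Object datatype and has a first property.
The second request adds a last property under name
and changes ignore_above on user_id from its default to 100.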

 

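After a static mapping has been created, fields can still be added dynamically.

Updating a document with a field that is not yet in the mapping adds it; the type of memo is then inferred: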
POST /bus/product/1/_update
{
  "doc": {
    "memo": "a test"
  }
}

Check the result with GET bus/_mapping:
{
  "bus" : {
    "mappings" : {
      "product" : {
        "properties" : {
          "desc" : {
            "type" : "text"
          },
          "memo" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "name" : {
            "type" : "text"
          },
          "price" : {
            "type" : "long"
          },
          "producer" : {
            "type" : "text"
          },
          "tags" : {
            "type" : "text"
          }
        }
      }
    }
  }
}
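
The inferred mapping for memo follows the dynamic mapping defaults: a JSON string becomes a text field with a keyword sub-field. A rough sketch of those default rules follows. The function infer_field_mapping is hypothetical, and date and numeric detection on strings is deliberately left out.

```python
def infer_field_mapping(value):
    """Approximate the default dynamic mapping for a single JSON value."""
    if isinstance(value, bool):  # check bool first: in Python, bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # strings become text with a keyword sub-field (date detection is omitted here)
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        return {"properties": {k: infer_field_mapping(v) for k, v in value.items()}}
    if isinstance(value, list):
        # arrays take the type of their first non-null element
        for item in value:
            if item is not None:
                return infer_field_mapping(item)
    raise TypeError("no mapping can be inferred for " + repr(value))


if __name__ == "__main__":
    print(infer_field_mapping("a test"))
    print(infer_field_mapping({"price": 2, "tags": ["空调", "普通"]}))
```

Fields declared in the static mapping keep their declared types; only new fields go through this inference.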

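10. Bulk operations

The _bulk API executes several operations in one request. A failure of one operation does not affect the others; the response reports the outcome of each operation, including any error.

The bulk API is strict about its JSON format: each JSON object must sit on a single line with no line breaks inside it, consecutive objects must be separated by a newline, and the last line must also end with a newline.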

POST /_bulk
{ "delete": { "_index": "home", "_type": "product", "_id": "1" }}
{ "create": { "_index": "home", "_type": "product", "_id": "1" }}
{ "title":    "My first post2","memo":"a test2","date":"2018-12-12" }
{ "update": { "_index": "home", "_type": "product", "_id": "2"} }
{ "doc" : {"title" : "My updated post2"} }
{ "delete": { "_index": "home", "_type": "product", "_id": "3" }}
{ "create": { "_index": "home", "_type": "product", "_id": "3" }}
{ "title":    "My first post3","memo":"a test23","date":"2018-12-13" }


POST /_bulk
{ "index":{ "_index": "home", "_type": "product" ,"_id":1}}
{ "title":"My post1" ,"memo":"a test1","date":"2018-12-01"}
{ "index":{ "_index": "home", "_type": "product" ,"_id":2}}
{ "title":"My post2" ,"memo":"a test2","date":"2018-12-02"}
{ "index":{ "_index": "home", "_type": "product" ,"_id":3}}
{ "title":"My post3" ,"memo":"a test3","date":"2018-12-03"}

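The same body can also be sent to POST /home/product/_bulk or POST /home/_bulk, in which case the index (and type) from the URL is the default for actions that do not specify one.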
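
Because of the one-object-per-line rule, bulk bodies are usually generated rather than written by hand. A minimal sketch using only the standard json module; the bulk_body helper and its (action, metadata, source) tuple convention are assumptions made for this example.

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, source) tuples into a newline-delimited bulk body.

    action is one of "index", "create", "update", "delete"; source is ignored
    for "delete" and must be {"doc": {...}} (or similar) for "update".
    """
    lines = []
    for action, meta, source in actions:
        # json.dumps without indent never emits a raw line break
        lines.append(json.dumps({action: meta}, ensure_ascii=False))
        if action == "delete":
            continue  # a delete action has no source line
        if source is None:
            raise ValueError(action + " needs a source line")
        lines.append(json.dumps(source, ensure_ascii=False))
    # the final line must also be terminated by a newline
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    meta = {"_index": "home", "_type": "product"}
    body = bulk_body([
        ("delete", dict(meta, _id="1"), None),
        ("create", dict(meta, _id="1"), {"title": "My first post2", "date": "2018-12-12"}),
        ("update", dict(meta, _id="2"), {"doc": {"title": "My updated post2"}}),
    ])
    print(body, end="")
```

The resulting string would be sent as the request body with the Content-Type application/x-ndjson.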

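11. Rebuilding an index with reindex

11.1 Reindex does not set up the destination index and does not copy the settings of the source index. Configure the destination index before running _reindex, including mappings, shard count, replicas, and so on.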

PUT bus_bak
{
  "settings": {
    "number_of_shards": 1
    , "number_of_replicas": 0
  }
}

POST _reindex
{
  "source": { "index": "bus" },
  "dest": { "index": "bus_bak" }
}

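11.2 Versioning. By default, document versions in the destination index start counting anew after a reindex. To preserve the versions from the source index, set version_type to external.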

POST _reindex
{
  "source": {
    "index": "bus"
  }
  , "dest": {
    "index": "bus_bak",
    "version_type": "external"
  }
}

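11.3 Create only the documents that are missing from the destination index; a document whose ID already exists there causes a version conflict.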

POST _reindex
{ 
  "source": {
    "index": "bus"
  }
  , "dest": {
    "index": "bus_bak",
    "op_type": "create"
  }
}

By default a version conflict aborts the _reindex process. Setting "conflicts": "proceed" makes it count the conflicts and continue instead:

POST _reindex
{
  "conflicts": "proceed", 
  "source": {
    "index": "bus"
  }
  , "dest": {
    "index": "bus_bak",
    "op_type": "create"
  }
}
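
The interplay of op_type and conflicts can be modeled in a few lines. This is a toy simulation on plain dictionaries, not the real implementation: toy_reindex is a hypothetical function, and it stops at the first conflict, whereas the real _reindex works in batches, so other documents of the same batch may still be written before it aborts.

```python
def toy_reindex(source_docs, dest_docs, op_type="index", conflicts="abort"):
    """Copy documents from source_docs into dest_docs and return a small summary."""
    summary = {"created": 0, "updated": 0, "version_conflicts": 0, "aborted": False}
    for doc_id, doc in source_docs.items():
        exists = doc_id in dest_docs
        if op_type == "create" and exists:
            # op_type=create never overwrites: an existing ID is a version conflict
            summary["version_conflicts"] += 1
            if conflicts == "proceed":
                continue  # count the conflict and move on
            summary["aborted"] = True
            break
        dest_docs[doc_id] = doc
        summary["updated" if exists else "created"] += 1
    return summary


if __name__ == "__main__":
    bus = {"1": {"name": "a"}, "2": {"name": "b"}, "3": {"name": "c"}}
    print(toy_reindex(bus, {"2": {"name": "old"}}, op_type="create"))
    print(toy_reindex(bus, {"2": {"name": "old"}}, op_type="create", conflicts="proceed"))
```

In both runs the existing document 2 in the destination is left untouched; only the handling of the remaining documents differs.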

11.4 Reindexing the results of a query

POST _reindex
{
  "source": {
    "index": "bus",
    "type": "product",
    "query": {
      "match": {
        "name": "公交"
      }
    }
  }
  , "dest": {
    "index": "bus_bak"
  }
}

Response:
{
  "took" : 26,
  "timed_out" : false,
  "total" : 5,
  "updated" : 0,
  "created" : 5,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

Restrict which _source fields are copied:

POST _reindex
{
  "source": {
    "index": "twitter",
    "_source": ["user", "_doc"]
  },
  "dest": {
    "index": "new_twitter"
  }
}

11.5 Reindexing several indices into a single index

POST _reindex
{
  "source": {
    "index": ["bus","user"],
    "type": ["product","info"]
  }
  , "dest": {
    "index": "blog",
    "type":"_doc"
  }
}

11.6 Limiting the number of documents reindexed

POST _reindex
{
  "size": 1,
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}

POST _reindex
{
  "size": 10000,
  "source": {
    "index": "twitter",
    "sort": { "date": "desc" }
  },
  "dest": {
    "index": "new_twitter"
  }
}

 

Posted 2018-12-03 22:39 by 我是属车的