Elasticsearch (1): Document Operations

This article describes how to work with documents in Elasticsearch.

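1. Check that Elasticsearch and Kibana are running

Open 192.168.6.16:9200 in a browser. If it returns the node's JSON information (name, cluster name, version), Elasticsearch is running normally.

Open http://192.168.6.16:5601/ in a browser. If the Kibana page loads, Kibana is running normally.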

2. View Elasticsearch information

In Kibana Dev Tools, run GET / to see the Elasticsearch version, the cluster name, and related information:

GET /

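The response is shown in the figure below:

The same request can also be run from a terminal with the same effect, although Kibana is more direct:

In Dev Tools, the command above can be copied as cURL and pasted into a terminal to run. The reverse also works:

3. Create an index and a document

Create an index named twitter and insert a document.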

PUT twitter/_doc/1
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}

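The result is shown in the figure below:

A request like this creates the index automatically. To disable automatic index creation, change the cluster setting: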

PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "false" 
    }
}

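Documents written to Elasticsearch this way are not searchable immediately by default. A refresh is needed to make them visible to search. Normally a refresh timer takes care of this, with a period of 1 second, which is why Elasticsearch is usually described as offering search within about a second of indexing. To make a document visible to search immediately, use: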

PUT twitter/_doc/1?refresh=true
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}

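Forcing a refresh frequently makes Elasticsearch very slow. An alternative is refresh=wait_for, which behaves like a synchronous call: the request returns only after the next refresh cycle has happened. This guarantees that the document just written can be found by a search as soon as the call returns.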

PUT twitter/_doc/1?refresh=wait_for
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}
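The refresh options above are plain query-string parameters, so the same request can be built outside Kibana. The following is a minimal sketch using only the Python standard library. It assumes the Elasticsearch address used earlier in this article, and the helper name build_index_request is hypothetical. The request is built but not sent.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: the Elasticsearch address used earlier in this article.
BASE_URL = "http://192.168.6.16:9200"


def build_index_request(index, doc_id, doc, refresh=None):
    """Build (but do not send) a PUT <index>/_doc/<id> request.

    refresh may be None (default behaviour), "true" or "wait_for".
    """
    url = f"{BASE_URL}/{index}/_doc/{doc_id}"
    if refresh is not None:
        url += "?" + urlencode({"refresh": refresh})
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


doc = {"user": "GB", "uid": 1, "city": "Beijing",
       "province": "Beijing", "country": "China"}
req = build_index_request("twitter", 1, doc, refresh="wait_for")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Passing refresh="true" instead forces an immediate refresh, and leaving it out relies on the periodic refresh.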

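Each time a POST or PUT index request targets a document that already exists, the document's version is incremented by 1 and the previous version is discarded. If the intent is to create a document rather than update one, use the _create endpoint: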

PUT twitter/_create/1
{
  "user": "GB",
  "uid": 1,
  "city": "Shenzhen",
  "province": "Guangdong",
  "country": "China"
}

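If the document already exists, the request returns an error, as shown in the figure below:

The following command has the same effect (op_type accepts two values: index and create):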

PUT twitter/_doc/1?op_type=create
{
  "user": "张三",
  "message": "Hi",
  "uid": 2,
  "age": 20,
  "city": "北京",
  "province": "北京",
  "country": "中国",
  "address": "中国北京市海淀区",
  "location": {
    "lat": "39.970718",
    "lon": "116.325747"
  }
}

4. View the modified document

1) Look up a document by ID.

GET twitter/_doc/1

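The query result is shown in the figure below:

To retrieve only the _source part of a document, use either of the following commands. The second form is the recommended one since Elasticsearch 7.0:

GET twitter/_doc/1/_source
GET twitter/_source/1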

2) Fetch multiple documents with _mget

GET _mget
{
  "docs": [
    {
      "_index": "twitter",
      "_id": 1
    },
    {
      "_index": "twitter",
      "_id": 2
    }
  ]
}

This can be shortened to:

GET twitter/_mget
{
  "ids": ["1", "2"]
}

 

_mget can also return only selected fields of each document.
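One way to do this is the per-document _source option of _mget, which accepts a list of field names. The sketch below only builds and prints such a request body. The helper name build_mget_body is hypothetical, and the fields user and city are taken from the sample documents in this article.

```python
import json


def build_mget_body(index, ids, fields):
    """Return an _mget body that asks only for `fields` of each document."""
    return {
        "docs": [
            {"_index": index, "_id": str(doc_id), "_source": list(fields)}
            for doc_id in ids
        ]
    }


body = build_mget_body("twitter", [1, 2], ["user", "city"])
print("GET _mget")
print(json.dumps(body, indent=2))
```

The printed request can be pasted into Kibana Dev Tools as is.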

 

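5. Automatic ID generation

In the commands above, each document was deliberately given an ID. In practice this is not necessary. On the contrary, when an ID is assigned manually, Elasticsearch has to check at indexing time whether a document with that ID already exists: if it does, the document is updated to a new version, and if it does not, a new document is created. Leaving the ID out and letting Elasticsearch generate one is faster. In that case the request must use POST rather than PUT. For example: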

POST my_index/_doc
{
  "content": "this is really cool"
}

The response shows that an ID was assigned automatically: ju9eG3kBAHJu3CVJ0XaV.

The newly created document can then be retrieved with:

GET /my_index/_doc/ju9eG3kBAHJu3CVJ0XaV/

6. Modify a document

Use PUT with a specific ID to modify a document.

PUT twitter/_doc/1
{
   "user": "GB",
   "uid": 1,
   "city": "北京",
   "province": "北京",
   "country": "中国",
   "location":{
     "lat":"29.084661",
     "lon":"111.335210"
   }
}

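When a document is modified with PUT, every field must be supplied, because the request replaces the whole document. To change individual fields only, use the _update endpoint: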

POST twitter/_update/1
{
  "doc": {
    "city": "成都",
    "province": "四川"
  }
}
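The difference between the two requests can be illustrated with a toy in-memory model. This is not how Elasticsearch is implemented. It only mirrors the observable behaviour: PUT replaces the stored document, while _update with "doc" merges fields into it, merging nested objects recursively. The function names are hypothetical.

```python
import copy


def put_replace(existing, new_doc):
    """PUT <index>/_doc/<id>: the stored document is replaced wholesale."""
    return copy.deepcopy(new_doc)


def partial_update(existing, partial):
    """_update with "doc": fields are merged, nested objects recursively."""
    merged = copy.deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"user": "GB", "uid": 1, "city": "北京", "province": "北京",
          "country": "中国",
          "location": {"lat": "29.084661", "lon": "111.335210"}}
change = {"city": "成都", "province": "四川"}

replaced = put_replace(stored, change)    # only city and province remain
updated = partial_update(stored, change)  # the other fields are kept
print(sorted(replaced))
print(updated["city"], updated["user"])
```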

Use _update_by_query to find documents with a query and then modify the ones that match.

POST twitter/_update_by_query
{
  "query": {
    "match": {
      "user": "GB"
    }
  },
  "script": {
    "source": "ctx._source.city = params.city;ctx._source.province = params.province;ctx._source.country = params.country",
    "lang": "painless",
    "params": {
      "city": "成都",
      "province": "四川",
      "country": "中国"
    }
  }
}

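Result of the update:

Running GET twitter/_doc/1 afterwards shows that the data has been modified.

For documents whose field names are in Chinese, Painless does not accept the Chinese name typed directly after ctx._source in dot notation. Use bracket notation instead: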

POST edd/_update_by_query
{
  "query": {
    "match": {
      "姓名": "张彬"
    }
  },
  "script": {
    "source": "ctx._source[\"签到状态\"] = params[\"签到状态\"]",
    "lang": "painless",
    "params" : {
      "签到状态":"已签到"
    }
  }
}

The _update endpoint also accepts a script:

POST twitter/_update/1
{
  "script" : {
      "source": "ctx._source.city=params.city",
      "lang": "painless",
      "params": {
        "city": "长沙"
      }
  }
}

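The _update endpoint can also delete a document by setting ctx.op in the script. The script below checks whether the document's uid is 1: if it is, the document is deleted, otherwise nothing is done: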

POST twitter/_update/1
{
  "script": {
    "source": """
    if (ctx._source.uid == 1) {
      ctx.op = 'delete'
    } else {
      ctx.op = 'none'
    }
    """
  }
}

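7. UPSERT

With doc_as_upsert set to true, Elasticsearch checks whether a document with the given ID already exists and, if so, merges the supplied doc into the existing document. If no document with that ID exists, a new document is inserted with the contents of doc.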

POST /catalog/_update/3
{
     "doc": {
       "author": "Albert Paro",
       "title": "Elasticsearch 5.0 Cookbook",
       "description": "Elasticsearch 5.0 Cookbook Third Edition",
       "price": "54.99"
      },
     "doc_as_upsert": true
}
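The upsert rule can be sketched as a toy model over a Python dict. This is an illustration of the behaviour described above, not Elasticsearch code. The function name update_doc is hypothetical, the merge is shallow for brevity, and the price 49.99 is an invented value. The return strings mirror the result field of the update response (created, updated, noop).

```python
def update_doc(store, doc_id, doc, doc_as_upsert=False):
    """Toy model of POST <index>/_update/<id> with a "doc" body."""
    existing = store.get(doc_id)
    if existing is None:
        if not doc_as_upsert:
            # Without upsert, updating a missing document is an error.
            raise KeyError(f"document {doc_id!r} does not exist")
        store[doc_id] = dict(doc)
        return "created"
    merged = {**existing, **doc}  # shallow merge, for brevity
    if merged == existing:
        return "noop"             # nothing changed, nothing written
    store[doc_id] = merged
    return "updated"


catalog = {}
book = {"author": "Albert Paro", "title": "Elasticsearch 5.0 Cookbook",
        "price": "54.99"}
first = update_doc(catalog, "3", book, doc_as_upsert=True)
second = update_doc(catalog, "3", {"price": "49.99"}, doc_as_upsert=True)
third = update_doc(catalog, "3", {"price": "49.99"}, doc_as_upsert=True)
print(first, second, third)
```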

8. Check whether a document exists

A 200 - OK response means the document exists. A 404 means it does not.

HEAD twitter/_doc/1

9. Delete a document

DELETE twitter/_doc/1

Documents can also be deleted by query. The following request deletes every document in the twitter index whose city is 上海 (Shanghai):

POST twitter/_delete_by_query
{
  "query": {
    "match": {
      "city": "上海"
    }
  }
}

10. Check whether an index exists

HEAD twitter

11. Delete an index

DELETE twitter

12. Batch operations with _bulk

POST _bulk
{ "index" : { "_index" : "twitter", "_id": 1} }
{"user":"双榆树-张三","message":"今儿天气不错啊,出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{ "index" : { "_index" : "twitter", "_id": 2 }}
{"user":"东城区-老刘","message":"出发,下一站云南!","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{ "index" : { "_index" : "twitter", "_id": 3} }
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{ "index" : { "_index" : "twitter", "_id": 4} }
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{ "index" : { "_index" : "twitter", "_id": 5} }
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{ "index" : { "_index" : "twitter", "_id": 6} }
{"user":"虹桥-老吴","message":"好友来了都今天我生日,好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}

Use _search to list all the documents that were indexed:

POST twitter/_search

Use _count to see how many documents there are:

GET twitter/_count

Documents can also be created with the create action:

POST _bulk
{ "create" : { "_index" : "twitter", "_id": 1} }
{"user":"双榆树-张三","message":"今儿天气不错啊,出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{ "index" : { "_index" : "twitter", "_id": 2 }}
{"user":"东城区-老刘","message":"出发,下一站云南!","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{ "index" : { "_index" : "twitter", "_id": 3} }
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{ "index" : { "_index" : "twitter", "_id": 4} }
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{ "index" : { "_index" : "twitter", "_id": 5} }
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{ "index" : { "_index" : "twitter", "_id": 6} }
{"user":"虹桥-老吴","message":"好友来了都今天我生日,好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}

The difference between index and create:

  • index always succeeds. It overwrites a document that was created earlier.
  • create fails if a document with that ID already exists.
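The two rules can be captured in a small toy model, again over a Python dict rather than a real cluster. The names write and VersionConflict are hypothetical. The exception stands in for the conflict error Elasticsearch returns, and the version counter follows the "incremented by 1" behaviour described earlier in this article.

```python
class VersionConflict(Exception):
    """Stands in for the conflict error returned for a failed create."""


def write(store, doc_id, source, op_type="index"):
    """Toy model: store maps id -> (version, source). Returns the new version."""
    if doc_id in store:
        if op_type == "create":
            raise VersionConflict(f"document {doc_id} already exists")
        version = store[doc_id][0] + 1   # index overwrites and bumps the version
    else:
        version = 1
    store[doc_id] = (version, dict(source))
    return version


store = {}
v1 = write(store, "1", {"city": "Beijing"})                    # new document
v2 = write(store, "1", {"city": "Shenzhen"})                   # overwrite
v3 = write(store, "2", {"city": "Beijing"}, op_type="create")  # new id, succeeds
try:
    write(store, "1", {"city": "Chengdu"}, op_type="create")
    conflict = False
except VersionConflict:
    conflict = True
print(v1, v2, v3, conflict)
```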

Use delete to remove an existing document:

POST _bulk
{ "delete" : { "_index" : "twitter", "_id": 1 }}

Use update to update a document:

POST _bulk
{ "update" : { "_index" : "twitter", "_id": 2 }}
{"doc": { "city": "长沙"}}
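All the _bulk requests in this section share one wire format: newline-delimited JSON, where each action line is followed by a source line (delete has none) and the body ends with a newline. The sketch below serialises a list of operations into that format. The helper name build_bulk_body is hypothetical, and the shortened sample documents are for illustration only.

```python
import json


def build_bulk_body(operations):
    """Serialise (action, metadata, source) triples into a _bulk body.

    source is None for actions that carry no second line (delete).
    """
    lines = []
    for action, meta, source in operations:
        lines.append(json.dumps({action: meta}, ensure_ascii=False))
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    # The _bulk body must end with a newline.
    return "\n".join(lines) + "\n"


ops = [
    ("index", {"_index": "twitter", "_id": 1},
     {"user": "双榆树-张三", "city": "北京"}),
    ("update", {"_index": "twitter", "_id": 2}, {"doc": {"city": "长沙"}}),
    ("delete", {"_index": "twitter", "_id": 1}, None),
]
body = build_bulk_body(ops)
print(body, end="")
```

Because json.dumps never emits raw newlines, each action and each source stays on a single line, which is what the format requires.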
posted @ 2021-04-29 11:23  钟齐峰