ES Queries

1. Setting up the environment

Open Kibana, go to Dev Tools, create the index, and insert the test data with the following requests:

PUT /logs/_doc/1
{
  "Name":"燕麦",
  "Desc":"燕麦商品描述",
  "Price":111,
  "Tags":["Breakfast","Carbon","Cheap"]
}

PUT /logs/_doc/2
{
  "Name":"牛奶",
  "Desc":"牛奶商品描述",
  "Price":222,
  "Tags":["Breakfast","Nutrition","Expensive"]
}

PUT /logs/_doc/3
{
  "Name":"面包",
  "Desc":"牛奶商品描述",
  "Price":333,
  "Tags":["Breakfast","Barley","Cheap","Carbon"]
}

PUT /logs/_doc/4
{
  "Name":"玉米",
  "Desc":"玉米商品描述",
  "Price":444,
  "Tags":["Breakfast","Vegetables","Cheap","Carbon"]
}

PUT /logs/_doc/5
{
  "Name":"葡萄",
  "Desc":"葡萄商品描述",
  "Price":555,
  "Tags":["Breakfast","Fruits","Expensive","Carbon"]
}

Run the requests above, then run a search to check that the data was inserted. A simple search looks like this:

GET /logs/_search

or

GET /logs/_search
{
  "query": {
    "match_all": {}
  }
}

Note: this is equivalent to select * from table_name where 1=1; match_all matches every document.

The result set is as follows:

{
  "took": 2, // time the request took, in milliseconds
  "timed_out": false, // whether the request timed out
  // shard statistics for the request
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5, // the query matched 5 documents
      "relation": "eq" // the count is exact (equal to value)
    },
    "max_score": 1, // the highest relevance score among the hits, here 1.0
    "hits": [
      {
        "_index": "logs",
        "_id": "1",
        "_score": 1,
        "_source": {
          "Name": "燕麦",
          "Desc": "燕麦商品描述",
          "Price": 111,
          "Tags": [
            "Breakfast",
            "Carbon",
            "Cheap"
          ]
        }
      },
      {
        "_index": "logs",
        "_id": "2",
        "_score": 1,
        "_source": {
          "Name": "牛奶",
          "Desc": "牛奶商品描述",
          "Price": 222,
          "Tags": [
            "Breakfast",
            "Nutrition",
            "Expensive"
          ]
        }
      },
      {
        "_index": "logs",
        "_id": "3",
        "_score": 1,
        "_source": {
          "Name": "面包",
          "Desc": "牛奶商品描述",
          "Price": 333,
          "Tags": [
            "Breakfast",
            "Barley",
            "Cheap",
            "Carbon"
          ]
        }
      },
      {
        "_index": "logs",
        "_id": "4",
        "_score": 1,
        "_source": {
          "Name": "玉米",
          "Desc": "玉米商品描述",
          "Price": 444,
          "Tags": [
            "Breakfast",
            "Vegetables",
            "Cheap",
            "Carbon"
          ]
        }
      },
      {
        "_index": "logs",
        "_id": "5",
        "_score": 1,
        "_source": {
          "Name": "葡萄",
          "Desc": "葡萄商品描述",
          "Price": 555,
          "Tags": [
            "Breakfast",
            "Fruits",
            "Expensive",
            "Carbon"
          ]
        }
      }
    ]
  }
}

The data was inserted successfully.
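
As a quick cross-check of the insert, the document count can also be read directly. This is a minimal sketch using the _count API against the logs index from above; with the five test documents in place it should report a count of 5.

```console
# Count the documents in the logs index without returning any hits
GET /logs/_count
```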

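2. Relevance score

The result set of the demo in section 1 contains a max_score field, which is the highest relevance score among the hits; each hit carries its own score in _score. When a search does not specify a sort field, ES sorts the hits by their scores. Relevance scoring involves two algorithms, which will be covered in a later article.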
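
The sorting rule above can be observed directly. The following is a minimal sketch, assuming the five test documents from section 1: the first request has no sort, so hits come back ordered by _score; the second sorts by Price, in which case ES does not compute scores by default and _score and max_score are returned as null.

```console
# No sort given: hits are ordered by relevance score (_score), highest first
GET /logs/_search
{
  "query": {
    "match": {
      "Desc": "牛奶"
    }
  }
}

# Explicit sort: hits are ordered by Price; scores are not computed by default
GET /logs/_search
{
  "query": {
    "match": {
      "Desc": "牛奶"
    }
  },
  "sort": [
    { "Price": "desc" }
  ]
}
```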

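3. Metadata (_source)

The result set of the demo in section 1 contains a _source field, which holds the metadata, that is, the original JSON body of the document. Its structure is roughly as follows: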

"_source": {
          "Name": "燕麦",
          "Desc": "燕麦商品描述",
          "Price": 111,
          "Tags": [
            "Breakfast",
            "Carbon",
            "Cheap"
          ]
        }

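It is similar to a table's columns and their values in a relational database.

3.1 Disabling _source

3.1.1 Pros and cons

Pro: saves storage.

Cons: the update, update_by_query, and reindex APIs are not supported; highlighting is not supported; reindexing is not possible, whether to change the mapping or analyzers or to upgrade to a new version; queries and aggregations can no longer be debugged by viewing the original document used at index time; and automatic repair of a corrupted index may no longer be possible.

Note: if the only goal is to save storage, raising the index compression level is a better choice than disabling _source.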
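
A minimal sketch of the compression alternative, assuming a new index with the hypothetical name logs_compressed. index.codec is a static setting, so it is given at index creation here; best_compression trades some stored-field read speed for a smaller footprint on disk.

```console
# logs_compressed is a hypothetical index name used only for illustration
PUT /logs_compressed
{
  "settings": {
    "index": {
      "codec": "best_compression"
    }
  }
}
```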

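3.1.2 Controlling _source through the mapping at index creation

Drawback: whether the mapping given at index creation restricts which fields _source keeps or disables _source altogether, the setting cannot be changed afterwards.

(1) Disabling _source entirely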

PUT /logs
{
  "mappings": {
    "_source": {
      "enabled": false
    }
  }
}

(2) Excluding some fields from _source

PUT /logs
{
  "mappings": {
    "_source": {
      "includes": [
        "Name",
        "Tags"
      ],
      "excludes": [
        "Price",
        "Desc"
      ]
    }
  }
}

includes lists the fields kept in the stored _source, and excludes lists the fields removed from it.
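
Both PUT /logs requests above create the index, so they are rejected if logs already exists (for example after the inserts in section 1). A hedged sketch of the surrounding steps, assuming the test index can be thrown away: delete it first, run one of the PUT requests, then read the mapping back to confirm the _source settings.

```console
# Remove the existing index; this deletes all documents in it
DELETE /logs

# ... run one of the PUT /logs requests shown above ...

# Read back the mapping to confirm the _source settings that were applied
GET /logs/_mapping
```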

Note: controlling _source through the mapping is not recommended!

3.1.3 Controlling _source in the query

(1) Disabling _source at query time

GET /logs/_search
{
  "_source":false,
  "query": {
    "match_all": {}
  }
}

The result set then contains no _source data, only _index, _id, _score and similar information.

(2) Selecting fields

GET /logs/_search
{
  "_source": ["Name","Tags"], 
  "query": {
    "match_all": {}
  }
}

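Specifying _source in the request controls which fields the result returns, similar to select Name,Tags from logs; in a relational database.

(3) Wildcard patterns

Delete the index from the demo in section 1 and create the following one: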

PUT /logs/_doc/1
{
  "Name":"燕麦",
  "Desc":"燕麦商品描述",
  "Price":111,
  "Items":{
    "Name":"子名称",
    "Price":222
  },
  "Tags":["Breakfast","Carbon","Cheap"]
}

The logs index now contains an object property named Items. To return only the Items data in a search, run:

GET /logs/_search
{
  "_source":["Items.*"],
  "query": {
    "match_all": {}
  }
}

The returned result is as follows:

   "hits": [
      {
        "_index": "logs",
        "_id": "1",
        "_score": 1,
        "_source": {
          "Items": {
            "Price": 222,
            "Name": "子名称"
          }
        }
      }
    ]

(4) includes and excludes

GET /logs/_search
{
  "_source":{
    "includes": ["Items.*","Price","Desc"],
    "excludes": ["Name"]
  },
  "query": {
    "match_all": {}
  }
}

Here the query specifies both which fields to include and which fields to exclude.
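
The same filtering can be applied when fetching a single document by id. A minimal sketch, assuming the document with id 1 that contains the Items object; _source_includes and _source_excludes are URL parameters of the get API (the parameter names used by Elasticsearch 7.x and later).

```console
# Return only the Items object of document 1, without its Price sub-field
GET /logs/_doc/1?_source_includes=Items.*&_source_excludes=Items.Price
```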

4. Query string search

4.1 Match all

GET /logs/_search

4.2 Paginated search

GET /logs/_search?from=0&size=3&sort=Price:desc

The query starts at offset from (zero-based), returns size documents, and sorts them by Price in descending order.
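
The same paging can be written in the request body, which is easier to combine with a query. A minimal sketch equivalent to the URL form above:

```console
# Request-body form of from=0&size=3&sort=Price:desc
GET /logs/_search
{
  "from": 0,
  "size": 3,
  "sort": [
    { "Price": "desc" }
  ],
  "query": {
    "match_all": {}
  }
}
```

The next page would use from 3 with the same size.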

4.3 Exact match

GET /logs/_search?q=Name:牛奶

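This looks up the inverted index of the Name field in logs for documents matching 牛奶.

Note: ES builds an inverted index for every field by default. When searching with q=field:value, ES looks up the value in the index of the specified field and returns the matches.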
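
The q parameter uses the query_string syntax, so the request above can also be written with a request body. The sketch below shows that form, followed by a term query for a strictly exact match. The second request assumes the index was created by dynamic mapping, which by default maps a string field as text with a keyword sub-field (Name.keyword); if the index has a custom mapping, that sub-field may not exist.

```console
# Request-body form of q=Name:牛奶; the value is analyzed before matching
GET /logs/_search
{
  "query": {
    "query_string": {
      "query": "Name:牛奶"
    }
  }
}

# Exact match on the whole, un-analyzed value (assumes the Name.keyword sub-field exists)
GET /logs/_search
{
  "query": {
    "term": {
      "Name.keyword": "牛奶"
    }
  }
}
```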

4.4 Searching all fields

GET /logs/_search?q=111

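Important: ES builds an inverted index for every field by default. Unlike 4.3, the query here is not written as q=field:value, so ES searches every field that has an inverted index. The result set is therefore: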

    "hits": [
      {
        "_index": "logs",
        "_id": "2",
        "_score": 1.3167865,
        "_source": {
          "Name": "牛奶",
          "Desc": "牛奶商品描述111",
          "Price": 222,
          "Tags": [
            "Breakfast",
            "Nutrition",
            "Expensive"
          ]
        }
      },
      {
        "_index": "logs",
        "_id": "1",
        "_score": 1,
        "_source": {
          "Name": "燕麦",
          "Desc": "燕麦商品描述",
          "Price": 111,
          "Tags": [
            "Breakfast",
            "Carbon",
            "Cheap"
          ]
        }
      }
    ]

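Records whose Price or Desc contains 111 were both returned (in this run, the Desc of document 2 is 牛奶商品描述111).

Note: pay attention to exact matching here. In the demo, the Desc field is tokenized before matching, but Price is not.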
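
To see how a value is tokenized, the _analyze API can be used. A minimal sketch, assuming the logs index with its default dynamic mapping: the request asks ES to analyze a sample Desc value with the analyzer of the Desc field, and the response lists the tokens that a search such as q=111 is matched against.

```console
# Show the tokens the Desc field's analyzer produces for a sample value
GET /logs/_analyze
{
  "field": "Desc",
  "text": "牛奶商品描述111"
}
```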

posted @ 2022-08-02 16:14 郑小超
