Elasticsearch Usage Notes

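Queries

An ES search request is assembled from two parts: a query body and an aggregation body.

Queries come in several kinds: match queries, fuzzy queries, range queries and bool queries. Inside a bool query, clauses are combined with should (OR), must (AND) and so on, and bool queries can be nested.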

client.prepareSearch(index).setFetchSource(includes, excludes).setTypes(type).setQuery(queryBuilder).addAggregation(aggregation)

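setQuery sets the query body and addAggregation sets the aggregation body. Aggregations are further divided into metrics and buckets.

A metric is the equivalent of an aggregate function: avg, sum, max, min and so on.

A bucket is the equivalent of GROUP BY.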
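
To make the GROUP BY analogy concrete, here is a minimal plain-Java model of a bucket combined with a metric. It does not use the Elasticsearch API; the class name, the sample rows and the imsi/gid values are made up for illustration. It groups rows by imsi (the bucket) and counts distinct gid values per group (the metric). Note that the real cardinality aggregation returns an approximate count, while this model counts exactly.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java model of "terms bucket + cardinality metric"; not the Elasticsearch API.
class BucketMetricSketch {

    // Hypothetical sample rows: {imsi, gid}.
    static final List<String[]> ROWS = Arrays.asList(
            new String[] {"imsi-a", "141"},
            new String[] {"imsi-a", "142"},
            new String[] {"imsi-a", "142"},
            new String[] {"imsi-b", "141"});

    // Bucket: group rows by imsi (like GROUP BY imsi).
    // Metric: count distinct gid values per bucket (like COUNT(DISTINCT gid)).
    static Map<String, Integer> distinctGidPerImsi(List<String[]> rows) {
        Map<String, Set<String>> gidsByImsi = new TreeMap<>();
        for (String[] row : rows) {
            gidsByImsi.computeIfAbsent(row[0], k -> new HashSet<>()).add(row[1]);
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : gidsByImsi.entrySet()) {
            result.put(e.getKey(), e.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distinctGidPerImsi(ROWS)); // prints {imsi-a=2, imsi-b=1}
    }
}
```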

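setFetchSource sets which fields of _source should be returned and which should not.

As for filtering out the other fields (_index, _type, _id, _score and so on), I have not yet found a good solution in the Java API. With the REST API it can be done through filter_path. For example, /_search?filter_path=hits.hits._source returns only the _source field.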

The assembled request body looks roughly like this:

{
  "size" : 10000,
  "query" : {
    "bool" : {
      "must" : [
        {
          "bool" : {
            "should" : [
              {
                "bool" : {
                  "should" : [
                    {
                      "match_phrase" : {
                        "gid" : {
                          "query" : 100000030,
                          "slop" : 0,
                          "boost" : 1.0
                        }
                      }
                    }
                  ],
                  "disable_coord" : false,
                  "adjust_pure_negative" : true,
                  "boost" : 1.0
                }
              },
              {
                "bool" : {
                  "should" : [
                    {
                      "match_phrase" : {
                        "gid" : {
                          "query" : "147",
                          "slop" : 0,
                          "boost" : 1.0
                        }
                      }
                    }
                  ],
                  "disable_coord" : false,
                  "adjust_pure_negative" : true,
                  "boost" : 1.0
                }
              },
              {
                "bool" : {
                  "should" : [
                    {
                      "match_phrase" : {
                        "gid" : {
                          "query" : "149",
                          "slop" : 0,
                          "boost" : 1.0
                        }
                      }
                    }
                  ],
                  "disable_coord" : false,
                  "adjust_pure_negative" : true,
                  "boost" : 1.0
                }
              },
              {
                "bool" : {
                  "should" : [
                    {
                      "match_phrase" : {
                        "gid" : {
                          "query" : "144",
                          "slop" : 0,
                          "boost" : 1.0
                        }
                      }
                    }
                  ],
                  "disable_coord" : false,
                  "adjust_pure_negative" : true,
                  "boost" : 1.0
                }
              },
              {
                "bool" : {
                  "should" : [
                    {
                      "match_phrase" : {
                        "gid" : {
                          "query" : "146",
                          "slop" : 0,
                          "boost" : 1.0
                        }
                      }
                    }
                  ],
                  "disable_coord" : false,
                  "adjust_pure_negative" : true,
                  "boost" : 1.0
                }
              },
              {
                "bool" : {
                  "should" : [
                    {
                      "match_phrase" : {
                        "gid" : {
                          "query" : "141",
                          "slop" : 0,
                          "boost" : 1.0
                        }
                      }
                    }
                  ],
                  "disable_coord" : false,
                  "adjust_pure_negative" : true,
                  "boost" : 1.0
                }
              },
              {
                "bool" : {
                  "should" : [
                    {
                      "match_phrase" : {
                        "gid" : {
                          "query" : "145",
                          "slop" : 0,
                          "boost" : 1.0
                        }
                      }
                    }
                  ],
                  "disable_coord" : false,
                  "adjust_pure_negative" : true,
                  "boost" : 1.0
                }
              },
              {
                "bool" : {
                  "should" : [
                    {
                      "match_phrase" : {
                        "gid" : {
                          "query" : "142",
                          "slop" : 0,
                          "boost" : 1.0
                        }
                      }
                    }
                  ],
                  "disable_coord" : false,
                  "adjust_pure_negative" : true,
                  "boost" : 1.0
                }
              },
              {
                "bool" : {
                  "should" : [
                    {
                      "match_phrase" : {
                        "gid" : {
                          "query" : "143",
                          "slop" : 0,
                          "boost" : 1.0
                        }
                      }
                    }
                  ],
                  "disable_coord" : false,
                  "adjust_pure_negative" : true,
                  "boost" : 1.0
                }
              },
              {
                "bool" : {
                  "should" : [
                    {
                      "match_phrase" : {
                        "gid" : {
                          "query" : "148",
                          "slop" : 0,
                          "boost" : 1.0
                        }
                      }
                    }
                  ],
                  "disable_coord" : false,
                  "adjust_pure_negative" : true,
                  "boost" : 1.0
                }
              }
            ],
            "disable_coord" : false,
            "adjust_pure_negative" : true,
            "boost" : 1.0
          }
        },
        {
          "range" : {
            "collect_time" : {
              "from" : "2017-06-07 15:55:13",
              "to" : "2017-06-08 14:02:10",
              "include_lower" : true,
              "include_upper" : true,
              "boost" : 1.0
            }
          }
        }
      ],
      "disable_coord" : false,
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "_source" : {
    "includes" : [
      "gid",
      "imsi",
      "imei",
      "collect_time",
      "imsi_addr"
    ],
    "excludes" : [
      "hits.hits._id",
      "hits.hits._index"
    ]
  },
  "aggregations" : {
    "aggImsi" : {
      "terms" : {
        "field" : "imsi",
        "size" : 10000,
        "min_doc_count" : 1,
        "shard_min_doc_count" : 0,
        "show_term_doc_count_error" : false,
        "order" : [
          {
            "_count" : "desc"
          },
          {
            "_term" : "asc"
          }
        ]
      },
      "aggregations" : {
        "aggGid" : {
          "cardinality" : {
            "field" : "gid"
          }
        },
        "agg_values" : {
          "bucket_selector" : {
            "buckets_path" : {
              "count" : "aggGid"
            },
            "script" : {
              "inline" : "count >= 10",
              "lang" : "expression"
            },
            "gap_policy" : "skip"
          }
        }
      }
    }
  }
}

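Here the query part filters the documents and is the equivalent of WHERE, _source selects which fields are returned, and the aggregations part holds the aggregations.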
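
As a rough sketch of how these three parts fit together, the following builds a simplified version of the request body as nested maps using only the JDK. The class and helper names are hypothetical, serialising the map to JSON is left to whatever JSON library is at hand, and the extra bool wrapper around each match_phrase clause in the real request is dropped for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Builds the three-part request body as nested maps; JSON serialisation is left out.
class RequestBodySketch {

    // Small helper: a single-entry map that keeps insertion order.
    static Map<String, Object> obj(String key, Object value) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(key, value);
        return m;
    }

    static Map<String, Object> build(List<String> gids, String from, String to) {
        // One match_phrase clause per gid, OR-ed together with "should".
        List<Object> should = new ArrayList<>();
        for (String gid : gids) {
            should.add(obj("match_phrase", obj("gid", gid)));
        }
        Map<String, Object> range = new LinkedHashMap<>();
        range.put("from", from);
        range.put("to", to);

        // "must" ANDs the gid alternatives with the time range (the WHERE part).
        List<Object> must = Arrays.<Object>asList(
                obj("bool", obj("should", should)),
                obj("range", obj("collect_time", range)));

        Map<String, Object> body = new LinkedHashMap<>();
        body.put("query", obj("bool", obj("must", must)));
        body.put("_source", obj("includes", Arrays.asList("gid", "imsi")));
        body.put("aggregations", obj("aggImsi", obj("terms", obj("field", "imsi"))));
        return body;
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("141", "142"),
                "2017-06-07 15:55:13", "2017-06-08 14:02:10"));
    }
}
```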

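Response

Like the request body, the response body is split into two main parts: the search results and the aggregation results.

First, a sample response: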

{
    "took": 22,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 0,
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "aggImsi": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "460098340000699",
                    "doc_count": 27,
                    "aggGid": {
                        "value": 10
                    }
                },
                {
                    "key": "460021828267404",
                    "doc_count": 26,
                    "aggGid": {
                        "value": 10
                    }
                }
            ]
        }
    }
}

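The search results are inside hits, so they are obtained like this:

SearchHit[] searchHitsAddress = responseAddress.getHits().getHits();

for (int i = 0; i < searchHitsAddress.length; i++) {
    // The _source of each hit as a map of field name to value.
    Map<String, Object> imsiAddress = searchHitsAddress[i].getSource();
}

The aggregation results are inside aggregations, so they are obtained like this: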

Aggregations aggMapAddress = responseAddress.getAggregations();
StringTerms teamAgg = aggMapAddress.get("aggImsi");
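
Once the aggregation is in hand, each bucket carries its key, its doc_count and its sub-aggregations. The sketch below walks that structure in plain Java, assuming the buckets have already been parsed into nested maps shaped like the sample response; the class and method names are hypothetical. With the transport client the same walk would start from teamAgg.getBuckets().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Walks buckets shaped like aggregations.aggImsi.buckets; not the Elasticsearch API.
class BucketReaderSketch {

    // Builds one bucket with the same fields as in the sample response.
    static Map<String, Object> bucket(String key, int docCount, int distinctGid) {
        Map<String, Object> aggGid = new LinkedHashMap<>();
        aggGid.put("value", distinctGid);
        Map<String, Object> b = new LinkedHashMap<>();
        b.put("key", key);
        b.put("doc_count", docCount);
        b.put("aggGid", aggGid);
        return b;
    }

    // Returns one line per bucket: "<key>: <doc_count> docs, <n> distinct gid".
    static List<String> describe(List<Map<String, Object>> buckets) {
        List<String> lines = new ArrayList<>();
        for (Map<String, Object> b : buckets) {
            Map<?, ?> aggGid = (Map<?, ?>) b.get("aggGid");
            lines.add(b.get("key") + ": " + b.get("doc_count") + " docs, "
                    + aggGid.get("value") + " distinct gid");
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> buckets = Arrays.asList(
                bucket("460098340000699", 27, 10),
                bucket("460021828267404", 26, 10));
        describe(buckets).forEach(System.out::println);
    }
}
```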

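Filtering

Because the search results and the aggregation results are separate, filtering also comes in three flavours: filtering only the search results, filtering only the aggregation results, and filtering both at once.

Here is a concrete example, following the official documentation.

Given the following data: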

color   brand     num
red     Toyota    1
red     BMW       2
black   Changan   1

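Without any filtering, grouping by color gives the following:

Search results:

All 3 rows

color   brand     num
red     Toyota    1
red     BMW       2
black   Changan   1

Aggregation results:

red: 2

black: 1

Filtering only the search results means leaving the aggregation results unfiltered.

This is done with post_filter: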

SearchRequestBuilder srbAddress = client.prepareSearch(index)
        .setFetchSource(includes, includes3)
        .setTypes(type)
        .setQuery(queryBuilderAll)
        .addAggregation(aggregationAddress)
        // post_filter narrows the hits only; the aggregation still sees every document matched by the query.
        .setPostFilter(QueryBuilders.matchQuery("color", "red"))
        .setSize(size);

This gives the following:

Search results

2 rows

color   brand     num
red     Toyota    1
red     BMW       2

Aggregation results

red: 2

black: 1
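
The order of operations is what matters here: the aggregation is computed over everything the query matched, and the post filter trims the hits afterwards. A minimal plain-Java model of that behaviour on the three example cars might look like the following. It is not the Elasticsearch API, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of post_filter semantics; not the Elasticsearch API.
class PostFilterSketch {

    // Rows from the example: {color, brand}.
    static final List<String[]> CARS = Arrays.asList(
            new String[] {"red", "Toyota"},
            new String[] {"red", "BMW"},
            new String[] {"black", "Changan"});

    // Aggregation: number of documents per color.
    static Map<String, Integer> countByColor(List<String[]> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] doc : docs) {
            counts.merge(doc[0], 1, Integer::sum);
        }
        return counts;
    }

    // post_filter: applied to the hits only, after the aggregations were computed.
    static List<String[]> postFilter(List<String[]> docs, String color) {
        List<String[]> hits = new ArrayList<>();
        for (String[] doc : docs) {
            if (doc[0].equals(color)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Integer> aggs = countByColor(CARS); // computed on all query matches
        List<String[]> hits = postFilter(CARS, "red");  // narrowed afterwards
        System.out.println("hits=" + hits.size() + " aggs=" + aggs); // prints hits=2 aggs={black=1, red=2}
    }
}
```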

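Filtering only the aggregation results means filtering on top of the aggregation. For example, group by color and drop every color that has fewer than 2 cars.

This is done with bucketSelector from PipelineAggregatorBuilders.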

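// Assumed mapping: script variable "count" -> "_count", the document count of each bucket.
Map<String, String> bucketsPathsMap = Collections.singletonMap("count", "_count");
// Keep only the buckets that contain at least 2 documents.
Script script = new Script("params.count >= 2");

BucketSelectorPipelineAggregationBuilder bs = PipelineAggregatorBuilders.bucketSelector("agg_values",
        bucketsPathsMap, script);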

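The result:

Search results

All 3 rows (bucketSelector only removes buckets; it does not touch the hits)

color   brand     num
red     Toyota    1
red     BMW       2
black   Changan   1

Aggregation results

red: 2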
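
In other words, the selector runs once per bucket and keeps or drops the whole bucket. A minimal plain-Java model of that step, with hypothetical names and not using the Elasticsearch API, is:

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a bucket selector; not the Elasticsearch API.
class BucketSelectorSketch {

    // Keeps only the buckets whose document count is at least minCount.
    static Map<String, Integer> select(Map<String, Integer> buckets, int minCount) {
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> e : buckets.entrySet()) {
            if (e.getValue() >= minCount) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> buckets = new TreeMap<>();
        buckets.put("red", 2);
        buckets.put("black", 1);
        System.out.println(select(buckets, 2)); // prints {red=2}
    }
}
```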

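While on the subject of PipelineAggregatorBuilders, a few more words. The example above uses bucketSelector. If you do not want to filter and only want to compute something further on top of the aggregation, you can use bucketScript, which does the computation in a script. You can also use the ready-made methods directly, such as maxBucket, minBucket, avgBucket and statsBucket.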
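
To illustrate what these ready-made methods compute, the sketch below takes the per-color counts from the car example and derives max, min and average across the buckets in plain Java. It only models the idea and does not call the Elasticsearch API; the class and method names are hypothetical.

```java
import java.util.IntSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of max/min/avg/stats over sibling buckets; not the Elasticsearch API.
class BucketStatsSketch {

    // Summarises the per-bucket counts: min, max, average, sum and count in one pass.
    static IntSummaryStatistics stats(Map<String, Integer> buckets) {
        return buckets.values().stream().mapToInt(Integer::intValue).summaryStatistics();
    }

    public static void main(String[] args) {
        Map<String, Integer> buckets = new TreeMap<>();
        buckets.put("red", 2);
        buckets.put("black", 1);
        IntSummaryStatistics s = stats(buckets);
        // prints max=2 min=1 avg=1.5
        System.out.println("max=" + s.getMax() + " min=" + s.getMin() + " avg=" + s.getAverage());
    }
}
```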

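Filtering both the search results and the aggregation results is simply a matter of putting the condition into the main query (setQuery), so it is omitted here.

As for using the aggregated results to filter the search results, I have not found a good solution so far.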

posted @ 2017-10-26 16:02 是奉壹呀
