elasticsearch聚合桶排序、分页实战
很多时候业务上需要分组排序分页的场景,类似于mysql的group by xxx limit 0 10。
so,当数据同步到es后,相同的需求场景也出现了。
背景:商品根据商品销量排序,销量数据是以sku存储的,商品列表展示spu。
实现方式有两种:
思路一:根据sku销量排序,分页,业务上不是很精准==>sort:根据sale_volume销量排序,collapse:根据spuId去重得到去重后的记录,配合"from": 0, "size": 10分页得到结果,cardinality:根据spuId得到去重统计结果,即列表spu数据的总数total。
思路二:根据spu销量排序,分页,业务上精准(相当于先计算spu销量,再排序分页)==> bucket_sort .有一些局限性,见文章底部分析。
直接上DSL
1、准备测试数据:
插入spu商品数据
POST /t_spu_001/_bulk {"index":{}} {"id":1508769405482586000,"org_code":"999999","spu_id":1508777903532560400,"sku_id":1508777903570309000,"shop_id":111111,"first_category_id":1508766351106527200,"second_category_id":1508766404416131000,"third_category_id":1508767024225210400,"keywords":null,"spu_name":"弱碱性苏打水娃哈哈苏打水350ml*24瓶整箱 甜味/无味/薄荷味弱碱性苏打水 柠檬味 350ml","sku_attribute":"娃哈哈苏打水350ml*1瓶","main_pic":"https://ruyishangcheng.oss-cn-shanghai.aliyuncs.com/sku-test/2022-03-29/2022-03-29T19:56:26.972/Wbo1k5wD_oTdeMqexkVP7g.jpg","publish_status":1,"price":5,"create_user":"1508762466629263362","create_time":1648553633000,"update_user":"1508762466629263362","update_time":1655802396000} {"index":{}} {"id":1508794393874944000,"org_code":"999999","spu_id":1508776205556666400,"sku_id":1508776205577638000,"shop_id":111111,"first_category_id":1508766351106527200,"second_category_id":1508766404416131000,"third_category_id":1508766664723026000,"keywords":null,"spu_name":"娃哈哈饮用纯净水4.5L(1*4聪明盖)(中心自提)","sku_attribute":"4.5L*4瓶","main_pic":"https://ruyishangcheng.oss-cn-shanghai.aliyuncs.com/sku-test/2022-03-29/2022-03-29T19:37:36.579/QQ截图20220328144035.jpg","publish_status":1,"price":3,"create_user":"1508761063194173441","create_time":1648559590000,"update_user":"1508761063194173441","update_time":1655968379000} {"index":{}} {"id":1508794478406946800,"org_code":"999999","spu_id":1508771759904804900,"sku_id":1508771759967719400,"shop_id":111111,"first_category_id":1508766351106527200,"second_category_id":1508766404416131000,"third_category_id":1508767081859141600,"keywords":null,"spu_name":"娃哈哈启力功能饮料启力维生素运动功能饮料250ml*24瓶(中心自提)","sku_attribute":"250ml*24瓶","main_pic":"https://ruyishangcheng.oss-cn-shanghai.aliyuncs.com/sku-test/2022-03-29/2022-03-29T19:57:22.095/QQ截图20220324204129.jpg","publish_status":1,"price":2,"create_user":"1508761063194173441","create_time":1648559610000,"update_user":"1508761063194173441","update_time":1655968377000} {"index":{}} {"id":1508794557406662700,"org_code":"999999","spu_id":1508771759904804900,"sku_id":1508771759959330800,"shop_id":111111,"first_category_id":1508766351106527200,"second_category_id":1508766404416131000,"third_category_id":1508767180966350800,"keywords":null,"spu_name":"娃哈哈非常可乐碳酸饮料530ml*12瓶 春晚同款(中心自提)","sku_attribute":"530ml*6瓶","main_pic":"https://ruyishangcheng.oss-cn-shanghai.aliyuncs.com/sku-test/2022-03-29/2022-03-29T20:04:50.007/QQ截图20220324132845.jpg","publish_status":1,"price":1,"create_user":"1508761063194173441","create_time":1655968441000,"update_user":"1508761063194173441","update_time":1655968441000}
插入销量库存数据
POST /t_stock_001/_bulk {"index":{}} {"id":1508777903672737800,"spu_id":1508777903532560400,"sku_id":1508777903570309000,"sku_code":"K00000011","shop_id":111111,"sale_volume":0,"stock":35,"create_time":1648555659000,"create_user":999999,"update_time":1658892138000,"update_user":999999,"org_code":"999999"} {"index":{}} {"id":1508778549868183600,"spu_id":1508776205556666400,"sku_id":1508776205577638000,"sku_code":"K00000010","shop_id":111111,"sale_volume":0,"stock":60,"create_time":1648555813000,"create_user":1508772831021121500,"update_time":1658892138000,"update_user":999999,"org_code":"999999"} {"index":{}} {"id":1508778707439796200,"spu_id":1508771759904804900,"sku_id":1508771759959330800,"sku_code":"K00000006","shop_id":111111,"sale_volume":10,"stock":55,"create_time":1648555850000,"create_user":1508772831021121500,"update_time":1658892138000,"update_user":999999,"org_code":"999999"} {"index":{}} {"id":1508778707628540000,"spu_id":1508771759904804900,"sku_id":1508771759967719400,"sku_code":"K00000008","shop_id":111111,"sale_volume":1,"stock":20,"create_time":1648555850000,"create_user":1508772831021121500,"update_time":1658892138000,"update_user":999999,"org_code":"999999"} {"index":{}} {"id":1509413147273220000,"spu_id":1508771759904804900,"sku_id":1508771759959330800,"sku_code":"K00000006","shop_id":111111,"sale_volume":10,"stock":991,"create_time":1648707113000,"create_user":1508762466629263400,"update_time":1658892138000,"update_user":999999,"org_code":"999999"} {"index":{}} {"id":1508784994321375200,"spu_id":1508784994041221000,"sku_id":1508784994078969900,"sku_code":"K00000014","shop_id":111111,"sale_volume":0,"stock":9998,"create_time":1648557349000,"create_user":999999,"update_time":1658892138000,"update_user":999999,"org_code":"999999"}
1、查出门店id为111111的门店下所有上架的在售商品
POST t_spu_001/_search { "query": { "bool": { "filter": [ { "term": { "shop_id": { "value": 111111 } } }, { "term": { "publish_status": { "value": 1 } } } ] } }, "collapse": { "field": "spu_id" }, "aggs": { "total_spu": { "cardinality": { "field": "spu_id" } } } }
total_spu就是商品列表总数。
2.1、(思路一实现方式)再根据步骤1查出来的spu和门店从销量库存表根据销量排序分页查询商品列表
利用es折叠collapse,近似聚合cardinality(类似distinct )实现。
POST /t_stock_001/_search { "size": 10, "query": { "bool": { "filter": [ { "terms": { "spu_id": [ "1508777903532560400", "1508776205556666400" ] } }, { "term": { "shop_id": { "value": 111111 } } } ] } }, "sort": [ { "sale_volume": { "order": "desc" } } ], "collapse": { "field": "spu_id" }, "aggs": { "total_spu": { "cardinality": { "field": "spu_id" } } } }
java代码:
1 NativeSearchQueryBuilder builder = new NativeSearchQueryBuilder(); 2 BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder(); 3 if (CollectionUtils.isNotEmpty(spuIds)) { 4 boolQueryBuilder.filter(QueryBuilders.termsQuery("spu_id", spuIds)); 5 } 6 if (Objects.nonNull(shopId)) { 7 boolQueryBuilder.filter(QueryBuilders.termQuery("shop_id", shopId)); 8 } 9 builder.withQuery(boolQueryBuilder); 10 //去重 11 builder.withCollapseField("spu_id"); 12 //取去重后的count(大数据量下有准确性和性能问题):precision_threshold默认4000,4000以内可确保100%准确性 13 builder.withAggregations(AggregationBuilders.cardinality("total_spu").field("spu_id")); 14 15 //排序(销量) 16 builder.withSorts(Collections.singleton(SortBuilders.fieldSort("sale_volume").order(SortOrder.DESC))); 17 18 //分页 19 builder.withPageable(PageRequest.of(dto.getPage() - 1, dto.getPageSize())); 20 builder.withTrackScores(true); 21 NativeSearchQuery searchQuery = builder.build();
2.2、(思路二实现方式)再根据步骤1查出来的spu和门店从销量库存表根据销量排序分页查询商品列表
利用es聚合桶排序bucket_sort实现。
如果需要限制商品列表最多展示多少屏,则使用最大页数限制。
POST /t_stock_001/_search { "size": 10, "query": { "bool": { "filter": [ { "terms": { "spu_id": [ "1508777903532560400", "1508776205556666400" ] } }, { "term": { "shop_id": { "value": 111111 } } } ] } }, "aggs": { "spu_id": { "terms": { "field": "spu_id", "size": 1000 }, "aggs": { "spuCount": { "sum": { "field": "sale_volume" } }, "selfSort": { "bucket_sort": { "sort": [{ "spuCount": "asc" }], "from": 0, "size": 6 } } } } } }
java代码:
1 NativeSearchQueryBuilder builder = new NativeSearchQueryBuilder(); 2 BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder(); 3 if (CollectionUtils.isNotEmpty(spuIds)) { 4 boolQueryBuilder.filter(QueryBuilders.termsQuery("spu_id", spuIds)); 5 } 6 if (Objects.nonNull(shopId)) { 7 boolQueryBuilder.filter(QueryBuilders.termQuery("shop_id", shopId)); 8 } 9 10 productStockBuilder.withQuery(boolQueryBuilder); 11 //排序:按销量 12 productStockBuilder.withAggregations(AggregationBuilders.terms("spu").field("spu_id") 13 .size(1000) 14 .shardSize(1) 15 .subAggregation(AggregationBuilders.sum("spuCount").field("sale_volume")) 16 .subAggregation(new BucketSortPipelineAggregationBuilder("spu_bucket_sort", 17 Collections.singletonList(new FieldSortBuilder("spuCount").unmappedType("long").order(SortOrder.DESC))) 18 .from(page - 1) 19 .size(pageSize))); 20 21 NativeSearchQuery searchQuery = builder.build(); 22 searchQuery.setTrackTotalHits(true);
官方文档
collapse + cardinality 说明:
1、collapse:去重得到去重后的记录,配合"from": 0, "size": 1分页得到结果
2、cardinality:得到去重统计结果
bucket_sort部分解释:
- 最外层的size=0,表示该查询不返回详情,只返回聚合结果;
- query中使用一个must列表对数据进行过滤;
- terms实现分桶的功能,类似于sql中的分组功能;
- terms中的shard_size表示每个分片返回的数据量,size表示返回的桶的数据,会收到bucket_sort中size的限制;
- value_count实现计数的一个功能;
- sort指定排序的字段和排序的升降序,可以使用聚合后的字段;
- 使用bucket_sort的功能,from、size分别表示从第几条数据开始,取多少条数据。
特别注意:
- 在terms中使用bucket_sort功能的时候,terms中分组的size大小设置应该大于bucket_sort中的from+size的大小,否则会因为terms中size的大小限制了返回的数据。
- bucket_sort的sort排序是针对父聚合返回的结果进行排序的,比如上述terms返回的结果为1000条,那么bucket_sort仅对这1000条进行排序。