Eliminate repeated computation in queries with multiple aggregations

The fix landed in this TiSpark commit:
https://github.com/pingcap/tispark/commit/dcca23bfa1aa0c356a4280d82bc8301c0de08318
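
For context, the queries below were run from a spark-shell wired to TiDB through TiSpark. A minimal setup sketch, assuming TiSpark is on the classpath and the person table lives in a TiDB database named test (both names are assumptions here):

scala> import org.apache.spark.sql.TiContext
scala> val ti = new TiContext(spark)   // `spark` is the shell's SparkSession
scala> ti.tidbMapDatabase("test")      // map TiDB tables so `person` resolves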

Before the commit, avg(number) is rewritten into a pushed-down sum plus count without noticing that the query already computes count(number); the coprocessor is therefore asked for count(number#0L) twice, once as #42L and once as #45L:

scala> spark.sql("select count(number),avg(number) from person").explain
== Physical Plan ==
*HashAggregate(keys=[], functions=[sum(count(number#0L)#42L), sum(sum(number#0L)#43L), sum(count(number#0L)#45L)])
+- Exchange SinglePartition
   +- *HashAggregate(keys=[], functions=[partial_sum(count(number#0L)#42L), partial_sum(sum(number#0L)#43L), partial_sum(count(number#0L)#45L)])
      +- Scan CoprocessorRDD[count(number#0L)#42L,sum(number#0L)#43L,count(number#0L)#45L]

After the commit, the duplicate is eliminated: the coprocessor returns only count(number#0L)#20L and sum(number#0L)#21L, and the final HashAggregate reuses #20L for both count(number) and avg(number):

scala> spark.sql("select count(number),avg(number) from person").explain()
== Physical Plan ==
*HashAggregate(keys=[], functions=[sum(count(number#0L)#20L), sum(sum(number#0L)#21L), sum(count(number#0L)#20L)])
+- Exchange SinglePartition
   +- *HashAggregate(keys=[], functions=[partial_sum(count(number#0L)#20L), partial_sum(sum(number#0L)#21L), partial_sum(count(number#0L)#20L)])
      +- Scan CoprocessorRDD[count(number#0L)#20L,sum(number#0L)#21L]
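
The idea behind the commit, sketched in plain Scala (an illustration of the deduplication only, not TiSpark's actual code; all names here are hypothetical): avg is expanded into a pushed-down sum and count, and identical pushed-down aggregates are then collapsed so the coprocessor computes each one only once.

// Illustration only: expand avg into sum/count, then deduplicate the
// aggregates pushed down to the coprocessor.
case class Agg(fn: String, col: String)

def pushDown(aggs: Seq[Agg]): Seq[Agg] = {
  val expanded = aggs.flatMap {
    case Agg("avg", c) => Seq(Agg("sum", c), Agg("count", c)) // avg = sum / count
    case a             => Seq(a)
  }
  expanded.distinct // after the commit: count(number) is requested once, not twice
}

pushDown(Seq(Agg("count", "number"), Agg("avg", "number")))
// before .distinct: count(number), sum(number), count(number)  -- the #42L/#45L plan
// after  .distinct: count(number), sum(number)                  -- the #20L/#21L plan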