day07 sparksql 生成Physical Plan

  1. 案例sql

select A,B  from testdata2 where A>2

 对应的执行计划:

== Analyzed Logical Plan ==
Project [A#23, B#24]
+- Filter (A#23 > 2)
   +- SubqueryAlias testdata2
      +- View (`testData2`, [a#23,b#24])
         +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
            +- ExternalRDD [obj#22]

== Optimized Logical Plan ==
Project [A#23, B#24]
+- Filter (A#23 > 2)
   +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
      +- ExternalRDD [obj#22]

== Physical Plan ==
Project [A#23, B#24]
+- Filter (A#23 > 2)
   +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
      +- Scan[obj#22]

== executedPlan ==
*(1) Project [A#23, B#24]
+- *(1) Filter (A#23 > 2)
   +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
      +- Scan[obj#22]

 

1.1 该sql涉及的策略以及策略执行后得到的物理执行计划

========================涉及的策略:class org.apache.spark.sql.execution.SparkStrategies$SpecialLimits$============================
List(PlanLater Project [A#23, B#24])

========================涉及的策略:class org.apache.spark.sql.execution.SparkStrategies$BasicOperators$============================
List(Project [A#23, B#24]
+- PlanLater Filter (A#23 > 2)
)

========================涉及的策略:class org.apache.spark.sql.execution.SparkStrategies$BasicOperators$============================
List(Filter (A#23 > 2)
+- PlanLater SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
)

========================涉及的策略:class org.apache.spark.sql.execution.SparkStrategies$BasicOperators$============================
List(SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
+- PlanLater ExternalRDD [obj#22]
)

========================涉及的策略:class org.apache.spark.sql.execution.SparkStrategies$BasicOperators$============================
List(Scan[obj#22]
)

打印起作用的策略以及策略作用后的结果代码如下:

    val candidates = strategies.iterator.flatMap{ stra =>
      val my_plan = stra(plan)
      if(my_plan.size>0){
        println("========================涉及的策略:"+stra.getClass+"============================")
        println(my_plan)
        println()
      }
      my_plan
    }

 

2 . 代码作用细节:

2.1 代码入口:

spark.sql("xxx").queryExecution.sparkPlan   // 最终调用QueryPlanner中的plan方法

 

 

  • 物理执行计划分析器SparkPlanner  

   在得到优化后的逻辑执行计划后,SparkPlanner会对Optimized Logical Plan进行转换,生成Physical plans,SparkPlanner的继承关系如下    

 

 

 

 

 

 

 

 

 

 

 

 

posted @ 2023-04-03 22:45  一y样  阅读(47)  评论(0编辑  收藏  举报