Understanding Spark's Core Scheduling
The Spark execution mechanism shown in the figure above can be understood as the following steps.
1. Create DAG of RDDs to represent computation
2. Create logical execution plan for DAG
1) Pipeline as much as possible
2) Split into "stages" based on need to reorganize data
3. Schedule and execute individual tasks
1) Split each stage into tasks
2) A task is data + computation
3) Execute all tasks within a stage before moving on
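
The steps above can be sketched with a small toy model in plain Python. This is not Spark's actual scheduler code: the names Op, split_into_stages, make_tasks and run are hypothetical and exist only for illustration, and the model assumes a simple linear lineage in which every "wide" operation (one that needs data to be reorganized, i.e. a shuffle) starts a new stage.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """One transformation in the lineage (hypothetical toy type, not a Spark class)."""
    name: str
    wide: bool  # True if the operation needs data to be reorganized (a shuffle)


def split_into_stages(ops):
    """Pipeline consecutive narrow ops together; start a new stage at each wide op."""
    stages, current = [], []
    for op in ops:
        if op.wide and current:
            # A wide op is a boundary: close the stage built so far.
            stages.append(current)
            current = []
        current.append(op.name)
    if current:
        stages.append(current)
    return stages


def make_tasks(stages, num_partitions):
    """One task per (stage, partition): a task is data (partition) + computation (ops)."""
    return [
        [(stage_id, part, tuple(ops)) for part in range(num_partitions)]
        for stage_id, ops in enumerate(stages)
    ]


def run(stages, num_partitions):
    """Run every task of a stage before moving on to the next stage."""
    log = []
    for stage_tasks in make_tasks(stages, num_partitions):
        for stage_id, part, _ops in stage_tasks:
            log.append((stage_id, part))
    return log


# A word-count-like lineage; only reduceByKey is treated as a wide operation here.
lineage = [
    Op("textFile", wide=False),
    Op("flatMap", wide=False),
    Op("map", wide=False),
    Op("reduceByKey", wide=True),
    Op("map", wide=False),
]

stages = split_into_stages(lineage)
execution_order = run(stages, num_partitions=2)
print(stages)
print(execution_order)
```

In this sketch the three narrow operations before reduceByKey are pipelined into one stage, and each stage is expanded into one task per partition. A real scheduler runs the tasks of a stage in parallel across executors; the loop here only shows that the next stage does not start until the current one has finished.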