Understanding Spark Core Scheduling

 
 
 
The figure above shows how Spark runs a job. The mechanism can be understood as the following steps.
 
1. Create DAG of RDDs to represent computation
2. Create logical execution plan for DAG
   1) Pipeline as much as possible
   2) Split into "stages" based on need to reorganize data
3. Schedule and execute individual tasks
   1) Split each stage into tasks
   2) A task is data + computation
   3) Execute all tasks within a stage before moving on
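
The steps above can be sketched as a toy model in plain Python. This is not Spark's scheduler or its API: `Op`, `split_into_stages`, `shuffle`, `run_task`, and `run_job` are hypothetical names made up for illustration, the lineage is assumed to be a simple linear chain rather than a general DAG, and keys are assumed to be strings. It only shows the idea: narrow operations are pipelined into one stage, a new stage starts wherever data must be reorganized, and each stage runs one task per partition to completion before the next stage begins.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Op:
    name: str
    fn: Callable[[list], list]  # transforms the records of one partition
    wide: bool = False          # True if input must be reorganized by key first


def split_into_stages(ops: List[Op]) -> List[List[Op]]:
    """Pipeline narrow ops together; start a new stage at each wide op."""
    stages: List[List[Op]] = [[]]
    for op in ops:
        if op.wide and stages[-1]:
            stages.append([])
        stages[-1].append(op)
    return stages


def shuffle(partitions: List[list]) -> List[list]:
    """Reorganize (key, value) records so equal keys land in one partition."""
    out: List[list] = [[] for _ in partitions]
    for part in partitions:
        for key, value in part:
            # Deterministic toy partitioner; assumes string keys.
            out[sum(map(ord, key)) % len(out)].append((key, value))
    return out


def run_task(stage: List[Op], partition: list) -> list:
    """A task is one partition of data plus the stage's pipelined computation."""
    for op in stage:
        partition = op.fn(partition)
    return partition


def run_job(ops: List[Op], partitions: List[list]) -> List[list]:
    for stage in split_into_stages(ops):
        if stage[0].wide:
            partitions = shuffle(partitions)
        # All tasks of this stage finish before the next stage starts.
        partitions = [run_task(stage, p) for p in partitions]
    return partitions


def sum_by_key(part: list) -> list:
    totals = defaultdict(int)
    for key, value in part:
        totals[key] += value
    return sorted(totals.items())


# A word count expressed as a chain of operations over two partitions.
word_count = [
    Op("flatMap", lambda part: [w for line in part for w in line.split()]),
    Op("map", lambda part: [(w, 1) for w in part]),
    Op("reduceByKey", sum_by_key, wide=True),
]

stages = split_into_stages(word_count)
result = run_job(word_count, [["a b a"], ["b c"]])
counts = dict(kv for part in result for kv in part)
```

In this sketch `flatMap` and `map` end up in the same stage because neither needs data from other partitions, while `reduceByKey` opens a second stage because records with the same key must first be brought together.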

posted on 2015-03-05 15:41  Ai_togic
