Cluster-Related Concepts in Spark
Source: http://spark.apache.org/docs/latest/cluster-overview.html
Term | Meaning |
---|---|
Application | User program built on Spark. Consists of a driver program and executors on the cluster. |
Application jar | A jar containing the user's Spark application. In some cases users will want to create an "uber jar" containing their application along with its dependencies. The user's jar should never include Hadoop or Spark libraries, however; these will be added at runtime. |
Driver program | The process running the main() function of the application and creating the SparkContext. |
Cluster manager | An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN). |
Deploy mode | Distinguishes where the driver process runs. In "cluster" mode, the framework launches the driver inside of the cluster. In "client" mode, the submitter launches the driver outside of the cluster. |
Worker node | Any node that can run application code in the cluster. |
Executor | A process launched for an application on a worker node that runs tasks and keeps data in memory or disk storage across them. Each application has its own executors. |
Task | A unit of work that will be sent to one executor. |
Job | A parallel computation consisting of multiple tasks that gets spawned in response to a Spark action (e.g. save, collect); you'll see this term used in the driver's logs. |
Stage | Each job gets divided into smaller sets of tasks called stages that depend on each other (similar to the map and reduce stages in MapReduce); you'll see this term used in the driver's logs. |
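To make the relationship between job, stage, task, and executor more concrete, here is a toy model in plain Python. It is only a sketch and does not use Spark at all: `run_job`, `run_stage`, `map_task`, and `reduce_task` are hypothetical names invented for this illustration, a thread pool stands in for the executors, and the two stages mimic the map and reduce stages mentioned in the table.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for the glossary terms; none of these names are Spark APIs.


def map_task(partition):
    """Stage 1 task: count the words in one partition of the input."""
    return Counter(word for line in partition for word in line.split())


def reduce_task(key_group):
    """Stage 2 task: sum the partial counts routed to one reducer."""
    word_counts = {}
    for word, count in key_group:
        word_counts[word] = word_counts.get(word, 0) + count
    return word_counts


def run_stage(executors, task_fn, partitions):
    """Send one task per partition to the executor pool and wait for all of them."""
    return list(executors.map(task_fn, partitions))


def run_job(partitions, num_reducers=2, num_executors=2):
    """Plays the role of the driver: one job made of two dependent stages."""
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        # Stage 1 (map-like): one task per input partition.
        partials = run_stage(executors, map_task, partitions)

        # Shuffle: route each (word, count) pair to a reducer with a
        # deterministic hash, so the same word always lands in the same group.
        groups = [[] for _ in range(num_reducers)]
        for partial in partials:
            for word, count in partial.items():
                groups[sum(word.encode()) % num_reducers].append((word, count))

        # Stage 2 (reduce-like): depends on stage 1, so it starts only
        # after every stage 1 task has finished.
        reduced = run_stage(executors, reduce_task, groups)

    # The result of the "action" is collected back in the driver.
    result = {}
    for part in reduced:
        result.update(part)
    return result


if __name__ == "__main__":
    data = [["a b a"], ["b c"], ["a c c"]]
    print(run_job(data))
```

With the sample input, the job counts `a` three times, `b` twice, and `c` three times. In real Spark the driver, the cluster manager, and the executors are separate processes, often on different machines; the thread pool here only imitates the idea that a stage is a set of tasks handed out to executors.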