set spark.sql.adaptive.repartition.enabled=true;
set spark.sql.shuffle.partitions=2000;
set spark.sql.adaptive.shuffle.targetPostShuffleInputSize=67108864;
1. Common configuration
Number of executors available to the job
Memory used by each executor
Number of cores per executor: spark.executor.cores
The maximum memory size of the container running the driver
is determined by the sum of
spark.driver.memoryOverhead and
spark.driver.memory.
The maximum memory size of the container running an executor
is determined by the sum of
spark.executor.memory,
spark.executor.memoryOverhead,
spark.memory.offHeap.size and
spark.executor.pyspark.memory.
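As a rough illustration of the two sums above, the following Python sketch adds up the components of a container request. The function name and all numeric values are hypothetical and chosen only for the example; it also assumes every setting has already been converted to MiB and that unset components count as zero.

```python
def container_memory_mib(heap_mib, overhead_mib, off_heap_mib=0, pyspark_mib=0):
    """Sum the memory components that make up one container request, in MiB."""
    return heap_mib + overhead_mib + off_heap_mib + pyspark_mib


# Hypothetical values, chosen only for illustration.
driver_total = container_memory_mib(
    heap_mib=4096,     # spark.driver.memory
    overhead_mib=410,  # spark.driver.memoryOverhead
)
executor_total = container_memory_mib(
    heap_mib=8192,      # spark.executor.memory
    overhead_mib=819,   # spark.executor.memoryOverhead
    off_heap_mib=2048,  # spark.memory.offHeap.size
    pyspark_mib=1024,   # spark.executor.pyspark.memory
)
print(driver_total, executor_total)
```

If the resulting total exceeds what the cluster manager allows per container, the request would be expected to fail, so it can help to run this arithmetic before tuning individual settings.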
Shuffle Behavior
Memory Management
spark.memory.fraction
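In Spark, execution and storage share a unified region (M).
spark.memory.fraction expresses the size of M as a fraction of the JVM heap, minus 300 MiB of reserved memory (default 0.6).
The remaining space (40%) is reserved for user data structures and Spark's internal metadata, and guards against OOM errors in the case of sparse and unusually large records.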
spark.memory.storageFraction
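To make the split concrete, here is a minimal Python sketch of how the regions could be sized. It assumes the model described in the Spark tuning guide: 300 MiB of the heap is reserved, spark.memory.fraction (default 0.6) sizes the unified region M, and spark.memory.storageFraction (default 0.5) sizes the storage region R inside M that is immune to eviction. The function name, the dictionary keys, and the 4096 MiB heap are hypothetical.

```python
RESERVED_MIB = 300  # reserved memory, as described in the Spark tuning guide


def unified_memory_regions(heap_mib, memory_fraction=0.6, storage_fraction=0.5):
    """Estimate the sizes (in MiB) of the unified memory regions for a given heap."""
    usable = heap_mib - RESERVED_MIB
    unified = usable * memory_fraction    # region M: execution + storage
    storage = unified * storage_fraction  # region R: storage immune to eviction
    return {
        "unified_M": unified,
        "storage_R": storage,
        "execution": unified - storage,
        "user": usable - unified,         # user data structures, internal metadata
    }


# Hypothetical 4 GiB executor heap.
regions = unified_memory_regions(4096)
for name, size in regions.items():
    print(f"{name}: {size:.1f} MiB")
```

The boundary between execution and storage is soft in practice: either side can borrow from the other when space is free, so these numbers are starting estimates rather than hard limits.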
Note: Non-heap memory includes off-heap memory (when spark.memory.offHeap.enabled=true)
and memory used by other driver processes (e.g. python process that goes with a PySpark driver)
and memory used by other non-driver processes running in the same container
spark.executor.memoryOverhead
This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc.
spark.memory.offHeap.size
spark.memory.offHeap.enabled
Source code
package org.apache.spark.deploy.yarn
DRIVER_MEMORY_OVERHEAD
EXECUTOR_MEMORY: Amount of memory to use per executor process
EXECUTOR_MEMORY_OVERHEAD: The amount of off-heap memory to be allocated per executor in cluster mode
EXECUTOR_CORES = ConfigBuilder("spark.executor.cores")
EXECUTOR_MEMORY_OVERHEAD = ConfigBuilder("spark.yarn.executor.memoryOverhead")
// Executor memory in MB.
protected val executorMemory = sparkConf.get(EXECUTOR_MEMORY).toInt
// Additional memory overhead.
protected val memoryOverhead: Int = sparkConf.get(EXECUTOR_MEMORY_OVERHEAD).getOrElse(
  math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN)).toInt
// Resource capability requested for each executors
private[yarn] val resource = Resource.newInstance(executorMemory + memoryOverhead, executorCores)
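The default overhead computed by the excerpt above can be sketched in Python as follows. The constant values 0.10 and 384 MiB are not shown in the excerpt; they are assumed here from the Spark YARN module of the same era, and the function name is hypothetical.

```python
# Assumed values; the excerpt above references these constants without defining them.
MEMORY_OVERHEAD_FACTOR = 0.10
MEMORY_OVERHEAD_MIN = 384  # MiB


def executor_container_mib(executor_memory_mib, memory_overhead_mib=None):
    """Return the memory (MiB) requested from YARN for one executor container."""
    if memory_overhead_mib is None:
        # Mirrors math.max((FACTOR * executorMemory).toInt, MIN) in the Scala excerpt.
        memory_overhead_mib = max(
            int(MEMORY_OVERHEAD_FACTOR * executor_memory_mib),
            MEMORY_OVERHEAD_MIN,
        )
    return executor_memory_mib + memory_overhead_mib


small = executor_container_mib(2048)           # overhead falls back to the minimum
large = executor_container_mib(8192)           # overhead is 10% of executor memory
explicit = executor_container_mib(8192, 1024)  # overhead set explicitly
print(small, large, explicit)
```

Under these assumptions, small executors always pay at least the minimum overhead, so the ratio of overhead to heap is highest for executors below roughly 4 GiB.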
package org.apache.spark.memory;
public enum MemoryMode { ON_HEAP, OFF_HEAP }
private[spark] abstract class MemoryManager(
    conf: SparkConf,
    numCores: Int,
    onHeapStorageMemory: Long,
    onHeapExecutionMemory: Long) extends Logging {
  // Tracks whether Tungsten memory will be allocated on the JVM heap or off-heap using sun.misc.Unsafe.
  final val tungstenMemoryMode: MemoryMode = {
    if (conf.getBoolean("spark.memory.offHeap.enabled", false)) {
      require(conf.getSizeAsBytes("spark.memory.offHeap.size", 0) > 0,
        "spark.memory.offHeap.size must be > 0 when spark.memory.offHeap.enabled == true")
      require(Platform.unaligned(),
        "No support for unaligned Unsafe. Set spark.memory.offHeap.enabled to false.")
      MemoryMode.OFF_HEAP
    } else {
      MemoryMode.ON_HEAP
    }
  }
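The decision made by tungstenMemoryMode can be paraphrased in a short Python sketch. This is an illustration of the checks in the excerpt, not Spark's API: the function name is hypothetical, the configuration is modeled as a plain dict, the off-heap size is assumed to be an integer number of bytes (Spark itself also accepts size strings), and the Platform.unaligned() check is replaced by a boolean parameter.

```python
def tungsten_memory_mode(conf, unaligned_supported=True):
    """Return "OFF_HEAP" or "ON_HEAP", applying the same checks as the Scala excerpt."""
    enabled = str(conf.get("spark.memory.offHeap.enabled", "false")).lower() == "true"
    if not enabled:
        return "ON_HEAP"
    # Simplification: the size is treated as a plain integer number of bytes.
    size_bytes = int(conf.get("spark.memory.offHeap.size", 0))
    if size_bytes <= 0:
        raise ValueError(
            "spark.memory.offHeap.size must be > 0 "
            "when spark.memory.offHeap.enabled == true"
        )
    if not unaligned_supported:
        raise ValueError(
            "No support for unaligned Unsafe. "
            "Set spark.memory.offHeap.enabled to false."
        )
    return "OFF_HEAP"


default_mode = tungsten_memory_mode({})
off_heap_mode = tungsten_memory_mode({
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": 2 * 1024 ** 3,  # hypothetical 2 GiB
})
print(default_mode, off_heap_mode)
```

The point the sketch highlights is that enabling off-heap memory without also setting a positive size is treated as a configuration error rather than silently falling back to on-heap allocation.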