org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist

Let's look at the error output first:

20/07/17 10:20:07 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
20/07/17 10:20:07 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
20/07/17 10:20:07 WARN yarn.YarnAllocator: Cannot find executorId for container: container_1594881950724_0016_01_000264
20/07/17 10:20:07 INFO yarn.YarnAllocator: Completed container container_1594881950724_0016_01_000264 (state: COMPLETE, exit status: -100)
20/07/17 10:20:07 INFO yarn.YarnAllocator: Container marked as failed: container_1594881950724_0016_01_000264. Exit status: -100. Diagnostics: Container released by application.
20/07/17 10:20:08 INFO yarn.YarnAllocator: Launching container container_1594881950724_0016_01_000265 on host mip-test-hdp134 for executor with ID 264
20/07/17 10:20:08 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
20/07/17 10:20:08 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
20/07/17 10:20:08 INFO impl.ContainerManagementProtocolProxy: Opening proxy : mip-test-hdp134:23855
20/07/17 10:20:08 ERROR yarn.YarnAllocator: Failed to launch executor 264 on container container_1594881950724_0016_01_000265
org.apache.spark.SparkException: Exception while starting container container_1594881950724_0016_01_000265 on host mip-test-hdp134
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:125)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:65)
    at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1$$anon$2.run(YarnAllocator.scala:546)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
    at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
    at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:205)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:122)
    ... 5 more
20/07/17 10:20:10 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
20/07/17 10:20:10 INFO yarn.ApplicationMaster: Final app status: UNDEFINED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
20/07/17 10:20:10 INFO util.ShutdownHookManager: Shutdown hook called

The key line is:

Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist

After some searching, the fix I found was to add the following configuration to yarn-site.xml:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>spark_shuffle,mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

Then restart YARN.
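Before restarting, it can help to confirm that both properties are actually present in the file the NodeManager reads. The following is a minimal sketch, not part of the original fix: the function names `read_properties` and `shuffle_config_problems` are hypothetical, and it only inspects the two settings shown above.

```python
import xml.etree.ElementTree as ET

# Class name taken from the configuration shown above.
SHUFFLE_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"


def read_properties(xml_text):
    """Parse a Hadoop-style configuration document into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def shuffle_config_problems(xml_text):
    """Return a list of problems; an empty list means both settings are present."""
    props = read_properties(xml_text)
    problems = []

    # aux-services is a comma-separated list; spark_shuffle must be one entry.
    raw = props.get("yarn.nodemanager.aux-services", "")
    services = [s.strip() for s in raw.split(",") if s.strip()]
    if "spark_shuffle" not in services:
        problems.append("spark_shuffle is not listed in yarn.nodemanager.aux-services")

    class_key = "yarn.nodemanager.aux-services.spark_shuffle.class"
    if props.get(class_key) != SHUFFLE_CLASS:
        problems.append(class_key + " is missing or has an unexpected value")

    return problems
```

To use it, read the contents of yarn-site.xml into a string and pass it to `shuffle_config_problems`; each returned string names one setting that still needs attention.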

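However, these two steps alone are not enough. You also need to check whether spark-*-yarn-shuffle.jar exists under ${HADOOP_HOME}/share/hadoop/yarn/lib, where * stands for the Spark version number. If it is not there, copy it over from the Spark installation directory.

spark-*-yarn-shuffle.jar is in the yarn directory of the Spark installation (some people say it is in the jar directory instead; this may differ between Spark versions, and I have not looked into it further).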
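The jar check described above can be sketched as a small script. This is only an illustration under assumptions: the helper names `find_shuffle_jars` and `check_shuffle_jar` are hypothetical, the Hadoop and Spark home paths are supplied by the caller, and only the yarn directory of the Spark installation is searched.

```python
from pathlib import Path

# File name pattern from the text; * stands for the Spark version.
JAR_PATTERN = "spark-*-yarn-shuffle.jar"


def find_shuffle_jars(directory):
    """Return the shuffle-service jars found directly in `directory`, sorted by name."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(directory.glob(JAR_PATTERN))


def check_shuffle_jar(hadoop_home, spark_home):
    """Report what YARN's lib directory contains and which jars Spark could supply."""
    yarn_lib = Path(hadoop_home) / "share" / "hadoop" / "yarn" / "lib"
    return {
        "yarn_lib": yarn_lib,
        # Jars already visible to YARN; an empty list means a copy is needed.
        "installed": find_shuffle_jars(yarn_lib),
        # Jars under Spark's yarn directory that could be copied over.
        "candidates": find_shuffle_jars(Path(spark_home) / "yarn"),
    }
```

If `installed` comes back empty and `candidates` does not, copying one of the candidates into `yarn_lib` (for example with `shutil.copy2`) and restarting YARN corresponds to the manual step described above.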

References:

1. Dynamic resource allocation for Spark jobs submitted to YARN

https://www.cnblogs.com/hejunhong/p/12335258.html

2. Spark job error: The auxService spark_shuffle does not exist

https://blog.csdn.net/weixin_39588015/article/details/79365277?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase
Posted by 梦醒江南·Infinite