org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
Let's look at the error output first:
20/07/17 10:20:07 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
20/07/17 10:20:07 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
20/07/17 10:20:07 WARN yarn.YarnAllocator: Cannot find executorId for container: container_1594881950724_0016_01_000264
20/07/17 10:20:07 INFO yarn.YarnAllocator: Completed container container_1594881950724_0016_01_000264 (state: COMPLETE, exit status: -100)
20/07/17 10:20:07 INFO yarn.YarnAllocator: Container marked as failed: container_1594881950724_0016_01_000264. Exit status: -100. Diagnostics: Container released by application.
20/07/17 10:20:08 INFO yarn.YarnAllocator: Launching container container_1594881950724_0016_01_000265 on host mip-test-hdp134 for executor with ID 264
20/07/17 10:20:08 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
20/07/17 10:20:08 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
20/07/17 10:20:08 INFO impl.ContainerManagementProtocolProxy: Opening proxy : mip-test-hdp134:23855
20/07/17 10:20:08 ERROR yarn.YarnAllocator: Failed to launch executor 264 on container container_1594881950724_0016_01_000265
org.apache.spark.SparkException: Exception while starting container container_1594881950724_0016_01_000265 on host mip-test-hdp134
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:125)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:65)
    at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1$$anon$2.run(YarnAllocator.scala:546)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
    at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
    at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:205)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:122)
    ... 5 more
20/07/17 10:20:10 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
20/07/17 10:20:10 INFO yarn.ApplicationMaster: Final app status: UNDEFINED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
20/07/17 10:20:10 INFO util.ShutdownHookManager: Shutdown hook called
The key line is:
Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
After some searching, the fix I found is to add the following configuration to yarn-site.xml:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle,mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
Then restart YARN.
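Before restarting, it can help to confirm that both properties actually made it into the file on each NodeManager host. The following is a minimal sketch using only the Python standard library; the function names `read_properties` and `check_spark_shuffle` are made up for this illustration, and the check only looks at the two properties shown above.

```python
import xml.etree.ElementTree as ET

# Class name taken from the configuration shown above.
SHUFFLE_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"


def read_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style <configuration> XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_spark_shuffle(xml_text):
    """Return a list of problems; an empty list means both properties look right."""
    props = read_properties(xml_text)
    problems = []

    # aux-services is a comma-separated list, e.g. "spark_shuffle,mapreduce_shuffle".
    raw = props.get("yarn.nodemanager.aux-services", "")
    services = [s.strip() for s in raw.split(",") if s.strip()]
    if "spark_shuffle" not in services:
        problems.append("spark_shuffle is not listed in yarn.nodemanager.aux-services")

    cls = props.get("yarn.nodemanager.aux-services.spark_shuffle.class")
    if cls != SHUFFLE_CLASS:
        problems.append(
            "yarn.nodemanager.aux-services.spark_shuffle.class is missing or unexpected"
        )
    return problems
```

To use it, read the contents of yarn-site.xml (as text or bytes) and pass them to `check_spark_shuffle`; any returned strings describe what is still missing.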
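However, editing the configuration and restarting are not enough on their own. You also need to check whether ${HADOOP_HOME}/share/hadoop/yarn/lib contains spark-*-yarn-shuffle.jar, where * stands for the Spark version. If the jar is not there, copy it over from the Spark installation directory.
spark-*-yarn-shuffle.jar is in the yarn directory of the Spark installation. (Some people say it is in the jar directory instead; this may differ between Spark versions, and I did not look into it further.)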
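The jar check described above can be sketched as a small Python script. This is only an illustration under a few assumptions: the helper names (`find_shuffle_jars`, `locate`, `copy_if_missing`) are invented here, the two Spark subdirectories searched are `yarn` and `jars` (the second is a guess at the alternative location mentioned above), and the Hadoop and Spark home paths are passed in by the caller.

```python
import glob
import os
import shutil

JAR_PATTERN = "spark-*-yarn-shuffle.jar"


def find_shuffle_jars(directory):
    """Return sorted paths in `directory` matching spark-*-yarn-shuffle.jar."""
    return sorted(glob.glob(os.path.join(directory, JAR_PATTERN)))


def locate(hadoop_home, spark_home):
    """Return (target_dir, jars already installed, candidate jars under Spark)."""
    target_dir = os.path.join(hadoop_home, "share", "hadoop", "yarn", "lib")
    installed = find_shuffle_jars(target_dir)
    candidates = []
    # Assumption: the jar lives under <spark_home>/yarn or <spark_home>/jars.
    for sub in ("yarn", "jars"):
        candidates.extend(find_shuffle_jars(os.path.join(spark_home, sub)))
    return target_dir, installed, candidates


def copy_if_missing(hadoop_home, spark_home):
    """Copy the first candidate jar into the YARN lib dir if none is there yet.

    Returns the path of the copied file, or None if nothing was copied.
    """
    target_dir, installed, candidates = locate(hadoop_home, spark_home)
    if installed or not candidates:
        return None
    return shutil.copy2(candidates[0], target_dir)
```

Calling `locate` first is a read-only way to see what is present; `copy_if_missing` performs the copy and would need to be run on every NodeManager host, followed by a YARN restart.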
References:
1. Dynamic resource allocation for Spark jobs submitted to YARN
https://www.cnblogs.com/hejunhong/p/12335258.html
2. Spark job exception: The auxService spark_shuffle does not exist