Hive job fails with [ Diagnostics: File file:/ *** reduce.xml does not exist FileNotFoundException: File file:/ ]
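This post describes how I investigated and resolved a problem with Hive MapReduce jobs in one specific situation. I do not guarantee that everything in it is complete or absolutely correct.
In short: the directory that hive.exec.scratchdir points to in Hive's configuration file hive-site.xml must be located on HDFS.
While running MapReduce jobs through Hive, I suddenly found that none of them would execute. The console where I issued the HQL command printed only a few lines of output: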
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = chorm_20190310001344_e4ed74d8-4048-4918-aa6f-d3a1a2a60698
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1552147755103_0003, Tracking URL = http://m254:8088/proxy/application_1552147755103_0003/
Kill Command = /usr/bigdata/hadoop/bin/hadoop job -kill job_1552147755103_0003
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2019-03-10 00:13:47,528 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1552147755103_0003 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
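1. Check whether Hadoop and YARN are working properly
Hadoop is easy to check: use its web UI and the CLI. I confirmed that Hadoop itself was fine.
Next comes YARN, which can also be checked through its web UI. Do not forget to verify that the corresponding processes are running on every machine in the cluster. This check also turned up nothing wrong.
Finally, check MapReduce. I ran the wordcount example from the example.jar that ships with Hadoop, and it completed without problems.
These three checks ruled out Hadoop as the cause.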
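The checks above can also be scripted. The following is only a minimal sketch, assuming the `hdfs`, `yarn`, `jps`, and `hadoop` commands are on the PATH. The examples jar path and the HDFS input and output directories are hypothetical placeholders, not values from my cluster.

```python
import shutil
import subprocess

# Hypothetical placeholders: adjust these to your own installation.
EXAMPLES_JAR = "/path/to/hadoop-mapreduce-examples.jar"
WORDCOUNT_INPUT = "/tmp/wordcount/input"    # HDFS directory that already holds text files
WORDCOUNT_OUTPUT = "/tmp/wordcount/output"  # HDFS directory that must not exist yet


def build_checks(jar=EXAMPLES_JAR, src=WORDCOUNT_INPUT, dst=WORDCOUNT_OUTPUT):
    """Return (description, command) pairs for the health checks."""
    return [
        ("HDFS report", ["hdfs", "dfsadmin", "-report"]),
        ("YARN nodes", ["yarn", "node", "-list"]),
        ("Java daemons on this host", ["jps"]),
        ("MapReduce wordcount", ["hadoop", "jar", jar, "wordcount", src, dst]),
    ]


def run_checks(checks):
    """Run each command and map its description to True (exit code 0),
    False (non-zero exit code), or None (executable not found)."""
    results = {}
    for description, command in checks:
        if shutil.which(command[0]) is None:
            results[description] = None
            continue
        completed = subprocess.run(command, capture_output=True, text=True)
        results[description] = completed.returncode == 0
    return results
```

Calling `run_checks(build_checks())` on a cluster node gives a quick pass/fail overview. `jps` only lists the Java processes on the local host, so it has to be run on each machine separately.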
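2. Check Hive
3. Look at the YARN log for this job
Open the YARN web UI at http://yarn-host:8088, find the record of the failed job, and click into it. It shows the following error information: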
Diagnostics:
Application application_1552147755103_0003 failed 2 times due to AM Container for appattempt_1552147755103_0003_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://m254:8088/cluster/app/application_1552147755103_0003Then, click on links to logs of each attempt.
Diagnostics: File file:/var/bigdata/hive/scratchdir/chorm/46b600b8-9250-48c8-8284-a2f6b649bcae/hive_2019-03-10_00-13-44_741_6711852286526745896-1/-mr-10005/cd1fe621-e494-4ddd-b8f8-a9c80e052c6c/reduce.xml does not exist
File file:/var/bigdata/hive/scratchdir/chorm/46b600b8-9250-48c8-8284-a2f6b649bcae/hive_2019-03-10_00-13-44_741_6711852286526745896-1/-mr-10005/cd1fe621-e494-4ddd-b8f8-a9c80e052c6c/reduce.xml does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(
at org.apache.hadoop.yarn.util.FSDownload.copy(
at org.apache.hadoop.yarn.util.FSDownload.access$000(
at org.apache.hadoop.yarn.util.FSDownload$
at org.apache.hadoop.yarn.util.FSDownload$
at Method)
at
at
at
at
at
at java.util.concurrent.Executors$
at
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
at

Failing this attempt. Failing the application.
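The log above contains one important piece of information: reduce.xml does not exist. reduce.xml looks like a file that belongs to the MapReduce job, so I guessed that one or more files needed for job execution were missing, which kept the job from proceeding.
Then I suddenly remembered that I had changed Hive's configuration earlier: I had moved the directory that hive.exec.scratchdir points to from HDFS to the local file system. That property is directly tied to job execution, because its directory stores the stage execution plans and intermediate output of Hive's MapReduce jobs. Hive jobs normally run across the cluster. Once I pointed the property at a local directory on a single machine, the job's intermediate files could no longer be shared with the other nodes, so the job could not run.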
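The clue in the diagnostics is the `file:` scheme in front of the missing path, together with the `RawLocalFileSystem` frames in the stack trace: the node was looking for reduce.xml on its own local disk. As a small illustration, the sketch below pulls the missing URI out of a diagnostics string and reports its scheme. The regular expression and the function name are my own, written only to match the message format shown in the log above.

```python
import re
from urllib.parse import urlparse

# Matches the "File <uri> does not exist" message seen in the YARN diagnostics.
MISSING_FILE = re.compile(r"File (\S+) does not exist")


def missing_file_scheme(diagnostics):
    """Return (uri, scheme) for the first missing file, or None if nothing matches."""
    match = MISSING_FILE.search(diagnostics)
    if match is None:
        return None
    uri = match.group(1)
    return uri, urlparse(uri).scheme


# The diagnostics line from the failed application above.
sample = (
    "Diagnostics: File file:/var/bigdata/hive/scratchdir/chorm/"
    "46b600b8-9250-48c8-8284-a2f6b649bcae/"
    "hive_2019-03-10_00-13-44_741_6711852286526745896-1/-mr-10005/"
    "cd1fe621-e494-4ddd-b8f8-a9c80e052c6c/reduce.xml does not exist"
)
uri, scheme = missing_file_scheme(sample)
print(scheme)  # file
```

A scheme of `file` means the path refers to a local file system, while a path on HDFS would carry the `hdfs` scheme instead.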
After I pointed hive.exec.scratchdir back to a directory on HDFS, Hive's MapReduce jobs ran normally again.
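To catch this kind of misconfiguration early, the property can be read from hive-site.xml and checked before any job is submitted. The sketch below is only an illustration under stated assumptions: the XML fragment, the `namenode:8020` address, and the `/tmp/hive` path are hypothetical example values, not my actual configuration. As far as I understand, a value without a scheme is resolved against the cluster's default file system, so it is reported separately here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical hive-site.xml fragment; host, port, and path are example values.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>hdfs://namenode:8020/tmp/hive</value>
  </property>
</configuration>
"""


def get_property(hive_site_xml, name):
    """Return the value of a property from hive-site.xml text, or None if absent."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def classify_scratchdir(value):
    """Classify a scratch directory value by its URI scheme."""
    scheme = urlparse(value).scheme
    if scheme == "file":
        return "local"       # local disk of one machine: the cause of my failure
    if scheme == "":
        return "default-fs"  # no scheme: depends on the default file system
    return scheme            # for example "hdfs"


value = get_property(SAMPLE_HIVE_SITE, "hive.exec.scratchdir")
print(classify_scratchdir(value))  # hdfs
```

With the value I had mistakenly configured, a path starting with `file:/var/bigdata/hive/scratchdir`, `classify_scratchdir` returns `local`.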