Uncle's Experience Sharing (18): beeline no longer shows progress information when running SQL after Hive 2.0
I. The problem
In Hive 1.2, running SQL through either the hive CLI or beeline printed progress information. After upgrading to Hive 2.0, only the hive CLI still shows progress; beeline is completely silent while a query runs, so while waiting for the result you have no idea how far execution has gotten.
1 Running SQL in the hive CLI (progress shown)
hive> select count(1) from test_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20181227162003_bd82e3e2-2736-42b4-b1da-4270ead87e4d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1544593827645_22873, Tracking URL = http://rm1:8088/proxy/application_1544593827645_22873/
Kill Command = /export/App/hadoop-2.6.1/bin/hadoop job -kill job_1544593827645_22873
2018-12-27 16:20:27,650 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 116.9 sec
MapReduce Total cumulative CPU time: 1 minutes 56 seconds 900 msec
Ended Job = job_1544593827645_22873
MapReduce Jobs Launched:
Stage-Stage-1: Map: 29 Reduce: 1 Cumulative CPU: 116.9 sec HDFS Read: 518497 HDFS Write: 197 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 56 seconds 900 msec
OK
104
Time taken: 24.437 seconds, Fetched: 1 row(s)
2 Running SQL in beeline (no progress shown)
0: jdbc:hive2://thrift1:10000> select count(1) from test_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
+------+--+
| c0 |
+------+--+
| 104 |
+------+--+
1 row selected (23.965 seconds)
II. Code analysis
For a detailed walkthrough of how Hive executes SQL, see: https://www.cnblogs.com/barneywill/p/10185168.html
All SQL execution in Hive eventually reaches Driver.run, which calls execute, so let's go straight to the execute code:
org.apache.hadoop.hive.ql.Driver
public int execute(boolean deferClose) throws CommandNeedRetryException {
  ...
  if (jobs > 0) {
    logMrWarning(mrJobs);
    console.printInfo("Query ID = " + queryId);
    console.printInfo("Total jobs = " + jobs);
  }
  ...
}

private void logMrWarning(int mrJobs) {
  if (mrJobs <= 0 || !("mr".equals(HiveConf.getVar(conf, ConfVars.HIVE_EXECUTION_ENGINE)))) {
    return;
  }
  String warning = HiveConf.generateMrDeprecationWarning();
  LOG.warn(warning);
  warning = "WARNING: " + warning;
  console.printInfo(warning);
  // Propagate warning to beeline via operation log.
  OperationLog ol = OperationLog.getCurrentOperationLog();
  if (ol != null) {
    ol.writeOperationLog(LoggingLevel.EXECUTION, warning + "\n");
  }
}
So the progress messages you see in the hive CLI are printed through console.printInfo.
Note one detail: although beeline shows no progress, it does print one warning:
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
That warning is emitted by this code:
OperationLog ol = OperationLog.getCurrentOperationLog();
if (ol != null) {
  ol.writeOperationLog(LoggingLevel.EXECUTION, warning + "\n");
}
So, for beeline to show progress as well, the progress messages would need to be written out the same way, through the operation log.
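The change amounts to a tee pattern: every progress line printed to the local console should also be mirrored into the current operation log, exactly as the deprecation warning already is. The sketch below is a self-contained illustration of that pattern; Sink and TeeConsole are illustrative stand-in names I made up, not Hive's actual classes (in Hive the two sides would be console.printInfo and OperationLog.writeOperationLog).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a line-oriented output target.
interface Sink {
    void write(String line);
}

// Hypothetical stand-in for Hive's console wrapper, extended to tee into the
// operation log the same way logMrWarning does for the deprecation warning.
class TeeConsole {
    private final Sink console;      // stands in for console.printInfo
    private final Sink operationLog; // stands in for OperationLog; may be null

    TeeConsole(Sink console, Sink operationLog) {
        this.console = console;
        this.operationLog = operationLog;
    }

    // Print locally, and, when an operation log exists for the current query,
    // also write the line there so the beeline client can fetch it.
    void printInfo(String msg) {
        console.write(msg);
        if (operationLog != null) {
            operationLog.write(msg + "\n");
        }
    }
}

public class TeeDemo {
    public static void main(String[] args) {
        List<String> consoleOut = new ArrayList<>();
        List<String> opLogOut = new ArrayList<>();
        TeeConsole console = new TeeConsole(consoleOut::add, opLogOut::add);
        // A progress line like the ones Driver/HadoopJobExecHelper print:
        console.printInfo("2018-12-27 16:20:27 Stage-1 map = 100%, reduce = 100%");
        // Both the local console and the operation log now hold the line.
        System.out.println(consoleOut.size() + " " + opLogOut.size());
    }
}
```

The point of the pattern is that nothing on the beeline side changes: beeline already polls and prints the operation log (that is how the warning reaches it), so any line teed into the log shows up in the client automatically.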
III. Where the progress messages come from
The familiar progress messages live here:
org.apache.hadoop.hive.ql.Driver
public int execute(boolean deferClose) throws CommandNeedRetryException {
  ...
  console.printInfo("Query ID = " + queryId);
  console.printInfo("Total jobs = " + jobs);
  ...
}

private TaskRunner launchTask(Task<? extends Serializable> tsk, String queryId, boolean noName,
    String jobname, int jobs, DriverContext cxt) throws HiveException {
  ...
  console.printInfo("Launching Job " + cxt.getCurJobNo() + " out of " + jobs);
  ...
}
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
private void setNumberOfReducers() throws IOException {
  ReduceWork rWork = work.getReduceWork();
  // this is a temporary hack to fix things that are not fixed in the compiler
  Integer numReducersFromWork = rWork == null ? 0 : rWork.getNumReduceTasks();
  if (rWork == null) {
    console.printInfo("Number of reduce tasks is set to 0 since there's no reduce operator");
  } else {
    if (numReducersFromWork >= 0) {
      console.printInfo("Number of reduce tasks determined at compile time: "
          + rWork.getNumReduceTasks());
    } else if (job.getNumReduceTasks() > 0) {
      int reducers = job.getNumReduceTasks();
      rWork.setNumReduceTasks(reducers);
      console.printInfo("Number of reduce tasks not specified. Defaulting to jobconf value of: "
          + reducers);
    } else {
      if (inputSummary == null) {
        inputSummary = Utilities.getInputSummary(driverContext.getCtx(), work.getMapWork(), null);
      }
      int reducers = Utilities.estimateNumberOfReducers(conf, inputSummary, work.getMapWork(),
          work.isFinalMapRed());
      rWork.setNumReduceTasks(reducers);
      console.printInfo("Number of reduce tasks not specified. Estimated from input data size: "
          + reducers);
    }
    console.printInfo("In order to change the average load for a reducer (in bytes):");
    console.printInfo("  set " + HiveConf.ConfVars.BYTESPERREDUCER.varname + "=<number>");
    console.printInfo("In order to limit the maximum number of reducers:");
    console.printInfo("  set " + HiveConf.ConfVars.MAXREDUCERS.varname + "=<number>");
    console.printInfo("In order to set a constant number of reducers:");
    console.printInfo("  set " + HiveConf.ConfVars.HADOOPNUMREDUCERS + "=<number>");
  }
}
Most of the rest is in the following class:
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper
public void jobInfo(RunningJob rj) {
  if (ShimLoader.getHadoopShims().isLocalMode(job)) {
    console.printInfo("Job running in-process (local Hadoop)");
  } else {
    if (SessionState.get() != null) {
      SessionState.get().getHiveHistory().setTaskProperty(queryState.getQueryId(), getId(),
          Keys.TASK_HADOOP_ID, rj.getID().toString());
    }
    console.printInfo(getJobStartMsg(rj.getID()) + ", Tracking URL = " + rj.getTrackingURL());
    console.printInfo("Kill Command = " + HiveConf.getVar(job, HiveConf.ConfVars.HADOOPBIN)
        + " job -kill " + rj.getID());
  }
}

private MapRedStats progress(ExecDriverTaskHandle th) throws IOException, LockException {
  ...
  StringBuilder report = new StringBuilder();
  report.append(dateFormat.format(Calendar.getInstance().getTime()));
  report.append(' ').append(getId());
  report.append(" map = ").append(mapProgress).append("%, ");
  report.append(" reduce = ").append(reduceProgress).append('%');
  ...
  String output = report.toString();
  ...
  console.printInfo(output);
  ...
}

public static String getJobEndMsg(JobID jobId) {
  return "Ended Job = " + jobId;
}
All of these call sites only go through console.printInfo, so routing them into the operation log looks like a fair amount of work, haha.
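Before patching Driver, it may be worth checking HiveServer2's operation-log settings: since Hive 1.2, HiveServer2 can capture per-query log output and beeline fetches it while waiting. How much of the progress output gets captured depends on the build and the logging level, so this is a thing to try rather than a guaranteed fix, but raising the level (the two standard properties are shown below) recovers at least the log-based progress lines on some setups:

```xml
<!-- hive-site.xml on the HiveServer2 host -->
<property>
  <name>hive.server2.logging.operation.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.logging.operation.level</name>
  <!-- one of NONE, EXECUTION, PERFORMANCE, VERBOSE -->
  <value>VERBOSE</value>
</property>
```

HiveServer2 must be restarted for the change to take effect.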
---------------------------------------------------------------- That's all, this is 大魔王先生's divider :) ----------------------------------------------------------------
- 大魔王先生's abilities are limited, so there may be mistakes in this article; corrections and additions are welcome!
- Thanks for reading. If this article was useful to you, please give 大魔王先生 a like, thank you!