Uncle's Experience Sharing (18): beeline no longer shows progress information when running SQL after Hive 2.0
I. The problem
In Hive 1.2, running SQL through either the hive CLI or beeline printed progress information. After upgrading to Hive 2.0, only the hive CLI still shows progress; beeline is completely silent while a query runs, so while waiting for the result you have no idea how far execution has gotten.
1 Running SQL in the hive CLI (progress shown)
hive> select count(1) from test_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20181227162003_bd82e3e2-2736-42b4-b1da-4270ead87e4d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1544593827645_22873, Tracking URL = http://rm1:8088/proxy/application_1544593827645_22873/
Kill Command = /export/App/hadoop-2.6.1/bin/hadoop job -kill job_1544593827645_22873
2018-12-27 16:20:27,650 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 116.9 sec
MapReduce Total cumulative CPU time: 1 minutes 56 seconds 900 msec
Ended Job = job_1544593827645_22873
MapReduce Jobs Launched:
Stage-Stage-1: Map: 29 Reduce: 1 Cumulative CPU: 116.9 sec HDFS Read: 518497 HDFS Write: 197 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 56 seconds 900 msec
OK
104
Time taken: 24.437 seconds, Fetched: 1 row(s)
2 Running SQL in beeline (no progress shown)
0: jdbc:hive2://thrift1:10000> select count(1) from test_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
+------+--+
| c0 |
+------+--+
| 104 |
+------+--+
1 row selected (23.965 seconds)
II. Code analysis
For a detailed walkthrough of how Hive executes SQL, see: https://www.cnblogs.com/barneywill/p/10185168.html
All SQL execution in Hive eventually reaches Driver.run, which calls execute, so let's go straight to the execute code:
org.apache.hadoop.hive.ql.Driver
public int execute(boolean deferClose) throws CommandNeedRetryException {
  ...
  if (jobs > 0) {
    logMrWarning(mrJobs);
    console.printInfo("Query ID = " + queryId);
    console.printInfo("Total jobs = " + jobs);
  }
  ...
}

private void logMrWarning(int mrJobs) {
  if (mrJobs <= 0 || !("mr".equals(HiveConf.getVar(conf, ConfVars.HIVE_EXECUTION_ENGINE)))) {
    return;
  }
  String warning = HiveConf.generateMrDeprecationWarning();
  LOG.warn(warning);
  warning = "WARNING: " + warning;
  console.printInfo(warning);
  // Propagate warning to beeline via operation log.
  OperationLog ol = OperationLog.getCurrentOperationLog();
  if (ol != null) {
    ol.writeOperationLog(LoggingLevel.EXECUTION, warning + "\n");
  }
}
So the progress messages you see in the hive CLI are printed through console.printInfo.
Note one detail: although beeline shows no progress, it does print one warning:
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
That warning is emitted by this code:
OperationLog ol = OperationLog.getCurrentOperationLog();
if (ol != null) {
  ol.writeOperationLog(LoggingLevel.EXECUTION, warning + "\n");
}
So, for beeline to show progress as well, the progress messages would need to be written out the same way, through the operation log.
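The change amounts to a tee pattern: every progress line printed to the local console should also be mirrored into the current operation log, exactly as the deprecation warning already is. The sketch below is a self-contained illustration of that pattern; Sink and TeeConsole are illustrative stand-in names I made up, not Hive's actual classes (in Hive the two sides would be console.printInfo and OperationLog.writeOperationLog).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a line-oriented output target.
interface Sink {
    void write(String line);
}

// Hypothetical stand-in for Hive's console wrapper, extended to tee into the
// operation log the same way logMrWarning does for the deprecation warning.
class TeeConsole {
    private final Sink console;      // stands in for console.printInfo
    private final Sink operationLog; // stands in for OperationLog; may be null

    TeeConsole(Sink console, Sink operationLog) {
        this.console = console;
        this.operationLog = operationLog;
    }

    // Print locally, and, when an operation log exists for the current query,
    // also write the line there so the beeline client can fetch it.
    void printInfo(String msg) {
        console.write(msg);
        if (operationLog != null) {
            operationLog.write(msg + "\n");
        }
    }
}

public class TeeDemo {
    public static void main(String[] args) {
        List<String> consoleOut = new ArrayList<>();
        List<String> opLogOut = new ArrayList<>();
        TeeConsole console = new TeeConsole(consoleOut::add, opLogOut::add);
        // A progress line like the ones Driver/HadoopJobExecHelper print:
        console.printInfo("2018-12-27 16:20:27 Stage-1 map = 100%, reduce = 100%");
        // Both the local console and the operation log now hold the line.
        System.out.println(consoleOut.size() + " " + opLogOut.size());
    }
}
```

The point of the pattern is that nothing on the beeline side changes: beeline already polls and prints the operation log (that is how the warning reaches it), so any line teed into the log shows up in the client automatically.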
III. Where the progress messages come from
The familiar progress messages live here:
org.apache.hadoop.hive.ql.Driver
public int execute(boolean deferClose) throws CommandNeedRetryException {
  ...
  console.printInfo("Query ID = " + queryId);
  console.printInfo("Total jobs = " + jobs);
  ...
}

private TaskRunner launchTask(Task<? extends Serializable> tsk, String queryId, boolean noName,
    String jobname, int jobs, DriverContext cxt) throws HiveException {
  ...
  console.printInfo("Launching Job " + cxt.getCurJobNo() + " out of " + jobs);
  ...
}
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
private void setNumberOfReducers() throws IOException {
  ReduceWork rWork = work.getReduceWork();
  // this is a temporary hack to fix things that are not fixed in the compiler
  Integer numReducersFromWork = rWork == null ? 0 : rWork.getNumReduceTasks();
  if (rWork == null) {
    console.printInfo("Number of reduce tasks is set to 0 since there's no reduce operator");
  } else {
    if (numReducersFromWork >= 0) {
      console.printInfo("Number of reduce tasks determined at compile time: "
          + rWork.getNumReduceTasks());
    } else if (job.getNumReduceTasks() > 0) {
      int reducers = job.getNumReduceTasks();
      rWork.setNumReduceTasks(reducers);
      console.printInfo("Number of reduce tasks not specified. Defaulting to jobconf value of: "
          + reducers);
    } else {
      if (inputSummary == null) {
        inputSummary = Utilities.getInputSummary(driverContext.getCtx(), work.getMapWork(), null);
      }
      int reducers = Utilities.estimateNumberOfReducers(conf, inputSummary, work.getMapWork(),
          work.isFinalMapRed());
      rWork.setNumReduceTasks(reducers);
      console.printInfo("Number of reduce tasks not specified. Estimated from input data size: "
          + reducers);
    }
    console.printInfo("In order to change the average load for a reducer (in bytes):");
    console.printInfo("  set " + HiveConf.ConfVars.BYTESPERREDUCER.varname + "=<number>");
    console.printInfo("In order to limit the maximum number of reducers:");
    console.printInfo("  set " + HiveConf.ConfVars.MAXREDUCERS.varname + "=<number>");
    console.printInfo("In order to set a constant number of reducers:");
    console.printInfo("  set " + HiveConf.ConfVars.HADOOPNUMREDUCERS + "=<number>");
  }
}
Most of the rest is in the following class:
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper
public void jobInfo(RunningJob rj) {
  if (ShimLoader.getHadoopShims().isLocalMode(job)) {
    console.printInfo("Job running in-process (local Hadoop)");
  } else {
    if (SessionState.get() != null) {
      SessionState.get().getHiveHistory().setTaskProperty(queryState.getQueryId(), getId(),
          Keys.TASK_HADOOP_ID, rj.getID().toString());
    }
    console.printInfo(getJobStartMsg(rj.getID()) + ", Tracking URL = " + rj.getTrackingURL());
    console.printInfo("Kill Command = " + HiveConf.getVar(job, HiveConf.ConfVars.HADOOPBIN)
        + " job -kill " + rj.getID());
  }
}

private MapRedStats progress(ExecDriverTaskHandle th) throws IOException, LockException {
  ...
  StringBuilder report = new StringBuilder();
  report.append(dateFormat.format(Calendar.getInstance().getTime()));
  report.append(' ').append(getId());
  report.append(" map = ").append(mapProgress).append("%, ");
  report.append(" reduce = ").append(reduceProgress).append('%');
  ...
  String output = report.toString();
  ...
  console.printInfo(output);
  ...
}

public static String getJobEndMsg(JobID jobId) {
  return "Ended Job = " + jobId;
}
All of these call sites only go through console.printInfo, so routing them into the operation log looks like a fair amount of work, haha.
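Before patching Driver, it may be worth checking HiveServer2's operation-log settings: since Hive 1.2, HiveServer2 can capture per-query log output and beeline fetches it while waiting. How much of the progress output gets captured depends on the build and the logging level, so this is a thing to try rather than a guaranteed fix, but raising the level (the two standard properties are shown below) recovers at least the log-based progress lines on some setups:

```xml
<!-- hive-site.xml on the HiveServer2 host -->
<property>
  <name>hive.server2.logging.operation.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.logging.operation.level</name>
  <!-- one of NONE, EXECUTION, PERFORMANCE, VERBOSE -->
  <value>VERBOSE</value>
</property>
```

HiveServer2 must be restarted for the change to take effect.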
---------------------------------------------------------------- That's all, this is 大魔王先生's divider :) ----------------------------------------------------------------
- 大魔王先生's abilities are limited, so there may be mistakes in this article; corrections and additions are welcome!
- Thanks for reading. If this article was useful to you, please give 大魔王先生 a like, thank you!