Hive Source Code (4): A Recap of the First Three Posts

Part 1 was mostly about exception handling and the usual utility paths.

Part 2 was mostly about hook handling; the key point is that the SQL gets converted into a standard AST (abstract syntax tree).

Part 3 so far covers a few things: AST -> QB, QB -> Operators, and some debugging of the Operator optimizations. Since the earlier material still felt a bit shaky, I paused there to go back and review it first.

Summary of Part 1

   What does the org.apache.hadoop.hive.cli.CliDriver class mainly do?
    It initializes logging, applies variable settings (hive -d), selects the database, handles the -i and -S options, and then waits for input.
At the end it matches the leading token of each command: quit, source and ! are handled specially; everything else goes down the -e path and is executed statement by statement, with Ctrl+C killing the running job. A compressed sketch of this dispatch follows.
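
    Below is a compressed, hypothetical sketch of that dispatch (plain Java, not the real CliDriver code; the method name and the printed messages are made up) just to make the branching explicit:

    // Hypothetical sketch of CliDriver's command dispatch, NOT the real Hive code:
    // strip trailing ';', branch on quit/exit, source and '!' (shell command),
    // and hand everything else to the SQL path (what the -e style execution does).
    public class CliDispatchSketch {
      static void processCmd(String cmd) {
        String trimmed = cmd.trim().replaceAll(";+$", "");
        String first = trimmed.isEmpty() ? "" : trimmed.split("\\s+")[0].toLowerCase();

        if (first.equals("quit") || first.equals("exit")) {
          System.out.println("leave the CLI");
        } else if (first.equals("source")) {
          System.out.println("run script file: " + trimmed.substring("source".length()).trim());
        } else if (trimmed.startsWith("!")) {
          System.out.println("run shell command: " + trimmed.substring(1));
        } else {
          System.out.println("hand over to CommandProcessor/Driver: " + trimmed);
        }
      }

      public static void main(String[] args) {
        processCmd("show tables;");
        processCmd("!ls /tmp;");
        processCmd("source /tmp/init.sql;");
        processCmd("quit;");
      }
    }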

Overall, Part 1 is fairly straightforward.
    How does the first class, org.apache.hadoop.hive.cli.CliDriver, hand off to the second class, org.apache.hadoop.hive.ql.Driver?
    The processCmd method of org.apache.hadoop.hive.cli.CliDriver contains this line:
CommandProcessor proc = CommandProcessorFactory.get(tokens, (HiveConf) conf)  // step in
    CommandProcessor result = getForHiveCommand(cmd, conf);  // step in
        getForHiveCommandInternal(cmd, conf, false);  // step in
        The key part:
            CommandProcessor result = getForHiveCommand(cmd, conf);
            if (result != null) {
              return result;
            }
            if (isBlank(cmd[0])) {
              return null;
            } else {
              return DriverFactory.newDriver(conf);  // key call
            }
        newDriver(getNewQueryState(conf), null, null);  // step in
            if (!enabled) {
              return new Driver(queryState, userName, queryInfo);  // this is the one we are after
            }

Summary of Part 2

    Where does the uuid-like query id that Hive prints for each job come from?
    org.apache.hadoop.hive.ql.Driver, compile method, line 545:
String queryId = queryState.getQueryId();
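
    For reference, the id is roughly "user name + timestamp + a random UUID" (it is assembled inside Hive when the query state is built). The snippet below only mimics that shape and is not the exact Hive implementation:

    // Illustrative only: mimics the shape of a Hive query id (user_yyyyMMddHHmmss_uuid);
    // the real one is produced inside Hive when the QueryState is created.
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.UUID;

    public class QueryIdSketch {
      static String makeQueryId(String userName) {
        String ts = new SimpleDateFormat("yyyyMMddHHmmss").format(new Date());
        return userName + "_" + ts + "_" + UUID.randomUUID();
      }

      public static void main(String[] args) {
        // e.g. root_20220730221700_cc21a753-af66-4519-ae70-85fc1c2b9e60
        System.out.println(makeQueryId("root"));
      }
    }
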
    How is the SQL converted into an ASTNode?
Antlr turns the SQL into a tree of KW_* and TOK_* style nodes; a sample tree is shown further below.
The Antlr grammar files are:
    FromClauseParser.g: matching rules for everything after FROM   SelectClauseParser.g: matching rules for everything after SELECT
    IdentifiersParser.g: parsing of functions   HiveParser.g: the overall grammar
The grammar files above are mostly keyword rules; each rule becomes a generated parser method with an init part and an after part:
the init part is mostly logging, and the after part enumerates the alternatives that may legally follow.

Sample SQL: select id,user_id from ( select id from data group by id)a inner join( select user_id from data group by user_id)b on a.id = b.user_id
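
    If you just want the final tree without stepping through the generated parser, ParseDriver can be driven directly. A minimal sketch (assuming the Hive ql jars are on the classpath):

    // Minimal sketch: feed the sample SQL to Hive's ParseDriver and dump the resulting
    // ASTNode; the printed TOK_QUERY / TOK_FROM / TOK_JOIN ... tree matches the sample further below.
    import org.apache.hadoop.hive.ql.parse.ASTNode;
    import org.apache.hadoop.hive.ql.parse.ParseDriver;

    public class ParseSketch {
      public static void main(String[] args) throws Exception {
        String sql = "select id, user_id from (select id from data group by id) a "
            + "inner join (select user_id from data group by user_id) b on a.id = b.user_id";
        ASTNode tree = new ParseDriver().parse(sql);  // lexer -> HiveParser.statement()
        System.out.println(tree.dump());              // prints the indented TOK_* tree
      }
    }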

The method below is really not debugger-friendly: the code is nested very deep and the methods all look alike. It keeps matching the next keyword and then calling the corresponding rule method.
You can still see the shape of the abstract syntax tree in it; if it feels too tedious, just skip it; this code is generated by Antlr rather than written by hand, so that is fine.

Where in the code:
org.apache.hadoop.hive.ql.Driver, compile method, line 616:
tree = ParseUtils.parse(command, ctx); // step in
    return parse(command, ctx, null); // step in
        ASTNode tree = pd.parse(command, ctx, viewFullyQualifiedName); // step in
            org.apache.hadoop.hive.ql.parse.ParseDriver, parse method, line 195  // the target method
            TokenRewriteStream tokens = new TokenRewriteStream(lexer);  // key object, detailed below
            tokens holds an inner tokens field that keeps every word and every piece of whitespace, one token per array slot, which together make up the original statement
            r = parser.statement(); // line 220: the actual parsing happens in here, step in
                org.apache.hadoop.hive.ql.parse.HiveParser, statement method, line 1361
                // statements starting with EXPLAIN take one branch, everything else takes the other
                execStatement3=execStatement(); // key method, keep stepping in
                    select->1 load->2 export->3 import->4 dump->5 load->6 status->7 use->8 delete->9 update->10 start->11 merge->12
                    each of these has a corresponding ***Statement() method; here we only follow queryStatementExpression()
                    queryStatementExpression25=queryStatementExpression(); // step in
                        queryStatementExpressionBody1001=queryStatementExpressionBody();  // the key call inside queryStatementExpression(), keep stepping in
                            from->1 insert,map,reduce,lparen->2
                            regularBody1003=regularBody(); // builds the body, step in
                                from->1 insert,map,reduce,lparen->2
                                selectStatement1016=selectStatement(); // step in
                                    a=atomSelectStatement();
                                        s=selectClause();
                                            selectClause // line 878
                                            selectList4=selectList();   // loops over each select item
                                                selectItem7=selectItem(); // leaf: the id column
                                                selectItem9=selectItem(); // leaf: the user_id column
                                        f=fromClause();  39792
                                            fromSource13=fromSource(); //1370行
                                                joinSource18=joinSource(); 
                                                    atomjoinSource32=atomjoinSource();//1903行
                                                        subQuerySource23=subQuerySource();//1696行
                                                            queryStatementExpression123=gHiveParser.queryStatementExpression();//4789行
                                                                queryStatementExpressionBody1001=queryStatementExpressionBody();//38788行
                                                                    regularBody1003=regularBody();//38900行
                                                                        selectStatement1016=selectStatement();//39690行
                                                                            a=atomSelectStatement();//40044行
                                                                                s=selectClause();//39777行
                                                                                    selectList4=selectList();//1004行
                                                                                        selectItem7=selectItem();//1209行
                                                                                            expression23=gHiveParser.expression();//1720行
                                                                                                precedenceOrExpression167=precedenceOrExpression();//6870行
                                                                                                    precedenceAndExpression268=precedenceAndExpression();//10791
                                                                                                        precedenceNotExpression264=precedenceNotExpression();//10650
                                                                                                            precedenceEqualExpression262=precedenceEqualExpression();//10514
                                                                                                                precedenceSimilarExpression259=precedenceSimilarExpression();//10254
                                                                                                                    precedenceSimilarExpressionMain231=precedenceSimilarExpressionMain();//9040
                                                                                f=fromClause();//39792
                                                                                    fromSource13=fromSource();//1370
                                                                                        joinSource18=joinSource();//1527
                                                                                            atomjoinSource32=atomjoinSource();//1903
                                                                                                tableSource19=tableSource();//1602
                                                                                g=groupByClause();//39831
                                                                                    groupby_expression3=groupby_expression();//905
                                                                                        rollupOldSyntax5=rollupOldSyntax();//1277
                                                                                            expr=expressionsNotInParenthesis(false, false);//1630
                                                                                                first=expression();//2365
                                                                                                    precedenceOrExpression167=precedenceOrExpression();//6870
                                                                                                        precedenceAndExpression268=precedenceAndExpression();//10791
                                                                                                            precedenceNotExpression264=precedenceNotExpression();//10650
                                                                                                                precedenceEqualExpression262=precedenceEqualExpression();//10541
                                                                                                                    precedenceSimilarExpression259=precedenceSimilarExpression();//10254
                                                                                                                        precedenceSimilarExpressionMain231=precedenceSimilarExpressionMain();//9040
                                                                                                                            a=precedenceBitwiseOrExpression();//9144
                                                                                                                                precedenceAmpersandExpression219=precedenceAmpersandExpression();//8661
                                                                                                                                    precedenceConcatenateExpression215=precedenceConcatenateExpression();//8524
                                                                                                                                        precedencePlusExpression212=precedencePlusExpression();//8314
                                                                                                                                            precedenceStarExpression208=precedenceStarExpression();//8175
                                                                                                                                                precedenceBitwiseXorExpression204=precedenceBitwiseXorExpression();//8032
                                                                                                                                                    precedenceUnarySuffixExpression200=precedenceUnarySuffixExpression();7889
                                                                                                                                                        precedenceUnaryPrefixExpression197=precedenceUnaryPrefixExpression();
                                                                                                                                                            precedenceFieldExpression196=precedenceFieldExpression();//7671
                                                                                                                                                                atomExpression179=atomExpression();7172
                                                                                                                                                                    tableOrColumn177=gHiveParser.tableOrColumn();7095
                                                            identifier126=gHiveParser.identifier();4815
                                                    joinToken33=joinToken();//1922
                                                    joinSourcePart34=joinSourcePart();//1927
                                                        subQuerySource41=subQuerySource();//2067
                                                            queryStatementExpression123=gHiveParser.queryStatementExpression();//4789
                                                                queryStatementExpressionBody1001=queryStatementExpressionBody();//38788
                                                    ....... same as above from here
                                                    expression36=gHiveParser.expression();1947
                                                        precedenceOrExpression167=precedenceOrExpression();6870
                                                            precedenceAndExpression268=precedenceAndExpression();10791
                                                                ....... same as above from here

Sample abstract syntax tree:
    TOK_QUERY
        TOK_FROM
            TOK_JOIN
                TOK_SUBQUERY
                    TOK_QUERY
                        TOK_FROM
                            TOK_TABREF
                                TOK_TABNAME
                                    data
                        TOK_INSERT
                            TOK_DESTINATION
                                TOK_DIR
                                    TOK_TMP_FILE
                            TOK_SELECT
                                TOK_SELEXPR
                                    TOK_TABLE_OR_COL
                                        ID
                            TOK_GROUPBY
                                    TOK_TABLE_OR_COL
                                        ID
                    a
                TOK_SUBQUERY
                    TOK_QUERY
                        TOK_FROM
                            TOK_TABREF
                                TOK_TABNAME
                                    data
                        TOK_INSERT
                            TOK_DESTINATION
                                TOK_DIR
                                    TOK_TMP_FILE
                            TOK_SELECT
                                TOK_SELEXPR
                                    TOK_TABLE_OR_COL
                                        user_id
                            TOK_GROUPBY
                                    TOK_TABLE_OR_COL
                                        user_id
                    b
                =
                    .
                        TOK_TABLE_OR_COL
                            a
                        id
                    .
                        TOK_TABLE_OR_COL
                            b
                        user_id
        TOK_INSERT
            TOK_DESTINATION
                TOK_DIR
                    TOK_TMP_FILE
            TOK_SELECT
                TOK_SELEXPR
                    TOK_TABLE_OR_COL
                        id
                TOK_SELEXPR
                    TOK_TABLE_OR_COL
                        user_id    

  How does the org.apache.hadoop.hive.ql.Driver class hand off to the third class, org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer?
    Through SemanticAnalyzerFactory.get:

1. SemanticAnalyzerFactory.get method
    BaseSemanticAnalyzer sem = getInternal(queryState, tree);  // important call, keep stepping in
    if (queryState.getHiveOperation() == null) {
      String query = queryState.getQueryString();
      if (query != null && query.length() > 30) {
        query = query.substring(0, 30);
      }
      String msg = "Unknown HiveOperation for query='" + query + "' queryId=" + queryState.getQueryId();
      LOG.debug(msg);
    }
    return sem;

2. SemanticAnalyzerFactory.getInternal method
    // a big switch over the root token type; assuming we are debugging a SELECT statement, it falls through to the default branch:
    default: {
        SemanticAnalyzer semAnalyzer = HiveConf.getBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_CBO_ENABLED) ?
            new CalcitePlanner(queryState) : new SemanticAnalyzer(queryState);
        return semAnalyzer;
    }
    So what comes back is a CalcitePlanner instance (CBO is enabled by default).
    // tree => token => type (954), text (TOK_QUERY)
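
    Putting the whole hand-off together, here is a rough sketch of the compile steps followed in this series (not the literal Driver.compile code; the QueryState and Context objects are assumed to come from the surrounding Driver):

    // Rough sketch of the path walked in these posts: SQL -> AST (ParseUtils.parse),
    // AST root token -> analyzer (SemanticAnalyzerFactory.get), then analyze(), which
    // builds the QB and the Operator tree discussed below. Not the literal Driver code.
    import org.apache.hadoop.hive.ql.Context;
    import org.apache.hadoop.hive.ql.QueryState;
    import org.apache.hadoop.hive.ql.parse.ASTNode;
    import org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer;
    import org.apache.hadoop.hive.ql.parse.ParseUtils;
    import org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory;

    public class CompileSketch {
      static void compile(String command, QueryState queryState, Context ctx) throws Exception {
        ASTNode tree = ParseUtils.parse(command, ctx);                            // SQL -> AST
        BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(queryState, tree); // CalcitePlanner for our query
        sem.analyze(tree, ctx);                                                   // AST -> QB -> Operator tree
      }
    }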

Summary of Part 3

How is the AST converted into a QB?
Sample SQL: select id,user_id from ( select id from data group by id)a inner join ( select user_id from data group by user_id)b on a.id = b.user_id

The entry point of this method was already covered in the previous post, so here I will just paste the final result (a short sketch of how to grab the QB in a debugger follows the dump below).

Some of the QB object's fields are shown below:
    The qbp object still keeps the AST's syntactic structure, but (1) the final output expressions have been pulled out and (2) many of the tree's nodes have been simplified away.
aliasToSubq
    a  QBExpr object (which contains a whole nested QB)
    b  QBExpr object
aliases
    a
    b
qbp
    joinExpr   TOK_JOIN
        TOK_SUBQUERY
            TOK_QUERY
                TOK_FROM
                    data
                TOK_INSERT
                    TOK_DESTINATION
                        TOK_TMP_FILE
                    TOK_SELEXPR
                        id
                    TOK_GROUPBY
                        id
            a
        TOK_SUBQUERY
            TOK_QUERY
                TOK_FROM
                    data
                TOK_INSERT
                    TOK_DESTINATION
                        TOK_TMP_FILE
                    TOK_SELEXPR
                        user_id
                    TOK_GROUPBY
                        user_id
            b
        =
            id
            user_id
    queryFromExpr  the enclosing structure that wraps joinExpr
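
    For completeness, a debugger-style sketch of how to get hold of this QB, assuming the analyzer is a SemanticAnalyzer (CalcitePlanner extends it) and that it exposes the query block via getQB():

    // Hedged sketch: after sem.analyze(tree, ctx) the top-level QB can be inspected;
    // getSubqAliases() and getParseInfo() correspond to the aliasToSubq / qbp fields listed above.
    import org.apache.hadoop.hive.ql.parse.QB;
    import org.apache.hadoop.hive.ql.parse.QBParseInfo;
    import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;

    public class QbInspectSketch {
      static void inspect(SemanticAnalyzer sem) {
        QB qb = sem.getQB();                      // the top-level query block
        System.out.println(qb.getSubqAliases());  // [a, b] -> each maps to a QBExpr / nested QB
        QBParseInfo qbp = qb.getParseInfo();      // qbp: the simplified view of the AST shown above
      }
    }
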
QB -> Operator
The code path was likewise covered already, so here is just the final result.
The parentOperators chain looks like this (a small sketch of how such a dump can be produced follows it):
    FS[18]   conf file:/tmp/root/cc21a753-af66-4519-ae70-85fc1c2b9e60/  (the current write-out path)
    SEL[17]  Column[_col0]  Column[_col1]
        JOIN[16]  outputColumnNames _col0 aliasToOpInfo a |   _col1  aliasToOpInfo  b
            RS[13]  valueCols Column[_col0]
                FIL[12]  
                    SEL[5]  _col0
                        GBY[4]  KEY._col0
                            RS[3]  _col0
                                GBY[2]  Column[id]
                                    SEL[1]  Column[id]  Column[user_id]  Column[pending_reward]  Column[description]  Column[BLOCK__OFFSET__INSIDE__FILE]  Column[INPUT__FILE__NAME]  Column[ROW__ID]
                                        TS[0]  dbName=default tableName=data BLOCKOFFSET FILENAME ROWID
            RS[15]  Column[_col0]
                FIL[14]  
                    SEL[11]  _col0
                        GBY[10]  KEY._col0
                            RS[9]  _col0
                                GBY[8]  Column[user_id]
                                    SEL[7]  Column[id]  Column[user_id]  Column[pending_reward]  Column[description]  Column[BLOCK__OFFSET__INSIDE__FILE]  Column[INPUT__FILE__NAME]  Column[ROW__ID]
                                        TS[6]  dbName=default tableName=data BLOCKOFFSET FILENAME ROWID
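
    A dump like the one above can be produced by starting from the final FileSinkOperator and walking getParentOperators() recursively, indenting one level per hop; a minimal sketch (the printed form, e.g. FS[18], is just each operator's toString):

    // Sketch: walk an operator tree upstream via getParentOperators(), indenting one
    // level per hop, which reproduces the FS -> SEL -> JOIN -> ... layout shown above.
    import java.util.List;
    import org.apache.hadoop.hive.ql.exec.Operator;
    import org.apache.hadoop.hive.ql.plan.OperatorDesc;

    public class OpTreeDump {
      static void dump(Operator<? extends OperatorDesc> op, int depth) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
          indent.append("    ");
        }
        System.out.println(indent + op.toString());  // e.g. FS[18], SEL[17], JOIN[16] ...
        List<Operator<? extends OperatorDesc>> parents = op.getParentOperators();
        if (parents != null) {
          for (Operator<? extends OperatorDesc> parent : parents) {
            dump(parent, depth + 1);
          }
        }
      }
    }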

    The tree structure above actually matches the intuitive reading of the SQL quite well. One question I still have here: why is the AST not converted into Operators directly? (Probably because that would be too direct; it is the usual layering principle.)
