Calcite (Part 2): Converting a List into a Tree (1)

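  A core goal of parsing is to build an abstract syntax tree. A parser framework can easily recognize the structure of each node, but we still have to convert the result into the tree shape our own use case expects before it becomes convenient to work with.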

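  Almost every expression is built by nesting and combining smaller ones, and that is where its expressive power comes from. Parsing, however, is usually linear, so the first thing we get is inevitably a flat list of tokens. How to turn such a flat list into a tree is therefore an important topic. Today let's look at one very small piece of Calcite: how it converts a list into a tree.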

 

1. Utility-class entry point

  In Calcite, the list -> tree conversion is a relatively self-contained step, so it lives in a utility class (SqlParserUtil). The final result is carried by a SqlNode.

  // org.apache.calcite.sql.parser.SqlParserUtil#toTree
  /**
   * Converts a list of {expression, operator, expression, ...} into a tree,
   * taking operator precedence and associativity into account.
   */
  public static @Nullable SqlNode toTree(List<@Nullable Object> list) {
    if (list.size() == 1
        && list.get(0) instanceof SqlNode) {
      // Short-cut for the simple common case
      return (SqlNode) list.get(0);
    }
    LOGGER.trace("Attempting to reduce {}", list);
    final OldTokenSequenceImpl tokenSequence = new OldTokenSequenceImpl(list);
    final SqlNode node = toTreeEx(tokenSequence, 0, 0, SqlKind.OTHER);
    LOGGER.debug("Reduced {}", node);
    return node;
  }
  // org.apache.calcite.sql.parser.SqlParserUtil#toTreeEx
  /**
   * Converts a list of {expression, operator, expression, ...} into a tree,
   * taking operator precedence and associativity into account.
   *
   * @param list        List of operands and operators. This list is modified as
   *                    expressions are reduced.
   * @param start       Position of first operand in the list. Anything to the
   *                    left of this (besides the immediately preceding operand)
   *                    is ignored. Generally use value 1.
   * @param minPrec     Minimum precedence to consider. If the method encounters
   *                    an operator of lower precedence, it doesn't reduce any
   *                    further.
   * @param stopperKind If not {@link SqlKind#OTHER}, stop reading the list if
   *                    we encounter a token of this kind.
   * @return the root node of the tree which the list condenses into
   */
  public static SqlNode toTreeEx(SqlSpecialOperator.TokenSequence list,
      int start, final int minPrec, final SqlKind stopperKind) {
    PrecedenceClimbingParser parser = list.parser(start,
        token -> {
          if (token instanceof PrecedenceClimbingParser.Op) {
            PrecedenceClimbingParser.Op tokenOp = (PrecedenceClimbingParser.Op) token;
            final SqlOperator op = ((ToTreeListItem) tokenOp.o()).op;
            return stopperKind != SqlKind.OTHER
                && op.kind == stopperKind
                || minPrec > 0
                && op.getLeftPrec() < minPrec;
          } else {
            return false;
          }
        });
    final int beforeSize = parser.all().size();
    // Turn the flat list of tokens into a tree of tokens
    parser.partialParse();
    final int afterSize = parser.all().size();
    // Convert the token tree into its SqlNode representation
    final SqlNode node = convert(parser.all().get(0));
    // Clear the slots of all consumed tokens and put the SqlNode into the first slot
    list.replaceSublist(start, start + beforeSize - afterSize + 1, node);
    return node;
  }
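  The lambda passed to list.parser(...) tells the parser where to stop reading tokens: it returns true for an operator whose kind equals stopperKind (unless that is OTHER), or whose left precedence is below minPrec (only when minPrec > 0). Below is that boolean logic pulled out on its own as a sketch. The Kind enum and OpInfo class are hypothetical stand-ins for SqlKind and SqlOperator; in the original, non-operator tokens (atoms) simply return false, so they are left out here.

```java
import java.util.function.Predicate;

/** Hypothetical stand-ins for Calcite's SqlKind / SqlOperator (illustration only). */
final class StopperDemo {
  enum Kind { OTHER, AND, OR, EQUALS }

  static final class OpInfo {
    final Kind kind;
    final int leftPrec;
    OpInfo(Kind kind, int leftPrec) { this.kind = kind; this.leftPrec = leftPrec; }
  }

  /**
   * Same shape as the predicate in toTreeEx: stop at an operator of kind stopperKind
   * (unless stopperKind is OTHER), or at one whose left precedence is below minPrec
   * (checked only when minPrec > 0).
   */
  static Predicate<OpInfo> stopper(int minPrec, Kind stopperKind) {
    return op -> (stopperKind != Kind.OTHER && op.kind == stopperKind)
        || (minPrec > 0 && op.leftPrec < minPrec);
  }

  public static void main(String[] args) {
    OpInfo and = new OpInfo(Kind.AND, 24);
    OpInfo eq = new OpInfo(Kind.EQUALS, 30);
    // toTree calls toTreeEx with minPrec = 0 and OTHER, so nothing stops the parse.
    System.out.println(stopper(0, Kind.OTHER).test(and)); // false
    // With minPrec = 25, AND (24) is below the threshold but '=' (30) is not.
    System.out.println(stopper(25, Kind.OTHER).test(and)); // true
    System.out.println(stopper(25, Kind.OTHER).test(eq));  // false
    // A stopper kind halts at that kind regardless of precedence.
    System.out.println(stopper(0, Kind.AND).test(and));    // true
  }
}
```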

  // org.apache.calcite.sql.parser.SqlParserUtil#convert
  private static SqlNode convert(PrecedenceClimbingParser.Token token) {
    switch (token.type) {
    case ATOM:
      return requireNonNull((SqlNode) token.o);
    case CALL:
      final PrecedenceClimbingParser.Call call =
          (PrecedenceClimbingParser.Call) token;
      final List<@Nullable SqlNode> list = new ArrayList<>();
      for (PrecedenceClimbingParser.Token arg : call.args) {
        list.add(convert(arg));
      }
      final ToTreeListItem item = (ToTreeListItem) call.op.o();
      if (list.size() == 1) {
        SqlNode firstItem = list.get(0);
        if (item.op == SqlStdOperatorTable.UNARY_MINUS
            && firstItem instanceof SqlNumericLiteral) {
          return SqlLiteral.createNegative((SqlNumericLiteral) firstItem,
              item.pos.plusAll(list));
        }
        if (item.op == SqlStdOperatorTable.UNARY_PLUS
            && firstItem instanceof SqlNumericLiteral) {
          return firstItem;
        }
      }
      return item.op.createCall(item.pos.plusAll(list), list);
    default:
      throw new AssertionError(token);
    }
  }

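  That is the framework code for converting a list into a tree. The key ideas are operator precedence and the conversion of the reduced token tree into SqlNodes.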
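  The convert method above is a plain recursive walk: an ATOM already holds a SqlNode, and a CALL converts its arguments first and then builds a call, except that a unary minus or plus applied directly to a numeric literal is folded into the literal itself. A self-contained sketch of that shape follows; the Token/Node classes and the operator names "u-"/"u+" are hypothetical, not Calcite's types.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

final class ConvertDemo {
  // Parser-side tokens (hypothetical, loosely modeled on PrecedenceClimbingParser.Token)
  interface Token {}
  static final class Atom implements Token {
    final Object o;
    Atom(Object o) { this.o = o; }
  }
  static final class Call implements Token {
    final String op;
    final List<Token> args;
    Call(String op, List<Token> args) { this.op = op; this.args = args; }
  }

  // Result-side nodes (hypothetical stand-ins for SqlNode)
  interface Node {}
  static final class Num implements Node {
    final BigDecimal value;
    Num(BigDecimal value) { this.value = value; }
    @Override public String toString() { return value.toPlainString(); }
  }
  static final class Id implements Node {
    final String name;
    Id(String name) { this.name = name; }
    @Override public String toString() { return name; }
  }
  static final class CallNode implements Node {
    final String op;
    final List<Node> operands;
    CallNode(String op, List<Node> operands) { this.op = op; this.operands = operands; }
    @Override public String toString() {
      StringBuilder sb = new StringBuilder("(").append(op);
      for (Node n : operands) sb.append(' ').append(n);
      return sb.append(')').toString();
    }
  }

  /** Recursively converts a token tree into nodes, folding unary +/- on numeric literals. */
  static Node convert(Token token) {
    if (token instanceof Atom) {
      return (Node) ((Atom) token).o;       // an atom already wraps a finished node
    }
    if (token instanceof Call) {
      Call call = (Call) token;
      List<Node> operands = new ArrayList<>();
      for (Token arg : call.args) {
        operands.add(convert(arg));         // convert children first
      }
      if (operands.size() == 1 && operands.get(0) instanceof Num) {
        Num lit = (Num) operands.get(0);
        if (call.op.equals("u-")) return new Num(lit.value.negate()); // -5 becomes a literal
        if (call.op.equals("u+")) return lit;                          // +5 is just 5
      }
      return new CallNode(call.op, operands);
    }
    throw new AssertionError(token);
  }

  public static void main(String[] args) {
    // Token tree for: x > -5
    Token t = new Call(">", List.of(
        new Atom(new Id("x")),
        new Call("u-", List.of(new Atom(new Num(new BigDecimal("5")))))));
    System.out.println(convert(t)); // (> x -5)
  }
}
```

  Folding the sign into the literal means later stages see a single negative constant rather than a call wrapping a positive one.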

 

2. The list -> tree process in detail

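  Building the tree is essentially a process of merging list elements. Roughly speaking, each operator, in order of precedence, takes the elements immediately before and after it as its operands and is merged with them. For example, a > 1 or b < 2 is reduced to (> a 1) or b < 2, then (> a 1) or (< b 2), and finally (or (> a 1) (< b 2)). The correct strategy is to pick the element with the highest precedence and start merging from it. The following diagram illustrates how the highest-precedence element is determined: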

 

   The concrete parsing process is as follows:

    // org.apache.calcite.sql.parser.SqlParserUtil.OldTokenSequenceImpl#parser
    @Override public PrecedenceClimbingParser parser(int start,
        Predicate<PrecedenceClimbingParser.Token> predicate) {
      final PrecedenceClimbingParser.Builder builder =
          new PrecedenceClimbingParser.Builder();
      for (Object o : Util.skip(list, start)) {
        if (o instanceof ToTreeListItem) {
          final ToTreeListItem item = (ToTreeListItem) o;
          final SqlOperator op = item.getOperator();
          if (op instanceof SqlPrefixOperator) {
            builder.prefix(item, op.getLeftPrec());
          } else if (op instanceof SqlPostfixOperator) {
            builder.postfix(item, op.getRightPrec());
          } else if (op instanceof SqlBinaryOperator) {
            builder.infix(item, op.getLeftPrec(),
                op.getLeftPrec() < op.getRightPrec());
          } else if (op instanceof SqlSpecialOperator) {
            builder.special(item, op.getLeftPrec(), op.getRightPrec(),
                (parser, op2) -> {
                  final List<PrecedenceClimbingParser.Token> tokens =
                      parser.all();
                  final SqlSpecialOperator op1 =
                      (SqlSpecialOperator) requireNonNull((ToTreeListItem) op2.o, "op2.o").op;
                  SqlSpecialOperator.ReduceResult r =
                      op1.reduceExpr(tokens.indexOf(op2),
                          new TokenSequenceImpl(parser));
                  return new PrecedenceClimbingParser.Result(
                      tokens.get(r.startOrdinal),
                      tokens.get(r.endOrdinal - 1),
                      parser.atom(r.node));
                });
          } else {
            throw new AssertionError();
          }
        } else {
          builder.atom(requireNonNull(o, "o"));
        }
      }
      return builder.build();
    }
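  Note how infix operators pass op.getLeftPrec() < op.getRightPrec() as the associativity flag. As far as I can tell, Calcite's SqlOperator derives both numbers from an even base precedence plus a leftAssoc flag, giving +1 to the side opposite the associativity; treat that exact formula as an assumption. The sketch below uses it to show why a left-associative operator groups x - y - z as ((x - y) - z): the middle operand goes to whichever neighbouring operator binds it more tightly. The "^" operator is purely hypothetical.

```java
final class AssocDemo {
  // Assumed derivation (modeled on Calcite's SqlOperator): precedence is even and the
  // side opposite the associativity gets +1.
  static int leftPrec(int prec, boolean leftAssoc) {
    if (prec % 2 != 0) throw new IllegalArgumentException("precedence must be even");
    return leftAssoc ? prec : prec + 1;
  }

  static int rightPrec(int prec, boolean leftAssoc) {
    if (prec % 2 != 0) throw new IllegalArgumentException("precedence must be even");
    return leftAssoc ? prec + 1 : prec;
  }

  /** The flag handed to builder.infix(...): left-associative iff leftPrec < rightPrec. */
  static boolean isLeftAssoc(int leftPrec, int rightPrec) {
    return leftPrec < rightPrec;
  }

  /**
   * For "x OP y OP z", y goes to the side that binds tighter: the first operator's
   * right precedence vs the second operator's left precedence.
   */
  static String group(String op, int prec, boolean leftAssoc) {
    int firstRight = rightPrec(prec, leftAssoc);
    int secondLeft = leftPrec(prec, leftAssoc);
    return firstRight > secondLeft
        ? "((x " + op + " y) " + op + " z)"
        : "(x " + op + " (y " + op + " z))";
  }

  public static void main(String[] args) {
    // MINUS is defined with precedence 40 and leftAssoc = true.
    System.out.println(leftPrec(40, true) + " " + rightPrec(40, true)); // 40 41
    System.out.println(isLeftAssoc(40, 41));                            // true
    System.out.println(group("-", 40, true));                           // ((x - y) - z)
    // A hypothetical right-associative operator "^" with precedence 70.
    System.out.println(group("^", 70, false));                          // (x ^ (y ^ z))
  }
}
```

  Keeping the base precedences even leaves room for the +1 tie-breaker without ever colliding with the next precedence level.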

  // org.apache.calcite.util.PrecedenceClimbingParser#partialParse
  public void partialParse() {
    for (;;) {
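      // Each iteration finds one operator and collapses the tree around it; if none is found, the tree is complete.
      // By precedence, operators such as > < = are reduced first, then AND, OR and so on.
      // E.g. a > 1 or b < 2 becomes (> a 1) or b < 2, then (> a 1) or (< b 2), then (or (> a 1) (< b 2)).
      // So the precedence definitions matter a lot; they are fixed when each operator is defined.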
      Op op = highest();
      if (op == null) {
        return;
      }
      final Token t;
      switch (op.type) {
      case POSTFIX: {
        Token previous = requireNonNull(op.previous, () -> "previous of " + op);
        t = call(op, ImmutableList.of(previous));
        replace(t, previous.previous, op.next);
        break;
      }
      case PREFIX: {
        Token next = requireNonNull(op.next, () -> "next of " + op);
        t = call(op, ImmutableList.of(next));
        replace(t, op.previous, next.next);
        break;
      }
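      case INFIX: {
        Token previous = requireNonNull(op.previous, () -> "previous of " + op);
        Token next = requireNonNull(op.next, () -> "next of " + op);
        // Build a call token such as (= a b) from the operator and its two operands.
        // The new call token gets left/right precedence -1, so later highest() scans never pick it again.
        t = call(op, ImmutableList.of(previous, next));
        // Splice the call in place of [previous, op, next]: link it to previous.previous and next.next
        // (updating first/last when it lands at either end), which shrinks the chain by two tokens.
        replace(t, previous.previous, next.next);
        // break out of the switch and continue with the next iteration of the for loop
        break;
      }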
      case SPECIAL: {
        Result r = ((SpecialOp) op).special.apply(this, (SpecialOp) op);
        requireNonNull(r, "r");
        replace(r.replacement, r.first.previous, r.last.next);
        break;
      }
      default:
        throw new AssertionError();
      }
      // debug: System.out.println(this);
    }
  }
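  To see the loop's effect on a > 1 or b < 2 in isolation, here is a deliberately simplified version working on a plain ArrayList of strings. It only handles left-associative infix operators and picks the leftmost highest-precedence operator each round (Calcite's highest() additionally compares neighbouring left/right precedences); it then folds that operator with its two neighbours and records the list. The precedence 22 for OR is an assumption; it only needs to be lower than the comparisons' 30.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ReduceTrace {
  /**
   * Repeatedly picks the infix operator with the highest precedence (leftmost on ties,
   * which gives left associativity), folds it with its two neighbours into a prefix-form
   * string, and records the list after each step.
   */
  static List<String> trace(List<String> tokens, Map<String, Integer> prec) {
    List<String> list = new ArrayList<>(tokens);
    List<String> steps = new ArrayList<>();
    while (true) {
      int best = -1;
      for (int i = 0; i < list.size(); i++) {
        Integer p = prec.get(list.get(i));
        if (p != null && (best < 0 || p > prec.get(list.get(best)))) {
          best = i;
        }
      }
      if (best < 0) {
        return steps; // no operator left: the list has collapsed into one tree
      }
      String merged = "(" + list.get(best) + " " + list.get(best - 1) + " " + list.get(best + 1) + ")";
      // Replace [operand, operator, operand] with the merged node.
      list.subList(best - 1, best + 2).clear();
      list.add(best - 1, merged);
      steps.add(String.join(" ", list));
    }
  }

  public static void main(String[] args) {
    // Comparisons use 30 as in the article; OR's 22 is an assumed lower value.
    Map<String, Integer> prec = Map.of(">", 30, "<", 30, "OR", 22);
    trace(List.of("a", ">", "1", "OR", "b", "<", "2"), prec).forEach(System.out::println);
    // (> a 1) OR b < 2
    // (> a 1) OR (< b 2)
    // (OR (> a 1) (< b 2))
  }
}
```

  Rescanning the whole list each round makes this O(n²); that is fine for a demonstration, and it mirrors the for(;;) plus highest() structure of partialParse.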
  // org.apache.calcite.util.PrecedenceClimbingParser#replace
  private void replace(Token t, @Nullable Token previous, @Nullable Token next) {
    t.previous = previous;
    t.next = next;
    // If there is no previous token, t becomes the new head; otherwise link previous -> t
    if (previous == null) {
      first = t;
    } else {
      previous.next = t;
    }
    if (next == null) {
      last = t;
    } else {
      next.previous = t;
    }
  }
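  The point of replace is that the reduced tokens are never deleted one by one: the new call token is simply linked to the tokens just outside the reduced range, so the chain skips over them. Below is a minimal doubly linked sketch of that splice, using a hypothetical Token class with only text, previous and next, and the same replace logic as above applied to the INFIX case.

```java
import java.util.ArrayList;
import java.util.List;

final class SpliceDemo {
  /** Minimal doubly linked token (hypothetical, loosely modeled on PrecedenceClimbingParser.Token). */
  static final class Token {
    final String text;
    Token previous;
    Token next;
    Token(String text) { this.text = text; }
  }

  Token first;
  Token last;

  SpliceDemo(List<String> items) {
    for (String s : items) {
      Token t = new Token(s);
      if (last == null) {
        first = t;
      } else {
        last.next = t;
        t.previous = last;
      }
      last = t;
    }
  }

  /** Same logic as replace(): link t between previous and next, fixing first/last at the ends. */
  void replace(Token t, Token previous, Token next) {
    t.previous = previous;
    t.next = next;
    if (previous == null) {
      first = t;        // t is now the head of the chain
    } else {
      previous.next = t;
    }
    if (next == null) {
      last = t;         // t is now the tail of the chain
    } else {
      next.previous = t;
    }
  }

  /** Collapses [op.previous, op, op.next] into one call token, like the INFIX case. */
  void reduceInfix(Token op) {
    Token left = op.previous;
    Token right = op.next;
    Token call = new Token("(" + op.text + " " + left.text + " " + right.text + ")");
    replace(call, left.previous, right.next);
  }

  List<String> texts() {
    List<String> out = new ArrayList<>();
    for (Token t = first; t != null; t = t.next) out.add(t.text);
    return out;
  }

  Token find(String text) {
    for (Token t = first; t != null; t = t.next) if (t.text.equals(text)) return t;
    throw new IllegalArgumentException(text);
  }

  public static void main(String[] args) {
    SpliceDemo chain = new SpliceDemo(List.of("a", ">", "1", "OR", "b", "<", "2"));
    chain.reduceInfix(chain.find(">"));
    System.out.println(chain.texts()); // [(> a 1), OR, b, <, 2]
    chain.reduceInfix(chain.find("<"));
    System.out.println(chain.texts()); // [(> a 1), OR, (< b 2)]
    chain.reduceInfix(chain.find("OR"));
    System.out.println(chain.texts()); // [(OR (> a 1) (< b 2))]
  }
}
```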

  // org.apache.calcite.sql.parser.SqlParserUtil#replaceSublist
  /**
   * Replaces a range of elements in a list with a single element. For
   * example, if list contains <code>{A, B, C, D, E}</code> then <code>
   * replaceSublist(list, X, 1, 4)</code> returns <code>{A, X, E}</code>.
   */
  public static <T> void replaceSublist(
      List<T> list,
      int start,
      int end,
      T o) {
    requireNonNull(list, "list");
    Preconditions.checkArgument(start < end);
    // Remove from the back so that the indices still to be removed stay valid
    for (int i = end - 1; i > start; --i) {
      list.remove(i);
    }
    list.set(start, o);
  }
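  A standalone copy of replaceSublist makes the index arithmetic easy to check. The only change is that Guava's Preconditions.checkArgument is swapped for a plain check so the sketch needs no dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class ReplaceSublistDemo {
  /** Replaces list[start, end) with the single element o (same logic as SqlParserUtil#replaceSublist). */
  static <T> void replaceSublist(List<T> list, int start, int end, T o) {
    Objects.requireNonNull(list, "list");
    if (start >= end) throw new IllegalArgumentException("start must be < end");
    // Remove from the back so the indices still to be removed stay valid.
    for (int i = end - 1; i > start; --i) {
      list.remove(i);
    }
    // The first slot of the range is overwritten rather than removed.
    list.set(start, o);
  }

  public static void main(String[] args) {
    List<String> list = new ArrayList<>(Arrays.asList("A", "B", "C", "D", "E"));
    replaceSublist(list, 1, 4, "X");
    System.out.println(list); // [A, X, E]
  }
}
```

  In toTreeEx the end index is start + beforeSize - afterSize + 1. For a > 1 or b < 2 fully reduced from start = 0, beforeSize is 7 and afterSize is 1, so end = 7 and all seven slots collapse into the single SqlNode.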

  

3. Operator definitions

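  Each operator's precedence is fixed when the operator is defined, so it is ready for use later when the tree is built. Most operators are defined in SqlStdOperatorTable. Take arithmetic as an example: + and - share one precedence, while * and / share another, higher one.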

  // org.apache.calcite.sql.fun.SqlStdOperatorTable#AND
  public static final SqlBinaryOperator AND =
      new SqlBinaryOperator(
          "AND",
          SqlKind.AND,
          24,        // AND has precedence 24
          true,
          ReturnTypes.BOOLEAN_NULLABLE_OPTIMIZED,
          InferTypes.BOOLEAN,
          OperandTypes.BOOLEAN_BOOLEAN);
  /**
   * Arithmetic division operator, '<code>/</code>'.
   */
  public static final SqlBinaryOperator DIVIDE =
      new SqlBinaryOperator(
          "/",
          SqlKind.DIVIDE,
          60,        // division has a relatively high precedence
          true,
          ReturnTypes.QUOTIENT_NULLABLE,
          InferTypes.FIRST_KNOWN,
          OperandTypes.DIVISION_OPERATOR);
  /**
   * Arithmetic multiplication operator, '<code>*</code>'.
   */
  public static final SqlBinaryOperator MULTIPLY =
      new SqlMonotonicBinaryOperator(
          "*",
          SqlKind.TIMES,
          60,
          true,
          ReturnTypes.PRODUCT_NULLABLE,
          InferTypes.FIRST_KNOWN,
          OperandTypes.MULTIPLY_OPERATOR);
  /**
   * Infix arithmetic minus operator, '<code>-</code>'.
   *
   * <p>Its precedence is less than the prefix {@link #UNARY_PLUS +}
   * and {@link #UNARY_MINUS -} operators.
   */
  public static final SqlBinaryOperator MINUS =
      new SqlMonotonicBinaryOperator(
          "-",
          SqlKind.MINUS,
          40,
          true,

          // Same type inference strategy as sum
          ReturnTypes.NULLABLE_SUM,
          InferTypes.FIRST_KNOWN,
          OperandTypes.MINUS_OPERATOR);
  /**
   * Logical equals operator, '<code>=</code>'.
   */
  public static final SqlBinaryOperator EQUALS =
      new SqlBinaryOperator(
          "=",
          SqlKind.EQUALS,
          30,            // = binds more loosely than arithmetic, but more tightly than AND
          true,
          ReturnTypes.BOOLEAN_NULLABLE,
          InferTypes.FIRST_KNOWN,
          OperandTypes.COMPARABLE_UNORDERED_COMPARABLE_UNORDERED);

  /**
   * Logical less-than-or-equal operator, '<code>&lt;=</code>'.
   */
  public static final SqlBinaryOperator LESS_THAN_OR_EQUAL =
      new SqlBinaryOperator(
          "<=",
          SqlKind.LESS_THAN_OR_EQUAL,
          30,
          true,
          ReturnTypes.BOOLEAN_NULLABLE,
          InferTypes.FIRST_KNOWN,
          OperandTypes.COMPARABLE_ORDERED_COMPARABLE_ORDERED);
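  To see what these numbers buy us, the sketch below feeds exactly the precedences listed above (AND 24, = and <= 30, - 40, * and / 60, all left-associative) into the textbook recursive precedence-climbing algorithm. This is the classic formulation, not Calcite's PrecedenceClimbingParser, which works iteratively on a linked token list; it also ignores parentheses and unary operators.

```java
import java.util.List;
import java.util.Map;

final class ClimbDemo {
  // Precedences copied from the SqlStdOperatorTable excerpts; all treated as left-associative.
  static final Map<String, Integer> PREC = Map.of(
      "AND", 24, "=", 30, "<=", 30, "-", 40, "*", 60, "/", 60);

  private final List<String> tokens;
  private int pos;

  private ClimbDemo(List<String> tokens) { this.tokens = tokens; }

  static String parse(List<String> tokens) {
    ClimbDemo p = new ClimbDemo(tokens);
    String result = p.expr(0);
    if (p.pos != tokens.size()) throw new IllegalStateException("trailing tokens at " + p.pos);
    return result;
  }

  /** Textbook precedence climbing: keep folding operators whose precedence is >= minPrec. */
  private String expr(int minPrec) {
    String lhs = tokens.get(pos++); // operand (no parentheses or unary operators in this sketch)
    while (pos < tokens.size()) {
      String op = tokens.get(pos);
      Integer prec = PREC.get(op);
      if (prec == null || prec < minPrec) {
        break; // lower-precedence operator: let an outer call handle it
      }
      pos++;
      // prec + 1 makes equal-precedence operators group to the left.
      String rhs = expr(prec + 1);
      lhs = "(" + op + " " + lhs + " " + rhs + ")";
    }
    return lhs;
  }

  public static void main(String[] args) {
    System.out.println(parse(List.of("x", "-", "y", "*", "2", "<=", "10", "AND", "z", "=", "1")));
    // (AND (<= (- x (* y 2)) 10) (= z 1))
  }
}
```

  The output shows the layering the definitions encode: * binds before -, arithmetic before comparisons, and comparisons before AND.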

   An example of the final tree:

 

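  In other words, the result is a tree made of operators and operands. Walking it bottom-up (children before their parent) is exactly the order in which a stack machine would evaluate the expression, which is why this structure expresses the semantics so clearly.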
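  To make the stack remark concrete: a post-order walk of such a tree yields reverse Polish notation, and that evaluates with a single operand stack. The Node class and the arithmetic-only evaluator below are hypothetical illustrations, not Calcite classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class StackEvalDemo {
  // Hypothetical tree: a node is either a number leaf or a binary operator with two children.
  static final class Node {
    final String op;      // null for a leaf
    final long value;
    final Node left, right;
    private Node(String op, long value, Node left, Node right) {
      this.op = op; this.value = value; this.left = left; this.right = right;
    }
    static Node num(long v) { return new Node(null, v, null, null); }
    static Node call(String op, Node l, Node r) { return new Node(op, 0, l, r); }
  }

  /** Post-order walk: children first, then the operator (reverse Polish notation). */
  static void postfix(Node n, List<String> out) {
    if (n.op == null) {
      out.add(Long.toString(n.value));
      return;
    }
    postfix(n.left, out);
    postfix(n.right, out);
    out.add(n.op);
  }

  /** Evaluates RPN with an operand stack, as a stack machine would. */
  static long eval(List<String> rpn) {
    Deque<Long> stack = new ArrayDeque<>();
    for (String t : rpn) {
      switch (t) {
        case "+": case "-": case "*": case "/": {
          long r = stack.pop();   // right operand was pushed last
          long l = stack.pop();
          stack.push(t.equals("+") ? l + r : t.equals("-") ? l - r : t.equals("*") ? l * r : l / r);
          break;
        }
        default:
          stack.push(Long.parseLong(t));
      }
    }
    return stack.pop();
  }

  public static void main(String[] args) {
    // Tree for 10 - 2 * 3, i.e. (- 10 (* 2 3))
    Node tree = Node.call("-", Node.num(10), Node.call("*", Node.num(2), Node.num(3)));
    List<String> rpn = new ArrayList<>();
    postfix(tree, rpn);
    System.out.println(rpn);       // [10, 2, 3, *, -]
    System.out.println(eval(rpn)); // 4
  }
}
```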

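  In practice, a tree is only one form of representation. Different situations call for different structures, and you need to switch between them flexibly to work comfortably. For example, a whole SQL statement looks quite different in Calcite's tree structure.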

 

posted @ 2021-08-29 17:01  阿牛20