How g++ decides whether >> ends a template argument list or is a right-shift operator

intro

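A classic problem with templates is how to tell whether a ">>" inside a template declaration is a right-shift operator or the end of a template argument list. Fortunately, the newer C++ standard (C++11) relaxed the old, very strict rule and allows the token to be parsed differently depending on context, wherever that is "reasonable".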

C++11 improves the specification of the parser so that multiple right angle brackets will be interpreted as closing the template argument list where it is reasonable. This can be overridden by using parentheses around parameter expressions using the “>”, “>=” or “>>” binary operators

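This problem has a long history and is a legacy of traditional C++; many programmers who use templates have probably fallen into this trap. Before C++11, a nested list such as A<B<int>> had to be written A<B<int> >. The seemingly unreasonable behavior comes from the classic compiler implementation model: a classic compiler consists of two main parts, a lexer and a parser. The lexer is responsible for recognizing tokens, and following the longest-match rule it classifies >> as a right-shift operator as soon as it sees it, then hands that classification to the parser. The key issue is that "template" is a syntactic concept, and the lexer has no access to syntactic context while it is tokenizing.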
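The lexer behavior described above can be sketched with a toy tokenizer. This is only an illustration under simplifying assumptions: the `Tok` kinds and the `lex` function are hypothetical names made up for this sketch and are not gcc's lexer. It shows that a context-free longest-match tokenizer produces the same right-shift token for `a >> 1` and for the end of `Y<X<int>>`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch (not gcc's CPP_* enumerators).
enum class Tok { Ident, Less, Greater, RShift, Other };

// Context-free tokenizer using the longest-match ("maximal munch") rule:
// two adjacent '>' characters always become one RShift token.
std::vector<Tok> lex(const std::string& src) {
    std::vector<Tok> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;                                   // skip whitespace
        } else if (std::isalnum(c) || c == '_') {
            // Identifiers, keywords and numbers are lumped together here.
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            out.push_back(Tok::Ident);
        } else if (c == '<') {
            out.push_back(Tok::Less);
            ++i;
        } else if (c == '>') {
            if (i + 1 < src.size() && src[i + 1] == '>') {
                out.push_back(Tok::RShift);        // longest match wins
                i += 2;
            } else {
                out.push_back(Tok::Greater);
                ++i;
            }
        } else {
            out.push_back(Tok::Other);
            ++i;
        }
    }
    return out;
}
```

Calling `lex("Y<X<int>> x;")` yields an `RShift` token where the two template argument lists end, while `lex("Y<X<int> > x;")` yields two separate `Greater` tokens. The tokenizer has no way to know which one the programmer meant.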

C++ standard

But "reasonable" is not a precise description. For the exact wording we have to look at the C++ standard ([temp.names]):

When parsing a template-argument-list, the first non-nested > is taken as the ending delimiter rather than
a greater-than operator. Similarly, the first non-nested >> is treated as two consecutive but distinct > tokens,
the first of which is taken as the end of the template-argument-list and completes the template-id.
[Note 2: The second > token produced by this replacement rule can terminate an enclosing template-id construct or it
can be part of a different construct (e.g., a cast). —end note]

[Example 2:
template<int i> class X { /* ... */ };
X< 1>2 > x1;                            // syntax error
X<(1>2)> x2;                            // OK
template<class T> class Y { /* ... */ };
Y<X<1>> x3;                             // OK, same as Y<X<1> > x3;
Y<X<6>>1>> x4;                          // syntax error
Y<X<(6>>1)>> x5;                        // OK
—end example]

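The rule is simple and clear: the first non-nested > is always parsed as the end of the template argument list, and a non-nested >> is treated as two such > tokens. Conversely, to use a right-shift (or greater-than) operator inside a template argument, it must be enclosed in parentheses.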
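A small compilable illustration of the rule, assuming a C++11 or later compiler. `X` and `Y` follow the standard's example; the `value` and `type` members and the `static_assert` checks are additions made for this sketch so that the result of each parse can be verified.

```cpp
#include <type_traits>

template<int i> struct X { static constexpr int value = i; };
template<class T> struct Y { using type = T; };

// Operators inside parentheses are nested, so they do not end the list.
static_assert(X<(1 > 2)>::value == 0, "1 > 2 is false, which converts to 0");
static_assert(X<(6 >> 1)>::value == 3, "6 >> 1 is 3");

// A non-nested >> closes two lists at once, exactly like "> >".
static_assert(std::is_same<Y<X<1>>, Y<X<1> > >::value,
              "both spellings name the same type");
static_assert(std::is_same<Y<X<(6 >> 1)>>::type, X<3>>::value,
              "the parenthesized shift is evaluated as an expression");

// Y<X<6>>1>> bad;   // rejected: the first >> already closes X<6> and Y<...>
```

The last line is left as a comment because it is the standard's "syntax error" case: without parentheses, the first >> is split into two closing delimiters and `1>>` is left over.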

A possible implementation

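With the standard's rule in hand, an implementation looks straightforward: when processing a template argument list, after seeing the opening delimiter (<), start looking for the matching closing delimiter (>). But because >> can also end the list, the parser would need to record how many opening delimiters are currently pending; if that depth is greater than 1, the scan must also watch for >>.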

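This approach looks intuitive, but it feels somewhat awkward to implement, because when gcc parses it usually names a single expected closing token. For example, after a left parenthesis the only thing expected is a right parenthesis. Likewise, after the opening left angle bracket of a template, expecting just a right angle bracket would be simplest. Expecting more than one token would seem to require changing the interface for this special case (the function parameter would be a set of tokens rather than a single one, and because a right shift is two characters, passing a plain string would also raise the question of where one delimiter ends and the next begins).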

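A side note on why the parenthesized case is so simple: once a left parenthesis is seen, the expected closing token temporarily becomes a right parenthesis, and only after that right parenthesis is found does the parser go back to expecting a right angle bracket.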
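The counting approach considered in this section might look roughly like the sketch below. It is not how gcc works; `Tok`, `Close` and `find_close` are hypothetical names, and the sketch assumes that every `<` outside parentheses opens a nested template argument list (a real parser has to know whether the preceding name is a template). It mainly shows where the awkwardness comes from: the scanner has to track two depths and also report whether half of a >> token is still unused.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical token kinds for this sketch (not gcc's CPP_* enumerators).
enum class Tok { Less, Greater, RShift, LParen, RParen, Other };

struct Close {
    std::size_t index;  // token that closes the list, or toks.size() if none
    bool half_left;     // true if that token is a >> whose second '>' is unused
};

// toks[start] is the first token after an opening '<'.
// Assumption: every Less outside parentheses opens a nested argument list.
Close find_close(const std::vector<Tok>& toks, std::size_t start) {
    int angle = 1;  // '<' delimiters still waiting for their '>'
    int paren = 0;  // '(' delimiters still waiting for their ')'
    for (std::size_t i = start; i < toks.size(); ++i) {
        switch (toks[i]) {
        case Tok::LParen: ++paren; break;
        case Tok::RParen: if (paren > 0) --paren; break;
        case Tok::Less:   if (paren == 0) ++angle; break;
        case Tok::Greater:
            if (paren == 0 && --angle == 0) return {i, false};
            break;
        case Tok::RShift:
            if (paren == 0) {
                if (angle == 1) return {i, true};   // first half closes this list
                angle -= 2;                         // both halves close a list
                if (angle == 0) return {i, false};  // second half closes this list
            }
            break;
        case Tok::Other: break;
        }
    }
    return {toks.size(), false};
}
```

For the tokens of `X<(6>>1)>> x5` (everything after `Y<`), the >> inside the parentheses is skipped and the final >> closes both lists. For `X<6>>1>>`, the scan already stops at the first >>, which matches the standard's "syntax error" case.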

The gcc implementation

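When checking for the token that ends a template argument, gcc tests not only the usual > token but also a possible >> token (note: a > inside parentheses is not handled here; put simply, a left parenthesis has its own begin and end logic).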

///@file: gcc\cp\parser.cc
/* Returns TRUE iff the next token is the "," or ">" (or `>>', in
   C++0x) ending a template-argument.  */

static bool
cp_parser_next_token_ends_template_argument_p (cp_parser *parser)
{
  cp_token *token;

  token = cp_lexer_peek_token (parser->lexer);
  return (token->type == CPP_COMMA
          || token->type == CPP_GREATER
          || token->type == CPP_ELLIPSIS
	  || ((cxx_dialect != cxx98) && token->type == CPP_RSHIFT)
	  /* For better diagnostics, treat >>= like that too, that
	     shouldn't appear non-nested in template arguments.  */
	  || token->type == CPP_RSHIFT_EQ);
}

The code below also neatly solves the problem of having one token recognized exactly twice, no more and no less. As the comment says:

change the current token to a `>', but don't consume it

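When the >> token is encountered, its type is changed to a single >, but the token is not consumed (it stays in the lexer's output). The next read therefore returns the same token again, only its type has changed from >> to a single >. The second time around it is an ordinary token, so it is consumed.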

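The clever part is that modifying the token's state records how many times it has served as a closing delimiter, which avoids what a poor programmer (me, for example) first thought of: keeping a counter for the template nesting depth.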


/* Parse a template-argument-list, as well as the trailing ">" (but
   not the opening "<").  See cp_parser_template_argument_list for the
   return value.  */

static tree
cp_parser_enclosed_template_argument_list (cp_parser* parser)
{
  tree arguments;
  tree saved_scope;
  tree saved_qualifying_scope;
  tree saved_object_scope;
  bool saved_greater_than_is_operator_p;

  /* [temp.names]

     When parsing a template-id, the first non-nested `>' is taken as
     the end of the template-argument-list rather than a greater-than
     operator.  */
  saved_greater_than_is_operator_p
    = parser->greater_than_is_operator_p;
  parser->greater_than_is_operator_p = false;
  /* Parsing the argument list may modify SCOPE, so we save it
     here.  */
  saved_scope = parser->scope;
  saved_qualifying_scope = parser->qualifying_scope;
  saved_object_scope = parser->object_scope;
  /* We need to evaluate the template arguments, even though this
     template-id may be nested within a "sizeof".  */
  cp_evaluated ev;
  /* Parse the template-argument-list itself.  */
  if (cp_lexer_next_token_is (parser->lexer, CPP_GREATER)
      || cp_lexer_next_token_is (parser->lexer, CPP_RSHIFT)
      || cp_lexer_next_token_is (parser->lexer, CPP_GREATER_EQ)
      || cp_lexer_next_token_is (parser->lexer, CPP_RSHIFT_EQ))
    {
      arguments = make_tree_vec (0);
      SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (arguments, 0);
    }
  else
    arguments = cp_parser_template_argument_list (parser);
  /* Look for the `>' that ends the template-argument-list. If we find
     a '>>' instead, it's probably just a typo.  */
  if (cp_lexer_next_token_is (parser->lexer, CPP_RSHIFT))
    {
      if (cxx_dialect != cxx98)
        {
          /* In C++0x, a `>>' in a template argument list or cast
             expression is considered to be two separate `>'
             tokens. So, change the current token to a `>', but don't
             consume it: it will be consumed later when the outer
             template argument list (or cast expression) is parsed.
             Note that this replacement of `>' for `>>' is necessary
             even if we are parsing tentatively: in the tentative
             case, after calling
             cp_parser_enclosed_template_argument_list we will always
             throw away all of the template arguments and the first
             closing `>', either because the template argument list
             was erroneous or because we are replacing those tokens
             with a CPP_TEMPLATE_ID token.  The second `>' (which will
             not have been thrown away) is needed either to close an
             outer template argument list or to complete a new-style
             cast.  */
	  cp_token *token = cp_lexer_peek_token (parser->lexer);
          token->type = CPP_GREATER;
        }
      else if (!saved_greater_than_is_operator_p)
	{
	  /* If we're in a nested template argument list, the '>>' has
	    to be a typo for '> >'. We emit the error message, but we
	    continue parsing and we push a '>' as next token, so that
	    the argument list will be parsed correctly.  Note that the
	    global source location is still on the token before the
	    '>>', so we need to say explicitly where we want it.  */
	  cp_token *token = cp_lexer_peek_token (parser->lexer);
	  gcc_rich_location richloc (token->location);
	  richloc.add_fixit_replace ("> >");
	  error_at (&richloc, "%<>>%> should be %<> >%> "
		    "within a nested template argument list");

	  token->type = CPP_GREATER;
	}
      else
	{
	  /* If this is not a nested template argument list, the '>>'
	    is a typo for '>'. Emit an error message and continue.
	    Same deal about the token location, but here we can get it
	    right by consuming the '>>' before issuing the diagnostic.  */
	  cp_token *token = cp_lexer_consume_token (parser->lexer);
	  error_at (token->location,
		    "spurious %<>>%>, use %<>%> to terminate "
		    "a template argument list");
	}
    }
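The retyping trick in the excerpt above can be modeled with a toy recursive-descent parser. This is a simplified sketch and not gcc's code: `Tok`, `Parser`, `parse_template_id`, `parse_enclosed_args` and `accepts` are hypothetical names, the grammar allows only a single type argument per list, and only the C++11 branch is modeled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical token kinds for this sketch (not gcc's CPP_* enumerators).
enum class Tok { Ident, Less, Greater, RShift, End };

struct Parser {
    std::vector<Tok> toks;  // always terminated by Tok::End
    std::size_t pos;
    Tok peek() const { return toks[pos]; }
    void consume() { ++pos; }
};

bool parse_template_id(Parser& p);

// Parses one argument plus the closing delimiter; the '<' is already consumed.
bool parse_enclosed_args(Parser& p) {
    if (!parse_template_id(p)) return false;
    if (p.peek() == Tok::RShift) {
        // Retype >> as > but do not consume it: the enclosing
        // construct will see a plain > at the same position.
        p.toks[p.pos] = Tok::Greater;
        return true;
    }
    if (p.peek() == Tok::Greater) {
        p.consume();
        return true;
    }
    return false;
}

// template-id := Ident [ '<' template-id '>' ]
bool parse_template_id(Parser& p) {
    if (p.peek() != Tok::Ident) return false;
    p.consume();
    if (p.peek() != Tok::Less) return true;
    p.consume();
    return parse_enclosed_args(p);
}

// True if the whole token sequence is exactly one template-id.
bool accepts(std::vector<Tok> toks) {
    toks.push_back(Tok::End);
    Parser p{toks, 0};
    return parse_template_id(p) && p.peek() == Tok::End;
}
```

With the tokens of `A<B<int>>`, the inner list retypes the >> and the outer list then consumes it as an ordinary >, so the input is accepted without any nesting counter. With the tokens of `A<int>>`, the retyped > is left over after the only list has closed, so the input is rejected.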

Example

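Here the right-shift operator is first treated as a single > that ends the template argument list, and the token itself is turned into an ordinary >. As a result, the error reported for it is the same as the one for the lone > on line 4.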

tsecer@harry: cat -n right_shift_template_ending.cpp 
     1  template <typename T>
     2  struct A{};
     3  A<int>> a;
     4  >
     5
tsecer@harry: gcc -c right_shift_template_ending.cpp 
right_shift_template_ending.cpp:3:6: error: expected unqualified-id before ‘>’ token
 A<int>> a;
      ^~
right_shift_template_ending.cpp:4:1: error: expected unqualified-id before ‘>’ token
 >
 ^

outro

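gcc's implementation is very concise and elegant. Often the features we implement ourselves are merely "good enough to work" and fall far short of excellent, so relying on mature, well-tested functionality not only avoids potential bugs but is also more efficient.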

posted on 2024-09-07 16:28  tsecer