antlr, C++, Demo

Parsing | Detailed Pedia

A Guide To Parsing: Algorithms And Terminology

LR Parsing - Coding Ninjas CodeStudio

The Lemon LALR(1) Parser Generator (sqlite.org)

Why you should not use (f)lex, yacc and bison

bison:

  • %define api.pure purity  Language(s): C. Request a pure (reentrant) parser program. The parser then calls the lexer as yychar = yylex (&yylval);
  • %define api.parser.class {name}  Language(s): C++, Java, D. The name of the parser class. [What makes Rust better than D?]
  • bison simple.yy -o simple.cc; g++ -std=c++14 simple.cc -o simple
    • ‘variant’ (C++) This is similar to union, but special storage techniques are used to allow any kind of C++ object to be used.
  • %define api.value.type {string} works just as well as #define YYSTYPE string; a string holding 0x00 bytes is no problem
    void do_expr() {
      reuse_tk = yying = 1; assert(yyparse() == 0); yying = 0;
      int n = ec.size(); memcpy(code + code_end, ec.c_str(), n), code_end += n;
    }
    // NDEBUG is #undef'ed, so the assert (and the yyparse() call inside it) is not compiled out
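
The claim that a std::string can carry 0x00 bytes, and the memcpy in do_expr(), can be checked without bison. A minimal sketch, assuming a hypothetical emit() helper and a fixed-size code buffer that stand in for the author's globals (ec, code, code_end); sample_bytes() is likewise made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-ins for the globals used by do_expr() above.
static unsigned char code[64];
static std::size_t code_end = 0;

// Append the bytes held in ec to the code buffer, the way do_expr() does.
// size() counts embedded 0x00 bytes; strlen() would stop at the first one.
void emit(const std::string& ec) {
    assert(code_end + ec.size() <= sizeof code);
    std::memcpy(code + code_end, ec.data(), ec.size());
    code_end += ec.size();
}

// Build a 4-byte value with two embedded zero bytes. The explicit length is
// what keeps them: constructing from the literal alone stops at the first 0.
std::string sample_bytes() {
    return std::string("\x01\x00\x00\x2A", 4);
}
```

Calling emit(sample_bytes()) copies all four bytes. The point is to rely on size() rather than on anything that treats the data as a C string.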

Write text parsers with yacc and lex | Practical parsing with Flex and Bison

https://www.gnu.org/software/bison/manual/

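I gave antlr a try. Having learned my lesson, I was glad to install ready-made packages, although I carelessly ended up with both 2.7.7 and 4.7.2 :-). First, a slightly long-winded example of Debian's package management tools, assuming you have already run sudo bash: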

# apt-get install aptitude (if aptitude is not installed yet)
# aptitude search antlr
antlr
antlr3
antlr4
libantlr4-runtime-dev
...
# apt-get install antlr4 libantlr4-runtime-dev
# dpkg -l | grep antlr
ii  antlr                          2.7.7+dfsg-10                  all          language tool for constructing recognizers, compilers etc
ii  antlr4                         4.7.2-5                        all          ANTLR Parser Generator
ii  libantlr-java                  2.7.7+dfsg-10                  all          language tool for constructing recognizers, compilers etc (java library)
ii  libantlr3-runtime-java         3.5.2-9                        all          Runtime library for ANTLR 3
ii  libantlr4-runtime-dev:amd64    4.9+dfsg-1.1                   amd64        ANTLR Parser Generator - C++ runtime support (development files)
ii  libantlr4-runtime-java         4.7.2-5                        all          Runtime library for ANTLR 4
ii  libantlr4-runtime4.9:amd64     4.9+dfsg-1.1                   amd64        ANTLR Parser Generator - C++ runtime support (shared library)
# dpkg -L antlr4
/usr/bin/antlr4
...
# dpkg -L libantlr4-runtime-dev
/usr/include/antlr4-runtime
/usr/include/antlr4-runtime/ANTLRErrorListener.h
...
/usr/lib/x86_64-linux-gnu/libantlr4-runtime.a
...

Then use the example from here:

$ cat main.cpp
#include <iostream>
#include "antlr4-runtime.h"
#include "ExpressionLexer.h"
#include "ExpressionParser.h"
int main(int argc, const char* argv[]) {
  // Provide the input text in a stream
  antlr4::ANTLRInputStream input("6*(2+3)");
  // Create a lexer from the input
  ExpressionLexer lexer(&input);
  // Create a token stream from the lexer
  antlr4::CommonTokenStream tokens(&lexer);
  // Create a parser from the token stream
  ExpressionParser parser(&tokens);
  // Display the parse tree
  std::cout << parser.expr()->toStringTree() << std::endl;
  return 0;
}
$ cat Expression.g4
grammar Expression;
// Parser rule
expr : '-' expr
     | expr ( '*' | '/' ) expr
     | expr ( '+' | '-' ) expr
     | '(' expr ')' | INT | ID ;
// Lexer rules
INT : [0-9]+;
ID  : [a-z]+;
WS  : [ \t\r\n]+ -> skip;
$ antlr4 -Dlanguage=Cpp Expression.g4 (not fast)
$ g++ -I/usr/include/antlr4-runtime *.cpp -lantlr4-runtime (quite slow)
$ a.out
(6*(2+3) (6 6) * ((2+3) ( (2+3 (2 2) + (3 3)) ))) [It doesn't print 30?!]
$ wc -l *.cpp
  475 total
$ cat /usr/bin/antlr4
#!/bin/sh
CLASSPATH=/usr/share/java/stringtemplate4.jar:/usr/share/java/antlr4.jar:/usr/share/java/antlr4-runtime.jar:/usr/share/java/antlr3-runtime.jar/:/usr/share/java/treelayout.jar
exec java -cp $CLASSPATH org.antlr.v4.Tool "$@"
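
The a.out output above is a parse tree, not a value: toStringTree() only prints the tree that parser.expr() built, and nothing in main.cpp evaluates it. Getting 30 needs a separate evaluation step. With ANTLR that would usually be a visitor or listener over the tree. The sketch below instead swaps in a hand-written recursive-descent evaluator for the same grammar so that it stays self-contained. Evaluator and evaluate are hypothetical names, not part of the ANTLR runtime, and ID operands are rejected because the grammar gives them no value.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hand-written evaluator for the INT-only subset of Expression.g4.
// Hypothetical helper: not generated by ANTLR and not part of its runtime.
class Evaluator {
public:
    explicit Evaluator(const std::string& text) : s_(text), pos_(0) {}

    // Evaluate the whole input; leftover characters are an error.
    long run() {
        long v = expr();
        skip_ws();
        if (pos_ != s_.size()) throw std::runtime_error("unexpected input");
        return v;
    }

private:
    std::string s_;
    std::size_t pos_;

    bool at_digit() const {
        return pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_]));
    }
    void skip_ws() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    // Consume c if it is the next non-blank character.
    bool eat(char c) {
        skip_ws();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    // expr : term (('+' | '-') term)*   lowest precedence, left-associative
    long expr() {
        long v = term();
        for (;;) {
            if (eat('+')) v += term();
            else if (eat('-')) v -= term();
            else return v;
        }
    }
    // term : unary (('*' | '/') unary)*
    long term() {
        long v = unary();
        for (;;) {
            if (eat('*')) v *= unary();
            else if (eat('/')) {
                long d = unary();
                if (d == 0) throw std::runtime_error("division by zero");
                v /= d;
            } else return v;
        }
    }
    // unary : '-' unary | primary   binds tighter than '*' and '/'
    long unary() {
        if (eat('-')) return -unary();
        return primary();
    }
    // primary : '(' expr ')' | INT
    long primary() {
        if (eat('(')) {
            long v = expr();
            if (!eat(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        skip_ws();
        if (!at_digit()) throw std::runtime_error("expected INT");
        long v = 0;
        while (at_digit()) v = v * 10 + (s_[pos_++] - '0');
        return v;
    }
};

long evaluate(const std::string& text) { return Evaluator(text).run(); }
```

evaluate("6*(2+3)") returns 30. The three precedence levels mirror the order of the alternatives in Expression.g4: unary minus first, then '*' and '/', then '+' and '-'.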

I'll keep using bison and keep disliking Java:

prog : /*empty*/ | func | func prog ;
func : LEX_NAME '(' opt_params ')' '{' opt_decls opt_stats '}' ;
opt_params : /*empty*/ | params ;
opt_decls : /*empty*/ | decls ;
params : parameter | parameter ',' params ;
parameter : LEX_INT LEX_NAME ;
opt_stats : /* empty */ | stats ;
stats : /* empty */ | stat | stat ';' stats ;
stat
  : expr ';'
  | LEX_NAME '=' expr ';'
  | LEX_NAME '(' opt_params ')' ';' // func call
  | '{' stats '}'
  | if_statement
  | LEX_WHILE '(' expr ')' stats
  // How many fors? 8. 000, 001, ... 111
  | LEX_FOR '(' ';' ';' ')' stats
  | LEX_FOR '(' expr ';' ';' ')' stats
  | LEX_FOR '(' ';' expr ';' ')' stats
  | LEX_FOR '(' ';' ';' expr ')' stats
  | LEX_FOR '(' expr ';' expr ';' ')' stats
  | LEX_FOR '(' ';' expr ';' expr ')' stats
  | LEX_FOR '(' expr ';' ';' expr ')' stats
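  | LEX_FOR '(' expr ';' expr ';' expr ')' stats
  // One could write opt_expr : /*empty*/ | expr ';' ; instead, but it is only 8 lines.
  // The grammar above is wrong and I am fixing it. I am thinking of a hybrid: bison handles expr, and the upper levels use my original top-down parser.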
  | LEX_RETURN ';'
  | LEX_RETURN expr ';'
  | LEX_BREAK ';'
  ;
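
The count of eight in the grammar comment follows from three independent optional slots (init, condition, step), each either present or absent. A small sketch that generates the eight right-hand sides from a 3-bit mask; for_alternatives is a hypothetical helper for illustration only, not part of the compiler being written here:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build the eight LEX_FOR alternatives from a 3-bit mask.
// Bit 2 = init expr present, bit 1 = condition present, bit 0 = step present,
// so the masks run 000, 001, ..., 111 as in the comment in the grammar.
std::vector<std::string> for_alternatives() {
    std::vector<std::string> alts;
    for (int mask = 0; mask < 8; ++mask) {
        std::string rhs = "LEX_FOR '(' ";
        if (mask & 4) rhs += "expr ";
        rhs += "';' ";
        if (mask & 2) rhs += "expr ";
        rhs += "';' ";
        if (mask & 1) rhs += "expr ";
        rhs += "')' stats";
        alts.push_back(rhs);
    }
    return alts;
}
```

Each optional slot doubles the number of alternatives, which is why a single optional-expression rule becomes attractive once there are more than a few slots.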

Reasons for disliking antlr and Java:

The .cpp that bison generates is longer, but it compiles quickly.

$ cat expr.y
%{
%}
%%
expr : ;
$ bison -o expr.cpp expr.y
$ wc -l expr.cpp
1281 expr.cpp
posted @ 2023-01-03 17:41  Fun_with_Words