antlr, C++, Demo
A Guide To Parsing: Algorithms And Terminology
LR Parsing - Coding Ninjas CodeStudio
The Lemon LALR(1) Parser Generator (sqlite.org)
Why you should not use (f)lex, yacc and bison
bison:
- %define api.pure purity Language(s): C Request a pure (reentrant) parser program. yychar = yylex (&yylval);
- %define api.parser.class {name} Language(s): C++, Java, D. The name of the parser class. [What makes Rust better than D?]
- bison simple.yy -o simple.cc; g++ -std=c++14 simple.cc -o simple
- ‘variant’ (C++) This is similar to union, but special storage techniques are used to allow any kind of C++ object to be used.
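- %define api.value.type {string} works just as well as #define YYSTYPE string; storing 0x00 bytes inside the string is no problem.
void do_expr() {
    reuse_tk = yying = 1;
    // NDEBUG has been #undef'd, so this assert (and the yyparse() call inside it) always runs.
    assert(yyparse() == 0);
    yying = 0;
    int n = ec.size();
    memcpy(code + code_end, ec.c_str(), n);
    code_end += n;
}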
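The note about 0x00 matters for the memcpy in do_expr(): the length has to come from size(), not from a C-string scan. Below is a minimal sketch of that pattern. The names append_bytes and make_chunk_with_nul are hypothetical helpers introduced only for illustration; they are not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not from the original post): append the raw bytes of
// `chunk` to `buf` at offset `end` and return the new end offset.
std::size_t append_bytes(unsigned char* buf, std::size_t cap, std::size_t end,
                         const std::string& chunk) {
    assert(end <= cap && chunk.size() <= cap - end);  // caller must leave room
    // size() counts embedded 0x00 bytes; strlen(chunk.c_str()) would stop early.
    std::memcpy(buf + end, chunk.data(), chunk.size());
    return end + chunk.size();
}

// Hypothetical helper: build a 3-byte string whose middle byte is 0x00.
std::string make_chunk_with_nul() {
    return std::string("\x01\x00\x02", 3);
}
```

The length is passed explicitly when constructing the string because a constructor taking only a const char* would stop at the first zero byte.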
Write text parsers with yacc and lex | Practical parsing with Flex and Bison
https://www.gnu.org/software/bison/manual/
- https://www.gnu.org/software/bison/manual/bison.html.gz
- https://www.gnu.org/software/bison/manual/bison.pdf
I gave antlr a try. After some painful reflection: luckily I installed prebuilt packages, although I accidentally ended up with both 2.7.7 and 4.7.2 :-). First, a long-winded example of Debian's package management tools, assuming you have already run sudo bash:
# apt-get install aptitude   (if aptitude is not installed yet)
# aptitude search antlr
antlr antlr3 antlr4 libantlr4-runtime-dev ...
# apt-get install antlr4 libantlr4-runtime-dev
# dpkg -l | grep antlr
ii antlr 2.7.7+dfsg-10 all language tool for constructing recognizers, compilers etc
ii antlr4 4.7.2-5 all ANTLR Parser Generator
ii libantlr-java 2.7.7+dfsg-10 all language tool for constructing recognizers, compilers etc (java library)
ii libantlr3-runtime-java 3.5.2-9 all Runtime library for ANTLR 3
ii libantlr4-runtime-dev:amd64 4.9+dfsg-1.1 amd64 ANTLR Parser Generator - C++ runtime support (development files)
ii libantlr4-runtime-java 4.7.2-5 all Runtime library for ANTLR 4
ii libantlr4-runtime4.9:amd64 4.9+dfsg-1.1 amd64 ANTLR Parser Generator - C++ runtime support (shared library)
# dpkg -L antlr4
/usr/bin/antlr4
...
# dpkg -L libantlr4-runtime-dev
/usr/include/antlr4-runtime
/usr/include/antlr4-runtime/ANTLRErrorListener.h
...
/usr/lib/x86_64-linux-gnu/libantlr4-runtime.a
...
Then, using the example from here:
$ cat main.cpp
#include <iostream>
#include "antlr4-runtime.h"
#include "ExpressionLexer.h"
#include "ExpressionParser.h"

int main(int argc, const char* argv[]) {
    // Provide the input text in a stream
    antlr4::ANTLRInputStream input("6*(2+3)");
    // Create a lexer from the input
    ExpressionLexer lexer(&input);
    // Create a token stream from the lexer
    antlr4::CommonTokenStream tokens(&lexer);
    // Create a parser from the token stream
    ExpressionParser parser(&tokens);
    // Display the parse tree
    std::cout << parser.expr()->toStringTree() << std::endl;
    return 0;
}
$ cat Expression.g4
grammar Expression;
// Parser rule
expr : '-' expr
     | expr ( '*' | '/' ) expr
     | expr ( '+' | '-' ) expr
     | '(' expr ')'
     | INT
     | ID
     ;
// Lexer rules
INT : [0-9]+;
ID : [a-z]+;
WS : [ \t\r\n]+ -> skip;
$ antlr4 -Dlanguage=Cpp Expression.g4   (not fast)
$ g++ -I/usr/include/antlr4-runtime *.cpp -lantlr4-runtime   (quite slow)
$ a.out
(6*(2+3) (6 6) * ((2+3) ( (2+3 (2 2) + (3 3)) )))
[It doesn't print 30?!]
$ wc -l *.cpp
475 total
$ cat /usr/bin/antlr4
#!/bin/sh
CLASSPATH=/usr/share/java/stringtemplate4.jar:/usr/share/java/antlr4.jar:/usr/share/java/antlr4-runtime.jar:/usr/share/java/antlr3-runtime.jar/:/usr/share/java/treelayout.jar
exec java -cp $CLASSPATH org.antlr.v4.Tool "$@"
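The a.out run prints a parse tree rather than 30 because toStringTree() only renders the tree; computing a value is a separate step that walks it. As a rough illustration of that evaluation step, here is a hand-written top-down evaluator for the INT-only subset of Expression.g4. It does not use the ANTLR runtime, and the names Evaluator and evaluate are hypothetical; it is a sketch under those assumptions, not what ANTLR generates.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hand-written evaluator for the INT-only subset of Expression.g4.
// All names are hypothetical; this does not use the ANTLR runtime.
class Evaluator {
public:
    explicit Evaluator(const std::string& text) : s_(text), pos_(0) {}

    long run() {
        long v = sum();
        skip_ws();
        assert(pos_ <= s_.size());  // the cursor never runs past the input
        if (pos_ != s_.size()) throw std::runtime_error("trailing input");
        return v;
    }

private:
    // sum : product (('+' | '-') product)*
    long sum() {
        long v = product();
        for (;;) {
            skip_ws();
            if (eat('+')) v += product();
            else if (eat('-')) v -= product();
            else return v;
        }
    }
    // product : unary (('*' | '/') unary)*
    long product() {
        long v = unary();
        for (;;) {
            skip_ws();
            if (eat('*')) {
                v *= unary();
            } else if (eat('/')) {
                long d = unary();
                if (d == 0) throw std::runtime_error("division by zero");
                v /= d;
            } else {
                return v;
            }
        }
    }
    // unary : '-' unary | '(' sum ')' | INT
    long unary() {
        skip_ws();
        if (eat('-')) return -unary();
        if (eat('(')) {
            long v = sum();
            skip_ws();
            if (!eat(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        if (!at_digit()) throw std::runtime_error("expected INT");
        long v = 0;
        while (at_digit()) v = v * 10 + (s_[pos_++] - '0');
        return v;
    }
    bool at_digit() const {
        return pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_]));
    }
    bool eat(char c) {
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    void skip_ws() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }

    std::string s_;
    std::size_t pos_;
};

// Convenience wrapper: evaluate a whole expression string.
long evaluate(const std::string& text) { return Evaluator(text).run(); }
```

Calling evaluate("6*(2+3)") returns 30. The precedence levels (unary minus, then * and /, then + and -) follow the order of the alternatives in the expr rule.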
I will keep using bison and keep disliking Java:
prog : /*empty*/ | func | func prog ;
func : LEX_NAME '(' opt_params ')' '{' opt_decls opt_stats '}' ;
opt_params : /*empty*/ | params ;
opt_decls : /*empty*/ | decls ;
params : parameter | parameter ',' params ;
parameter : LEX_INT LEX_NAME
opt_stats : /* empty */ | stats
stats : /* empty */ | stat | stat ';' stats ;
stat : expr ';'
     | NAME '=' expr ';'
     | NAME '(' '(' opt_params ')' ';' // func call
     : '{' stats '}'
     | if_statement
     | LEX_WHILE '(' expr ')' stats
     // How many fors? 8. 000, 001, ... 111
     | LEX_FOR '(' ';' ';' ')' stats
     | LEX_FOR '(' expr ';' ';' ')' stats
     | LEX_FOR '(' ';' expr ';' ')' stats
     | LEX_FOR '(' ';' ';' expr ')' stats
     | LEX_FOR '(' expr ';' expr ';' ')' stats
     | LEX_FOR '(' ';' expr ';' expr ')' stats
     | LEX_FOR '(' expr ';' ';' expr ')' stats
     | LEX_FOR '(' expr ';' expr ';' expr ')'
     // An opt_expr : /*empty*/ | expr ';' ; rule would also work, but it is only 8 lines anyway.
     // The grammar above is wrong and is being fixed. I am thinking of a hybrid: bison handles expr, and the upper levels keep the original top-down parser.
     | LEX_RETURN ';'
     | LEX_RETURN expr ';'
     | LEX_BREAK ';'
     ;
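Reasons for disliking antlr and Java:
- A C compiler that I can mostly follow, in only 361 lines!
- Hand-rolling regular expressions. Drawing graphics in a text-mode interface? I remember that when Java first came out, some FTP clients drew their own UI widgets with AWT or something similar; the interface looked great but was slow. UIs built with Java these days all seem rather shabby.
The .cpp that bison generates is longer, but it compiles quickly.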
$ cat expr.y
%{
%}
%%
expr : ;
$ bison -o expr.cpp expr.y
$ wc -l expr.cpp
1281 expr.cpp