Temporary notes, interesting stuff
http://www.antlr.org/article/needlook.html
http://citeseer.comp.nus.edu.sg/440034.html
Tomita(GLR) Parser
Packrat parser (uses TDPL)
http://java.sun.com/docs/books/jls/first_edition/html/19.doc.html
http://www.cs.berkeley.edu/~smcpeak/elkhound/
http://www.mollypages.org/page/grammar/index.mp
http://lambda.uta.edu/cse5317/notes/node20.html
http://pages.cpsc.ucalgary.ca/~robin/class/411/LR.1.html
http://en.wikipedia.org/wiki/Memoization
http://en.wikipedia.org/wiki/Comparison_of_parser_generators
http://en.wikipedia.org/wiki/Parsing_expression_grammar
> The author states that he wrote the GLR parser generator solely to
> handle C++ language spec [and someone lapped it up to handle Java].
>
> What exactly is it about OO languages that an LALR(1) parser cannot
> handle?
As the moderator noted, there is nothing about "OO" languages that
LALR(1) parsers cannot handle, but C++ itself is problematic. There
are LALR(1) and LL grammars for Java.
One of the problems with C++ is that expressions and declarations can
look exactly the same (technically, any language containing the "or"
of those two productions is ambiguous), and C++ gets around that by
saying, if it looks like a declaration, it is a declaration (forcing
the "or" to be resolved in a particular direction and resolving the
ambiguity). However, that resolution is not expressed grammatically,
and one cannot take two random context-free rules and difference them
and expect the result to be a context-free language, which is what the
C++ ambiguity resolution requires one to do.
In contrast, GLR grammars are not required to be unambiguous. Any
ambiguity is resolved by producing a parse forest that represents all
the potential ambiguous choices and requiring a later "semantic" pass
to choose which parse tree in the forest is the desired one. Thus,
with a GLR parser, one can disambiguate the C++ problem by selecting
the parse tree that treats all the ambiguous expression/declaration
sub-trees as declarations.
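The forest-then-select idea above can be sketched on a toy grammar. This is only an illustration under stated assumptions: the grammar, the tuple tree shapes, and the names `forest` and `disambiguate` are hypothetical, and the forest is enumerated by brute-force nondeterministic descent, not by the graph-structured stack a real GLR parser uses.

```python
def name(toks, i):
    # NAME: any identifier token
    if i < len(toks) and toks[i].isidentifier():
        yield toks[i], i + 1

def lit(toks, i, s):
    # a literal token such as "(" or ")"
    if i < len(toks) and toks[i] == s:
        yield i + 1

def expr(toks, i):
    # expr -> NAME | NAME '(' expr ')'
    for n, j in name(toks, i):
        yield ("name", n), j
        for k in lit(toks, j, "("):
            for arg, m in expr(toks, k):
                for e in lit(toks, m, ")"):
                    yield ("call", n, arg), e

def decl(toks, i):
    # decl -> NAME NAME | NAME '(' NAME ')'
    for t, j in name(toks, i):
        for v, k in name(toks, j):
            yield ("decl", t, v), k
        for k in lit(toks, j, "("):
            for v, m in name(toks, k):
                for e in lit(toks, m, ")"):
                    yield ("decl", t, v), e

def forest(toks):
    # stmt -> decl ';' | expr ';'   (keep every complete reading)
    trees = []
    for rule in (decl, expr):
        for tree, j in rule(toks, 0):
            if toks[j:] == [";"]:
                trees.append(tree)
    return trees

def disambiguate(trees):
    # the later "semantic" pass: prefer a declaration when one exists
    if not trees:
        raise SyntaxError("no parse")
    decls = [t for t in trees if t[0] == "decl"]
    return decls[0] if decls else trees[0]

print(forest("T ( x ) ;".split()))
print(disambiguate(forest("T ( x ) ;".split())))
```

For `T ( x ) ;` the forest holds both a declaration and a call tree, and the selection step keeps the declaration; for `f ( g ( x ) ) ;` only the call reading exists.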
The only problem with GLR as a technology is that there are no
"warnings" from the grammar processing tool that the language is
ambiguous. Well, there are warnings that the language is not LR (or
LALR) or whatever technology the GLR parser uses as a base. However,
some of those grammars will actually not be ambiguous and some of them
will be. In any case, once your GLR generator has given a warning, you
must either prove that the language actually isn't ambiguous or write
your semantic phase assuming that it is and disambiguate the resulting
forest.
It is worth mentioning that there are other ways of handling ambiguous
grammars. In particular, one can use predicates to resolve
ambiguities. Predicates allow one to take the difference of two
productions in a controlled manner. In particular, it is possible to
write a syntactic rule that says, try to parse this as a declaration
and if it isn't, parse it as an expression. The difference between the
predicated and the GLR solution is that predicated grammars are still
deterministic. There are no hidden ambiguities in a predicated
grammar. If your predicated parser generator gives you an error, you
still have an unresolved ambiguity and if it doesn't the resulting
parser will always construct a parse tree (and not a forest).
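A minimal sketch of such a predicated rule, assuming a toy statement grammar: the function names and tuple tree shapes are hypothetical and are not taken from any particular parser generator. The declaration attempt acts as the predicate, and the expression rule is only tried when it fails, so exactly one tree comes back.

```python
def parse_decl(toks):
    # decl -> NAME '(' NAME ')' ';'
    if (len(toks) == 5 and toks[0].isidentifier() and toks[1] == "("
            and toks[2].isidentifier() and toks[3] == ")" and toks[4] == ";"):
        return ("decl", toks[0], toks[2])
    return None

def parse_expr(toks, i=0):
    # expr -> NAME [ '(' expr ')' ]; returns (tree, next_position) or None
    if i >= len(toks) or not toks[i].isidentifier():
        return None
    head, i = toks[i], i + 1
    if i < len(toks) and toks[i] == "(":
        inner = parse_expr(toks, i + 1)
        if inner is None:
            return None
        arg, j = inner
        if j >= len(toks) or toks[j] != ")":
            return None
        return ("call", head, arg), j + 1
    return ("name", head), i

def parse_stmt(toks):
    # stmt -> (decl)? decl | expr ';'
    # The predicate is tried first; the expression rule is the fallback.
    tree = parse_decl(toks)
    if tree is not None:
        return tree
    result = parse_expr(toks)
    if result is not None and toks[result[1]:] == [";"]:
        return result[0]
    raise SyntaxError("neither a declaration nor an expression statement")

print(parse_stmt("T ( x ) ;".split()))          # ('decl', 'T', 'x')
print(parse_stmt("f ( g ( x ) ) ;".split()))    # ('call', 'f', ('call', 'g', ('name', 'x')))
```

Unlike the forest approach, there is nothing left to disambiguate afterwards: the order "declaration first" is written into the rule itself.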
I would be remiss if I also did not point out backtracking parsers,
which are another solution to the problem. In fact, all the
implementations of predicated parsers that I know of, use some form of
backtracking in their implementation. General backtracking parsers
share the characteristic with GLR parsers that they can parse
ambiguous grammars. Backtracking parsers generally also produce a
parse tree (although in theory they could also produce a forest).
Backtracking parsers have their own deficits though. Many
backtracking parsers will loop forever on some ambiguous grammars.
(Predicated backtracking parsers do not generally have this problem,
although they do not make the same linear time guarantees that pure LL
and LR parsers do (see note)--of course, any parser generator that can
handle a significant class of ambiguous grammars must be inherently
non-linear for some grammars, and GLR parsers have a cubic worst case,
same as Earley parsers.) In addition, most backtracking parsers resolve
ambiguities by selecting one parse tree out of the forest to return.
This is generally done by the order of the rules in the grammar (which
determines the order the rules are tried in ambiguous cases). If
one looks closely, this is very similar to using predicates
"implicitly" in the grammar. The key difference being that the tool
inserts the predicates rather than the user and does so without
warning and usually without the run-time termination guarantees.
I would like to mention that it is possible to build a predicated
parser using GLR technology, although I don't know of anyone
attempting to do so right now. From thought-experiments I have done
considering whether to implement such a tool, it seems like there
would be some advantages to building such a tool.
Again, I do not want to imply that these are the only techniques for
dealing with ambiguity. For example, Ralph Boland is pursuing some
generalization of LR technology that I gather will handle a wider
class of languages and I don't think his technique is any of the
above.
Note: Bryan Ford recently published a paper on a "predicated" parsing
technique that made extensive use of memoization and lazy evaluation
to achieve (if I recall correctly) a linear time guarantee. His
technique shares a characteristic with general backtracking parsers
in that the order of rules determines what is matched and the
entire tree is disambiguated that way. He uses an "ordered" or clause
to implement this.
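The two properties mentioned in the note, ordered choice and memoization, can be sketched as follows. This is written in the spirit of packrat parsing and is not Ford's actual implementation; the grammar and the names `ordered_choice` and `recognize` are assumptions made for illustration. Each rule is memoized per input position with `functools.lru_cache`, so a rule is evaluated at most once per position even when an earlier alternative fails and a later one re-asks for the same sub-result.

```python
from functools import lru_cache

def ordered_choice(text, i, alternatives):
    # Ordered "or": the first alternative that matches wins,
    # and later alternatives are never considered.
    for alt in alternatives:
        if text.startswith(alt, i):
            return i + len(alt)
    return None

def recognize(text):
    # Rules return the position after the match, or None on failure.
    @lru_cache(maxsize=None)
    def primary(i):
        # primary <- '(' sum ')' / digit
        if text[i:i + 1] == "(":
            j = sum_(i + 1)
            if j is not None and text[j:j + 1] == ")":
                return j + 1
        if text[i:i + 1].isdigit():
            return i + 1
        return None

    @lru_cache(maxsize=None)
    def sum_(i):
        # sum <- primary '+' sum / primary
        j = primary(i)
        if j is not None and text[j:j + 1] == "+":
            k = sum_(j + 1)
            if k is not None:
                return k
        return primary(i)  # second alternative: answered from the memo table

    return sum_(0) == len(text)

print(ordered_choice("ab", 0, ["a", "ab"]))  # 1: "a" wins, "ab" is never tried
print(ordered_choice("ab", 0, ["ab", "a"]))  # 2
print(recognize("1+(2+3)"))                  # True
print(recognize("1+"))                       # False
```

The first two calls show why rule order matters under ordered choice: swapping the alternatives changes what is matched.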
-Chris
Chris Clark said (in part):
> It is worth mentioning that there are other ways of handling ambiguous
> grammars. In particular, one can use predicates to resolve
> ambiguities. Predicates allow one to take the difference of two
> productions in a controlled manner. In particular, it is possible to
> write a syntactic rules that says, try to parse this as a declaration
> and if it isn't parse it as an expression. The difference between the
> predicated and the GLR solution is that predicated grammars are still
> deterministic. There are no hidden ambiguities in a predicated
> grammar. If your predicated parser generator gives you an error, you
> still have an unresolved ambiguity and if it doesn't the resulting
> parser will always construct a parse tree (and not a forest)
I agree with Chris about the use of predicates.
Interestingly, the use of predicates alone can add some Type 1 power
to a grammar.
For instance:
L1 = {a^n b^n c+} // clearly a type 2 language
L2 = {a+ b^n c^n} // clearly a type 2 language
L1 intersect L2 = {a^n b^n c^n} // a type 1 language
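The intersection can be checked mechanically. In this sketch each language gets its own membership test (hand-written counting here rather than a generated parser, and assuming n >= 1), and the intersection is simply "both tests accept"; the function names are hypothetical.

```python
import re

def split_blocks(s):
    # Split s into its a-, b- and c-blocks, or return None if the shape is wrong.
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    return m.groups() if m else None

def in_L1(s):
    # L1 = {a^n b^n c+}
    blocks = split_blocks(s)
    return blocks is not None and len(blocks[0]) == len(blocks[1])

def in_L2(s):
    # L2 = {a+ b^n c^n}
    blocks = split_blocks(s)
    return blocks is not None and len(blocks[1]) == len(blocks[2])

def in_intersection(s):
    # L1 intersect L2 = {a^n b^n c^n}
    return in_L1(s) and in_L2(s)

print(in_intersection("aabbcc"))  # True
print(in_intersection("aabbc"))   # False: in L1 but not in L2
print(in_intersection("abbcc"))   # False: in L2 but not in L1
```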
Current research in the area of this class of grammars can be found here:
http://www.cs.queensu.ca/home/okhotin/
See the section on "Boolean grammars." Intersection can get quite a bit of
power out of a formalism.
My most recent paper deals with several difficult to parse languages of the
classical sort, including the particularly nasty to parse:
L = {a^m b^n c^mn}
The only grammar I've seen expressed for that one in classical form is in
Type 0 due to a length increasing production:
(1) <S> ::= <H><S> | <H><B>
(2) <B> ::= <B><B> | <C>
(3) <H><B> ::= <A><X><N><B>
(4) <N><B> ::= <B><N>
(5) <B><M> ::= <M><B>
(6) <N><C> ::= <M>c
(7) <N>c ::= <M>cc
(8) <X><M><B><B> ::= <B><X><N><B>
(9) <X><B><M>c ::= <B>c
(10) <H><A> ::= <A><H>
(11) <A> ::= a
(12) <B> ::= b
Because production (9) is length increasing when applied as a
reduction (its left-hand side is longer than its right-hand side), the
grammar is in Type 0 form, even though the language itself is Type 1.
I'd like to see that grammar normalized to a Type 1 -- but haven't been
able to find one.
The longest derivation I've been able to do with that by hand is aabcc,
which is:
(11) aabcc --> <A>abcc
(11) <A>abcc --> <A><A>bcc
(12) <A><A>bcc --> <A><A><B>cc
(9) <A><A><B>cc --> <A><A><X><B><M>cc
(7) <A><A><X><B><M>cc --> <A><A><X><B><N>c
(4) <A><A><X><B><N>c --> <A><A><X><N><B>c
(3) <A><A><X><N><B>c --> <A><H><B>c
(9) <A><H><B>c --> <A><H><X><B><M>c
(6) <A><H><X><B><M>c --> <A><H><X><B><N><C>
(4) <A><H><X><B><N><C> --> <A><H><X><N><B><C>
(2) <A><H><X><N><B><C> --> <A><H><X><N><B><B>
(2) <A><H><X><N><B><B> --> <A><H><X><N><B>
(10) <A><H><X><N><B> --> <H><A><X><N><B>
(3) <H><A><X><N><B> --> <H><H><B>
(1) <H><H><B> --> <H><S>
(1) <H><S> --> <S>
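The derivation above can be replayed mechanically. The sketch below is an illustration with an assumed encoding: upper-case letters stand for the bracketed nonterminals (`A` for `<A>`), lower-case letters are terminals, and each step is checked as a single reduction that replaces one occurrence of a rule's right-hand side by its left-hand side. The swap by (4) and the rewriting of `<C>` to `<B>` by (2) are replayed as two separate steps, since (4) on its own leaves the `<C>` in place.

```python
# (lhs, rhs) pairs per production number, transcribed from the grammar above.
RULES = {
    1: [("S", "H S"), ("S", "H B")],
    2: [("B", "B B"), ("B", "C")],
    3: [("H B", "A X N B")],
    4: [("N B", "B N")],
    5: [("B M", "M B")],
    6: [("N C", "M c")],
    7: [("N c", "M c c")],
    8: [("X M B B", "B X N B")],
    9: [("X B M c", "B c")],
    10: [("H A", "A H")],
    11: [("A", "a")],
    12: [("B", "b")],
}

def reduces(rule, before, after):
    # True if `after` is `before` with one occurrence of a right-hand side
    # of `rule` replaced by the matching left-hand side.
    b, a = before.split(), after.split()
    for lhs, rhs in RULES[rule]:
        l, r = lhs.split(), rhs.split()
        for i in range(len(b) - len(r) + 1):
            if b[i:i + len(r)] == r and b[:i] + l + b[i + len(r):] == a:
                return True
    return False

STEPS = [
    (11, "a a b c c", "A a b c c"),
    (11, "A a b c c", "A A b c c"),
    (12, "A A b c c", "A A B c c"),
    (9, "A A B c c", "A A X B M c c"),
    (7, "A A X B M c c", "A A X B N c"),
    (4, "A A X B N c", "A A X N B c"),
    (3, "A A X N B c", "A H B c"),
    (9, "A H B c", "A H X B M c"),
    (6, "A H X B M c", "A H X B N C"),
    (4, "A H X B N C", "A H X N B C"),
    (2, "A H X N B C", "A H X N B B"),
    (2, "A H X N B B", "A H X N B"),
    (10, "A H X N B", "H A X N B"),
    (3, "H A X N B", "H H B"),
    (1, "H H B", "H S"),
    (1, "H S", "S"),
]

# Every step must be a valid reduction, and each step must start
# where the previous one ended.
valid = all(reduces(*step) for step in STEPS)
chained = all(STEPS[i][2] == STEPS[i + 1][1] for i in range(len(STEPS) - 1))
print(valid and chained)  # True
```

The same `reduces` helper could be used to check a hand-written derivation of a longer string one step at a time.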
I started on aaabbcccccc -- but got lost in the shuffle. :-(
If anyone wants to try a purely "predicate" approach to the above
language -- I'd love to see it. (Or if anyone would care to post the
derivation of aaabbcccccc, I'd really love to see that, too.)
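As a reference point for such derivations, membership in L = {a^m b^n c^mn} can be tested directly by counting. This is a plain checker, not a grammar or a predicate-based solution, and it assumes m, n >= 1; the name `in_L` is hypothetical.

```python
import re

def in_L(s):
    # Accept a^m b^n c^(m*n) with m, n >= 1.
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    if m is None:
        return False
    a_block, b_block, c_block = m.groups()
    return len(c_block) == len(a_block) * len(b_block)

print(in_L("aabcc"))        # True: m=2, n=1, 2 c's
print(in_L("aaabbcccccc"))  # True: m=3, n=2, 6 c's
print(in_L("aabccc"))       # False
```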
The $-grammar I wrote for that one accepts strings in O(n^2.3), and has 6
productions and 2 of those are predicates, but also makes use of 2
phi-expressions, which implies a total of 4 predicates (since there is an
implied predicate with every phi-expression) and at least 2 name-indexed
tries.
Also of note is that I allow for a-expressions in predicates, which allows
for substrings that have been parsed to be concatenated to form entirely new
input that is then passed to the predicates. (The $-grammar for a^m b^n c^mn
uses this.) This is similar to what is known as "length-increasing".
Anyway, a few nights ago, I was asking myself about this formation in C++:
class Foo
{
    int inline_function(int x)
    {
        return __y * x; // __y is used before being seen
    }
    int __y;
}; // this class is legal

class Bar
{
    int inline_function(int x)
    {
        return __y * x; // __y is an undeclared variable
    }
}; // this class is not legal because __y never gets declared
$-calculus was able to handle this ... without any code, accepting Foo and
rejecting Bar -- using all of the above mentioned techniques.
Although the resulting $-grammar uses more than 1 explicit predicate, it
does make use of phi-expressions, and these have implied predicates -- so I
was not able to do it without extensive overall use of predication.
Anyway -- I wrote it up and am looking for somewhere that is looking for a
3.5 page paper on such things. Any thoughts?