Reposted from: http://lists.canonical.org/pipermail/kragen-hacks/1999-October/000201.html
The C grammar in K&R 2nd Ed is fairly simple, only about 5 pages.
Here it is, translated to BNF. Here ( ) groups, ? means optional, |
is alternation, + means one or more, * means zero or more, space means
sequence, and "x" means literal x. As a special abbreviation, x% means
x ("," x)* -- that is, a non-null comma-separated list of x's.
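As a small illustration of the % abbreviation, here is a sketch in Python of a helper that a recursive-descent parser might use for x%. The names parse_comma_list and parse_identifier, and the choice of a plain list of strings as the token stream, are assumptions made for this example and are not part of the grammar.

```python
def parse_comma_list(tokens, pos, parse_item):
    """Parse x%, that is x ("," x)*, starting at tokens[pos].

    parse_item(tokens, pos) must return (value, new_pos) or raise SyntaxError.
    Returns (list_of_values, new_pos).
    """
    value, pos = parse_item(tokens, pos)          # the list is non-null: one x is required
    items = [value]
    while pos < len(tokens) and tokens[pos] == ",":
        value, pos = parse_item(tokens, pos + 1)  # skip the comma, then parse another x
        items.append(value)
    return items, pos


def parse_identifier(tokens, pos):
    """Hypothetical item parser: accept a token that starts with a letter or underscore."""
    if pos < len(tokens):
        first = tokens[pos][:1]
        if first.isalpha() or first == "_":
            return tokens[pos], pos + 1
    raise SyntaxError(f"identifier expected at token {pos}")
```

With these definitions, parse_comma_list(["a", ",", "b", ")"], 0, parse_identifier) returns (["a", "b"], 3). A rule such as initializer% ","? needs a little more care than this helper gives it, since a comma there may be followed by "}" rather than by another item.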
I did this with the idea of writing a bare-bones recursive-descent parser
for the language. Accordingly, I have eschewed left recursion, and in
general have eschewed recursion as a method of iteration, preferring
explicit iteration. I think the only recursion remaining is where
recursion is really necessary. This resulted in the elimination of
many nonterminals.
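To make the point about left recursion concrete: a rule of the form additive-expression: additive-expression "+" multiplicative-expression would make a naive recursive-descent function call itself before consuming any token, whereas the starred form maps directly onto a loop. The following is only a sketch, with operands simplified to single tokens; parse_operand and parse_additive are hypothetical names.

```python
def parse_operand(tokens, pos):
    """Stand-in for multiplicative-expression: accept one non-operator token."""
    if pos < len(tokens) and tokens[pos] not in ("+", "-"):
        return tokens[pos], pos + 1
    raise SyntaxError(f"operand expected at token {pos}")


def parse_additive(tokens, pos=0):
    """additive-expression: operand (("+" | "-") operand)*"""
    operands = []
    operators = []
    value, pos = parse_operand(tokens, pos)
    operands.append(value)
    # The ( ... )* part of the rule becomes a while loop, not a recursive call.
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        operators.append(tokens[pos])
        value, pos = parse_operand(tokens, pos + 1)
        operands.append(value)
    return operands, operators, pos
```

For the tokens a + b - c this returns the flat lists ["a", "b", "c"] and ["+", "-"] together with the position 5; how those lists are grouped into a tree is a separate decision.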
I don't know if I will actually carry this through to working code,
though.
Discarded nonterminals: external-declaration struct-or-union
struct-declaration-list specifier-qualifier-list struct-declarator-list
enumerator-list init-declarator-list direct-declarator type-qualifier-list
parameter-list identifier-list initializer-list direct-abstract-declarator
labeled-statement expression-statement declaration-list statement-list
primary-expression typedef-name selection-statement
iteration-statement jump-statement argument-expression-list
unary-operator assignment-operator
Renamed symbols: compound-statement -> block
This grammar has 40 nonterminals; I discarded 25. Also, I turned typedef-name into a terminal.
The original grammar has a total of 65 nonterminals.
C grammar begins here:
Terminals:
typedef-name integer-constant character-constant floating-constant
enumeration-constant identifier string
translation-unit: (function-definition | declaration)+
function-definition:
declaration-specifiers? declarator declaration* block
declaration: declaration-specifiers init-declarator% ";"
declaration-specifiers:
(storage-class-specifier | type-specifier | type-qualifier)+
storage-class-specifier:
("auto" | "register" | "static" | "extern" | "typedef")
type-specifier: ("void" | "char" | "short" | "int" | "long" | "float" |
"double" | "signed" | "unsigned" | struct-or-union-specifier |
enum-specifier | typedef-name)
type-qualifier: ("const" | "volatile")
struct-or-union-specifier:
("struct" | "union") (
identifier? "{" struct-declaration+ "}" |
identifier
)
init-declarator: declarator ("=" initializer)?
struct-declaration:
(type-specifier | type-qualifier)+ struct-declarator%
struct-declarator: declarator | declarator? ":" constant-expression
enum-specifier: "enum" (identifier | identifier? "{" enumerator% "}")
enumerator: identifier ("=" constant-expression)?
declarator:
pointer? (identifier | "(" declarator ")") (
"[" constant-expression? "]" |
"(" parameter-type-list ")" |
"(" identifier%? ")"
)*
pointer:
("*" type-qualifier*)*
parameter-type-list: parameter-declaration% ("," "...")?
parameter-declaration:
declaration-specifiers (declarator | abstract-declarator)?
initializer: assignment-expression | "{" initializer% ","? "}"
type-name: (type-specifier | type-qualifier)+ abstract-declarator?
abstract-declarator:
pointer ("(" abstract-declarator ")")? (
"[" constant-expression? "]" |
"(" parameter-type-list? ")"
)*
statement:
((identifier | "case" constant-expression | "default") ":")*
(expression? ";" |
block |
"if" "(" expression ")" statement |
"if" "(" expression ")" statement "else" statement |
"switch" "(" expression ")" statement |
"while" "(" expression ")" statement |
"do" statement "while" "(" expression ")" ";" |
"for" "(" expression? ";" expression? ";" expression? ")" statement |
"goto" identifier ";" |
"continue" ";" |
"break" ";" |
"return" expression? ";"
)
block: "{" declaration* statement* "}"
expression:
assignment-expression%
assignment-expression: (
unary-expression (
"=" | "*=" | "/=" | "%=" | "+=" | "-=" | "<<=" | ">>=" | "&=" |
"^=" | "|="
)
)* conditional-expression
conditional-expression:
logical-OR-expression ( "?" expression ":" conditional-expression )?
constant-expression: conditional-expression
logical-OR-expression:
logical-AND-expression ( "||" logical-AND-expression )*
logical-AND-expression:
inclusive-OR-expression ( "&&" inclusive-OR-expression )*
inclusive-OR-expression:
exclusive-OR-expression ( "|" exclusive-OR-expression )*
exclusive-OR-expression:
AND-expression ( "^" AND-expression )*
AND-expression:
equality-expression ( "&" equality-expression )*
equality-expression:
relational-expression ( ("==" | "!=") relational-expression )*
relational-expression:
shift-expression ( ("<" | ">" | "<=" | ">=") shift-expression )*
shift-expression:
additive-expression ( ("<<" | ">>") additive-expression )*
additive-expression:
multiplicative-expression ( ("+" | "-") multiplicative-expression )*
multiplicative-expression:
cast-expression ( ("*" | "/" | "%") cast-expression )*
cast-expression:
( "(" type-name ")" )* unary-expression
unary-expression:
("++" | "--" | "sizeof")* (
"sizeof" "(" type-name ")" |
("&" | "*" | "+" | "-" | "~" | "!") cast-expression |
postfix-expression
)
postfix-expression:
(identifier | constant | string | "(" expression ")") (
"[" expression "]" |
"(" assignment-expression%? ")" |
"." identifier |
"->" identifier |
"++" |
"--"
)*
constant:
integer-constant |
character-constant |
floating-constant |
enumeration-constant
C grammar ends here.
Notes:
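Empty struct declarations (struct foo { }) are not legal in the grammar.
Neither are empty enum declarations (enum foo { }) or empty declaration
lists (int;). The last restriction is stricter than the book's, which
makes the init-declarator list optional; as written, the declaration
rule also rejects a bare tag declaration such as struct foo { int x; };
so it probably wants init-declarator%? instead.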
Some comments in the book indicate that the book's expression grammar
captures both precedence and associativity. This was a matter of
some concern to me; making iteration happen with Kleene stars instead
of recursion eliminates the information on associativity. In the book,
the binary-operator nonterminals from logical-OR-expression down to
multiplicative-expression are left-recursive, so those operators
associate from left to right, while assignment-expression and
conditional-expression are right-recursive. A parser built from this
grammar has to reintroduce that distinction when it builds its tree:
the starred binary-operator rules group left to right, and the starred
assignment prefix groups right to left.
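Since a starred rule hands the parser a flat sequence of operands and operators, the grouping has to be chosen when the tree is built. In C the binary operators such as - group left to right, and the assignment operators group right to left. A minimal sketch of both choices follows; fold_left and nest_right are hypothetical helper names, and trees are plain (operator, left, right) tuples.

```python
def fold_left(operands, operators):
    """Group a flat operand/operator sequence left to right: (a - b) - c."""
    tree = operands[0]
    for op, rhs in zip(operators, operands[1:]):
        tree = (op, tree, rhs)
    return tree


def nest_right(operands, operators):
    """Group the same kind of sequence right to left: a = (b += c)."""
    tree = operands[-1]
    for op, lhs in zip(reversed(operators), reversed(operands[:-1])):
        tree = (op, lhs, tree)
    return tree
```

For example, fold_left(["a", "b", "c"], ["-", "-"]) gives ("-", ("-", "a", "b"), "c"), while nest_right(["a", "b", "c"], ["=", "+="]) gives ("=", "a", ("+=", "b", "c")).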
The split between cast-expression and unary-expression exists mainly to
try to keep you from incrementing or decrementing the results of casts,
I think, but it is ineffective, because an extra set of parens is all
you need. In other words, --(int)x doesn't parse with this grammar,
but --((int)x) does.
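The effect of the split can be checked with a cut-down recognizer. This is only a sketch under heavy simplifying assumptions: type-name is reduced to a single keyword from the hypothetical set TYPE_NAMES, unary-expression keeps only the "++" and "--" prefixes, postfix-expression keeps only identifiers and parenthesized expressions, and the full expression rule is replaced by cast-expression.

```python
TYPE_NAMES = {"int", "char", "long"}  # assumption: a type-name is one keyword


def is_cast_start(tokens, pos):
    """True if tokens[pos:] begins with "(" type-name ")"."""
    window = tokens[pos:pos + 3]
    return (len(window) == 3 and window[0] == "("
            and window[1] in TYPE_NAMES and window[2] == ")")


def parse_cast(tokens, pos):
    """cast-expression: ( "(" type-name ")" )* unary-expression"""
    while is_cast_start(tokens, pos):
        pos += 3
    return parse_unary(tokens, pos)


def parse_unary(tokens, pos):
    """unary-expression, limited to ("++" | "--")* postfix-expression."""
    while pos < len(tokens) and tokens[pos] in ("++", "--"):
        pos += 1
    return parse_postfix(tokens, pos)


def parse_postfix(tokens, pos):
    """postfix-expression, limited to identifier | "(" expression ")"."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "(":
        pos = parse_cast(tokens, pos + 1)  # stand-in for the full expression rule
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError('")" expected')
        return pos + 1
    if tokens[pos].isidentifier() and tokens[pos] not in TYPE_NAMES:
        return pos + 1
    raise SyntaxError(f"unexpected token {tokens[pos]!r}")


def accepts(tokens):
    """True if the whole token list is one cast-expression."""
    try:
        return parse_cast(tokens, 0) == len(tokens)
    except SyntaxError:
        return False
```

Here accepts(["--", "(", "int", ")", "x"]) is False, because after "--" the parenthesis must open an expression and int is not one, while accepts(["--", "(", "(", "int", ")", "x", ")"]) is True.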
There are obviously many constraints on the language that the grammar
cannot express. In particular, constant-expression is subject to some
constraints, and many operators require modifiable lvalues for one of
their operands. It appears that some attempt to capture this has been
made in this grammar, but it would require a much larger grammar to
be successful.
There are also obviously many pieces of semantic information that the
original grammar conveyed by the name of the nonterminal that this
grammar does not convey.
I suspect this grammar still needs some work before I can use it for a
recursive-descent parser. I'm worried about how to tell labels from
variable names starting C statements (they are in separate namespaces,
so the typedef-name trick won't work) and how to tell casts from
parenthesized expressions.
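One possible way out of both worries, offered as a sketch and not as something this post settles on, is a little extra lookahead. A statement that starts with identifier ":" can only be a label, because no expression has a colon directly after its first identifier; and a "(" followed by a type keyword or a typedef-name can only open a type-name. The token representation below, (kind, text) pairs produced by some lexer that already knows the typedef names, and the function names are assumptions made for the example.

```python
TYPE_KEYWORDS = {
    "void", "char", "short", "int", "long", "float", "double",
    "signed", "unsigned", "struct", "union", "enum", "const", "volatile",
}


def starts_with_label(tokens, pos):
    """True if the statement starting at tokens[pos] begins with identifier ":"."""
    return (pos + 1 < len(tokens)
            and tokens[pos][0] == "identifier"
            and tokens[pos + 1] == ("punct", ":"))


def paren_opens_cast(tokens, pos):
    """True if the "(" at tokens[pos] is followed by the start of a type-name."""
    if pos + 1 >= len(tokens) or tokens[pos] != ("punct", "("):
        return False
    kind, text = tokens[pos + 1]
    return kind == "typedef-name" or (kind == "keyword" and text in TYPE_KEYWORDS)
```

The second check still leans on the typedef-name trick, since only the lexer's symbol table can tell (size_t)x from (x), but it needs just one token past the parenthesis.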
For fun, I wrote the following, in the same language as the C grammar.
Grammar grammar begins here:
Terminals: identifier quoted-string blank-line
grammar:
blank-line*
terminals-decl
blank-line+
(definition blank-line+)*
definition?
terminals-decl: "Terminals" ":" identifier*
definition: identifier ":" alternation-regex
alternation-regex: simple-regex ("|" simple-regex)*
simple-regex:
(
(identifier | quoted-string | "(" alternation-regex ")")
("+" | "*" | "?" | "%")*
)*
Grammar grammar ends here.
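The grammar grammar is small enough that its alternation-regex and simple-regex rules can be sketched as a parser in a few dozen lines. What follows is one possible reading in Python, not a finished tool: the token pattern, the ("alt", ...) and ("seq", ...) tuple shapes, and the function names are all assumptions, and the definition and terminals-decl rules are left out.

```python
import re

# Quoted strings, identifiers (letters, digits, hyphens), and bare punctuation.
TOKEN = re.compile(r'\s*("[^"]+"|[A-Za-z][A-Za-z0-9-]*|[()|+*?%])')
SUFFIXES = ("+", "*", "?", "%")


def tokenize(text):
    """Split the right-hand side of a definition into tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"bad character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_alternation(tokens, pos=0):
    """alternation-regex: simple-regex ("|" simple-regex)*"""
    branch, pos = parse_simple(tokens, pos)
    branches = [branch]
    while pos < len(tokens) and tokens[pos] == "|":
        branch, pos = parse_simple(tokens, pos + 1)
        branches.append(branch)
    return ("alt", branches), pos


def parse_simple(tokens, pos):
    """simple-regex: a possibly empty sequence of atoms, each with suffixes."""
    sequence = []
    while pos < len(tokens) and tokens[pos] not in ("|", ")"):
        token = tokens[pos]
        if token == "(":
            atom, pos = parse_alternation(tokens, pos + 1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError('")" expected')
            pos += 1
        elif token in SUFFIXES:
            raise SyntaxError(f"suffix {token!r} with nothing before it")
        else:
            atom, pos = token, pos + 1          # identifier or quoted string
        while pos < len(tokens) and tokens[pos] in SUFFIXES:
            atom, pos = (tokens[pos], atom), pos + 1
        sequence.append(atom)
    return ("seq", sequence), pos
```

Quoted strings keep their quotes as tokens, so the literal "(" never collides with a grouping parenthesis. Feeding it the right-hand side of the enumerator rule, tokenize('identifier ("=" constant-expression)?'), yields six tokens, and parse_alternation consumes all of them.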