编译原理答案@2.2
Exercises for Section 2.2
2.2.1
Consider the context-free grammar:
S -> S S + | S S * | a
- Show how the string
aa+a*
can be generated by this grammar. - Construct a parse tree for this string.
- What language does this grammar generate? Justify your answer.
Answer
S
->S
S * ->S
S + S * -> aS
+ S * -> a a +S
* -> a a + a *- L =
2.2.2
What language is generated by the following grammars? In each case justify your answer.
- S -> 0 S 1 | 0 1
- S -> + S S | - S S | a
- S -> S ( S ) S | ε
- S -> a S b S | b S a S | ε
- S -> a | S + S | S S | S * | ( S )
Answer
- L = {0n1n | n>=1}
- L =
- L =
- L =
- L = {Regular expressions used to describe regular languages} refer to wiki
2.2.3
Which of the grammars in Exercise 2.2.2 are ambiguous?
Answer
-
No
-
No
-
Yes
-
Yes
-
Yes
2.2.4
Construct unambiguous context-free grammars for each of
the following languages. In each case show that your grammar is correct.
- Arithmetic expressions in postfix notation.
- Left-associative lists of identifiers separated by commas.
- Right-associative lists of identifiers separated by commas.
- Arithmetic expressions of integers and identifiers with the four binary operators +, -, *, /.
- Add unary plus and minus to the arithmetic operators of 4.
Answer
1. E -> E E op | num
2. list -> list , id | id
3. list -> id , list | id
4. expr -> expr + term | expr - term | term
term -> term * factor | term / factor | factor
factor -> id | num | (expr)
5. expr -> expr + term | expr - term | term
term -> term * unary | term / unary | unary
unary -> + factor | - factor | factor
factor - > id | num | (expr)
2.2.5
-
Show that all binary strings generated by the following grammar have values divisible by 3. Hint. Use induction on the number of nodes in a parse tree.
num -> 11 | 1001 | num 0 | num num
-
Does the grammar generate all binary strings with values divisible by 3?
Answer
-
Proof
Any string derived from the grammar can be considered to be a sequence consisting of 11 and 1001, where each sequence element is possibly suffixed with a 0.
Let
n
be the set of positions where11
is placed.11
is said to be at positioni
if the first1
in11
is at positioni
, wherei
starts at 0 and
grows from least significant to most significant bit.Let
m
be the equivalent set for1001
.The sum of any string produced by the grammar is:
sum
= Σn (21 + 20) * 2 n + Σm (23 + 20) * 2m
= Σn 3 * 2 n + Σm 9 * 2m
This is clearly divisible by 3.
-
No. Consider the string "10101", which is divisible by 3, but cannot be
derived from the grammar.Readers seeking a more formal proof can read about it below:
Proof:
Every number divisible by 3 can be written in the form
3k
. We will considerk > 0
(though it would be valid to considerk
to be an arbitrary integer).Note that every part of num(11, 1001 and 0) is divisible by 3, if the grammar could generate all the numbers divisible by 3, we can get a production for binary k from num's production:
3k = num -> 11 | 1001 | num 0 | num num k = num/3 -> 01 | 0011 | k 0 | k k k -> 01 | 0011 | k 0 | k k
It is obvious that any value of
k
that has more than 2 consecutive bits set to 1 can never be produced. This can be confirmed by the example given in the beginning:10101 is 3*7, hence, k = 7 = 111 in binary. Because 111 has more than 2
consecutive 1's in binary, the grammar will never produce 21.
2.2.6
Construct a context-free grammar for roman numerals.
Note: we just consider a subset of roman numerals which is less than 4k.
Answer
-
via wikipedia, we can categorize the single roman numerals into 4 groups:
I, II, III | I V | V, V I, V II, V III | I X
then get the production:
digit -> smallDigit | I V | V smallDigit | I X smallDigit -> I | II | III | ε
-
and we can find a simple way to map roman to arabic numerals. For example:
- XII => X, II => 10 + 2 => 12
- CXCIX => C, XC, IX => 100 + 90 + 9 => 199
- MDCCCLXXX => M, DCCC, LXXX => 1000 + 800 + 80 => 1880
-
via the upper two rules, we can derive the production:
romanNum -> thousand hundred ten digit
thousand -> M | MM | MMM | ε
hundred -> smallHundred | C D | D smallHundred | C M
smallHundred -> C | CC | CCC | ε
ten -> smallTen | X L | L smallTen | X C
smallTen -> X | XX | XXX | ε
digit -> smallDigit | I V | V smallDigit | I X
smallDigit -> I | II | III | ε