Design Pattern----18.Behavioral.Interpreter.Pattern (Delphi Sample)
Intent
- Given a language, define a representation for its grammar along with an interpreter that uses the representation to interpret sentences in the language.
- Map a domain to a language, the language to a grammar, and the grammar to a hierarchical object-oriented design.
Problem
A class of problems occurs repeatedly in a well-defined and well-understood domain. If the domain were characterized with a “language”, then problems could be easily solved with an interpretation “engine”.
Discussion
The Interpreter pattern discusses: defining a domain language (i.e. problem characterization) as a simple language grammar, representing domain rules as language sentences, and interpreting these sentences to solve the problem. The pattern uses a class to represent each grammar rule. And since grammars are usually hierarchical in structure, an inheritance hierarchy of rule classes maps nicely.
An abstract base class specifies the method interpret(). Each concrete subclass implements interpret() by accepting (as an argument) the current state of the language stream, and adding its contribution to the problem-solving process.
Structure
Interpreter suggests modeling the domain with a recursive grammar. Each rule in the grammar is either a ‘composite’ (a rule that references other rules) or a terminal (a leaf node in a tree structure). Interpreter relies on the recursive traversal of the Composite pattern to interpret the ‘sentences’ it is asked to process.
Example
The Interpreter pattern defines a grammatical representation for a language and an interpreter to interpret the grammar. Musicians are examples of Interpreters. The pitch of a sound and its duration can be represented in musical notation on a staff. This notation provides the language of music. Musicians playing the music from the score are able to reproduce the original pitch and duration of each sound represented.
Check list
- Decide if a “little language” offers a justifiable return on investment.
- Define a grammar for the language.
- Map each production in the grammar to a class.
- Organize the suite of classes into the structure of the Composite pattern.
- Define an interpret(Context) method in the Composite hierarchy.
- The Context object encapsulates the current state of the input and output as the former is parsed and the latter is accumulated. It is manipulated by each grammar class as the “interpreting” process transforms the input into the output.
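The check list can be sketched in Delphi for a hypothetical little language of integer sums. Every name here (TContext, TExpr, TNumber, TVariable, TSum) is invented for illustration and is not part of the XML sample discussed later. The syntax tree is built by hand, since the pattern itself does not cover parsing.

```pascal
program InterpretSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical context: state shared by every Interpret call.
  TContext = class
  public
    X: Integer; // value bound to the variable "x"
  end;

  // Abstract expression; one subclass per grammar production.
  TExpr = class
  public
    function Interpret(Ctx: TContext): Integer; virtual; abstract;
  end;

  // Terminal: Number
  TNumber = class(TExpr)
  private
    FValue: Integer;
  public
    constructor Create(AValue: Integer);
    function Interpret(Ctx: TContext): Integer; override;
  end;

  // Terminal: the variable "x", resolved through the context
  TVariable = class(TExpr)
  public
    function Interpret(Ctx: TContext): Integer; override;
  end;

  // Composite: Sum -> Expr '+' Expr; owns both children
  TSum = class(TExpr)
  private
    FLeft, FRight: TExpr;
  public
    constructor Create(ALeft, ARight: TExpr);
    destructor Destroy; override;
    function Interpret(Ctx: TContext): Integer; override;
  end;

constructor TNumber.Create(AValue: Integer);
begin
  inherited Create;
  FValue := AValue;
end;

function TNumber.Interpret(Ctx: TContext): Integer;
begin
  Result := FValue;
end;

function TVariable.Interpret(Ctx: TContext): Integer;
begin
  Result := Ctx.X;
end;

constructor TSum.Create(ALeft, ARight: TExpr);
begin
  inherited Create;
  FLeft := ALeft;
  FRight := ARight;
end;

destructor TSum.Destroy;
begin
  FLeft.Free;
  FRight.Free;
  inherited Destroy;
end;

function TSum.Interpret(Ctx: TContext): Integer;
begin
  // Recursive step: interpret the children, then combine the results.
  Result := FLeft.Interpret(Ctx) + FRight.Interpret(Ctx);
end;

var
  Ctx: TContext;
  Tree: TExpr;
begin
  Ctx := TContext.Create;
  // Syntax tree for the sentence "x + 2", built by hand
  Tree := TSum.Create(TVariable.Create, TNumber.Create(2));
  try
    Ctx.X := 40;
    WriteLn(Tree.Interpret(Ctx)); // prints 42
  finally
    Tree.Free;
    Ctx.Free;
  end;
end.
```

Adding a new production (say, a product) would mean adding one more subclass of TExpr, which is the trade-off the rules of thumb below describe.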
Rules of thumb
- Considered in its most general form (i.e. an operation distributed over a class hierarchy based on the Composite pattern), nearly every use of the Composite pattern will also contain the Interpreter pattern. But the Interpreter pattern should be reserved for those cases in which you want to think of this class hierarchy as defining a language.
- Interpreter can use State to define parsing contexts.
- The abstract syntax tree of Interpreter is a Composite (therefore Iterator and Visitor are also applicable).
- Terminal symbols within Interpreter’s abstract syntax tree can be shared with Flyweight.
- The pattern doesn’t address parsing. When the grammar is very complex, other techniques (such as a parser) are more appropriate.
Interpreter in Delphi
This session consists of the development of a small application to read and pretty-print XML and CSV files. Along the way, we explain and demonstrate the use of the following patterns: State, Interpreter, Visitor, Strategy, Command, Memento, and Facade.
Given a language, define a representation for its grammar along with an interpreter that uses the representation to interpret sentences in the language.
The term “interpret” here is pretty broad. In a BASIC interpreter, it would mean to go through executing instructions in some runtime environment. However, we can use an interpreter for other things that require an understanding of the structure of a language.
This pattern works best if that structure is not too complex. Because we will define a class for every element of the grammar, the class hierarchy can get very large (usually very shallow and very wide). It can be quite an inefficient way to represent and work with the data. In my opinion, these are also the conditions under which recursive descent compilers are appropriate, so they can be a good match. But you would not write a Delphi compiler this way. The Dragon Book discusses more efficient methods.
However, for us, with our small grammar, the Interpreter pattern is fine.
Grammar
We will not be able to parse all XML documents. In particular, we will ignore DTDs, attributes, the structure of the contents of a prolog, escaped characters (e.g. &lt;) and empty element tags (e.g. <NothingHere/>). We will be able to cope with empty files, though.
I’ve used a variant of Backus Naur Form (BNF) to define the grammar. Here an empty string is denoted by ε, zero or more occurrences by * (called Kleene closure, in case you’re interested), one or more by + (positive closure), and zero or one by 0..1 (no known aliases).
XmlDoc -> Prolog^0..1 TagList^0..1
Prolog -> <?xml PrologData?>
TagList -> Node*
Node -> StartTag [Data | TagList] EndTag
StartTag -> <TagName>
EndTag -> </TagName>
PrologData -> [Any printable characters except <,>,/ and ? ]*
Data -> [Any printable characters except <,> and / ]*
TagName -> [Any printable characters except <,>,/,space ]
An example of the sort of file that we will be able to interpret is:
<?xml version="1.0"?>
<List>
  <SomeStuff>Stuff 1 is here</SomeStuff>
  <SomeStuff>Stuff 2 is here</SomeStuff>
  <SomeStuff>Stuff 3 is here</SomeStuff>
  <SomeStuff>Stuff 4 is here</SomeStuff>
</List>
Implementation
The Interpreter pattern normally requires a client to build the syntax tree. In our case, this will be the XML parser in XmlParser.pas. Time doesn’t permit a detailed description of this, but briefly, we have tokens which will be single characters, or the end of document marker. The lexical analyser class extracts these, and passes them to the XML parser. Like all recursive descent parsers, this has a procedure declared for each element of the grammar. These procedures check for appropriate tokens and report errors if necessary, and call procedures corresponding to other grammatical structures when they should appear in the source text. This parser adds nodes to the syntax tree as necessary.
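As a rough illustration of the recursive descent idea (not the actual contents of XmlParser.pas, which are not shown here), the sketch below handles only the TagList and Node productions. It assumes single-character tokens read straight from a string, skips whitespace between tags, ignores the prolog, and prints an outline instead of adding nodes to a syntax tree. All routine names (Peek, Expect, ReadUntil, ParseName, ParseNode, ParseTagList) are hypothetical.

```pascal
program RecursiveDescentSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Src: string;      // text being parsed
  Cur: Integer = 1; // index of the current token (one character)

// Returns the character Offset places ahead, or #0 past the end of the input.
function Peek(Offset: Integer = 0): Char;
begin
  if Cur + Offset <= Length(Src) then
    Result := Src[Cur + Offset]
  else
    Result := #0;
end;

// Consumes the expected token or reports an error.
procedure Expect(C: Char);
begin
  if Peek <> C then
    raise Exception.Create('Expected "' + C + '" at position ' + IntToStr(Cur));
  Inc(Cur);
end;

// Reads characters up to (not including) the first character found in Stops.
function ReadUntil(const Stops: string): string;
begin
  Result := '';
  while (Peek <> #0) and (Pos(Peek, Stops) = 0) do
  begin
    Result := Result + Peek;
    Inc(Cur);
  end;
end;

// TagName -> printable characters except <, >, / and space
function ParseName: string;
begin
  Result := ReadUntil('<>/ ');
  if Result = '' then
    raise Exception.Create('Tag name expected at position ' + IntToStr(Cur));
end;

procedure SkipSpace;
begin
  while (Peek = ' ') or (Peek = #9) or (Peek = #10) or (Peek = #13) do
    Inc(Cur);
end;

procedure ParseTagList(Depth: Integer); forward;

// Node -> StartTag [Data | TagList] EndTag
procedure ParseNode(Depth: Integer);
var
  OpenName, CloseName: string;
  Mark: Integer;
begin
  Expect('<');                       // StartTag -> <TagName>
  OpenName := ParseName;
  Expect('>');
  Mark := Cur;
  SkipSpace;
  if (Peek = '<') and (Peek(1) <> '/') then
  begin
    WriteLn(StringOfChar(' ', Depth * 2), OpenName);
    ParseTagList(Depth + 1);         // nested TagList
  end
  else
  begin
    Cur := Mark;                     // whitespace belongs to the data
    WriteLn(StringOfChar(' ', Depth * 2), OpenName, ': ', ReadUntil('<>/'));
  end;
  Expect('<');                       // EndTag -> </TagName>
  Expect('/');
  CloseName := ParseName;
  Expect('>');
  if CloseName <> OpenName then
    raise Exception.Create('Mismatched end tag </' + CloseName + '>');
end;

// TagList -> Node*
procedure ParseTagList(Depth: Integer);
begin
  SkipSpace;
  while (Peek = '<') and (Peek(1) <> '/') do
  begin
    ParseNode(Depth);
    SkipSpace;
  end;
end;

begin
  Src := '<List><Item>One</Item> <Item>Two</Item></List>';
  try
    ParseTagList(0);
    if Peek <> #0 then
      raise Exception.Create('Unexpected text at position ' + IntToStr(Cur));
  except
    on E: Exception do
      WriteLn('Parse error: ', E.Message);
  end;
end.
```

Each procedure mirrors one grammar production, which is what makes this style of parser a natural companion to the Interpreter pattern. A real parser would create the corresponding expression object at the points where this sketch calls WriteLn.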
The syntax tree is the essence of the Interpreter pattern. Astute readers will notice that it is a special case of the Composite pattern. The basis of the tree is the abstract expression class, which defines an abstract method to perform the Interpret operation. In our case this will be a search and replace (I told you the definition could be pretty broad). We will allow the operation just in data or in both tags and data. This is where the requirement to understand the structure of the document comes in.
We then go through our grammar defining subclasses, for each grammar element. There are two types of classes, although I don’t see the point of defining them in code, as there is not normally any difference between them that can be inherited. The first type is for terminal expressions, which are grammar elements that cannot be reduced further. In our grammar these are the PrologData, Data and TagName elements. These classes in fact turned out to be so trivial that I ended up refactoring them out, and they are now just string properties of the relevant non-terminal expression classes.
There is one class for each of the other grammar elements. Besides implementing the SearchAndReplace method, these classes have as properties instances of other expression classes from which they are constructed. The declarations of the interpreter classes are as follows (ignore the Accept routine for now).
// Forward declaration of base visitor class
TXmlInterpreterVisitor = class;

// Abstract base expression class
TXmlExpression = class(TObject)
private
protected
  function DoSearchAndReplace(const TargetStr, SearchStr, ReplaceStr : string) : string;
public
  // Declaring these methods abstract forces descendant classes to implement them
  procedure SearchAndReplace(const SearchStr, ReplaceStr : string;
                             DoTags : Boolean = False); virtual; abstract;
  procedure Accept(Visitor : TXmlInterpreterVisitor); virtual; abstract;
end;

TXmlStartTag = class(TXmlExpression)
private
  FTagName : string;
protected
public
  procedure SearchAndReplace(const SearchStr, ReplaceStr : string;
                             DoTags : Boolean = False); override;
  procedure Accept(Visitor : TXmlInterpreterVisitor); override;
  property TagName : string read FTagName write FTagName;
end;

TXmlEndTag = class(TXmlExpression)
private
  FTagName : string;
protected
public
  procedure SearchAndReplace(const SearchStr, ReplaceStr : string;
                             DoTags : Boolean = False); override;
  procedure Accept(Visitor : TXmlInterpreterVisitor); override;
  property TagName : string read FTagName write FTagName;
end;

TXmlTagList = class;

TXmlNode = class(TXmlExpression)
private
  FStartTag : TXmlStartTag;
  FData : string;
  FTagList : TXmlTagList;
  FEndTag : TXmlEndTag;
public
  destructor Destroy; override;
  procedure SearchAndReplace(const SearchStr, ReplaceStr : string;
                             DoTags : Boolean = False); override;
  procedure Accept(Visitor : TXmlInterpreterVisitor); override;
  property StartTag : TXmlStartTag read FStartTag write FStartTag;
  property EndTag : TXmlEndTag read FEndTag write FEndTag;
  property Data : string read FData write FData;
  property TagList : TXmlTagList read FTagList write FTagList;
end;

TXmlTagList = class(TXmlExpression)
private
  FList : TObjectList;
  function GetItem(Index : Integer) : TXmlNode;
protected
public
  constructor Create;
  destructor Destroy; override;
  function Add : TXmlNode;
  procedure SearchAndReplace(const SearchStr, ReplaceStr : string;
                             DoTags : Boolean = False); override;
  procedure Accept(Visitor : TXmlInterpreterVisitor); override;
  property Items[Index : Integer] : TXmlNode read GetItem; default;
end;

TXmlProlog = class(TXmlExpression)
private
  FData : string;
protected
public
  procedure SearchAndReplace(const SearchStr, ReplaceStr : string;
                             DoTags : Boolean = False); override;
  procedure Accept(Visitor : TXmlInterpreterVisitor); override;
  property Data : string read FData write FData;
end;

TXmlDoc = class(TXmlExpression)
private
  FProlog : TXmlProlog;
  FTagList : TXmlTagList;
protected
public
  destructor Destroy; override;
  procedure Clear;
  procedure SearchAndReplace(const SearchStr, ReplaceStr : string;
                             DoTags : Boolean = False); override;
  procedure Accept(Visitor : TXmlInterpreterVisitor); override;
  property Prolog : TXmlProlog read FProlog write FProlog;
  property TagList : TXmlTagList read FTagList write FTagList;
end;

// Equates to Client in the Interpreter pattern
TXmlInterpreter = class(TObject)
private
  FXmlDoc : TXmlDoc;
protected
public
  constructor Create;
  destructor Destroy; override;
  property XmlDoc : TXmlDoc read FXmlDoc write FXmlDoc;
end;

EXmlInterpreterError = class(Exception);
Note how the class definitions follow the grammar. The only variation is TXmlTagList, which includes a function to add new nodes to the list. Oh, and TXmlDoc has a method to allow us to clear the syntax tree. Note that any lists we define are of type TObjectList, and they are constructed such that they will free the items in the list when they themselves are destroyed.
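The ownership behaviour of TObjectList can be sketched as follows. This is a guess at how TXmlTagList might be implemented, not the sample's actual code: TXmlNode is reduced to a stand-in with a single Data property, the base class is plain TObject, and the Count function is an added helper that does not appear in the declaration above. The Contnrs unit is assumed to be reachable under that name (newer Delphi versions call it System.Contnrs).

```pascal
program TagListSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Contnrs;

type
  // Simplified stand-in for the TXmlNode class declared in the article.
  TXmlNode = class(TObject)
  private
    FData: string;
  public
    property Data: string read FData write FData;
  end;

  TXmlTagList = class(TObject)
  private
    FList: TObjectList;
    function GetItem(Index: Integer): TXmlNode;
  public
    constructor Create;
    destructor Destroy; override;
    function Add: TXmlNode;
    function Count: Integer; // assumed helper, not in the original declaration
    property Items[Index: Integer]: TXmlNode read GetItem; default;
  end;

constructor TXmlTagList.Create;
begin
  inherited Create;
  // OwnsObjects = True: the list frees its items when it is freed.
  FList := TObjectList.Create(True);
end;

destructor TXmlTagList.Destroy;
begin
  FList.Free; // also frees every TXmlNode still in the list
  inherited Destroy;
end;

function TXmlTagList.Add: TXmlNode;
begin
  // Create the node here so that the list is its only owner.
  Result := TXmlNode.Create;
  FList.Add(Result);
end;

function TXmlTagList.Count: Integer;
begin
  Result := FList.Count;
end;

function TXmlTagList.GetItem(Index: Integer): TXmlNode;
begin
  Result := FList[Index] as TXmlNode;
end;

var
  List: TXmlTagList;
begin
  List := TXmlTagList.Create;
  try
    List.Add.Data := 'Stuff 1 is here';
    List.Add.Data := 'Stuff 2 is here';
    WriteLn(List.Count, ' nodes; first: ', List[0].Data);
  finally
    List.Free; // no need to free the nodes individually
  end;
end.
```

Having Add create the node, rather than accepting one from the caller, keeps ownership in one place, which is presumably why the original declares Add without parameters.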
If we have a look at a couple of examples of the SearchAndReplace method, we will see the power of this pattern. Here is the version in TXmlDoc:
procedure TXmlDoc.SearchAndReplace(const SearchStr, ReplaceStr : string;
                                   DoTags : Boolean);
begin
  // Both parts of the document are optional, so check before delegating
  if Assigned(Prolog) then begin
    Prolog.SearchAndReplace(SearchStr, ReplaceStr, DoTags);
  end;

  if Assigned(TagList) then begin
    TagList.SearchAndReplace(SearchStr, ReplaceStr, DoTags);
  end;
end;
All we do is call the same method on the elements that make up this expression, that is, the Prolog and TagList properties. In this case, since they are optional, we check if they’re assigned first. This is the case in the other classes whenever there is a non-terminal expression. Whenever there is a terminal expression, we can actually perform the operation. For instance, here is the end tag method:
procedure TXmlEndTag.SearchAndReplace(const SearchStr, ReplaceStr : string;
                                      DoTags : Boolean);
begin
  // Tag names are only changed when the caller asks for it
  if not DoTags then begin
    Exit;
  end;
  TagName := DoSearchAndReplace(TagName, SearchStr, ReplaceStr);
end;
The DoSearchAndReplace method is declared in the base class, as it is used in several places.
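The body of DoSearchAndReplace is not shown. One plausible version, written here as a standalone function rather than a method of TXmlExpression, simply wraps StringReplace from SysUtils. The case-sensitive, replace-all behaviour and the guard for an empty search string are assumptions.

```pascal
program DoSearchAndReplaceSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Assumption: every case-sensitive occurrence of SearchStr is replaced.
function DoSearchAndReplace(const TargetStr, SearchStr, ReplaceStr: string): string;
begin
  if SearchStr = '' then
    Result := TargetStr // nothing to search for
  else
    Result := StringReplace(TargetStr, SearchStr, ReplaceStr, [rfReplaceAll]);
end;

begin
  WriteLn(DoSearchAndReplace('SomeStuff', 'Stuff', 'Thing')); // prints SomeThing
end.
```

Adding rfIgnoreCase to the flag set would make the search case-insensitive, if that is the behaviour wanted.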
There is one last participant that you may sometimes need, which is the context. This holds any global information that the interpreter needs. We haven’t got any, so there isn’t one in the example. If you do have one, it is normally passed as a parameter to the interpret operation methods.
And that’s it. Just define classes for each expression in the grammar, which have properties corresponding to the sub-expressions. To implement an interpret operation, define an abstract method in the base expression class. This forces descendant classes to implement it. In each implementation, call the method on all the sub-expression properties. For operations more complex than our example, some of the methods may have to do work of their own as well. And although we didn’t need it, sometimes it’s useful to have a reference to the parent node.
You can see that extending the grammar is pretty easy; you just need new classes, each of which is very similar, so the implementation is quite easy. At least, this is true until the number of classes gets too high – you would have hundreds in a Delphi interpreter, for instance. There are a number of issues about how to handle child operations, too, that Interpreter has in common with Composite, but we won’t go into them here.
Also, if you need to add several ways to interpret the syntax tree, then you will find that the code is practically identical for each. In that case, it’s time to use a different pattern.