今天突然想要记录一下自己的学习过程,一方面是为了以后更好的回顾,另一方面则是在记录的过程中希望看到自己的成长。因为我是根据一篇博客进行学习的,所以接下来可能会使用很多英语。
什么是编译器(compiler)或解释器(interpreter)呢?它们的功能是什么?它们之间又有什么曲别呢?
The goal of an interpreter or a compiler is to translate a source program in some high-level language into some other form taht a computer can recognize.For the purpose of this series,let's agree that if a translator translates a source program into machine language,it's a compiler.If a tanslator processes and executes the source program without translating it into machine language first,we call it intepreter.
We will start our first foray into interpreters and compilers by writing a simple interpreter of arithmetic expressions,as knowns as a calculator.Today 's goal is very simple and it's to make our calculator handle the addition of two single digit integer like 1+1.Here is the source code for our interpreter or calculator:
1 # Token types 2 # 3 # EOF (end-of-file) token is used to indicate that 4 # there is no more input left for lexical analysis 5 INTEGER, PLUS, EOF = 'INTEGER', 'PLUS', 'EOF' 6 7 8 class Token(object): 9 def __init__(self, type, value): 10 # token type: INTEGER, PLUS, or EOF 11 self.type = type 12 # token value: 0, 1, 2. 3, 4, 5, 6, 7, 8, 9, '+', or None 13 self.value = value 14 15 def __str__(self): 16 """String representation of the class instance. 17 18 Examples: 19 Token(INTEGER, 3) 20 Token(PLUS '+') 21 """ 22 return 'Token({type}, {value})'.format( 23 type=self.type, 24 value=repr(self.value) 25 ) 26 27 def __repr__(self): 28 return self.__str__() 29 30 31 class Interpreter(object): 32 def __init__(self, text): 33 # client string input, e.g. "3+5" 34 self.text = text 35 # self.pos is an index into self.text 36 self.pos = 0 37 # current token instance 38 self.current_token = None 39 40 def error(self): 41 raise Exception('Error parsing input') 42 43 def get_next_token(self): 44 """Lexical analyzer (also known as scanner or tokenizer) 45 46 This method is responsible for breaking a sentence 47 apart into tokens. One token at a time. 48 """ 49 text = self.text 50 51 # is self.pos index past the end of the self.text ? 52 # if so, then return EOF token because there is no more 53 # input left to convert into tokens 54 if self.pos > len(text) - 1: 55 return Token(EOF, None) 56 57 # get a character at the position self.pos and decide 58 # what token to create based on the single character 59 current_char = text[self.pos] 60 61 # if the character is a digit then convert it to 62 # integer, create an INTEGER token, increment self.pos 63 # index to point to the next character after the digit, 64 # and return the INTEGER token 65 if current_char.isdigit(): 66 token = Token(INTEGER, int(current_char)) 67 self.pos += 1 68 return token 69 70 if current_char == '+': 71 token = Token(PLUS, current_char) 72 self.pos += 1 73 return token 74 75 self.error() 76 77 def eat(self, token_type): 78 # compare the current token type with the passed token 79 # type and if they match then "eat" the current token 80 # and assign the next token to the self.current_token, 81 # otherwise raise an exception. 82 if self.current_token.type == token_type: 83 self.current_token = self.get_next_token() 84 else: 85 self.error() 86 87 def expr(self): 88 """expr -> INTEGER PLUS INTEGER""" 89 # set current token to the first token taken from the input 90 self.current_token = self.get_next_token() 91 92 # we expect the current token to be a single-digit integer 93 left = self.current_token 94 self.eat(INTEGER) 95 96 # we expect the current token to be a '+' token 97 op = self.current_token 98 self.eat(PLUS) 99 100 # we expect the current token to be a single-digit integer 101 right = self.current_token 102 self.eat(INTEGER) 103 # after the above call the self.current_token is set to 104 # EOF token 105 106 # at this point INTEGER PLUS INTEGER sequence of tokens 107 # has been successfully found and the method can just 108 # return the result of adding two integers, thus 109 # effectively interpreting client input 110 result = left.value + right.value 111 return result 112 113 114 def main(): 115 while True: 116 try: 117 # To run under Python3 replace 'raw_input' call 118 # with 'input' 119 text = raw_input('calc> ') 120 except EOFError: 121 break 122 if not text: 123 continue 124 interpreter = Interpreter(text) 125 result = interpreter.expr() 126 print(result) 127 128 129 if __name__ == '__main__': 130 main()
If we want our code to run successfully,we must follow some principles with our input:
1. Only single digit integers are allowed in the input
2.The only arithmetic operation supported at the moment is addition
3. No whitespace characters are allowed anyevery in the input
When we enter an expression 3+5 on the command line,we will get a string "3+5".In order for the interpreter to understand what to do with that string it first need to break the string into components called tokens.A token is a object that has a type and a value.The prograss of breaking the input string into tokens is called lexical analysis.So the first step our interpreter need to do is read the input of characters convert it into a stream of tokens.The part of interpreter taht does it is called lexical analyzer or lexer for short.
The method get_next_token of the interpreter class is our lexical analyzer.Every time we call it,we get the next token created from the input characters passed to our interpreter.After our interpreter has access to a stream of tokens made from the input characters,the interpreter needs to do something with it:it need to find the structure in the flat stream of tokens it gets from lexer get_next_token.Our interpreter except to find the following structure in that stream:INTEGER->PLUS->INTEGER.
That is,it tries to find a squence of tokens:integer followed by a plus sign followed by an integer.
The method responible for finding and interpreting that structure is expr.This method verifies that the sequence of tokens does indeed corresponed to the expected sequence of tokens,i.e INTEGER -> PLUS -> INTEGER.After it's successfully confirmed the structure,it generates the result by adding the value of the token on the left side of the PLUS and the right side of the PLUS,thus successfully interpreting the arithmmetic expression we passed to the interpreter.
The expr method itself use the helper method eat to verify the token type passed to the eat method matches the current token type.After matching the passed token type the eat method gets the next token and assigns it to current_token variable,thus effectively "eating" the currently matched token and advancing the imageinary pointer in the stream of tokens. If the structure in the stream of tokens doesn't correspond the excepted INTEGER PLUS INTEGER sequence of tokens the eat method throws an exception.
And the next goal we want to accomplish is below :
1 Modify the code to allow multiple-digit integers in the input,for example "12+3"
2 Add a method that skips whitespace characters so that your calculator can handle inputs with whitespace characters like " 12 +3"
3 Modify the code and instead of "+" handle "-" to evaluate subtractions like "7-5"