Writing a Pascal Interpreter by Hand (Part 1)

1. Motivation for writing an interpreter
2. Part 1
3. Part 2
4. Part 3

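1. Motivation for writing an interpreter

After learning Vue, I realized that string processing is central to writing a framework. Take Vue itself: directives such as v-on: and v-model are written as HTML attributes, yet we can embed JavaScript expressions in them. Under the hood this relies on compiler techniques: the framework parses the input strings and turns them into JavaScript code. That is the technical basis that makes two-way binding, list rendering with v-for, and so on feel so effortless in Vue. In addition, when working on CCF CSP problems I often get stuck on string-processing tasks; even when I manage to solve them, I feel I am seeing the trees but not the forest. So I want to pick up some compiler theory as soon as possible (self-study also tends to be far more efficient and motivating for me than learning in class), to lay a foundation for what comes later.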

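Why start learning compiler theory by writing an interpreter? I first tried to work straight through the "Dragon Book" (Compilers: Principles, Techniques, and Tools), but much of it is hard going. So I went to Zhihu to see how experienced people learned the subject, and one answer said that after copying out this interpreter project a few times, a lot of things simply click: "学习编译原理有什么好的书籍?" (What are some good books for learning compiler theory?), answer by 时雨 on Zhihu. I followed the link to the GitHub project and found that it had as many as 1.3k stars.
So I decided to start learning from it.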

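2. Part 1

(Gitee repository for this project, feel free to clone it: https://gitee.com/warrior__night/learn-writing-interpreter)

Reference: https://ruslanspivak.com/lsbasi-part1/

The task in part 1 is simple: the program takes a string containing two single-digit operands and supports only the "+" operator. For example, given "1+2" it outputs "3".

The original project is written in Python. To make it stick (and to stop myself from copying without thinking), I rewrote it in Java.

I wrote the following classes. Because Java is statically typed, each kind of token is wrapped in its own class (for example TK_Integer, the integer token, which extends Token; the others follow the same pattern), which makes them convenient to use.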
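The token classes themselves are not listed in this post. Below is a minimal sketch of what they might look like, inferred only from how Interpreter uses them (token.type, token.value, Token.TokenType.INTEGER). The field layout and the constructors are assumptions, not the project's actual code, and TK_Minus is only needed from part 2 onward.

```java
// Hypothetical token classes, inferred from how Interpreter uses them.
class Token {
    enum TokenType { INTEGER, PLUS, MINUS, EOF }

    final TokenType type;
    final Object value; // an Integer for INTEGER tokens, otherwise null (assumption)

    Token(TokenType type, Object value) {
        this.type = type;
        this.value = value;
    }

    @Override
    public String toString() {
        return "Token(" + type + ", " + value + ")";
    }
}

class TK_Integer extends Token {
    TK_Integer(int value) { super(TokenType.INTEGER, value); }
}

class TK_Plus extends Token {
    TK_Plus() { super(TokenType.PLUS, null); }
}

class TK_Minus extends Token {
    TK_Minus() { super(TokenType.MINUS, null); }
}

class TK_EOF extends Token {
    TK_EOF() { super(TokenType.EOF, null); }
}
```

Storing value as Object is what makes the casts such as (Integer)left.value in expr necessary; a generic Token<T> would avoid them, at the cost of a little more ceremony.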

The main class is Interpreter. Here is what it does in detail:

Member variables:

private final String text;
private int pos;
private Token currentToken;

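text is the input expression string, pos is the index of the character currently being read, and currentToken is the token currently being processed.

Constructor: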

public Interpreter(String text){
    this.text = text;
    this.pos = 0;
    this.currentToken = null;
}

It takes an expression string and sets the other fields to their initial values.

Error function:

public void error() throws Exception {
    throw new Exception("Error parsing input");
}

It throws an "Error parsing input" exception when the input does not match the current rules.

The getNextToken function, which fetches the next token:

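public Token getNextToken() throws Exception {
    // If the index has reached the end of the string, return TK_EOF
    if (pos > text.length()-1){
        return new TK_EOF();
    }

    char currentChar = text.charAt(pos);
    // If it is a digit, return TK_Integer (the integer token); part 1 only handles
    // single-digit numbers, so reading one character is enough
    if (Character.isDigit(currentChar)){
        Token token = new TK_Integer(Integer.parseInt(currentChar+""));
        // Move the pointer past the token that was just read
        pos++;
        return token;
    }
    // Symbols are handled the same way
    if (currentChar == '+'){
        Token token = new TK_Plus();
        pos++;
        return token;
    }
    // Anything other than a digit or "+" is an error
    this.error();
    // Never reached at runtime (error() always throws), but required by the compiler
    return null;
}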
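To see what getNextToken produces over a whole input, here is a small stand-alone sketch that walks the string with the same rules and prints the resulting token sequence. The class Part1TokenTrace and its trace method are illustrative names made up for this example, and it throws IllegalArgumentException rather than the plain Exception used in the project.

```java
// Condensed sketch of the part 1 lexer rules; names are illustrative only.
class Part1TokenTrace {
    // Returns the sequence of tokens the part 1 lexer would yield for `text`.
    static String trace(String text) {
        StringBuilder out = new StringBuilder();
        for (int pos = 0; pos < text.length(); pos++) {
            char c = text.charAt(pos);
            if (Character.isDigit(c)) {
                // part 1 treats every digit as its own INTEGER token
                out.append("INTEGER(").append(c).append(") ");
            } else if (c == '+') {
                out.append("PLUS ");
            } else {
                throw new IllegalArgumentException("Error parsing input");
            }
        }
        // running off the end of the string corresponds to TK_EOF
        return out.append("EOF").toString();
    }

    public static void main(String[] args) {
        System.out.println(trace("1+2")); // INTEGER(1) PLUS INTEGER(2) EOF
    }
}
```

Note that a space, as in "1 + 2", is rejected at this stage; whitespace handling only arrives in part 2.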

The eat function:

public void eat(Token.TokenType tokenType) throws Exception {
    if (currentToken.type == tokenType){
        currentToken = getNextToken();
    }
    else {
        this.error();
    }
}

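It checks that the current token has the expected type and, if so, reads the next token; otherwise it raises an error.

The function that performs the whole parse: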

public int expr() throws Exception {
    currentToken = getNextToken();
    // First operand
    Token left = currentToken;
    // Check that the first token is an integer, then read the next token
    eat(Token.TokenType.INTEGER);

    // Check that this token is "+"
    eat(Token.TokenType.PLUS);

    // Second operand
    Token right = currentToken;
    // Check that the second token is an integer, then read the next token
    eat(Token.TokenType.INTEGER);
    // Finally return the result of the operation
    return (Integer)left.value + (Integer)right.value;
}

The client class Main:

import java.util.Scanner;

public class Main {
    public static void main(String[] args) throws Exception {
        Scanner scanner = new Scanner(System.in);
        while (true){
            System.out.print("calc> ");
            String text = scanner.nextLine();
            if (text.equals("exit"))
                break;
            // Run the interpreter on the input line
            Interpreter interpreter = new Interpreter(text);
            int res = interpreter.expr();
            System.out.println("res: "+res);
        }
    }
}
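One thing to be aware of: because main declares throws Exception, the first malformed input ends the whole calc> session. A possible variation, shown here only as a sketch, catches the exception per line and keeps the loop alive. The Evaluator interface and the respond method are hypothetical names introduced for this example; in the real program the evaluator would be something like t -> new Interpreter(t).expr().

```java
// Sketch of a REPL step that reports parse errors instead of terminating.
// `Evaluator` stands in for "new Interpreter(text).expr()" and is hypothetical.
class ReplSketch {
    interface Evaluator {
        int eval(String text) throws Exception;
    }

    // Returns the line the REPL would print for one line of input.
    static String respond(String line, Evaluator evaluator) {
        try {
            return "res: " + evaluator.eval(line);
        } catch (Exception e) {
            // report the problem and let the caller keep looping
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Toy evaluator used only for this demo: it accepts exactly "1+2".
        Evaluator toy = text -> {
            if (!text.equals("1+2")) throw new Exception("Error parsing input");
            return 3;
        };
        System.out.println(respond("1+2", toy)); // res: 3
        System.out.println(respond("1+a", toy)); // error: Error parsing input
    }
}
```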

The output matches expectations, so part 1 is done!

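3. Part 2

Reference: https://ruslanspivak.com/lsbasi-part2/

Compared with part 1, part 2 adds support for "-" (subtraction) and skips spaces inside the expression.

Added or modified functions:

Add a member variable currentChar, the character the pointer currently points at: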

...
private Character currentChar;

Initialize currentChar in the constructor:

public Interpreter(String text){
    ...
    // An empty input has no first character, so start at end-of-input (null)
    this.currentChar = text.isEmpty() ? null : text.charAt(pos);
}

Add advance(), which moves the pointer forward:

private void advance(){
    pos++;
    if (pos>text.length()-1){
        this.currentChar = null;
    }
    else {
        this.currentChar = text.charAt(pos);
    }
}

Add a function that skips spaces:

private void skipWhitespace(){
    while (this.currentChar!=null && this.currentChar==' '){
        this.advance();
    }
}

A function that reads an integer:

private int integer(){
    StringBuilder sb = new StringBuilder();
    while (currentChar!=null && Character.isDigit(currentChar)){
        sb.append(currentChar);
        this.advance();
    }
    return Integer.parseInt(sb.toString());
}

With this function we can read multi-digit numbers.
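To try advance, skipWhitespace, and integer in isolation, the three can be pulled into a small stand-alone class. This is only a sketch: the class name ScannerSketch and the combined nextInteger method are made up for the example, while the bodies follow the functions shown above.

```java
// Self-contained sketch of the advance / skipWhitespace / integer trio.
// The class name and nextInteger are illustrative, not from the project.
class ScannerSketch {
    private final String text;
    private int pos = 0;
    private Character currentChar; // null once the input is exhausted

    ScannerSketch(String text) {
        this.text = text;
        if (text.isEmpty()) {
            this.currentChar = null;
        } else {
            this.currentChar = text.charAt(0);
        }
    }

    private void advance() {
        pos++;
        if (pos > text.length() - 1) {
            currentChar = null;
        } else {
            currentChar = text.charAt(pos);
        }
    }

    private void skipWhitespace() {
        while (currentChar != null && currentChar == ' ') {
            advance();
        }
    }

    // Skips leading spaces, then reads one run of consecutive digits.
    // Throws NumberFormatException if no digit is found at that position.
    int nextInteger() {
        skipWhitespace();
        StringBuilder sb = new StringBuilder();
        while (currentChar != null && Character.isDigit(currentChar)) {
            sb.append(currentChar);
            advance();
        }
        return Integer.parseInt(sb.toString());
    }

    public static void main(String[] args) {
        ScannerSketch s = new ScannerSketch("  123 45");
        System.out.println(s.nextInteger()); // 123
        System.out.println(s.nextInteger()); // 45
    }
}
```

Using a nullable Character as the end-of-input marker mirrors the None check in the Python original; a primitive char with a sentinel value would also work.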

Modify getNextToken():

private Token getNextToken() throws Exception {
    while (currentChar != null){
        // Skip over spaces
        if (this.currentChar == ' '){
            this.skipWhitespace();
            continue;
        }
        // Digits are handed to integer() to read the whole number
        if (Character.isDigit(currentChar)){
            return new TK_Integer(this.integer());
        }
        // "+" and "-" become TK_Plus and TK_Minus
        if (currentChar=='+'){
            this.advance();
            return new TK_Plus();
        }
        if (currentChar=='-'){
            this.advance();
            return new TK_Minus();
        }
        // Anything else is an error
        this.error();
    }
    // currentChar is null, so the input is exhausted: return TK_EOF
    return new TK_EOF();
}

Modify expr:

public int expr() throws Exception {
    currentToken = getNextToken();

    Token left = currentToken;
    eat(Token.TokenType.INTEGER);

    Token op = currentToken;
    if (op.type == Token.TokenType.PLUS)
        eat(Token.TokenType.PLUS);
    else
        eat(Token.TokenType.MINUS);

    Token right = currentToken;
    eat(Token.TokenType.INTEGER);

    if (op.type == Token.TokenType.PLUS)
        return (Integer)left.value + (Integer)right.value;
    else
        return (Integer)left.value - (Integer)right.value;
}

This is essentially the same as in part 1, so I will not go over it again.


4. Part 3

Reference: https://ruslanspivak.com/lsbasi-part3/


The task in part 3 is to evaluate addition and subtraction over any number of operands, such as "1+2 -3 +4".

Added or modified parts:

Add the term() function:

public int term() throws Exception {
    Token token = currentToken;
    this.eat(Token.TokenType.INTEGER);
    return (Integer) token.value;
}

This is just eat plus returning the int value.

Modify expr:

public int expr() throws Exception {
    currentToken = getNextToken();

    // Take the value of the current token and move the pointer
    int result = this.term();
    while (currentToken.type == Token.TokenType.PLUS || currentToken.type== Token.TokenType.MINUS){
        Token token = currentToken;
        if (token.type== Token.TokenType.PLUS){
            // Check the token type and move on
            eat(Token.TokenType.PLUS);
            // Add the next number and move the pointer
            result+=term();
        }
        else{
            // Same as above, but subtracting
            eat(Token.TokenType.MINUS);
            result-=term();
        }
    }
    return result;
}
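Since the snippets in this post are fragments, here is a compact, self-contained sketch of the whole part 3 flow that can be compiled and run on its own. It is not the project's code: the class name Part3Sketch is made up, tokens are reduced to a type plus an int value instead of separate TK_ classes, and errors use IllegalArgumentException. The loop in expr corresponds to the grammar rule expr : term ((PLUS | MINUS) term)*.

```java
// Compact, self-contained sketch of the part 3 interpreter.
// Grammar: expr : term ((PLUS | MINUS) term)*    term : INTEGER
// Class and method names are illustrative and differ from the project's.
class Part3Sketch {
    enum Type { INTEGER, PLUS, MINUS, EOF }

    private final String text;
    private int pos = 0;
    private Type currentType; // type of the current token
    private int currentValue; // value of the current token when it is an INTEGER

    Part3Sketch(String text) {
        this.text = text;
    }

    // Lexer: skips spaces, then recognises an integer, '+', '-' or end of input.
    private void nextToken() {
        while (pos < text.length() && text.charAt(pos) == ' ') pos++;
        if (pos >= text.length()) { currentType = Type.EOF; return; }
        char c = text.charAt(pos);
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < text.length() && Character.isDigit(text.charAt(pos))) pos++;
            currentType = Type.INTEGER;
            currentValue = Integer.parseInt(text.substring(start, pos));
        } else if (c == '+') {
            pos++;
            currentType = Type.PLUS;
        } else if (c == '-') {
            pos++;
            currentType = Type.MINUS;
        } else {
            throw new IllegalArgumentException("Error parsing input");
        }
    }

    // Same role as eat(): check the current token type, then move on.
    private void eat(Type expected) {
        if (currentType != expected) {
            throw new IllegalArgumentException("Error parsing input");
        }
        nextToken();
    }

    private int term() {
        int value = currentValue;
        eat(Type.INTEGER); // throws if the current token is not an INTEGER
        return value;
    }

    int expr() {
        nextToken();
        int result = term();
        // keep consuming "+ term" or "- term" pairs, left to right
        while (currentType == Type.PLUS || currentType == Type.MINUS) {
            if (currentType == Type.PLUS) {
                eat(Type.PLUS);
                result += term();
            } else {
                eat(Type.MINUS);
                result -= term();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(new Part3Sketch("1+2 -3 +4").expr()); // 4
    }
}
```

Because the loop folds each operand into result as soon as it is read, subtraction comes out left-associative: "10 - 3 - 2" is evaluated as (10 - 3) - 2. Like the version above, this sketch stops at the first token that is not "+" or "-" without checking for end of input.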


Next: Writing a Pascal Interpreter by Hand (Part 2)

posted @ 2021-08-04 13:25 CodeReaper
