Building a Programming Language in C: Q-Expressions
2020-04-10 23:22 云物互联
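Previous articles in this series
"Building a Programming Language in C: Interactive Parser"
"Building a Programming Language in C: Cross-Platform Portability"
"Building a Programming Language in C: Syntax Parser"
"Building a Programming Language in C: Abstract Syntax Tree"
"Building a Programming Language in C: Error Handling"
"Building a Programming Language in C: S-Expressions"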
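Q-Expressions
A Q-Expression (Quoted Expression) is, like an S-Expression, a kind of Lisp expression. Unlike an S-Expression, it is not subject to Lisp's evaluation mechanism. Other Lisps get this effect from Lisp macros (not to be confused with C preprocessor macros): a macro looks like an ordinary function but does not evaluate its arguments. One special macro called quote (') suppresses the evaluation of almost any expression, and it is the inspiration for the Q-Expression.
In other words, when a function is applied to it, a Q-Expression is not evaluated and is left exactly as written. This property makes Q-Expressions broadly useful: they can store and manipulate other Lisp values such as numbers, symbols, or S-Expressions.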
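Reading and Storing Input
Implementing the Q-Expression Parser
The syntax of a Q-Expression is almost identical to that of an S-Expression. The only difference is that a Q-Expression is wrapped in curly braces {} while an S-Expression is wrapped in parentheses (). The grammar rules are as follows: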
mpc_parser_t* Number = mpc_new("number");
mpc_parser_t* Symbol = mpc_new("symbol");
mpc_parser_t* Sexpr = mpc_new("sexpr");
mpc_parser_t* Qexpr = mpc_new("qexpr");
mpc_parser_t* Expr = mpc_new("expr");
mpc_parser_t* Lispy = mpc_new("lispy");

mpca_lang(MPCA_LANG_DEFAULT,
    " \
    number : /-?[0-9]+/ ; \
    symbol : '+' | '-' | '*' | '/' ; \
    sexpr : '(' <expr>* ')' ; \
    qexpr : '{' <expr>* '}' ; \
    expr : <number> | <symbol> | <sexpr> | <qexpr> ; \
    lispy : /^/ <expr>* /$/ ; \
    ",
    Number, Symbol, Sexpr, Qexpr, Expr, Lispy);

mpc_cleanup(6, Number, Symbol, Sexpr, Qexpr, Expr, Lispy);
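Reading Q-Expressions
Because Q-Expressions and S-Expressions have essentially the same form, their internal representation is largely the same too. We therefore reuse the S-Expression data structure to represent Q-Expressions.
As before, first add a type tag for Q-Expressions to the lval enumeration: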
enum { LVAL_ERR, LVAL_NUM, LVAL_SYM, LVAL_SEXPR, LVAL_QEXPR };
We also need a constructor for it:
/* A pointer to a new empty Qexpr lval */
lval* lval_qexpr(void) {
    lval* v = malloc(sizeof(lval));
    v->type = LVAL_QEXPR;
    v->count = 0;
    v->cell = NULL;
    return v;
}
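To make the reuse concrete, here is a minimal, self-contained sketch of a list value whose S-Expression and Q-Expression forms differ only in the type tag. The names mini_lval, mini_list, mini_num, mini_add, and mini_del are hypothetical stand-ins rather than the article's lval API, and error and symbol values are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical trimmed-down stand-in for lval: numbers and two list kinds. */
enum { MINI_NUM, MINI_SEXPR, MINI_QEXPR };

typedef struct mini_lval {
    int type;
    long num;                 /* meaningful when type == MINI_NUM */
    int count;                /* number of children for list types */
    struct mini_lval **cell;  /* child pointers for list types */
} mini_lval;

/* Number constructor. */
mini_lval *mini_num(long x) {
    mini_lval *v = malloc(sizeof(mini_lval));
    assert(v != NULL);
    v->type = MINI_NUM;
    v->num = x;
    v->count = 0;
    v->cell = NULL;
    return v;
}

/* One constructor serves both list kinds; only the tag differs. */
mini_lval *mini_list(int type) {
    assert(type == MINI_SEXPR || type == MINI_QEXPR);
    mini_lval *v = malloc(sizeof(mini_lval));
    assert(v != NULL);
    v->type = type;
    v->num = 0;
    v->count = 0;
    v->cell = NULL;
    return v;
}

/* Append child x to list v (works for either list kind). */
mini_lval *mini_add(mini_lval *v, mini_lval *x) {
    mini_lval **grown = realloc(v->cell, sizeof(mini_lval *) * (v->count + 1));
    assert(grown != NULL);
    v->cell = grown;
    v->cell[v->count++] = x;
    return v;
}

/* Free a value and, recursively, its children. */
void mini_del(mini_lval *v) {
    for (int i = 0; i < v->count; i++) {
        mini_del(v->cell[i]);
    }
    free(v->cell);
    free(v);
}
```

Calling mini_list(MINI_QEXPR) and then mini_add twice yields a two-element list tagged as a Q-Expression. Building the same list with MINI_SEXPR goes through exactly the same code path.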
The printing and deletion logic for Q-Expressions is likewise identical to that of S-Expressions, so we simply follow the same pattern:
void lval_print(lval* v) {
switch (v->type) {
case LVAL_NUM: printf("%li", v->num); break;
case LVAL_ERR: printf("Error: %s", v->err); break;
case LVAL_SYM: printf("%s", v->sym); break;
case LVAL_SEXPR: lval_expr_print(v, '(', ')'); break;
case LVAL_QEXPR: lval_expr_print(v, '{', '}'); break;
}
}
void lval_del(lval* v) {
switch (v->type) {
case LVAL_NUM: break;
case LVAL_ERR: free(v->err); break;
case LVAL_SYM: free(v->sym); break;
/* If Qexpr or Sexpr then delete all elements inside */
case LVAL_QEXPR:
case LVAL_SEXPR:
for (int i = 0; i < v->count; i++) {
lval_del(v->cell[i]);
}
/* Also free the memory allocated to contain the pointers */
free(v->cell);
break;
}
free(v);
}
Finally, update the reading function lval_read so that it reads Q-Expressions correctly:
if (strstr(t->tag, "qexpr")) { x = lval_qexpr(); }
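Because Q-Expressions reuse all of the S-Expression data fields, we can naturally reuse all of the S-Expression functions as well, such as lval_add. Add the following code to lval_read so that it recognizes and skips curly braces: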
if (strcmp(t->children[i]->contents, "(") == 0) { continue; }
if (strcmp(t->children[i]->contents, ")") == 0) { continue; }
if (strcmp(t->children[i]->contents, "}") == 0) { continue; }
if (strcmp(t->children[i]->contents, "{") == 0) { continue; }
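Note that since Q-Expressions have no evaluation rule of their own, none of the existing evaluation functions need to change.
Implementing the Q-Expression Functions
With Q-Expressions added, we still need to define a set of operations to manipulate them. Like the arithmetic operations, these operations define the concrete behavior of Q-Expressions: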
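The point can be illustrated with a small sketch of an evaluator that dispatches on the type tag. Everything here is hypothetical: toy_value is a flat stand-in for lval, and summing the items is only a placeholder for real S-Expression evaluation.

```c
#include <assert.h>

enum { K_NUM, K_SEXPR, K_QEXPR };

/* Hypothetical flat value: a kind tag plus up to four numbers. */
typedef struct {
    int kind;
    int count;
    long items[4];
} toy_value;

/* Stand-in evaluator: an S-Expression collapses to the sum of its items,
   while every other kind, including a Q-Expression, is returned as is. */
toy_value toy_eval(toy_value v) {
    assert(v.count >= 0 && v.count <= 4);
    if (v.kind != K_SEXPR) {
        return v;
    }
    toy_value sum = { K_NUM, 1, { 0 } };
    for (int i = 0; i < v.count; i++) {
        sum.items[0] += v.items[i];
    }
    return sum;
}
```

Because the only branch that does any work is the S-Expression one, adding a new tag such as K_QEXPR requires no change to the evaluator.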
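- list: takes one or more arguments and returns a new Q-Expression containing all of them.
- head: takes a Q-Expression and returns a Q-Expression containing only its first element.
- tail: takes a Q-Expression and returns a Q-Expression with the first element removed.
- join: takes one or more Q-Expressions and returns a single Q-Expression of them joined together.
- eval: takes a Q-Expression, treats it as an S-Expression, and evaluates it.
Just like the arithmetic operators added earlier, these new operators must be added to the symbol grammar rule: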
mpca_lang(MPCA_LANG_DEFAULT,
    " \
    number : /-?[0-9]+/ ; \
    symbol : \"list\" | \"head\" | \"tail\" \
           | \"join\" | \"eval\" | '+' | '-' | '*' | '/' ; \
    sexpr : '(' <expr>* ')' ; \
    qexpr : '{' <expr>* '}' ; \
    expr : <number> | <symbol> | <sexpr> | <qexpr> ; \
    lispy : /^/ <expr>* /$/ ; \
    ",
    Number, Symbol, Sexpr, Qexpr, Expr, Lispy);
Head & Tail
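Note that head and tail cannot run under certain conditions. First, exactly one argument must be passed, and it must be a Q-Expression. Second, that Q-Expression must not be empty.
- head: takes a Q-Expression and returns a Q-Expression containing only its first element. To implement it, repeatedly pop and delete the element at index 1 until only the first element remains.
- tail: takes a Q-Expression and returns a Q-Expression with the first element removed. To implement it, pop and delete the element at index 0; the remaining elements are the result.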
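The two pop-based strategies can be tried in isolation on a plain array of numbers. This is a sketch under simplifying assumptions: long_list, list_pop, keep_head, and drop_head are hypothetical names, the storage is owned by the caller, and nothing is reallocated or freed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flat list: a count plus caller-owned storage. */
typedef struct {
    int count;
    long *cell;
} long_list;

/* Remove and return the item at index i, shifting later items left
   (the same idea as lval_pop, minus the reallocation). */
long list_pop(long_list *l, int i) {
    assert(i >= 0 && i < l->count);
    long x = l->cell[i];
    memmove(&l->cell[i], &l->cell[i + 1],
            sizeof(long) * (size_t)(l->count - i - 1));
    l->count--;
    return x;
}

/* head: pop index 1 until only the first item is left. */
void keep_head(long_list *l) {
    while (l->count > 1) {
        list_pop(l, 1);
    }
}

/* tail: pop index 0 once; the rest is the result. */
void drop_head(long_list *l) {
    assert(l->count > 0);
    list_pop(l, 0);
}
```

Applied to the items 1 2 3 4, keep_head leaves the single item 1 and drop_head leaves 2 3 4.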
lval* builtin_head(lval* a) {
/* Check Error Conditions */
if (a->count != 1) {
lval_del(a);
return lval_err("Function 'head' passed too many arguments!");
}
if (a->cell[0]->type != LVAL_QEXPR) {
lval_del(a);
return lval_err("Function 'head' passed incorrect types!");
}
if (a->cell[0]->count == 0) {
lval_del(a);
return lval_err("Function 'head' passed {}!");
}
/* Otherwise take first argument */
lval* v = lval_take(a, 0);
/* Delete all elements that are not head and return */
while (v->count > 1) { lval_del(lval_pop(v, 1)); }
return v;
}
lval* builtin_tail(lval* a) {
/* Check Error Conditions */
if (a->count != 1) {
lval_del(a);
return lval_err("Function 'tail' passed too many arguments!");
}
if (a->cell[0]->type != LVAL_QEXPR) {
lval_del(a);
return lval_err("Function 'tail' passed incorrect types!");
}
if (a->cell[0]->count == 0) {
lval_del(a);
return lval_err("Function 'tail' passed {}!");
}
/* Take first argument */
lval* v = lval_take(a, 0);
/* Delete first element and return */
lval_del(lval_pop(v, 0));
return v;
}
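Simplifying the Code Above with C Macros
Although the head and tail functions above do what we need, the code is long and hard to read. Most of it is error handling, which obscures the actual logic. C preprocessor macros can solve this problem.
Here we define a macro named LASSERT to help with error handling. By convention, macro names are written in all capitals to distinguish them from C function names. LASSERT takes three parameters: args, cond, and err. It is defined as follows: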
#define LASSERT(args, cond, err) \
    if (!(cond)) { lval_del(args); return lval_err(err); }
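With this in place, the preprocessor generates the checking code for us whenever we supply these three arguments.
The head and tail functions rewritten with LASSERT: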
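As a side note on the macro's shape, a bare if inside a macro can misbehave when the macro is used in an unbraced if/else. A common C idiom is to wrap the body in do { ... } while (0). The sketch below shows that idiom on plain ints; CHECK, check_head_args, check_if_enabled, and the error codes are hypothetical, and the cleanup step that LASSERT performs with lval_del is omitted.

```c
#include <assert.h>

enum { CHECK_OK = 0, ERR_ARG_COUNT, ERR_ARG_TYPE, ERR_EMPTY };

/* Hypothetical variant of the early-return pattern, wrapped in
   do { ... } while (0) so it behaves as a single statement. */
#define CHECK(cond, code) \
    do { if (!(cond)) { return (code); } } while (0)

/* Mirrors the three checks that head performs, on plain ints. */
int check_head_args(int arg_count, int first_is_qexpr, int first_len) {
    assert(arg_count >= 0 && first_len >= 0);
    CHECK(arg_count == 1, ERR_ARG_COUNT);
    CHECK(first_is_qexpr, ERR_ARG_TYPE);
    CHECK(first_len != 0, ERR_EMPTY);
    return CHECK_OK;
}

/* The wrapped macro is safe in an unbraced if/else. */
int check_if_enabled(int enabled, int arg_count) {
    if (enabled)
        CHECK(arg_count == 1, ERR_ARG_COUNT);
    else
        return CHECK_OK;
    return CHECK_OK;
}
```

The first failing check decides the return value, so check_head_args(2, 1, 3) reports the argument-count error before the type or emptiness checks are reached.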
lval* builtin_head(lval* a) {
LASSERT(a, a->count == 1,
"Function 'head' passed too many arguments!");
LASSERT(a, a->cell[0]->type == LVAL_QEXPR,
"Function 'head' passed incorrect type!");
LASSERT(a, a->cell[0]->count != 0,
"Function 'head' passed {}!");
lval* v = lval_take(a, 0);
while (v->count > 1) { lval_del(lval_pop(v, 1)); }
return v;
}
lval* builtin_tail(lval* a) {
LASSERT(a, a->count == 1,
"Function 'tail' passed too many arguments!");
LASSERT(a, a->cell[0]->type == LVAL_QEXPR,
"Function 'tail' passed incorrect type!");
LASSERT(a, a->cell[0]->count != 0,
"Function 'tail' passed {}!");
lval* v = lval_take(a, 0);
lval_del(lval_pop(v, 0));
return v;
}
List & Eval
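- list is simple: it converts the input S-Expression, which holds the arguments, into a Q-Expression and returns it.
- eval is the conversion in the other direction: it takes a Q-Expression, converts it to an S-Expression, and evaluates it with lval_eval.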
lval* builtin_list(lval* a) {
a->type = LVAL_QEXPR;
return a;
}
lval* builtin_eval(lval* a) {
LASSERT(a, a->count == 1,
"Function 'eval' passed too many arguments!");
LASSERT(a, a->cell[0]->type == LVAL_QEXPR,
"Function 'eval' passed incorrect type!");
lval* x = lval_take(a, 0);
x->type = LVAL_SEXPR;
return lval_eval(x);
}
Join
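join takes multiple arguments, so its structure looks more like the builtin_op function defined earlier. First make sure every argument is a Q-Expression, then join them one by one. For this we define lval_join, which pops each element from y and adds it to x, then deletes y and returns x.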
/* Defined first so that builtin_join can call it without a forward declaration */
lval* lval_join(lval* x, lval* y) {
    /* For each cell in 'y' add it to 'x' */
    while (y->count) {
        x = lval_add(x, lval_pop(y, 0));
    }
    /* Delete the empty 'y' and return 'x' */
    lval_del(y);
    return x;
}

lval* builtin_join(lval* a) {
    /* Every argument must be a Q-Expression */
    for (int i = 0; i < a->count; i++) {
        LASSERT(a, a->cell[i]->type == LVAL_QEXPR,
            "Function 'join' passed incorrect type.");
    }
    /* Fold the remaining arguments into the first one */
    lval* x = lval_pop(a, 0);
    while (a->count) {
        x = lval_join(x, lval_pop(a, 0));
    }
    lval_del(a);
    return x;
}
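The same move-everything-then-discard idea can be sketched on a plain growable array of numbers. num_list, num_add, and num_join are hypothetical names. Unlike lval_join, this sketch walks y by index instead of popping from the front, which avoids shifting the remaining items on every step.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list of numbers. */
typedef struct {
    int count;
    long *cell;
} num_list;

/* Append x to the end of l, growing the storage by one slot. */
void num_add(num_list *l, long x) {
    long *grown = realloc(l->cell, sizeof(long) * (size_t)(l->count + 1));
    assert(grown != NULL);
    l->cell = grown;
    l->cell[l->count++] = x;
}

/* Move every item of y to the end of x, then release y's storage. */
void num_join(num_list *x, num_list *y) {
    for (int i = 0; i < y->count; i++) {
        num_add(x, y->cell[i]);
    }
    free(y->cell);
    y->cell = NULL;
    y->count = 0;
}
```

Joining a list holding 1 2 with a list holding 3 4 leaves the first list holding 1 2 3 4 and the second one empty.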
Function Lookup
Finally, we need a function that calls the right builtin based on the symbol supplied. This can be done with strcmp and strstr.
lval* builtin(lval* a, char* func) {
if (strcmp("list", func) == 0) { return builtin_list(a); }
if (strcmp("head", func) == 0) { return builtin_head(a); }
if (strcmp("tail", func) == 0) { return builtin_tail(a); }
if (strcmp("join", func) == 0) { return builtin_join(a); }
if (strcmp("eval", func) == 0) { return builtin_eval(a); }
if (strstr("+-/*", func)) { return builtin_op(a, func); }
lval_del(a);
return lval_err("Unknown Function!");
}
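One detail about the strstr test is worth noting: strstr("+-/*", func) succeeds for any substring of "+-/*", which includes multi-character strings such as "+-" and also the empty string. With the grammar above, the symbol rule only produces the single-character operators, so this is harmless here. A stricter check could be sketched as follows; is_arith_op is a hypothetical helper, not part of the article's code.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 only when func is exactly one of "+", "-", "*", "/". */
int is_arith_op(const char *func) {
    assert(func != NULL);
    /* The length test also rules out the empty string, for which
       strchr would otherwise match the terminating '\0'. */
    return strlen(func) == 1 && strchr("+-*/", func[0]) != NULL;
}
```

With this helper the dispatch line would read if (is_arith_op(func)) { return builtin_op(a, func); }, and anything else falls through to the "Unknown Function!" error.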
Also change the earlier lval_eval_sexpr function so that it calls the new builtin function:
/* Call builtin with operator */
lval* result = builtin(v, f->sym);
lval_del(f);
return result;
Source Code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include "mpc.h"

#define LASSERT(args, cond, err) \
    if (!(cond)) { lval_del(args); return lval_err(err); }

#ifdef _WIN32
static char buffer[2048];
/* Minimal readline replacement for Windows */
char *readline(char *prompt) {
    fputs(prompt, stdout);
    fgets(buffer, 2048, stdin);
    char *cpy = malloc(strlen(buffer) + 1);
    strcpy(cpy, buffer);
    cpy[strlen(cpy) - 1] = '\0';
    return cpy;
}
/* History is not supported by this replacement */
void add_history(char *unused) {}
#else
#ifdef __linux__
#include <readline/readline.h>
#include <readline/history.h>
#endif
#ifdef __MACH__
#include <readline/readline.h>
#endif
#endif
/* Create Enumeration of Possible lval Types */
enum {
LVAL_NUM,
LVAL_ERR,
LVAL_SYM,
LVAL_SEXPR,
LVAL_QEXPR
};
/* Declare New lval Struct */
typedef struct lval {
int type;
long num;
/* Count and Pointer to a list of "lval*" */
struct lval** cell;
int count;
/* Error and Symbol types have some string data */
char *err;
char *sym;
} lval;
/* Construct a pointer to a new Number lval */
lval *lval_num(long x) {
lval *v = malloc(sizeof(lval));
v->type = LVAL_NUM;
v->num = x;
return v;
}
/* Construct a pointer to a new Error lval */
lval *lval_err(char *msg) {
lval *v = malloc(sizeof(lval));
v->type = LVAL_ERR;
v->err = malloc(strlen(msg) + 1);
strcpy(v->err, msg);
return v;
}
/* Construct a pointer to a new Symbol lval */
lval *lval_sym(char *sym) {
lval *v = malloc(sizeof(lval));
v->type = LVAL_SYM;
v->sym = malloc(strlen(sym) + 1);
strcpy(v->sym, sym);
return v;
}
/* A pointer to a new empty Sexpr lval */
lval *lval_sexpr(void) {
lval *v = malloc(sizeof(lval));
v->type = LVAL_SEXPR;
v->count = 0;
v->cell = NULL;
return v;
}
/* A pointer to a new empty Qexpr lval */
lval *lval_qexpr(void) {
lval *v = malloc(sizeof(lval));
v->type = LVAL_QEXPR;
v->count = 0;
v->cell = NULL;
return v;
}
void lval_del(lval *v) {
switch (v->type) {
/* Do nothing special for number type */
case LVAL_NUM:
break;
/* For Err or Sym free the string data */
case LVAL_ERR:
free(v->err);
break;
case LVAL_SYM:
free(v->sym);
break;
/* If Qexpr or Sexpr then delete all elements inside */
case LVAL_QEXPR:
case LVAL_SEXPR:
for (int i = 0; i < v->count; i++) {
lval_del(v->cell[i]);
}
/* Also free the memory allocated to contain the pointers */
free(v->cell);
break;
}
/* Free the memory allocated for the "lval" struct itself */
free(v);
}
lval *lval_add(lval *v, lval *x) {
v->count++;
v->cell = realloc(v->cell, sizeof(lval*) * v->count);
v->cell[v->count-1] = x;
return v;
}
lval *lval_read_num(mpc_ast_t *t) {
errno = 0;
long x = strtol(t->contents, NULL, 10);
return errno != ERANGE
? lval_num(x)
: lval_err("invalid number");
}
lval *lval_read(mpc_ast_t *t) {
/* If Symbol or Number return conversion to that type */
if (strstr(t->tag, "number")) {
return lval_read_num(t);
}
if (strstr(t->tag, "symbol")) {
return lval_sym(t->contents);
}
/* If root (>) or sexpr then create empty list */
lval *x = NULL;
if (strcmp(t->tag, ">") == 0) {
x = lval_sexpr();
}
if (strstr(t->tag, "sexpr")) {
x = lval_sexpr();
}
if (strstr(t->tag, "qexpr")) {
x = lval_qexpr();
}
/* Fill this list with any valid expression contained within */
for (int i = 0; i < t->children_num; i++) {
if (strcmp(t->children[i]->contents, "(") == 0) { continue; }
if (strcmp(t->children[i]->contents, ")") == 0) { continue; }
if (strcmp(t->children[i]->contents, "}") == 0) { continue; }
if (strcmp(t->children[i]->contents, "{") == 0) { continue; }
if (strcmp(t->children[i]->tag, "regex") == 0) { continue; }
x = lval_add(x, lval_read(t->children[i]));
}
return x;
}
void lval_print(lval *v);
void lval_expr_print(lval *v, char open, char close) {
putchar(open);
for (int i = 0; i < v->count; i++) {
/* Print Value contained within */
lval_print(v->cell[i]);
/* Don't print trailing space if last element */
if (i != (v->count-1)) {
putchar(' ');
}
}
putchar(close);
}
/* Print an "lval*" */
void lval_print(lval *v) {
switch (v->type) {
case LVAL_NUM: printf("%li", v->num); break;
case LVAL_ERR: printf("Error: %s", v->err); break;
case LVAL_SYM: printf("%s", v->sym); break;
case LVAL_SEXPR: lval_expr_print(v, '(', ')'); break;
case LVAL_QEXPR: lval_expr_print(v, '{', '}'); break;
}
}
/* Print an "lval" followed by a newline */
void lval_println(lval *v) {
lval_print(v);
putchar('\n');
}
lval *lval_pop(lval *v, int i) {
/* Find the item at "i" */
lval *x = v->cell[i];
/* Shift memory after the item at "i" over the top */
memmove(&v->cell[i], &v->cell[i+1],
sizeof(lval*) * (v->count-i-1));
/* Decrease the count of items in the list */
v->count--;
/* Reallocate the memory used */
v->cell = realloc(v->cell, sizeof(lval*) * v->count);
return x;
}
lval *lval_take(lval *v, int i) {
lval *x = lval_pop(v, i);
lval_del(v);
return x;
}
lval *builtin_op(lval *a, char *op) {
/* Ensure all arguments are numbers */
for (int i = 0; i < a->count; i++) {
if (a->cell[i]->type != LVAL_NUM) {
lval_del(a);
return lval_err("Cannot operate on non-number!");
}
}
/* Pop the first element */
lval *x = lval_pop(a, 0);
/* If no arguments and sub then perform unary negation */
if ((strcmp(op, "-") == 0) && a->count == 0) {
x->num = -x->num;
}
/* While there are still elements remaining */
while (a->count > 0) {
/* Pop the next element */
lval *y = lval_pop(a, 0);
if (strcmp(op, "+") == 0) { x->num += y->num; }
if (strcmp(op, "-") == 0) { x->num -= y->num; }
if (strcmp(op, "*") == 0) { x->num *= y->num; }
if (strcmp(op, "/") == 0) {
if (y->num == 0) {
lval_del(x);
lval_del(y);
x = lval_err("Division By Zero!");
break;
}
x->num /= y->num;
}
lval_del(y);
}
lval_del(a);
return x;
}
lval *lval_eval(lval *v);
lval *builtin(lval* a, char* func);
lval *lval_eval_sexpr(lval *v) {
/* Evaluate Children */
for (int i = 0; i < v->count; i++) {
v->cell[i] = lval_eval(v->cell[i]);
}
/* Error Checking */
for (int i = 0; i < v->count; i++) {
if (v->cell[i]->type == LVAL_ERR) {
return lval_take(v, i);
}
}
/* Empty Expression */
if (v->count == 0) { return v; }
/* Single Expression */
if (v->count == 1) { return lval_take(v, 0); }
/* Ensure First Element is Symbol */
lval *f = lval_pop(v, 0);
if (f->type != LVAL_SYM) {
lval_del(f);
lval_del(v);
return lval_err("S-expression Does not start with symbol!");
}
/* Call builtin with operator */
lval *result = builtin(v, f->sym);
lval_del(f);
return result;
}
lval *lval_eval(lval *v) {
/* Evaluate Sexpressions */
if (v->type == LVAL_SEXPR) {
return lval_eval_sexpr(v);
}
/* All other lval types remain the same */
return v;
}
lval *builtin_head(lval *a) {
LASSERT(a, a->count == 1,
"Function 'head' passed too many arguments!");
LASSERT(a, a->cell[0]->type == LVAL_QEXPR,
"Function 'head' passed incorrect type!");
LASSERT(a, a->cell[0]->count != 0,
"Function 'head' passed {}!");
/* Otherwise take first argument */
lval *v = lval_take(a, 0);
/* Delete all elements that are not head and return */
while (v->count > 1) {
lval_del(lval_pop(v, 1));
}
return v;
}
lval *builtin_tail(lval *a) {
LASSERT(a, a->count == 1,
"Function 'tail' passed too many arguments!");
LASSERT(a, a->cell[0]->type == LVAL_QEXPR,
"Function 'tail' passed incorrect type!");
LASSERT(a, a->cell[0]->count != 0,
"Function 'tail' passed {}!");
/* Take first argument */
lval *v = lval_take(a, 0);
/* Delete first element and return */
lval_del(lval_pop(v, 0));
return v;
}
lval *builtin_list(lval *a) {
a->type = LVAL_QEXPR;
return a;
}
lval *builtin_eval(lval *a) {
LASSERT(a, a->count == 1,
"Function 'eval' passed too many arguments!");
LASSERT(a, a->cell[0]->type == LVAL_QEXPR,
"Function 'eval' passed incorrect type!");
lval *x = lval_take(a, 0);
x->type = LVAL_SEXPR;
return lval_eval(x);
}
lval *lval_join(lval *x, lval *y) {
/* For each cell in 'y' add it to 'x' */
while (y->count) {
x = lval_add(x, lval_pop(y, 0));
}
/* Delete the empty 'y' and return 'x' */
lval_del(y);
return x;
}
lval *builtin_join(lval *a) {
for (int i = 0; i < a->count; i++) {
LASSERT(a, a->cell[i]->type == LVAL_QEXPR,
"Function 'join' passed incorrect type.");
}
lval *x = lval_pop(a, 0);
while (a->count) {
x = lval_join(x, lval_pop(a, 0));
}
lval_del(a);
return x;
}
lval *builtin(lval* a, char* func) {
if (strcmp("list", func) == 0) { return builtin_list(a); }
if (strcmp("head", func) == 0) { return builtin_head(a); }
if (strcmp("tail", func) == 0) { return builtin_tail(a); }
if (strcmp("join", func) == 0) { return builtin_join(a); }
if (strcmp("eval", func) == 0) { return builtin_eval(a); }
if (strstr("+-/*", func)) { return builtin_op(a, func); }
lval_del(a);
return lval_err("Unknown Function!");
}
int main(int argc, char *argv[]) {
/* Create Some Parsers */
mpc_parser_t *Number = mpc_new("number");
mpc_parser_t* Symbol = mpc_new("symbol");
mpc_parser_t* Sexpr = mpc_new("sexpr");
mpc_parser_t *Qexpr = mpc_new("qexpr");
mpc_parser_t *Expr = mpc_new("expr");
mpc_parser_t *Lispy = mpc_new("lispy");
/* Define them with the following Language */
mpca_lang(MPCA_LANG_DEFAULT,
" \
number : /-?[0-9]+/ ; \
symbol : \"list\" | \"head\" | \"tail\" \
| \"join\" | \"eval\" \
| '+' | '-' | '*' | '/' ; \
sexpr : '(' <expr>* ')' ; \
qexpr : '{' <expr>* '}' ; \
expr : <number> | <symbol> | <sexpr> | <qexpr> ; \
lispy : /^/ <expr>* /$/ ; \
",
Number, Symbol, Sexpr, Qexpr, Expr, Lispy);
puts("Lispy Version 0.1");
puts("Press Ctrl+c to Exit\n");
while(1) {
char *input = readline("lispy> ");
/* readline returns NULL at end-of-file (e.g. Ctrl+D); leave the loop */
if (input == NULL) { break; }
add_history(input);
/* Attempt to parse the user input */
mpc_result_t r;
if (mpc_parse("<stdin>", input, Lispy, &r)) {
/* On success print and delete the AST */
lval *x = lval_eval(lval_read(r.output));
lval_println(x);
lval_del(x);
mpc_ast_delete(r.output);
} else {
/* Otherwise print and delete the Error */
mpc_err_print(r.error);
mpc_err_delete(r.error);
}
free(input);
}
/* Undefine and delete our parsers */
mpc_cleanup(6, Number, Symbol, Sexpr, Qexpr, Expr, Lispy);
return 0;
}
Compile:
gcc -g -std=c99 -Wall parsing.c mpc.c -lreadline -lm -o parsing
Run:
$ ./parsing
Lispy Version 0.1
Press Ctrl+c to Exit
lispy> list 1 2 3 4
{1 2 3 4}
lispy> {head (list 1 2 3 4)}
{head (list 1 2 3 4)}
lispy> eval {head (list 1 2 3 4)}
{1}
lispy> tail {tail tail tail}
{tail tail}
lispy> eval (tail {tail tail {5 6 7}})
{6 7}
lispy> eval (head {(+ 1 2) (+ 10 20)})
3
lispy> ^C