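Lua 5.3 closure implementation

Upvalues

Before introducing closures, we need to define upvalues. An upvalue is a local variable of an enclosing function that an inner function refers to. Note that the term applies only to local variables.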

local function f1()
    local x = 1
    local function f2()
        print(x)
    end

    f2()
end

f1()

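In the code above, f1 defines a local variable x, and f2 reads that local of its enclosing function, so we say that x is an upvalue of f2.

Closures

In Lua, every function value is a closure. What is a closure? It is a reference to a function prototype together with the outer local variables (upvalues) the function needs to access. Lua has two kinds of closures: C function closures and Lua function closures.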

typedef union Closure {
  CClosure c; // C function closure
  LClosure l; // Lua function closure
} Closure;

C function closures

The structure of a C function closure is defined as follows:

/*
** Type for C functions registered with Lua
*/
typedef int (*lua_CFunction) (lua_State *L);


#define ClosureHeader \
	CommonHeader; lu_byte nupvalues; GCObject *gclist

typedef struct CClosure {
  ClosureHeader;
  lua_CFunction f;
  TValue upvalue[1];  /* list of upvalues */
} CClosure;

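A C closure is fairly simple: a common closure header ClosureHeader, a pointer f to the C function, and an upvalue array with nupvalues entries. The field is declared as upvalue[1] because that is the classic C idiom for a variable-length trailing array; it is not a statement about what a closure must contain. When nupvalues is greater than 1, the extra upvalues follow the struct directly in the same allocation. (In Lua 5.3, lua_pushcclosure with n == 0 pushes a light C function and creates no CClosure object at all, so a CClosure always has at least one upvalue in practice.)

ClosureHeader | f | upvalue[0] | upvalue[1] | ... | upvalue[nupvalues-1]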
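The layout above can be sketched in standalone C. This is only an illustration under stated assumptions: Value, MiniCClosure, mini_size and mini_new are hypothetical stand-ins, not Lua's types, although mini_size follows the same shape as Lua's sizeCclosure(n) macro (one slot already lives inside the struct).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for TValue; the real one carries a type tag too. */
typedef struct { double n; } Value;

/* Hypothetical, simplified CClosure: header reduced to the upvalue count. */
typedef struct MiniCClosure {
  unsigned char nupvalues;      /* part of ClosureHeader in Lua */
  int (*f)(void);               /* stands in for lua_CFunction */
  Value upvalue[1];             /* first slot; the others trail the struct */
} MiniCClosure;

/* Bytes needed for a closure with n >= 1 upvalues. */
size_t mini_size(int n) {
  assert(n >= 1);
  return sizeof(MiniCClosure) + sizeof(Value) * (size_t)(n - 1);
}

/* Allocate the struct and its trailing upvalues in one block. */
MiniCClosure *mini_new(int n) {
  MiniCClosure *c = malloc(mini_size(n));
  if (c == NULL) return NULL;
  c->nupvalues = (unsigned char)n;
  c->f = NULL;
  for (int i = 0; i < n; i++) c->upvalue[i].n = 0.0;
  return c;
}
```

Indexing past upvalue[0] relies on the same trailing-array idiom the Lua source uses; a C99 flexible array member would be the modern spelling.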

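#include <stdio.h>
#include "lua.h"
#include "lauxlib.h"

/* C function that reads its data from two upvalues instead of arguments */
static int f1(lua_State *L) {
    int i = (int)lua_tointeger(L, lua_upvalueindex(1));
    const char *str = lua_tostring(L, lua_upvalueindex(2));
    printf("%d %s\n", i, str);
    return 0;
}


int main(int argc, char const *argv[]) {
    (void)argc; (void)argv;
    lua_State *L = luaL_newstate();
    if (L == NULL) return 1;

    // Create a C closure with two upvalues: 1 and "hello"
    lua_pushinteger(L, 1);
    lua_pushstring(L, "hello");
    lua_pushcclosure(L, f1, 2);

    // Call it with 0 arguments and 0 results
    lua_call(L, 0, 0);

    lua_close(L);
    return 0;
}

/**
Output:
1 hello
**/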

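Previously we would pass data to f1 as arguments, with lua_call specifying the argument count. Here the closure stores the data instead, and f1 reads it back through its upvalues. As the example above shows, upvalues on a C closure can serve as a convenient way to pass parameters, but their main use is to hold references to context objects so that the function can reach them easily.

Lua function closures

A Lua function closure differs from a C function closure in how it stores upvalues and in having a function prototype. Here is the Lua closure structure: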

typedef struct LClosure {
  ClosureHeader;
  // function prototype
  struct Proto *p;
  // array of upvalue pointers
  UpVal *upvals[1];  /* list of upvalues */
} LClosure;

/*
** Function Prototypes
*/
typedef struct Proto {
  CommonHeader;
  lu_byte numparams;  /* number of fixed parameters */
  // whether the function is vararg
  lu_byte is_vararg;
  lu_byte maxstacksize;  /* number of registers needed by this function */
  int sizeupvalues;  /* size of 'upvalues' */
  int sizek;  /* size of 'k' */
  // number of instructions in 'code'
  int sizecode;
  // number of entries in 'lineinfo'
  int sizelineinfo;
  int sizep;  /* size of 'p' */
  // number of entries in 'locvars'
  int sizelocvars;
  // line where the function definition starts
  int linedefined;  /* debug information  */
  // line where the function definition ends
  int lastlinedefined;  /* debug information  */
  // constant array
  TValue *k;  /* constants used by the function */
  // instruction array
  Instruction *code;  /* opcodes */
  // nested function prototypes
  struct Proto **p;  /* functions defined inside the function */
  int *lineinfo;  /* map from opcodes to source lines (debug information) */
  LocVar *locvars;  /* information about local variables (debug information) */
  // upvalue descriptions
  Upvaldesc *upvalues;  /* upvalue information */
  struct LClosure *cache;  /* last-created closure with this prototype */
  // name of the source chunk
  TString  *source;  /* used for debug information */
  // link in the GC gray list, chained from g->gray
  GCObject *gclist;
} Proto;


/*
** Upvalues for Lua closures
*/
struct UpVal {
  TValue *v;  /* points to stack or to its own value */
  lu_mem refcount;  /* reference counter */
  union {
    struct {  /* (when open) */
      UpVal *next;  /* linked list */
      int touched;  /* mark to avoid cycles with dead threads */
    } open;
    TValue value;  /* the value (when closed) */
  } u;
};

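Proto, the function prototype, stores the local variable information locvars, the constant table k, the upvalue descriptions upvalues, the source file the function was defined in (source), the first and last line of the definition, the number of local variables, and so on.

An UpVal holds a pointer v to the outer local variable (once the enclosing function has returned, v points at u.value instead). One upvalue may be referenced by several Lua closures, so it carries a reference count: the count is incremented when a closure captures the upvalue and decremented when that closure is garbage collected, and the UpVal is destroyed when the count drops to 0.

Consider this Lua closure code first: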

local function f1()
    local x = 1
    local function f2()
        print(x)
    end

    f2()
    return f2
end

local call = f1()
call()

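On line 11, once the call f1() finishes, x goes out of scope. It lived on the stack, and its slot is released when the function returns, yet the function f2 returned by f1 still refers to x. How is that handled?

Inside a closure, an upvalue is always accessed through UpVal.v. While the local variable is still on the stack, that is, while the enclosing function has not yet exited, the inner function's UpVal.v points at the stack slot. When the enclosing function finishes and the local is about to be destroyed, the value is copied from the stack into UpVal.u.value, and UpVal.v is redirected to point at UpVal.u.value.

You can think of an UpVal as having two states, open and closed.

open: UpVal.v points at the local variable on the stack (the local is still alive).
closed: after the enclosing function returns, the local variable on the stack (which can no longer be accessed) is copied into UpVal.u.value and UpVal.v is set to &UpVal.u.value, so UpVal.v no longer depends on the stack.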
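The two states can be modelled in a few lines of standalone C. This is a minimal sketch, not Lua's code: MiniUpVal, upval_open, upval_close, upval_is_open and demo_open_close are hypothetical names, and Value is a plain int standing in for TValue. The open test mirrors the idea that an upvalue is open while v does not point at its own storage.

```c
#include <assert.h>

typedef int Value;                  /* hypothetical stand-in for TValue */

typedef struct MiniUpVal {
  Value *v;                         /* stack slot (open) or &value (closed) */
  Value value;                      /* storage used once closed */
} MiniUpVal;

/* Open: alias a live stack slot. */
void upval_open(MiniUpVal *uv, Value *slot) { uv->v = slot; }

/* Close: copy the slot's current value, then retarget the pointer. */
void upval_close(MiniUpVal *uv) {
  uv->value = *uv->v;
  uv->v = &uv->value;
}

int upval_is_open(const MiniUpVal *uv) { return uv->v != &uv->value; }

/* Walks through both states; returns what the closure sees at the end. */
Value demo_open_close(void) {
  Value stack[4] = {0};
  MiniUpVal uv;
  stack[0] = 1;                     /* local x = 1 lives in slot 0 */
  upval_open(&uv, &stack[0]);
  stack[0] = 5;                     /* a write to x is visible while open */
  assert(upval_is_open(&uv) && *uv.v == 5);
  upval_close(&uv);                 /* the enclosing function returns */
  stack[0] = 99;                    /* the slot is reused by a later call */
  assert(!upval_is_open(&uv));
  return *uv.v;                     /* unaffected by the reuse */
}
```

Because readers and writers always go through v, the closure's code is identical in both states; only the pointer target changes.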

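In the code above, when f2 is first called on line 7, f1 has not returned yet, so the upvalue x read on line 4 is still on the stack and LClosure.upvals[1]->v of the f2 closure points at x's stack slot. When the call on line 11 completes and f1 returns, x is removed from the stack, but its value is first copied into the upvalue's own storage. In pseudocode:

uv = LClosure.upvals[1];
uv->u.value = *uv->v;     /* copy x out of the stack */
uv->v = &uv->u.value;     /* v now points at the UpVal's own storage */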

What if several closures access the same upvalue x, as below?

local function f1()
    local x = 1
    local function f2()
        print("f2", x)
    end

    local function f3()
        print("f3", x)
    end

    f2()
    f3()
    return f2, f3
end

local call2, call3 = f1()
call2() call3()

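When f1 returns, do f2 and f3 each copy x into their own upvalue arrays? No, they do not.

When several Lua closures refer to the same outer variable x, they all point at the same UpVal object, both while the enclosing function f1 is running and after it has returned.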
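A small standalone C model makes the sharing concrete. It is a sketch under assumptions: MiniUpVal, MiniClosure, capture, close_upval and demo_shared are hypothetical names, and Value is an int standing in for TValue. Two closures capture one UpVal, its reference count becomes 2, and after closing, a write through one closure is visible through the other.

```c
#include <assert.h>

typedef int Value;                  /* hypothetical stand-in for TValue */

typedef struct MiniUpVal {
  Value *v;                         /* stack slot (open) or &value (closed) */
  unsigned refcount;                /* number of closures pointing here */
  Value value;
} MiniUpVal;

typedef struct MiniClosure {
  MiniUpVal *upvals[1];             /* each closure stores pointers only */
} MiniClosure;

/* A closure captures an upvalue by pointer and bumps the count. */
void capture(MiniClosure *cl, MiniUpVal *uv) {
  cl->upvals[0] = uv;
  uv->refcount++;
}

void close_upval(MiniUpVal *uv) {
  uv->value = *uv->v;
  uv->v = &uv->value;
}

/* f2 and f3 share x; returns what f3 reads after f2 writes. */
Value demo_shared(void) {
  Value stack[1] = {1};             /* local x = 1 in f1's frame */
  MiniUpVal x = { &stack[0], 0, 0 };
  MiniClosure f2, f3;
  capture(&f2, &x);
  capture(&f3, &x);
  assert(f2.upvals[0] == f3.upvals[0]);   /* one shared object */
  assert(x.refcount == 2);
  close_upval(&x);                  /* f1 returns: a single copy is made */
  *f2.upvals[0]->v = 10;            /* f2 assigns x = 10 */
  return *f3.upvals[0]->v;          /* f3 observes the same variable */
}
```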

Creating Lua closures

Next, let's look at how closures are created, using the same example saved as aaa.lua:

local function f1()
    local x = 1
    local function f2()
        print("f2", x)
    end

    local function f3()
        print("f3", x)
    end

    f2()
    f3()
    return f2, f3
end

local call2, call3 = f1()
call2() call3()

Use luac -l -l aaa.lua to view the instruction listing for f1:

-- Instruction listing for f1
function <aaa.lua:1,14> (11 instructions at 0000000000738700)
0 params, 5 slots, 1 upvalue, 3 locals, 1 constant, 2 functions
        1       [2]     LOADK           0 -1    ; 1
        2       [5]     CLOSURE         1 0     ; 00000000007388e0
        3       [9]     CLOSURE         2 1     ; 00000000007387d0
        4       [11]    MOVE            3 1
        5       [11]    CALL            3 1 1
        6       [12]    MOVE            3 2
        7       [12]    CALL            3 1 1
        8       [13]    MOVE            3 1
        9       [13]    MOVE            4 2
        10      [13]    RETURN          3 3
        11      [14]    RETURN          0 1

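The first line, function <aaa.lua:1,14> (11 instructions at 0000000000738700), says that the function is defined in aaa.lua from line 1 to line 14, that its address is 0000000000738700, and that it has 11 instructions.

0 params, 5 slots, 1 upvalue, 3 locals, 1 constant, 2 functions means 0 fixed parameters, 5 registers, 1 upvalue, 3 local variables (x, f2, f3), 1 constant, and 2 nested functions (f2 and f3).

Each line from the third onward is one instruction. In 1 [2] LOADK 0 -1 ; 1, the leading 1 is the instruction number, [2] is the source line the instruction belongs to, LOADK is the opcode name, and 0 -1 are its operands: load the constant in slot 0 of the constant table (luac prints constant index 0 as -1) into register 0. A register here is a slot on the stack.

f1 defines two functions, f2 and f3, so two CLOSURE instructions are generated, each of which creates a Lua function closure.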

void luaV_execute (lua_State *L) {
    ...
    vmcase(OP_CLOSURE) {
        Proto *p = cl->p->p[GETARG_Bx(i)];
        LClosure *ncl = getcached(p, cl->upvals, base);  /* cached closure */
        if (ncl == NULL)  /* no match? */
            pushclosure(L, p, cl->upvals, base, ra);  /* create a new one */
        else
            setclLvalue(L, ra, ncl);  /* push cashed closure */
        checkGC(L, ra + 1);
        vmbreak;
    }
    ...
}

/*
** create a new Lua closure, push it in the stack, and initialize
** its upvalues. Note that the closure is not cached if prototype is
** already black (which means that 'cache' was already cleared by the
** GC).
*/
static void pushclosure (lua_State *L, Proto *p, UpVal **encup, StkId base,
                         StkId ra) {
  int nup = p->sizeupvalues;
  Upvaldesc *uv = p->upvalues;
  int i;
  LClosure *ncl = luaF_newLclosure(L, nup);  // create a new Lua closure
  ncl->p = p;  // the closure references its function prototype
  setclLvalue(L, ra, ncl);  /* anchor new closure in stack, i.e. stack[ra] = ncl */
  for (i = 0; i < nup; i++) {  /* fill in its upvalues; nup comes from the prototype */
    if (uv[i].instack)  /* upvalue refers to local variable? then find (or create) the UpVal for that stack slot */
      ncl->upvals[i] = luaF_findupval(L, base + uv[i].idx);
    else  /* get upvalue from enclosing function: it is already in the enclosing closure's upvalue array */
      ncl->upvals[i] = encup[uv[i].idx];
    ncl->upvals[i]->refcount++;  // one more closure references this UpVal
    /* new closure is white, so we do not need a barrier here */
  }
  if (!isblack(p))  /* cache will not break GC invariant? */
    p->cache = ncl;  /* save it on cache for reuse */
}

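Line 6 of the luaV_execute excerpt checks whether a matching closure is already cached. If so it is reused; otherwise pushclosure creates a new closure and stores it in the stack. The code also shows the two main parts of a Lua closure: a pointer to its function prototype and an array of upvalues. A C closure has the same overall shape, with a C function pointer in place of the prototype.

luaF_findupval searches the openupval list for the UpVal that corresponds to x's stack slot (the slot index comes from the prototype's Upvaldesc). If one already exists, the closure references it and pushclosure increments its reference count; if none is found, a new UpVal is created and linked into the openupval list. The source: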

UpVal *luaF_findupval (lua_State *L, StkId level) {
  UpVal **pp = &L->openupval;
  UpVal *p;
  UpVal *uv;
  lua_assert(isintwups(L) || L->openupval == NULL);
  while (*pp != NULL && (p = *pp)->v >= level) {
    lua_assert(upisopen(p));
    if (p->v == level)  /* found a corresponding upvalue? */
      return p;  /* return it */
    pp = &p->u.open.next;
  }
  // Not found: create a new upvalue and link it in at *pp, which keeps the list ordered by stack level (highest first)
  /* not found: create a new upvalue */
  uv = luaM_new(L, UpVal);
  uv->refcount = 0;
  uv->u.open.next = *pp;  /* link it to list of open upvalues */
  uv->u.open.touched = 1;
  *pp = uv;
  uv->v = level;  /* current value lives in the stack */
  if (!isintwups(L)) {  /* thread not in list of threads with upvalues? */
    L->twups = G(L)->twups;  /* link it to the list */
    G(L)->twups = L;
  }
  return uv;
}

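The openupval list is kept sorted by stack address, highest slot first. luaF_findupval walks past every entry whose slot is at or above level and links the new UpVal at that point, so a new node lands at the head only when its slot is higher than every existing entry. The order in which the closure body reads its upvalues does not matter, because UpVal objects are created when the CLOSURE instruction executes, not when the variable is read. For example, if f3 also uses the outer locals y and z and reads them in the order z, y, x, the list still ends up ordered x, z, y (stack slots 2, 1, 0):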

local function f1()
    local y = 2
    local z = 3
    local x = 1

    local function f2()
        print("f2", x)
    end

    local function f3()
        local tmp1 = z -- read outer variable z first
        local tmp2 = y -- then read outer variable y
        print("f3", x, tmp1, tmp2)  -- then read outer variable x
    end

    f2()
    f3()
    return f2, f3
end

local call2, call3 = f1()
call2() call3()

Figure f
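The ordering can be checked with a standalone C model of the find-or-insert loop. This is a sketch, assuming a simplified MiniUpVal with only the slot pointer and the next link; find_upval, free_list and demo_order are hypothetical names. The capture sequence follows the example: f2's CLOSURE captures x, then f3's CLOSURE captures z, y and x again.

```c
#include <assert.h>
#include <stdlib.h>

typedef int Value;                  /* hypothetical stand-in for TValue */

typedef struct MiniUpVal {
  Value *v;                         /* stack slot this upvalue aliases */
  struct MiniUpVal *next;           /* next entry, lower stack address */
} MiniUpVal;

/* Find the upvalue for 'level' or insert one, keeping descending order. */
MiniUpVal *find_upval(MiniUpVal **list, Value *level) {
  MiniUpVal **pp = list;
  MiniUpVal *p;
  while (*pp != NULL && (p = *pp)->v >= level) {
    if (p->v == level) return p;    /* already open: reuse it */
    pp = &p->next;
  }
  MiniUpVal *uv = malloc(sizeof *uv);
  if (uv == NULL) return NULL;
  uv->v = level;
  uv->next = *pp;                   /* link in front of the lower slots */
  *pp = uv;
  return uv;
}

void free_list(MiniUpVal *list) {
  while (list != NULL) {
    MiniUpVal *next = list->next;
    free(list);
    list = next;
  }
}

/* Returns 1 when the list order is x, z, y and x was not duplicated. */
int demo_order(void) {
  Value stack[3] = {2, 3, 1};       /* slot 0 = y, slot 1 = z, slot 2 = x */
  MiniUpVal *open = NULL;
  MiniUpVal *ux = find_upval(&open, &stack[2]);   /* f2 captures x */
  MiniUpVal *uz = find_upval(&open, &stack[1]);   /* f3 captures z */
  MiniUpVal *uy = find_upval(&open, &stack[0]);   /* f3 captures y */
  MiniUpVal *ux2 = find_upval(&open, &stack[2]);  /* f3 captures x */
  int ok = ux && uz && uy && ux2 == ux &&
           open == ux && ux->next == uz && uz->next == uy && uy->next == NULL;
  free_list(open);
  return ok;
}
```

In this sequence z and y are linked after x, at the tail rather than the head, which is what the sorted walk implies.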

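During parsing, a RETURN instruction is always emitted at the end of a function body, whether or not the programmer wrote a return statement. The RETURN implementation then walks the openupval list and closes every UpVal that refers to a local variable of the returning function.

In short, when the enclosing function finishes, luaF_close is called to close all UpVals in openupval that belong to that function's frame. level points at the function's first stack slot; every captured variable at or above level must be closed, that is, its value is copied from the stack into the UpVal's own u.value so that inner closures can still reach it. The code: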

vmcase(OP_RETURN) {
    int b = GETARG_B(i);
    if (cl->p->sizep > 0) luaF_close(L, base);
...

// Close the open upvalues at or above level: free an UpVal whose refcount is 0, otherwise copy the stack value into the UpVal itself
void luaF_close (lua_State *L, StkId level) {
  UpVal *uv;
  while (L->openupval != NULL && (uv = L->openupval)->v >= level) {
    lua_assert(upisopen(uv));
    L->openupval = uv->u.open.next;  /* remove from 'open' list */
    if (uv->refcount == 0)  /* no references? */
      luaM_free(L, uv);  /* free upvalue */
    else {
      setobj(L, &uv->u.value, uv->v);  /* move value to upvalue slot */
      uv->v = &uv->u.value;  /* now current value lives here */
      luaC_upvalbarrier(L, uv);
    }
  }
}

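luaF_close is also called elsewhere. Whenever captured locals leave the stack, for example on leaving the scope of a for, while, if, or do...end block, this function runs (in Lua 5.3 the compiler emits a JMP whose A operand tells the VM to close upvalues from that stack level). Afterwards the UpVal holds the local's value itself. The process is transparent to the function, because it always accesses upvalues indirectly.

Here is an example of upvalues in a for loop scope: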
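The closing pass can be modelled in standalone C as well. This is a sketch under assumptions: MiniUpVal is a simplified structure, and push_open, close_upvals and demo_close are hypothetical helpers (push_open expects ascending slot order so the list stays sorted highest first). It closes everything at or above a level, frees unreferenced entries, and leaves lower slots open.

```c
#include <assert.h>
#include <stdlib.h>

typedef int Value;                  /* hypothetical stand-in for TValue */

typedef struct MiniUpVal {
  Value *v;
  unsigned refcount;
  struct MiniUpVal *next;           /* meaningful while open */
  Value value;                      /* meaningful once closed */
} MiniUpVal;

/* Link a new open upvalue at the head; call in ascending slot order. */
MiniUpVal *push_open(MiniUpVal **list, Value *slot, unsigned refs) {
  MiniUpVal *uv = malloc(sizeof *uv);
  if (uv == NULL) return NULL;
  uv->v = slot; uv->refcount = refs; uv->next = *list; uv->value = 0;
  *list = uv;
  return uv;
}

/* Close every open upvalue at or above 'level'; returns how many survive. */
int close_upvals(MiniUpVal **list, Value *level) {
  int closed = 0;
  MiniUpVal *uv;
  while (*list != NULL && (uv = *list)->v >= level) {
    *list = uv->next;               /* unlink from the open list */
    if (uv->refcount == 0) {
      free(uv);                     /* nobody references it */
    } else {
      uv->value = *uv->v;           /* move the value into the upvalue */
      uv->v = &uv->value;
      closed++;
    }
  }
  return closed;
}

int demo_close(void) {
  Value stack[3] = {10, 20, 30};
  MiniUpVal *open = NULL;
  MiniUpVal *a = push_open(&open, &stack[0], 1);
  MiniUpVal *b = push_open(&open, &stack[1], 1);
  MiniUpVal *c = push_open(&open, &stack[2], 0);  /* unreferenced */
  assert(a && b && c);
  int n = close_upvals(&open, &stack[1]);         /* slots 1..2 leave scope */
  int ok = n == 1 && open == a && a->v == &stack[0] &&
           b->v == &b->value && b->value == 20;
  free(a);                          /* in Lua the closures own these */
  free(b);
  return ok;
}
```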

local t = {}
for i = 1, 5 do
    local x = i
    table.insert(t, function() print(x) end)
end

for _, f in ipairs(t) do
    f()
end
-- Output: 1 2 3 4 5 (one number per line)

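In the code above, the for loop creates an anonymous function five times. Each one refers to a local variable x, and x has a different value in each iteration.

x is scoped to the loop body. When the next iteration begins, the previous x leaves scope and the upvalue held by that iteration's anonymous function becomes closed (via luaF_close), so every anonymous function keeps its own UpVal and the output is as shown above.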
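The per-iteration behaviour can be sketched in standalone C. The names here (MiniUpVal, close_upval, demo_loop) are hypothetical, Value is an int standing in for TValue, and the sketch assumes at most 8 iterations. A single stack slot is reused for x, yet each iteration leaves behind its own closed UpVal.

```c
#include <assert.h>

typedef int Value;                  /* hypothetical stand-in for TValue */

typedef struct MiniUpVal {
  Value *v;
  Value value;
} MiniUpVal;

void close_upval(MiniUpVal *uv) {
  uv->value = *uv->v;
  uv->v = &uv->value;
}

/* Fills out[0..n-1] with what each simulated closure would print. */
void demo_loop(Value *out, int n) {
  Value slot = 0;                   /* the one stack slot reused for x */
  MiniUpVal uvs[8];                 /* one UpVal per iteration */
  assert(n >= 0 && n <= 8);
  for (int i = 0; i < n; i++) {
    slot = i + 1;                   /* local x = i */
    uvs[i].v = &slot;               /* CLOSURE opens an upvalue on the slot */
    close_upval(&uvs[i]);           /* loop body ends: x leaves scope */
  }
  for (int i = 0; i < n; i++)
    out[i] = *uvs[i].v;             /* each closure reads its own copy */
}
```

If the upvalues were not closed at the end of each iteration, all five closures would alias the same slot and observe only its last value.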

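There is one more case: what if a closure's upvalue belongs not to the immediately enclosing function but to the function enclosing that one? How does Lua resolve the upvalue reference then?

Consider the following example: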

local function f1()
    local x = 1

    local function f2()
        local function f3()
            print(x)
        end
    end
end

The upvalue x used by f3 is defined in the outermost function f1, and f2 itself never uses x. We can inspect the instructions for f2 and f3 with luac -l -l:

-- Instruction listing for f2
function <aaa.lua:4,8> (2 instructions at 0000000000ad88e0)
0 params, 2 slots, 2 upvalues, 1 local, 0 constants, 1 function
        1       [7]     CLOSURE         0 0     ; 0000000000ad89b0
        2       [8]     RETURN          0 1
constants (0) for 0000000000ad88e0:
locals (1) for 0000000000ad88e0:
        0       f3      2       3
upvalues (2) for 0000000000ad88e0: -- the upvalues used are _ENV and x
        0       _ENV    0       0
        1       x       1       0

-- Instruction listing for f3
function <aaa.lua:5,7> (4 instructions at 0000000000ad89b0)
0 params, 2 slots, 2 upvalues, 0 locals, 1 constant, 0 functions
        1       [6]     GETTABUP        0 0 -1  ; _ENV "print"
        2       [6]     GETUPVAL        1 1     ; x
        3       [6]     CALL            0 2 1
        4       [7]     RETURN          0 1
constants (1) for 0000000000ad89b0:
        1       "print"
locals (0) for 0000000000ad89b0:
upvalues (2) for 0000000000ad89b0:
        0       _ENV    0       0
        1       x       0       1

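As the listing shows, f2 records x in its upvalue table (upvalues (2) for 0000000000ad88e0:) even though it never uses x. In other words, when a closure refers to a local of some enclosing function, every function nested between the two also records that upvalue, whether or not it uses it.

If you read pushclosure closely, you will see the test if (uv[i].instack), which asks whether the upvalue is a local of the immediately enclosing function. If so, luaF_findupval finds or creates the UpVal. If the test is false, the variable lives two or more levels out, and the new closure simply copies the UpVal * from the enclosing closure's upvalue array. Every level therefore points at the same UpVal, which was created by luaF_findupval when the closure of the first-level nested function was built. In the example the first-level nested function is f2: f3 neither creates an UpVal nor searches the openupval list, it just takes f2's UpVal, because the UpVal for x was necessarily created when the CLOSURE instruction for f2 executed. No diagram is drawn here; it would look like Figure f above.

How does Lua know that the UpVal must be created for the first-level nested function rather than for a function nested deeper?

The answer lies in the parsing phase. A prototype describes each upvalue with an Upvaldesc, whose instack field records whether the variable is on the enclosing function's stack: 1 if it is, 0 if not (see singlevaraux in lparser.c). instack is 1 only in the first-level nested function and 0 in functions nested deeper, so the UpVal is created exactly once, for the first-level nested function, and deeper functions only reference it. For example, if f1 contains three levels of nested functions, the UpVal for x is created only for f2, and f3 and f4 just reference that object.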
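The resolution rule can be sketched in standalone C using the instack/idx pairs from the luac listing above (f2: _ENV 0 0, x 1 0; f3: _ENV 0 0, x 0 1). This is a simplified model, not Lua's code: MiniUpvaldesc, MiniClosure, bind_upvalues and demo_nested are hypothetical names, and an array with one UpVal per stack slot stands in for luaF_findupval.

```c
#include <assert.h>
#include <stddef.h>

typedef int Value;                          /* hypothetical stand-in for TValue */
typedef struct { Value *v; } MiniUpVal;

/* Mirrors the two Upvaldesc fields the resolution loop reads. */
typedef struct { unsigned char instack, idx; } MiniUpvaldesc;

typedef struct { MiniUpVal *upvals[2]; } MiniClosure;

/* Same branch structure as the loop in pushclosure. */
void bind_upvalues(MiniClosure *ncl, const MiniUpvaldesc *desc, int nup,
                   MiniUpVal *frame_upvals, MiniUpVal **encup) {
  for (int i = 0; i < nup; i++) {
    if (desc[i].instack)                    /* local of the enclosing function */
      ncl->upvals[i] = &frame_upvals[desc[i].idx];
    else                                    /* inherited from the enclosing closure */
      ncl->upvals[i] = encup[desc[i].idx];
  }
}

/* Returns 1 when f2 and f3 end up sharing the single UpVal for x. */
int demo_nested(void) {
  Value env = 0, x = 1;
  MiniUpVal env_uv = { &env };
  MiniUpVal f1_frame[1] = { { &x } };       /* f1's slot 0 holds x */
  MiniClosure f1 = { { &env_uv, NULL } };   /* f1 itself only has _ENV */
  const MiniUpvaldesc f2_desc[2] = { {0, 0}, {1, 0} };
  const MiniUpvaldesc f3_desc[2] = { {0, 0}, {0, 1} };
  MiniClosure f2, f3;
  bind_upvalues(&f2, f2_desc, 2, f1_frame, f1.upvals);  /* CLOSURE run in f1 */
  bind_upvalues(&f3, f3_desc, 2, NULL, f2.upvals);      /* CLOSURE run in f2 */
  return f3.upvals[1] == f2.upvals[1] &&
         f3.upvals[1] == &f1_frame[0] &&
         *f3.upvals[1]->v == 1;
}
```

f3 never touches f1's frame: it only copies a pointer out of f2's upvalue array, which is why f2 must record x even though its own body never reads it.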

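Summary

Lua has two kinds of closures, C function closures and Lua function closures. A closure consists of upvalues that refer to outer local variables plus a reference to the function itself: a function prototype for a Lua closure, a C function pointer for a C closure. A C closure is the simpler of the two: it has no enclosing function, so its upvalues are just an array of TValue.

Upvalues of Lua closures are more involved and have two states, open and closed. While the enclosing function is still running (open), the upvalue refers to the local variable on the stack. After the enclosing function returns (closed), the upvalue points at the value stored inside the UpVal object itself. If an UpVal is referenced by n closures, its refcount is n; only after all of those closures have been garbage collected does refcount drop to 0 and the UpVal get freed. UpVal objects are thus reclaimed by reference counting.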


posted @ 2024-04-27 23:01 墨色山水
