Python and JavaScript Calling Each Other, Explained in Detail (1): Basic Principles, Part 1 - Via Subprocesses and Inter-Process Communication (IPC)

TL;DR
- Python calling JavaScript
- JavaScript calling Python
How it works
Pros
Cons

The first thing to understand is that JavaScript and Python are both interpreted languages, and running them requires a concrete runtime.

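Python: the Python most of us install is actually CPython, whose interpreter is written in C. There are other interpreters as well, such as PyPy. Broadly speaking, the biggest problem with using a runtime other than CPython is package compatibility: many packages installed from PyPI, especially C extensions built against CPython's C API, may not work or may need special builds.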
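JavaScript: common engines include Google's V8 and Mozilla's SpiderMonkey, which turn JavaScript code into machine code and execute it. On top of these engines one can build browsers that support JS (Chrome's JS engine, for instance, is V8), or richer JS runtime environments such as Node.js, which lets us run JS code without a browser at all. Node.js also made JS package management much more convenient; if we want to use a package developed for Node.js in the browser, the Node.js-based source code has to be compiled (bundled) into JS code that the browser supports.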

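Throughout this article, we use the following terms:

Main language: the language the final main program is written in

Secondary language: the other language, i.e. the one that is not the main language

For example, when Python calls JS, Python is the main language and JS is the secondary language.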

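TL;DR

This approach is suitable when:

- The Python and JavaScript runtimes (essentially CPython [not Cython!] and Node.js) are both installed.
- The secondary language uses some complex packages (e.g. numpy on the Python side, or some Node.js C++ addons on the JavaScript side).
- If runtime efficiency matters:
  - Python and JavaScript should not interact too often, and the objects passed between them should not be too large or complex; ideally they are all serializable.
  - JavaScript should not account for too small a share of the work. Otherwise, when Python calls JS, starting the Node.js subprocess can take longer than running the actual program; and when JS calls Python, since JS runs fast, a lot of time is spent waiting on Python.
- Because the IPC will most likely rely on threads to synchronize input and output, keep concurrency such as multiprocessing and multithreading in the main language to a minimum.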

There are libraries for this! Use a library! Seriously, use a library!

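Python calling JavaScript

JSPyBridge: pip install javascript
- Pros:
  1. The author still maintains it and responds to issues and ships updates quickly.
  2. Supports fairly recent Python and Node versions; easy to install.
  3. Basically supports calls in both directions, including binding or passing callback functions.
- Cons: there is no proper teardown mechanism. `import javascript` is treated as connecting to the JS side and initializes all the threads it needs. If the Python main program wants to restart its connection to JS, or uses multiprocessing and wants to connect to JS once in each process, this is hard to do and error-prone.
PyExecJS: pip install PyExecJS, the package most older technical articles recommend
- Pros: supports runtimes other than Node.js, such as PhantomJS
- Cons: End of Life; the author has stopped maintaining it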

JavaScript calling Python

(This didn't match my project's needs, so I know less about these.)

JSPyBridge: npm i pythonia
node-python-bridge: npm install python-bridge
python-shell: npm install python-shell

How it works

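First, this method requires that both languages have an installed runtime that can be invoked from the command line to run a file or a script string. For example, after installing CPython we can run a Python program with `python a.py`, and after installing Node.js we can run JS with `node a.js` or `node -e "some script"`.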

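Of course, in the simplest case, if we only need to call the secondary language once with no interaction (or at most one round of interaction), simply invoking its CLI is enough: pass the input through stdin or command-line arguments, and read the command's output as the secondary language's output.
For example, Python can use subprocess.Popen, subprocess.call, subprocess.check_output or os.system (which only returns the exit status, not the output); Node.js can use functions from child_process such as exec, execSync or spawn (fork is meant for launching other Node.js modules). Note that if other packages are needed, on the Node.js side the command should be run in the directory where node_modules lives, and on the Python side the PYTHONPATH environment variable has to be set correctly.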

```python
# Need to set the working directory to the directory where `node_modules` resides if necessary
>>> import subprocess
>>> a, b = 1, 2
>>> print(subprocess.check_output(["node", "-e", f"console.log({a}+{b})"]))
b'3\n'
>>> print(subprocess.check_output(["node", "-e", f"console.log({a}+{b})"]).decode('utf-8'))
3
```

```javascript
// Need to set PYTHONPATH in advance if necessary
const a = 1;
const b = 2;
const { execSync } = require("child_process");
console.log(execSync(`python -c "print(${a}+${b})"`));
//<Buffer 33 0a>
console.log(execSync(`python -c "print(${a}+${b})"`).toString());
//3
//
```
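Interpolating data into the `-e`/`-c` script string with an f-string quickly breaks on quotes and escaping once the input is more than a couple of numbers. A minimal sketch, assuming the secondary-language script reads one JSON document from stdin and prints one JSON document to stdout (the helper `call_once` and the script strings `NODE_ADD`/`PY_ADD` are hypothetical), keeps the data out of the command line entirely and uses `subprocess.run` with `cwd`/`env` for the node_modules and PYTHONPATH concerns mentioned above:

```python
import json
import subprocess
import sys


def call_once(cmd, payload, cwd=None, env=None, timeout=30):
    """Run `cmd` once: send `payload` as JSON on stdin, parse stdout as JSON.

    `cwd` can point at the directory containing node_modules for a Node.js child;
    `env` can carry PYTHONPATH for a Python child, e.g. dict(os.environ, PYTHONPATH=...).
    Raises subprocess.CalledProcessError if the child exits with a non-zero status.
    """
    completed = subprocess.run(
        cmd,
        input=json.dumps(payload),  # a str is accepted because text=True
        capture_output=True,
        text=True,
        check=True,
        cwd=cwd,
        env=env,
        timeout=timeout,
    )
    return json.loads(completed.stdout)


# Node.js side (needs `node` on PATH): collect stdin, add the two numbers, print JSON.
NODE_ADD = (
    "let s = ''; process.stdin.on('data', d => s += d);"
    "process.stdin.on('end', () => { const [a, b] = JSON.parse(s);"
    " console.log(JSON.stringify(a + b)); });"
)
# The same protocol with a Python child, handy for trying it without Node.js.
PY_ADD = "import json, sys; a, b = json.load(sys.stdin); print(json.dumps(a + b))"

if __name__ == "__main__":
    print(call_once([sys.executable, "-c", PY_ADD], [1, 2]))
    # With Node.js installed: call_once(["node", "-e", NODE_ADD], [1, 2])
```

Sending the input through stdin also sidesteps shell-quoting problems and command-line length limits.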

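What if the interaction is complex and you need to pass complex objects, some of which can be serialized and some of which cannot be serialized at all?
That basically calls for inter-process communication (IPC), usually through pipes: set up a pipe on at least one of stdin, stdout and stderr.
Suppose Python sends data to JS through stdin and receives data through stderr; the pattern looks roughly like this:
(The code below is only illustrative and has not been rigorously tested; in practice, use a library.)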

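Create a secondary-language (say JS) file, python-bridge.js. It keeps reading stdin and handles each message according to its type; whenever it needs to print information or pass an object to the main language, it serializes it appropriately and writes it to stdout or stderr.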

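```javascript
// python-bridge.js: reads one command per line from stdin, sends results
// back to Python on stderr, and leaves stdout for ordinary console output.
const readline = require("readline");

// Functions Python may call; funcName is a placeholder example.
// (Functions declared in a CommonJS module are not properties of `global`.)
const exported = {
    funcName: (a, b) => a + b,
};

function sendObj(obj) {
    // deliver object, "$j2p" can be any prefix predefined and agreed upon with the Python side
    // just to tell python side that this is an object needs parsing
    // (JSON.stringify(undefined) returns undefined, so send null instead)
    process.stderr.write("$j2p sendObj " + JSON.stringify(obj === undefined ? null : obj) + "\n");
}

// readline reassembles lines even when they arrive split across several 'data' chunks
const rl = readline.createInterface({ input: process.stdin });
rl.on("line", line => {
    // Deal with each line
    if (line.startsWith("$p2j")) {
        const [prefix, cmd, ...words] = line.split(" ");
        if (cmd === "call") {
            // call some function
            const [funcname, ...argStr] = words;
            const args = JSON.parse(argStr.join(" "));
            sendObj(exported[funcname](...args));
        }
    } else {
        // not a bridge command: print it as an ordinary message
        process.stdout.write(line + "\n");
    }
});
process.on("exit", () => {
    console.debug("** Node exiting");
});
```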
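On the Python side, the same line protocol can be kept in two small helpers so that the prefixes and the one-message-per-line framing live in one place. This is a sketch assuming the `$p2j`/`$j2p` prefixes and the `call`/`sendObj` commands used in this article; the helper names `encode_call` and `parse_bridge_line` are hypothetical. Because the JS side handles input line by line, each message must fit on one line, which `json.dumps` (without `indent`) guarantees by escaping any newline inside strings.

```python
import json


def encode_call(funcname, args):
    """Build one newline-terminated '$p2j call <funcname> <json-args>' line as bytes."""
    # json.dumps without indent never emits a raw newline, so one message == one line
    return f"$p2j call {funcname} {json.dumps(list(args))}\n".encode("utf-8")


def parse_bridge_line(line):
    """Return ('sendObj', obj) for a '$j2p sendObj <json>' line, or (None, line) otherwise."""
    if line.startswith("$j2p "):
        _, cmd, payload = line.split(" ", maxsplit=2)
        if cmd == "sendObj":
            return cmd, json.loads(payload)
        return cmd, payload  # other commands: hand back the raw payload
    return None, line  # ordinary output, to be printed as-is
```

For example, writing `encode_call("funcName", [arg1, arg2])` to `subproc.stdin` and flushing sends the `$p2j call funcName ...` line, newline included.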

In Python, use Popen to start the subprocess without waiting for it to finish, and connect at least one of its stdin, stdout and stderr to a pipe. Roughly:

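```python
import os
import subprocess

bridge_js = os.path.join(os.path.dirname(os.path.abspath(__file__)), "python-bridge.js")
cmd = ["node", "--trace-uncaught", bridge_js]
kwargs = dict(
    stdin=subprocess.PIPE,   # Python -> JS: commands
    stdout=None,             # inherited: JS console.log output appears directly on our stdout
    stderr=subprocess.PIPE,  # JS -> Python: results (plus JS error messages)
)
if os.name == 'nt':
    # Windows only: don't open a console window for the child process
    kwargs['creationflags'] = subprocess.CREATE_NO_WINDOW
subproc = subprocess.Popen(cmd, **kwargs)
```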

Whenever you need to call JS or send it data, write the serialized message to subproc's stdin and then flush; otherwise the data may just sit in the buffer:

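```python
import json

# note the trailing "\n": the JS side processes its input line by line
subproc.stdin.write(f"$p2j call funcName {json.dumps([arg1, arg2])}\n".encode())
subproc.stdin.flush()  # push it into the pipe now instead of leaving it in Python's write buffer
```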

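For each piped stream (stdout/stderr), start a dedicated thread that reads incoming data and processes it: messages that carry objects are turned back into objects, and ordinary messages are printed straight to the main process's stderr or stdout.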

```python
import json
import sys
import threading

def read_stderr():
    # readline() returns b"" only at EOF, i.e. once the subprocess has closed stderr (exited)
    for raw in iter(subproc.stderr.readline, b""):
        line = raw.decode('utf-8')
        if line.startswith('$j2p'):
            # receive special information
            _, cmd, payload = line.split(' ', maxsplit=2)
            if cmd == 'sendObj':
                # For example, received an object; hand obj over to whoever is waiting for it
                obj = json.loads(payload)
        else:
            # otherwise, write to stderr as it is
            sys.stderr.write(line)

stderr_thread = threading.Thread(target=read_stderr, daemon=True)
stderr_thread.start()
```

Since stdout is not piped here, whatever node prints to stdout goes directly to the Python process's stdout, so we don't have to handle it ourselves.

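Since the reader thread runs asynchronously, how do we know when the object returned by a function has arrived? The answer is thread synchronization: a semaphore (Semaphore), a condition variable (Condition), an event (Event) and so on all work. Taking Python's Condition as an example: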

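```python
import json
import threading

func_name_cv = threading.Condition()
# use a flag and a result object in case some function has no result
func_name_result_returned = False
func_name_result = None

def func_name_wrapper(arg1, arg2):
    global func_name_result_returned  # reassigned below, so it must be declared global
    # send arguments, one command per line
    subproc.stdin.write(f"$p2j call funcName {json.dumps([arg1, arg2])}\n".encode())
    subproc.stdin.flush()
    # wait for the result; wait_for re-checks the predicate after every wakeup
    with func_name_cv:
        func_name_cv.wait_for(
            lambda: func_name_result_returned or subproc.poll() is not None,
            timeout=10,  # in seconds
        )
        if not func_name_result_returned:
            raise RuntimeError("no result from JS: timed out or the Node.js process exited")
        # when result finally returned, reset the flag
        func_name_result_returned = False
        return func_name_result
```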

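At the same time, the reader thread read_stderr has to release the wait on this return value. Note that if the JS side exits unexpectedly, subproc dies too, and in that case you must also remember to unblock the main thread.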

```python
import json
import sys

def read_stderr():
    global func_name_result, func_name_result_returned  # both are reassigned below
    # keep reading until EOF, i.e. until the subprocess closes stderr (exits)
    for raw in iter(subproc.stderr.readline, b""):
        # Deal with a line
        line = raw.decode('utf-8')
        if line.startswith('$j2p'):
            # receive special information
            _, cmd, payload = line.split(' ', maxsplit=2)
            if cmd == 'sendObj':
                # acquire lock here to ensure the editing of func_name_result is mutex
                with func_name_cv:
                    # For example, received an object
                    func_name_result = json.loads(payload)
                    func_name_result_returned = True
                    # unblock func_name_wrapper when receiving the result
                    func_name_cv.notify()
        else:
            # otherwise, write to stderr as it is
            sys.stderr.write(line)
    # If subproc is terminated (mainly due to error), still need to unblock func_name_wrapper;
    # notify() raises RuntimeError unless the condition's lock is held
    with func_name_cv:
        func_name_cv.notify()
```

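Of course, this is a fairly simple version: since calls into JS are essentially sequential, any object that comes back must be the result for func_name_wrapper. With more functions, things get more complicated.
Also note that the wait on func_name_cv must always have a timeout (timeout=10 here; the unit is seconds). Otherwise, if something goes wrong on the JS side, no return value will ever arrive and you will be blocked there forever.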
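Once several functions are involved, or several calls may be in flight at the same time, a single flag no longer tells you which reply belongs to which call. One common way out is to tag every request with an ID and let the reader thread route each reply to the caller waiting on that ID. The sketch below assumes a JSON-lines protocol of its own (`{"id", "func", "args"}` requests on stdin, `{"id", "result"}` replies on stderr) rather than the `$p2j`/`$j2p` format; the class `Bridge` and the child script `CHILD` are hypothetical, and to keep it runnable without Node.js the child is a small Python script standing in for the JS side.

```python
import itertools
import json
import queue
import subprocess
import sys
import threading

# Hypothetical stand-in for the secondary-language script: one JSON request per
# stdin line, one JSON reply per stderr line, tagged with the request id.
CHILD = r"""
import json, sys
funcs = {"add": lambda a, b: a + b, "upper": lambda s: s.upper()}
for line in sys.stdin:
    req = json.loads(line)
    reply = {"id": req["id"], "result": funcs[req["func"]](*req["args"])}
    sys.stderr.write(json.dumps(reply) + "\n")
    sys.stderr.flush()
"""


class Bridge:
    def __init__(self, cmd):
        self.proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
        self.pending = {}                    # request id -> one-slot queue for its reply
        self.pending_lock = threading.Lock()
        self.write_lock = threading.Lock()   # one writer at a time keeps lines intact
        self.ids = itertools.count()
        self.reader = threading.Thread(target=self._read_replies, daemon=True)
        self.reader.start()

    def _read_replies(self):
        for raw in iter(self.proc.stderr.readline, b""):
            try:
                msg = json.loads(raw)
            except ValueError:               # not a reply, e.g. a traceback line
                sys.stderr.write(raw.decode("utf-8", "replace"))
                continue
            with self.pending_lock:
                slot = self.pending.pop(msg["id"], None)
            if slot is not None:
                slot.put((True, msg["result"]))
        # EOF: the child has gone away, so wake every call that is still waiting
        with self.pending_lock:
            waiting, self.pending = list(self.pending.values()), {}
        for slot in waiting:
            slot.put((False, "secondary-language process exited"))

    def call(self, func, *args, timeout=10):
        slot = queue.Queue(maxsize=1)        # receives exactly one (ok, value) pair
        with self.pending_lock:
            req_id = next(self.ids)
            self.pending[req_id] = slot
        line = json.dumps({"id": req_id, "func": func, "args": args}) + "\n"
        with self.write_lock:
            self.proc.stdin.write(line.encode("utf-8"))
            self.proc.stdin.flush()
        try:
            ok, value = slot.get(timeout=timeout)   # raises queue.Empty on timeout
        finally:
            with self.pending_lock:
                self.pending.pop(req_id, None)
        if not ok:
            raise RuntimeError(value)
        return value

    def close(self):
        self.proc.stdin.close()   # the child's `for line in sys.stdin` loop ends at EOF
        self.proc.wait(timeout=10)
        self.reader.join()
        self.proc.stderr.close()


if __name__ == "__main__":
    bridge = Bridge([sys.executable, "-c", CHILD])
    print(bridge.call("add", 1, 2), bridge.call("upper", "js"))
    bridge.close()
```

Each pending call gets its own one-slot `queue.Queue`, so replies may arrive in any order, and a child that dies wakes all waiting callers with an error instead of leaving them to run into the timeout.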

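To disconnect from JS, first terminate the subprocess, then wait for the thread reading stdout/stderr to exit on its own, and finally don't forget to close the pipes. The order of these three steps matters: if you close the pipes first, the reader thread will fail because stdout/stderr has already been closed.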
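```python
subproc.terminate()       # 1. stop the Node.js process
stderr_thread.join()      # 2. the reader thread exits by itself once stderr reaches EOF
subproc.stdin.close()     # 3. finally, release the pipes
subproc.stderr.close()
```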
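If the secondary-language script leaves its read loop when stdin reaches EOF (a readline-based Node.js loop ends when its input closes, and the process then exits if nothing else keeps its event loop alive), a gentler variant is to close the child's stdin first and only fall back to `terminate()`/`kill()` if it does not exit in time. Closing stdin early is safe because the reader thread only uses stderr; stderr is still closed last, after the reader has finished. A sketch under those assumptions (the function name `shutdown` and the grace period are hypothetical):

```python
import subprocess
import sys
import threading


def shutdown(subproc, reader_thread, grace=5.0):
    """Stop a bridge child: EOF on stdin first, then terminate(), then kill()."""
    subproc.stdin.close()             # the child sees EOF and may exit on its own
    try:
        subproc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        subproc.terminate()           # SIGTERM on POSIX, TerminateProcess on Windows
        try:
            subproc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            subproc.kill()
            subproc.wait()
    reader_thread.join()              # the reader stops once it reads EOF on stderr
    subproc.stderr.close()            # close the read end only after the reader is done
    return subproc.returncode


if __name__ == "__main__":
    # Hypothetical stand-in child: echo each stdin line to stderr until EOF.
    child = "import sys\nfor line in sys.stdin:\n    sys.stderr.write(line)\n    sys.stderr.flush()\n"
    proc = subprocess.Popen([sys.executable, "-c", child],
                            stdin=subprocess.PIPE, stderr=subprocess.PIPE)
    lines = []
    reader = threading.Thread(target=lambda: lines.extend(iter(proc.stderr.readline, b"")))
    reader.start()
    proc.stdin.write(b"hello\n")
    proc.stdin.flush()
    print(shutdown(proc, reader), lines)
```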

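If JavaScript calls Python using the same principle, the approach is much the same; when the JavaScript side is Node.js, you use the functions in child_process.
Another pitfall of this approach is that a pipe's buffer has a limited size (on Linux it is typically 64 KiB). If one end writes a lot of data that the other end doesn't read in time, then once the pipe is full the writer blocks until space frees up, and if the two processes end up waiting on each other, the program hangs (a deadlock). The pipe itself does not silently drop data, but output still sitting in a process's own buffer can be lost if that process is killed before it flushes.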
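The blocking is easiest to trip over when one side writes a large message while the other side is itself stuck writing. A minimal sketch, assuming a child that simply echoes its stdin to stdout (a Python one-liner stands in for Node.js here, and `echo_through_child` is a hypothetical name): `Popen.communicate()` writes and reads concurrently, so it is safe for payloads far larger than the pipe buffer, whereas writing the whole payload first and reading afterwards can hang.

```python
import subprocess
import sys

# Hypothetical stand-in child: copy everything from stdin to stdout.
ECHO = "import shutil, sys; shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)"


def echo_through_child(data: bytes) -> bytes:
    """Send `data` to a child process and collect everything it writes back."""
    proc = subprocess.Popen(
        [sys.executable, "-c", ECHO], stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    # communicate() feeds stdin and drains stdout at the same time, so neither pipe
    # stays full. Doing proc.stdin.write(data) first and proc.stdout.read() afterwards
    # can hang: the child blocks writing to its full stdout pipe and stops reading
    # stdin, while the parent blocks writing to the full stdin pipe.
    out, _ = proc.communicate(data, timeout=60)
    return out


if __name__ == "__main__":
    payload = b"x" * (4 * 1024 * 1024)  # far larger than a typical 64 KiB pipe buffer
    print(echo_through_child(payload) == payload)
```

For a long-lived bridge, where `communicate()` (which waits for the child to exit) does not fit, the same effect comes from a dedicated reader thread that keeps draining the child's output while the main thread writes.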

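Pros

- Interaction only requires both runtimes to be installed normally, so the environment is relatively easy to set up.
- As long as the Python side and the JavaScript side each run correctly in their own runtime, they will basically run fine once connected (unless concurrency is involved).
- Practically every extension package available for either language is supported.

Cons

- When Python and JavaScript interact frequently and exchange large messages, program efficiency can suffer noticeably, because ordinary printed output, the objects passed between Python and JS, function calls and so on are all multiplexed over at most three pipes, which makes communication expensive.
- The limited pipe buffer can make a writer block, or the whole program deadlock, when large amounts of data are not read promptly.
- Running the secondary language's runtime requires spawning a separate subprocess, which costs extra time and memory.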

posted @ 2022-01-15 02:25 milliele
