IronPython Source Code Analysis Series (2): How the IronPython Engine Works

Original author: 木野狐, 2006-11-9. Please credit the source when reposting.
Previous post: IronPython Source Code Analysis Series (1): The IronPython Compiler

Execution of a Python program starts in the hosting program ipy.exe, whose entry point lives in the console class PythonCommandLine:

class PythonCommandLine {
    [STAThread]
    static int Main(string[] rawArgs) {
        // ...

        // Create the Python engine
        engine = new PythonEngine(options);

        // Create the __main__ module
        CreateMainModule();

        // ...

        // Call Run here
        return Run(engine, args == null ? null : args.Count > 0 ? args[0] : null);

        // ...

    }

    // Run the engine
    private static int Run(PythonEngine engine, string fileName) {
        try {
            // Command-line form:
            // ipy -c "print 'ok'"
            if (ConsoleOptions.Command != null) {
                // Execute Python code given directly as a string
                return RunString(engine, ConsoleOptions.Command);
            } else if (fileName == null) {
#if !IRONPYTHON_WINDOW
                // Interactive execution
                return RunInteractive(engine);
#else
                return 0;
#endif
            } else {
                // Execute the contents of a file
                return RunFile(engine, fileName);
            }
        } catch (System.Threading.ThreadAbortException tae) {
            if (tae.ExceptionState is PythonKeyboardInterruptException) {
                Thread.ResetAbort();
            }
            return -1;
        }
    }
}

Here we can see three main ways to execute Python code:

1. Interactive

Concretely, you start a console from the command line and type Python code into the shell.
It looks like this:

H:\ipy2>ipy
IronPython 1.0 (1.0.61005.1977) on .NET 2.0.50727.42
Copyright (c) Microsoft Corporation. All rights reserved.
>>> print "OK"
OK
>>>

2. Passing a code fragment as a string argument

Enter the following command at the console:

H:\ipy2>ipy -c "print 'ok'"
ok

H:\ipy2>

3. Running a source file

The command is:

ipy b.py

Note that this command also accepts an option:

ipy -i b.py

With this form, once b.py has finished running, a Python shell opens automatically so you can keep working there.
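To make the three modes concrete, here is a minimal Python sketch of the decision that Run makes. It is not IronPython's option parser: the function name `choose_mode` and its return tuple are hypothetical, and only the `-c` and `-i` options from the examples above are handled.

```python
# Hypothetical sketch of the branching in PythonCommandLine.Run;
# not IronPython's real command-line parser.
from typing import List, Optional, Tuple


def choose_mode(argv: List[str]) -> Tuple[str, Optional[str], bool]:
    """Return (mode, payload, introspect) for an ipy-style command line.

    mode is "command" (ipy -c CODE), "file" (ipy [-i] FILE) or
    "interactive" (plain ipy). introspect is True when -i was given.
    """
    introspect = False
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-c":
            # The argument after -c is the code string to execute.
            if i + 1 >= len(argv):
                raise ValueError("-c requires an argument")
            return ("command", argv[i + 1], introspect)
        if arg == "-i":
            # Remember that a shell should open after the file runs.
            introspect = True
            i += 1
            continue
        # The first non-option argument is the script to run.
        return ("file", arg, introspect)
    # No command and no file: fall back to the interactive console.
    return ("interactive", None, introspect)
```

For example, `choose_mode(["-i", "b.py"])` returns `("file", "b.py", True)`.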

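Let's analyze the execution flow of each case in turn.

Interactive input (1) and directly executing a code fragment (2) follow essentially the same path. Tracing the code: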

class PythonCommandLine {
    // Have the engine execute a string command
    private static int RunString(PythonEngine engine, string command) {
        // Some initialization
        // ...

        // Execute
        engine.ExecuteToConsole(command);

        // ...

    }

    private static int RunInteractive(PythonEngine engine) {
        // Some initialization
        // ...

        result = RunInteractive();

        // ...

    }

    private static int RunInteractive() {
        return RunInteractiveLoop();
    }

    // Run the console interaction in a loop
    private static int RunInteractiveLoop() {
        bool continueInteraction = true;
        int result = 0;
        while (continueInteraction) {
            result = TryInteractiveAction(
                delegate(out bool continueInteractionArgument) {
                    // Reads one piece of interactive input and, through the PythonEngine,
                    // tries to parse the string with the Parser; stops if that fails
                    continueInteractionArgument = DoOneInteractive();
                    return 0;
                },
                out continueInteraction);
        }

        return result;
    }

    // Perform one interaction
    public static bool DoOneInteractive() {
        bool continueInteraction;
        // Read one statement and try to parse it
        string s = ReadStatement(out continueInteraction);

        // ...

        // Execute what was read
        engine.ExecuteToConsole(s);

        return true;
    }
}
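The loop above (read a statement, try to parse it, execute it, repeat) can be sketched in Python with CPython's standard `codeop` module, whose `compile_command` returns `None` while the input is still incomplete. This is an analogy under that assumption, not IronPython's ReadStatement; `run_interactive` is a hypothetical name, and the lines come from a list instead of the keyboard.

```python
# Read-parse-execute loop in the spirit of RunInteractiveLoop / DoOneInteractive,
# built on CPython's codeop module (an analogy, not IronPython code).
import codeop
from typing import Dict, Iterable


def run_interactive(lines: Iterable[str], namespace: Dict[str, object]) -> int:
    """Feed lines one at a time; execute each statement once it is complete.

    Returns the number of statements executed.
    """
    buffer = []
    executed = 0
    for line in lines:
        buffer.append(line)
        source = "\n".join(buffer)
        # compile_command returns None while the input is incomplete
        # (e.g. after "if x:") and raises SyntaxError on invalid input.
        code = codeop.compile_command(source, "<stdin>", "single")
        if code is None:
            continue  # a real console would show the "..." prompt here
        buffer.clear()
        exec(code, namespace)  # run in a __main__-like namespace
        executed += 1
    return executed
```

As in a real console, a compound statement such as `if x:` is only executed after the terminating blank line.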

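OK, so cases 1 and 2 lead to the same place: both end up calling

engine.ExecuteToConsole(s);

We can regard PythonEngine as the central dispatcher of the whole hosting program.

Now let's keep going and see how the engine executes code passed in as a string. Here is the ExecuteToConsole overload that takes the module and locals explicitly: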

public class PythonEngine : IDisposable {

    // Execute a string at the console
    public void ExecuteToConsole(string text, EngineModule engineModule, IDictionary<string, object> locals) {
        ModuleScope moduleScope = GetModuleScope(engineModule, locals);

        CompilerContext context = DefaultCompilerContext("<stdin>");

        // Create a Parser and use it to parse the input string.
        Parser p = Parser.FromString(Sys, context, text);
        bool isEmptyStmt = false;

        // Parse into a statement
        Statement s = p.ParseInteractiveInput(false, out isEmptyStmt);

        if (s != null) {
            // Compile and generate code
            CompiledCode compiledCode = OutputGenerator.GenerateSnippet(context, s, true, false);
            Exception ex = null;

            // If there is a command dispatcher, let it do the execution.
            // The dispatcher mechanism lets the code run on another thread,
            // e.g. inside a WinForms control, instead of being tied to the console.
            if (consoleCommandDispatcher != null) {
                // Create an anonymous delegate
                CallTarget0 runCode = delegate() {
                    // Run the compiled code
                    try { compiledCode.Run(moduleScope); } catch (Exception e) { ex = e; }
                    return null;
                };
                // Hand it to the command dispatcher
                consoleCommandDispatcher(runCode);

                // We catch and rethrow the exception since it could have been thrown on another thread
                if (ex != null)
                    throw ex;
            } else { // Otherwise run directly on the current thread
                // Run the compiled code
                compiledCode.Run(moduleScope);
            }
        }
    }
}
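The dispatcher branch boils down to one pattern: run the work wherever the dispatcher wants, capture any exception there, and rethrow it on the caller's thread. Below is a Python sketch of that pattern, assuming a dispatcher that runs the callable on a worker thread and waits for it; `execute_to_console` and `thread_dispatcher` are hypothetical names, and a single `compile` call stands in for parsing plus GenerateSnippet.

```python
# Sketch of the consoleCommandDispatcher pattern: run elsewhere, capture the
# exception there, rethrow on the calling thread. All names are hypothetical.
import threading
from typing import Callable, Dict, List, Optional

Dispatcher = Callable[[Callable[[], None]], None]


def thread_dispatcher(run_code: Callable[[], None]) -> None:
    """Run the callable on a worker thread and wait for it to finish."""
    worker = threading.Thread(target=run_code)
    worker.start()
    worker.join()


def execute_to_console(text: str, scope: Dict[str, object],
                       dispatcher: Optional[Dispatcher] = None) -> None:
    code = compile(text, "<stdin>", "exec")  # parse + generate in one step
    if dispatcher is None:
        exec(code, scope)  # no dispatcher: run on the current thread
        return

    errors: List[BaseException] = []  # filled in on the other thread

    def run_code() -> None:
        try:
            exec(code, scope)
        except Exception as e:  # capture instead of losing it on the worker
            errors.append(e)

    dispatcher(run_code)
    if errors:
        raise errors[0]  # rethrow on the caller's thread
```

The `errors` list captured by the closure plays the role of the C# `ex` local captured by the anonymous delegate.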

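The method is short, so I've pasted it in full.
It shows a very clear sequence of steps:

Start from the input string
-> the Parser
-> which produces a Statement
-> OutputGenerator.GenerateSnippet turns that into a CompiledCode
-> finally compiledCode.Run(moduleScope) executes the compiled code within a module scope.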
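The same pipeline has a close analog in CPython's standard library, which may help fix the steps in mind: `ast.parse` plays the Parser's role and yields statement nodes, `compile` turns the tree into a code object, and `exec` runs it against a dictionary that serves as the scope. This is an analogy only; `run_snippet` is a hypothetical helper, not an IronPython API.

```python
# CPython analog of the steps above; ast/compile/exec stand in for
# IronPython's Parser, OutputGenerator.GenerateSnippet and CompiledCode.Run.
import ast
from typing import Dict, Tuple


def run_snippet(text: str, scope: Dict[str, object]) -> Tuple[ast.Module, object]:
    tree = ast.parse(text, filename="<stdin>", mode="exec")  # parse into statements
    code = compile(tree, filename="<stdin>", mode="exec")    # generate executable code
    exec(code, scope)                                        # run it inside a scope
    return tree, code
```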

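The Parser performs syntax analysis. Internally it calls the lexer (Tokenizer), which does the lexical analysis, splitting the source string into individual tokens (Token). The parser repeatedly examines the tokenizer's output, assembles the tokens into statements (Statement), and builds the syntax tree.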
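The split between lexer and parser can be seen with CPython's `tokenize` and `ast` modules, used here as stand-ins for IronPython's Tokenizer and Parser (different implementations, same division of labor). The helper names `lex` and `parse` are hypothetical.

```python
# Lexing vs. parsing with CPython's tokenize and ast modules, as stand-ins
# for IronPython's Tokenizer and Parser.
import ast
import io
import tokenize
from typing import List, Tuple


def lex(source: str) -> List[Tuple[str, str]]:
    """Split source into (token type name, text) pairs, skipping layout tokens."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
            tokenize.INDENT, tokenize.DEDENT}
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(io.StringIO(source).readline)
            if tok.type not in skip]


def parse(source: str) -> List[str]:
    """Parse source into statements and return their node class names."""
    return [type(stmt).__name__ for stmt in ast.parse(source).body]
```

The lexer only sees tokens such as `NAME`, `OP` and `NUMBER`; the parser sees whole statements such as `If` and `For`.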

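Statement comes in many kinds, such as IfStatement and ForStatement. A statement also knows how to become executable: its Emit method emits IL into a code generator (CodeGen or TypeGen). And with the help of subclasses such as SuiteStatement, a statement can itself be a composite structure (the Composite pattern).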
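To illustrate statements that emit code and compose through a suite, here is a deliberately simplified Python model. It assumes a code generator that just records pseudo-instructions as strings instead of emitting IL; the class names echo IronPython's, but the instruction names and the `new_label` helper are made up for this sketch.

```python
# Simplified, hypothetical model of the Statement hierarchy: each node emits
# pseudo-instructions, and SuiteStatement is the composite holding children.
from typing import List


class CodeGen:
    """Collects emitted pseudo-instructions (a stand-in for IL emission)."""

    def __init__(self) -> None:
        self.instructions: List[str] = []
        self._labels = 0

    def emit(self, op: str) -> None:
        self.instructions.append(op)

    def new_label(self) -> str:
        self._labels += 1
        return f"L{self._labels}"


class Statement:
    def emit(self, cg: CodeGen) -> None:
        raise NotImplementedError


class PrintStatement(Statement):
    def __init__(self, value: str) -> None:
        self.value = value

    def emit(self, cg: CodeGen) -> None:
        cg.emit(f"load_const {self.value!r}")
        cg.emit("print")


class IfStatement(Statement):
    def __init__(self, test_name: str, body: Statement) -> None:
        self.test_name = test_name
        self.body = body

    def emit(self, cg: CodeGen) -> None:
        end = cg.new_label()
        cg.emit(f"load_name {self.test_name}")
        cg.emit(f"jump_if_false {end}")
        self.body.emit(cg)  # the body may itself be a SuiteStatement
        cg.emit(f"label {end}")


class SuiteStatement(Statement):
    """Composite: a statement made of other statements."""

    def __init__(self, statements: List[Statement]) -> None:
        self.statements = statements

    def emit(self, cg: CodeGen) -> None:
        for stmt in self.statements:  # delegate to each child in order
            stmt.emit(cg)
```

Callers never need to know whether they hold a single statement or a whole suite; they just call `emit`.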

Once it has the syntax tree, the Python engine calls the OutputGenerator. Its GenerateSnippet method produces the final callable code, a CompiledCode; the method is rather tedious, so I won't list it here.

CompiledCode holds a delegate of type CompiledCodeDelegate for callers to invoke, which is what makes CompiledCode a truly executable object.

public class CompiledCode {
    // Delegate to the code that this CompiledCode executes
    private CompiledCodeDelegate code;

    // Execute
    internal object Run(ModuleScope moduleScope) {
        // Clone the module scope we are about to run in
        moduleScope = (ModuleScope)moduleScope.Clone();

        // Set the static data it needs
        moduleScope.staticData = staticData;

        // Invoke the code through the delegate
        return code(moduleScope);
    }
}
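A minimal Python sketch of what Run does, assuming a ModuleScope that is just a module dictionary plus a `static_data` slot, and assuming Clone is a shallow copy (so the clone still writes into the same module dictionary). Names mirror the C# listing, but this is illustrative, not IronPython's implementation.

```python
# Hypothetical sketch of CompiledCode.Run: clone the scope, install the
# snippet's static data on the clone, then invoke the stored delegate.
import copy
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class ModuleScope:
    module_dict: Dict[str, Any]
    static_data: Optional[Any] = None


CompiledCodeDelegate = Callable[[ModuleScope], Any]


class CompiledCode:
    def __init__(self, code: CompiledCodeDelegate, static_data: Any) -> None:
        self._code = code                  # the generated "delegate"
        self._static_data = static_data

    def run(self, scope: ModuleScope) -> Any:
        scope = copy.copy(scope)               # shallow clone: module_dict is shared
        scope.static_data = self._static_data  # set data on the clone only
        return self._code(scope)               # invoke through the delegate
```

Because only the clone receives the static data, the scope the caller passed in is left untouched.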

So compiled code has to run inside a so-called module scope (ModuleScope). What exactly is a module scope?

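In IronPython, the class that represents a module in the Python sense is PythonModule. Ordinary IronPython code in a file is compiled into a CompiledModule for execution, and that CompiledModule corresponds to a PythonModule. A code fragment, however (interactive input and other short pieces of code, collectively called code snippets), is passed around as a plain string and carries no notion of an execution environment (a context, or scope): no enclosing module, no global variables, and so on. That is why the IronPython engine introduces the ModuleScope concept, representing the semantic environment in which a code snippet executes.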

A ModuleScope consists of a semantic PythonModule plus additional information such as global variables. By default, code snippets run in the __main__ module that the IronPython engine creates.

Note that a ModuleScope does not map one-to-one to a PythonModule: one PythonModule can have several ModuleScopes.
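One way to picture "several ModuleScopes, one PythonModule" in plain Python: each scope below shares the module's global dictionary but keeps its own locals. The `ModuleScope` class here is hypothetical and only models that relationship; IronPython's ModuleScope holds more state.

```python
# Hypothetical model: several scopes share one module's globals
# (module.__dict__) while each keeps its own locals dictionary.
import types
from typing import Any, Dict


class ModuleScope:
    def __init__(self, module: types.ModuleType) -> None:
        self.module = module
        self.locals: Dict[str, Any] = {}

    def run(self, text: str) -> None:
        code = compile(text, "<snippet>", "exec")
        # globals = the shared module dict, locals = this scope's own dict
        exec(code, self.module.__dict__, self.locals)
```

Names declared `global` land in the shared module, while plain assignments stay in the scope that ran them.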

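OK, we have now seen that code snippets are ultimately executed through CompiledCode. Next, let's look at how a source file is handled.

Starting from the RunFile method we skipped earlier, let's follow the calls: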

class PythonCommandLine {
    private static int RunFile(PythonEngine engine, string fileName) {
        // ...

#if !IRONPYTHON_WINDOW
        // If the -i option was given
        if (ConsoleOptions.Introspection) {
            RunFileWithIntrospection(fileName);
        } else {
            OptimizedEngineModule engineModule = engine.CreateOptimizedModule(fileName, "__main__", true);
            engineModule.Execute();
        }
#else
        OptimizedEngineModule engineModule = engine.CreateOptimizedModule(fileName, "__main__", true);
        engineModule.Execute();
#endif
        result = 0;

        // ...
    }

#if !IRONPYTHON_WINDOW
    // Run the file, then open the console
    public static void RunFileWithIntrospection(string fileName) {
        bool continueInteraction;
        TryInteractiveAction(
            delegate(out bool continueInteractionArgument) {
                // Create the module
                OptimizedEngineModule engineModule = engine.CreateOptimizedModule(fileName, "__main__", true);
                engine.DefaultModule = engineModule;
                // Execute it
                engineModule.Execute();
                continueInteractionArgument = true;
                return 0;
            },
            out continueInteraction);

        if (continueInteraction)
            // With -i, enter the console after the file has run
            RunInteractiveLoop();
    }
#endif
}

public class PythonEngine : IDisposable {
    // Create a module with optimized code. The restriction is that the user
    // cannot supply an arbitrary globals dictionary.
    public OptimizedEngineModule CreateOptimizedModule(string fileName, string moduleName, bool publishModule) {
        if (fileName == null) throw new ArgumentNullException("fileName");
        if (moduleName == null) throw new ArgumentNullException("moduleName");

        CompilerContext context = new CompilerContext(fileName);

        // Create the parser
        Parser p = Parser.FromFile(Sys, context, Sys.EngineOptions.SkipFirstLine, false);

        // Parse into a syntax tree
        Statement s = p.ParseFileInput();

        // This actually generates a type
        PythonModule module = OutputGenerator.GenerateModule(Sys, context, s, moduleName);

        // Module scope
        ModuleScope moduleScope = new ModuleScope(module);

        // EngineModule
        OptimizedEngineModule engineModule = new OptimizedEngineModule(moduleScope);

        module.SetAttr(module, SymbolTable.File, fileName);

        // If publishing, add the module to Sys's module dictionary
        if (publishModule) {
            Sys.modules[moduleName] = module;
        }

        return engineModule;
    }
}
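Stripped of .NET details, CreateOptimizedModule reads and compiles a file, wraps the result in a module object, sets `__file__` and optionally publishes the module in `sys.modules`. The Python sketch below follows those steps with CPython building blocks; `create_module` is a hypothetical name, and unlike IronPython it returns a code object instead of generating a .NET type.

```python
# Hypothetical sketch of CreateOptimizedModule's steps using CPython pieces:
# read + compile the file, create a module, set __file__, optionally publish.
import sys
import types
from typing import Tuple


def create_module(file_name: str, module_name: str,
                  publish: bool) -> Tuple[types.ModuleType, types.CodeType]:
    if file_name is None:
        raise ValueError("file_name must not be None")
    if module_name is None:
        raise ValueError("module_name must not be None")
    with open(file_name, encoding="utf-8") as f:
        source = f.read()
    code = compile(source, file_name, "exec")  # parse + generate
    module = types.ModuleType(module_name)
    module.__file__ = file_name                # like SetAttr(..., SymbolTable.File, ...)
    if publish:
        sys.modules[module_name] = module      # like Sys.modules[moduleName] = module
    return module, code                        # the body has not run yet
```

The module body is not run here; as in the C# code, execution is a separate step.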

Lexing and parsing work the same way as before. Let's follow OutputGenerator:

static class OutputGenerator {
    // Generate the module
    public static PythonModule GenerateModule(SystemState state, CompilerContext context, Statement body, string moduleName) {
        // ...

        return DoGenerateModule(state, context, gs, moduleName, context.SourceFile, suffix);

        // ...

    }

    private static PythonModule DoGenerateModule(SystemState state, CompilerContext context, GlobalSuite gs, string moduleName, string sourceFileName, string outSuffix) {
        // ...

        AssemblyGen ag = new AssemblyGen(moduleName + outSuffix, outDir, fileName + outSuffix + ".exe", true);
        ag.SetPythonSourceFile(fullPath);

        TypeGen tg = GenerateModuleType(moduleName, ag);
        CodeGen cg = GenerateModuleInitialize(context, gs, tg);

        CodeGen main = GenerateModuleEntryPoint(tg, cg, moduleName, null);
        ag.SetEntryPoint(main.MethodInfo, PEFileKinds.ConsoleApplication);
        ag.AddPythonModuleAttribute(tg, moduleName);

        Type ret = tg.FinishType();
        Assembly assm = ag.DumpAndLoad();
        ret = assm.GetType(moduleName);

        // Note this line
        PythonModule pmod = CompiledModule.Load(moduleName, ret, state);
        return pmod;
    }
}

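Here we can see that code in a source file is turned into a CompiledModule and executed as such. Like the ModuleScope that CompiledCode relies on, a CompiledModule corresponds to a semantic PythonModule; the difference is that a CompiledModule does not contain that PythonModule's state information.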

The code then creates an OptimizedEngineModule and calls its Execute method:

public class OptimizedEngineModule : EngineModule {
    bool globalCodeExecuted;

    internal OptimizedEngineModule(ModuleScope moduleScope)
        : base(moduleScope) {
        Debug.Assert(GlobalsAdapter is CompiledModule);
    }

    public void Execute() {
        // Make sure the global (module-level) code runs only once
        if (globalCodeExecuted)
            throw new InvalidOperationException("Cannot execute global code multiple times");
        globalCodeExecuted = true;

        Module.Initialize();
    }
}
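The guard in Execute is a small but useful pattern: module-level code has side effects, so it must run at most once. A Python sketch, with a hypothetical `initialize` callable standing in for the generated module initializer:

```python
# Hypothetical sketch of OptimizedEngineModule.Execute's run-once guard.
from typing import Callable


class OptimizedEngineModule:
    def __init__(self, initialize: Callable[[], None]) -> None:
        self._initialize = initialize
        self._global_code_executed = False

    def execute(self) -> None:
        # Module-level code has side effects, so running it twice is an error.
        if self._global_code_executed:
            raise RuntimeError("Cannot execute global code multiple times")
        self._global_code_executed = True  # set before running, as in the C# code
        self._initialize()
```

Since the flag is set before the initializer runs, a module body that fails cannot be re-run either.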

Module is a property defined in the base class; it returns the PythonModule:

public class EngineModule {
    internal PythonModule Module { get { return defaultModuleScope.Module; } }
}

Here is PythonModule:

[PythonType("module")]
public class PythonModule : ICustomAttributes, IModuleEnvironment, ICodeFormattable {
    private InitializeModule initialize;

    public void Initialize() {
        Debug.Assert(__dict__ != null, "Generated modules should always get a __dict__");

        if (initialize != null) {
            initialize();
        }
    }
}

The initialize field that Initialize invokes is a delegate of this type:

public delegate void InitializeModule();

and the method this delegate points to is generated by OutputGenerator.
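Putting the last two listings together: the generator produces an initializer, the module stores it, and Initialize simply calls through it. A Python sketch under that reading, where `generate_initializer` is a hypothetical stand-in for OutputGenerator and a compiled code object replaces the generated .NET method:

```python
# Hypothetical sketch: a generated initializer stored on the module and
# invoked through it, like PythonModule.initialize in the C# listing.
from typing import Any, Callable, Dict, Optional

InitializeModule = Callable[[], None]


class PythonModule:
    def __init__(self, name: str) -> None:
        self.name = name
        self.namespace: Dict[str, Any] = {"__name__": name}
        # Filled in by the generator, like the C# 'initialize' field.
        self.initialize_delegate: Optional[InitializeModule] = None

    def initialize(self) -> None:
        if self.initialize_delegate is not None:
            self.initialize_delegate()  # call through the stored delegate


def generate_initializer(source: str, module: PythonModule) -> InitializeModule:
    """Compile the module body once; return a function that runs it on demand."""
    code = compile(source, f"<{module.name}>", "exec")

    def initialize() -> None:
        exec(code, module.namespace)  # module body writes into the module dict
    return initialize
```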

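So far we have taken a whirlwind tour of IronPython's main execution steps. Along the way we skipped several technical details; in later posts I'll pick the interesting ones and analyze them.
These details are:

1. The classes involved in lexing and parsing, such as Parser, Token and Tokenizer; these are fairly simple.
2. The syntax-level classes, such as Statement and Expression.
3. Code generation, involving CodeGen, TypeGen, OutputGenerator and related classes. This is mostly done by emitting IL through Emit, and the code is complex and tedious.
4. Python's type system and how its features are implemented. This is the key part!
5. Analyzing, through decompilation, the assemblies that IronPython produces and how they execute. This is another interesting part.

If you're interested, stay tuned for the rest of the series.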

posted on 2006-11-09 22:55 NeilChen
