dbt CompileTask: A Brief Overview

I previously gave a brief introduction to dbt's manifest Compiler module. This post looks at dbt's CompileTask, the task behind the compile CLI command.

Purpose

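Its core job is to compile the Manifest produced by parsing. It also writes the compiled output to the target directory, and it opens a database connection for certain checks (for example, fetching schema information).
Class diagram (many dbt commands follow essentially the same inheritance pattern; the run command inherits from CompileTask)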

Class methods
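To make the inheritance pattern concrete, here is a heavily simplified toy model. It assumes the chain BaseTask → GraphRunnableTask → CompileTask → RunTask (real dbt has further intermediate classes such as ConfiguredTask, omitted here); the runner bodies, the `execute` method, and the hard-coded node list are purely illustrative and are not dbt's API. What it shows is the template-method idea: the shared `run()` driver lives in the parent, and each subclass only swaps in a runner class through `get_runner_type`.

```python
from typing import List, Type


class BaseRunner:
    """Toy runner that handles one node (hypothetical, not dbt's BaseRunner API)."""

    def __init__(self, node: str) -> None:
        self.node = node

    def execute(self) -> str:
        raise NotImplementedError


class CompileRunner(BaseRunner):
    def execute(self) -> str:
        return f"compiled {self.node}"


class ModelRunner(CompileRunner):
    def execute(self) -> str:
        # In this toy, running a model means compiling it first, then executing it.
        return super().execute() + f", executed {self.node}"


class BaseTask:
    def run(self):
        raise NotImplementedError  # abstract, like BaseTask.run


class GraphRunnableTask(BaseTask):
    def selected_nodes(self) -> List[str]:
        return ["model.demo.a", "model.demo.b"]  # hypothetical selection

    def get_runner_type(self, node) -> Type[BaseRunner]:
        raise NotImplementedError

    def run(self) -> List[str]:
        # Shared driver: ask the subclass which runner class to use for each node.
        return [self.get_runner_type(n)(n).execute() for n in self.selected_nodes()]


class CompileTask(GraphRunnableTask):
    def get_runner_type(self, _):
        return CompileRunner


class RunTask(CompileTask):
    def get_runner_type(self, _):
        return ModelRunner
```

Because `RunTask` only overrides `get_runner_type`, it inherits the whole driver from its parents, which is why run can reuse everything compile does.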

Walkthrough

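Overview
Because the design is built on inheritance, CompileTask itself mostly contains helper methods. The core logic runs in run (an abstract method of BaseTask), whose default implementation lives in GraphRunnableTask. In dbt, essentially every task that executes nodes does so through a runner implementation.
GraphRunnableTask's runner
The runner class is chosen by the get_runner_type method; CompileTask's version returns CompileRunner: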

```python
def get_runner_type(self, _):
    return CompileRunner
```

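How get_runner_type is used
Different tasks plug in different runners and therefore get different execution behavior. In effect, this implements something like a task pool, which makes it easy to execute the selected nodes.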
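The task-pool idea can be sketched as follows. dbt runs nodes on a thread pool sized by the `threads` setting and only releases a node once its upstream nodes have finished; this sketch substitutes `concurrent.futures.ThreadPoolExecutor` and processes the graph in dependency "waves", which is coarser than dbt's real job queue. The `deps` dictionary format, the `execute_graph` function, and the `get_runner` callback (standing in for get_runner_type) are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set

Runner = Callable[[str], str]


def execute_graph(
    deps: Dict[str, Set[str]],
    get_runner: Callable[[str], Runner],
    threads: int = 4,
) -> List[str]:
    """Execute every node after all of its upstream nodes have finished.

    deps maps a node id to the set of node ids it depends on (hypothetical format);
    get_runner plays the role of get_runner_type by choosing how to execute a node.
    """
    done: Set[str] = set()
    remaining = dict(deps)
    results: List[str] = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        while remaining:
            # Every node whose dependencies are all done can run in parallel.
            ready = sorted(node for node, upstream in remaining.items() if upstream <= done)
            if not ready:
                raise ValueError("dependency cycle or missing upstream node")
            outputs = pool.map(lambda node: get_runner(node)(node), ready)
            for node, output in zip(ready, outputs):
                results.append(output)
                done.add(node)
                del remaining[node]
    return results
```

The task only decides which runner to use; scheduling and parallelism stay in one shared place, so adding a new command mostly means adding a new runner.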
The actual run implementation (in GraphRunnableTask):

```python
def run(self):
    """
    Run dbt for the query, based on the graph.
    """
    # We set up a context manager here with "task_contextvars" because we
    # need the project_root in runtime_initialize.
    with task_contextvars(project_root=self.config.project_root):
        # Runtime initialization comes first
        self._runtime_initialize()

        if self._flattened_nodes is None:
            raise DbtInternalError(
                "after _runtime_initialize, _flattened_nodes was still None"
            )

        if len(self._flattened_nodes) == 0:
            with TextOnly():
                fire_event(Formatting(""))
            warn_or_error(NothingToDo())
            result = self.get_result(
                results=[],
                generated_at=datetime.utcnow(),
                elapsed_time=0.0,
            )
        else:
            with TextOnly():
                fire_event(Formatting(""))
            selected_uids = frozenset(n.unique_id for n in self._flattened_nodes)
            result = self.execute_with_hooks(selected_uids)

    # We have other result types here too, including FreshnessResult
    if isinstance(result, RunExecutionResult):
        result_msgs = [result.to_msg_dict() for result in result.results]
        fire_event(
            EndRunResult(
                results=result_msgs,
                generated_at=result.generated_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
                elapsed_time=result.elapsed_time,
                success=GraphRunnableTask.interpret_results(result.results),
            )
        )

    if self.args.write_json:
        # Write the manifest file into the target directory
        write_manifest(self.manifest, self.config.project_target_path)
        if hasattr(result, "write"):
            result.write(self.result_path())

    self.task_end_messages(result.results)
    return result
```
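The overall shape of run() (an empty selection yields an empty result, otherwise the nodes are executed; then JSON artifacts are optionally written to the target directory) can be mirrored in a small standalone sketch. The `ExecutionResult` class, the `run_flow` function, and the result-dictionary layout are hypothetical; only the timestamp format string comes from the code above. The file names manifest.json and run_results.json match the artifacts dbt writes to target, but the payloads here are far simpler than dbt's real schemas.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


@dataclass
class ExecutionResult:
    """Hypothetical stand-in for RunExecutionResult."""

    results: List[Dict[str, str]] = field(default_factory=list)
    elapsed_time: float = 0.0
    generated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def run_flow(selected: List[str], write_json: bool, target: Path) -> ExecutionResult:
    if not selected:
        # Nothing selected: dbt warns (NothingToDo) and returns an empty result.
        result = ExecutionResult(results=[], elapsed_time=0.0)
    else:
        # Placeholder for execute_with_hooks: mark every selected node as successful.
        result = ExecutionResult(
            results=[{"unique_id": uid, "status": "success"} for uid in sorted(selected)]
        )

    if write_json:
        target.mkdir(parents=True, exist_ok=True)
        # Analogue of write_manifest: record which nodes were selected.
        (target / "manifest.json").write_text(json.dumps({"nodes": sorted(selected)}))
        # Analogue of result.write(self.result_path()).
        payload = {
            "generated_at": result.generated_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "elapsed_time": result.elapsed_time,
            "results": result.results,
        }
        (target / "run_results.json").write_text(json.dumps(payload))
    return result
```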

GraphRunnableTask's default _runtime_initialize
CompileTask overrides this method but still calls this implementation through super(). It mainly handles setup work before execution.

```python
def _runtime_initialize(self):
    # The actual manifest compilation; its internals will be covered separately
    self.compile_manifest()
    if self.manifest is None or self.graph is None:
        raise DbtInternalError("_runtime_initialize never loaded the graph!")

    self.job_queue = self.get_graph_queue()

    # we use this a couple of times. order does not matter.
    self._flattened_nodes = []
    for uid in self.job_queue.get_selected_nodes():
        if uid in self.manifest.nodes:
            self._flattened_nodes.append(self.manifest.nodes[uid])
        elif uid in self.manifest.sources:
            self._flattened_nodes.append(self.manifest.sources[uid])
        elif uid in self.manifest.saved_queries:
            self._flattened_nodes.append(self.manifest.saved_queries[uid])
        elif uid in self.manifest.unit_tests:
            self._flattened_nodes.append(self.manifest.unit_tests[uid])
        else:
            raise DbtInternalError(
                f"Node selection returned {uid}, expected a node, a source, or a unit test"
            )

    self.num_nodes = len([n for n in self._flattened_nodes if not n.is_ephemeral_model])
```
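The lookup loop above tries each manifest collection in turn (nodes, sources, saved_queries, unit_tests) and then counts the non-ephemeral nodes. Here is a standalone sketch of the same logic, assuming plain dictionaries in place of the manifest collections and a local `DbtInternalError` stand-in; `flatten_selected` and `count_non_ephemeral` are hypothetical names. Ephemeral models are left out of num_nodes because dbt inlines them as CTEs into downstream models rather than building them on their own.

```python
from typing import Any, Dict, Iterable, List


class DbtInternalError(Exception):
    """Local stand-in for dbt's DbtInternalError."""


def flatten_selected(
    selected: Iterable[str], *collections: Dict[str, Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Resolve each selected unique_id against the collections, in order."""
    flattened: List[Dict[str, Any]] = []
    for uid in selected:
        for collection in collections:
            if uid in collection:
                flattened.append(collection[uid])
                break
        else:  # no collection contained uid
            raise DbtInternalError(
                f"Node selection returned {uid}, expected a node, a source, or a unit test"
            )
    return flattened


def count_non_ephemeral(nodes: List[Dict[str, Any]]) -> int:
    # Like num_nodes above: ephemeral models are not counted.
    return sum(1 for n in nodes if not n.get("is_ephemeral_model", False))
```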

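CompileTask's _runtime_initialize implementation
When an inline query is supplied, it is first parsed into a node and added to the manifest; after that, the parent implementation runs.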

```python
def _runtime_initialize(self):
    if getattr(self.args, "inline", None):
        try:
            block_parser = SqlBlockParser(
                project=self.config, manifest=self.manifest, root_project=self.config
            )
            sql_node = block_parser.parse_remote(self.args.inline, "inline_query")
            # Resolves ref, source, and docs for the node
            process_node(self.config, self.manifest, sql_node)
            # keep track of the node added to the manifest
            self._inline_node_id = sql_node.unique_id
        except CompilationError as exc:
            fire_event(
                ParseInlineNodeError(
                    exc=str(exc.msg),
                    node_info={
                        "node_path": "sql/inline_query",
                        "node_name": "inline_query",
                        "unique_id": "sqloperation.test.inline_query",
                        "node_status": "failed",
                    },
                )
            )
            raise DbtException("Error parsing inline query")
    super()._runtime_initialize()
```
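This branch is what serves `dbt compile --inline "select ..."`: the inline SQL becomes a temporary node, its refs, sources, and docs are resolved, and its unique_id is remembered before the parent `_runtime_initialize` runs. The error-wrapping pattern can be sketched standalone. `toy_parse`, `register_inline`, the `demo` project name, and the unbalanced-brace check are hypothetical, and the two exception classes are local stand-ins for dbt's.

```python
from typing import Callable, Dict


class CompilationError(Exception):
    """Local stand-in for dbt's CompilationError."""


class DbtException(Exception):
    """Local stand-in for dbt's DbtException."""


def toy_parse(sql: str, name: str, project: str = "demo") -> Dict[str, str]:
    # Hypothetical check: reject SQL with unbalanced Jinja expression delimiters.
    if sql.count("{{") != sql.count("}}"):
        raise CompilationError(f"unbalanced Jinja braces in {name}")
    return {"unique_id": f"sqloperation.{project}.{name}", "raw_code": sql}


def register_inline(
    sql: str,
    manifest_nodes: Dict[str, Dict[str, str]],
    parse: Callable[[str, str], Dict[str, str]] = toy_parse,
) -> str:
    try:
        node = parse(sql, "inline_query")
    except CompilationError as exc:
        # dbt fires a ParseInlineNodeError event at this point, then raises.
        raise DbtException("Error parsing inline query") from exc
    # Add the node to the manifest and return its id so it can be selected later.
    manifest_nodes[node["unique_id"]] = node
    return node["unique_id"]
```

Wrapping the low-level parse error in a single user-facing exception keeps the CLI message simple while `__cause__` still preserves the original details.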

process_node
This step still belongs to building the Manifest metadata and resolving dependencies: it processes the node's sources, refs, and docs.

```python
def process_node(config: RuntimeConfig, manifest: Manifest, node: ManifestNode):
    _process_sources_for_node(manifest, config.project_name, node)
    _process_refs(manifest, config.project_name, node, config.dependencies)
    ctx = generate_runtime_docs_context(config, node, manifest, config.project_name)
    _process_docs_for_node(ctx, node)
```
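What `_process_sources_for_node` and `_process_refs` achieve can be approximated with a toy resolver: each `source('src', 'table')` or `ref('name')` a node uses is mapped to a unique_id in the manifest and recorded as a dependency, which is what lets dbt build its DAG. The `resolve_dependencies` function and the dictionary shapes are hypothetical; the unique_id formats follow dbt's `model.<project>.<name>` and `source.<project>.<source>.<table>` conventions. The real functions also handle cross-package refs and model versions, among other things, whereas here an unresolved reference simply raises `LookupError`.

```python
from typing import Any, Dict, List, Tuple


def resolve_dependencies(
    node: Dict[str, Any],
    manifest: Dict[str, Dict[str, Any]],
    project: str,
) -> List[str]:
    """Map a node's sources and refs to manifest unique_ids (toy version)."""
    depends_on: List[str] = []
    sources: List[Tuple[str, str]] = node.get("sources", [])
    for source_name, table_name in sources:
        depends_on.append(f"source.{project}.{source_name}.{table_name}")
    for ref_name in node.get("refs", []):
        depends_on.append(f"model.{project}.{ref_name}")
    missing = [uid for uid in depends_on if uid not in manifest]
    if missing:
        raise LookupError(f"{node['name']} references unknown nodes: {missing}")
    # Record the edges; a DAG can later be built from these depends_on lists.
    node["depends_on"] = depends_on
    return depends_on
```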

Summary

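The above is a brief look at compile that should help explain what dbt compile does internally. Following the execution path also shows that run depends on it: nodes must be compiled first, and only then does dbt actually execute them. The compiler inside CompileTask is not covered here; it is fairly complex, so I will describe it separately later.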

References

core/dbt/task/compile.py (core)
core/dbt/compilation.py (core)

posted on 2024-04-15 00:58 by 荣锋亮
