A Brief Look at dbt's CompileTask

I previously gave a brief introduction to the Compiler module behind dbt's manifest. This post covers CompileTask, the dbt CLI task.

Purpose

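At its core, CompileTask compiles the Manifest produced by parsing and writes the compiled results to the target directory. It also needs a database connection for some checks (for example, fetching schema information).
Class diagram (many dbt commands follow this same inheritance pattern); the run command inherits from CompileTask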
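
The inheritance described here can be sketched in a few lines of Python. This is only an illustration of the pattern, not dbt's code: the class names match the ones mentioned in this post, but the bodies are stand-ins, and any intermediate classes in dbt's real hierarchy are left out.

```python
from abc import ABC, abstractmethod


class BaseTask(ABC):
    """Stand-in for dbt's BaseTask: run is declared abstract here."""

    @abstractmethod
    def run(self):
        ...


class GraphRunnableTask(BaseTask):
    """Stand-in: supplies the default run shared by graph-based tasks."""

    def run(self):
        return f"{type(self).__name__}: default run from GraphRunnableTask"


class CompileTask(GraphRunnableTask):
    """Stand-in: adds helpers only, inherits run unchanged."""


class RunTask(CompileTask):
    """Stand-in: the run command builds on CompileTask."""


if __name__ == "__main__":
    # Show the method resolution order and which run gets picked up.
    print([cls.__name__ for cls in RunTask.__mro__])
    print(RunTask().run())
```

Because neither CompileTask nor RunTask defines run in this sketch, both resolve to the same function object on GraphRunnableTask, which is the point the class diagram makes.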


Class methods

 

Walkthrough

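  • Overview
    Because of the inheritance-based design, CompileTask itself mostly provides helper methods. The core work happens in run (an abstract method of BaseTask), whose default implementation lives in
    GraphRunnableTask. In dbt, nearly every task is executed through a runner implementation
  • The runner for GraphRunnableTask
    It is specified through the get_runner_type method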
def get_runner_type(self, _):
    return CompileRunner
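  • How get_runner_type is used
    Different tasks execute in different ways. In effect dbt implements something like a task pool, which makes it convenient to schedule the execution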
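
As a rough illustration of the "task pool" idea, the sketch below lets a task pick a runner class through get_runner_type and fans the nodes out to a thread pool. Everything except the names get_runner_type and CompileRunner is hypothetical (MiniTask, MiniRunTask, RunRunner, get_runner, execute_nodes), and it uses Python's ThreadPoolExecutor as an assumption. It also ignores the dependency ordering that dbt's job queue enforces.

```python
from concurrent.futures import ThreadPoolExecutor


class CompileRunner:
    """Hypothetical runner: handles exactly one node."""

    def __init__(self, node):
        self.node = node

    def run(self):
        return f"compiled {self.node}"


class RunRunner(CompileRunner):
    """Hypothetical runner with a different execution mode."""

    def run(self):
        return f"ran {self.node}"


class MiniTask:
    """Hypothetical task that hands each node to a runner on a thread pool."""

    def __init__(self, nodes, threads=4):
        self.nodes = nodes
        self.threads = threads

    def get_runner_type(self, _):
        # Subclasses override this hook to change how nodes are executed.
        return CompileRunner

    def get_runner(self, node):
        runner_cls = self.get_runner_type(node)
        return runner_cls(node)

    def execute_nodes(self):
        with ThreadPoolExecutor(max_workers=self.threads) as pool:
            # map returns results in input order even though work runs concurrently.
            return list(pool.map(lambda n: self.get_runner(n).run(), self.nodes))


class MiniRunTask(MiniTask):
    def get_runner_type(self, _):
        return RunRunner


if __name__ == "__main__":
    print(MiniTask(["model.a", "model.b"]).execute_nodes())
    print(MiniRunTask(["model.a"]).execute_nodes())
```

The scheduling code never changes between tasks; only the class returned by get_runner_type does, which is why a single overridden method is enough to give each command its own behaviour.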
  • What run actually does
def run(self):
        """
        Run dbt for the query, based on the graph.
        """
        # We set up a context manager here with "task_contextvars" because we
        # need the project_root in runtime_initialize.
        with task_contextvars(project_root=self.config.project_root):
            # Runtime initialization comes first
            self._runtime_initialize()
 
            if self._flattened_nodes is None:
                raise DbtInternalError(
                    "after _runtime_initialize, _flattened_nodes was still None"
                )
 
            if len(self._flattened_nodes) == 0:
                with TextOnly():
                    fire_event(Formatting(""))
                warn_or_error(NothingToDo())
                result = self.get_result(
                    results=[],
                    generated_at=datetime.utcnow(),
                    elapsed_time=0.0,
                )
            else:
                with TextOnly():
                    fire_event(Formatting(""))
                selected_uids = frozenset(n.unique_id for n in self._flattened_nodes)
                result = self.execute_with_hooks(selected_uids)
 
        # We have other result types here too, including FreshnessResult
        if isinstance(result, RunExecutionResult):
            result_msgs = [result.to_msg_dict() for result in result.results]
            fire_event(
                EndRunResult(
                    results=result_msgs,
                    generated_at=result.generated_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
                    elapsed_time=result.elapsed_time,
                    success=GraphRunnableTask.interpret_results(result.results),
                )
            )
 
        if self.args.write_json:
            # Write the manifest file into the target directory
            write_manifest(self.manifest, self.config.project_target_path)
            if hasattr(result, "write"):
                result.write(self.result_path())
 
        self.task_end_messages(result.results)
        return result
  • Default _runtime_initialize in GraphRunnableTask
    CompileTask overrides this method but still calls this implementation. It mainly does preparatory work
def _runtime_initialize(self):
    # The actual manifest compilation; its internals will be covered separately
    self.compile_manifest()
    if self.manifest is None or self.graph is None:
        raise DbtInternalError("_runtime_initialize never loaded the graph!")
 
    self.job_queue = self.get_graph_queue()
 
    # we use this a couple of times. order does not matter.
    self._flattened_nodes = []
    for uid in self.job_queue.get_selected_nodes():
        if uid in self.manifest.nodes:
            self._flattened_nodes.append(self.manifest.nodes[uid])
        elif uid in self.manifest.sources:
            self._flattened_nodes.append(self.manifest.sources[uid])
        elif uid in self.manifest.saved_queries:
            self._flattened_nodes.append(self.manifest.saved_queries[uid])
        elif uid in self.manifest.unit_tests:
            self._flattened_nodes.append(self.manifest.unit_tests[uid])
        else:
            raise DbtInternalError(
                f"Node selection returned {uid}, expected a node, a source, or a unit test"
            )
 
    self.num_nodes = len([n for n in self._flattened_nodes if not n.is_ephemeral_model])
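
The flattening step above boils down to looking each selected unique_id up in several manifest collections in turn and failing loudly when none contains it. A minimal sketch of that idea follows, assuming hypothetical MiniManifest and MiniNode types with only three collections; the real Manifest has more fields and raises DbtInternalError rather than KeyError.

```python
from dataclasses import dataclass, field


@dataclass
class MiniNode:
    """Hypothetical node: just an id and an ephemeral flag."""

    unique_id: str
    is_ephemeral_model: bool = False


@dataclass
class MiniManifest:
    """Hypothetical manifest holding a few id -> node collections."""

    nodes: dict = field(default_factory=dict)
    sources: dict = field(default_factory=dict)
    unit_tests: dict = field(default_factory=dict)

    def lookup(self, uid):
        # Check each collection in a fixed order; the first hit wins.
        for collection in (self.nodes, self.sources, self.unit_tests):
            if uid in collection:
                return collection[uid]
        raise KeyError(f"Node selection returned {uid}, which is not in the manifest")


def flatten_selected(manifest, selected_uids):
    """Resolve every selected id to its node object."""
    return [manifest.lookup(uid) for uid in selected_uids]


def count_runnable(flattened):
    """Ephemeral models are excluded from the count, as in num_nodes above."""
    return len([n for n in flattened if not n.is_ephemeral_model])


if __name__ == "__main__":
    manifest = MiniManifest(
        nodes={"model.a": MiniNode("model.a"), "model.e": MiniNode("model.e", True)},
        sources={"source.s": MiniNode("source.s")},
    )
    flat = flatten_selected(manifest, ["model.a", "source.s", "model.e"])
    print([n.unique_id for n in flat], count_runnable(flat))
```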
  • CompileTask's _runtime_initialize implementation
def _runtime_initialize(self):
    if getattr(self.args, "inline", None):
        try:
            block_parser = SqlBlockParser(
                project=self.config, manifest=self.manifest, root_project=self.config
            )
            sql_node = block_parser.parse_remote(self.args.inline, "inline_query")
            # Resolves refs, sources, and docs for the node
            process_node(self.config, self.manifest, sql_node)
            # keep track of the node added to the manifest
            self._inline_node_id = sql_node.unique_id
        except CompilationError as exc:
            fire_event(
                ParseInlineNodeError(
                    exc=str(exc.msg),
                    node_info={
                        "node_path": "sql/inline_query",
                        "node_name": "inline_query",
                        "unique_id": "sqloperation.test.inline_query",
                        "node_status": "failed",
                    },
                )
            )
            raise DbtException("Error parsing inline query")
    super()._runtime_initialize()
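
The shape of this override (an optional inline pre-step, then delegation to the parent) can be reduced to the sketch below. All names here are hypothetical (BaseInit, InlineAwareInit, parse_inline, the "inline_query" key), and the error types are plain Python exceptions rather than dbt's.

```python
def parse_inline(sql):
    """Hypothetical parser: rejects blank SQL, otherwise returns it stripped."""
    if not sql.strip():
        raise ValueError("empty inline query")
    return sql.strip()


class BaseInit:
    """Hypothetical parent providing the default initialization."""

    def __init__(self, inline=None):
        self.inline = inline
        self.manifest = {"model.a": "select 1"}
        self.steps = []
        self.flattened = None

    def _runtime_initialize(self):
        self.steps.append("build job queue")
        self.flattened = list(self.manifest)


class InlineAwareInit(BaseInit):
    """Hypothetical child: registers the inline query, then defers to the parent."""

    def _runtime_initialize(self):
        if self.inline:
            try:
                # Add the inline node first so the parent's step can see it.
                self.manifest["inline_query"] = parse_inline(self.inline)
            except ValueError as exc:
                raise RuntimeError("Error parsing inline query") from exc
            self.steps.append("parse inline")
        super()._runtime_initialize()


if __name__ == "__main__":
    task = InlineAwareInit(inline="select 2")
    task._runtime_initialize()
    print(task.steps, task.flattened)
```

Doing the inline step before calling super matters: the parent builds its node list from the manifest, so the inline node has to be registered by then.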
  • What process_node does
    This is still part of generating the Manifest metadata and resolving dependencies
def process_node(config: RuntimeConfig, manifest: Manifest, node: ManifestNode):
    _process_sources_for_node(manifest, config.project_name, node)
    _process_refs(manifest, config.project_name, node, config.dependencies)
    ctx = generate_runtime_docs_context(config, node, manifest, config.project_name)
    _process_docs_for_node(ctx, node)

Summary

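That is a brief walkthrough of compile, which should make it easier to understand what dbt compile does internally. The execution flow also shows that run depends on it: compilation has to happen first, and only then does dbt perform the actual execution. The compiler used inside CompileTask is not covered here. It is fairly complex, so I will cover it separately.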

References

core/dbt/task/compile.py (core)
core/dbt/compilation.py (core)

posted on 2024-04-15 00:58  荣锋亮
