A brief look at the dbt manifest Compiler

This post covers preprocessing and the actual compilation step. Both compile and compile_node live in the Compiler class of the compilation module.

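Where it is used

The main callers are CompileRunner.compile, GenericRPCRunner.compile, and RunTask.get_hook_sql. This suggests that we can drive compilation programmatically ourselves, modeled on GenericRPCRunner.compile, for example in CI/CD.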

How it works

dbt stores the results of compilation in the target directory.
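Because the compiled output lands under target, a CI job can inspect it once dbt compile has run. The following is a minimal sketch, assuming the usual target/compiled/<project>/... layout. The function name list_compiled_sql is hypothetical and not part of dbt.

```python
from pathlib import Path
from typing import List


def list_compiled_sql(target_dir: str) -> List[str]:
    """Return compiled .sql files under <target_dir>/compiled, relative to it.

    Assumes the layout target/compiled/<project>/...; adjust the path if
    your project overrides the target directory.
    """
    compiled_root = Path(target_dir) / "compiled"
    if not compiled_root.is_dir():
        # Nothing has been compiled yet (or the layout differs).
        return []
    return sorted(
        p.relative_to(compiled_root).as_posix()
        for p in compiled_root.rglob("*.sql")
    )
```

A CI step could, for instance, fail the build when this list is empty or when an expected model is missing from it.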

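Initializing directories
This creates the project-level target directory and the install directory for the project's package dependencies.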

 
def initialize(self):
    make_directory(self.config.project_target_path)
    make_directory(self.config.packages_install_path)

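Preprocessing
A Linker builds the networkx graph from the manifest. The method then writes graph_summary.json for diagnostics, writes the graph.gpickle file, prints compile statistics, and returns the graph object.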

 
def compile(self, manifest: Manifest, write=True, add_test_edges=False) -> Graph:
    self.initialize()
    linker = Linker()
    linker.link_graph(manifest)

    # Create a file containing basic information about graph structure,
    # supporting diagnostics and performance analysis.
    summaries: Dict = dict()
    summaries["_invocation_id"] = get_invocation_id()
    summaries["linked"] = linker.get_graph_summary(manifest)

    # This is only called for the "build" command
    if add_test_edges:
        manifest.build_parent_and_child_maps()
        linker.add_test_edges(manifest)

        # Create another diagnostic summary, just as above, but this time
        # including the test edges.
        summaries["with_test_edges"] = linker.get_graph_summary(manifest)

    with open(
        os.path.join(self.config.project_target_path, "graph_summary.json"), "w"
    ) as out_stream:
        try:
            out_stream.write(json.dumps(summaries))
        except Exception as e:  # This is non-essential information, so merely note failures.
            fire_event(
                Note(
                    msg=f"An error was encountered writing the graph summary information: {e}"
                )
            )

    stats = _generate_stats(manifest)

    if write:
        self.write_graph_file(linker, manifest)

    # Do not print these for list command
    if self.config.args.which != "list":
        stats = _generate_stats(manifest)
        print_compile_stats(stats)

    return Graph(linker.graph)
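The linking step can be illustrated without dbt or networkx. The following is a minimal sketch, assuming a manifest-like dict that maps each node id to the ids it depends on. The toy_manifest shape and both helper names are hypothetical, and the standard library's graphlib stands in for networkx.

```python
import json
from graphlib import TopologicalSorter
from typing import Dict, List


def link_graph(depends_on: Dict[str, List[str]]) -> TopologicalSorter:
    """Build a DAG in which every node is registered with its parents."""
    sorter: TopologicalSorter = TopologicalSorter()
    for node_id, parents in depends_on.items():
        sorter.add(node_id, *parents)
    return sorter


def graph_summary(depends_on: Dict[str, List[str]]) -> str:
    """Serialize a small diagnostic summary, in the spirit of graph_summary.json."""
    summary = {
        "node_count": len(depends_on),
        "edge_count": sum(len(parents) for parents in depends_on.values()),
    }
    return json.dumps(summary, sort_keys=True)


# Hypothetical manifest: one model built on top of two staging models.
toy_manifest = {
    "model.demo.stg_orders": [],
    "model.demo.stg_customers": [],
    "model.demo.orders": ["model.demo.stg_orders", "model.demo.stg_customers"],
}

# Parents always come before the nodes that depend on them.
execution_order = list(link_graph(toy_manifest).static_order())
```

Here execution_order ends with model.demo.orders, because both staging models must be available first. This ordering guarantee is what the graph provides to the rest of the run.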

compile_node

def compile_node(
    self,
    node: ManifestSQLNode,
    manifest: Manifest,
    extra_context: Optional[Dict[str, Any]] = None,
    write: bool = True,
) -> ManifestSQLNode:
    """This is the main entry point into this code. It's called by
    CompileRunner.compile, GenericRPCRunner.compile, and
    RunTask.get_hook_sql. It calls '_compile_code' to render
    the node's raw_code into compiled_code, and then calls the
    recursive method to "prepend" the ctes.
    """
    if isinstance(node, UnitTestDefinition):
        return node

    # Make sure Lexer for sqlparse 0.4.4 is initialized
    from sqlparse.lexer import Lexer  # type: ignore

    if hasattr(Lexer, "get_default_instance"):
        Lexer.get_default_instance()

    # Two passes happen here: first the Jinja rendering, then the recursive
    # CTE handling (mainly for ephemeral CTEs).
    node = self._compile_code(node, manifest, extra_context)

    node, _ = self._recursively_prepend_ctes(node, manifest, extra_context)
    if write:
        # By default the compiled node is still written to a file, which
        # ends up under the target directory described above.
        self._write_node(node)
    return node
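The second pass, prepending CTEs for ephemeral models, can be sketched as follows. This is a simplified illustration and not dbt's implementation. It assumes the compiled SQL already refers to each ephemeral model by a CTE name (written here in the __dbt__cte__ style), and it detects a leading WITH by naive string matching rather than by parsing the SQL, so forms such as WITH RECURSIVE are not handled. The function prepend_ctes and the sample names are hypothetical.

```python
from typing import Dict


def prepend_ctes(compiled_sql: str, ephemeral_sql: Dict[str, str]) -> str:
    """Inline ephemeral models as CTEs in front of compiled_sql.

    ephemeral_sql maps a CTE name to the compiled SQL of that ephemeral model.
    """
    if not ephemeral_sql:
        return compiled_sql

    cte_defs = ",\n".join(
        f"{name} as (\n{sql.strip()}\n)" for name, sql in ephemeral_sql.items()
    )
    body = compiled_sql.lstrip()
    # Naive check: if the query already has its own WITH clause, merge the
    # new CTEs into it instead of emitting a second WITH keyword.
    if body[:4].lower() == "with" and body[4:5].isspace():
        return f"with {cte_defs},\n{body[4:].lstrip()}"
    return f"with {cte_defs}\n{body}"


sql = prepend_ctes(
    "select * from __dbt__cte__stg_orders",
    {"__dbt__cte__stg_orders": "select 1 as id"},
)
```

The result is a single statement that the warehouse can run directly, which is why an ephemeral model never needs a table or view of its own.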

Jinja rendering

def get_rendered(
    string: str,
    ctx: Dict[str, Any],
    node=None,
    capture_macros: bool = False,
    native: bool = False,
) -> Any:
    # performance optimization: if there are no jinja control characters in the
    # string, we can just return the input. Fall back to jinja if the type is
    # not a string or if native rendering is enabled (so '1' -> 1, etc...)
    # If this is desirable in the native env as well, we could handle the
    # native=True case by passing the input string to ast.literal_eval, like
    # the native renderer does.
    has_render_chars = not isinstance(string, str) or _HAS_RENDER_CHARS_PAT.search(string)

    if not has_render_chars:
        if not native:
            return string
        elif string in _render_cache:
            return _render_cache[string]

    # This uses the Jinja helpers wrapped in dbt-common; macro handling will be
    # covered in a later post. dbt treats templates a little differently from
    # everyday Jinja usage: it works directly on strings and does not use a
    # loader. The manifest already holds the raw code of every macro that is
    # needed, so plain strings are enough. The implementation also relies on
    # Jinja's make_module to load macros dynamically, and caches internally
    # for performance.
    template = get_template(
        string,
        ctx,
        node,
        capture_macros=capture_macros,
        native=native,
    )

    rendered = render_template(template, ctx, node)

    if not has_render_chars and native:
        _render_cache[string] = rendered

    return rendered
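The fast path at the top of get_rendered is worth isolating. The following is a minimal sketch of the idea using only the standard library. The regular expression is an illustrative pattern and not necessarily the one dbt uses, and toy_render is a hypothetical stand-in for the real template engine that only substitutes {{ name }} placeholders.

```python
import re
from typing import Any, Dict

# Matches Jinja-style delimiters: {{ }}, {% %}, {# #}.
# Illustrative pattern, not necessarily identical to dbt's.
HAS_RENDER_CHARS = re.compile(r"\{[{%#]|[#%}]\}")
_EXPR = re.compile(r"\{\{\s*(\w+)\s*\}\}")

render_calls = 0  # counts how often the slow path runs


def toy_render(string: str, ctx: Dict[str, Any]) -> str:
    """Stand-in for a template engine: substitutes {{ name }} from ctx."""
    global render_calls
    render_calls += 1
    return _EXPR.sub(lambda m: str(ctx[m.group(1)]), string)


def get_rendered(string: str, ctx: Dict[str, Any]) -> str:
    # Fast path: a string without template delimiters is returned as-is,
    # so the template engine is never invoked for it.
    if not HAS_RENDER_CHARS.search(string):
        return string
    return toy_render(string, ctx)


plain = get_rendered("select 1", {})
templated = get_rendered("select * from {{ table }}", {"table": "orders"})
```

After these two calls the slow path has run only once, for the templated string. In a project with many plain strings (config values, static SQL) this check avoids building a template for each of them.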

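An example result

For example, the compiled file output for a Dremio project.

Summary

dbt compilation is essentially preprocessing followed by template-engine (Jinja) substitution. The above is only a brief walkthrough. Macro handling during template rendering was touched on here and will be explained in detail in a later post.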

References

core/dbt/compilation.py
core/dbt/clients/jinja.py
core/dbt/context/providers.py

Posted on 2024-04-11 07:11 by 荣锋亮
