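A brief look at the dbt manifest Compiler
The Compiler class in the compilation module covers both the preprocessing step and the actual compilation, through its compile and compile_node methods.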
Where it is used
The main callers are CompileRunner.compile, GenericRPCRunner.compile, and RunTask.get_hook_sql. This suggests we can build our own processing on top of GenericRPCRunner.compile (in CI/CD, for example).
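How it works
dbt stores the results of compilation under the target directory.
- Initializing the directories
This mainly covers the project-level target directory and the directory where the project's dependency packages are installed.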
    def initialize(self):
        make_directory(self.config.project_target_path)
        make_directory(self.config.packages_install_path)
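To make the step concrete, here is a minimal standalone sketch of the same idea. It is not dbt's code: `make_directory` below is a stand-in built on `os.makedirs`, and the directory names `target` and `dbt_packages` are assumed to be the project defaults (both are configurable in a real project).

```python
import os
import tempfile
from typing import Tuple


def make_directory(path: str) -> None:
    """Stand-in for dbt's helper: create the directory if it is missing."""
    os.makedirs(path, exist_ok=True)


def initialize(project_root: str) -> Tuple[str, str]:
    """Create the two directories the compiler expects to exist."""
    # Assumed default names; a real project can override both.
    target_path = os.path.join(project_root, "target")
    packages_path = os.path.join(project_root, "dbt_packages")
    make_directory(target_path)
    make_directory(packages_path)
    return target_path, packages_path


demo_root = tempfile.mkdtemp()
target_dir, packages_dir = initialize(demo_root)
# Calling it again is harmless because existing directories are accepted.
initialize(demo_root)
```

Because the call is idempotent, it is safe for every compile to run it first.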
- Preprocessing
This step uses Linker to build a networkx graph, writes the graph.gpickle file, prints statistics, and returns the graph object.
    def compile(self, manifest: Manifest, write=True, add_test_edges=False) -> Graph:
        self.initialize()
        linker = Linker()
        linker.link_graph(manifest)

        # Create a file containing basic information about graph structure,
        # supporting diagnostics and performance analysis.
        summaries: Dict = dict()
        summaries["_invocation_id"] = get_invocation_id()
        summaries["linked"] = linker.get_graph_summary(manifest)

        # This is only called for the "build" command
        if add_test_edges:
            manifest.build_parent_and_child_maps()
            linker.add_test_edges(manifest)

            # Create another diagnostic summary, just as above, but this time
            # including the test edges.
            summaries["with_test_edges"] = linker.get_graph_summary(manifest)

        with open(
            os.path.join(self.config.project_target_path, "graph_summary.json"), "w"
        ) as out_stream:
            try:
                out_stream.write(json.dumps(summaries))
            except Exception as e:  # This is non-essential information, so merely note failures.
                fire_event(
                    Note(
                        msg=f"An error was encountered writing the graph summary information: {e}"
                    )
                )

        stats = _generate_stats(manifest)

        if write:
            self.write_graph_file(linker, manifest)

        # Do not print these for list command
        if self.config.args.which != "list":
            stats = _generate_stats(manifest)
            print_compile_stats(stats)

        return Graph(linker.graph)
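The link-then-summarize flow can be sketched without dbt or networkx. Everything below is hypothetical and only illustrates the shape of the step: the toy manifest, the `link_graph` and `graph_summary` helpers, the invocation id, and the summary fields are all made up, and the real graph_summary.json has a different layout.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple

# Hypothetical toy manifest: node id -> ids of the nodes it depends on.
toy_manifest: Dict[str, List[str]] = {
    "model.demo.stg_orders": [],
    "model.demo.orders": ["model.demo.stg_orders"],
    "test.demo.not_null_orders_id": ["model.demo.orders"],
}


def link_graph(manifest: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Turn dependency lists into (parent, child) edges."""
    return [(parent, node) for node, parents in manifest.items() for parent in parents]


def graph_summary(manifest: Dict[str, List[str]], edges: List[Tuple[str, str]]) -> Dict[str, int]:
    """Illustrative summary only; not the structure dbt actually writes."""
    return {"node_count": len(manifest), "edge_count": len(edges)}


edges = link_graph(toy_manifest)
summaries = {
    "_invocation_id": "demo-invocation",  # made-up id for the sketch
    "linked": graph_summary(toy_manifest, edges),
}

summary_dir = tempfile.mkdtemp()
summary_path = os.path.join(summary_dir, "graph_summary.json")
try:
    with open(summary_path, "w") as out_stream:
        out_stream.write(json.dumps(summaries))
except OSError as exc:
    # The summary is diagnostic only, so a failure is merely reported.
    print(f"could not write graph summary: {exc}")
```

As in the real method, the summary is treated as non-essential: a write failure is reported and the compile carries on.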
- compile_node processing
    def compile_node(
        self,
        node: ManifestSQLNode,
        manifest: Manifest,
        extra_context: Optional[Dict[str, Any]] = None,
        write: bool = True,
    ) -> ManifestSQLNode:
        """This is the main entry point into this code. It's called by
        CompileRunner.compile, GenericRPCRunner.compile, and
        RunTask.get_hook_sql. It calls '_compile_code' to render
        the node's raw_code into compiled_code, and then calls the
        recursive method to "prepend" the ctes.
        """
        if isinstance(node, UnitTestDefinition):
            return node

        # Make sure Lexer for sqlparse 0.4.4 is initialized
        from sqlparse.lexer import Lexer  # type: ignore

        if hasattr(Lexer, "get_default_instance"):
            Lexer.get_default_instance()

        # Two passes are made here: a jinja render, then a recursive CTE pass
        # (mainly to handle ephemeral CTEs)
        node = self._compile_code(node, manifest, extra_context)
        node, _ = self._recursively_prepend_ctes(node, manifest, extra_context)
        if write:
            # By default the result is still written to disk, under the
            # target directory described above
            self._write_node(node)
        return node
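The second pass is easier to picture with a toy version. The sketch below assumes a made-up `MODELS` table of already-rendered SQL and hypothetical `collect_ctes` / `prepend_ctes` helpers. It keeps the `__dbt__cte__` prefix dbt uses for ephemeral models, but it ignores the case where the SQL already starts with its own `with` clause, which the real code handles by parsing the statement.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy project: name -> (materialization, rendered SQL, dependencies).
MODELS: Dict[str, Tuple[str, str, List[str]]] = {
    "base": ("ephemeral", "select 1 as id", []),
    "mid": ("ephemeral", "select id from __dbt__cte__base", ["base"]),
    "final": ("table", "select id from __dbt__cte__mid", ["mid"]),
}


def collect_ctes(name: str, seen: Optional[List[Tuple[str, str]]] = None) -> List[Tuple[str, str]]:
    """Walk ephemeral ancestors depth-first so that dependencies come first."""
    if seen is None:
        seen = []
    for dep in MODELS[name][2]:
        materialized, sql, _ = MODELS[dep]
        if materialized != "ephemeral":
            continue  # only ephemeral models are inlined as CTEs
        collect_ctes(dep, seen)
        entry = (f"__dbt__cte__{dep}", sql)
        if entry not in seen:
            seen.append(entry)
    return seen


def prepend_ctes(name: str) -> str:
    """Prefix the model's SQL with one CTE per ephemeral ancestor."""
    sql = MODELS[name][1]
    ctes = collect_ctes(name)
    if not ctes:
        return sql
    with_clause = ", ".join(f"{cte} as ({body})" for cte, body in ctes)
    return f"with {with_clause} {sql}"


compiled = prepend_ctes("final")
print(compiled)
```

The recursion matters because an ephemeral model can itself depend on another ephemeral model, and the inner one has to be declared first.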
- jinja render processing
    def get_rendered(
        string: str,
        ctx: Dict[str, Any],
        node=None,
        capture_macros: bool = False,
        native: bool = False,
    ) -> Any:
        # performance optimization: if there are no jinja control characters in the
        # string, we can just return the input. Fall back to jinja if the type is
        # not a string or if native rendering is enabled (so '1' -> 1, etc...)
        # If this is desirable in the native env as well, we could handle the
        # native=True case by passing the input string to ast.literal_eval, like
        # the native renderer does.
        has_render_chars = not isinstance(string, str) or _HAS_RENDER_CHARS_PAT.search(string)

        if not has_render_chars:
            if not native:
                return string
            elif string in _render_cache:
                return _render_cache[string]

        # This uses the jinja helpers wrapped in dbt common; macro handling is
        # covered in a later post. dbt differs slightly from everyday jinja usage:
        # it renders from strings directly instead of going through a loader,
        # because the manifest already holds the raw code of every macro it needs.
        # Macros are loaded dynamically through the jinja template's make_module
        # feature, and the internals cache results for performance.
        template = get_template(
            string,
            ctx,
            node,
            capture_macros=capture_macros,
            native=native,
        )

        rendered = render_template(template, ctx, node)

        if not has_render_chars and native:
            _render_cache[string] = rendered

        return rendered
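The fast path and the cache can be tried in isolation. The sketch below is not dbt's implementation: the delimiter pattern is an assumption about what counts as a "render character", `toy_render` is a made-up stand-in that only substitutes `{{ name }}` placeholders instead of running jinja, and the non-string input case is left out.

```python
import ast
import re
from typing import Any, Dict

# Assumed pattern: any jinja opening or closing delimiter ({{ {% {# #} %} }}).
_HAS_RENDER_CHARS = re.compile(r"(\{[{%#]|[#}%]\})")
_render_cache: Dict[str, Any] = {}
render_calls = 0  # counts how often the slow path runs


def toy_render(string: str, ctx: Dict[str, Any], native: bool) -> Any:
    """Hypothetical stand-in for the jinja render: replaces {{ name }} only."""
    global render_calls
    render_calls += 1
    text = re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(ctx[m.group(1)]), string)
    if native:
        try:
            return ast.literal_eval(text)  # '1' -> 1, as a native renderer would
        except (ValueError, SyntaxError):
            return text
    return text


def get_rendered_sketch(string: str, ctx: Dict[str, Any], native: bool = False) -> Any:
    has_render_chars = bool(_HAS_RENDER_CHARS.search(string))
    if not has_render_chars:
        if not native:
            return string  # fast path: nothing to render
        if string in _render_cache:
            return _render_cache[string]  # native result already computed
    rendered = toy_render(string, ctx, native)
    if not has_render_chars and native:
        _render_cache[string] = rendered
    return rendered


plain = get_rendered_sketch("select 1", {})
templated = get_rendered_sketch("select * from {{ table }}", {"table": "orders"})
first_native = get_rendered_sketch("1", {}, native=True)
second_native = get_rendered_sketch("1", {}, native=True)
```

Only strings without delimiters are cached, since their result cannot depend on the context; anything containing a template expression is rendered every time.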
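An example result
For example, the compiled file output for a dremio project
Notes
Compilation in dbt is essentially preprocessing plus substitution by a template engine (jinja). The above is only a brief walkthrough; the macro side of template rendering was touched on just lightly here and will be covered in detail in a later post.
References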
core/dbt/compilation.py
core/dbt/clients/jinja.py
core/dbt/context/providers.py