How dbt-loom implements a dbt plugin: a look at the source code

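I previously gave a brief introduction showing that dbt plugins follow a well-defined format, one that closely resembles the pattern used by dbt adapters. Below, we look at the design of dbt-loom from the source code.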

dbt-loom reference architecture

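As the architecture diagram shows, dbt-loom can fetch dbt metadata (manifests) from other places: dbt Cloud, local files, or object storage. Working together with dbt-core, the plugin then merges that metadata into the current project, and when dbt runs, the corresponding models are built in the data warehouse.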

Code walkthrough

Code structure

```text
dbt_loom
├── __init__.py
├── clients
│   ├── az_blob.py
│   ├── dbt_cloud.py
│   ├── gcs.py
│   └── s3.py
├── config.py
└── manifests.py
```

Python package naming

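The package must follow dbt's plugin naming convention. More precisely, dbt-core discovers plugins by scanning installed top-level modules whose names start with `dbt_`, so the importable module is `dbt_loom`; the distribution name `dbt-loom` follows the matching `dbt-` convention. The pyproject.toml looks like this: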

```toml
[tool.poetry]
name = "dbt-loom"
version = "0.5.1"
```
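To make the naming rule concrete, here is a minimal sketch of prefix-based discovery. It assumes, as described above, that dbt-core filters installed top-level module names on the `dbt_` prefix; the helper names `is_plugin_module`, `candidate_module_names` and `import_candidates` are hypothetical and much simpler than dbt-core's actual plugin manager.

```python
import importlib
import pkgutil
from types import ModuleType
from typing import Dict, Iterable, List

# Assumption: the prefix dbt-core's plugin manager filters module names on.
PLUGIN_MODULE_PREFIX = "dbt_"


def is_plugin_module(name: str, prefix: str = PLUGIN_MODULE_PREFIX) -> bool:
    """Return True if a top-level module name qualifies as a plugin candidate."""
    return name.startswith(prefix)


def candidate_module_names(prefix: str = PLUGIN_MODULE_PREFIX) -> List[str]:
    """List installed top-level modules (found on sys.path) whose names match the prefix."""
    return sorted(
        info.name for info in pkgutil.iter_modules() if is_plugin_module(info.name, prefix)
    )


def import_candidates(names: Iterable[str]) -> Dict[str, ModuleType]:
    """Import each candidate so its `plugins` attribute can be inspected afterwards."""
    return {name: importlib.import_module(name) for name in names}


if __name__ == "__main__":
    # The result depends on what is installed in the current environment.
    print(candidate_module_names())
```

Note that a hyphenated name such as `dbt-loom` can never be an importable module name, which is why the prefix check applies to `dbt_loom` rather than to the distribution name.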

Plugin entry point

The package must expose a module-level `plugins` attribute listing its plugin classes. In `__init__.py`:

```python
plugins = [dbtLoom]
```
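The `plugins` list is what the plugin manager reads after importing a candidate module. The sketch below simulates that step with stand-ins: `BasePlugin`, `HelloPlugin`, `instantiate_plugins` and the module name `dbt_hello` are all hypothetical, and the assumption that the base constructor calls `initialize()` reflects my reading of dbt-core's `dbtPlugin`, not something stated in this article.

```python
from types import ModuleType
from typing import List

PLUGIN_ATTR_NAME = "plugins"  # the module attribute the manager looks up


class BasePlugin:
    """Hypothetical stand-in for dbt's dbtPlugin base class."""

    def __init__(self, project_name: str) -> None:
        self.project_name = project_name
        # Assumption: like dbt-core's base class, the constructor runs initialize().
        self.initialize()

    def initialize(self) -> None:
        pass


class HelloPlugin(BasePlugin):
    def initialize(self) -> None:
        self.ready = True


def instantiate_plugins(module: ModuleType, project_name: str) -> List[BasePlugin]:
    """Instantiate every class listed in the module's `plugins` attribute."""
    instances = []
    for plugin_cls in getattr(module, PLUGIN_ATTR_NAME, []):
        if not issubclass(plugin_cls, BasePlugin):
            raise TypeError(f"{plugin_cls!r} must subclass BasePlugin")
        instances.append(plugin_cls(project_name=project_name))
    return instances


# Simulate a package whose __init__.py contains `plugins = [HelloPlugin]`.
fake_pkg = ModuleType("dbt_hello")
fake_pkg.plugins = [HelloPlugin]

loaded = instantiate_plugins(fake_pkg, project_name="my_project")
```

If the base constructor does call `initialize()`, that explains why `dbtLoom.__init__` sets `self.config` and `self.models` first and only calls `super().__init__(project_name)` at the very end.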

The dbtLoom dbtPlugin implementation

It implements `initialize` and `get_nodes`, but not `get_manifest_artifacts`.
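Leaving `get_manifest_artifacts` out is fine because, as far as I can tell, dbt-core only calls the methods a plugin marks with its `dbt_hook` decorator. The sketch below illustrates that idea with a hypothetical `plugin_hook` decorator, a `LoomLikePlugin` class and a tiny `call_hook` dispatcher; none of these are dbt-core's real code.

```python
from typing import Any, Callable, Dict, List


def plugin_hook(func: Callable) -> Callable:
    """Tag a plugin method so a manager can find it later (simplified stand-in)."""
    func.is_plugin_hook = True
    return func


class LoomLikePlugin:
    """Implements get_nodes only, like dbtLoom; get_manifest_artifacts is left out."""

    @plugin_hook
    def get_nodes(self) -> Dict[str, str]:
        return {"model.upstream.orders": "orders"}

    def helper(self) -> None:  # not tagged, so the manager ignores it
        pass


def collect_hooks(plugin: object) -> Dict[str, Callable]:
    """Return the plugin's bound methods tagged as hooks, keyed by method name."""
    hooks = {}
    for name in dir(plugin):
        attr = getattr(plugin, name)
        if callable(attr) and getattr(attr, "is_plugin_hook", False):
            hooks[name] = attr
    return hooks


def call_hook(plugins: List[object], hook_name: str) -> List[Any]:
    """Call `hook_name` on every plugin that implements it as a hook; skip the rest."""
    results = []
    for plugin in plugins:
        hook = collect_hooks(plugin).get(hook_name)
        if hook is not None:
            results.append(hook())
    return results
```

A plugin that does not define a given hook simply contributes nothing for it.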

```python
import os
import re
from pathlib import Path
from typing import Callable, Dict, Optional

import yaml

# Not shown: dbtPlugin, dbt_hook, PluginNodes, ModelNodeArgs, fire_event and Note come
# from dbt-core (their import paths vary between dbt-core versions); dbtLoomConfig,
# LoomRunnableConfig, ManifestLoader, identify_node_subgraph and
# convert_model_nodes_to_model_node_args come from elsewhere in dbt_loom.


class dbtLoom(dbtPlugin):
    """
    dbtLoom is a dbt plugin that loads manifest files, parses a DAG from the manifest,
    and injects public nodes from imported manifest.
    """

    def __init__(self, project_name: str):
        configuration_path = Path(
            os.environ.get("DBT_LOOM_CONFIG", "dbt_loom.config.yml")
        )

        # dbt-loom's own ManifestLoader, somewhat similar to dbt-core's ManifestLoader
        self._manifest_loader = ManifestLoader()

        self.config: Optional[dbtLoomConfig] = self.read_config(configuration_path)
        self.models: Dict[str, ModelNodeArgs] = {}

        import dbt.contracts.graph.manifest

        fire_event(
            Note(
                msg="dbt-loom: Patching ref protection methods to support dbt-loom dependencies."
            )
        )
        # Monkey-patch dbt-core's ref-protection checks so that every project listed
        # in the dbt-loom config is registered as a dependency before the check runs.
        dbt.contracts.graph.manifest.Manifest.is_invalid_protected_ref = (  # type: ignore
            self.dependency_wrapper(
                dbt.contracts.graph.manifest.Manifest.is_invalid_protected_ref
            )
        )
        dbt.contracts.graph.manifest.Manifest.is_invalid_private_ref = (  # type: ignore
            self.dependency_wrapper(
                dbt.contracts.graph.manifest.Manifest.is_invalid_private_ref
            )
        )

        # Called last: dbtPlugin.__init__ runs initialize(), which needs self.config.
        super().__init__(project_name)

    def dependency_wrapper(self, function) -> Callable:
        def outer_function(inner_self, node, target_model, dependencies) -> bool:
            if self.config is not None:
                for manifest in self.config.manifests:
                    dependencies[manifest.name] = LoomRunnableConfig()

            return function(inner_self, node, target_model, dependencies)

        return outer_function

    def read_config(self, path: Path) -> Optional[dbtLoomConfig]:
        """Read the dbt-loom configuration file."""
        if not path.exists():
            return None

        with open(path) as file:
            config_content = file.read()

        config_content = self.replace_env_variables(config_content)

        return dbtLoomConfig(**yaml.load(config_content, yaml.SafeLoader))

    @staticmethod
    def replace_env_variables(config_str: str) -> str:
        """Replace environment variable placeholders in the configuration string."""
        # Matches both $VAR and ${VAR}; unset variables become an empty string.
        pattern = r"\$(\w+)|\$\{([^}]+)\}"
        return re.sub(
            pattern,
            lambda match: os.environ.get(
                match.group(1) if match.group(1) is not None else match.group(2), ""
            ),
            config_str,
        )

    # initialize: the core job here is loading the external manifests
    def initialize(self) -> None:
        """Initialize the plugin"""
        # Nothing to do if manifests are already loaded or no config file was found.
        if self.models != {} or not self.config:
            return

        for manifest_reference in self.config.manifests:
            fire_event(
                Note(
                    msg=f"dbt-loom: Loading manifest for `{manifest_reference.name}`"
                    f" from `{manifest_reference.type.value}`"
                )
            )

            manifest = self._manifest_loader.load(manifest_reference)
            if manifest is None:
                continue

            selected_nodes = identify_node_subgraph(manifest)
            self.models.update(convert_model_nodes_to_model_node_args(selected_nodes))

    # The dbt_hook decorator marks get_nodes as a hook for dbt-core's plugin manager.
    @dbt_hook
    def get_nodes(self) -> PluginNodes:
        """
        Inject PluginNodes to dbt for injection into dbt's DAG.
        """
        fire_event(Note(msg="dbt-loom: Injecting nodes"))

        # Build plugin nodes from the loaded external manifest metadata
        return PluginNodes(models=self.models)
```
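The `dependency_wrapper` trick is ordinary monkey-patching: replace a class attribute with a closure that adjusts the arguments, then delegate to the original function. The sketch below reproduces the pattern against a toy `Manifest`; the toy class, its check logic, the free-function form of `dependency_wrapper` (taking the project list directly instead of reading `self.config`) and the project names are hypothetical and far simpler than dbt-core's real ref-protection rules.

```python
from typing import Callable, Dict, Iterable


class Manifest:
    """Hypothetical stand-in for dbt's Manifest with a ref-protection check."""

    def is_invalid_private_ref(
        self, node: str, target_model: str, dependencies: Dict[str, object]
    ) -> bool:
        # Toy rule: a ref is invalid unless the target's project is a declared dependency.
        project = target_model.split(".")[0]
        return project not in dependencies


class LoomRunnableConfig:
    """Placeholder value marking a project as an allowed dependency."""


def dependency_wrapper(function: Callable, upstream_projects: Iterable[str]) -> Callable:
    def outer_function(inner_self, node, target_model, dependencies) -> bool:
        # Register every upstream project before running the original check.
        for name in upstream_projects:
            dependencies[name] = LoomRunnableConfig()
        return function(inner_self, node, target_model, dependencies)

    return outer_function


before = Manifest().is_invalid_private_ref("model.local.report", "upstream.orders", {})

# Patch the method on the class itself, as dbtLoom.__init__ does with dbt's Manifest.
Manifest.is_invalid_private_ref = dependency_wrapper(
    Manifest.is_invalid_private_ref, ["upstream"]
)

after = Manifest().is_invalid_private_ref("model.local.report", "upstream.orders", {})
```

Because the patch is applied to the class, every `Manifest` instance created afterwards sees the wrapped check; projects not listed in the configuration are still rejected.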

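Loader handling

The loader's core job is reading manifests from the different locations given in the configuration (S3, an API, local files); see `ManifestLoader` for details.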
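A loader like this is essentially a dispatch on the source type, followed by selecting which nodes to inject. The sketch below shows only the local-file path plus a filter that keeps models marked `access: public`, which is an assumed reading of "injects public nodes" rather than dbt-loom's exact `identify_node_subgraph` logic. The enum values, `LOADERS`, `load_manifest` and `select_public_models` are hypothetical names.

```python
import json
import tempfile
from enum import Enum
from pathlib import Path
from typing import Any, Callable, Dict


class ManifestType(str, Enum):
    """Hypothetical subset of manifest source types."""

    FILE = "file"
    S3 = "s3"
    DBT_CLOUD = "dbt_cloud"


def load_from_file(config: Dict[str, Any]) -> Dict[str, Any]:
    """Read a manifest.json from a local path."""
    return json.loads(Path(config["path"]).read_text())


# One loader per source type; remote clients (S3, dbt Cloud, ...) are left out here.
LOADERS: Dict[ManifestType, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    ManifestType.FILE: load_from_file,
}


def load_manifest(source: ManifestType, config: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch to the loader registered for `source`."""
    loader = LOADERS.get(source)
    if loader is None:
        raise NotImplementedError(f"no loader registered for {source.value!r}")
    return loader(config)


def select_public_models(manifest: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Keep only model nodes marked access: public (assumed selection rule)."""
    return {
        unique_id: node
        for unique_id, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "model" and node.get("access") == "public"
    }


# Usage: write a tiny manifest to a temporary file, then load and filter it.
sample = {
    "nodes": {
        "model.upstream.orders": {"resource_type": "model", "access": "public"},
        "model.upstream.stg_orders": {"resource_type": "model", "access": "protected"},
        "test.upstream.not_null_orders_id": {"resource_type": "test"},
    }
}
with tempfile.TemporaryDirectory() as tmp:
    manifest_path = Path(tmp) / "manifest.json"
    manifest_path.write_text(json.dumps(sample))
    loaded = load_manifest(ManifestType.FILE, {"path": str(manifest_path)})

public_models = select_public_models(loaded)
```

Keeping one loader function per source type means a new backend only needs a new entry in the mapping, which matches the one-module-per-client layout of `dbt_loom/clients`.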

Notes

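dbt-loom effectively gives us a fairly complete reference for developing a dbt plugin, and building on it we can implement some interesting features. The example above injects metadata described in a configuration file, but that metadata could just as well come from other databases. That would let us generate model metadata automatically from a database, similar to some of the capabilities of the dbt codegen package.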
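As a rough illustration of that idea, a plugin's `initialize()` could build its model list from a live database catalog instead of a manifest file. The sketch below reads table names from SQLite's `sqlite_master` catalog; the function name, the `warehouse` package name and the dict fields (loosely modeled on the arguments of dbt's `ModelNodeArgs`) are assumptions, and a real plugin would still need to convert these dicts into dbt's own node types.

```python
import sqlite3
from typing import Dict


def models_from_database(
    conn: sqlite3.Connection, package_name: str
) -> Dict[str, Dict[str, str]]:
    """Build model-node-like dicts from the tables in a SQLite database.

    The field names loosely follow dbt's model node arguments; they are an assumption.
    """
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    models = {}
    for (table_name,) in rows:
        unique_id = f"model.{package_name}.{table_name}"
        models[unique_id] = {
            "name": table_name,
            "package_name": package_name,
            "identifier": table_name,
            "schema": "main",  # SQLite's default schema name
            "access": "public",
        }
    return models


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
generated = models_from_database(conn, package_name="warehouse")
```

Swapping SQLite for another database would mainly mean replacing the catalog query, for example with a query against that database's `information_schema`.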

References

dbt_loom/manifests.py
https://github.com/nicholasyager/dbt-loom
https://nicholasyager.com/2023/08/dbt_plugin_api.html

Posted on 2024-06-04 07:32 by 荣锋亮
