dbt dremio nessie create table branch 问题解决简单说明

以前是简单说明过解决方法,以下通过实际代码,说明下解决

dbt dremio 集成nessie 内部处理简单说明

注意目前说明的场景是基于nessie 作为数据库配置的,而不是dbt 标准的space 玩法

  • 整体处理

如下图,实际上和业务模型的定义有关系,包含了view,以及table,当然还有属于dremio 特有的反射

  • view 模式的

因为数据模型直接是以view 存储在nessie 中,默认没有进行branch 的处理,如果需要可以通过配置解决,写入特定的branch(基于config)

  • table 模式

处理上实际上就是先物化(CTAS 到s3 中),然后基于双胞胎策略,写入一个view 到nessie 中,如下图,如果需要支持branch 就需要调整
双胞胎策略macro 了

参考解决方法

  • view 模式的

结合上边说的,直接修改view 物化定义就可以了,扩展下支持branch (commit 也是可以的), view 定义放到macro 中,dbt 有自己的macro 搜索顺序

{% materialization view, adapter='dremio' %}
  {{log("user my view materialization" ~ model, info=True)}}
  {%- set identifier = model['alias'] -%}
  {%- set twin_strategy = config.get('twin_strategy', validator=validation.any[basestring]) or 'clone' -%}
  {%- set old_relation = adapter.get_relation(database=database, schema=schema, identifier=identifier) -%}
 
  {%- set exists_as_view = (old_relation is not none and old_relation.is_view) -%}
 
  {%- set target_relation = api.Relation.create(
      identifier=identifier, schema=schema, database=database, type='view') -%}
 
  {% set grant_config = config.get('grants') %}
 
  {{ run_hooks(pre_hooks) }}
 
  -- If there's a table with the same name and we weren't told to full refresh,
  -- that's an error. If we were told to full refresh, drop it. This behavior differs
  -- for Snowflake and BigQuery, so multiple dispatch is used.
  {%- if old_relation is not none and old_relation.is_table -%}
    {{ handle_existing_table(should_full_refresh(), old_relation) }}
  {%- endif -%}
 
  -- build model
  {% call statement('main') -%}
   # 扩展支持了branch,通过改写了 target_relation
    {% if config.get('branch') %}
        {% set target_relation = target_relation ~ ' AT ' ~  'branch ' ~ config.get('branch')  %}
    {% endif %}
    {{ create_view_as(target_relation, external_query(sql)) }}
  {%- endcall %}
 
  {{ apply_twin_strategy(target_relation) }}
 
  {{ enable_default_reflection() }}
 
  {% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}
 
  {{ run_hooks(post_hooks) }}
 
  {{ return({'relations': [target_relation]}) }}
 
{% endmaterialization %}
  • table 模式的

结合上边说的,就是修改双胞胎策略,此文件也可以放到项目的macro 中

{%- macro apply_twin_strategy(target_relation) -%}
  {{log('apply_twin_strategy target_relation ' ~ target_relation , info=True)}}
  {%- set twin_strategy = config.get('twin_strategy', validator=validation.any[basestring]) or 'clone' -%}
  {%- if target_relation.type == 'view' -%}
    {%- if twin_strategy != 'allow' -%}
      {%- set table_relation = api.Relation.create(
          identifier=generate_alias_name_impl(model.name, config.get('file', validator=validation.any[basestring]), model),
          schema=generate_schema_name_impl(target.root_path, config.get('root_path', validator=validation.any[basestring]), model),
          database=generate_database_name_impl(target.datalake, config.get('datalake', validator=validation.any[basestring]), model),
          type='table') -%}
      {{ adapter.drop_relation(table_relation) }}
      {{log('twin_strategy for view  ' ~ table_relation , info=True)}}
 
    {%- endif -%}
  {%- elif target_relation.type == 'table' -%}
    {%- if twin_strategy in ['prevent', 'clone'] -%}
      {%- set view_relation = api.Relation.create(
          identifier=generate_alias_name_impl(model.name, config.get('alias', validator=validation.any[basestring]), model),
          schema=generate_schema_name_impl(target.schema, config.get('schema', validator=validation.any[basestring]), model),
          database=generate_database_name_impl(target.database, config.get('database', validator=validation.any[basestring]), model),
          type='view') -%}
      {{log('twin_strategy for table ' ~ view_relation , info=True)}}
      {% if config.get('branch') %}
        {% set view_relation = view_relation ~ ' AT ' ~  'branch ' ~ config.get('branch')  %}
      {% endif %}
      {%- if twin_strategy == 'prevent' -%}
        {{ adapter.drop_relation(view_relation) }}
      {%- elif twin_strategy == 'clone' -%}
        {%- set sql_view -%}
          select *
          from {{ render_with_format_clause(target_relation) }}
        {%- endset -%}
        {% call statement('clone_view') -%}
          {{log('new view_relation' ~ view_relation , info=True)}}
          {{ create_view_as(view_relation, sql_view) }}
        {%- endcall %}
      {%- endif -%}
    {%- endif -%}
  {%- endif -%}
{%- endmacro -%}

使用

{{
config(
    materialized = "table",
    alias = "dalong_sensor",
    branch = "prod",
)
}}
select * from {{source("public","sensor")}}

参考日志信息

说明

以上是一个对于nessie branch 支持的简单说明,实际可以包装为一个package,同时可以结合pacakge 的dispatch 能力,使用自定义的macro,dbt 的扩展能力还是很强的,我们可以灵活的进行改写实现方便的模型控制

参考资料

https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch

posted on 2024-07-21 01:21  荣锋亮  阅读(7)  评论(0编辑  收藏  举报

导航