dbt dremio nessie create table branch 问题解决简单说明
以前是简单说明过解决方法,以下通过实际代码,说明下解决
dbt dremio 集成nessie 内部处理简单说明
注意目前说明的场景是基于nessie 作为数据库配置的,而不是dbt 标准的space 玩法
- 整体处理
如下图,实际上和业务模型的定义有关系,包含了view,以及table,当然还有属于dremio 特有的反射
- view 模式的
因为数据模型直接是以view 存储在nessie 中,默认没有进行branch 的处理,如果需要可以通过配置解决,写入特定的branch(基于config)
- table 模式
处理上实际上就是先物化(CTAS 到s3 中),然后基于双胞胎策略,写入一个view 到nessie 中,如下图,如果需要支持branch 就需要调整
双胞胎策略macro 了
参考解决方法
- view 模式的
结合上边说的,直接修改view 物化定义就可以了,扩展下支持branch (commit 也是可以的), view 定义放到macro 中,dbt 有自己的macro 搜索顺序
{% materialization view, adapter='dremio' %}
{{log("user my view materialization" ~ model, info=True)}}
{%- set identifier = model['alias'] -%}
{%- set twin_strategy = config.get('twin_strategy', validator=validation.any[basestring]) or 'clone' -%}
{%- set old_relation = adapter.get_relation(database=database, schema=schema, identifier=identifier) -%}
{%- set exists_as_view = (old_relation is not none and old_relation.is_view) -%}
{%- set target_relation = api.Relation.create(
identifier=identifier, schema=schema, database=database, type='view') -%}
{% set grant_config = config.get('grants') %}
{{ run_hooks(pre_hooks) }}
-- If there's a table with the same name and we weren't told to full refresh,
-- that's an error. If we were told to full refresh, drop it. This behavior differs
-- for Snowflake and BigQuery, so multiple dispatch is used.
{%- if old_relation is not none and old_relation.is_table -%}
{{ handle_existing_table(should_full_refresh(), old_relation) }}
{%- endif -%}
-- build model
{% call statement('main') -%}
# 扩展支持了branch,通过改写了 target_relation
{% if config.get('branch') %}
{% set target_relation = target_relation ~ ' AT ' ~ 'branch ' ~ config.get('branch') %}
{% endif %}
{{ create_view_as(target_relation, external_query(sql)) }}
{%- endcall %}
{{ apply_twin_strategy(target_relation) }}
{{ enable_default_reflection() }}
{% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}
{{ run_hooks(post_hooks) }}
{{ return({'relations': [target_relation]}) }}
{% endmaterialization %}
- table 模式的
结合上边说的,就是修改双胞胎策略,此文件也可以放到项目的macro 中
{%- macro apply_twin_strategy(target_relation) -%}
{{log('apply_twin_strategy target_relation ' ~ target_relation , info=True)}}
{%- set twin_strategy = config.get('twin_strategy', validator=validation.any[basestring]) or 'clone' -%}
{%- if target_relation.type == 'view' -%}
{%- if twin_strategy != 'allow' -%}
{%- set table_relation = api.Relation.create(
identifier=generate_alias_name_impl(model.name, config.get('file', validator=validation.any[basestring]), model),
schema=generate_schema_name_impl(target.root_path, config.get('root_path', validator=validation.any[basestring]), model),
database=generate_database_name_impl(target.datalake, config.get('datalake', validator=validation.any[basestring]), model),
type='table') -%}
{{ adapter.drop_relation(table_relation) }}
{{log('twin_strategy for view ' ~ table_relation , info=True)}}
{%- endif -%}
{%- elif target_relation.type == 'table' -%}
{%- if twin_strategy in ['prevent', 'clone'] -%}
{%- set view_relation = api.Relation.create(
identifier=generate_alias_name_impl(model.name, config.get('alias', validator=validation.any[basestring]), model),
schema=generate_schema_name_impl(target.schema, config.get('schema', validator=validation.any[basestring]), model),
database=generate_database_name_impl(target.database, config.get('database', validator=validation.any[basestring]), model),
type='view') -%}
{{log('twin_strategy for table ' ~ view_relation , info=True)}}
{% if config.get('branch') %}
{% set view_relation = view_relation ~ ' AT ' ~ 'branch ' ~ config.get('branch') %}
{% endif %}
{%- if twin_strategy == 'prevent' -%}
{{ adapter.drop_relation(view_relation) }}
{%- elif twin_strategy == 'clone' -%}
{%- set sql_view -%}
select *
from {{ render_with_format_clause(target_relation) }}
{%- endset -%}
{% call statement('clone_view') -%}
{{log('new view_relation' ~ view_relation , info=True)}}
{{ create_view_as(view_relation, sql_view) }}
{%- endcall %}
{%- endif -%}
{%- endif -%}
{%- endif -%}
{%- endmacro -%}
使用
{{
config(
materialized = "table",
alias = "dalong_sensor",
branch = "prod",
)
}}
select * from {{source("public","sensor")}}
参考日志信息
说明
以上是一个对于nessie branch 支持的简单说明,实际可以包装为一个package,同时可以结合pacakge 的dispatch 能力,使用自定义的macro,dbt 的扩展能力还是很强的,我们可以灵活的进行改写实现方便的模型控制
参考资料
https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch