dbt 默认snapshot snapshot_staging_table 处理简单说明

dbt 默认snapshot 的处理使用了 snapshot_staging_table 中间表,了解内部处理有利于学习快照处理技术

macro 定义

内部的处理实际上就是对于已经存在的快照表数据与目前需要进行快照的查询数据进行比较,会处理添加的,删除的,以及交集数据,dbt
使用了自己的支持key,以及基于scd2 的模型进行数据存储
如下

{% macro snapshot_staging_table(strategy, source_sql, target_relation) -%}
  {{ adapter.dispatch('snapshot_staging_table', 'dbt')(strategy, source_sql, target_relation) }}
{% endmacro %}
 
{% macro default__snapshot_staging_table(strategy, source_sql, target_relation) -%}
 
    with snapshot_query as (
 
        {{ source_sql }}
 
    ),
 
    snapshotted_data as (
 
        select *,
            {{ strategy.unique_key }} as dbt_unique_key
 
        from {{ target_relation }}
        where dbt_valid_to is null
 
    ),
 
    insertions_source_data as (
 
        select
            *,
            {{ strategy.unique_key }} as dbt_unique_key,
            {{ strategy.updated_at }} as dbt_updated_at,
            {{ strategy.updated_at }} as dbt_valid_from,
            nullif({{ strategy.updated_at }}, {{ strategy.updated_at }}) as dbt_valid_to,
            {{ strategy.scd_id }} as dbt_scd_id
 
        from snapshot_query
    ),
 
    updates_source_data as (
 
        select
            *,
            {{ strategy.unique_key }} as dbt_unique_key,
            {{ strategy.updated_at }} as dbt_updated_at,
            {{ strategy.updated_at }} as dbt_valid_from,
            {{ strategy.updated_at }} as dbt_valid_to
 
        from snapshot_query
    ),
 
    {%- if strategy.invalidate_hard_deletes %}
 
    deletes_source_data as (
 
        select
            *,
            {{ strategy.unique_key }} as dbt_unique_key
        from snapshot_query
    ),
    {% endif %}
 
    insertions as (
 
        select
            'insert' as dbt_change_type,
            source_data.*
 
        from insertions_source_data as source_data
        left outer join snapshotted_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
        where snapshotted_data.dbt_unique_key is null
           or (
                snapshotted_data.dbt_unique_key is not null
            and (
                {{ strategy.row_changed }}
            )
        )
 
    ),
 
    updates as (
 
        select
            'update' as dbt_change_type,
            source_data.*,
            snapshotted_data.dbt_scd_id
 
        from updates_source_data as source_data
        join snapshotted_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
        where (
            {{ strategy.row_changed }}
        )
    )
 
    {%- if strategy.invalidate_hard_deletes -%}
    ,
 
    deletes as (
 
        select
            'delete' as dbt_change_type,
            source_data.*,
            {{ snapshot_get_time() }} as dbt_valid_from,
            {{ snapshot_get_time() }} as dbt_updated_at,
            {{ snapshot_get_time() }} as dbt_valid_to,
            snapshotted_data.dbt_scd_id
 
        from snapshotted_data
        left join deletes_source_data as source_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
        where source_data.dbt_unique_key is null
    )
    {%- endif %}
 
    select * from insertions
    union all
    select * from updates
    {%- if strategy.invalidate_hard_deletes %}
    union all
    select * from deletes
    {%- endif %}
 
{%- endmacro %}
  • scd_id 内部处理
    目前是基于了md5
{% macro snapshot_hash_arguments(args) -%}
  {{ adapter.dispatch('snapshot_hash_arguments', 'dbt')(args) }}
{%- endmacro %}
 
{% macro default__snapshot_hash_arguments(args) -%}
    md5({%- for arg in args -%}
        coalesce(cast({{ arg }} as varchar ), '')
        {% if not loop.last %} || '|' || {% endif %}
    {%- endfor -%})
{%- endmacro %}

参数处理上会结合配置的主键以及更新时间处理(当然可以自己调整)

{% set scd_id_expr = snapshot_hash_arguments([primary_key, updated_at]) %}

说明

了解dbt 内部默认快照的处理机制还是比较重要的,一个是可以学习,同时也有利于我们开发自己的快照实现

参考资料

core/dbt/include/global_project/macros/materializations/snapshots/strategies.sql
core/dbt/include/global_project/macros/materializations/snapshots/snapshot.sql
core/dbt/include/global_project/macros/materializations/snapshots/helpers.sql
https://docs.getdbt.com/reference/snapshot-properties

posted on 2024-05-13 00:30  荣锋亮  阅读(10)  评论(0编辑  收藏  举报

导航