dbt 默认snapshot snapshot_staging_table 处理简单说明

dbt 默认snapshot 的处理使用了 snapshot_staging_table 中间表,了解内部处理有利于学习快照处理技术

macro 定义

内部的处理实际上就是对于已经存在的快照表数据与目前需要进行快照的查询数据进行比较,会处理添加的,删除的,以及交集数据,dbt
使用了自己的支持key,以及基于scd2 的模型进行数据存储
如下

{% macro snapshot_staging_table(strategy, source_sql, target_relation) -%}
  {{ adapter.dispatch('snapshot_staging_table', 'dbt')(strategy, source_sql, target_relation) }}
{% endmacro %}
 
{% macro default__snapshot_staging_table(strategy, source_sql, target_relation) -%}
 
    with snapshot_query as (
 
        {{ source_sql }}
 
    ),
 
    snapshotted_data as (
 
        select *,
            {{ strategy.unique_key }} as dbt_unique_key
 
        from {{ target_relation }}
        where dbt_valid_to is null
 
    ),
 
    insertions_source_data as (
 
        select
            *,
            {{ strategy.unique_key }} as dbt_unique_key,
            {{ strategy.updated_at }} as dbt_updated_at,
            {{ strategy.updated_at }} as dbt_valid_from,
            nullif({{ strategy.updated_at }}, {{ strategy.updated_at }}) as dbt_valid_to,
            {{ strategy.scd_id }} as dbt_scd_id
 
        from snapshot_query
    ),
 
    updates_source_data as (
 
        select
            *,
            {{ strategy.unique_key }} as dbt_unique_key,
            {{ strategy.updated_at }} as dbt_updated_at,
            {{ strategy.updated_at }} as dbt_valid_from,
            {{ strategy.updated_at }} as dbt_valid_to
 
        from snapshot_query
    ),
 
    {%- if strategy.invalidate_hard_deletes %}
 
    deletes_source_data as (
 
        select
            *,
            {{ strategy.unique_key }} as dbt_unique_key
        from snapshot_query
    ),
    {% endif %}
 
    insertions as (
 
        select
            'insert' as dbt_change_type,
            source_data.*
 
        from insertions_source_data as source_data
        left outer join snapshotted_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
        where snapshotted_data.dbt_unique_key is null
           or (
                snapshotted_data.dbt_unique_key is not null
            and (
                {{ strategy.row_changed }}
            )
        )
 
    ),
 
    updates as (
 
        select
            'update' as dbt_change_type,
            source_data.*,
            snapshotted_data.dbt_scd_id
 
        from updates_source_data as source_data
        join snapshotted_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
        where (
            {{ strategy.row_changed }}
        )
    )
 
    {%- if strategy.invalidate_hard_deletes -%}
    ,
 
    deletes as (
 
        select
            'delete' as dbt_change_type,
            source_data.*,
            {{ snapshot_get_time() }} as dbt_valid_from,
            {{ snapshot_get_time() }} as dbt_updated_at,
            {{ snapshot_get_time() }} as dbt_valid_to,
            snapshotted_data.dbt_scd_id
 
        from snapshotted_data
        left join deletes_source_data as source_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
        where source_data.dbt_unique_key is null
    )
    {%- endif %}
 
    select * from insertions
    union all
    select * from updates
    {%- if strategy.invalidate_hard_deletes %}
    union all
    select * from deletes
    {%- endif %}
 
{%- endmacro %}
  • scd_id 内部处理
    目前是基于了md5
{% macro snapshot_hash_arguments(args) -%}
  {{ adapter.dispatch('snapshot_hash_arguments', 'dbt')(args) }}
{%- endmacro %}
 
{% macro default__snapshot_hash_arguments(args) -%}
    md5({%- for arg in args -%}
        coalesce(cast({{ arg }} as varchar ), '')
        {% if not loop.last %} || '|' || {% endif %}
    {%- endfor -%})
{%- endmacro %}

参数处理上会结合配置的主键以及更新时间处理(当然可以自己调整)

{% set scd_id_expr = snapshot_hash_arguments([primary_key, updated_at]) %}

说明

了解dbt 内部默认快照的处理机制还是比较重要的,一个是可以学习,同时也有利于我们开发自己的快照实现

参考资料

core/dbt/include/global_project/macros/materializations/snapshots/strategies.sql
core/dbt/include/global_project/macros/materializations/snapshots/snapshot.sql
core/dbt/include/global_project/macros/materializations/snapshots/helpers.sql
https://docs.getdbt.com/reference/snapshot-properties

posted on   荣锋亮  阅读(19)  评论(0编辑  收藏  举报

相关博文:
阅读排行:
· 全程不用写代码,我用AI程序员写了一个飞机大战
· DeepSeek 开源周回顾「GitHub 热点速览」
· 记一次.NET内存居高不下排查解决与启示
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· .NET10 - 预览版1新功能体验(一)
历史上的今天:
2023-05-13 golang netlink 方便的网络管理包
2020-05-13 基于VictoriaMetrics的prometheus 集群监控报警方案
2020-05-13 VictoriaMetrics vmalert 重启状态的处理
2020-05-13 karma prometheus alertmanager dashboard简单试用
2020-05-13 kthxbye一种解决prometheus alertmanager 报警确认的守护进程
2020-05-13 karma 一个不错的prometheus alertmanager dashboard
2019-05-13 gitbase cube.js schema 文件项目

导航

< 2025年3月 >
23 24 25 26 27 28 1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31 1 2 3 4 5
点击右上角即可分享
微信分享提示