dbt 默认snapshot snapshot_staging_table 处理简单说明
dbt 默认snapshot 的处理使用了 snapshot_staging_table 中间表,了解内部处理有利于学习快照处理技术
macro 定义
内部的处理实际上就是对于已经存在的快照表数据与目前需要进行快照的查询数据进行比较,会处理添加的,删除的,以及交集数据,dbt
使用了自己的支持key,以及基于scd2 的模型进行数据存储
如下
{% macro snapshot_staging_table(strategy, source_sql, target_relation) -%}
{{ adapter.dispatch('snapshot_staging_table', 'dbt')(strategy, source_sql, target_relation) }}
{% endmacro %}
{% macro default__snapshot_staging_table(strategy, source_sql, target_relation) -%}
with snapshot_query as (
{{ source_sql }}
),
snapshotted_data as (
select *,
{{ strategy.unique_key }} as dbt_unique_key
from {{ target_relation }}
where dbt_valid_to is null
),
insertions_source_data as (
select
*,
{{ strategy.unique_key }} as dbt_unique_key,
{{ strategy.updated_at }} as dbt_updated_at,
{{ strategy.updated_at }} as dbt_valid_from,
nullif({{ strategy.updated_at }}, {{ strategy.updated_at }}) as dbt_valid_to,
{{ strategy.scd_id }} as dbt_scd_id
from snapshot_query
),
updates_source_data as (
select
*,
{{ strategy.unique_key }} as dbt_unique_key,
{{ strategy.updated_at }} as dbt_updated_at,
{{ strategy.updated_at }} as dbt_valid_from,
{{ strategy.updated_at }} as dbt_valid_to
from snapshot_query
),
{%- if strategy.invalidate_hard_deletes %}
deletes_source_data as (
select
*,
{{ strategy.unique_key }} as dbt_unique_key
from snapshot_query
),
{% endif %}
insertions as (
select
'insert' as dbt_change_type,
source_data.*
from insertions_source_data as source_data
left outer join snapshotted_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
where snapshotted_data.dbt_unique_key is null
or (
snapshotted_data.dbt_unique_key is not null
and (
{{ strategy.row_changed }}
)
)
),
updates as (
select
'update' as dbt_change_type,
source_data.*,
snapshotted_data.dbt_scd_id
from updates_source_data as source_data
join snapshotted_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
where (
{{ strategy.row_changed }}
)
)
{%- if strategy.invalidate_hard_deletes -%}
,
deletes as (
select
'delete' as dbt_change_type,
source_data.*,
{{ snapshot_get_time() }} as dbt_valid_from,
{{ snapshot_get_time() }} as dbt_updated_at,
{{ snapshot_get_time() }} as dbt_valid_to,
snapshotted_data.dbt_scd_id
from snapshotted_data
left join deletes_source_data as source_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
where source_data.dbt_unique_key is null
)
{%- endif %}
select * from insertions
union all
select * from updates
{%- if strategy.invalidate_hard_deletes %}
union all
select * from deletes
{%- endif %}
{%- endmacro %}
- scd_id 内部处理
目前是基于了md5
{% macro snapshot_hash_arguments(args) -%}
{{ adapter.dispatch('snapshot_hash_arguments', 'dbt')(args) }}
{%- endmacro %}
{% macro default__snapshot_hash_arguments(args) -%}
md5({%- for arg in args -%}
coalesce(cast({{ arg }} as varchar ), '')
{% if not loop.last %} || '|' || {% endif %}
{%- endfor -%})
{%- endmacro %}
参数处理上会结合配置的主键以及更新时间处理(当然可以自己调整)
{% set scd_id_expr = snapshot_hash_arguments([primary_key, updated_at]) %}
说明
了解dbt 内部默认快照的处理机制还是比较重要的,一个是可以学习,同时也有利于我们开发自己的快照实现
参考资料
core/dbt/include/global_project/macros/materializations/snapshots/strategies.sql
core/dbt/include/global_project/macros/materializations/snapshots/snapshot.sql
core/dbt/include/global_project/macros/materializations/snapshots/helpers.sql
https://docs.getdbt.com/reference/snapshot-properties
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 全程不用写代码,我用AI程序员写了一个飞机大战
· DeepSeek 开源周回顾「GitHub 热点速览」
· 记一次.NET内存居高不下排查解决与启示
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· .NET10 - 预览版1新功能体验(一)
2023-05-13 golang netlink 方便的网络管理包
2020-05-13 基于VictoriaMetrics的prometheus 集群监控报警方案
2020-05-13 VictoriaMetrics vmalert 重启状态的处理
2020-05-13 karma prometheus alertmanager dashboard简单试用
2020-05-13 kthxbye一种解决prometheus alertmanager 报警确认的守护进程
2020-05-13 karma 一个不错的prometheus alertmanager dashboard
2019-05-13 gitbase cube.js schema 文件项目