dbt adapter expand_target_column_types 简单说明

adapter expand_target_column_types 核心作用是进行关系的展开对比,确定我们的模型是否需要进行提升(实际上就是数据类型变动)
此功能目前在dbt 的快照以及增量物化场景中使用到

参考使用

  • 简单示例
{% set tmp_relation = adapter.get_relation(...) %}
{% set target_relation = adapter.get_relation(...) %}
 
{% do adapter.expand_target_column_types(tmp_relation, target_relation) %}
  • 增量物化处理
{% if existing_relation is none %}
  {% set build_sql = get_create_table_as_sql(False, target_relation, sql) %}
{% elif full_refresh_mode %}
  {% set build_sql = get_create_table_as_sql(False, intermediate_relation, sql) %}
  {% set need_swap = true %}
{% else %}
{% do run_query(get_create_table_as_sql(True, temp_relation, sql)) %}
{% do adapter.expand_target_column_types(
         from_relation=temp_relation,
         to_relation=target_relation) %}

内部处理

def expand_column_types(self, goal, current):
    reference_columns = {c.name: c for c in self.get_columns_in_relation(goal)}
 
    target_columns = {c.name: c for c in self.get_columns_in_relation(current)}
 
    for column_name, reference_column in reference_columns.items():
        target_column = target_columns.get(column_name)
 
        if target_column is not None and target_column.can_expand_to(reference_column):
            col_string_size = reference_column.string_size()
            new_type = self.Column.string_type(col_string_size)
            fire_event(
                ColTypeChange(
                    orig_type=target_column.data_type,
                    new_type=new_type,
                    table=_make_ref_key_dict(current),
                )
            )
           # 判断需要进行提升的会进行ALTER_COLUMN_TYPE_MACRO_NAME macro 的调用进行类型修改
            self.alter_column_type(current, column_name, new_type)
 
def alter_column_type(self, relation, column_name, new_column_type) -> None:
    """
    1. Create a new column (w/ temp name and correct type)
    2. Copy data over to it
    3. Drop the existing column (cascade!)
    4. Rename the new column to existing column
    """
    kwargs = {
        "relation": relation,
        "column_name": column_name,
        "new_column_type": new_column_type,
    }
    self.execute_macro(ALTER_COLUMN_TYPE_MACRO_NAME, kwargs=kwargs)

具体ALTER_COLUMN_TYPE_MACRO_NAME macro 的处理如下
可以看到为了确保变更的安全,使用了多步处理

{% macro default__alter_column_type(relation, column_name, new_column_type) -%}
  {#
    1. Create a new column (w/ temp name and correct type)
    2. Copy data over to it
    3. Drop the existing column (cascade!)
    4. Rename the new column to existing column
  #}
  {%- set tmp_column = column_name + "__dbt_alter" -%}
 
  {% call statement('alter_column_type') %}
    alter table {{ relation }} add column {{ adapter.quote(tmp_column) }} {{ new_column_type }};
    update {{ relation }} set {{ adapter.quote(tmp_column) }} = {{ adapter.quote(column_name) }};
    alter table {{ relation }} drop column {{ adapter.quote(column_name) }} cascade;
    alter table {{ relation }} rename column {{ adapter.quote(tmp_column) }} to {{ adapter.quote(column_name) }}
  {% endcall %}

说明

此功能还是比较重要的,尤其我们可能会进行数据类型的调整场景,就可以帮助我们自动处理一些ddl了

参考资料

dbt/include/global_project/macros/adapters/columns.sql
https://docs.getdbt.com/reference/dbt-jinja-functions/adapter#expand_target_column_types

posted on 2024-05-20 07:31  荣锋亮  阅读(10)  评论(0编辑  收藏  举报

导航