A brief note on dbt seed configuration

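Seeds in dbt are CSV files of initial data that dbt loads into the warehouse. They are handy for testing and can supply base data in some scenarios. dbt supports quite a few seed configs, such as the schema a seed is loaded into, the CSV delimiter and column data types, along with dbt's general settings (for example tags, meta and tests). I covered how seeds are processed in an earlier post; below are some notes on the configuration side.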

Defining types

  • Type definitions (column_types in dbt_project.yml)
seeds:
  jaffle_shop:
    country_codes:
      +column_types:
        country_code: varchar(2)
        country_name: varchar(32)
  • Column definitions (seed properties file)
version: 2
 
seeds:
  - name: <string>
    description: <markdown_string>
    docs:
      show: true | false
      node_color: <color_id> # Use name (such as node_color: purple) or hex code with quotes (such as node_color: "#cd7f32")
    config:
      <seed_config>: <config_value>
    tests:
      - <test>
      - ... # declare additional tests
    columns:
      - name: <column name>
        description: <markdown_string>
        meta: {<dictionary>}
        quote: true | false
        tags: [<string>]
        tests:
          - <test>
          - ... # declare additional tests
 
      - name: ... # declare properties of additional columns
 
  - name: ... # declare properties of additional seeds
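Since the same seed can be configured both in dbt_project.yml and in a properties file, here is a minimal Python sketch of how the two might combine, under a simplified model. resolve_seed_config and _merge are hypothetical helpers, not dbt's resolver, and the YAML is assumed to be already parsed into dicts. The sketch walks the seeds: block along the seed's path (project name, then seed name), treats only "+"-prefixed keys as configs, lets deeper levels override shallower ones, and applies the properties file's config: block last, since dbt documents property files as taking precedence over dbt_project.yml. Merging dictionaries such as column_types key by key is an assumption of the sketch.

```python
from typing import Any, Dict, List, Optional


def _merge(target: Dict[str, Any], source: Dict[str, Any]) -> None:
    """Merge source into target: dicts are updated key by key, scalars replaced."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            target[key] = {**target[key], **value}
        elif isinstance(value, dict):
            target[key] = dict(value)  # copy so the input is never mutated
        else:
            target[key] = value


def resolve_seed_config(
    project_seeds: Dict[str, Any],
    path: List[str],
    properties_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical resolver for one seed's config (simplified, not dbt's code).

    project_seeds: the already-parsed seeds: block of dbt_project.yml.
    path: e.g. ["jaffle_shop", "country_codes"].
    properties_config: the seed's config: block from a properties file.
    """
    # Collect the nesting levels along the path, least specific first.
    levels = [project_seeds]
    for key in path:
        child = levels[-1].get(key)
        if not isinstance(child, dict):
            break
        levels.append(child)

    resolved: Dict[str, Any] = {}
    for level in levels:
        # In this sketch only "+"-prefixed keys are configs; other keys are path segments.
        _merge(resolved, {k[1:]: v for k, v in level.items() if k.startswith("+")})

    # The properties file is the most specific source, so it is applied last.
    if properties_config:
        _merge(resolved, properties_config)
    return resolved
```

For example, with +delimiter: ";" at the jaffle_shop level plus the +column_types shown above, a properties file setting column_types: {country_name: varchar(64)} would, in this model, resolve to delimiter ";" with country_code varchar(2) and country_name varchar(64).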

Internal handling

This is essentially the processing I described before; the one special part concerns types:

@contextmember()
def load_agate_table(self) -> "agate.Table":
    from dbt_common.clients import agate_helper
 
    if not isinstance(self.model, SeedNode):
        raise LoadAgateTableNotSeedError(self.model.resource_type, node=self.model)
 
    # include package_path for seeds defined in packages
    package_path = (
        os.path.join(self.config.packages_install_path, self.model.package_name)
        if self.model.package_name != self.config.project_name
        else "."
    )
    path = os.path.join(self.config.project_root, package_path, self.model.original_file_path)
    if not os.path.exists(path):
        assert self.model.root_path
        path = os.path.join(self.model.root_path, self.model.original_file_path)
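    # column types set via the column_types config
    column_types = self.model.config.column_types
    # CSV delimiter set via the delimiter config
    delimiter = self.model.config.delimiter
    try:
        # read the CSV; columns listed in column_types are loaded as text
        # instead of having agate infer their type
        table = agate_helper.from_csv(path, text_columns=column_types, delimiter=delimiter)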
    except ValueError as e:
        raise LoadAgateTableValueError(e, node=self.model)
    table.original_abspath = os.path.abspath(path)
    return table
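The key detail is that column_types is passed as text_columns: those columns are read as text rather than type-inferred by agate, and the configured type is only applied when the seed table is created (as far as I can tell, dbt's default create_csv_table macro uses the configured type and falls back to the inferred one). Below is a minimal Python sketch of that two-step idea using only the standard library. load_seed, infer_column_type and create_table_sql are hypothetical names, the integer/float/text inference is a much-simplified stand-in for agate's, and the generic type names are assumptions rather than any adapter's real mapping. The delimiter parameter plays the role of the delimiter config.

```python
import csv
import io
from typing import Any, Dict, List, Tuple

# Simplified stand-in for agate's per-column inference (assumption: only
# integer/float/text are considered; agate also detects dates, booleans, etc.).
_CASTS = (("integer", int), ("float", float))


def infer_column_type(values: List[str]) -> str:
    """Return the first type whose cast accepts every value, else 'text'."""
    for type_name, cast in _CASTS:
        try:
            for value in values:
                cast(value)
        except ValueError:
            continue
        return type_name
    return "text"


def load_seed(
    csv_text: str, column_types: Dict[str, str], delimiter: str = ","
) -> Tuple[List[Dict[str, Any]], Dict[str, str]]:
    """Read a seed CSV and choose a SQL type per column.

    Columns named in column_types stay raw strings (like text_columns) and get
    the configured type; all other columns are inferred and converted.
    """
    header, *body = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    sql_types: Dict[str, str] = {}
    for index, name in enumerate(header):
        if name in column_types:
            sql_types[name] = column_types[name]
        else:
            sql_types[name] = infer_column_type([row[index] for row in body])
    casts = dict(_CASTS)
    rows = [
        {
            name: (
                casts[sql_types[name]](value)
                if name not in column_types and sql_types[name] in casts
                else value
            )
            for name, value in zip(header, row)
        }
        for row in body
    ]
    return rows, sql_types


def create_table_sql(table: str, sql_types: Dict[str, str]) -> str:
    """Build a CREATE TABLE statement from override-or-inferred column types."""
    columns = ", ".join(f"{name} {col_type}" for name, col_type in sql_types.items())
    return f"create table {table} ({columns})"
```

With ISO 3166 numeric codes as input, overriding numeric_code with varchar(3) keeps Austria's code as the string "040"; without the override the sketch infers integer and the value becomes 40, which is the kind of loss a column_types entry avoids.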

Notes

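Most of this is the same as in my earlier post; what is new here is the description of seed configuration and of how column types are handled.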

References

https://docs.getdbt.com/reference/seed-properties
https://docs.getdbt.com/reference/resource-configs/column_types
https://docs.getdbt.com/reference/resource-configs/delimiter

posted on 2024-06-12 05:38  荣锋亮
