dbt-部署及上手尝试

DBT

安装:

pip install dbt-core dbt-postgres

确保您已安装 dbt Core,并使用以下命令检查版本:dbt --version

(venv) PS D:\workspace\dbt> dbt --version
Core:
  - installed: 1.5.0
  - latest:    1.5.1 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - postgres: 1.5.0 - Update available!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

使用以下命令启动项目 :

dbt init jaffle_shop

// 如果windows的话,需要在用户目录下建一个 .dbt 目录 
// mkdir .dbt
//  C:\Users\用户名\.dbt
(venv) PS D:\workspace\dbt> dbt init jaffle_shop
00:48:43  Running with dbt=1.5.0
Which database would you like to use?
[1] postgres

(Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)

Enter a number: 1
00:48:47  Profile jaffle_shop written to C:\Users\UOS\.dbt\profiles.yml using target's sample configuration. Once updated, you'll be able to start developing with dbt.
00:48:47
Your new dbt project "jaffle_shop" was created!

For more information on how to configure the profiles.yml file,
please consult the dbt documentation here:

  https://docs.getdbt.com/docs/configure-your-profile

One more thing:

Need help? Don't hesitate to reach out to us via GitHub issues or on Slack:

  https://community.getdbt.com/

Happy modeling!

修改配置,链接数据库:

-- C:\Users\UOS\.dbt\profiles.yml
jaffle_shop:
  outputs:

    dev:
      type: postgres
      threads: 1
      host: 127.0.0.1
      port: 5432
      user: postgres
      pass: haosql
      dbname: postgres
      schema: udo

    prod:
      type: postgres
      threads: 1
      host: 127.0.0.1
      port: 5432
      user: postgres
      pass: haosql
      dbname: postgres
      schema: udo

  target: dev

检测配置: dbt debug

(venv) PS D:\workspace\dbt\jaffle_shop> dbt debug
02:27:38  Running with dbt=1.5.0
02:27:38  dbt version: 1.5.0
02:27:38  python version: 3.8.8
02:27:38  python path: d:\workspace\dbt\venv\scripts\python.exe
02:27:38  os info: Windows-10-10.0.17763-SP0
02:27:38  Using profiles.yml file at C:\Users\UOS\.dbt\profiles.yml
02:27:38  Using dbt_project.yml file at D:\workspace\dbt\jaffle_shop\dbt_project.yml
02:27:38  Configuration:
02:27:38    profiles.yml file [OK found and valid]
02:27:38    dbt_project.yml file [OK found and valid]
02:27:38  Required dependencies:
02:27:38   - git [OK found]

02:27:38  Connection:
02:27:38    host: 127.0.0.1
02:27:38    port: 5432
02:27:38    user: postgres
02:27:38    database: postgres
02:27:38    schema: udo
02:27:38    search_path: None
02:27:38    keepalives_idle: 0
02:27:38    sslmode: None
02:27:38    Connection test: [OK connection ok]

02:27:38  All checks passed!

执行示例模型:dbt run

(venv) PS D:\workspace\dbt\jaffle_shop> dbt run
02:29:14  Running with dbt=1.5.0
02:29:14  Unable to do partial parsing because saved manifest not found. Starting full parse.
02:29:15  Found 2 models, 4 tests, 0 snapshots, 0 analyses, 307 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
02:29:15
02:29:15  Concurrency: 1 threads (target='dev')
02:29:15
02:29:15  1 of 2 START sql table model udo.my_first_dbt_model ............................ [RUN]
02:29:15  1 of 2 OK created sql table model udo.my_first_dbt_model ....................... [SELECT 2 in 0.17s]
02:29:15  2 of 2 START sql view model udo.my_second_dbt_model ............................ [RUN]
02:29:16  2 of 2 OK created sql view model udo.my_second_dbt_model ....................... [CREATE VIEW in 0.10s]
02:29:16
02:29:16  Finished running 1 table model, 1 view model in 0 hours 0 minutes and 0.56 seconds (0.56s).
02:29:16
02:29:16  Completed successfully
02:29:16
02:29:16  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

Build DBT Project

资源Sources

可以暂时理解为是我们各个主要的业务数据库。

数据库模型 model

SQL 模型

  • 模型配置
  • 模型属性
  • run command
  • ref function

SQL模型是一个语句,模型在文件中定义(通常在目录中):

  • 每个文件包含一个模型/语句: .sql select
  • 模型名称继承自文件名
  • 模型可以嵌套在目录内的子目录中

当执行dbt run,dbt将构建这个数据模型。

这里新建了一个简单的模型:

-- 计算书籍阅读人数排行
{{ config(materialized='table') }}

with book_ranks as(
    select u_bookid ,count(distinct u_userid) as cnt_rs from udo.t_read_history 
    group by u_bookid order  by cnt_rs desc 
)

select u_bookid,cnt_rs,t2.u_bookname,t2.u_bookdesc,t2.u_booktypecode,t2.u_author
from book_ranks t1 
join
udo.t_bookstore t2
on t1.u_bookid=t2.u_id

dbt这里会对应的构建一个脚本:


  create  table "postgres"."udo"."task3__dbt_tmp"  
    as
  (
    -- 计算书籍阅读人数排行
with book_ranks as(
    select u_bookid ,count(distinct u_userid) as cnt_rs from udo.t_read_history 
    group by u_bookid order  by cnt_rs desc 
)

select u_bookid,cnt_rs,t2.u_bookname,t2.u_bookdesc,t2.u_booktypecode,t2.u_author
from book_ranks t1 
join
udo.t_bookstore t2
on t1.u_bookid=t2.u_id
  );
  

执行完成后,数据库里会新增一个task3的表,存储上面的结果。

note:

  1. 在哪里可以看到执行的sql:

dbt CLI:

  • 目录: target/compiled/select
  • 目录: target/run/create
  • 日志文件: logs/dbt.log
  1. 是否需要提前创建结果表:

dbt会自动检测,如果不存在就创建目录。

模型配置

可以在dbt_project.yml 和具体的模型SQL中配置。

  • 改变模型的持久化方式
  • 将模型构建为单独的架构??
  • 创建模型的标签

dbt_project.yml 示例:

name: jaffle_shop
config-version: 2

models:
  jaffle_shop: # this matches the `name:`` config
    +materialized: view # this applies to all models in the current project
    marts:
      +materialized: table # this applies to all models in the `marts/` directory
      marketing:
        +schema: marketing # this applies to all models in the `marts/marketing/`` directory

在具体的模型里配置:


{{ config(
    materialized="view",
    schema="marketing"
) }}

with customer_orders as ...

请务必注意,配置是分层应用的 — 应用于子目录的配置将覆盖任何常规配置。

note:

  • dbt模型结果输出方式: view,table,incremental,ephemeral

模型依赖关系

可以使用 ref 函数代替查询中的表名来构建模型之间的依赖关系

with customers as (

    select * from {{ ref('stg_customers') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

note:

  • 通过创建DAG来确定运行模型的顺序
  • 推荐模块化转换,重用模型减少复用

dbt docs

通过dbt docs可以生成项目的资料。

  • 首先执行:dbt docs generate
  • 启动服务: dbt docs serve

可以在浏览器中查看:http://localhost:8080/#!/overview

posted @ 2023-06-11 19:04  人人从众  阅读(488)  评论(0编辑  收藏  举报