ZhangZhihui's Blog  

 

In this section, we are going to explore some of the DuckDB niceties – little shortcuts and tweaks to make your use of DuckDB easier.

D CREATE OR REPLACE TABLE skiers AS
  SELECT *
  FROM read_csv('skiers.csv');

 

A very common SQL shortcut is the SELECT * syntax. Like in many databases, * is a SQL shortcut that’s interpreted by DuckDB as a request to return all the columns.

One small but very useful affordance of DuckDB is the use of a final trailing comma in a multi-column selection; this does not raise a syntax error.

DuckDB can also explicitly omit columns with the EXCLUDE syntax. To ignore the skiers’ ages and heights, we can simply specify these as columns to exclude:

SELECT s.* EXCLUDE(skier_age, skier_height)
FROM skiers AS s;

 

复制代码
D SELECT s.*
      REPLACE(round(skier_age / 10) * 10)::INTEGER AS skier_age
  FROM skiers AS s;
┌──────────────────┬─────────────────┬───────────┬──────────────┬────────────────────┬─────────────────┐
│ skier_first_name │ skier_last_name │ skier_age │ skier_height │ skier_helmet_color │ skier_bib_color │
│     varcharvarchar     │   int32   │    int64     │      varcharvarchar     │
├──────────────────┼─────────────────┼───────────┼──────────────┼────────────────────┼─────────────────┤
│ Alice            │ Smith           │        10152 │ red                │ black           │
│ Bob              │ Blaese          │        20178 │ blue               │ yellow          │
│ Carol            │ Wilson          │        30159 │ yellow             │ pink            │
│ Dan              │ Jones           │        50182 │ red                │ yellow          │
│ Erin             │ Taylor          │        20168 │ black              │ green           │
│ Frank            │ Williams        │        20187 │ yellow             │ red             │
│ Grace            │ Miller          │        20172 │ pink               │ black           │
│ Heidi            │ Johnson         │        20178 │ yellow             │ yellow          │
│ Ivan             │ Brown           │        20185 │ green              │ pink            │
│ Judy             │ Moore           │        30160 │ red                │ black           │
├──────────────────┴─────────────────┴───────────┴──────────────┴────────────────────┴─────────────────┤
│ 10 rows                                                                                    6 columns │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘
复制代码

 

复制代码
D SELECT skier_first_name, COLUMNS('.*color$')
  FROM skiers;
┌──────────────────┬────────────────────┬─────────────────┐
│ skier_first_name │ skier_helmet_color │ skier_bib_color │
│     varcharvarcharvarchar     │
├──────────────────┼────────────────────┼─────────────────┤
│ Alice            │ red                │ black           │
│ Bob              │ blue               │ yellow          │
│ Carol            │ yellow             │ pink            │
│ Dan              │ red                │ yellow          │
│ Erin             │ black              │ green           │
│ Frank            │ yellow             │ red             │
│ Grace            │ pink               │ black           │
│ Heidi            │ yellow             │ yellow          │
│ Ivan             │ green              │ pink            │
│ Judy             │ red                │ black           │
├──────────────────┴────────────────────┴─────────────────┤
│ 10 rows                                       3 columns │
└─────────────────────────────────────────────────────────┘
复制代码

 

复制代码
D SELECT skier_first_name, COLUMNS('.*color.*')
  FROM skiers
  WHERE COLUMNS('.*color.*') = 'yellow';
┌──────────────────┬────────────────────┬─────────────────┐
│ skier_first_name │ skier_helmet_color │ skier_bib_color │
│     varcharvarcharvarchar     │
├──────────────────┼────────────────────┼─────────────────┤
│ Heidi            │ yellow             │ yellow          │
└──────────────────┴────────────────────┴─────────────────┘
复制代码

This is equivalent to combining the candidate columns in a where clause, such as skier_helmet_color='yellow' AND skier_bib_color='yellow'. This is a powerful capability where you can apply the same predicate to a large collection of fields, without the need to list each one individually.

 

复制代码
D SELECT CONCAT(
          LPAD(
              UPPER(
                  CONCAT_WS(' ', skier_first_name, skier_last_name)
              ),
              20,
              '>'
          ),
          '.'
      ) AS skier_name
  FROM skiers;
┌───────────────────────┐
│      skier_name       │
│        varchar        │
├───────────────────────┤
│ >>>>>>>>>ALICE SMITH. │
│ >>>>>>>>>>BOB BLAESE. │
│ >>>>>>>>CAROL WILSON. │
│ >>>>>>>>>>>DAN JONES. │
│ >>>>>>>>>ERIN TAYLOR. │
│ >>>>>>FRANK WILLIAMS. │
│ >>>>>>>>GRACE MILLER. │
│ >>>>>>>HEIDI JOHNSON. │
│ >>>>>>>>>>IVAN BROWN. │
│ >>>>>>>>>>JUDY MOORE. │
├───────────────────────┤
│        10 rows        │
└───────────────────────┘
复制代码

 

复制代码
D SELECT CONCAT_WS(' ', skier_first_name, skier_last_name)
      .UPPER()
      .LPAD(20, '>')
      .CONCAT('.') AS skier_name
  FROM skiers;
┌───────────────────────┐
│      skier_name       │
│        varchar        │
├───────────────────────┤
│ >>>>>>>>>ALICE SMITH. │
│ >>>>>>>>>>BOB BLAESE. │
│ >>>>>>>>CAROL WILSON. │
│ >>>>>>>>>>>DAN JONES. │
│ >>>>>>>>>ERIN TAYLOR. │
│ >>>>>>FRANK WILLIAMS. │
│ >>>>>>>>GRACE MILLER. │
│ >>>>>>>HEIDI JOHNSON. │
│ >>>>>>>>>>IVAN BROWN. │
│ >>>>>>>>>>JUDY MOORE. │
├───────────────────────┤
│        10 rows        │
└───────────────────────┘
复制代码

 

D CREATE UNIQUE INDEX skier_unique ON skiers (skier_first_name);

 

D INSERT INTO skiers(skier_first_name, skier_helmet_color)
  SELECT 'Kim' AS skier_first_name, 'blue' AS skier_helmet_color;

 

D INSERT INTO skiers BY NAME
  SELECT 'green' AS skier_helmet_color,
      'red' AS skier_bib_color,
      'Liam' AS skier_first_name;

 

D INSERT OR REPLACE INTO skiers BY NAME
  SELECT 'Liam' AS skier_first_name, 'black' AS skier_helmet_color;

 

 

D CREATE OR REPLACE TABLE scores AS
  SELECT *
  FROM read_csv('skier_scores.csv');

 

复制代码
D SELECT df1.skier_first_name,
      df1.skier_last_name,
      df2.score
  FROM skiers AS df1
  POSITIONAL JOIN scores AS df2;
┌──────────────────┬─────────────────┬───────┐
│ skier_first_name │ skier_last_name │ score │
│     varcharvarchar     │ int64 │
├──────────────────┼─────────────────┼───────┤
│ Alice            │ Smith           │     8 │
│ Bob              │ Blaese          │     9 │
│ Carol            │ Wilson          │     4 │
│ Dan              │ Jones           │     7 │
│ Erin             │ Taylor          │     6 │
│ Frank            │ Williams        │     9 │
│ Grace            │ Miller          │     9 │
│ Heidi            │ Johnson         │     5 │
│ Ivan             │ Brown           │     7 │
│ Judy             │ Moore           │     8 │
│ Kim              │                 │       │
│ Liam             │                 │       │
├──────────────────┴─────────────────┴───────┤
│ 12 rows                          3 columns │
└────────────────────────────────────────────┘
复制代码

The result of POSITIONAL JOIN gives us a query result that matches the skier to the correct score.

 

D CREATE OR REPLACE TABLE weather AS
  SELECT *
  FROM read_csv('weather.csv', timestampformat='%Y-%m-%d %H:%M:%S');

 

复制代码
D SELECT *
  FROM weather
  LIMIT 10;
┌─────────────────────┬────────────┬───────┐
│  measurement_time   │ wind_speed │ temp  │
│      timestamp      │   int64    │ int64 │
├─────────────────────┼────────────┼───────┤
│ 2023-12-01 10:00:007-9 │
│ 2023-12-01 11:00:0019-8 │
│ 2023-12-01 12:00:008-6 │
│ 2023-12-01 13:00:009-3 │
│ 2023-12-01 14:00:0010-2 │
│ 2023-12-01 15:00:0010-2 │
│ 2023-12-01 16:00:0012-2 │
│ 2023-12-01 17:00:0012-3 │
│ 2023-12-01 18:00:0012-3 │
│ 2023-12-01 19:00:0011-4 │
├─────────────────────┴────────────┴───────┤
│ 10 rows                        3 columns │
└──────────────────────────────────────────┘
复制代码

 

复制代码
D WITH weather_cte AS(
      SELECT measurement_time,
          wind_speed,
          temp,
          LEAD(measurement_time, 1) OVER (
              ORDER BY measurement_time
          ) AS measurement_end
      FROM weather
      ORDER BY measurement_time
  )
  SELECT *
  FROM weather_cte
  WHERE TIMESTAMP '2023-12-01 10:01:00' BETWEEN measurement_time AND measurement_end;
┌─────────────────────┬────────────┬───────┬─────────────────────┐
│  measurement_time   │ wind_speed │ temp  │   measurement_end   │
│      timestamp      │   int64    │ int64 │      timestamp      │
├─────────────────────┼────────────┼───────┼─────────────────────┤
│ 2023-12-01 10:00:007-92023-12-01 11:00:00 │
└─────────────────────┴────────────┴───────┴─────────────────────┘
复制代码

 

复制代码
D SELECT *
  FROM scores AS s ASOF JOIN weather AS w
  ON s.score_time >= w.measurement_time;
┌─────────────────────┬───────┬─────────────────────┬────────────┬───────┐
│     score_time      │ score │  measurement_time   │ wind_speed │ temp  │
│      timestamp      │ int64 │      timestamp      │   int64    │ int64 │
├─────────────────────┼───────┼─────────────────────┼────────────┼───────┤
│ 2023-12-01 10:01:0082023-12-01 10:00:007-9 │
│ 2023-12-01 10:42:0092023-12-01 10:00:007-9 │
│ 2023-12-01 11:24:0042023-12-01 11:00:0019-8 │
│ 2023-12-01 14:23:0072023-12-01 14:00:0010-2 │
│ 2023-12-01 15:22:0062023-12-01 15:00:0010-2 │
│ 2023-12-01 15:41:0092023-12-01 15:00:0010-2 │
│ 2023-12-02 10:21:0092023-12-02 10:00:008-4 │
│ 2023-12-02 11:01:0052023-12-02 11:00:008-2 │
│ 2023-12-02 12:23:0072023-12-02 12:00:0010-1 │
│ 2023-12-02 13:06:0082023-12-02 13:00:00100 │
├─────────────────────┴───────┴─────────────────────┴────────────┴───────┤
│ 10 rows                                                      5 columns │
└────────────────────────────────────────────────────────────────────────┘
复制代码

 

 

复制代码
D SELECT s.*,
      bar(w.wind_speed, 0, 20, 20) AS wind_bar_plot,
      w.wind_speed
  FROM scores AS s
  ASOF JOIN weather AS w ON s.score_time >= w.measurement_time;
┌─────────────────────┬───────┬─────────────────────┬────────────┐
│     score_time      │ score │    wind_bar_plot    │ wind_speed │
│      timestamp      │ int64 │       varchar       │   int64    │
├─────────────────────┼───────┼─────────────────────┼────────────┤
│ 2023-12-01 10:01:008 │ ███████             │          7 │
│ 2023-12-01 10:42:009 │ ███████             │          7 │
│ 2023-12-01 11:24:004 │ ███████████████████ │         19 │
│ 2023-12-01 14:23:007 │ ██████████          │         10 │
│ 2023-12-01 15:22:006 │ ██████████          │         10 │
│ 2023-12-01 15:41:009 │ ██████████          │         10 │
│ 2023-12-02 10:21:009 │ ████████            │          8 │
│ 2023-12-02 11:01:005 │ ████████            │          8 │
│ 2023-12-02 12:23:007 │ ██████████          │         10 │
│ 2023-12-02 13:06:008 │ ██████████          │         10 │
├─────────────────────┴───────┴─────────────────────┴────────────┤
│ 10 rows                                              4 columns │
└────────────────────────────────────────────────────────────────┘
复制代码

 

D CREATE OR REPLACE TABLE wines AS
  SELECT *
  FROM read_csv('wines.csv');

 

复制代码
D SELECT *
  FROM wines
  ORDER BY wine_id;
┌─────────┬────────────────────┬──────────────┐
│ wine_id │     wine_name      │ sub_class_of │
│  int64  │      varchar       │    int64     │
├─────────┼────────────────────┼──────────────┤
│       1 │ French wine        │              │
│       2 │ Red                │            1 │
│       3 │ White              │            1 │
│       4 │ White Bordeaux     │            3 │
│       5 │ Mission Haut Blanc │            4 │
│       6 │ Haut Brion Blanc   │            4 │
│       7 │ Red Bordeaux       │            2 │
│       8 │ Latour             │            7 │
│       9 │ Rothschild         │            7 │
└─────────┴────────────────────┴──────────────┘
复制代码

 

复制代码
D WITH RECURSIVE wine_hierarchy(wine_id, start_with, wine_path) AS(
      SELECT wine_id,
          wine_name,
          [wine_name] AS wine_path
      FROM wines
      WHERE sub_class_of IS NULL
      UNION ALL
      SELECT wines.wine_id,
          wines.wine_name,
          list_prepend(wines.wine_name, wine_hierarchy.wine_path)
      FROM wines,
          wine_hierarchy
      WHERE wines.sub_class_of = wine_hierarchy.wine_id
  )
  SELECT wine_path
  FROM wine_hierarchy
  WHERE start_with = 'Rothschild';
┌──────────────────────────────────────────────┐
│                  wine_path                   │
│                  varchar[]                   │
├──────────────────────────────────────────────┤
│ [Rothschild, Red Bordeaux, Red, French wine] │
└──────────────────────────────────────────────┘
复制代码

 

D CREATE OR REPLACE MACRO unit_price(price, capacity) AS round(price / capacity, 3);

 

D CREATE OR REPLACE TABLE wine_prices AS
  SELECT *
  FROM read_csv('wine_prices.csv');

 

复制代码
D SELECT wine_name, price, capacity_ml
  FROM wine_prices;
┌───────────────────┬────────┬─────────────┐
│     wine_name     │ price  │ capacity_ml │
│      varchardouble │    int64    │
├───────────────────┼────────┼─────────────┤
│ French Rose       │   16.5750 │
│ French Pinot Noir │   17.0750 │
│ Rose              │   18.41000 │
└───────────────────┴────────┴─────────────┘
复制代码

 

复制代码
D SELECT wine_name,
      price,
      capacity_ml,
      unit_price(price, capacity_ml) AS price_ml
  FROM wine_prices;
┌───────────────────┬────────┬─────────────┬──────────┐
│     wine_name     │ price  │ capacity_ml │ price_ml │
│      varchardouble │    int64    │  double  │
├───────────────────┼────────┼─────────────┼──────────┤
│ French Rose       │   16.57500.022 │
│ French Pinot Noir │   17.07500.023 │
│ Rose              │   18.410000.018 │
└───────────────────┴────────┴─────────────┴──────────┘
复制代码

The calculation for unit_price has been applied to each column. Macros can be considered a useful mechanism for encapsulation logic that can be shared.

posted on   ZhangZhihuiAAA  阅读(9)  评论(0编辑  收藏  举报
相关博文:
阅读排行:
· 震惊!C++程序真的从main开始吗?99%的程序员都答错了
· 【硬核科普】Trae如何「偷看」你的代码?零基础破解AI编程运行原理
· 单元测试从入门到精通
· 上周热点回顾(3.3-3.9)
· winform 绘制太阳,地球,月球 运作规律
历史上的今天:
2024-01-16 Redis - List Use Cases
2024-01-16 Redis - HyperLogLog Use Cases
2024-01-16 Redis - SORT
 
点击右上角即可分享
微信分享提示