ZhangZhihui's Blog  

 

复制代码
D SELECT * FROM duckdb_extensions();
┌──────────────────┬─────────┬───────────┬──────────────┬───┬───────────────────┬───────────────────┬───────────────────┬────────────────┐
│  extension_name  │ loaded  │ installed │ install_path │ … │      aliases      │ extension_version │   install_mode    │ installed_from │
│     varchar      │ boolean │  boolean  │   varchar    │   │     varchar[]varcharvarcharvarchar     │
├──────────────────┼─────────┼───────────┼──────────────┼───┼───────────────────┼───────────────────┼───────────────────┼────────────────┤
│ arrow            │ false   │ false     │              │ … │ []                │                   │                   │                │
│ autocomplete     │ true    │ true      │ (BUILT-IN)   │ … │ []                │                   │ STATICALLY_LINKED │                │
│ aws              │ false   │ false     │              │ … │ []                │                   │                   │                │
│ azure            │ false   │ false     │              │ … │ []                │                   │                   │                │
│ delta            │ false   │ false     │              │ … │ []                │                   │                   │                │
│ excel            │ false   │ false     │              │ … │ []                │                   │                   │                │
│ fts              │ true    │ true      │ (BUILT-IN)   │ … │ []                │ v1.1.3            │ STATICALLY_LINKED │                │
│ httpfs           │ false   │ false     │              │ … │ [http, https, s3] │                   │                   │                │
│ iceberg          │ false   │ false     │              │ … │ []                │                   │                   │                │
│ icu              │ true    │ true      │ (BUILT-IN)   │ … │ []                │ v1.1.3            │ STATICALLY_LINKED │                │
│ inet             │ false   │ false     │              │ … │ []                │                   │                   │                │
│ jemalloc         │ true    │ true      │ (BUILT-IN)   │ … │ []                │ v1.1.3            │ STATICALLY_LINKED │                │
│ json             │ true    │ true      │ (BUILT-IN)   │ … │ []                │ v1.1.3            │ STATICALLY_LINKED │                │
│ motherduck       │ false   │ false     │              │ … │ [md]              │                   │                   │                │
│ mysql_scanner    │ false   │ false     │              │ … │ [mysql]           │                   │                   │                │
│ parquet          │ true    │ true      │ (BUILT-IN)   │ … │ []                │ v1.1.3            │ STATICALLY_LINKED │                │
│ postgres_scanner │ false   │ false     │              │ … │ [postgres]        │                   │                   │                │
│ shell            │ true    │ true      │              │ … │ []                │                   │ STATICALLY_LINKED │                │
│ spatial          │ false   │ false     │              │ … │ []                │                   │                   │                │
│ sqlite_scanner   │ false   │ false     │              │ … │ [sqlite, sqlite3] │                   │                   │                │
│ substrait        │ false   │ false     │              │ … │ []                │                   │                   │                │
│ tpcds            │ false   │ false     │              │ … │ []                │                   │                   │                │
│ tpch             │ true    │ true      │ (BUILT-IN)   │ … │ []                │ v1.1.3            │ STATICALLY_LINKED │                │
│ vss              │ false   │ false     │              │ … │ []                │                   │                   │                │
├──────────────────┴─────────┴───────────┴──────────────┴───┴───────────────────┴───────────────────┴───────────────────┴────────────────┤
│ 24 rows                                                                                                            9 columns (8 shown) │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
复制代码

 

 

D INSTALL sqlite;
100% ▕████████████████████████████████████████████████████████████▏
D LOAD sqlite;

Executing the INSTALL command causes DuckDB to download the sqlite extension and then install it into your local DuckDB installation. The LOAD command is needed to enable the extension for use in your current session.

复制代码
D SELECT extension_name, installed, loaded
  FROM duckdb_extensions()
  WHERE extension_name = 'sqlite_scanner';
┌────────────────┬───────────┬─────────┐
│ extension_name │ installed │ loaded  │
│    varchar     │  boolean  │ boolean │
├────────────────┼───────────┼─────────┤
│ sqlite_scanner │ true      │ true    │
└────────────────┴───────────┴─────────┘
复制代码

With the extension installed, DuckDB has now acquired the new ability to read an external SQLite database. We can open the my_sqlite.db SQLite database file using the ATTACH command:

D ATTACH 'my_sqlite.db' (TYPE sqlite);

D SELECT *
  FROM my_sqlite.countries_sqlite;
Invalid Error: Failed to prepare query "SELECT type FROM sqlite_master WHERE lower(name)=lower('countries_sqlite');": file is not a database

 

D USE my_sqlite;
D SHOW TABLES;
Invalid Error: Failed to prepare query "SELECT name FROM sqlite_master WHERE type='table'": file is not a database

用右键另存为下载的my_sqlite.db文件有问题,重新下载raw文件后这个error不再出现,但似乎这个数据库文件中没有表:

复制代码
D LOAD sqlite;
D ATTACH 'my_sqlite.db' (TYPE sqlite);
D SHOW TABLES;
┌─────────┐
│  name   │
│ varchar │
├─────────┤
│ 0 rows  │
└─────────┘
D SHOW ALL TABLES;
INTERNAL Error: Information loss on integer cast: value -5760601708572726771 outside of target range [-9223372036854775808, 9223372036854775807]
This error signals an assertion failure within DuckDB. This usually occurs due to unexpected conditions or errors in the program's logic.
For more information, see https://duckdb.org/docs/dev/internal_errors
复制代码

 

复制代码
D LOAD sqlite;
D ATTACH 'my_sqlite.db' (TYPE sqlite);
D USE my_sqlite;
D SHOW TABLES;
┌──────────────────┐
│       name       │
│     varchar      │
├──────────────────┤
│ countries_sqlite │
└──────────────────┘
复制代码

 

In DuckDB, the difference between the SHOW TABLES and SHOW ALL TABLES commands is subtle but important. Here's how they differ:

1. SHOW TABLES:

  • This command lists all the user-defined tables in the currently active schema.
  • It will only show tables that have been explicitly created by the user (not system or internal tables).

2. SHOW ALL TABLES:

  • This command provides a broader overview and lists all tables, including system tables and user-defined tables within the current schema.
  • It will also show any internal or hidden tables used by DuckDB for its internal operations.

 

复制代码
D SELECT *
  FROM my_sqlite.countries_sqlite;
┌─────────┬──────────────────────────────────┐
│ country │               name               │
│ varcharvarchar              │
├─────────┼──────────────────────────────────┤
│ AD      │ Andorra                          │
│ AE      │ United Arab Emirates             │
│ AF      │ Afghanistan                      │
│ AG      │ Antigua and Barbuda              │
│ AI      │ Anguilla                         │
│ AL      │ Albania                          │
│ AM      │ Armenia                          │
│ AN      │ Netherlands Antilles             │
│ AO      │ Angola                           │
│ AQ      │ Antarctica                       │
│ AR      │ Argentina                        │
│ AS      │ American Samoa                   │
│ AT      │ Austria                          │
│ AU      │ Australia                        │
│ AW      │ Aruba                            │
│ AZ      │ Azerbaijan                       │
│ BA      │ Bosnia and Herzegovina           │
│ BB      │ Barbados                         │
│ BD      │ Bangladesh                       │
│ BE      │ Belgium                          │
│ ·       │   ·                              │
│ ·       │   ·                              │
│ ·       │   ·                              │
│ UG      │ Uganda                           │
│ UM      │ U.S. Minor Outlying Islands      │
│ US      │ United States                    │
│ UY      │ Uruguay                          │
│ UZ      │ Uzbekistan                       │
│ VA      │ Vatican City                     │
│ VC      │ Saint Vincent and the Grenadines │
│ VE      │ Venezuela                        │
│ VG      │ British Virgin Islands           │
│ VI      │ U.S. Virgin Islands              │
│ VN      │ Vietnam                          │
│ VU      │ Vanuatu                          │
│ WF      │ Wallis and Futuna                │
│ WS      │ Samoa                            │
│ XK      │ Kosovo                           │
│ YE      │ Yemen                            │
│ YT      │ Mayotte                          │
│ ZA      │ South Africa                     │
│ ZM      │ Zambia                           │
│ ZW      │ Zimbabwe                         │
├─────────┴──────────────────────────────────┤
│ 245 rows (40 shown)              2 columns │
└────────────────────────────────────────────┘
复制代码

 

D INSTALL httpfs;
100% ▕████████████████████████████████████████████████████████████▏
D LOAD httpfs;

Note that the httpfs extension is capable of being automatically installed and loaded. This means that you can skip the preceding steps and DuckDB will automatically install and load this extension the first time you issue a query or statement that makes use of it.

复制代码
D SELECT *
  FROM read_csv('https://www2.census.gov/programs-surveys/popest/datasets/2020-2022/cities/totals/sub-est2022.csv')
  LIMIT 5;
┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬───┬─────────┬───────────────────┬─────────────────┬─────────────────┬─────────────────┐
│ SUMLEV  │  STATE  │ COUNTY  │  PLACE  │ COUSUB  │ CONCIT  │ … │ STNAME  │ ESTIMATESBASE2020 │ POPESTIMATE2020 │ POPESTIMATE2021 │ POPESTIMATE2022 │
│ varcharvarcharvarcharvarcharvarcharvarchar │   │ varchar │       int64       │      int64      │      int64      │      int64      │
├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼───┼─────────┼───────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ 04001000000000000000000   │ … │ Alabama │           5024356503136250498465074296 │
│ 16201000001240000000000   │ … │ Alabama │              2355235623612366 │
│ 16201000004600000000000   │ … │ Alabama │              4372436042924224 │
│ 16201000004840000000000   │ … │ Alabama │               661659666669 │
│ 16201000006760000000000   │ … │ Alabama │               227226226221 │
├─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴───┴─────────┴───────────────────┴─────────────────┴─────────────────┴─────────────────┤
│ 5 rows                                                                                                                      14 columns (11 shown) │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
复制代码

 

复制代码
D CREATE OR REPLACE SECRET mysecret (
      TYPE S3,
      REGION 'us-east-1',
      ENDPOINT 's3.amazonaws.com'
  );
┌─────────┐
│ Success │
│ boolean │
├─────────┤
│ true    │
└─────────┘
复制代码

 

复制代码
D SELECT *
  FROM read_parquet('s3://duckdb-s3-bucket-public/countries.parquet')
  WHERE name SIMILAR TO '.*Republic.*';
┌─────────┬──────────────────────────┐
│ country │           name           │
│ varcharvarchar          │
├─────────┼──────────────────────────┤
│ CF      │ Central African Republic │
│ CG      │ Congo [Republic]         │
│ CZ      │ Czech Republic           │
│ DO      │ Dominican Republic       │
└─────────┴──────────────────────────┘
复制代码

This bucket is public, meaning we didn’t need to authenticate ourselves to get access to the file. If the file is not publicly accessible, the DuckDB httpfs extension can be configured to set an access key ID and secret access key. You can learn more about configuring secrets and the methods for handling multiple secrets for a variety of cloud service providers in the DuckDB documentation (https://duckdb.org/docs/sql/statements/create_secret).

Once you have set these with appropriate credentials, not only will you be able to read from cloud storage that requires authentication, but you’ll also be able to write files to cloud storage—provided, of course, you have the appropriate permissions to do so.

 

复制代码
D CREATE OR REPLACE SEQUENCE book_details_seq;
D CREATE OR REPLACE TABLE book_details AS
  SELECT nextval('book_details_seq') AS book_details_id,
      Title AS book_title,
      description AS book_description
  FROM read_csv('../C04/books_data.csv');
D SUMMARIZE book_details;
┌──────────────────┬─────────────┬──────────────────────┬──────────────────────┬───────────────┬───┬─────────┬─────────┬────────┬─────────────────┐
│   column_name    │ column_type │         minmax          │ approx_unique │ … │   q50   │   q75   │ count  │ null_percentage │
│     varcharvarcharvarcharvarchar        │     int64     │   │ varcharvarchar │ int64  │  decimal(9,2)   │
├──────────────────┼─────────────┼──────────────────────┼──────────────────────┼───────────────┼───┼─────────┼─────────┼────────┼─────────────────┤
│ book_details_idBIGINT1212404193201 │ … │ 1062221592612124040.00 │
│ book_titleVARCHAR     │ " Film technique, …  │ you can do anythin…  │        200024 │ … │         │         │ 2124040.00 │
│ book_descriptionVARCHAR     │ !! ALL NEW CAMPGRO…  │ �Una novela llam�m…  │        134539 │ … │         │         │ 21240432.22 │
├──────────────────┴─────────────┴──────────────────────┴──────────────────────┴───────────────┴───┴─────────┴─────────┴────────┴─────────────────┤
│ 3 rows                                                                                                                     12 columns (9 shown) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
复制代码

 

D INSTALL fts;
100% ▕████████████████████████████████████████████████████████████▏
D LOAD fts;

 

D PRAGMA create_fts_index(
      'book_details',
      'book_details_id',
      'book_title',
      'book_description',
      overwrite=true
  );
100% ▕████████████████████████████████████████████████████████████▏

 

 

复制代码
D WITH book_cte AS (
      SELECT *,
          fts_main_book_details.match_bm25(book_details_id, 'travel france wine') AS match_score
      FROM book_details
  )
  SELECT book_title,
      book_description,
      match_score
  FROM book_cte
  WHERE match_score IS NOT NULL
  ORDER BY match_score DESC
  LIMIT 10;
┌──────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────┬───────────────────┐
│      book_title      │                                               book_description                                               │    match_score    │
│       varcharvarchardouble       │
├──────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┤
│ Passions : The Win…  │ This is a biography of Thomas Jefferson at leisure, enjoying two of his passions--wine and travel. Twelve …  │ 9.659004777031209 │
│ France: Best Place…  │ Offers an illustrated guide to the wines and foods of France's sixteen regions, along with information on …  │ 8.960100836315272 │
│ A Traveller's Wine…  │ Italy has probably the richest variety of wines of any country in the world. Stephen Hobley describes Ital…  │ 7.285849373142181 │
│ Rhone Renaissance:…  │ Norman assesses more than 1,500 Rhone-style wines from over 220 estates on four continents. Such classics …  │ 7.027475775939768 │
│ The Judgement of P…  │ Looks at an event held in 1976 in which French judges, during a blind taste-test, chose unknown California…  │ 6.981737912835492 │
│ The Rough Guide to…  │ Now available in ePub format. The Rough Guide to France is the ultimate travel guide to this varied and be…  │ 6.922094970746354 │
│ The French Vineyar…  │ A Michelin three-star French chef conducts readers on an illustrated tour of the wine-making regions of Fr…  │ 6.859384986272269 │
│ The Best 50 Bargai…  │ The most knowledgeable wine drinkers know that there are inexpensive wine bargains to be found, from every…  │ 6.785503637575159 │
│ Women of Wine: The…  │ "This book, with its personal approach and global scope, is the first to explore women's increasingly infl…  │ 6.492226510429365 │
│ Africa Uncorked: T…  │                                                                                                              │ 6.419296348638699 │
├──────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────┤
│ 10 rows                                                                                                                                       3 columns │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
复制代码

 

 

 

D INSTALL spatial;
D LOAD spatial;

 

D SELECT st_point(48.858935, 2.293412) AS Eiffel_Tower;
┌────────────────────────────┐
│        Eiffel_Tower        │
│          geometry          │
├────────────────────────────┤
│ POINT (48.858935 2.293412) │
└────────────────────────────┘

 

Now that we’ve covered some of the fundamentals, let’s see if we can combine a few DuckDB spatial commands to work out the distance between two points. Given that we’re currently standing at the foot of the Eiffel Tower, let’s work out how far away the Arc de Triomphe is:

复制代码
D SELECT st_point(48.858935, 2.293412) AS Eiffel_Tower,
      st_point(48.873407, 2.295471) AS Arc_de_Triomphe,
      st_distance(
          st_transform(Eiffel_Tower, 'EPSG:4326', 'EPSG:27563'),
          st_transform(Arc_de_Triomphe, 'EPSG:4326', 'EPSG:27563')
      ) AS Aerial_Distance_M;
┌────────────────────────────┬────────────────────────────┬────────────────────┐
│        Eiffel_Tower        │      Arc_de_Triomphe       │ Aerial_Distance_M  │
│          geometry          │          geometry          │       double       │
├────────────────────────────┼────────────────────────────┼────────────────────┤
│ POINT (48.858935 2.293412) │ POINT (48.873407 2.295471) │ 1622.0284586410457 │
└────────────────────────────┴────────────────────────────┴────────────────────┘
复制代码

We used the st_transform function to transform these points from the EPSG:4326 latitude and longitude degrees to the EPSG:27563 co-ordinate system (which is based on meters and covers northern France).

We can see that there are 1,622 meters (as the straight-line distance) between the Eiffel Tower and the Arc de Triomphe, which could be a nice afternoon walk.

 

We can use the st_read function included within the spatial extension to read Microsoft Excel files.

The data we want are located in a sheet named layers (I think it should be 'stations'.), which we can specify using the layer parameter of the st_read function:

复制代码
D SELECT *
  FROM st_read('stations.xlsx', layer='stations')
  LIMIT 5;
┌──────────────┬───────────┬───────────┐
│ station_name │ longitude │ latitude  │
│   varchardoubledouble   │
├──────────────┼───────────┼───────────┤
│ Abancourt    │  1.77430649.685224 │
│ Abbaretz     │ -1.52441647.554643 │
│ Abbeville    │   1.8244950.10221 │
│ Abbémont     │   2.5080849.616695 │
│ Aboncourt    │      6.3549.266667 │
└──────────────┴───────────┴───────────┘
复制代码

 

 

D CREATE OR REPLACE TABLE stations AS
  SELECT *
  FROM st_read('stations.xlsx', layer='stations');

 

D SELECT geom
  FROM st_read('bordeaux_wine_region.geojson');
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                          geom                                                                           │
│                                                                        geometry                                                                         │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ POLYGON ((-1.1535645 45.5756002, -0.6481934 44.6022017, -0.1963806 44.5053207, -0.1922607 44.6256638, 0.2252197 44.781835, 0.1483154 44.9725707, -0.0…  │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

 

复制代码
D SELECT station_name
  FROM stations
  WHERE st_within(
          st_point(longitude, latitude),
          (
              SELECT geom
              FROM st_read('bordeaux_wine_region.geojson')
          )
      );
┌──────────────────────────┐
│       station_name       │
│         varchar          │
├──────────────────────────┤
│ Alouette France          │
│ Arbanats                 │
│ Aubie—St-Antoine         │
│ Barsac                   │
│ Bassens                  │
│ Beautiran                │
│ Blanquefort              │
│ Bordeaux                 │
│ Bordeaux St-Jean         │
│   ·                      │
│   ·                      │
│   ·                      │
│ St-Médard-d’Eyrans       │
│ St-Seurin-sur-l’Isle     │
│ St-Sulpice—Izon          │
│ St-Émilion               │
│ Ste-Eulalie—Carbon-Blanc │
│ Vayres                   │
│ Villenave-d’Ornon        │
│ Vélines                  │
├──────────────────────────┤
│    61 rows (x shown)     │
└──────────────────────────┘
复制代码

 

posted on   ZhangZhihuiAAA  阅读(1)  评论(0编辑  收藏  举报
相关博文:
阅读排行:
· 震惊!C++程序真的从main开始吗?99%的程序员都答错了
· 【硬核科普】Trae如何「偷看」你的代码?零基础破解AI编程运行原理
· 单元测试从入门到精通
· 上周热点回顾(3.3-3.9)
· winform 绘制太阳,地球,月球 运作规律
历史上的今天:
2024-01-14 Redis
2024-01-14 Redis - Use case of GETRANGE, SETRANGE, MGET, MSET
2024-01-14 Redis Commands - GETRANGE SETRANGE
2024-01-14 Redis - Why is Redis fast?
2021-01-14 Ubuntu - Remove PPA(Personal Package Archive) Repository
 
点击右上角即可分享
微信分享提示