Introducing DuckLake

A podcast on the DuckLake project by Hannes Mühleisen and Mark Raasveldt.
For more information, see https://ducklake.select/

00:00 Opening
00:42 Introduction
02:33 How is DuckDB coming along?
06:13 Restriction in multi-user setups
10:44 Disconnecting storage and compute
11:57 The history of data lakes
13:41 The Parquet format
19:13 Limitations of a bare-bones Parquet+S3 architecture
20:29 Data lake formats (Iceberg, Delta)
22:40 Limitations of existing data lake formats
24:10 Lakehouse formats
28:06 The aesthetics of existing Lakehouse formats
32:15 The epiphany
38:23 The principles of DuckLake: Simplicity
38:35 “Wij van Wc-eend adviseren Wc-eend!” ("We at Toilet Duck recommend Toilet Duck!")
40:40 Scalability
46:33 Speed
50:52 Data inlining against the small files problem
52:36 Transaction handling
54:21 Features
55:05 Run any query you want
55:50 Multi-schema, multi-table with transactions
57:46 Time travel
59:33 Partitioning
01:02:39 Privacy features
01:06:45 Compatibility with existing data lake formats
01:07:42 Standards vs. implementations
01:09:22 The DuckLake DuckDB extension
01:15:11 Scaling compute with DuckLake
01:16:20 Multiplayer DuckDB
01:21:05 Next steps for DuckLake
01:23:08 Closing thoughts
