A podcast on the DuckLake project by Hannes Mühleisen and Mark Raasveldt.
For more information, see https://ducklake.select/
00:00:00 Opening
00:00:42 Introduction
00:02:33 How is DuckDB coming along?
00:06:13 Restrictions in multi-user setups
00:10:44 Disconnecting storage and compute
00:11:57 The history of data lakes
00:13:41 The Parquet format
00:19:13 Limitations of a bare-bones Parquet+S3 architecture
00:20:29 Data lake formats (Iceberg, Delta)
00:22:40 Limitations of existing data lake formats
00:24:10 Lakehouse formats
00:28:06 The aesthetics of existing lakehouse formats
00:32:15 The epiphany
00:38:23 The principles of DuckLake: Simplicity
00:38:35 “Wij van Wc-eend adviseren Wc-eend!” (“We at Wc-eend recommend Wc-eend!”)
00:40:40 Scalability
00:46:33 Speed
00:50:52 Data inlining against the small files problem
00:52:36 Transaction handling
00:54:21 Features
00:55:05 Run any query you want
00:55:50 Multi-schema, multi-table with transactions
00:57:46 Time travel
00:59:33 Partitioning
01:02:39 Privacy features
01:06:45 Compatibility with existing data lake formats
01:07:42 Standards vs. implementations
01:09:22 The DuckLake DuckDB extension
01:15:11 Scaling compute with DuckLake
01:16:20 Multiplayer DuckDB
01:21:05 Next steps for DuckLake
01:23:08 Closing thoughts