Introducing DuckLake

20,364 listens
A podcast on the DuckLake project by Hannes Mühleisen and Mark Raasveldt. For more information, see https://ducklake.select/

00:00:00 Opening
00:00:42 Introduction
00:02:33 How is DuckDB coming along?
00:06:13 Restrictions in multi-user setups
00:10:44 Disconnecting storage and compute
00:11:57 The history of data lakes
00:13:41 The Parquet format
00:19:13 Limitations of a bare-bones Parquet+S3 architecture
00:20:29 Data lake formats (Iceberg, Delta)
00:22:40 Limitations of existing data lake formats
00:24:10 Lakehouse formats
00:28:06 The aesthetics of existing Lakehouse formats
00:32:15 The epiphany
00:38:23 The principles of DuckLake: Simplicity
00:38:35 “Wij van Wc-eend adviseren Wc-eend!” (“We at Wc-eend recommend Wc-eend!”)
00:40:40 Scalability
00:46:33 Speed
00:50:52 Data inlining against the small files problem
00:52:36 Transaction handling
00:54:21 Features
00:55:05 Run any query you want
00:55:50 Multi-schema, multi-table with transactions
00:57:46 Time travel
00:59:33 Partitioning
01:02:39 Privacy features
01:06:45 Compatibility with existing data lake formats
01:07:42 Standards vs. implementations
01:09:22 The DuckLake DuckDB extension
01:15:11 Scaling compute with DuckLake
01:16:20 Multiplayer DuckDB
01:21:05 Next steps for DuckLake
01:23:08 Closing thoughts