Data & Drinks: Building Next-Gen Data Systems with Apache DataFusion

The first Data & Drinks event of 2025 will explore Apache DataFusion and its potential to shape the future of data systems. DataFusion features a full query planner, a columnar, streaming, multi-threaded, vectorized execution engine, and partitioned data sources.

This edition takes place on Thursday, 23 January at Xomnia's HQ in the heart of Amsterdam. It features three talks from Apache DataFusion contributors, covering the inner workings of the project and real-world applications, and showing the range of possibilities Apache DataFusion unlocks for data-centric systems. The event is free and includes dinner, drinks, and plenty of networking opportunities with data professionals from Amsterdam and beyond.

Abstracts

Talk 1: Intro to DataFusion: Technology, Community, and Not Quite Enough Time by Andrew Lamb
Andrew delves into the architecture, modularity, and tradeoffs of Apache DataFusion, a high-performance Rust-based query engine, and how it is used to build advanced data systems.

Talk 2: Building A Unified Compute Engine with Apache DataFusion by Mehmet Ozan Kabak
This talk explores how DataFusion’s modular architecture enables the vision of “unified” compute engines. Ozan will discuss how its extensibility addresses core engine-level limitations and enables streamlined solutions for data and AI workloads, while also considering the challenges that remain.

Talk 3: Distributed Joins with DataFusion at Coralogix by Jan Kaul
In this technical deep dive, Jan will explore the implementation of materialized views in Apache Iceberg using DataFusion. He will demonstrate how incremental refresh operations are realized for certain queries and provide practical examples of incremental data pipelines with DataFusion.
Biographies of the speakers:

Andrew Lamb, InfluxData, Staff Engineer, Apache DataFusion PMC Chair: After spending many years as a C/C++ systems programmer (databases and compilers), and a stint working at Machine Learning startups (as one does), Andrew now works at InfluxData with a talented team of engineers on InfluxDB IOx, a new engine for time series data.

Mehmet Ozan Kabak, CEO & Co-founder of Synnada, Apache DataFusion PMC: After diving deep into distributed systems and big data throughout a career spanning various startups and Meta, he now leads Synnada as CEO, bringing his Stanford Ph.D. and extensive machine learning expertise to build next-generation data infrastructure. His journey has consistently revolved around tackling large-scale distributed systems challenges and advancing the field of applied machine learning.

Jan Kaul, Founder/CEO of Dashbook: Jan Kaul is the Founder and CEO of Dashbook, where he develops tools for modern data infrastructure using Apache Arrow, Apache Iceberg, and DataFusion. He is the creator of Dashtool, a Lakehouse build tool that creates and manages Iceberg materialized views using declarative SQL.