This presentation was recorded at GOTO Chicago 2024. #GOTOcon #GOTOchgo
https://gotochgo.com
Kasun Indrasiri - Author of "Microservices for the Enterprise" @kasunindrasiri
ORIGINAL TALK TITLE
Kafka Meets Iceberg: Real-Time Data Streaming into Modern Data Lakes & Warehouses
RESOURCES
https://twitter.com/kasunindrasiri
https://medium.com/@kasunindrasiri
https://www.linkedin.com/in/kasun-indrasiri
ABSTRACT
In this talk, we'll explore how Kafka serves as a powerful platform for capturing real-time streaming data and how organizations are increasingly adopting the Apache Iceberg table format to store data in data lakes and data warehouses. We'll discuss the key benefits of using Apache Iceberg tables in your data lake, such as schema evolution, ACID transactions, hidden partitioning, time travel, and efficient querying.
Next, we'll dive into how to efficiently stream data from Kafka into Iceberg-based data lakes. Confluent Tableflow will be introduced as a potential solution for streamlining the ingestion of Kafka streams into Iceberg tables within your data lake. A live demo will showcase the seamless integration of Kafka with Iceberg, equipping participants with practical knowledge to enhance their data architectures for powerful real-time analytics.
• The role of Kafka in real-time data streaming
• Why Apache Iceberg is essential for data lakes and data warehouses
• Iceberg fundamentals: Core concepts and key features
• Streaming data from Kafka to Iceberg tables in data lakes
• Use case: Leveraging Confluent Tableflow to stream Kafka data into data lakes and warehouses [...]
TIMECODES
00:00 Intro
01:10 Overview
02:06 Kafka is the standard for operational data
03:41 Iceberg for analytical data in data lakes
04:42 Apache Iceberg
05:27 Why Iceberg?
12:24 Structure of an Iceberg table
16:40 Streaming to data lakes is complicated
20:47 Tableflow materializes Kafka topics as Iceberg tables
23:47 Demo
35:37 Outro
Download slides and read the full abstract here:
https://gotochgo.com/2024/sessions/3370
RECOMMENDED BOOKS
Kasun Indrasiri & Sriskandarajah Suhothayan • Design Patterns for Cloud Native Applications • https://amzn.to/3szGx0p
Kasun Indrasiri & Danesh Kuruppu • gRPC: Up and Running • https://amzn.to/3sBGBJJ
Kasun Indrasiri & Prabath Siriwardena • Microservices for the Enterprise • https://amzn.to/40FhxkQ
Kasun Indrasiri • Beginning WSO2 ESB • https://amzn.to/3sx9NF0