What if Apache Kafka could use cheap S3 storage instead of expensive disks? This architectural shift is worth hundreds of millions, and it's coming to open source. This week we look at a project that's pushing for a Kafka that uses object storage services like S3 as its main disk, sacrificing a little latency for cheap, infinitely-scalable disks.
There are several companies trying to walk down that road, and it’s clearly big business - one of them recently got bought out for a rumoured $250m. But one of them is actively trying to get those changes back into the community, pushing to make Apache Kafka speak object storage natively.
Joining me to explain why and how are Josep Prat and Filip Yonov of Aiven. We break down what it takes to make Kafka’s storage layer optional on a per-topic basis, how they’re making sure it’s not a breaking change, and how they plan to get such a foundational feature merged.
Thanks to Aiven for sponsoring this episode.
–
Diskless Kafka Overview: https://fnf.dev/45fRuFh
Announcement Post: https://aiven.io/blog/guide-diskless-apache-kafka-kip-1150
Aiven’s (Temporary) Fork, Project Inkless: https://github.com/aiven/inkless/blob/main/docs/inkless/README.md
Kafka Improvement Proposal (KIP) Articles:
KIP-1150: Diskless Topics: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics
KIP-1163: Diskless Core: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1163%3A+Diskless+Core
KIP-1164: Topic Based Batch Coordinator: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1164%3A+Topic+Based+Batch+Coordinator
KIP-1165: Object Compaction for Diskless: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1165%3A+Object+Compaction+for+Diskless
Support Developer Voices on Patreon: https://patreon.com/DeveloperVoices
Support Developer Voices on YouTube: https://www.youtube.com/@DeveloperVoices/join
Filip on LinkedIn: https://www.linkedin.com/in/filipyonov
Josep on LinkedIn: https://www.linkedin.com/in/jlprat/
Kris on Bluesky: https://bsky.app/profile/krisajenkins.bsky.social
Kris on Mastodon: http://mastodon.social/@krisajenkins
Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/
--
0:00 Intro
3:14 The Problem: Why Kafka's Current Storage is Expensive
9:46 Serverless, Diskless and Naming Things
11:53 Why Make A Competitive Advantage Open Source?
17:15 If Kafka Were Started Today, Would It Be Designed Cloud-First?
20:35 The Solution: Retrofitting Object Storage Into Kafka
26:57 Durability in the Face of Errors
34:37 What About Latency?
42:23 Performance Characteristics with Bursty Traffic
46:47 Transaction Support
52:34 How Transparent Is It To The Client?
58:00 How Does This Change Reads?
1:03:04 How Do You Handle The Metadata Layer?
1:09:04 What's The State Of The Open Source Merge?
1:25:30 How Can Users Get Involved?
1:28:18 Outro