This presentation was recorded at GOTO Copenhagen 2023. #GOTOcon #GOTOcph
https://gotocph.com
Tim Berglund - VP DevRel StarTree & Author of "Gradle Beyond the Basics" @tlberglund @StarTree
RESOURCES
http://timberglund.com
https://twitter.com/tlberglund
https://www.linkedin.com/in/tlberglund
https://pinot.apache.org
https://twitter.com/startreedata
https://www.linkedin.com/company/startreedata
https://dev.startree.ai
https://stree.ai/slack
ABSTRACT
Apache Kafka has become the standard infrastructure for event-driven and streaming data systems. The stunningly simple abstraction of the distributed log provides exactly what modern microservices and real-time systems need, but no choice is without its tradeoffs. Logs are an excellent way to keep track of events, but they are notoriously difficult to query. Given a constellation of services exchanging events with each other and reacting to inputs in real time, how can you find out, and gain insight into, what has just happened? How, in other words, do you query a log? This is where Apache Pinot comes in.
Developed at LinkedIn alongside Kafka, Pinot is a distributed, real-time analytics database designed to ingest data from Kafka (and other sources) and make it instantly queryable at low latency in the face of a huge number of concurrent requests. All that data tucked neatly away into topics, maintaining an immutable record of how the state of the system has evolved, can now be ingested into Pinot and made accessible through simple SQL queries.
This talk explores Pinot's internal architecture, how its integration with Kafka is specially optimized, and how Pinot fits architecturally in the modern streaming stack. You'll leave understanding how Pinot works, how it fits together with Kafka, where it has been used successfully in the real world, and what steps to take next in your own Pinot learning journey. [...]
TIMECODES
00:00 Intro
02:57 A brief history
12:53 Pinot architecture
24:04 Indexes
32:29 Ingest
41:51 Remember our history
44:57 Outro
Download slides and read the full abstract here:
https://gotocph.com/2023/sessions/2900
RECOMMENDED BOOKS
Tim Berglund • Gradle Beyond the Basics • https://amzn.to/3fSjfMD
Tim Berglund & Matthew McCullough • Building and Testing with Gradle • https://amzn.to/3VaBY6g
Mark Needham • Building Real-Time Analytics Systems • https://amzn.to/41AOZJd
Gwen Shapira, Todd Palino, Rajini Sivaram & Krit Petty • Kafka: The Definitive Guide • https://amzn.to/41AVlrO
Adi Polak • Scaling Machine Learning with Spark • https://amzn.to/3N9vx1H