In this end-to-end Databricks tutorial, I explain the fundamentals of Spark Structured Streaming and show how to use it to build low-latency, real-time analytics solutions.
Chapters:
00:00:00- Learning objectives
00:06:12- Overview of real-time analytics technologies in Databricks: Spark Streaming, Spark Structured Streaming and Delta Live Tables
00:09:04- Overview of Spark Structured Streaming: use cases, key concepts, limitations, sources and sinks
00:16:08- Course setup
00:20:27- Demo: Building your first streaming pipeline from file sources
00:26:50- Streaming between Delta tables
00:28:53- Managing stream flows: start, stop, block execution flow
00:30:21- Streaming triggers
00:33:00- Common streaming source/sink options
00:38:02- Demo: Memory sink
00:38:40- Custom sink types: foreachBatch and foreach
00:40:37- Streaming output modes: append, update, complete
00:42:26- Stateless and stateful transformations
00:43:30- Demo: Selection, projection, filter transformations, deduplication
00:44:26- Aggregations
00:49:17- Handling late-arriving events
00:51:31- Arbitrary state management
00:57:46- Arbitrary state management with timeouts
01:00:05- Streaming joins
01:01:24- Joining streaming and static DataFrames
01:01:58- Stream-to-stream joins
01:02:48- Interval matching with streaming joins
01:02:48- Using watermarks to handle late-arriving events
01:05:04- Monitoring streaming pipelines: overview
01:05:57- Interactive monitoring
01:07:42- Monitoring streams with Spark listeners
01:10:36- Using dashboards and alerts for centralized monitoring
01:12:15- Stream monitoring with Spark UI
Please subscribe: https://www.youtube.com/channel/UC8d958MxE2t1dr27QNqoOhA
Download demo/exercise notebooks from here:
Data engineering with Databricks/SparkStructuredStreaming/Mastering Spark Structured Streaming.dbc
Download stream monitoring query:
https://github.com/fazizov/youtube/blob/main/Data%20engineering%20with%20Databricks/SparkStructuredStreaming/StreamingMonitoringQuery.sql
To sign up for the Databricks Community Edition, see:
https://docs.databricks.com/en/getting-started/community-edition.html