Mastering Databricks Real-Time Analytics with Spark Structured Streaming

304 listens
In this end-to-end Databricks tutorial, I explain the fundamentals of Spark Structured Streaming and how to use it to build low-latency, real-time analytics solutions.

Chapters:
00:00:00 - Learning objectives
00:06:12 - Overview of real-time analytics technologies in Databricks: Spark Streaming, Spark Structured Streaming and Delta Live Tables
00:09:04 - Overview of Spark Structured Streaming: use cases, key concepts, limitations, sources and sinks
00:16:08 - Course setup
00:20:27 - Demo: Building your first streaming pipeline from file sources
00:26:50 - Streaming between Delta tables
00:28:53 - Managing stream flows: start, stop, block execution flow
00:30:21 - Streaming triggers
00:33:00 - Common streaming source/sink options
00:38:02 - Demo: Memory sink
00:38:40 - Custom sink types: foreachBatch and foreach
00:40:37 - Streaming output modes: append, update, complete
00:42:26 - Stateless and stateful transformations
00:43:30 - Demo: Selection, projection, filter transformations, deduplication
00:44:26 - Aggregations
00:49:17 - Handling late-arriving events
00:51:31 - Arbitrary state management
00:57:46 - Arbitrary state management with timeouts
01:00:05 - Streaming joins
01:01:24 - Joining streaming and static DataFrames
01:01:58 - Stream-to-stream joins
01:02:48 - Interval matching with streaming joins
01:02:48 - Using watermarks to handle late-arriving events
01:05:04 - Monitoring streaming pipelines: overview
01:05:57 - Interactive monitoring
01:07:42 - Monitoring streams with Spark listeners
01:10:36 - Using dashboards and alerts for centralized monitoring
01:12:15 - Stream monitoring with Spark UI

Please subscribe: https://www.youtube.com/channel/UC8d958MxE2t1dr27QNqoOhA
Download demo/exercise notebooks from here: Data engineering with Databricks/SparkStructuredStreaming/Mastering Spark Structured Streaming.dbc
Download stream monitoring query: https://github.com/fazizov/youtube/blob/main/Data%20engineering%20with%20Databricks/SparkStructuredStreaming/StreamingMonitoringQuery.sql
To sign up for the Databricks Community Edition, see: https://docs.databricks.com/en/getting-started/community-edition.html