Master Reading Spark DAGs

Spark Performance Tuning

In this tutorial, we dive deep into the core of Apache Spark performance tuning by exploring Spark DAGs (Directed Acyclic Graphs).

We walk through the Spark DAG for a range of operations: reading files, narrow and wide transformations (with examples), and aggregations using groupBy count, groupBy sum, and groupBy count distinct. We also look at the differences between sort merge joins and broadcast joins, and analyze each DAG from different perspectives with practical examples.
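To make the narrow versus wide distinction concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the operation names and the `split_into_stages` helper are hypothetical, and it assumes a simple linear chain of operations in which every wide transformation introduces a shuffle and therefore starts a new stage.

```python
# Toy model (not Spark's API): split a linear chain of operations into
# stages, starting a new stage at every wide (shuffle) transformation.
# The operation names below are illustrative labels only.
WIDE_OPS = {"groupBy", "sort_merge_join", "repartition", "distinct"}


def split_into_stages(ops):
    """Group consecutive operations into stages, cutting at each wide op."""
    stages = [[]]
    for op in ops:
        if op in WIDE_OPS:
            # A shuffle is needed here, so this op begins a new stage.
            stages.append([op])
        else:
            # Narrow ops are pipelined into the current stage.
            stages[-1].append(op)
    # Drop the empty first stage if the chain began with a wide op.
    return [stage for stage in stages if stage]


narrow_plan = ["read", "filter", "withColumn", "broadcast_join"]
wide_plan = ["read", "filter", "groupBy", "count"]

print(split_into_stages(narrow_plan))
print(split_into_stages(wide_plan))
```

In this simplification the narrow plan stays in one stage, while the wide plan is cut in two at the `groupBy`. A real sort merge join has two input branches, which a linear chain like this cannot represent.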

This video is a treasure trove for both beginners and experienced Spark users looking to optimize their code and understand the inner workings of Apache Spark. We examine the DAG, input batches, and partitions in great detail, understand the significance of metadata, and explore how Spark optimizes the execution of jobs and stages. 

📄 Complete Code on GitHub: https://github.com/afaqueahmad7117/spark-experiments/blob/main/spark/3_reading_query_DAGs.ipynb
🎥 Full Spark Performance Tuning Playlist: https://www.youtube.com/playlist?list=PLWAuYt0wgRcLCtWzUxNg4BjnYlCZNEVth
🎥 Link to Spark Query Plan Video: https://www.youtube.com/watch?v=KnUXztKueMU&t=2049s

🔗 LinkedIn: https://www.linkedin.com/in/afaque-ahmad-5a5847129

Chapters:
00:00 Introduction
00:34 Module imports 
00:51 Topics covered 
01:54 Spark DAG for Reading a file 
07:36 DAG for Narrow transformations
11:17 Wide transformations introduction
11:24 DAG for Sort Merge join (wide transformation)
18:30 DAG for Broadcast join (narrow transformation)
20:15 DAG for Aggregations Group by count (wide transformation)
24:41 DAG for Aggregations Group by sum (wide transformation)
25:44 DAG for Aggregations Group by count distinct (wide transformation)
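As a rough illustration of why a groupBy count appears in the DAG as a partial aggregation, an exchange (shuffle), and a final aggregation, here is a plain-Python sketch. It is a toy model under stated assumptions, not Spark code: partitions are lists of string keys, and the helper names (`partial_counts`, `shuffle`, `final_counts`, `group_by_count`) and the bucketing rule are hypothetical.

```python
from collections import Counter

# Toy model (not Spark code) of a two-phase groupBy count:
# partial aggregation per partition, shuffle by key, final aggregation.


def partial_counts(partition):
    """Phase 1: count keys locally inside one input partition."""
    return Counter(partition)


def shuffle(partials, num_partitions):
    """Route every (key, partial count) pair to a bucket chosen by key."""
    buckets = [[] for _ in range(num_partitions)]
    for partial in partials:
        for key, count in partial.items():
            # Deterministic stand-in for a hash partitioner (an assumption).
            target = sum(map(ord, key)) % num_partitions
            buckets[target].append((key, count))
    return buckets


def final_counts(bucket):
    """Phase 2: merge the partial counts that landed in one bucket."""
    total = Counter()
    for key, count in bucket:
        total[key] += count
    return total


def group_by_count(partitions, num_shuffle_partitions=2):
    partials = [partial_counts(p) for p in partitions]
    buckets = shuffle(partials, num_shuffle_partitions)
    result = {}
    for bucket in buckets:
        # Each key lives in exactly one bucket, so update() never overwrites.
        result.update(final_counts(bucket))
    return result


partitions = [["a", "b", "a"], ["b", "c"], ["a"]]
counts = group_by_count(partitions)
print(sorted(counts.items()))
```

Because each partition is pre-aggregated before the shuffle, only one record per distinct key per partition crosses the network, which is the motivation for the partial aggregation step.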

#ApacheSpark #SparkPerformanceTuning #DataEngineering #SparkDAG #SparkOptimization
#interviewquestions #azuredataengineer
