Broadcast Joins & AQE (Adaptive Query Execution)

Spark Performance Tuning

Welcome back to another Apache Spark tutorial! In this hands-on Apache Spark performance optimization tutorial, we dive into techniques for fixing data skew, focusing on Adaptive Query Execution (AQE) and broadcast joins. AQE, a feature introduced in Spark 3.0, uses runtime statistics to select the most efficient query plan. It optimizes three things: shuffle partitions, join strategies, and skewed joins. We will discuss how Spark coalesces small shuffle partitions, converts sort-merge joins into broadcast joins, and splits skewed partitions into smaller ones.

We will walk through the Spark documentation to understand the properties that must be set to true for Spark to handle skew dynamically in a sort-merge join. Then we will look at an example that joins two datasets, transaction and customer, and compare how the join looks with and without AQE. By the end of this video, you will have a solid understanding of AQE, how to optimize skewed joins, and how to set up a Spark session to handle data skew.
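
As a rough illustration of the properties that need to be set to true, here is a minimal sketch, assuming PySpark 3.x is installed. The property names come from the Spark SQL performance tuning documentation; the helper `build_session` and the app name are hypothetical and are not taken from the video's notebook.

```python
# AQE-related properties listed in the Spark SQL performance tuning docs.
AQE_CONF = {
    "spark.sql.adaptive.enabled": "true",                     # turn AQE on
    "spark.sql.adaptive.coalescePartitions.enabled": "true",  # merge small shuffle partitions
    "spark.sql.adaptive.skewJoin.enabled": "true",            # split skewed partitions in sort-merge joins
}


def build_session(app_name="aqe-skew-demo"):  # hypothetical helper and app name
    """Create a SparkSession with the AQE properties above applied."""
    from pyspark.sql import SparkSession  # imported lazily; requires pyspark

    builder = SparkSession.builder.appName(app_name)
    for key, value in AQE_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

To compare a join with and without AQE, one option is to flip `spark.sql.adaptive.enabled` to "false" for the baseline run and inspect both runs in the Spark UI.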

Key Takeaways:
Understanding Adaptive Query Execution (AQE) and its benefits.
How to optimize shuffle partitions and joins using AQE.
Setting up a Spark session and properties to handle data skew dynamically.
Analyzing the distribution of data and identifying skewed partitions.
Comparing the performance of sort merge join with and without AQE.
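
To make "identifying skewed partitions" concrete, the sketch below applies the rule described in the Spark documentation: a partition counts as skewed when it is larger than `spark.sql.adaptive.skewJoin.skewedPartitionFactor` times the median partition size and also larger than `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes`. This is a simplified pure-Python model, not Spark's implementation; the function name and the partition sizes are made up for illustration, and the defaults of 5 and 256 MB are the documented ones.

```python
from statistics import median

MB = 1024 * 1024


def skewed_partitions(sizes_in_bytes, factor=5, threshold_bytes=256 * MB):
    """Return indices of partitions that the documented AQE rule would flag as skewed."""
    med = median(sizes_in_bytes)
    return [
        i
        for i, size in enumerate(sizes_in_bytes)
        if size > factor * med and size > threshold_bytes
    ]


# Made-up shuffle partition sizes: one partition is far larger than the rest.
sizes = [40 * MB, 55 * MB, 48 * MB, 900 * MB, 52 * MB]
print(skewed_partitions(sizes))  # [3]
```

Both conditions must hold, so a dataset whose partitions are uniformly large is not treated as skewed.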

📄 Complete Code on GitHub: https://github.com/afaqueahmad7117/spark-experiments/blob/main/spark/1_data_skew/3_solving_data_skew_aqe_broadcast.ipynb
🎥 Full Spark Performance Tuning Playlist: https://www.youtube.com/playlist?list=PLWAuYt0wgRcLCtWzUxNg4BjnYlCZNEVth

🔗 LinkedIn: https://www.linkedin.com/in/afaque-ahmad-5a5847129/

Chapters:
00:00 Introduction 
00:35 What is AQE?
04:25 Sort-Merge-Join of Customer & Transaction Dataset
06:00 Spark UI showing Data Skew
06:41 Join of Customer & Transaction Dataset (AQE enabled)
07:04 Code + Spark UI - Comparing Join Performance (with & without AQE)
10:52 Broadcast Join 
11:18 Internal Working of Sort Merge Join 
13:12 Concept of Hash Partitioning 
14:47 Sort Merge Join example
17:12 Broadcast Join example
19:44 Code for Broadcast join fixing Data Skew
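
The hash partitioning and broadcast join chapters can be illustrated with a small pure-Python model. This is a sketch under stated assumptions, not Spark's implementation: `partition_of` and `broadcast_join` are hypothetical helpers, `zlib.crc32` stands in for Spark's own hash function, and the data is made up. In PySpark itself, the broadcast hint is `broadcast()` from `pyspark.sql.functions`, wrapped around the smaller DataFrame in the join.

```python
import zlib
from collections import Counter


def partition_of(key, num_partitions):
    # crc32 is a deterministic stand-in here; Spark uses its own hash function.
    return zlib.crc32(str(key).encode()) % num_partitions


def broadcast_join(large_partitions, small_rows, key):
    """Join every partition of the large side against a full copy of the small side."""
    lookup = {row[key]: row for row in small_rows}  # plays the role of the broadcast table
    joined = []
    for partition in large_partitions:  # no shuffle: rows stay where they are
        for row in partition:
            match = lookup.get(row[key])
            if match is not None:
                joined.append({**row, **match})
    return joined


# Made-up data: customer C0 owns 8 of the 10 transactions.
customers = [{"cust_id": f"C{i}", "city": f"city_{i}"} for i in range(3)]
transactions = [{"txn_id": i, "cust_id": "C0"} for i in range(8)]
transactions += [{"txn_id": 8, "cust_id": "C1"}, {"txn_id": 9, "cust_id": "C2"}]

# Hash partitioning by join key: every C0 row lands in the same partition.
counts = Counter(partition_of(t["cust_id"], 4) for t in transactions)
print("rows per hash partition:", dict(counts))

# Broadcast join: the large side keeps an even, key-independent layout.
even_partitions = [transactions[i::4] for i in range(4)]
joined = broadcast_join(even_partitions, customers, "cust_id")
print("rows per partition:", [len(p) for p in even_partitions])
print("joined rows:", len(joined))
```

Because a sort-merge join shuffles both sides by the join key, a hot key concentrates its rows in one partition. The broadcast approach avoids that shuffle, but it only works when the smaller side fits in memory on every executor.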

#DataEngineering #AdaptiveQueryExecution #DataSkew #BroadcastJoin #spark #apachespark #sparkperformancetuning #interviewquestions #dataengineerinterviewquestions #azuredataengineer #dataanalystinterview
