PySpark | Databricks | Apache Spark | Big Data Engineering
In this PySpark tutorial, you'll learn optimization techniques from the ground up: driver OOM issues, salting, Adaptive Query Execution, caching and persistence, Spark SQL hints, broadcast joins and broadcast variables, Delta Lake optimization on Databricks, and more!
Timestamps:
0:00 Introduction
8:52 Databricks Free Account
12:29 Databricks Overview
17:18 Spark Cluster and Spark Session
26:55 Scanning Optimization using PySpark Partitioning
52:42 Joins Optimization in Spark using Broadcast Joins
1:04:47 Sort Merge Join vs Broadcast Join in PySpark
1:17:57 Spark SQL Hints
1:19:48 Caching and Persistence in PySpark
1:38:11 Spark Dynamic Resource Allocation
1:45:12 AQE - Adaptive Query Execution
2:05:45 Dynamic Partition Pruning in Apache Spark
2:26:35 Broadcast Variables
2:32:23 Salting in PySpark
2:51:34 Delta Lake Optimization using PySpark
WATCH MY PREVIOUS VIDEOS
➡️Azure End To End Data Project
https://youtu.be/uc-u_juRg-w?si=sO_CavjnhNaSnZri
➡️Databricks Tutorial
https://youtu.be/7pee6_Sq3VY?si=pPBrTnE2R8PhhHu5
➡️PySpark Full Course
https://youtu.be/94w6hPk7nkM?si=i3u_8ZWkGf6fiGyh
REPOSITORY
➡️GitHub
https://github.com/anshlambagit/SparkOptimization
=======================================
Join this channel to SUPPORT MY HARD WORK:
https://www.youtube.com/channel/UCu7lQE-L5gzt8aD7zuuufjw/join
=======================================
Connect with ME
LinkedIn - https://www.linkedin.com/in/ansh-lamba-793681184/
Telegram - https://t.me/anshlambadatafam
=========================
For COLLABORATION 👇
[email protected]
=========================
⭐Hashtags⭐
#pyspark #apachespark #dataengineering #databricks #bigdata