Apache Spark | Databricks | PySpark | Big Data Engineering | Hadoop
🔍 What You'll Learn:
This 6+ hour video is your complete guide to mastering Apache Spark from scratch. It starts with Spark architecture and works up to advanced topics: Shuffle Joins, Broadcast Joins, Executor Out of Memory issues, Salting, Caching and Persisting, Dynamic Partition Pruning, and Adaptive Query Execution.
Timestamps:
0:00 Introduction
13:26 What is Apache Spark
21:03 Apache Spark vs Hadoop MapReduce
33:36 Spark Architecture
49:40 Application Master Container
59:08 Databricks Free Account
1:06:55 Spark Session
1:17:56 Lazy Evaluation and Actions
1:32:58 Spark Query Plans and Spark UI
1:39:27 Spark RDD
1:49:49 Narrow and Wide Transformations
2:10:10 Repartition vs Coalesce
2:17:54 Jobs, Stages, and Tasks in PySpark
2:47:58 Shuffle Joins in PySpark
3:12:28 Broadcast Joins in PySpark
3:33:51 Spark SQL Engine
3:41:07 Driver Memory Management
3:52:15 Executor Memory Management
4:06:16 Unified Memory Management
4:13:16 Executor Out Of Memory
4:24:16 Salting in PySpark
4:29:58 Cache and Persist in Apache Spark
4:55:23 Edge Node and Deployment Mode
5:08:49 Dynamic Partition Pruning in Apache Spark
5:28:49 Adaptive Query Execution
WATCH MY PREVIOUS VIDEOS
➡️PySpark Full Course
https://youtu.be/94w6hPk7nkM?si=i3u_8ZWkGf6fiGyh
➡️Azure End To End Data Project
https://youtu.be/uc-u_juRg-w?si=sO_CavjnhNaSnZri
➡️Databricks Tutorial
https://youtu.be/7pee6_Sq3VY?si=pPBrTnE2R8PhhHu5
REPOSITORY
➡️GitHub
https://github.com/anshlambagit/Apache-Spark-Full-Course
=======================================
Join this channel to SUPPORT MY HARD WORK:
https://www.youtube.com/channel/UCu7lQE-L5gzt8aD7zuuufjw/join
=======================================
Connect with ME
LinkedIn - https://www.linkedin.com/in/ansh-lamba-793681184/
Telegram - https://t.me/anshlambadatafam
=========================
For COLLABORATION 👇
[email protected]
=========================
⭐Hashtags⭐
#dataengineering #bigdata #apachespark #databricks