Apache Spark Memory Management

Welcome back to our comprehensive series on Apache Spark Performance Tuning/Optimisation! In this video, we dive deep into how Spark allocates memory internally and how it divides an executor's memory into regions for optimal performance.

🔹 What you'll learn:
1. On-Heap Memory: The parts of the Java heap where Spark stores data for computation (shuffles, joins, sorts and aggregations) and for caching.
2. Off-Heap Memory: How Spark can store execution and storage data outside the Java heap (disabled by default), which reduces garbage-collection overhead.
3. Overhead: The additional per-executor memory outside the heap, used for things like JVM overheads, interned strings and other native allocations, and how it affects your applications.
4. Unified Memory: Spark's unified memory management, the movable "slider" between execution and storage memory, and the rules that govern how it moves.
5. Memory Calculation: How Spark calculates the size of each memory region and allocates memory among them.

📘 Resources:
📄 Complete Code on GitHub: https://github.com/afaqueahmad7117/spark-experiments
🎥 Full Spark Performance Tuning Playlist: https://www.youtube.com/playlist?list=PLWAuYt0wgRcLCtWzUxNg4BjnYlCZNEVth
🔗 LinkedIn: https://www.linkedin.com/in/afaque-ahmad-5a5847129/

📘 Chapters:
0:00 Intro
0:45 Roadmap
1:12 Executor Memory Layout
4:33 Executor Memory Calculations
11:52 Unified Memory
19:04 Off Heap Memory
22:30 Summary

#ApacheSparkTutorial #SparkPerformanceTuning #ApacheSparkPython #LearnApacheSpark #SparkInterviewQuestions #ApacheSparkCourse #PerformanceTuningInPySpark #ApacheSparkPerformanceOptimization #pyspark #databricks #dataengineering #interviewquestions #azuredataengineer
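The executor memory calculation and the unified-memory slider can be approximated with a few lines of arithmetic. The sketch below is not the code from the linked GitHub repository. It is a simplified model that assumes Spark's documented defaults: 300 MiB of reserved memory, spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5, and an overhead of 10% of executor memory with a 384 MiB floor. The function names are hypothetical. Real figures come out slightly lower because Spark starts from the heap size the JVM reports rather than from spark.executor.memory itself. The slider rule is also simplified: execution can evict cached blocks only down to the protected storage region, and storage never evicts execution.

```python
# Simplified model of Spark executor memory sizing under the unified memory
# manager. Assumes Spark's documented defaults; results are estimates because
# Spark starts from the JVM-reported max heap, not spark.executor.memory.
# Function names are hypothetical, for illustration only.

RESERVED_MEMORY_MIB = 300  # fixed reserved system memory
MIN_OVERHEAD_MIB = 384     # floor for spark.executor.memoryOverhead


def executor_memory_breakdown(
    executor_memory_mib: int,
    memory_fraction: float = 0.6,   # spark.memory.fraction
    storage_fraction: float = 0.5,  # spark.memory.storageFraction
    overhead_factor: float = 0.10,  # spark.executor.memoryOverheadFactor (Spark 3.3+)
    off_heap_mib: int = 0,          # spark.memory.offHeap.size (0 = off-heap disabled)
) -> dict:
    """Return an approximate per-executor memory layout in MiB."""
    usable = executor_memory_mib - RESERVED_MEMORY_MIB
    if usable <= 0:
        raise ValueError("executor memory must exceed the reserved 300 MiB")
    unified = usable * memory_fraction            # shared execution + storage pool
    storage_region = unified * storage_fraction   # cached data protected from eviction
    execution_region = unified - storage_region   # initial execution share
    user = usable - unified                       # user data structures, internal metadata
    # Overhead lives outside the heap: max(factor * executor memory, 384 MiB).
    overhead = max(int(executor_memory_mib * overhead_factor), MIN_OVERHEAD_MIB)
    # Container size = heap + overhead + off-heap (ignores spark.executor.pyspark.memory).
    container = executor_memory_mib + overhead + off_heap_mib
    return {
        "reserved": RESERVED_MEMORY_MIB,
        "user": user,
        "unified": unified,
        "storage_region": storage_region,
        "execution_region": execution_region,
        "overhead": overhead,
        "container": container,
    }


def max_execution_grant(unified, storage_region, storage_used, execution_used):
    """Simplified slider rule: execution may take free memory and evict cached
    blocks, but only down to the protected storage region."""
    free = unified - storage_used - execution_used
    evictable_storage = max(0.0, storage_used - storage_region)
    return free + evictable_storage


def max_storage_grant(unified, storage_used, execution_used):
    """Storage may borrow free memory but never evicts execution memory."""
    return max(0.0, unified - storage_used - execution_used)


if __name__ == "__main__":
    layout = executor_memory_breakdown(10 * 1024)  # a 10 GiB executor
    for name, mib in layout.items():
        print(f"{name:>16}: {mib:,.0f} MiB")
```

Under these assumptions, a 10 GiB executor gets about 5,964 MiB of unified memory (2,982 MiB each for the storage and execution regions), 3,976 MiB of user memory and a 1,024 MiB overhead, so the container requests 11,264 MiB in total.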