Azure Synapse Analytics | Data Distribution Strategy and Best Practices

In any distributed system, distributing data evenly and colocating related data across nodes are key to efficient parallel processing and good performance. In this video, in the context of the Azure Synapse Analytics dedicated SQL pool, I walk you through data distribution strategy: distributions, the round-robin and hash distribution strategies, and replicated tables. I also cover best practices, with prescriptive guidance on when to use each option and what to consider for better performance.

0:00 Introduction to distributed systems and data distribution
7:14 Table types in SQL pools
8:40 Round Robin Distribution - Introduction
13:18 Hash Distribution - Introduction
16:42 Concept of distributions and how they map to compute nodes
22:48 Round Robin vs. Hash - Example and performance differences
35:51 Round Robin vs. Hash - Analyze execution plans
42:52 Round Robin vs. Hash - Join compatibility
49:30 Hash Distribution - Data skew
51:51 Round Robin Distribution - Best Practices and Guidelines
53:58 Hash Distribution - Best Practices and Guidelines
59:17 Replicated Table - Introduction, Best Practices and Guidelines
1:03:24 Replicated Table - Example

Thank you for watching. In my next video, I am going to talk in detail about columnstore indexes and how they help improve performance for analytical queries. Stay tuned.
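The difference between round-robin and hash distribution, and why a poor hash column causes data skew, can be illustrated with a small Python model. This is a minimal sketch under stated assumptions, not how the dedicated SQL pool is implemented. The 60 distributions match the service. Everything else is an assumption made for illustration: crc32 stands in for the service's internal hash function, rows are cycled strictly in order (the service assigns round-robin rows to distributions randomly but evenly), the contiguous distribution-to-node mapping is invented, the skew formula is just one simple measure, and the names make_sample_rows, customer_id and region are hypothetical.

```python
import zlib
from collections import Counter

# A dedicated SQL pool always spreads each table across 60 distributions.
NUM_DISTRIBUTIONS = 60


def hash_distribution(value) -> int:
    """Map a distribution-column value to a distribution id (0-59).

    Assumption: crc32 stands in for the service's internal hash function.
    """
    return zlib.crc32(str(value).encode("utf-8")) % NUM_DISTRIBUTIONS


def round_robin_distribution(row_index: int) -> int:
    """Simplification: cycle through distributions strictly in order."""
    return row_index % NUM_DISTRIBUTIONS


def distribute(rows, key=None) -> Counter:
    """Count rows per distribution id; key=None means round-robin."""
    counts = Counter()
    for i, row in enumerate(rows):
        if key is None:
            dist = round_robin_distribution(i)
        else:
            dist = hash_distribution(row[key])
        counts[dist] += 1
    return counts


def skew_pct(counts: Counter) -> float:
    """Percent by which the fullest distribution exceeds the average (all 60 counted)."""
    avg = sum(counts.values()) / NUM_DISTRIBUTIONS
    return (max(counts.values()) - avg) / avg * 100


def distributions_for_node(node: int, num_nodes: int) -> range:
    """Hypothetical contiguous mapping; the point is the even 60 / num_nodes split."""
    per_node = NUM_DISTRIBUTIONS // num_nodes
    return range(node * per_node, (node + 1) * per_node)


def make_sample_rows(n: int = 6000) -> list:
    """Hypothetical fact rows: high-cardinality customer_id, low-cardinality region."""
    regions = ["North", "South", "East", "West"]
    return [{"customer_id": i % 1000, "region": regions[i % 4]} for i in range(n)]


if __name__ == "__main__":
    rows = make_sample_rows()
    strategies = [
        ("round-robin", None),
        ("hash(customer_id)", "customer_id"),
        ("hash(region)", "region"),
    ]
    for label, key in strategies:
        counts = distribute(rows, key)
        print(f"{label}: {len(counts)} non-empty distributions, "
              f"skew {skew_pct(counts):.0f}%")
```

In this model, round-robin spreads the 6,000 sample rows perfectly evenly. Hashing on the four-value region column piles them into at most four distributions, so a few distributions do almost all the work. Hashing on the 1,000-value customer_id column lands in between. Because the hash is deterministic, rows with the same key always land in the same distribution. This is why two tables hash-distributed on the same join column can be joined without moving data between distributions.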
GitHub repo to download the deck and script used in the video: https://github.com/AasTrailblazers/AzureSynapse/tree/main/SQL%20pool

Sample databases:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-load-from-azure-blob-storage-with-polybase

Table design:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/design-guidance-for-replicated-tables
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview

Memory and concurrency limits:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/memory-concurrency-limits