In any distributed system, how evenly data is spread across nodes and whether related data is colocated on the same node play important roles in efficient parallel processing and overall performance.
In this video, I walk through data distribution in an Azure Synapse Analytics dedicated SQL pool: what distributions are, how the round-robin and hash distribution strategies work, and where replicated tables fit in. I also cover best practices, with prescriptive guidance on when to use each option and what to consider for better performance.
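As a rough illustration of the two strategies (not taken from the video or its scripts), the Python sketch below assigns rows to the 60 distributions of a dedicated SQL pool. The MD5-based hash, the function names, and the sample rows are assumptions made for illustration; Synapse uses its own internal hash function, and its round-robin placement is not guaranteed to follow this exact order.

```python
import hashlib
from collections import Counter

# A dedicated SQL pool spreads every table over 60 distributions.
NUM_DISTRIBUTIONS = 60


def round_robin(rows):
    """Assign rows to distributions in turn, ignoring row content (simplified)."""
    return [i % NUM_DISTRIBUTIONS for i, _ in enumerate(rows)]


def hash_distribute(rows, key):
    """Assign each row by hashing its distribution column.

    MD5 is a stand-in here, not the hash function Synapse actually uses.
    """
    assignments = []
    for row in rows:
        digest = hashlib.md5(str(row[key]).encode()).digest()
        assignments.append(int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS)
    return assignments


# Hypothetical sample data: 6000 orders spread over 1000 customers.
rows = [{"order_id": i, "customer_id": i % 1000} for i in range(6000)]

# Round robin: perfectly even, but rows for one customer are scattered.
rr = Counter(round_robin(rows))

# Hash: every row with the same customer_id lands in the same distribution.
hs = hash_distribute(rows, "customer_id")

# Skew: if one key value dominates, all of its rows pile into one distribution.
skewed = hash_distribute([{"customer_id": 1}] * 500, "customer_id")
```

Round robin gives an even spread but scatters rows that share a key, while hash distribution keeps equal keys together, which is what makes joins on that column cheaper and also what causes skew when one key value dominates.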
0:00 Introduction of distributed system and data distribution
7:14 Table types in SQL pools
8:40 Round Robin Distribution - Introduction
13:18 Hash Distribution - Introduction
16:42 Concept of distribution and how it maps to compute nodes
22:48 Round Robin Vs Hash - Example and performance differences
35:51 Round Robin Vs Hash - Analyze execution plans
42:52 Round Robin Vs Hash - Join Compatibility
49:30 Hash Distribution - Data skewness
51:51 Round Robin - Best Practices and Guidelines
53:58 Hash Distributed - Best Practices and Guidelines
59:17 Replicated Table - Introduction, Best Practices and Guidelines
1:03:24 Replicated Table - Example
Thank you for watching. In my next video, I will talk in detail about columnstore indexes and how they help improve performance for analytical queries. Stay tuned.
GitHub repo to download the deck and scripts used in the video:
https://github.com/AasTrailblazers/AzureSynapse/tree/main/SQL%20pool
Sample Databases
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-load-from-azure-blob-storage-with-polybase
Table Design
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/design-guidance-for-replicated-tables
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview
Memory and concurrency limits
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/memory-concurrency-limits