Monitoring Structured Streaming Applications Using Web UI - Jacek Laskowski

"Spark Structured Streaming in Apache Spark 2.2 comes with quite a few unique Catalyst operators, most notably stateful streaming operators, as well as three different output modes. Understanding how Spark Structured Streaming manages intermediate state between triggers, and how that affects performance, is paramount. After all, you use Apache Spark to process huge amounts of data, which alone can be tricky to get right. Spark Structured Streaming adds the streaming factor on top of that, which, for a given structured query, can make the data even bigger due to state management.

This deep-dive talk will show you what is included in the execution diagrams, logical and physical plans, and metrics on the Details for Query page of the SQL tab. The talk will also explain the other parts of the SQL tab and the subpages with details for streaming queries. The talk is going to answer the following questions:

* What do the blue boxes represent on the Details for Query page in the SQL tab?
* What does the black popup window tell me when I hover over a blue box on the Details for Query page in the SQL tab?
* What is under the Details section at the bottom of the Details for Query page in the SQL tab?
* Why does a single streaming query execute many queries, as shown in the SQL tab?
* What are the Spark jobs on the Spark Jobs page in the Jobs tab?
* Why would a single query execution lead to zero or more Spark jobs? How does the translation happen?
* Why are there shuffles (exchanges) in the execution plan of a streaming aggregation query?
* and more!"

About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unified-data-analytics-platform