Real World Applications of Velox and Apache Gluten in Dataproc’s NQE - Abhishek Modi, Google
Google Dataproc’s Performance Boost, powered by the integration of Velox and Apache Gluten, is revolutionizing big data analytics for enterprise customers. This session delves into the real-world impact of our Velox integration, showcasing how we’ve scaled these open-source technologies to deliver unprecedented performance gains. We’ll explore the architecture of our Dataproc integration, highlighting how Velox’s vectorized execution and Apache Gluten’s glue layer combine to significantly reduce query latency and resource consumption. We’ll also detail our key improvements to the Spark-Velox integration, including broadcast hash join optimizations that dramatically improve the performance of join-heavy workloads. In addition, we added and stabilized comprehensive ORC and Parquet test cases to ensure robust and reliable performance for these critical data formats.
To help customers quickly understand the potential benefits, we’ve developed a qualification tool that analyzes Spark event logs and accurately predicts the performance gains achievable with Velox pushdown. Through case studies with marquee customers, we’ll demonstrate the tangible benefits of Native Query Execution (NQE), including accelerated data processing, improved cost efficiency, and enhanced scalability. Join us to learn how we’re pushing the boundaries of big data analytics with Velox and Apache Gluten in Google Dataproc, and how our extensions and innovative tools are contributing to the broader open-source community.