Data Reliability for Data Lakes | Databricks

11,235 plays
ABOUT THE KEYNOTE (https://www.datacouncil.ai/talks/data-reliability-for-data-lakes)
Building a modern data lake requires dealing with a lot of complexity: querying historical and streaming data simultaneously (lambda architecture), validating data so that it isn't too messy for data science and machine learning, reprocessing to handle failures, and ensuring ACID-compliant data updates. We created the Delta Lake project, open sourced under the Linux Foundation, to relieve data scientists and data engineers of these complex systems problems so they can focus on extracting value from data. In this talk, we'll dive into these challenges and how ACID transactions solve them. We'll discuss the patterns that emerge when you can focus on data quality, and the nitty-gritty internals of ACID on Spark that enable this focus.

ABOUT THE KEYNOTE SPEAKER
Michael Armbrust is a committer and PMC member of Apache Spark and the original creator of Spark SQL. He currently leads the team at Databricks that designed and built Structured Streaming and Databricks Delta. He received his PhD from UC Berkeley in 2013, where he was advised by Michael Franklin, David Patterson, and Armando Fox. His thesis focused on building systems that allow developers to rapidly build scalable interactive applications, and specifically defined the notion of scale independence. His interests broadly include distributed systems, large-scale structured storage, and query optimization.

ABOUT DATA COUNCIL
Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from top open source projects and startups.

FOLLOW DATA COUNCIL
Twitter: https://twitter.com/DataCouncilAI
LinkedIn: https://www.linkedin.com/company/datacouncil-ai
Facebook: https://www.facebook.com/datacouncilai
Eventbrite: https://www.eventbrite.com/o/data-council-30357384520