The Case for Metadata for Machine Learning Platforms | ArangoDB

Get the slides: https://www.datacouncil.ai/talks/the-case-for-metadata-for-machine-learning-platforms

ABOUT THE TALK
It is well known that data quality and quantity are crucial for building Machine Learning models, especially when dealing with Deep Learning and Neural Networks. But besides the data required to build the model itself, there is another, often overlooked type of data required to build a production-grade Machine Learning platform: metadata.

Modern Machine Learning platforms contain a number of different components: distributed training, Jupyter Notebooks, CI/CD, hyperparameter optimization, feature stores, and many more. Most of these components have associated metadata, including versioned datasets, versioned Jupyter Notebooks, training parameters, test/training accuracy of a trained model, versioned features, and statistics from model serving.

For the DataOps team managing such production platforms, it is critical to have a common view across all this metadata, because the team has to answer questions such as:
Which Jupyter Notebook was used to build Model XYZ, which is currently running in production?
If there is new data for a given dataset, which models currently serving in production have to be updated?

In this talk, we look at existing implementations, in particular MLMD as part of the TensorFlow ecosystem. Further, we propose a first draft of an (MLMD-compatible) universal Metadata API. We demo the first implementation of this API using ArangoDB.

ABOUT THE SPEAKER
Jörg Schad is Head of Engineering and Machine Learning at ArangoDB. In a previous life, he worked on or built machine learning pipelines in healthcare, distributed systems at Mesosphere, and in-memory databases. He received his Ph.D. for research around distributed databases and data analytics. He is a frequent speaker at meetups, international conferences, and lecture halls.
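
The two lineage questions raised in the talk abstract (which notebook built Model XYZ, and which production models depend on a given dataset) are both graph traversals over metadata. The following is a minimal sketch of that idea in plain Python. It is not the MLMD API and not the Metadata API proposed in the talk; the `MetadataGraph` class, the `(kind, name)` node tuples, and the example names such as `sales-v3` and `churn.ipynb@a1b2` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class MetadataGraph:
    """Hypothetical in-memory lineage graph; nodes are (kind, name) tuples."""

    def __init__(self):
        self._out = defaultdict(set)  # node -> nodes derived from it
        self._in = defaultdict(set)   # node -> nodes it was derived from

    def add_edge(self, source, target):
        """Record that `target` was produced from `source`."""
        self._out[source].add(target)
        self._in[target].add(source)

    @staticmethod
    def _reach(start, adjacency):
        """Breadth-first search returning every node reachable from `start`."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, node):
        """Everything `node` was (transitively) derived from."""
        return self._reach(node, self._in)

    def downstream(self, node):
        """Everything (transitively) derived from `node`."""
        return self._reach(node, self._out)


# Illustrative lineage: dataset -> notebook -> model, plus a second model.
graph = MetadataGraph()
graph.add_edge(("dataset", "sales-v3"), ("notebook", "churn.ipynb@a1b2"))
graph.add_edge(("notebook", "churn.ipynb@a1b2"), ("model", "XYZ"))
graph.add_edge(("dataset", "sales-v3"), ("model", "ABC"))

# Assumption: only Model XYZ is currently serving in production.
in_production = {("model", "XYZ")}

# Question 1: which notebook was used to build Model XYZ?
notebooks_for_xyz = sorted(
    n for n in graph.upstream(("model", "XYZ")) if n[0] == "notebook"
)

# Question 2: new data arrived for sales-v3; which production models are affected?
stale_models = sorted(
    n for n in graph.downstream(("dataset", "sales-v3")) if n in in_production
)

print(notebooks_for_xyz)
print(stale_models)
```

A graph database would express the same two questions as inbound and outbound traversals from a start vertex; the sketch only shows why a common, connected view of the metadata makes them easy to answer.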

ABOUT DATA COUNCIL
Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from top open source projects and startups.

FOLLOW DATA COUNCIL
Twitter: https://twitter.com/DataCouncilAI
LinkedIn: https://www.linkedin.com/company/datacouncil-ai
Facebook: https://www.facebook.com/datacouncilai
Eventbrite: https://www.eventbrite.com/o/data-council-30357384520