Production-grade ML Pipelines – From Data To Metadata by Jörg Schad


Data quality and quantity are crucial for building Machine Learning models, especially for Deep Learning and Neural Networks. But beyond the data used to build the model itself, a production-grade Machine Learning platform depends on another, often overlooked type of data: metadata.

Modern Machine Learning platforms contain a number of different components: Distributed Training, Jupyter Notebooks, CI/CD, Hyperparameter Optimization, Feature Stores, and many more. Most of these components have associated metadata, including versioned datasets, versioned Jupyter Notebooks, training parameters, test/training accuracy of a trained model, versioned features, and statistics from model serving. For the DataOps team managing such production platforms, a common view across all this metadata is critical, because the team has to answer questions such as: Which Jupyter Notebook was used to build Model XYZ, currently running in production? If new data arrives for a given dataset, which models currently serving in production have to be updated? In this talk, we look at existing implementations, in particular ML Metadata (MLMD), which is part of the TensorFlow ecosystem.
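The two questions above are lineage queries over a graph of versioned artifacts. The following is a minimal sketch in plain Python of how such queries could work. It is not the MLMD API and is not taken from the talk: the names Artifact, LineageStore, record, upstream and downstream, as well as the example datasets and versions, are hypothetical and chosen only for illustration. Filtering by "currently serving in production" is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """A versioned item tracked by the platform (hypothetical schema)."""
    kind: str      # e.g. "dataset", "notebook", "features", "model"
    name: str
    version: str


class LineageStore:
    """Toy in-memory lineage graph; an assumption, not a real library."""

    def __init__(self):
        self._inputs = defaultdict(set)   # artifact -> artifacts it was built from
        self._outputs = defaultdict(set)  # artifact -> artifacts built from it

    def record(self, inputs, output):
        """Record that `output` was produced from every artifact in `inputs`."""
        for artifact in inputs:
            self._inputs[output].add(artifact)
            self._outputs[artifact].add(output)

    def upstream(self, artifact, kind=None):
        """Everything `artifact` depends on, directly or transitively."""
        return self._walk(artifact, self._inputs, kind)

    def downstream(self, artifact, kind=None):
        """Everything that depends on `artifact`, directly or transitively."""
        return self._walk(artifact, self._outputs, kind)

    @staticmethod
    def _walk(start, edges, kind):
        # Iterative depth-first traversal; `seen` also guards against cycles.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for neighbour in edges.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        return {a for a in seen if kind is None or a.kind == kind}


# Example data (all names and versions are made up).
store = LineageStore()
raw = Artifact("dataset", "clickstream", "v3")
features = Artifact("features", "click_features", "v2")
notebook = Artifact("notebook", "train_ranker.ipynb", "rev-12")
model_xyz = Artifact("model", "XYZ", "1.4")
model_abc = Artifact("model", "ABC", "0.9")

store.record([raw], features)
store.record([features, notebook], model_xyz)
store.record([Artifact("dataset", "catalog", "v1")], model_abc)

# Which notebook was used to build Model XYZ?
notebooks_for_xyz = store.upstream(model_xyz, kind="notebook")

# New data arrived for the clickstream dataset: which models are affected?
models_to_update = store.downstream(raw, kind="model")
```

Storing both edge directions keeps the "where did this come from" and "what does this affect" queries symmetric. A real metadata store would persist this graph and attach properties such as training parameters and accuracy to each node.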
