Introducing alma: an open source Python library for benchmarking PyTorch model speed across different conversion options 🚀
Inference optimization is a major part of modern ML and AI. The faster your model runs, the less you spend on compute. Faster models also cut your application's latency, improving the user experience.
However, it’s difficult to know which conversion option is best for any given combination of model, data, and hardware. alma is a free, open source Python package that lets you benchmark the available conversion options for your setup and find the fastest one, with a single function call.
Get started here: https://github.com/saifhaq/alma
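To illustrate the core idea alma automates (this is a hedged, self-contained sketch in plain Python, not alma's actual API — the conversion callables and helper names below are hypothetical stand-ins for real backends like eager, compiled, or quantized models):

```python
import time

def benchmark(fn, data, n_iters=100):
    """Return mean seconds per call for fn(data). Illustrative helper, not from alma."""
    fn(data)  # warm up once so one-time setup cost doesn't skew the timing
    start = time.perf_counter()
    for _ in range(n_iters):
        fn(data)
    return (time.perf_counter() - start) / n_iters

# Hypothetical "conversion options": each callable stands in for the same
# model exported to a different inference backend.
conversions = {
    "option_a": lambda xs: [x * 2 for x in xs],
    "option_b": lambda xs: list(map(lambda x: x * 2, xs)),
}

data = list(range(1_000))
results = {name: benchmark(fn, data) for name, fn in conversions.items()}
fastest = min(results, key=results.get)
print(f"fastest option: {fastest}")
```

In practice, alma runs this kind of comparison across real PyTorch conversion methods for you; see the repo linked above for its actual usage.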
00:00 Intro
01:15 Basic usage of the repo
07:50 Documentation and Advanced Usages
17:05 How to add new conversion methods
18:55 Conclusion