Magma: A foundation model for multimodal AI Agents | Microsoft Research Forum
Jianwei Yang, Principal Researcher at Microsoft Research Redmond, introduces Magma, a new multimodal agentic foundation model designed for UI navigation in digital environments and robotic manipulation in physical settings. The talk covers two new techniques, Set-of-Mark and Trace-of-Mark, for action grounding and planning, and details the unified pretraining pipeline through which the model learns agentic capabilities.
Magma on arXiv: https://arxiv.org/pdf/2502.13130
Magma project page on GitHub: https://microsoft.github.io/Magma/
Azure AI Foundry: https://ai.azure.com/
This session aired on February 25, 2025, at Microsoft Research Forum, Episode 5.
Register for the series: https://aka.ms/registerresearchforumYTe5
Continue watching episode 5: https://aka.ms/researchforumYTe5
Explore all previous episodes: https://aka.ms/researchforumYTplaylist