Magma: A foundation model for multimodal AI Agents | Microsoft Research Forum

Jianwei Yang, Principal Researcher at Microsoft Research Redmond, introduces Magma, a new multimodal agentic foundation model designed for UI navigation in digital environments and robotic manipulation in physical settings. The talk covers two new techniques, Set-of-Mark and Trace-of-Mark, for action grounding and planning, and details the unified pretraining pipeline through which the model learns agentic capabilities.

Magma on arXiv: https://arxiv.org/pdf/2502.13130
Magma code on GitHub: https://microsoft.github.io/Magma/
Azure AI Foundry: https://ai.azure.com/

This session aired on February 25, 2025, at Microsoft Research Forum, Episode 5.

Register for the series: https://aka.ms/registerresearchforumYTe5
Continue watching episode 5: https://aka.ms/researchforumYTe5
Explore all previous episodes: https://aka.ms/researchforumYTplaylist