Taming Rogue AI Agents with Observability-Driven Evaluation — Jim Bennett, Galileo
LLM agents often drift into failure when prompts, retrieval, external data, and policies interact in unpredictable ways. This session introduces a repeatable, metric-driven framework for detecting, diagnosing, and correcting these undesirable behaviors in agentic systems at production scale.
About Jim Bennett
Jim is the world's most energetic dev rel, and a Principal Developer Advocate at Galileo, focusing on enabling AI developers to be more productive by monitoring and evaluating LLMs and AI agents. He's British, so he sounds way smarter than he actually is, and lives in the Pacific Northwest of the USA. In the past he's lived on four continents, working as a developer in the mobile, desktop, and scientific spaces. He's spoken at conferences and events all around the globe, organised meetup groups and communities, and written books on mobile development and IoT. He is currently a Microsoft MVP for AI and Developer Tools.
He also hates and is allergic to cats, but has a 12-year-old who loves cats, so he has two cats.
Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter