Ensure AI Agents Work: Evaluation Frameworks for Scaling Success — Aparna Dhinakaran, CEO Arize
Turning AI agents into reliable, production-ready tools that deliver tangible business results requires more than just great models. It demands robust evaluation frameworks that ensure agents perform at scale, align with organizational objectives, and continuously improve in dynamic environments.
This session provides an executive-level perspective on evaluating AI agents at scale. We’ll explore practical strategies for designing evaluation processes that drive measurable impact, identifying and mitigating performance bottlenecks, and implementing observability practices to maintain reliability over time. Through insights from real-world deployments, we’ll highlight common pitfalls, share best practices for iterative improvement, and demonstrate how effective evaluation frameworks can transform experimental agents into enterprise-grade solutions.
Whether you're shaping your organization’s GenAI strategy or looking to unlock the full potential of AI agents, this talk offers actionable insights to ensure your agents work—and scale—successfully.
Recorded live at the Leadership Track Session Day of the AI Engineer Summit 2025 in New York. Learn more at https://ai.engineer and purchase tickets to our next event, the AI Engineer World's Fair, in SF, June 3-5, here: https://ti.to/software-3/ai-engineer-worlds-fair-2025
About Aparna
Aparna Dhinakaran is the Co-Founder and Chief Product Officer at Arize AI, a pioneer and early leader in machine learning (ML) observability. A frequent speaker at top conferences and a thought leader in the space, Dhinakaran was recently named to the Forbes 30 Under 30. Before Arize, Dhinakaran was an ML engineer and leader at Uber, Apple, and TubeMogul (acquired by Adobe). During her time at Uber, she built several core ML infrastructure platforms, including Michelangelo. She has a bachelor’s from Berkeley's Electrical Engineering and Computer Science program, where she published research with Berkeley's AI Research group. She is on a leave of absence from the Computer Vision Ph.D. program at Cornell University.