Part 2: How Do You Evaluate Agents? | Evaluating AI Agents with Arize AI

Evaluating LLMs can be a daunting task, and agentic systems add considerably more complexity. In Part 2 of our community series with Arize AI, we explore why traditional LLM evaluation metrics fall short when applied to agents and introduce modern LLM evaluation techniques built for this new paradigm.

What We Will Cover:
- Why traditional metrics like BLEU and ROUGE fall short for agent evaluation.
- Core agent evaluation methods: code-based, LLM-driven, human feedback, and ground truth comparisons.
- Writing high-quality LLM evaluations aligned with real-world tasks.
- Building and benchmarking LLM evaluations using ground truth data.
- Best practices for capturing telemetry and scaling evaluations.
- How OpenInference standards enhance system interoperability and consistency.
- Hands-on Exercise: Evaluate a sample agent run with Arize Phoenix using code-based and LLM evaluations.

#llmevaluation #agenticai #machinelearning #arizeai #evaluatingagent

-------
Table of Contents:
0:00 - Introduction and Series Overview
1:26 - Focus of Today: Evaluating AI Agents
2:10 - Agent Components Overview (Router, Skills, Path)
4:39 - How to Evaluate a Router
6:10 - How to Evaluate Skills (API, RAG, Code)
7:37 - Evaluating Agent Paths (Trajectory Eval)
9:52 - Evaluation Techniques Overview
10:15 - Technique 1: LLM as a Judge
19:44 - Technique 2: Code-Based Evaluation
22:08 - Technique 3: Human Annotations
24:24 - Live Demo: Evaluating a Travel Agent
27:03 - Example of LLM-as-a-Judge in Action
30:11 - How to Build and Apply Evaluation Templates
34:50 - Using Test Datasets for Evaluation
42:04 - Guardrails and Prompt Injection Detection
46:04 - Summary: Combining Techniques in Dev & Prod
48:30 - Multimodal Evaluation Note (Voice, Image, Video)
49:16 - Final Wrap-Up and Next Steps

-----------
👉 Learn more about Data Science Dojo here: https://datasciencedojo.com/
👉 Watch the latest video tutorials here: https://datasciencedojo.com/tutorials/
👉 See what our past attendees are saying here: https://datasciencedojo.com/data-scie...

--
At Data Science Dojo, we believe data science is for everyone. Our in-person data science training has been attended by more than 8,000 employees from over 2,000 companies globally, including many leaders in tech like Microsoft, Apple, and Facebook.
--

🔗 Subscribe to our newsletter for data science content & infographics: https://datasciencedojo.com/newsletter/