Evaluating only an agent’s final response isn’t enough. Even when the answer is correct, the agent may have taken inefficient or incorrect steps to get there, so you need to evaluate the entire trajectory it took.
This is why we built AgentEvals, an open-source package for agent trajectory evaluation.
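For a quick taste before watching, here is a minimal sketch of a trajectory match evaluation, following the pattern in the AgentEvals README (the weather tool and messages are made-up illustrations):

import json

from agentevals.trajectory.match import create_trajectory_match_evaluator

# Compare an agent's trajectory (OpenAI-style messages) against a reference.
# "strict" requires the same messages and tool calls in the same order;
# other modes include "unordered", "subset", and "superset".
evaluator = create_trajectory_match_evaluator(trajectory_match_mode="strict")

outputs = [
    {"role": "user", "content": "What is the weather in SF?"},
    {
        "role": "assistant",
        "tool_calls": [
            {"function": {"name": "get_weather", "arguments": json.dumps({"city": "SF"})}}
        ],
    },
    {"role": "tool", "content": "It's 75 degrees and sunny in SF."},
    {"role": "assistant", "content": "It's 75 degrees and sunny in SF."},
]

# The reference trajectory we expect; here it matches trivially for illustration.
reference_outputs = outputs

result = evaluator(outputs=outputs, reference_outputs=reference_outputs)
print(result)  # e.g. {'key': 'trajectory_strict_match', 'score': True, 'comment': None}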
In this video, we explore AgentEvals:
0:00 - Why trajectory evaluation matters
0:55 - Overview of AgentEvals
7:22 - Using it in code
8:58 - Running experiments in LangSmith
Links:
AgentEvals: https://github.com/langchain-ai/agentevals
Notebook: https://github.com/catherine-langchain/agentevals/blob/main/react-agent-eval.ipynb
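And a sketch of the LLM-as-judge flavor covered in the video, again based on the README's pattern (the model string is just an example; outputs is the same message list as above):

from agentevals.trajectory.llm import (
    TRAJECTORY_ACCURACY_PROMPT,
    create_trajectory_llm_as_judge,
)

# Grade the trajectory with an LLM judge instead of exact matching;
# useful when many different step sequences are acceptable.
judge = create_trajectory_llm_as_judge(
    prompt=TRAJECTORY_ACCURACY_PROMPT,
    model="openai:o3-mini",  # example model; other providers are supported
)
result = judge(outputs=outputs)
print(result)  # e.g. {'key': 'trajectory_accuracy', 'score': True, 'comment': '...'}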