How to Evaluate (and Improve) Your LLM Apps

2,330 plays
Get exclusive access to AI resources and project ideas: https://the-data-entrepreneurs.kit.com/shaw

Here, I discuss 3 types of evals and how to use them to improve LLM apps.

📰 Blog: https://medium.com/@shawhin/how-to-evaluate-and-improve-your-llm-apps-f7b08fb7493c?sk=f2fbcd3f16b958baa4734d4a39d5b237
💻 Example Code: https://github.com/ShawhinT/YouTube-Blog/tree/main/LLMs/evals

References
[1] https://youtu.be/XGJNo8TpuVA
[2] arXiv:2501.12948 [cs.CL]
[3] arXiv:2402.01383 [cs.CL]
[4] https://hamel.dev/blog/posts/llm-judge/
[5] arXiv:2203.02155 [cs.CL]
[6] https://youtu.be/SnbGD677_u0

--

Intro - 0:00
Vibe Checks - 0:27
Evals - 3:26
Type 1: Code-based - 5:58
Type 2: Human-based - 9:34
Type 3: LLM-based - 13:34
Example: Improving y2b with LLM Judge - 15:28

Homepage: https://www.shawhintalebi.com