Get exclusive access to AI resources and project ideas: https://the-data-entrepreneurs.kit.com/shaw
Here, I discuss three types of evals (code-based, human-based, and LLM-based) and how to use them to improve LLM apps.
📰 Blog: https://medium.com/@shawhin/how-to-evaluate-and-improve-your-llm-apps-f7b08fb7493c?sk=f2fbcd3f16b958baa4734d4a39d5b237
💻 Example Code: https://github.com/ShawhinT/YouTube-Blog/tree/main/LLMs/evals
References
[1] https://youtu.be/XGJNo8TpuVA
[2] arXiv:2501.12948 [cs.CL]
[3] arXiv:2402.01383 [cs.CL]
[4] https://hamel.dev/blog/posts/llm-judge/
[5] arXiv:2203.02155 [cs.CL]
[6] https://youtu.be/SnbGD677_u0
--
Intro - 0:00
Vibe Checks - 0:27
Evals - 3:26
Type 1: Code-based - 5:58
Type 2: Human-based - 9:34
Type 3: LLM-based - 13:34
Example: Improving y2b with LLM Judge - 15:28
Homepage: https://www.shawhintalebi.com