Evaluation for Large Language Models (LLMs) and Generative AI - A Deep Dive

10,007 listens
Evaluation for Large Language Models and Generative AI - A Deep Dive

Notebooks and additional resources: https://github.com/rajshah4/LLM-Evaluation

0:00 Overview
1:30 Evaluation is broken
4:00 Reliability issues with HELM and Hugging Face Leaderboard
6:32 Evaluation before generative AI
9:33 Framework for evaluating generative AI
14:01 Reviewing 8 evaluation approaches
15:22 Exact matching approach
29:12 Similarity approach
32:37 Functional Correctness
35:50 Evaluation Benchmarks
45:12 Human Evaluation
49:03 Human Comparison/Arena
52:00 Model-Based Evaluation
1:02:22 Red Teaming
1:06:11 Operational Issues in Evaluation
1:09:10 RAG Case Study

━━━━━━━━━━━━━━━━━━━━━━━━━
★ Rajistics Social Media »
● Home Page: http://www.rajivshah.com
● LinkedIn: https://www.linkedin.com/in/rajistics/
━━━━━━━━━━━━━━━━━━━━━━━━━