Benchmarking LLMs: Metrics, Challenges, and Best Practices for Evaluation - DevConf.IN 2025
Speaker(s): Ravindra Patil
---
LLMs have proven highly useful and hold great potential for enterprises. However, evaluating these models remains a complex challenge and is one of the reasons LLMs are not adopted directly.
Responsible and ethical AI will be key for enterprises to adopt LLMs for their business needs.
Traditional metrics like perplexity or BLEU score often fail to capture the nuanced capabilities of LLMs in real-world applications.
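As a hedged illustration (not part of the talk materials), the sketch below uses NLTK's sentence_bleu to show how an n-gram overlap metric can assign a near-zero score to a paraphrase that preserves the reference's meaning; the example sentences are made up for this sketch.

```python
# Minimal sketch: BLEU penalizes a faithful paraphrase because it shares
# few n-grams with the reference, illustrating why surface-overlap metrics
# can miss semantic equivalence.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the", "cat", "sat", "on", "the", "mat"]
paraphrase = ["a", "feline", "was", "sitting", "on", "a", "rug"]  # same meaning, different words

smooth = SmoothingFunction().method1
score = sentence_bleu([reference], paraphrase, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")  # close to zero despite the semantic overlap
```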
This talk covers current best practices in benchmarking LLMs, the limitations of existing approaches, and emerging evaluation techniques.
We’ll explore a range of qualitative and quantitative metrics,
from task-specific benchmarks (e.g., code generation, summarization; a minimal example of such a check is sketched below)
to user-centric evaluations (e.g., coherence, creativity, bias detection),
and discuss the importance of specialized benchmarks that test LLMs on ethical and explainability grounds.
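For readers unfamiliar with task-specific benchmarks, here is a hedged sketch of a pass@1-style check for code generation: the candidate solution is executed against unit tests and scored 1 or 0. The `generate_code` function is a hypothetical stand-in for whatever model call is being evaluated, not an API from the talk.

```python
# Hedged sketch of a task-specific benchmark: score generated code by
# whether it passes unit tests (a simplified pass@1-style check).
def generate_code(prompt: str) -> str:
    # Hypothetical placeholder: in practice this would call the model under evaluation.
    return "def add(a, b):\n    return a + b"

def passes_tests(code: str, tests: list[tuple]) -> bool:
    namespace: dict = {}
    exec(code, namespace)  # run the candidate solution in an isolated namespace
    fn = namespace["add"]
    return all(fn(*args) == expected for args, expected in tests)

tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
candidate = generate_code("Write a Python function add(a, b) that returns their sum.")
print("pass@1:", 1.0 if passes_tests(candidate, tests) else 0.0)
```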
Outcome: The audience will understand how to choose LLMs with the right balance of accuracy, efficiency, and fairness, and will learn what has improved in Granite 3.0 that makes it a better LLM.
---
Slides and other resources:
https://pretalx.devconf.info/devconf-in-2025/talk/9EP8YM/