How Multilingual Data Is Reshaping LLMs and VLMs

This webinar explores the latest advances in multilingual AI systems. Industry experts discuss challenges in training and benchmarking multilingual models, and examine how they are applied in the industry and who benefits from them.

What you’ll learn:
1. Approaches to address the scarcity of multilingual datasets in post-training
2. How sample-efficient learning techniques overcome data scarcity for multilingual NLU
3. How translation impacts fine-tuning and evaluation
4. Challenges of evaluating generative models without human references in a multilingual setting
5. Why VLMs struggle with cultural and linguistic nuances

Learn more about the JEEM multi-dialect benchmark here: https://toloka.ai/jeem-benchmark

Speakers:
- Ahmet Üstün, Senior Research Scientist, Cohere For AI
- Evgeniia Razumovskaia, Applied Scientist, Unlikely AI
- Pinzhen "Patrick" Chen, Senior NLP Engineer, Aveni
- Ekaterina Artemova, ML Researcher, Toloka
- Patrícia Schmidtová, PhD student, Charles University