Foundation model performance at a fraction of the cost: model distillation is a powerful technique for leveraging the advanced generation capabilities of foundation models like Llama 3.1 405B, GPT-4, or Claude Opus as teachers, distilling their knowledge and task performance into a smaller student model.
The result is a lightweight, task-specific language model that matches the foundation model's performance, capability, or style without all the extra parameters. In this video, we demonstrate this by using Llama 3.1 405B to label the sentiment of a dataset of tweets, then using that generated dataset to train RoBERTa, a 125-million-parameter model, to reach comparable accuracy on tweet sentiment classification. Comparable performance from a model 3,240 times smaller!
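A minimal sketch of the teacher-labeling step, assuming access to Llama 3.1 405B Instruct via the Hugging Face Inference API; the model id, prompt wording, and label set shown here are illustrative assumptions, not the exact ones used in the video (see the repo linked below for the real code):

```python
# Hedged sketch: query a large "teacher" model for sentiment labels,
# building the synthetic dataset a small "student" model is later trained on.
from huggingface_hub import InferenceClient

# Assumed model id and API access; swap in whichever provider/endpoint you use.
client = InferenceClient("meta-llama/Meta-Llama-3.1-405B-Instruct")

LABELS = ["negative", "neutral", "positive"]

def label_tweet(tweet: str) -> str:
    """Ask the teacher model for a single sentiment label for one tweet."""
    messages = [
        {"role": "system", "content": (
            "You are a sentiment annotator. "
            "Reply with exactly one word: negative, neutral, or positive.")},
        {"role": "user", "content": f"Tweet: {tweet}"},
    ]
    response = client.chat_completion(messages=messages, max_tokens=5, temperature=0.0)
    label = response.choices[0].message.content.strip().lower()
    return label if label in LABELS else "neutral"  # simple fallback for malformed outputs

# The resulting (tweet, label) pairs become the training set for the RoBERTa student.
print(label_tweet("Just got upgraded to first class, best day ever!"))
```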
Resources:
Code: https://github.com/ALucek/LLM-distillation-guide
Llama 3.1 405B Tweet Dataset: https://huggingface.co/datasets/AdamLucek/twittersentiment-llama-3.1-405B-labels
Distilled Model: https://huggingface.co/AdamLucek/roberta-llama3.1405B-twitter-sentiment
Moritz Laurer Blog: https://huggingface.co/blog/synthetic-data-save-costs
AutoTrain: https://huggingface.co/autotrain
A Survey on Knowledge Distillation of Large Language Models: https://arxiv.org/pdf/2402.13116
Chapters:
00:00 - Intro
01:11 - Model Distillation Trend
04:49 - Use Case: Instruction Following
05:45 - Use Case: Multi-Turn Dialogue
06:17 - Use Case: Retrieval Augmented Generation
06:59 - Use Case: Tool & Function Calling
07:52 - Use Case: Text Annotation
08:16 - Code: Distilling Llama 3.1 405B Overview
09:32 - Code: Initializing Tweet Dataset
10:57 - Code: Setting Up LLM & Annotation Prompt
15:10 - Code: Creating Annotated Dataset
17:25 - Training: RoBERTa & AutoTrain
18:30 - Training: Setting up AutoTrain Environment
19:02 - Training: Running Training Job on RoBERTa
21:42 - Evaluate: Using our Fine Tuned RoBERTa Model
22:23 - Evaluate: Visualizing Accuracy
23:37 - Evaluate: Visualizing Label Distribution
24:14 - Evaluate: Cost & Time Considerations
24:49 - Outro
#machinelearning #ai #coding