Improving RAG Retrieval by 60% with Fine-Tuned Embeddings

Actually worked better than I thought lol

Resources:
Code: https://github.com/ALucek/ft-modernbert-domain
Model: https://huggingface.co/AdamLucek/ModernBERT-embed-base-legal-MRL
Dataset: https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic
Philipp Schmid’s Blog: https://www.philschmid.de/fine-tune-embedding-model-for-rag#3-define-loss-function-with-matryoshka-representation
Matryoshka Representation Learning Blog: https://huggingface.co/blog/matryoshka
MRL Paper: https://arxiv.org/pdf/2205.13147
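
For context on the Matryoshka Representation Learning links above: the idea is that the first k dimensions of an embedding still work as a smaller embedding. Here is a minimal sketch of the usage side only (truncate, re-normalize, compare). It assumes plain Python and made-up 8-dimensional toy vectors, not real output from the model linked above, and the helper names are hypothetical.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


# Toy vectors standing in for real embeddings (values are made up).
query = [0.9, 0.1, 0.3, 0.0, 0.2, -0.1, 0.05, 0.4]
doc_a = [0.8, 0.2, 0.4, 0.1, 0.1, -0.2, 0.00, 0.3]
doc_b = [-0.5, 0.9, 0.0, 0.3, -0.4, 0.6, 0.20, -0.1]

# Score both documents against the query at several truncated sizes.
scores = {}
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    scores[dim] = {
        "doc_a": cosine(q, truncate_and_normalize(doc_a, dim)),
        "doc_b": cosine(q, truncate_and_normalize(doc_b, dim)),
    }
```

With these toy values doc_a stays ahead of doc_b at every size. Whether a real model keeps its ranking quality under truncation depends on how it was trained, and that is the part MRL addresses.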

Chapters:
00:00 - Why Care About Embedding Models
02:41 - Setting the Scene
04:33 - Synthetic Dataset Creation
06:09 - Triplets
08:05 - Formatting our Dataset
08:53 - Choosing a Base Model
10:14 - Evaluation Dataset Prep
12:44 - Matryoshka Representation Learning
15:51 - Creating the Sequence Evaluator
17:00 - Evaluation Metric Breakdown
21:08 - Base Model Evaluation
22:04 - Loading the Model for Training
22:51 - Loss Function Selection
25:05 - Trainer Arguments
26:08 - Training the Model!
26:57 - Comparing Base vs Fine Tune Metrics
28:28 - Using The Fine Tuned Model
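
A rough sketch to go with the evaluation chapters: retrieval quality is commonly scored with ranking metrics such as MRR@k, recall@k, and NDCG@k. Which metrics the video reports is not stated here, so treat the choice below as an assumption. The function names and the toy document IDs are hypothetical, and relevance is binary (a document is either relevant or not).

```python
import math


def reciprocal_rank(ranked_ids, relevant_ids, k):
    """1 / rank of the first relevant hit in the top k, or 0.0 if none."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# One toy query: the single relevant chunk was retrieved in second place.
ranked = ["d3", "d1", "d7"]
relevant = {"d1"}

rr = reciprocal_rank(ranked, relevant, k=3)
rec = recall_at_k(ranked, relevant, k=3)
ndcg = ndcg_at_k(ranked, relevant, k=3)
```

Averaging each value over all evaluation queries gives the dataset-level score, which is what a base vs. fine-tuned comparison would look at.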

#ai #datascience #machinelearning
