Improving RAG Retrieval by 60% with Fine-Tuned Embeddings

15,326 listens
Actually worked better than I thought lol

Resources:
Code: https://github.com/ALucek/ft-modernbert-domain
Model: https://huggingface.co/AdamLucek/ModernBERT-embed-base-legal-MRL
Dataset: https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic
Philipp Schmid’s Blog: https://www.philschmid.de/fine-tune-embedding-model-for-rag#3-define-loss-function-with-matryoshka-representation
Matryoshka Representation Learning Blog: https://huggingface.co/blog/matryoshka
MRL Paper: https://arxiv.org/pdf/2205.13147

Chapters:
00:00 - Why Care About Embedding Models
02:41 - Setting the Scene
04:33 - Synthetic Dataset Creation
06:09 - Triplets
08:05 - Formatting our Dataset
08:53 - Choosing a Base Model
10:14 - Evaluation Dataset Prep
12:44 - Matryoshka Representation Learning
15:51 - Creating the Sequence Evaluator
17:00 - Evaluation Metric Breakdown
21:08 - Base Model Evaluation
22:04 - Loading the Model for Training
22:51 - Loss Function Selection
25:05 - Trainer Arguments
26:08 - Training the Model!
26:57 - Comparing Base vs Fine Tune Metrics
28:28 - Using The Fine Tuned Model

#ai #datascience #machinelearning