RAG for long context LLMs

This is a talk that @rlancemartin gave at a few recent meetups on RAG in the era of long context LLMs. With context windows growing to 1M+ tokens, there have been many questions about whether RAG is "dead." We pull together threads from a few recent projects to take a stab at addressing this. We review some current limitations of long context LLMs in fact reasoning & retrieval (using a multi-needle-in-a-haystack analysis), and also discuss some likely shifts in the RAG landscape due to expanding context windows (approaches for doc-centric indexing and RAG "flow engineering").

Slides: https://docs.google.com/presentation/d/1mJUiPBdtf58NfuSEQ7pVSEQ2Oqmek7F1i4gBwR6JDss/edit?usp=sharing

Highlighted references:
1/ Multi-needle analysis w/ @GregKamradt
https://blog.langchain.dev/multi-needle-in-a-haystack/
2/ RAPTOR (@parthsarthi03 et al)
https://github.com/parthsarthi03/raptor/tree/master
https://www.youtube.com/watch?v=jbGchdTL7d0
3/ Dense-X / multi-representation indexing (@tomchen0 et al)
https://arxiv.org/pdf/2312.06648.pdf
https://blog.langchain.dev/semi-structured-multi-modal-rag/
4/ Long context embeddings (@JonSaadFalcon, @realDanFu, @simran_s_arora)
https://hazyresearch.stanford.edu/blog/2024-01-11-m2-bert-retrieval
https://www.together.ai/blog/rag-tutorial-langchain
5/ Self-RAG (@AkariAsai et al), C-RAG (Shi-Qi Yan et al)
https://arxiv.org/abs/2310.11511
https://arxiv.org/abs/2401.15884
https://blog.langchain.dev/agentic-rag-with-langgraph/

Timepoints:
0:20 - Context windows are getting longer
2:10 - Multi-needle in a haystack
9:30 - How might RAG change?
12:00 - Query analysis
13:07 - Document-centric indexing
16:23 - Self-reflective RAG
19:40 - Summary
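For readers curious what the multi-needle-in-a-haystack setup looks like in practice, here is a minimal sketch in plain Python. It builds a long context with several "needle" facts inserted at chosen depths, then scores how many of those facts show up in a model's answer. The `insert_needles` and `score_retrieval` names are illustrative assumptions, not part of the talk's codebase, and the substring-based scoring is a deliberate simplification of the LLM-graded evaluation used in the actual analysis.

```python
def insert_needles(haystack: str, needles: list[str], depths: list[float]) -> str:
    """Insert each needle sentence at a fractional depth (0.0 = start, 1.0 = end)
    of the haystack text. Inserts are applied deepest-first so earlier
    insertion points are not shifted by later ones."""
    sentences = haystack.split(". ")
    for needle, depth in sorted(zip(needles, depths), key=lambda p: p[1], reverse=True):
        idx = int(len(sentences) * depth)
        sentences.insert(idx, needle.rstrip("."))
    return ". ".join(sentences)


def score_retrieval(answer: str, needles: list[str]) -> float:
    """Fraction of needle facts reproduced in the answer (naive substring check;
    a real evaluation would grade the answer with an LLM)."""
    found = sum(1 for n in needles if n.rstrip(".").lower() in answer.lower())
    return found / len(needles)


# Build a filler haystack and plant three needles at increasing depths.
haystack = ". ".join(f"Filler sentence {i}" for i in range(100)) + "."
needles = ["Needle one is red.", "Needle two is blue.", "Needle three is green."]
context = insert_needles(haystack, needles, [0.1, 0.5, 0.9])

# In the real analysis, `context` would be sent to a long context LLM with a
# question asking it to recall all the needles; here we fake a partial answer.
fake_answer = "Needle one is red and needle three is green."
recall = score_retrieval(fake_answer, needles)  # 2 of 3 needles recovered
```

Sweeping needle count and insertion depth across context lengths, as in the blog post above, is what exposes the position-dependent retrieval failures the talk discusses.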