This is a talk that @rlancemartin gave at a few recent meetups on RAG in the era of long-context LLMs. With context windows growing to 1M+ tokens, there have been many questions about whether RAG is "dead." We pull together threads from several recent projects to address this question. We review current limitations in long-context LLM fact reasoning and retrieval (using multi-needle-in-a-haystack analysis), and also discuss likely shifts in the RAG landscape as context windows expand (approaches for document-centric indexing and RAG "flow engineering").
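Below is a minimal sketch of the multi-needle test idea: plant several facts ("needles") at random depths in a long filler context, ask the model to recall them, and score recall. The `llm` callable, the needle strings, and the scoring heuristic are placeholders for illustration, not the benchmark code from the post referenced below.

```python
# Minimal multi-needle-in-a-haystack sketch. Assumptions: `llm` is any
# text-in/text-out callable wrapping a long-context chat model; the
# needles and filler text here are placeholders, not the actual benchmark.
import random

NEEDLES = [
    "The secret ingredient in the cake is cardamom.",
    "The meeting password is 'blue-harbor'.",
    "The fastest route to the lab is via Gate 7.",
]

def build_haystack(filler_chunks: list[str], needles: list[str]) -> str:
    """Insert each needle at a random depth in the filler text."""
    chunks = list(filler_chunks)
    for needle in needles:
        chunks.insert(random.randrange(len(chunks) + 1), needle)
    return "\n".join(chunks)

def score_recall(answer: str, needles: list[str]) -> float:
    """Fraction of planted needles reproduced verbatim in the answer."""
    hits = sum(1 for n in needles if n.lower() in answer.lower())
    return hits / len(needles)

# Usage (with your own `llm` and filler corpus):
# context = build_haystack(filler_chunks, NEEDLES)
# answer = llm(context + "\n\nList every secret fact stated above.")
# print(score_recall(answer, NEEDLES))
```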
Slides:
https://docs.google.com/presentation/d/1mJUiPBdtf58NfuSEQ7pVSEQ2Oqmek7F1i4gBwR6JDss/edit?usp=sharing
Highlighted references:
1/ Multi-needle analysis w/ @GregKamradt
https://blog.langchain.dev/multi-needle-in-a-haystack/
2/ RAPTOR (@parthsarthi03 et al)
https://github.com/parthsarthi03/raptor/tree/master
https://www.youtube.com/watch?v=jbGchdTL7d0
3/ Dense-X / multi-representation indexing (@tomchen0 et al)
https://arxiv.org/pdf/2312.06648.pdf
https://blog.langchain.dev/semi-structured-multi-modal-rag/
4/ Long context embeddings (@JonSaadFalcon, @realDanFu, @simran_s_arora)
https://hazyresearch.stanford.edu/blog/2024-01-11-m2-bert-retrieval
https://www.together.ai/blog/rag-tutorial-langchain
5/ Self-RAG (@AkariAsai et al), C-RAG (Shi-Qi Yan et al)
https://arxiv.org/abs/2310.11511
https://arxiv.org/abs/2401.15884
https://blog.langchain.dev/agentic-rag-with-langgraph/
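For reference 5, here is a minimal sketch of the self-reflective RAG flow described in the LangGraph post: retrieve, grade the retrieved docs, then either generate or rewrite the query and retry. The four helper nodes are trivial stand-ins (not the post's implementation), and the LangGraph calls shown (StateGraph, add_conditional_edges) reflect one version of the API.

```python
# Minimal self-reflective RAG flow (Self-RAG / C-RAG style) in LangGraph.
# Assumptions: the node bodies below are stand-ins you would replace with
# a real retriever, grader, and LLM; LangGraph APIs may vary by version.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class RAGState(TypedDict):
    question: str
    docs: list[str]
    relevant: bool
    answer: str

def retrieve(state: RAGState) -> dict:
    # Stand-in retriever; swap in a vector-store lookup.
    return {"docs": [f"doc mentioning: {state['question']}"]}

def grade(state: RAGState) -> dict:
    # Stand-in relevance grader; swap in an LLM-as-judge call.
    return {"relevant": len(state["docs"]) > 0}

def generate(state: RAGState) -> dict:
    # Stand-in generator; swap in an LLM call grounded in `docs`.
    return {"answer": f"Answer based on {len(state['docs'])} docs."}

def rewrite(state: RAGState) -> dict:
    # Stand-in query rewriter for the corrective (C-RAG) branch.
    return {"question": state["question"] + " (rephrased)"}

graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("grade", grade)
graph.add_node("generate", generate)
graph.add_node("rewrite", rewrite)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "grade")
graph.add_conditional_edges(
    "grade",
    lambda s: "generate" if s["relevant"] else "rewrite",
)
graph.add_edge("rewrite", "retrieve")
graph.add_edge("generate", END)
app = graph.compile()

# result = app.invoke({"question": "What changed with long-context LLMs?"})
```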
Timepoints:
0:20 - Context windows are getting longer
2:10 - Multi-needle in a haystack
9:30 - How might RAG change?
12:00 - Query analysis
13:07 - Document-centric indexing
16:23 - Self-reflective RAG
19:40 - Summary