The BEST Way to Chunk Text for RAG

18,300 plays
To try everything Brilliant has to offer, free, for a full 30 days, visit https://brilliant.org/AdamLucek/
You’ll also get 20% off an annual premium subscription!

Resources:
Chunking Notebook: https://github.com/ALucek/chunking-strategies
ChromaDB Technical Report: https://research.trychroma.com/evaluating-chunking
ChromaDB Report Repo: https://github.com/brandonstarxel/chunking_evaluation
OpenAI Token Visualizer: https://platform.openai.com/tokenizer
Greg Kamradt 5 Levels of Text Splitting: https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb
Jaccard Index: https://en.wikipedia.org/wiki/Jaccard_index

Chapters:
00:00 - Background on Text Chunking
02:28 - Brilliant!
03:47 - Character Text Splitting
06:28 - Token Text Splitting
10:26 - Recursive Character/Token Splitting
16:07 - Kamradt & Modified Semantic Chunking
20:43 - Cluster Semantic Chunking
24:46 - LLM Semantic Chunking
27:56 - Chunking Metrics & Comparison
30:00 - Overall Findings

#ai #programming #datascience

This video is sponsored by Brilliant