Visit https://brilliant.org/AdamLucek/ to try everything Brilliant has to offer, free, for a full 30 days. You’ll also get 20% off an annual premium subscription!
Resources:
Chunking Notebook: https://github.com/ALucek/chunking-strategies
ChromaDB Technical Report: https://research.trychroma.com/evaluating-chunking
ChromaDB Report Repo: https://github.com/brandonstarxel/chunking_evaluation
OpenAI Token Visualizer: https://platform.openai.com/tokenizer
Greg Kamradt 5 Levels of Text Splitting: https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb
Jaccard Index: https://en.wikipedia.org/wiki/Jaccard_index
Chapters:
00:00 - Background on Text Chunking
02:28 - Brilliant!
03:47 - Character Text Splitting
06:28 - Token Text Splitting
10:26 - Recursive Character/Token Splitting
16:07 - Kamradt & Modified Semantic Chunking
20:43 - Cluster Semantic Chunking
24:46 - LLM Semantic Chunking
27:56 - Chunking Metrics & Comparison
30:00 - Overall Findings
#ai #programming #datascience
This video is sponsored by Brilliant.