Buckle up - HUGE amount of value in this video for building RAG AI Agents that actually work. Honestly I could have made this video into an entire course but I wanted to give it away to you for free. :)
RAG is the most common approach for providing external knowledge to an LLM. The problem is, once you have your own curated data in a vector database as a knowledgebase for your LLM, these RAG setups can oftentimes be very underwhelming. The wrong text comes back from the search, the LLM ignores the context it's given, etc. The logic of RAG makes sense in your head, but it just doesn't work in practice.
And you certainly aren't alone! That's why there is a TON of research in the industry into how to do RAG better. There are a lot of strategies out there, but out of all the ones I've researched and tried myself, agentic RAG is the most obvious, works the best, and is exactly what I'm going to introduce you to and show you how to implement in this video.
In the last video on my channel, I showed you how to use Crawl4AI, an open source LLM-friendly web crawler, to scrape entire websites for RAG SUPER fast. We used the entire documentation for my favorite agent framework, Pydantic AI, as an example. Now we’re taking this MUCH further by:
1. Putting all the documentation in a database for RAG
2. Creating an agentic RAG agent to use this knowledgebase with Pydantic AI
3. Building a frontend to chat with our agent using Streamlit
I’ll explain exactly what Agentic RAG is and what makes it so powerful, and the AI agent we build in this video will be the perfect example! There's also a small code sketch of the core idea right below.
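If you want a feel for what "retrieval as a tool" looks like before watching, here is a minimal sketch of the idea using Pydantic AI and Supabase. It is NOT the exact code from the repo linked below - the match_site_pages RPC name, table layout, and model choices are assumptions for illustration only:

# Minimal sketch of "retrieval as a tool" with Pydantic AI + Supabase.
# NOTE: illustrative only - the RPC name, table layout, and model names
# are assumptions, not the exact code from the repo linked below.
from dataclasses import dataclass

from openai import AsyncOpenAI
from pydantic_ai import Agent, RunContext
from supabase import Client


@dataclass
class Deps:
    supabase: Client     # Supabase client pointed at the docs knowledgebase
    openai: AsyncOpenAI  # used only to embed the search query


agent = Agent(
    'openai:gpt-4o',
    deps_type=Deps,
    system_prompt=(
        'You are a Pydantic AI expert. Use the documentation tool to ground '
        'your answers, and search again with a refined query if the first '
        'results are not relevant.'
    ),
)


@agent.tool
async def retrieve_docs(ctx: RunContext[Deps], query: str) -> str:
    """Semantic search over the crawled documentation chunks."""
    # Embed the query, then call a Postgres function (pgvector similarity search).
    emb = await ctx.deps.openai.embeddings.create(
        model='text-embedding-3-small', input=query,
    )
    result = ctx.deps.supabase.rpc(
        'match_site_pages',  # hypothetical SQL function name for the example
        {'query_embedding': emb.data[0].embedding, 'match_count': 5},
    ).execute()
    return '\n\n---\n\n'.join(row['content'] for row in result.data)

You'd run it with something like agent.run_sync("How do I define tools in Pydantic AI?", deps=Deps(supabase=..., openai=...)). Because retrieval is just a tool here, the agent decides when to search, can search again with a rewritten query, and can reach for other tools instead of being stuck with whatever one similarity search happens to return - that decision-making is what makes it "agentic".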
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Try GPUStack for free - it's open source and you can find their GitHub repo here:
https://github.com/gpustack/gpustack
I don't have the pleasure of being sponsored by open source projects often, so this was a treat! It's the best GPU cluster manager for LLM inference that I have seen and a very honest recommendation! Here is their main site as well:
https://gpustack.ai/
Key features of GPUStack:
1. Heterogeneous GPU cluster management across Linux, macOS, and Windows, with support for NVIDIA and Apple Silicon GPUs. AMD coming soon!
2. Distributed inference with smart scheduling: GPUStack can split a big model across multiple heterogeneous workers. It automatically calculates whether distributed inference is required and configures it for you.
3. Rich model type support: LLMs, VLMs, image generation, embedding, rerank, and TTS/STT models.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Previous video with Crawl4AI:
https://youtu.be/JWfNLF_g_V0
All code for this Agentic RAG Agent can be found here:
https://github.com/coleam00/ottomator-agents/tree/main/crawl4AI-agent
Try this agent yourself right now on the Live Agent Studio (called the "Pydantic AI Expert")!
https://studio.ottomator.ai
Diagram to follow along with the knowledgebase creation flow:
https://claude.site/artifacts/f4dca1c3-f137-4b82-9254-dfa01ca43802
Weaviate Article on Agentic RAG:
https://weaviate.io/blog/what-is-agentic-rag
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
00:00 - Agentic RAG - the Holy Grail of RAG
02:18 - What is Agentic RAG?
06:22 - Breaking our Agent Down Step by Step
08:33 - Try this Agent Now for Free
09:00 - Code Overview
09:58 - Crawl4AI Review
10:52 - Creating Our Knowledgebase for Supabase
21:38 - GPUStack
23:33 - Supabase Setup
26:08 - Getting Crawl4AI Data into Supabase
28:09 - Basic RAG AI Agent with Pydantic AI
33:44 - Testing our Basic RAG Agent
36:33 - Agentic RAG Implementation
40:40 - Demo of Our Agentic RAG Agent
41:37 - Streamlit UI
44:53 - Outro
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Join me as I push the limits of what is possible with AI. I'll be uploading videos at least twice a week - Sundays and Wednesdays at 7:00 PM CDT! Sundays and Wednesdays are for everything AI, focusing on providing insane and practical educational value. I'll also sometimes post on Fridays at 7:00 PM CDT - specifically for platform showcases - sometimes sponsored, always creative in approach!