A clean knowledge base is essential if your agents are to keep answering questions accurately and producing quality outputs. Standard web scrapers can feed the LLM extra, unnecessary chunks, which makes its job harder.
You can automate the process with Crawl4AI, n8n, Google Drive, and Voiceflow.
This video covers the default scraper issues, setting up a better workflow, and integrating the cleaned data into Voiceflow.
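To give a rough idea of what "cleaning" scraped Markdown can mean, here is a minimal Python sketch. It is not the workflow from the video (which runs in n8n); the function name `clean_markdown` and the heuristics (drop image embeds, drop link-only lines such as nav menus and footers, unwrap inline links, collapse blank lines) are assumptions chosen for illustration.

```python
import re

def clean_markdown(raw: str) -> str:
    """Strip common scraper noise from a Markdown page (illustrative heuristics only)."""
    cleaned_lines = []
    for line in raw.splitlines():
        # Remove image embeds such as ![logo](https://example.com/logo.png)
        line = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", line).rstrip()
        # Skip lines that consist of a single link, optionally as a bullet
        # (typical of navigation menus and footers)
        if re.fullmatch(r"(?:[-*]\s*)?\[[^\]]*\]\([^)]*\)", line.strip()):
            continue
        # Replace remaining inline links with their visible text
        line = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", line)
        cleaned_lines.append(line)
    text = "\n".join(cleaned_lines)
    # Collapse runs of blank lines into a single blank line
    return re.sub(r"\n{3,}", "\n\n", text).strip()

if __name__ == "__main__":
    sample = (
        "![logo](https://example.com/logo.png)\n"
        "* [Home](/)\n"
        "* [Pricing](/pricing)\n"
        "\n"
        "# Opening hours\n"
        "\n"
        "We are open 9 to 5. See [our FAQ](/faq) for details.\n"
    )
    print(clean_markdown(sample))
```

Real pages usually need site-specific rules on top of this, so treat it as a starting point rather than a complete cleaner.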
Cole Medin's video on setting up Crawl4AI + @DockerInc + @DigitalOcean:
https://www.youtube.com/watch?v=c5dw_jsGNBk
⚫ Sign up for n8n Cloud and start automating workflows! ⚫ (affiliate link)
https://n8n.partnerlinks.io/adblhu2pfwu3
My video on auto-updating the Voiceflow KB:
https://youtu.be/9UxaDG2uCO8
🤖 Sign up for Voiceflow and start building AI agents today! 🤖 (affiliate link)
https://partners.voiceflow.com/Umbral-AI
Looking to implement AI assistant chatbots or automations?
Feel free to book a discovery call:
https://cal.com/umbral/discovery-call
Or reach out at:
https://umbral.ai
Chapters:
00:00 – Intro
00:29 – Default Scraper
01:10 – Scraper Issues
03:00 – Automation Setup
05:50 – KB Generation
07:00 – Agent Integration
09:40 – n8n Workflow
13:50 – Markdown Cleanup
18:30 – Wrap-Up