Hands-On Multimodal RAG: Images, Tables & Text

Learn how to build a vision-based RAG pipeline that directly indexes and retrieves images, tables, and text, with no captions needed. We’ll compare Cohere’s Embed-v4 API with a fully local ColPali-based solution, then pass the retrieved results to a vision-language model like Gemini for accurate, context-rich answers. Whether you need a cloud-powered workflow or a private on-prem setup, this hands-on tutorial walks through every step.
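To make the retrieval step concrete, here is a minimal sketch of ranking indexed pages against a query by cosine similarity. It is not the notebook's code: the file names and the tiny 3-dimensional vectors are made-up placeholders standing in for embeddings that a model such as Embed-v4 would return, and `cosine` and `top_k` are hypothetical helper names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec, page_vecs, k=2):
    """Return the names of the k pages most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in page_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Placeholder embeddings (assumption): in a real pipeline each page image
# and the text query would be embedded by the same multimodal model.
page_vecs = {
    "page_1_chart.png": [0.9, 0.1, 0.0],
    "page_2_table.png": [0.1, 0.9, 0.1],
    "page_3_text.png": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

best_pages = top_k(query_vec, page_vecs, k=2)
print(best_pages)
```

The retrieved page images, rather than extracted text, would then be sent along with the question to the vision-language model.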

LINKS:
- https://youtu.be/DI9Q60T_054
- https://youtu.be/Ra8n_9wnHFs
- https://x.com/Nils_Reimers/status/1915431608980586874
- Embed-v4: https://cohere.com/blog/embed-4
- Notebook: https://colab.research.google.com/drive/1JwZ_nWhBUFbrzJnHKmyd0qKJ3gVt5lCe?usp=sharing

RAG Beyond Basics Course:
https://prompt-s-site.thinkific.com/courses/rag

Let's Connect: 
🦾 Discord: https://discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Patreon: https://www.patreon.com/PromptEngineering
💼 Consulting: https://calendly.com/engineerprompt/consulting-call
📧 Business Contact: [email protected]
Become a Member: http://tinyurl.com/y5h28s6h

💻 Pre-configured localGPT VM: https://bit.ly/localGPT (use Code: PromptEngineering for 50% off).  

Sign up for the localGPT newsletter:
https://tally.so/r/3y9bb0

00:00 Introduction to Multimodal RAG Systems
00:31 Traditional Text-Based RAG Systems
02:13 Cohere's Embed-v4 for Multimodal Search
02:56 Workflow Overview
05:17 Code Implementation: Proprietary API
14:04 Code Implementation: Local Model
15:07 Using ColPali for Local Vision-Based Retrieval
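For the local ColPali chapter, it may help to know that ColPali-style retrievers score a page with late interaction: the page is stored as many patch vectors rather than one vector, and each query token is matched to its best patch. The sketch below illustrates that scoring rule only. It does not use the ColPali library, `maxsim_score` is a hypothetical helper name, and the 2-dimensional vectors are invented for the example.

```python
def maxsim_score(query_tokens, page_patches):
    """Late-interaction score: for each query token vector, take its best
    dot product over all page patch vectors, then sum those maxima."""
    total = 0.0
    for q in query_tokens:
        total += max(sum(a * b for a, b in zip(q, p)) for p in page_patches)
    return total

# Toy multi-vector embeddings (assumption): a real model would emit one
# vector per query token and one per image patch, in far more dimensions.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[0.9, 0.1], [0.2, 0.8]],
    "page_b": [[0.5, 0.5], [0.4, 0.3]],
}

scores = {name: maxsim_score(query_tokens, patches) for name, patches in pages.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Compared with the single-vector cosine ranking, this keeps patch-level detail, at the cost of storing many vectors per page.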
