Join Roie Schwaber-Cohen and me for a deep dive into the Pinecone Vercel starter template, which deploys an AI chatbot that is less likely to hallucinate thanks to Retrieval Augmented Generation (RAG).
This is an excellent video to watch if you are learning about Generative AI, want to build a chatbot, or are having difficulty getting your current AI chatbots to return factual answers about specific topics or your proprietary data. You don't need to be an AI pro already, because we start off by explaining what RAG is and why it's such an important technique for building Generative AI applications that are less likely to hallucinate.
The majority of this content was originally captured as a live Twitch.tv stream co-hosted by Roie (rschwabco) and me (zackproser). Be sure to follow us on Twitch for more Generative AI deep dives, tutorials, live demos, and conversations about the rapidly developing world of Artificial Intelligence.
The Pinecone Vercel template recently went viral. With a GitHub and Vercel account, you can deploy it yourself in about 2 minutes flat. Since it's an open-source GitHub project, you can modify it to your heart's content: swap in your own logo and company name, tweak the behavior, or extend it to meet your needs.
In this video, we step through the code commit by commit, and Roie explains how he built the application, its major components, and how RAG works in practice. Roie built the template with the latest version of Next.js in TypeScript, so we also discuss how TypeScript helps us write ambitious applications more quickly.
In addition to the AI chatbot, the template implements a recursive web crawler that sanitizes the HTML it fetches and splits it into chunked documents, converts those documents into embeddings (vectors), and upserts the vectors into the Pinecone vector database.
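To make that ingestion flow concrete, here is a minimal sketch in TypeScript. It is not the template's code: the index name is made up, the crawler and chunker are reduced to a naive single-page stand-in, and the OpenAI and Pinecone calls assume recent versions of their official Node SDKs (older Pinecone SDKs also require an environment value).

```typescript
import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';

const openai = new OpenAI();               // reads OPENAI_API_KEY from the environment
const pinecone = new Pinecone();           // reads PINECONE_API_KEY from the environment
const index = pinecone.index('rag-demo');  // hypothetical index name

// Naive stand-in for the template's recursive crawler and HTML sanitizer:
// fetch a single page, strip tags, and collapse whitespace.
async function fetchPageText(url: string): Promise<string> {
  const html = await (await fetch(url)).text();
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Split the page text into fixed-size chunks suitable for embedding.
function chunkText(text: string, size = 1000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) chunks.push(text.slice(i, i + size));
  return chunks;
}

export async function seed(url: string) {
  const chunks = chunkText(await fetchPageText(url));

  // Convert each chunk into an embedding vector.
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input: chunks,
  });

  // Upsert the vectors, keeping the source text in metadata so it can be
  // pulled back out as context at query time.
  await index.upsert(
    data.map((d, i) => ({
      id: `${url}#${i}`,
      values: d.embedding,
      metadata: { url, text: chunks[i] },
    })),
  );
}
```

Keeping the raw chunk text in each vector's metadata is what lets the chat route reassemble context later without needing a separate datastore.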
Before diving into the discussion, I provide a basic introduction to Retrieval Augmented Generation and how it works at a conceptual level. I've listed timestamps for the topics discussed in the video below so that you can jump to the sections you're most interested in.
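At query time the flow runs in the other direction: embed the user's question, fetch the nearest chunks from Pinecone, and pass them to the model as context. Here is a minimal, non-streaming sketch under the same assumptions as above (the template's real chat route streams its response via OpenAIStream, and the index and model names here are placeholders):

```typescript
import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';

const openai = new OpenAI();
const index = new Pinecone().index('rag-demo');  // same hypothetical index as above

export async function answer(question: string): Promise<string> {
  // Embed the user's question with the same model used at ingestion time.
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input: question,
  });

  // Retrieve the most similar chunks from the vector database.
  const results = await index.query({
    vector: data[0].embedding,
    topK: 3,
    includeMetadata: true,
  });

  // Join the retrieved text into a context block, which is roughly the role
  // the template's getContext method plays.
  const context = results.matches
    .map((m) => m.metadata?.text)
    .filter((t): t is string => typeof t === 'string')
    .join('\n---\n');

  // Ask the model to answer using only the retrieved context.
  const completion = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [
      { role: 'system', content: `Answer using only this context:\n${context}` },
      { role: 'user', content: question },
    ],
  });

  return completion.choices[0].message.content ?? '';
}
```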
Timestamps:
0:00 - 0:42 What is a Vercel template?
1:18 - 1:34 How to find the template
1:51 - 2:46 What is hallucination?
2:56 - 5:14 Example of hallucination in a Generative AI application
5:18 - 10:15 What is Retrieval Augmented Generation? (Diagram)
10:17 - 10:44 How you can modify and deploy this template yourself
12:04 Start of the live stream recording
12:04 - 14:00 Reading TypeScript signatures
14:20 - 15:15 OpenAIStream responses
15:55 - 19:22 What are embeddings? How do we get them?
19:55 - 21:03 Initializing the Pinecone client (API key & environment)
21:29 - 22:05 Getting matches from Pinecone vector database
23:02 - 26:44 The getContext method and how it works / Metadata
26:48 - 27:17 Converting the user's query into embeddings
27:40 - 28:30 Summarizing what we have built so far
28:59 - 33:17 The Crawler component - how it works - HTML semantics
33:26 - 34:30 Zack presses Roie on how this Crawler was REALLY built :)