From Documents to Dialogue: Unlocking PDF Data with a Smart Chatbot
Mountains of valuable information are locked away inside PDF files. Whether it’s business reports, regulatory documents, user manuals, or research papers, the ability to extract and utilize insights from these documents is becoming essential.
In this live session, Simon Prickett (Developer Advocate at CrateDB) will begin by showing you how to extract data from text and images in PDF files and store it in CrateDB. From there, you’ll see how to generate embeddings using AI models and perform hybrid semantic and keyword searches with SQL queries. Finally, we’ll put it all together and demonstrate a natural language chatbot that takes questions in plain English and returns responses from a large language model.
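To give a feel for what "hybrid" search means before the session, here is a minimal in-memory sketch in plain Python. It is not the code from the session: the project runs its searches as SQL queries in CrateDB and gets embeddings from an AI model, whereas this sketch uses hand-written three-number vectors, a simple word-overlap keyword match, and reciprocal rank fusion (one common way to merge two ranked lists, assumed here for illustration). All names (`CHUNKS`, `keyword_rank`, `vector_rank`, `rrf`, `hybrid_search`) are hypothetical.

```python
import math

# Toy "PDF chunks". The embeddings are made up for illustration; a real
# pipeline would get them from an embedding model.
CHUNKS = {
    "a": {"text": "reset the router to factory settings", "embedding": [0.9, 0.1, 0.0]},
    "b": {"text": "quarterly revenue grew in europe", "embedding": [0.0, 0.2, 0.9]},
    "c": {"text": "restore default configuration on the device", "embedding": [0.8, 0.3, 0.1]},
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_rank(query, chunks):
    """Rank chunks by how many query words they contain; drop non-matches."""
    terms = set(query.lower().split())
    scores = {cid: len(terms & set(c["text"].lower().split())) for cid, c in chunks.items()}
    matches = [cid for cid, score in scores.items() if score > 0]
    return sorted(matches, key=scores.get, reverse=True)


def vector_rank(query_vec, chunks):
    """Rank all chunks by cosine similarity to the query embedding."""
    scores = {cid: cosine(query_vec, c["embedding"]) for cid, c in chunks.items()}
    return sorted(scores, key=scores.get, reverse=True)


def rrf(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per item."""
    fused = {}
    for ranking in rankings:
        for position, cid in enumerate(ranking, start=1):
            fused[cid] = fused.get(cid, 0.0) + 1.0 / (k + position)
    return sorted(fused, key=fused.get, reverse=True)


def hybrid_search(query, query_vec, chunks, k=60):
    """Combine the keyword ranking and the vector ranking into one list."""
    return rrf([keyword_rank(query, chunks), vector_rank(query_vec, chunks)], k=k)


if __name__ == "__main__":
    # The query vector is also made up; it stands in for the query's embedding.
    print(hybrid_search("reset router", [1.0, 0.0, 0.0], CHUNKS))
```

In this toy data, chunk "a" matches both the keywords and the vector and ranks first, while chunk "c" shares no words with the query but still ranks second because its vector is close. That is the kind of result a keyword-only search would miss.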
The code for a complete Python project is available on GitHub, and you can try it out with a free CrateDB Cloud database and your own PDF files: https://github.com/crate/devrel-pdf-rag-chatbot