AI Agents for Data Analysis with Shreya Shankar - 703

Today, we're joined by Shreya Shankar, a PhD student at UC Berkeley, to discuss DocETL (https://www.docetl.com/), a declarative system for building and optimizing LLM-powered data processing pipelines for large-scale, complex document analysis tasks. We explore how DocETL's optimizer architecture works, the intricacies of building agentic systems for data processing, the current landscape of benchmarks for data processing tasks and how these differ from reasoning-based benchmarks, and the need for robust evaluation methods for human-in-the-loop LLM workflows. Shreya also shares real-world applications of DocETL, the importance of effective validation prompts, and how to build robust, fault-tolerant agentic systems. Lastly, we cover why LLM-powered data processing needs its own tailored benchmarks and the future directions for DocETL.

🎧 / 🎥 Listen to or watch the full episode on our page: https://twimlai.com/go/703

🔔 Subscribe to our channel for more great content like this: https://youtube.com/twimlai?sub_confirmation=1

🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: https://twitter.com/twimlai
Follow us on LinkedIn: https://www.linkedin.com/company/twimlai/
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/

📖 CHAPTERS
===============================
00:00 - Introduction
4:57 - Challenges in AI interface design
9:02 - DocETL
14:13 - Data connector challenges in ETL systems
15:04 - UI for document processing
17:49 - Model support
18:54 - Data extraction tasks
21:08 - Prompts and HITL
25:32 - Evaluation in data processing
31:31 - Agents and agentic systems
38:46 - Benchmarks for data processing
43:44 - States-based models or long-context LLMs
44:02 - Future directions

🔗 LINKS & RESOURCES
===============================
DocETL - https://www.docetl.com/
Reimagining LLM-Powered Unstructured Data Analysis with DocETL - https://data-people-group.github.io/blogs/2024/09/24/docetl/
EPIC Data Lab - https://epic.berkeley.edu/

📸 Camera: https://amzn.to/3TQ3zsg
🎙️ Microphone: https://amzn.to/3t5zXeV
🚦 Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5