🤯 Are you REALLY using the BEST LLM for parallel function calling? I ran a benchmark with 14 LLMs, 420 prompts, and there was 1 clear winner!
🎥 Featured Media:
- Live Benchmark Codebase (WIP): https://github.com/disler/benchy
- Autocomplete Benchmark Video: https://youtu.be/1ObiaSiA8BQ
- My Plan for 2025 - MAX AI COMPUTE: https://youtu.be/4SnvMieJiuw
What's the secret to creating powerful, long-running agentic workflows? It all comes down to parallel function calling. In this video, we discuss a benchmark comparing 14 LLMs across 420 prompts to uncover the BEST LLM and tool-calling techniques for reliable, efficient, and cost-effective parallel function calls. This isn't just theory—we're showing you LIVE benchmark results, breaking down execution time, cost, and accuracy for each LLM, including Gemini Experimental, Gemini Flash, Claude 3.5 Sonnet, Claude 3.5 Haiku, GPT-4o, o1-mini, and more.
🚀 We'll explore two critical elements for building robust agentic workflows: specialized AI agents and reliable tool-calling mechanisms. You'll see how to design prompts that trigger multiple tools in parallel, testing chains of 1, 2, 3, all the way up to 15 parallel function calls! The results are eye-opening, revealing which LLMs excel at handling long chains of tool calls and which ones fall short.
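For a rough idea of what's being measured, here's a minimal sketch of native parallel function calling using the OpenAI Python SDK. The tool names (get_weather, get_time) and the prompt are placeholders, not the benchmark's actual setup (that lives in the benchy repo):

```python
# Minimal sketch of native parallel function calling, assuming the OpenAI
# Python SDK (v1.x). The tools and prompt below are illustrative only.
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_time",
            "description": "Get the current local time for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
]

# One prompt that should trigger BOTH tools in a single model response.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather and local time in Tokyo?"}],
    tools=tools,
)

# A model that handles parallel function calling well returns every call in
# message.tool_calls; the benchmark idea is to check how long that chain can
# get (1, 2, 3 ... 15) before calls get dropped or malformed.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```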
🔥 We'll also uncover a surprisingly common trick: using JSON prompts to give LLMs *without* native function calling the ability to execute parallel tool calls. We'll compare this technique against built-in function calling capabilities and see which approach delivers the best performance. This is critical for maximizing the efficiency and cost-effectiveness of your agentic workflows. Don't waste your resources on underperforming LLMs – this benchmark will show you the path to optimized AI performance.
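Here's a rough sketch of that JSON-prompt technique, again with placeholder tool names and prompt wording rather than the benchmark's actual prompts: you ask the model to respond with nothing but a JSON array of tool calls, then parse and dispatch them yourself.

```python
# Rough sketch of the JSON-prompt approach for models WITHOUT native
# function calling: the prompt asks for a pure JSON array of calls, and
# the application parses and dispatches them. Tool names are illustrative.
import json

JSON_TOOL_PROMPT = (
    "You can use these tools: get_weather(city), get_time(city).\n"
    "Respond ONLY with a JSON array of tool calls, for example:\n"
    '[{"tool": "get_weather", "args": {"city": "Paris"}}]\n\n'
    "User request: What's the weather and local time in Tokyo?"
)

def parse_tool_calls(model_output: str) -> list[dict]:
    """Turn the model's raw text into a list of {'tool', 'args'} dicts."""
    return json.loads(model_output.strip())

def dispatch(calls: list[dict], registry: dict) -> list:
    """Run every requested call against a registry of Python functions."""
    return [registry[call["tool"]](**call["args"]) for call in calls]

# Pretend this is what the model returned for the prompt above.
raw_output = (
    '[{"tool": "get_weather", "args": {"city": "Tokyo"}},'
    ' {"tool": "get_time", "args": {"city": "Tokyo"}}]'
)

registry = {
    "get_weather": lambda city: f"weather for {city}",
    "get_time": lambda city: f"local time in {city}",
}

print(dispatch(parse_tool_calls(raw_output), registry))
# ['weather for Tokyo', 'local time in Tokyo']
```

Comparing the parsed list against the expected chain of calls is, roughly speaking, what a live benchmark like this has to do for every prompt and every model.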
🛠️ This video is packed with actionable insights for anyone building agentic systems, personal AI assistants, or any application requiring robust parallel function calling. We'll discuss the importance of benchmarking, testing, and evaluating your AI tools to make data-driven decisions and maximize your ROI. Plus, we'll share tips on prompt design and structuring JSON prompts for optimal results. Join us as we unlock the secrets to building next-level agentic applications with the best LLM for parallel function calling.
Stay focused and KEEP BUILDING
📖 Chapters
00:00 - Two Elements for Agentic Workflows
01:05 - Parallel Function Calling
01:36 - Parallel Function Length 1
02:41 - Parallel Function Length 2
04:11 - Parallel Function Length 3
04:47 - Parallel Function Length 4
06:31 - Gemini 1.5 Flash is insane
07:30 - Parallel Function Length 5
09:50 - Parallel Function Length 7
12:12 - Structured Outputs and JSON prompts
14:45 - Parallel Function Length 10
16:15 - JSON Prompts beating Function Calling
18:20 - Parallel Function Length 15
19:20 - You have options for parallel function calling
20:40 - Live Benchmarks are insanely VALUABLE
#agentic #promptengineering #llm