Deep Dive into Inference Optimization for LLMs with Philip Kiely
Today we have Philip Kiely from Baseten on the show. Baseten is a Series B startup focused on providing infrastructure for AI workloads. We go deep on inference optimization: choosing a model, the hype around Compound AI, choosing an inference engine, and optimization techniques like quantization and speculative decoding, all the way down to your choice of GPU.

Timestamps:
01:16 Start
05:43 Why focus on inference?
11:15 Model Selection
16:52 Saving Costs
20:07 Saturating a GPU
21:28 When does it make sense to fine-tune?
23:12 Compound AI
29:18 Performance
31:09 Why is inference slow?
33:28 Techniques to optimize inference
50:54 Choice of GPUs
58:19 Programming Language Choice
59:48 Quick Fire