Deep Dive into Inference Optimization for LLMs with Philip Kiely
Today we have Philip Kiely from Baseten on the show. Baseten is a Series B startup focused on providing infrastructure for AI workloads. We go deep on inference optimization: choosing a model, the hype around Compound AI, choosing an inference engine, and optimization techniques like quantization and speculative decoding, all the way down to your choice of GPU.

Timestamps:
01:16 Start
05:43 Why focus on inference?
11:15 Model Selection
16:52 Saving Costs
20:07 Saturating a GPU
21:28 When does it make sense to fine-tune?
23:12 Compound AI
29:18 Performance
31:09 Why is inference slow?
33:28 Techniques to optimize inference
50:54 Choice of GPUs
58:19 Programming Language Choice
59:48 Quick Fire