Run A Local LLM Across Multiple Computers! (vLLM Distributed Inference)

Timestamps:
00:00 - Intro
01:24 - Technical Demo
09:48 - Results
11:02 - Intermission
11:57 - Considerations
15:48 - Conclusion

In this video, we explore distributed inference using vLLM and Ray. To demonstrate this functionality, we set up two nodes: one equipped with two RTX 3090 Ti GPUs and the other with two RTX 3060 GPUs. After configuring the nodes, we test distributed inference by loading a model across both of them, enabling interaction with a fully distributed inference setup. Join us as we dive into the technical details, share results, and discuss considerations for using distributed inference in your own projects!
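
For readers who want a rough idea of what this looks like in code, below is a minimal Python sketch of multi-node inference with vLLM's offline LLM API and the Ray executor backend. It assumes a Ray cluster already spans both machines (started with "ray start --head" on the first node and "ray start --address=<head-ip>:6379" on the second) and that all four GPUs are visible to Ray. The model name, parallelism size, and prompt are placeholders for illustration, not the exact configuration used in the video.

    # Sketch: distributed inference across two nodes with vLLM + Ray.
    # Assumes the Ray cluster is already running on both machines and
    # exposes 4 GPUs in total (2x RTX 3090 Ti on node 1, 2x RTX 3060 on node 2).
    # The model name below is a placeholder, not the model used in the video.

    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-2-13b-chat-hf",  # placeholder model
        tensor_parallel_size=4,                  # shard the model across all 4 GPUs
        distributed_executor_backend="ray",      # let Ray place workers on both nodes
    )

    prompts = ["Explain distributed inference in one paragraph."]
    sampling = SamplingParams(temperature=0.7, max_tokens=256)

    for output in llm.generate(prompts, sampling):
        print(output.outputs[0].text)

With this kind of setup, the interesting part is that the model no longer has to fit on any single machine's GPUs; the trade-off, as discussed in the Considerations section, is that inter-node communication can become the bottleneck.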