Cloud vs. Homelab: Which is *Actually* Better for LLMs?

I battled my homelab machine, Cerebro, against cloud machines with identical or better GPUs to see whether my local setup is worth it. The results were interesting, to say the least, especially in the multi-GPU tests. Check out the video and watch me scratch my head at some odd results along the way. I documented the results and scripts in a dedicated Git repo; go ahead and check that out as well. A minimal sketch of the kind of throughput measurement involved follows the chapter list below.

Links from the video 🔥:
👉🏻 GitHub Project: https://github.com/tech-grandpa/llmperf (with sources of the video)
👉🏻 SGLang: https://github.com/sgl-project/sglang
👉🏻 vLLM: https://github.com/vllm-project/vllm
👉🏻 Ollama: https://ollama.com
👉🏻 Llama-3.1-8B - FP8: https://huggingface.co/neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8
👉🏻 Llama-3.1-8B - BF16: https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct

My Links 🔗
👉🏻 Subscribe: https://www.youtube.com/@theNittyGritty
👉🏻 BlueSky: https://bsky.app/profile/techgrandpa.bsky.social
👉🏻 GitHub: https://github.com/tech-grandpa

⬇️ Chapters in case you want to skip ahead ⬇️
00:00 - Intro
00:42 - Test Goals
02:11 - Test Setup - Phase I
04:47 - Test Results for single runs
07:13 - Test Results for single runs - cleaned-up
09:10 - Test for parallel runs
09:35 - Test Results for parallel runs
10:21 - Performance hit for parallel stress test (multiple cards)
10:36 - Mainboard error indicator beep during test run
11:20 - Parallel test run - overall result analysis
11:39 - Cloud performance comparison - local test driver
12:28 - Cloud test run analysis
13:12 - Final thoughts
14:52 - Outro-ish
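
For the curious: this is not the author's actual benchmark (those scripts live in the llmperf repo above), but a minimal sketch of how tokens-per-second can be measured for single and parallel runs against an OpenAI-compatible endpoint, which both vLLM and SGLang serve. The URL, port, worker count, and prompt here are placeholder assumptions.

```python
import time
import requests
from concurrent.futures import ThreadPoolExecutor

# Assumptions: an OpenAI-compatible server (vLLM and SGLang both expose
# /v1/completions) is running locally; endpoint, prompt, and worker count
# are placeholders, not values from the video.
URL = "http://localhost:8000/v1/completions"
MODEL = "neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8"
PROMPT = "Explain the difference between FP8 and BF16 quantization."

def run_once(max_tokens: int = 256) -> float:
    """Fire one completion request and return generation speed in tokens/s."""
    start = time.perf_counter()
    resp = requests.post(URL, json={
        "model": MODEL,
        "prompt": PROMPT,
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }, timeout=300)
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    # OpenAI-style responses report generated token counts under "usage".
    tokens = resp.json()["usage"]["completion_tokens"]
    return tokens / elapsed

if __name__ == "__main__":
    # One warm run, then a parallel stress test like the one in the video.
    print(f"single run: {run_once():.1f} tok/s")
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda _: run_once(), range(8)))
    print(f"8 parallel runs: {sum(results) / len(results):.1f} tok/s each on average")
```

Comparing the single-run number against the per-request average under parallel load is what exposes the multi-GPU scaling behavior discussed in the video.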