The Raspberry Pi is a compelling low-power option for running GPU-accelerated LLMs locally.
For my main test setup, here's the hardware I used (some links are affiliate links):
- Raspberry Pi 5 8GB ($80): https://www.raspberrypi.com/products/raspberry-pi-5/
- Raspberry Pi 27W Power Supply ($14): https://www.raspberrypi.com/products/power-supply/
- 1TB USB SSD ($64): https://amzn.to/3OjJysQ
- Pineboards HatDrive! Bottom ($20): https://amzn.to/3Zbz0T5
- JMT M.2 Key to PCIe eGPU Dock ($55): https://amzn.to/4eCpi0g
- OCuLink cable ($20): https://amzn.to/3YTXNJW
- Lian-Li SFX 750W PSU ($130): https://amzn.to/48T4a4R
- AMD RX 6700 XT ($400): https://amzn.to/3UXywgI
And here are the resources I mentioned for setting up your own GPU-accelerated Pi:
- Blog post with AMD GPU setup instructions: https://www.jeffgeerling.com/blog/2024/amd-radeon-pro-w7700-running-on-raspberry-pi
- Blog post with llama.cpp Vulkan instructions: https://www.jeffgeerling.com/blog/2024/llms-accelerated-egpu-on-raspberry-pi-5
- Llama Benchmarking issue: https://github.com/geerlingguy/ollama-benchmark/issues/1
- AMD not supporting ROCm on Arm: https://github.com/ROCm/ROCm/issues/3960
- Raspberry Pi PCIe Database: https://pipci.jeffgeerling.com
- Home Assistant Voice Control: https://www.home-assistant.io/voice_control/
- James Mackenzie's video with RX 580: https://www.youtube.com/watch?v=J0z09Ddr58w
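If you just want the gist of the llama.cpp Vulkan setup from the blog post above, it boils down to roughly the following (a sketch, not the full guide — package names and the model path are placeholders that will vary on your system):

```shell
# Install build tools and Vulkan dev packages (Debian/Ubuntu-style names; may vary)
sudo apt install -y build-essential cmake git libvulkan-dev glslc

# Build llama.cpp with the Vulkan backend enabled
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j"$(nproc)"

# Run a model, offloading all layers to the GPU (-ngl 99); model path is a placeholder
./build/bin/llama-cli -m ~/models/llama-3.1-8b-q4_k_m.gguf -ngl 99 -p "Hello"
```

See the blog post for the AMD driver setup that has to happen first — the GPU needs to be visible to the Pi over PCIe before Vulkan can use it.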
Support me on Patreon: https://www.patreon.com/geerlingguy
Sponsor me on GitHub: https://github.com/sponsors/geerlingguy
Merch: https://www.redshirtjeff.com
2nd Channel: https://www.youtube.com/@GeerlingEngineering
3rd Channel: https://www.youtube.com/@Level2Jeff
Contents:
00:00 - Why do this on a Pi?
01:33 - Should I even try?
02:06 - Hardware setup
04:34 - Comparisons with Llama
05:43 - How much is too much?
06:52 - Benchmark results
07:41 - Software setup
09:13 - More models, more testing