Today, we're trying to load and use a 70B LLM with ollama on a 14" M4 Pro MacBook Pro with 48GB RAM. Will it work?
In this video:
💻 14" MacBook Pro M4 Pro (12 cores): https://amzn.to/3ANEPwB
💨 TG Pro (CPU + GPU core temps and fan speed): https://www.tunabellysoftware.com/tgpro/index.php?fpr=d157l
🎤 Microphone: https://amzn.to/3AFgvNw
🖱️ Mouse: https://amzn.to/3Z3pal4
⌨️ Keyboard: https://amzn.to/3OdkjZv
I tested 7 small LLMs locally to find the fastest 👇
https://www.youtube.com/watch?v=CDdo29LgoRk
Models tested:
- phi3:14b - 7.9GB
- qwen2.5:14b - 9.0GB
- gemma2:27b - 16GB
- llama3.1:8b (fp16) - 16GB
- qwen2.5:32b - 20GB
- llama3.1:70b - 39GB
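If you want to try a similar comparison on your own machine, here's a minimal Python sketch, assuming the ollama server is already running locally and the official `ollama` Python package is installed (pip install ollama). The model tag and prompt below are placeholders, not the exact ones used in the video.

```python
# Rough timing of a single model with the ollama Python client.
# Assumes `ollama serve` is running and the model has been pulled
# (e.g. `ollama pull qwen2.5:14b`).
import time
import ollama

model = "qwen2.5:14b"  # swap in any tag from the list above
prompt = "Explain the difference between threads and processes."

start = time.time()
response = ollama.chat(
    model=model,
    messages=[{"role": "user", "content": prompt}],
)
elapsed = time.time() - start

answer = response["message"]["content"]
print(f"{model}: {len(answer)} chars in {elapsed:.1f}s")
```

Wall-clock time like this includes model load time on the first call, so run the same prompt twice if you only want generation speed.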
00:00 LLMs tested
00:47 Prompt used
01:05 phi3:14b
01:55 qwen2.5:14b
02:53 gemma2:27b
04:23 how to find alternative models on ollama.com
05:03 llama3.1:8b-instruct-fp16
05:58 qwen2.5:32b
07:15 hearing the fans
07:49 llama3.1:70b
08:15 memory pressure goes through the roof
09:10 fans and temperature are increasing
09:35 llama3.1:70b results
09:58 Analysis and speed considerations
11:27 Stats recap