rubyclaw
You
Do some research on the best open source model i can run on 64gb unified memory mac mini with decent t/s
AI
Assistant
Web search
Based on my research, here's what I found for running open source LLMs on a 64GB unified memory Mac Mini with decent performance:
Top Recommendations for 64GB Unified Memory Mac Mini (2026)
🏆 Qwen3.5-35B-A3B-4bit (Best Overall)
- Performance: ~60 tokens/second reported on a 64GB M1 Ultra; M4 Pro figures may differ, since token generation is largely memory-bandwidth-bound
- Memory Usage: ~16-18GB with 4-bit quantization
- Why: Excellent balance of speed and intelligence, widely praised in the community
- Source: Reddit r/LocalLLM, like2byte.com benchmarks
🥈 Llama-3.1-70B-4bit (Best Raw Intelligence)
- Performance: ~20-30 tokens/second on M4 Pro
- Memory Usage: ~28-32GB with 4-bit quantization
- Why: Strong reasoning capabilities, leaves ~32GB headroom for OS and context
- Note: May struggle with long contexts due to memory constraints
🥉 Llama-3.1-8B-4bit (Best Speed)
- Performance: ~80-120 tokens/second on M4 Pro
- Memory Usage: ~6-8GB with 4-bit quantization
- Why: Blazing fast, leaves ~56GB headroom for other apps
- Trade-off: Less capable than 35B/70B models
🎯 Gemma-2-27B-4bit (Great Balance)
- Performance: ~40-50 tokens/second on M4 Pro
- Memory Usage: ~14-16GB with 4-bit quantization
- Why: Google's model, good for coding and general tasks
Key Considerations
Memory Headroom is Critical
- Leave at least 16-20GB for macOS, browser, and other apps
- Account for KV cache (context window) - this can consume 2-4GB per 10k tokens
- 64GB total means you have ~44-48GB available for the model and its KV cache
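The headroom arithmetic above can be sketched as a quick budget check. The layer/head counts below are illustrative Llama-3.1-70B-style values, not authoritative specs for any particular release:

```python
# Rough memory-budget check for a quantized model on a 64GB machine.
# Architecture numbers are illustrative assumptions, not official specs.

def kv_cache_gb(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """K and V caches: 2 tensors per layer, fp16 elements by default."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem / 1e9

def fits(model_gb, context_tokens, layers, kv_heads, head_dim,
         total_gb=64, os_headroom_gb=18):
    """Return (fits, kv_cache_gb) given weights, context, and OS headroom."""
    cache = kv_cache_gb(layers, kv_heads, head_dim, context_tokens)
    return model_gb + cache + os_headroom_gb <= total_gb, cache

# A 70B-class model quantized to ~30GB with a 10k-token context
# (80 layers, 8 KV heads, head_dim 128 are typical 70B-class values):
ok, cache = fits(model_gb=30, context_tokens=10_000,
                 layers=80, kv_heads=8, head_dim=128)
print(f"KV cache: {cache:.1f} GB, fits in 64 GB: {ok}")
```

With these assumed values the KV cache lands at ~3.3GB per 10k tokens, consistent with the 2-4GB range above.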
Inference Engines
- Ollama: Easy to use, good performance on Apple Silicon
- llama.cpp: More control, better optimization for M-series chips
- MLX: Apple's native framework, best performance on M4/M5
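As a setup sketch, each engine is a one-liner to try; the model tags and file paths below are examples and may not match the exact names in current registries:

```shell
# Ollama: pulls and runs a quantized model in one step
ollama run llama3.1:8b

# llama.cpp: run a local GGUF file directly (path is an example)
llama-cli -m ./models/llama-3.1-8b-q4_k_m.gguf -p "Hello" -n 128

# MLX: Apple-native inference via the mlx-lm package
python -m mlx_lm.generate \
    --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit \
    --prompt "Hello"
```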
Quantization Matters
- 4-bit: Best balance of quality and speed
- 5-bit: Slightly better quality, slower
- 8-bit: Best quality, slower, uses more memory
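A back-of-the-envelope size estimate for quantized weights is params × bits / 8; real GGUF/MLX files run somewhat larger because some layers (embeddings, output head) are kept at higher precision:

```python
# Naive quantized-weight size: parameter count times bits per weight.
# Actual files are larger due to mixed-precision layers and metadata.

def weights_gb(params_billions, bits):
    return params_billions * 1e9 * bits / 8 / 1e9

for params in (8, 35, 70):
    for bits in (4, 5, 8):
        print(f"{params}B @ {bits}-bit ~ {weights_gb(params, bits):.1f} GB")
```

This is why a 70B model at 4-bit (~35GB naive) is near the practical ceiling on a 64GB machine once OS headroom and KV cache are subtracted.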
My Recommendation
For a 64GB Mac Mini in 2026, I'd suggest:
- Primary: Qwen3.5-35B-A3B-4bit via Ollama or llama.cpp
- For speed: Llama-3.1-8B-4bit or Gemma-2-9B-4bit
- For heavy tasks: Llama-3.1-70B-4bit (when you need it)
Hardware Tips
- M4 Pro is the sweet spot for 64GB (the base M4 Mini tops out at 32GB)
- M5 (if available) is reported to deliver substantially faster AI performance
- Consider OpenClaw for multi-agent workflows
- Keep macOS updated for the latest MLX optimizations
Would you like me to dive deeper into any specific model or help you set up Ollama/llama.cpp?