rubyclaw
You
Do some research on the best open source model i can run on 64gb unified memory mac mini with decent t/s
AI
Assistant
Web search
Based on my research, here's what I found for running open source LLMs on a 64GB unified memory Mac Mini with decent performance:
Top Recommendations for 64GB Unified Memory Mac Mini (2026)
🏆 Qwen3.5-35B-A3B-4bit (Best Overall)
- Performance: ~60 tokens/second reported on a 64GB M1 Ultra; M4 Pro figures may differ, since token generation is largely memory-bandwidth-bound
- Memory Usage: ~16-18GB with 4-bit quantization
- Why: Excellent balance of speed and intelligence, widely praised in the community
- Source: Reddit r/LocalLLM, like2byte.com benchmarks
🥈 Llama-3.1-70B-4bit (Best Raw Intelligence)
- Performance: ~20-30 tokens/second on M4 Pro
- Memory Usage: ~28-32GB with 4-bit quantization
- Why: Strong reasoning capabilities, leaves ~32GB headroom for OS and context
- Note: May struggle with long contexts due to memory constraints
🥉 Llama-3.1-8B-4bit (Best Speed)
- Performance: ~80-120 tokens/second on M4 Pro
- Memory Usage: ~6-8GB with 4-bit quantization
- Why: Blazing fast, leaves ~56GB headroom for other apps
- Trade-off: Less capable than 35B/70B models
🎯 Gemma-2-27B-4bit (Great Balance)
- Performance: ~40-50 tokens/second on M4 Pro
- Memory Usage: ~14-16GB with 4-bit quantization
- Why: Google's model, good for coding and general tasks
Key Considerations
Memory Headroom is Critical
- Leave at least 16-20GB for macOS, browser, and other apps
- Account for KV cache (context window) - this can consume 2-4GB per 10k tokens
- 64GB total means you have ~44-48GB available for the model and its KV cache
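The headroom arithmetic above can be sketched as a quick budget check. The layer/head counts below are illustrative Llama-3.1-70B-style values, not authoritative specs for any particular release:

```python
# Rough memory-budget check for a quantized model on a 64GB machine.
# Architecture numbers are illustrative assumptions, not official specs.

def kv_cache_gb(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """K and V caches: 2 tensors per layer, fp16 elements by default."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem / 1e9

def fits(model_gb, context_tokens, layers, kv_heads, head_dim,
         total_gb=64, os_headroom_gb=18):
    """Return (fits, kv_cache_gb) given weights, context, and OS headroom."""
    cache = kv_cache_gb(layers, kv_heads, head_dim, context_tokens)
    return model_gb + cache + os_headroom_gb <= total_gb, cache

# A 70B-class model quantized to ~30GB with a 10k-token context
# (80 layers, 8 KV heads, head_dim 128 are typical 70B-class values):
ok, cache = fits(model_gb=30, context_tokens=10_000,
                 layers=80, kv_heads=8, head_dim=128)
print(f"KV cache: {cache:.1f} GB, fits in 64 GB: {ok}")
```

With these assumed values the KV cache lands at ~3.3GB per 10k tokens, consistent with the 2-4GB range above.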
Inference Engines
- Ollama: Easy to use, good performance on Apple Silicon
- llama.cpp: More control, better optimization for M-series chips
- MLX: Apple's native framework, best performance on M4/M5
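As a setup sketch, each engine is a one-liner to try; the model tags and file paths below are examples and may not match the exact names in current registries:

```shell
# Ollama: pulls and runs a quantized model in one step
ollama run llama3.1:8b

# llama.cpp: run a local GGUF file directly (path is an example)
llama-cli -m ./models/llama-3.1-8b-q4_k_m.gguf -p "Hello" -n 128

# MLX: Apple-native inference via the mlx-lm package
python -m mlx_lm.generate \
    --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit \
    --prompt "Hello"
```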
Quantization Matters
- 4-bit: Best balance of quality and speed
- 5-bit: Slightly better quality, slower
- 8-bit: Best quality, slower, uses more memory
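A back-of-the-envelope size estimate for quantized weights is params × bits / 8; real GGUF/MLX files run somewhat larger because some layers (embeddings, output head) are kept at higher precision:

```python
# Naive quantized-weight size: parameter count times bits per weight.
# Actual files are larger due to mixed-precision layers and metadata.

def weights_gb(params_billions, bits):
    return params_billions * 1e9 * bits / 8 / 1e9

for params in (8, 35, 70):
    for bits in (4, 5, 8):
        print(f"{params}B @ {bits}-bit ~ {weights_gb(params, bits):.1f} GB")
```

This is why a 70B model at 4-bit (~35GB naive) is near the practical ceiling on a 64GB machine once OS headroom and KV cache are subtracted.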
My Recommendation
For a 64GB Mac Mini in 2026, I'd suggest:
- Primary: Qwen3.5-35B-A3B-4bit via Ollama or llama.cpp
- For speed: Llama-3.1-8B-4bit or Gemma-2-9B-4bit
- For heavy tasks: Llama-3.1-70B-4bit (when you need it)
Hardware Tips
- M4 Pro is the sweet spot for 64GB (the base M4 Mini tops out at 32GB)
- M5 (if available) is reported to deliver substantially faster AI performance
- Consider OpenClaw for multi-agent workflows
- Keep macOS updated for the latest MLX optimizations
Would you like me to dive deeper into any specific model or help you set up Ollama/llama.cpp?