Qwen 3.6 27B is the sweet spot for local development (quesma.com)
1117 points by stared 1 day ago | 699 comments
tl;dr: Qwen 3.6 27B (dense) is praised as the first genuinely useful local general-purpose model, outperforming its faster MoE sibling (35B A3B) in quality while still fitting within 48GB of Apple Silicon RAM or, when quantized, on an RTX 5090. On a MacBook M5 Max, llama.cpp with multi-token prediction hits ~32 tok/s, and benchmarks place it roughly at the level of mid-2025 GPT-5/Claude Sonnet 4.5. The author provides llama.cpp and OpenCode setup instructions and argues that local models are increasingly viable alternatives to subsidized frontier APIs.
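A rough back-of-the-envelope sketch of why quantization matters for the hardware mentioned in the tl;dr. It estimates memory for the weights only (no KV cache, activations, or runtime overhead), so real usage will be higher. The bit widths are illustrative round numbers, not the formats the article used, and the 32 GB VRAM figure for the RTX 5090 is an assumption that does not come from the summary.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    # params_billions * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billions * bits_per_weight / 8


PARAMS_B = 27            # dense parameter count, from the model name
APPLE_RAM_GB = 48        # figure quoted in the tl;dr
RTX_5090_VRAM_GB = 32    # assumption: not stated in the summary

# Illustrative bit widths; real quantization formats carry some extra overhead.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    gb = weight_memory_gb(PARAMS_B, bits)
    print(
        f"{label}: ~{gb:.1f} GB weights | "
        f"under 48 GB RAM: {gb < APPLE_RAM_GB} | "
        f"under 32 GB VRAM: {gb < RTX_5090_VRAM_GB}"
    )
```

Under these assumptions the unquantized 16-bit weights alone exceed both memory budgets, which is consistent with the tl;dr describing the RTX 5090 setup as quantized.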
HN Discussion:
  • ~MacBook is impractical for local LLM work due to heat and noise; use a Mac Mini instead
  • The hardware cost ($6.7K-$10K) is prohibitive and API credits are far more economical
  • Benchmarks don't reflect real work; local models struggle with existing codebases
  • ~Cheaper alternatives like Intel Arc Pro or MoE models on lesser hardware work well
  • ~Other models like Gemma4 31B are comparably good and underrated