Qwen 3.6 27B is the sweet spot for local development (quesma.com)
1172 points by stared 2 days ago | 744 comments
tl;dr: Qwen 3.6 27B is a locally runnable dense model that reportedly matches mid-2025 frontier models (GPT-5, Claude Sonnet 4.5) on benchmarks and handles coding, writing, and general tasks well from a single prompt. On a MacBook with an M5 Max, it runs at ~32 tok/s in llama.cpp with multi-token prediction, using ~42GB of RAM at 8-bit quantization; at Q6 quantization it fits on an RTX 5090. The author prefers it to the faster 35B A3B MoE variant for its higher-quality output, and sees local models as increasingly viable alternatives to subsidized proprietary APIs.
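The memory figures above follow from simple arithmetic: weight storage is roughly parameter count times bits per weight. The sketch below is a weights-only estimate, assuming a nominal 27e9 parameters and llama.cpp's block-format sizes (Q8_0 at 8.5 bits/weight, Q6_K at 6.5625 bits/weight) as stand-ins for "8-bit" and "Q6"; the helper name `weight_gb` is hypothetical, and the 5090's 32 GB is treated as decimal gigabytes for simplicity.

```python
# Rough weights-only memory estimate for a quantized dense model.
# Assumptions (not from the article): a nominal 27e9 parameters, and the
# llama.cpp block formats Q8_0 (8.5 bits/weight) and Q6_K (6.5625 bits/weight)
# standing in for "8-bit" and "Q6". KV cache, context length, and runtime
# buffers are deliberately ignored.

PARAMS = 27e9  # nominal parameter count implied by "27B"

BITS_PER_WEIGHT = {
    "Q8_0": 8.5,     # 32 int8 weights + one fp16 scale = 34 bytes per block
    "Q6_K": 6.5625,  # 256 weights packed into 210 bytes per super-block
}

RTX_5090_VRAM_GB = 32.0  # treated as decimal GB for this estimate


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        gb = weight_gb(PARAMS, bpw)
        headroom = RTX_5090_VRAM_GB - gb
        print(f"{name}: ~{gb:.1f} GB of weights, {headroom:+.1f} GB left on a 32 GB card")
```

Running it prints about 28.7 GB for Q8_0 and 22.1 GB for Q6_K, so Q6 leaves roughly 10 GB of headroom on a 32 GB card. The ~42GB the article reports at 8-bit is larger than this weights-only figure; the difference would plausibly come from KV cache, context, and other runtime allocations, which this sketch does not model.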
HN Discussion:
  • ~MacBook Pro is impractical for local LLM work due to heat and noise; dedicated hardware like a Mac mini is better
  • The cost of 128GB MacBooks makes cloud API credits far more economical than local models
  • Benchmarks and zero-shot demos don't reflect real-world use on existing codebases
  • ~Cheaper alternative hardware, such as Intel Arc Pro GPUs or smaller Macs, can run these models adequately
  • Dense models run poorly on unified memory; MoE variants or dedicated GPUs are better choices
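The last point rests on a common back-of-envelope argument: single-stream decoding is usually memory-bandwidth-bound, so the speed ceiling is roughly bandwidth divided by the bytes of weights read per token. A dense 27B model reads all of its weights for every token, while an MoE model reads only its active experts. The sketch below assumes a HYPOTHETICAL 500 GB/s bandwidth (not a measured or quoted spec for any Mac), Q8_0's 8.5 bits/weight, and reads "A3B" as roughly 3B active parameters following Qwen's naming convention; `decode_ceiling_tok_s` is a hypothetical helper.

```python
# Back-of-envelope decode-speed ceiling for bandwidth-bound inference.
# Assumption: each generated token streams every *active* weight from memory
# once, so tokens/s <= bandwidth / bytes_read_per_token. This ignores KV-cache
# reads, compute limits, batching, and multi-token prediction, which can push
# real numbers above or below this bound.


def decode_ceiling_tok_s(active_params: float, bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if weight reads are the only bottleneck."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# HYPOTHETICAL bandwidth for illustration only; substitute your machine's spec.
BANDWIDTH_GB_S = 500.0
BPW = 8.5  # Q8_0, used here as an example quantization

dense = decode_ceiling_tok_s(27e9, BPW, BANDWIDTH_GB_S)  # dense: all ~27B weights active
moe = decode_ceiling_tok_s(3e9, BPW, BANDWIDTH_GB_S)     # "A3B": ~3B active per token

print(f"dense 27B ceiling: ~{dense:.0f} tok/s")
print(f"MoE 35B A3B ceiling: ~{moe:.0f} tok/s")
print(f"ratio: {moe / dense:.1f}x")
```

The design choice is to count only the weights touched per token: an MoE model still needs all of its ~35B parameters resident in memory, but streams only the active subset each step, so under this model its ceiling is about 27/3 = 9x the dense one regardless of the bandwidth assumed. Multi-token prediction, as used in the article, can exceed the single-token bound because several tokens are accepted per pass over the weights.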