Qwen 3.6 27B is the sweet spot for local development

	Qwen 3.6 27B is the sweet spot for local development (quesma.com)
	1172 points by stared 2 days ago | 744 comments
	tl;dr: Qwen 3.6 27B is a locally-runnable dense model that reportedly matches mid-2025 frontier models (GPT-5/Claude Sonnet 4.5) on benchmarks, handling coding, writing, and general tasks well from a single prompt. On a MacBook M5 Max, it runs at ~32 tok/s via llama.cpp with multi-token prediction using ~42GB RAM (8-bit quantization), and fits on a 5090 at Q6 quantization. The author prefers it over the faster MoE 35B A3B variant for higher-quality output, and sees local models as increasingly viable alternatives to subsidized proprietary APIs.
	HN Discussion:
	~ MacBook Pro is impractical for local LLM work due to heat/noise; dedicated hardware like a Mac mini is better
	↓ The cost of 128GB MacBooks makes cloud API credits far more economical than local models
	↓ Benchmarks and zero-shot demos don't reflect real-world use on existing codebases
	~ Alternative cheaper hardware like Intel Arc Pro or smaller Macs can run these models adequately
	↓ Dense models run poorly on unified memory; MoE variants or dedicated GPUs are better choices
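	The memory figures in the tl;dr can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates weight storage only, for a dense 27B-parameter model. The helper name `weight_gb` is hypothetical. The bits-per-weight values (about 8.5 for an 8-bit GGUF quant, about 6.56 for Q6) are approximate assumptions, not numbers from the article, and the 32 GB figure is the RTX 5090's VRAM.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (weights only, no KV cache)."""
    # billions of parameters * bits per weight / 8 bits per byte = gigabytes
    return params_billion * bits_per_weight / 8


# Assumed effective bits per weight, including quantization scale overhead.
Q8_BITS = 8.5
Q6_BITS = 6.5625

q8 = weight_gb(27, Q8_BITS)
q6 = weight_gb(27, Q6_BITS)

print(f"8-bit weights: ~{q8:.1f} GB")
print(f"Q6 weights:    ~{q6:.1f} GB (fits in 32 GB VRAM: {q6 < 32})")
```

	Under these assumptions the 8-bit weights alone come to roughly 29 GB, so the reported ~42GB presumably also covers the KV cache, context buffers, and other runtime overhead. The Q6 weights come to roughly 22 GB, which is consistent with the claim that the model fits on a 5090.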