Jamesob's guide to running SOTA LLMs locally

	Jamesob's guide to running SOTA LLMs locally (github.com)
	362 points by livestyle 21 hours ago | 163 comments
	tl;dr: A detailed hardware/software guide for running SOTA LLMs locally, ranging from a $2k dual-RTX-3090 setup (running Qwen3.6-27B and Whisper STT) to a $40k rig with 4× RTX PRO 6000s (384GB VRAM) running GLM-5.2-594B at near-Opus quality. The build uses a last-gen EPYC/DDR4 base with c-payne PCIe Gen4 switches for GPU peer-to-peer communication, and covers finicky details like BIOS bifurcation, ACS disabling, IOMMU quirks, and power-limiting to run $46k of GPUs on a 110V circuit.
	HN Discussion:
	↓ Total cost is understated; the real build is closer to $50-55k, with quantization caveats
	↓ Local LLMs are wildly uneconomical compared to cloud subscriptions like Claude/Codex
	↓ The 'almost-Opus' quality claim is misleading given aggressive pruning and quantization
	~ Cheaper alternatives like M5 MacBooks or single GPUs offer better value for most users
	~ Mid-range unified-memory options (96-128GB) are a better compromise than the extremes presented
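
A rough sketch of the arithmetic behind the tl;dr's power-limiting point, for readers who want to see why four large GPUs need a cap on a single 110V circuit. The 15 A breaker, the 80% continuous-load margin, and the 300 W allowance for CPU, motherboard, and fans are assumptions chosen for illustration, not figures from the guide. The function and variable names are likewise hypothetical. The 96 GB per card follows from the 384GB across four GPUs stated in the summary.

```python
def per_gpu_power_cap(circuit_volts, breaker_amps, num_gpus,
                      system_overhead_watts, continuous_derate=0.8):
    """Return a per-GPU power limit (watts) that keeps total draw in budget.

    All inputs are illustrative assumptions, not values from the guide.
    """
    # Usable continuous power on the circuit (assumed 80% derating).
    circuit_budget = circuit_volts * breaker_amps * continuous_derate
    # What remains for the GPUs after the rest of the system is accounted for.
    gpu_budget = circuit_budget - system_overhead_watts
    if gpu_budget <= 0:
        raise ValueError("system overhead alone exceeds the circuit budget")
    return gpu_budget / num_gpus


# Assumed inputs: 110 V circuit (from the summary), 15 A breaker (assumption),
# 4 GPUs (from the summary), 300 W non-GPU overhead (assumption).
cap_watts = per_gpu_power_cap(110, 15, 4, 300)

# Total VRAM check: 4 cards at 96 GB each matches the 384GB in the summary.
total_vram_gb = 4 * 96

print(f"per-GPU cap: {cap_watts:.0f} W, total VRAM: {total_vram_gb} GB")
```

Under these assumed numbers the cap works out to about 255 W per card. A limit like this is commonly applied with `nvidia-smi -pl <watts>`, though the guide itself may use a different method or different figures.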