Jamesob's guide to running SOTA LLMs locally (github.com)
362 points by livestyle 21 hours ago | 163 comments
tl;dr: A detailed hardware/software guide for running SOTA LLMs locally, ranging from a $2k dual-RTX-3090 setup (running Qwen3.6-27B and Whisper STT) to a $40k rig with 4× RTX PRO 6000s (384GB VRAM) running GLM-5.2-594B at near-Opus quality. The build uses a last-gen EPYC/DDR4 base with c-payne PCIe Gen4 switches for GPU peer-to-peer communication, and covers finicky details like BIOS bifurcation, ACS disabling, IOMMU quirks, and power-limiting to run $46k of GPUs on a 110V circuit.
HN Discussion:
  • Total cost is understated; a real build is closer to $50-55k, with quantization caveats
  • Local LLMs are wildly uneconomical compared to cloud subscriptions like Claude/Codex
  • The 'almost-Opus' quality claim is misleading given aggressive pruning and quantization
  • Cheaper alternatives like M5 MacBooks or single GPUs offer better value for most users
  • Mid-range unified-memory options (96-128GB) are a better compromise than the extremes presented
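
The quantization caveat raised in the thread can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is not from the guide. It assumes the "594B" in the model name is the parameter count, that 1 GB = 10^9 bytes, and that only the weights count (no KV cache, activations, or pruning). The function names are made up for illustration.

```python
# Rough weights-only memory estimate. Assumptions (not from the guide):
# - "594B" in the model name is taken as the parameter count
# - 1 GB = 1e9 bytes
# - KV cache, activations, and runtime overhead are ignored

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB at a given precision."""
    return params_billion * bits_per_weight / 8


def max_bits_per_weight(vram_gb: float, params_billion: float) -> float:
    """Upper bound on bits per weight if the weights alone must fit in VRAM."""
    return vram_gb * 8 / params_billion


VRAM_GB = 4 * 96       # 4x RTX PRO 6000 at 96 GB each = 384 GB
PARAMS_BILLION = 594   # assumed parameter count

for bits in (16, 8, 4):
    size = weights_gb(PARAMS_BILLION, bits)
    verdict = "fits" if size <= VRAM_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {size:.0f} GB ({verdict} in {VRAM_GB} GB)")

limit = max_bits_per_weight(VRAM_GB, PARAMS_BILLION)
print(f"Weights-only ceiling: {limit:.2f} bits per weight")
```

Under these assumptions, 16-bit weights need 1188 GB and 4-bit weights need 297 GB, so anything above roughly 5 bits per weight would not fit in 384 GB. The real ceiling is lower once the KV cache and runtime overhead are counted, which is consistent with commenters questioning the quality claim.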