GLM-5.2 – How to Run Locally

	GLM-5.2 – How to Run Locally (unsloth.ai)
	440 points by TechTechTech 15 hours ago | 192 comments
	tl;dr: Unsloth has released dynamic GGUF quantizations of Z.ai's new GLM-5.2, a 744B-parameter (40B active) MoE model with a 1M context window that reportedly matches Claude 4.8 Opus, GPT-5.5, and Gemini 3.1 Pro on benchmarks. The 2-bit quant takes 239GB, which fits on a 256GB Mac or on a 24GB GPU paired with 256GB of RAM. The 1-bit quant is 86% smaller and retains ~76% top-1 accuracy. The model supports three reasoning modes and runs via llama.cpp or Unsloth Studio.
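The figures in the tl;dr can be sanity-checked with a little arithmetic. The sketch below is a rough estimate, not anything taken from Unsloth's documentation: it assumes "GB" means 10^9 bytes and "744B" means 744 × 10^9 weights, and the helper names `bits_per_weight` and `headroom_gb` are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the tl;dr.
# Assumption: "GB" means 10**9 bytes and "744B" means 744 * 10**9 weights.

TOTAL_PARAMS = 744e9   # total parameters (from the tl;dr)
ACTIVE_PARAMS = 40e9   # active parameters per token (from the tl;dr)
QUANT_2BIT_GB = 239    # size of the 2-bit dynamic quant (from the tl;dr)


def bits_per_weight(size_gb: float, params: float) -> float:
    """Average number of bits stored per weight for a file of size_gb gigabytes."""
    return size_gb * 1e9 * 8 / params


def headroom_gb(size_gb: float, *memory_pools_gb: float) -> float:
    """Memory left after loading the weights, before KV cache and OS overhead."""
    return sum(memory_pools_gb) - size_gb


avg_bits = bits_per_weight(QUANT_2BIT_GB, TOTAL_PARAMS)
mac_headroom = headroom_gb(QUANT_2BIT_GB, 256)
gpu_ram_headroom = headroom_gb(QUANT_2BIT_GB, 24, 256)
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"average bits per weight: {avg_bits:.2f}")
print(f"headroom on a 256GB Mac: {mac_headroom:.0f} GB")
print(f"headroom with 24GB GPU + 256GB RAM: {gpu_ram_headroom:.0f} GB")
print(f"active fraction per token: {active_fraction:.1%}")
```

Under these assumptions the "2-bit" quant averages about 2.57 bits per weight, which is what you would expect from a dynamic scheme that keeps some layers at higher precision. The 17 GB left over on a 256GB machine has to cover the KV cache, the OS, and everything else, which may be why some commenters call the 256GB minimum optimistic.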
	HN Discussion:
	↑ Running the model locally on high-RAM and GPU setups is achievable and rewarding.
	↓ The stated 256GB RAM minimum is misleading; realistically it takes 512GB or expensive GPUs to be usable.
	↓ Heavy quantization with CPU offloading won't outperform smaller models loaded fully in VRAM.
	↑ Open-source models closing the gap with proprietary APIs threaten commercial AI providers.
	• Commenters ask clarifying technical questions about hardware requirements and feasibility.