GLM-5.2 – How to Run Locally (unsloth.ai)
440 points by TechTechTech 15 hours ago | 192 comments
tl;dr: Unsloth has released dynamic GGUF quantizations of Z.ai's new GLM-5.2, a 744B-parameter (40B active) MoE model with a 1M-token context window that reportedly matches Claude 4.8 Opus, GPT-5.5, and Gemini 3.1 Pro on benchmarks. The 2-bit quant takes 239GB, enough to fit on a 256GB Mac or on a 24GB GPU paired with 256GB of system RAM, while the 1-bit version is 86% smaller and still retains ~76% top-1 accuracy. The model supports three reasoning modes and runs via llama.cpp or Unsloth Studio.
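The sizes above follow from simple arithmetic: weight bytes ≈ parameters × bits per weight ÷ 8. Below is a minimal Python sketch, assuming 1 GB = 10^9 bytes and that the 239GB file is essentially all weights. The helper names are hypothetical. Under these assumptions the "2-bit" dynamic quant averages roughly 2.6 bits per weight, plausibly because some tensors are kept at higher precision. The fit check counts only weights and ignores KV cache and OS overhead.

```python
# Rough weight-memory arithmetic for a quantized model (illustrative sketch).
# Assumptions (not from the source): 1 GB = 1e9 bytes, and the quoted
# 239GB file size is dominated by the model weights.

TOTAL_PARAMS = 744e9   # total parameters, from the tl;dr
QUANT_SIZE_GB = 239    # 2-bit dynamic quant size, from the tl;dr


def weight_size_gb(params: float, bits_per_weight: float) -> float:
    """Size of the weights in GB at a given average bits per weight."""
    return params * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(size_gb: float, params: float) -> float:
    """Average bits per weight implied by a file size in GB."""
    return size_gb * 1e9 * 8 / params


def fits(size_gb: float, vram_gb: float, ram_gb: float) -> bool:
    """Naive check: do the weights fit in VRAM + system RAM combined?
    Ignores KV cache, activations, and OS overhead."""
    return size_gb <= vram_gb + ram_gb


if __name__ == "__main__":
    print(f"Unquantized 16-bit weights: {weight_size_gb(TOTAL_PARAMS, 16):.0f} GB")
    bpw = implied_bits_per_weight(QUANT_SIZE_GB, TOTAL_PARAMS)
    print(f"'2-bit' quant averages about {bpw:.2f} bits/weight")
    print("Fits in 24GB GPU + 256GB RAM:", fits(QUANT_SIZE_GB, 24, 256))
    print("Fits in 256GB unified memory:", fits(QUANT_SIZE_GB, 0, 256))
```

By this naive count, a 256GB machine leaves only about 17GB for the KV cache, the OS, and everything else, so the headline figure is better read as a floor than as a comfortable target.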
HN Discussion:
  • Running the model locally is achievable and rewarding on setups with plenty of RAM and a capable GPU
  • The stated 256GB RAM minimum is misleading; to be usable, the model realistically needs 512GB of RAM or expensive GPUs
  • A heavily quantized model relying on CPU offloading won't outperform a smaller model that fits entirely in VRAM
  • Open-source models closing the gap with proprietary APIs threatens commercial AI providers
  • Commenters ask clarifying technical questions about hardware requirements and feasibility
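The offloading argument can be made concrete for generation speed, though not for answer quality. Single-stream decoding of a MoE model is typically limited by memory bandwidth, because each token has to read roughly all of the ~40B active parameters. The Python sketch below is a back-of-envelope estimate, assuming the ~2.57 bits/weight implied by 239GB over 744B parameters. It ignores KV-cache reads, compute, and PCIe traffic. The bandwidth figures are hypothetical round numbers, not measurements.

```python
# Upper bound on single-stream decode speed for a bandwidth-bound MoE model.
# Assumption: every generated token reads each active weight once, so
#   tokens/s <= memory bandwidth / bytes read per token.
# Ignored: KV-cache reads, compute limits, PCIe transfers, dequantization cost.

TOTAL_PARAMS = 744e9   # from the tl;dr
ACTIVE_PARAMS = 40e9   # active parameters per token, from the tl;dr
# Average bits/weight implied by the 239GB 2-bit quant (1 GB = 1e9 bytes).
BITS_PER_WEIGHT = 239e9 * 8 / TOTAL_PARAMS


def bytes_per_token(active_params: float, bits_per_weight: float) -> float:
    """Bytes of weights read to generate one token."""
    return active_params * bits_per_weight / 8


def max_tokens_per_s(bandwidth_gb_s: float, active_params: float,
                     bits_per_weight: float) -> float:
    """Bandwidth-bound ceiling on tokens per second."""
    return bandwidth_gb_s * 1e9 / bytes_per_token(active_params, bits_per_weight)


if __name__ == "__main__":
    # Hypothetical round bandwidth figures, for illustration only.
    for label, bw in [("desktop dual-channel DDR5, ~80 GB/s", 80),
                      ("server 8-channel DDR5, ~300 GB/s", 300),
                      ("high-end GPU VRAM, ~1000 GB/s", 1000)]:
        tps = max_tokens_per_s(bw, ACTIVE_PARAMS, BITS_PER_WEIGHT)
        print(f"{label}: at most ~{tps:.0f} tokens/s")
```

Under these assumptions, the memory that holds most of the expert weights sets the ceiling. With a 24GB GPU and 256GB of RAM, that memory is system RAM. This is why a smaller model held entirely in VRAM can generate much faster.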