GLM-5.2 – How to Run Locally (unsloth.ai)
440 points by TechTechTech 15 hours ago | 192 comments
tl;dr: Unsloth has released dynamic GGUF quantizations of Z.ai's new GLM-5.2, a 744B-parameter (40B active) MoE model with a 1M-token context window that reportedly matches Claude 4.8 Opus, GPT-5.5, and Gemini 3.1 Pro on benchmarks. The 2-bit quant needs 239GB (it fits on a 256GB Mac, or on a 24GB GPU plus 256GB of RAM), while the 1-bit version retains ~76% top-1 accuracy while being 86% smaller. The model supports three reasoning modes and runs via llama.cpp or Unsloth Studio.
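
As a rough sanity check on the sizes quoted above, here is a minimal Python sketch that works out the effective bits per weight of the 2-bit quant and whether it fits the two hardware setups mentioned. The helper names `bits_per_weight` and `fits` are made up for illustration, and the sketch assumes decimal gigabytes and ignores KV cache, context buffers, and OS overhead, so real headroom will be smaller.

```python
# Figures taken from the summary above; treating "GB" as 10**9 bytes is an assumption.
TOTAL_PARAMS = 744e9   # total parameters (MoE, all experts)
QUANT_SIZE_GB = 239    # size of the 2-bit dynamic quant


def bits_per_weight(size_gb: float, n_params: float) -> float:
    """Average number of bits stored per parameter for a file of size_gb."""
    return size_gb * 1e9 * 8 / n_params


def fits(size_gb: float, *memory_pools_gb: float) -> bool:
    """True if the weights alone fit in the combined memory pools.

    This ignores KV cache, activations, and OS overhead (an assumption).
    """
    return size_gb <= sum(memory_pools_gb)


if __name__ == "__main__":
    print(f"{bits_per_weight(QUANT_SIZE_GB, TOTAL_PARAMS):.2f} bits/weight")
    print("256GB Mac:", fits(QUANT_SIZE_GB, 256))
    print("24GB GPU + 256GB RAM:", fits(QUANT_SIZE_GB, 24, 256))
```

The average comes out above 2 bits per weight, which is consistent with a "dynamic" quant keeping some tensors at higher precision, though the summary does not say which ones.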
HN Discussion: