GLM 5.2 beats Claude in our benchmarks

	GLM 5.2 beats Claude in our benchmarks (semgrep.dev)
	902 points by jms703 18 hours ago | 414 comments
	tl;dr: Semgrep benchmarked open-weight and frontier models on detecting insecure direct object reference (IDOR) vulnerabilities and found that Zhipu AI's GLM 5.2 scored 39% F1 with just a bare prompt, beating Claude Code (32%) at roughly $0.17 per bug found. Both were beaten by Semgrep's own multimodal pipeline (53-61% F1), suggesting the harness/scaffolding matters more than the underlying model. The authors caution that this is a single task on one dataset, but argue that GLM 5.2's performance at ~1/6 the cost of frontier models, together with the ability to run it locally, makes open-weight models newly viable for security teams.
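For readers unfamiliar with the metrics in the tl;dr, here is a minimal sketch of how an F1 score and a cost-per-bug figure are typically computed. The function names and all counts below are hypothetical, chosen only so the arithmetic lands near the quoted 39% and $0.17; they are not Semgrep's data, and the article's exact cost accounting may differ.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall for a detection task."""
    flagged = true_pos + false_pos   # everything the model reported
    actual = true_pos + false_neg    # every real bug in the dataset
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cost_per_bug(total_cost_usd: float, true_pos: int) -> float:
    """Total spend divided by the number of real bugs found."""
    if true_pos == 0:
        raise ValueError("no bugs found, cost per bug is undefined")
    return total_cost_usd / true_pos


# Hypothetical run (invented numbers): 39 real bugs found,
# 61 false alarms, 61 real bugs missed, 6.63 USD total spend.
f1 = f1_score(true_pos=39, false_pos=61, false_neg=61)
cost = cost_per_bug(total_cost_usd=6.63, true_pos=39)
print(f"F1 = {f1:.2f}, cost per bug = ${cost:.2f}")
```

Because F1 penalizes both false alarms and missed bugs, a model can look cheap per bug found while still missing most of the real vulnerabilities, so the two numbers are best read together.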
	HN Discussion:
	↑ GLM 5.2 is a genuinely capable, cost-effective workhorse for daily coding tasks
	↑ Open Chinese models are catching up or surpassing US frontier models, especially in specific domains like cybersecurity
	~ Other open models like DeepSeek may actually outperform GLM 5.2 across broader benchmarks
	↓ The article's title and conclusions are misleading; one narrow benchmark doesn't generalize and terminology is sloppy
	~ Coding-focused evaluation ignores broader concerns like model bias and non-programmer use cases