| GLM 5.2 beats Claude in our benchmarks (semgrep.dev) | |
| 902 points by jms703 18 hours ago | 414 comments | |
tl;dr: Semgrep benchmarked open-weight and frontier models on IDOR vulnerability detection and found Zhipu AI's GLM 5.2 scored 39% F1 with just a bare prompt, beating Claude Code (32%) at roughly $0.17 per bug found. Both were beaten by Semgrep's own multimodal pipeline (53-61% F1), suggesting the harness/scaffolding matters more than the underlying model. The authors caution this is a single task on one dataset, but argue GLM 5.2's performance at ~1/6 the cost of frontier models—plus the ability to run locally—makes open weights newly viable for security teams. | |
HN Discussion: