GPT-5.5 hallucinates 3x more than MIT-licensed GLM-5.2 (arrowtsx.dev)
527 points by oshrimpton 1 day ago | 254 comments
tl;dr: Despite being roughly half the size, the MIT-licensed GLM-5.2 scores within 4 points of GPT-5.5 on the Artificial Analysis Intelligence Index while hallucinating far less (28% vs 86%), suggesting raw parameter scaling has plateaued and may actively harm truthfulness. The author argues that massive models like DeepSeek V4 Pro (94% hallucination rate) fail to recognize the limits of their own knowledge and waste compute by confidently producing wrong answers. Model training and selection should instead optimize a trilemma of capability, hallucination rate, and compute efficiency.
HN Discussion:
  • Claims that bigger models hallucinate more contradict observed trends in recent years
  • Hallucination rate metrics are conditional and don't reflect real-world user experience
  • Hallucination is a training/RLVR problem, not fundamentally a model size issue
  • The author has undisclosed conflicts of interest and cherry-picks rate over accuracy
  • Anecdotal experience shows GLM-5.2 actually hallucinates more than the article claims
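
The objection that the metric is "conditional" can be made concrete with a small sketch. It assumes the hallucination rate is computed only over questions the model did not answer correctly, as wrong answers divided by wrong answers plus abstentions; the summary above does not confirm that definition. The function names and all counts are hypothetical, invented for illustration, and are not measurements of any real model.

```python
def hallucination_rate(correct: int, incorrect: int, abstained: int) -> float:
    """Assumed definition: share of non-correct responses that were wrong
    answers rather than abstentions. `correct` does not enter the ratio."""
    not_correct = incorrect + abstained
    if not_correct == 0:
        return 0.0
    return incorrect / not_correct


def overall_wrong_rate(correct: int, incorrect: int, abstained: int) -> float:
    """Share of all questions that received a wrong answer."""
    total = correct + incorrect + abstained
    return incorrect / total


# Hypothetical counts out of 1000 questions each (invented, not real results).
model_a = dict(correct=900, incorrect=86, abstained=14)    # knows more, rarely abstains
model_b = dict(correct=500, incorrect=140, abstained=360)  # knows less, abstains often

for name, counts in (("model_a", model_a), ("model_b", model_b)):
    print(
        name,
        f"hallucination_rate={hallucination_rate(**counts):.3f}",
        f"overall_wrong_rate={overall_wrong_rate(**counts):.3f}",
    )
```

Under these invented counts, model_a has the higher conditional hallucination rate yet gives fewer wrong answers per 1000 questions, which is the gap between the metric and user experience that commenters point to.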