A robot is sprinting towards you. Do you want it running on Claude or Grok? (openrouter.ai)
257 points by Usu 16 hours ago | 195 comments
tl;dr: An OpenRouter dev pitted 11 LLMs against each other in a 2D battle royale over 30 games; Grok 4.1 Fast won 43% of matches at $0.97/win, while Claude Sonnet 4.6 kept asking for truces and warning opponents of its location, winning only 5 games at 27x the cost. The experiment suggests "alignment tax" measurably hurts performance in adversarial zero-sum tasks, and that standard benchmarks poorly predict task-specific outcomes—cost-per-win rankings differ wildly from leaderboard rankings, and the model with the most kills (GPT 5.4) didn't win the most games.
HN Discussion:
  • Grok's lack of guardrails makes it more likely to complete tasks without refusal
  • Frontier-tier models are absurdly expensive and may not be financially viable at scale
  • Grok's pricing/model routing practices are deceptive and problematic
  • Cost-per-win/kill metrics for AI are a disturbing framing with real-world implications
  • Other models like DeepSeek prove benchmarks poorly predict real task performance