A robot is sprinting towards you. Do you want it running on Claude or Grok?

	A robot is sprinting towards you. Do you want it running on Claude or Grok? (openrouter.ai)
	267 points by Usu 45 days ago | 207 comments
	tl;dr: An OpenRouter developer pitted 11 LLMs against each other in a 2D battle royale across 30 games. Grok 4.1 Fast won 43% of matches at $0.97 per win, while Claude Sonnet 4.6 kept asking for truces and warning opponents of its location, winning only 5 games at 27x the cost. The experiment suggests that an "alignment tax" measurably hurts performance in adversarial zero-sum tasks and that standard benchmarks poorly predict task-specific outcomes: cost-per-win rankings differ wildly from leaderboard rankings, and the model with the most kills (GPT 5.4) did not win the most games.
	HN Discussion: ↑ Grok's lack of guardrails makes it more likely to complete tasks without refusing • Frontier-tier models are absurdly expensive and may not be financially viable at scale ↓ Grok's pricing and model-routing practices are deceptive and problematic ~ Framing AI performance as cost per win or cost per kill is disturbing and has real-world implications ↑ Other models such as DeepSeek show that benchmarks poorly predict real task performance
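
The cost-per-win figures in the tl;dr can be unpacked with a little arithmetic. The sketch below is not the experiment's own code: it uses only the numbers quoted in the summary (30 games, a 43% win rate, $0.97 per win, 5 wins, a 27x multiple) and assumes that "27x the cost" refers to cost per win rather than total spend. The function and constant names are hypothetical.

```python
def cost_per_win(total_cost_usd: float, wins: int) -> float:
    """Total spend divided by wins; undefined for a model that never won."""
    if wins <= 0:
        raise ValueError("cost per win is undefined without at least one win")
    return total_cost_usd / wins


# Figures quoted in the summary.
GAMES = 30
GROK_WIN_RATE = 0.43
GROK_COST_PER_WIN = 0.97   # USD
CLAUDE_WINS = 5
CLAUDE_COST_MULTIPLE = 27  # ASSUMPTION: a multiple of Grok's cost per win

# 43% of 30 games is 12.9, so roughly 13 wins.
grok_wins = round(GAMES * GROK_WIN_RATE)

# Implied cost per win and implied total spend under the assumption above.
claude_cost_per_win = CLAUDE_COST_MULTIPLE * GROK_COST_PER_WIN
grok_total = grok_wins * GROK_COST_PER_WIN
claude_total = CLAUDE_WINS * claude_cost_per_win

print(f"Grok:   ~{grok_wins} wins, ~${grok_total:.2f} total")
print(f"Claude: {CLAUDE_WINS} wins, ~${claude_cost_per_win:.2f} per win, "
      f"~${claude_total:.2f} total")
```

Under that reading, the implied totals differ by roughly a factor of ten while the per-win costs differ by 27x, which is why a ranking by cost per win can diverge so sharply from a ranking by raw wins or kills.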