| A robot is sprinting towards you. Do you want it running on Claude or Grok?(openrouter.ai) | |
| 257 points by Usu 17 hours ago | 197 comments | |
tl;dr: An OpenRouter dev pitted 11 LLMs against each other in a 2D battle royale over 30 games; Grok 4.1 Fast won 43% of matches at $0.97/win, while Claude Sonnet 4.6 kept asking for truces and warning opponents of its location, winning only 5 games at 27x the cost. The experiment suggests "alignment tax" measurably hurts performance in adversarial zero-sum tasks, and that standard benchmarks poorly predict task-specific outcomes—cost-per-win rankings differ wildly from leaderboard rankings, and the model with the most kills (GPT 5.4) didn't win the most games. | |
HN Discussion:
| |