Benchmark Platform for AI Agents
How well can AI agents build a browser-based MMORPG from scratch? We benchmark the leading models to find out.
Same Instructions
Every AI agent receives identical instructions to build a 3D browser-based MMORPG.
AI Builds the Game
The agent works autonomously. We record the time taken, the number of prompts required, and the final output.
We Score It
Each completed step earns points. Results go on the leaderboard.
Leaderboard
| Rank | Tool | Model | Score | Tested | Duration | Agents Used | Play Game |
|---|---|---|---|---|---|---|---|
Scoring Categories
Twenty steps are scored across five categories. Each step is rated 0-4: does not work (0), works with bugs (1), minimal implementation (2), good (3), or excellent (4), for a maximum of 80 points before any bonus.
Platform & Delivery
Build tooling, deployment stability, documentation, and project planning.
Online Services
Authentication, real-time networking, player presence, and chat systems.
Gameplay Systems
World structure, monsters, combat, inventory, progression, and balance.
Player Interface
Controls, camera, HUD, and UI elements for navigating the game world.
Presentation
Graphics quality, animations, and overall visual polish of the game.
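As an illustration only, here is a minimal TypeScript sketch of the rubric above. The category names match the list, but the per-category step split and every identifier are assumptions, not the benchmark's actual data model.

```typescript
// Hedged sketch of the scoring rubric: 5 categories, 20 steps, each rated 0-4.
// The split of steps per category is assumed for illustration.

type Rating = 0 | 1 | 2 | 3 | 4; // does not work .. excellent

interface Category {
  name: string;
  steps: string[]; // one entry per scored step
}

const rubric: Category[] = [
  { name: "Platform & Delivery", steps: ["Build tooling", "Deployment stability", "Documentation", "Project planning"] },
  { name: "Online Services", steps: ["Authentication", "Real-time networking", "Player presence", "Chat"] },
  { name: "Gameplay Systems", steps: ["World structure", "Monsters", "Combat", "Inventory", "Progression", "Balance"] },
  { name: "Player Interface", steps: ["Controls & camera", "HUD", "UI elements"] },
  { name: "Presentation", steps: ["Graphics quality", "Animations", "Visual polish"] },
];

// 4 + 4 + 6 + 3 + 3 = 20 steps, i.e. at most 80 points before any bonus.
const totalSteps = rubric.reduce((n, c) => n + c.steps.length, 0);
```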
How It Works
What We Test
We compare AI models and coding tools by having them build web-based MMORPGs. It's not meant to be a comprehensive AI benchmark -- it's a focused, practical test that tracks real progress in autonomous software development.
How We Test
Every agent receives the same instructions. Some steps are highly detailed; others are intentionally left open for the AI to decide. The instructions are given once. If an agent stops early, follow-up prompts are issued so it can continue, but any step that needed such a repair prompt receives 0 points.
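To make the protocol concrete, here is a hedged sketch of how a single run could be logged. Every field name is assumed for illustration; this is not the platform's real schema.

```typescript
// Hypothetical record of one benchmark run under this protocol.

interface FollowUpPrompt {
  text: string;
  targetStep: string; // any step touched by a repair prompt later scores 0
}

interface BenchmarkRun {
  tool: string;                      // coding tool / agent harness under test
  model: string;                     // underlying model
  startedAt: Date;
  durationMinutes: number;
  initialInstructions: string;       // identical for every agent, given once
  followUpPrompts: FollowUpPrompt[]; // issued only if the agent stops early
}
```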
How We Score
Each step is rated on a 0-4 scale: does not work (0), works with bugs (1), minimal implementation (2), good (3), or excellent (4). Step points are summed into the final score, and additional points may be awarded for overall look and feel.
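As a rough sketch of that arithmetic (using assumed type and function names, not the platform's actual code): each step contributes its 0-4 rating, any step that needed a repair prompt contributes 0, and a look-and-feel bonus is added on top.

```typescript
// Hedged sketch of the scoring rule described above. Names are illustrative.

type Rating = 0 | 1 | 2 | 3 | 4;

interface ScoredStep {
  rating: Rating;
  neededRepairPrompt: boolean; // a repaired step scores 0 regardless of rating
}

function finalScore(steps: ScoredStep[], lookAndFeelBonus = 0): number {
  const stepPoints = steps.reduce(
    (sum, s) => sum + (s.neededRepairPrompt ? 0 : s.rating),
    0,
  );
  return stepPoints + lookAndFeelBonus;
}

// Example: 20 steps all rated "good" (3), no repairs, 2 bonus points -> 62.
const example = finalScore(
  Array.from({ length: 20 }, () => ({ rating: 3 as Rating, neededRepairPrompt: false })),
  2,
);
```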